******
Pioneering a State-of-the-Art Advertisement Recommendation Framework with LangChain and Qdrant
******

This notebook builds an advertisement recommendation system by combining LangChain and Qdrant. LangChain is a framework for building applications on top of large language models (LLMs) and for chaining LLM calls together; Qdrant is a vector database built for storing and searching high-dimensional vectors.

****Installation and Library Imports****

Start by installing the required packages:

In [None]:
!pip install jq
!pip install unstructured
!pip install ctransformers
!pip install qdrant_client
!pip install rapidocr-onnxruntime
!pip install langchain sentence_transformers

Next, import the libraries used throughout the notebook:

In [None]:
import json
from pathlib import Path
from pprint import pprint
from langchain import PromptTemplate
from qdrant_client import QdrantClient
from langchain.llms import CTransformers  # runs GGML models locally
from langchain.vectorstores import Qdrant  # vector store wrapper around Qdrant
from langchain.chains import RetrievalQA  # retrieval + question-answering chain
from langchain.embeddings import HuggingFaceEmbeddings  # sentence-transformers embeddings
from langchain_community.document_loaders import JSONLoader  # loads documents from JSON files
from langchain.text_splitter import RecursiveCharacterTextSplitter  # splits text into chunks

****Set up the Credentials****

Before running the rest of the code, set the path to the advertisement data file (advertisement_data_path), the Qdrant URL (qdrant_url), and the Qdrant API key (qdrant_api_key):

In [None]:
advertisement_data_path = './advertisement.json'
qdrant_url = ""
qdrant_api_key = ""
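
To keep secrets out of the notebook itself, the same two values could be read from environment variables. The sketch below is one possible approach, not part of the original setup: the variable names QDRANT_URL and QDRANT_API_KEY and the helper load_qdrant_settings are assumptions, and the fallback URL assumes a local Qdrant instance on its default HTTP port (6333).

```python
import os


def load_qdrant_settings(env=os.environ):
    """Return (url, api_key) read from the environment, with local fallbacks.

    QDRANT_URL and QDRANT_API_KEY are hypothetical variable names; rename them
    to match however your deployment exposes its secrets.
    """
    url = env.get("QDRANT_URL", "http://localhost:6333")
    api_key = env.get("QDRANT_API_KEY") or None  # None means "no authentication"
    return url, api_key


qdrant_url, qdrant_api_key = load_qdrant_settings()
```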

****Loading and Exploring the Dataset****

Load the advertisement dataset and print the first few records to see what it contains:



In [None]:
data = json.loads(Path(advertisement_data_path).read_text())
pprint(data[0:4])
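
Beyond printing a few records, it can help to check which fields appear and how consistently. The sketch below assumes the file holds a list of JSON objects; the sample records and their field names are invented for illustration and may not match the real advertisement.json.

```python
from collections import Counter


def summarize_records(records):
    """Count how often each field appears across a list of dict records."""
    field_counts = Counter()
    for record in records:
        field_counts.update(record.keys())
    return {"num_records": len(records), "field_counts": dict(field_counts)}


# Hypothetical sample; the real dataset may use different fields.
sample_ads = [
    {"id": 1, "title": "Wool beanie", "keywords": ["hats", "winter fashion"]},
    {"id": 2, "title": "Silk scarf", "keywords": ["scarves"]},
    {"id": 3, "title": "Running shoes"},
]
summary = summarize_records(sample_ads)
print(summary)
```

A field that appears in fewer records than num_records (here, keywords) signals that downstream code should not assume it is always present.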

****Splitting Texts into Chunks and Creating Embeddings****

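Before chunking, load the JSON file as LangChain documents. JSONLoader applies the jq expression '.[]' to produce one document per top-level array element, and text_content=False allows elements that are not plain strings (for example, JSON objects), which are serialized into the document text: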

In [None]:
loader = JSONLoader(
    file_path=advertisement_data_path,
    jq_schema='.[]',
    text_content=False)

data = loader.load()
pprint(data[0:4])

Now split the documents into chunks and load the Hugging Face embedding model that will turn each chunk into a vector:

In [None]:
# RecursiveCharacterTextSplitter splits each document into chunks of at most
# chunk_size characters, with up to chunk_overlap characters shared between neighbors
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024,
                                               chunk_overlap=10)
texts = text_splitter.split_documents(data)

# all-MiniLM-L6-v2 is a small sentence-transformers model; choose a smaller or
# larger model depending on your hardware.
# The embedding model maps each chunk of text to a vector.
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2',
                                   model_kwargs={'device': 'cpu'})
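
To make the roles of chunk_size and chunk_overlap concrete, here is a simplified sliding-window splitter in plain Python. It is not the algorithm RecursiveCharacterTextSplitter uses (that splitter prefers to break on separators such as paragraph and line breaks); the helper split_with_overlap is a hypothetical illustration of the two parameters only.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.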

****Setting up Qdrant Vector Database****

Qdrant stores the chunk vectors and serves similarity searches over them. The following code deletes any existing collection named my_documents, then creates it again from the chunks:

In [None]:
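client = QdrantClient(url=qdrant_url, api_key=qdrant_api_key)
# Drop any existing collection with this name so the notebook starts from a clean state
client.delete_collection(collection_name="my_documents")

# Embed the chunks and upload them to a new collection
qdrant = Qdrant.from_documents(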
    texts,
    embeddings,
    url=qdrant_url,
    api_key=qdrant_api_key,
    collection_name="my_documents",
)
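
Conceptually, a vector search ranks stored vectors by their similarity to a query vector and returns the best matches. The sketch below shows that idea as an exhaustive scan in plain Python, assuming cosine similarity as the metric. Qdrant itself uses an approximate index (HNSW) rather than scanning every vector, and the toy 3-dimensional vectors and ad names stand in for real 384-dimensional all-MiniLM-L6-v2 embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=2):
    """Return the k (name, score) pairs with the highest cosine similarity."""
    scored = [(name, cosine_similarity(query_vector, vec)) for name, vec in stored.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical toy vectors, for illustration only.
stored_vectors = {
    "wool beanie ad": [0.9, 0.1, 0.0],
    "silk scarf ad": [0.7, 0.3, 0.1],
    "running shoes ad": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.2, 0.0], stored_vectors, k=2)
print([name for name, _ in results])
# → ['wool beanie ad', 'silk scarf ad']
```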

****Creating a Custom Prompt for the Language Model****

A custom prompt template tells the language model how to use the retrieved advertisements and the user data:

In [None]:
# A custom prompt steers the model toward grounded answers and tells it not to make one up
custom_prompt_template = """Use the following pieces of information to find the most appropriate advertisement to show the given user.
If you don’t know the answer, just say that you don’t know, don’t try to make up an answer.

Relevant_Advertisements: {context}
User_data: {question}

Only return the helpful answer and nothing else.
Helpful answer:
"""

prompt = PromptTemplate(template=custom_prompt_template,
                        input_variables=['context', 'question'])
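
To see what the model eventually receives, the sketch below fills an abbreviated stand-in for the template using plain string formatting. It assumes the retrieved chunks are joined with blank lines before being placed in {context}, which is how a 'stuff'-style chain typically combines documents; the helper build_prompt and the sample texts are hypothetical.

```python
# Abbreviated stand-in for the full template, kept short for illustration.
TEMPLATE = (
    "Relevant_Advertisements: {context}\n"
    "User_data: {question}\n"
    "Helpful answer:"
)


def build_prompt(retrieved_texts, user_data):
    """Join retrieved chunks into one context string and fill the template."""
    context = "\n\n".join(retrieved_texts)
    return TEMPLATE.format(context=context, question=user_data)


# Hypothetical retrieved chunks and user profile.
filled = build_prompt(
    ["Ad 1: wool beanies, 20% off", "Ad 2: silk scarves, new collection"],
    "preferences: hats, scarves, winter fashion",
)
print(filled)
```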

****Loading the Language Model****

Load the language model for the recommendation system:

In [None]:
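# Load a quantized Llama 2 chat model; the repo id is resolved on the Hugging Face Hub
llm = CTransformers(
    model="TheBloke/Llama-2-7B-Chat-GGML",
    model_type="llama",
    config={"temperature": 0.2},  # generation settings are passed through the config dict
)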

****Testing with a Sample User****

Test the system with a sample user and their preferences:

In [None]:
user_info = {
    "user_id": 1,
    "preferences": ["hats", "scarves", "winter fashion"]
}
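
The retriever works on text, so the profile has to become a string at some point. One option, sketched below, is to keep only the fields that describe the user's interests, so that identifiers such as user_id do not influence the embedding. The helper build_query and its output wording are hypothetical.

```python
def build_query(user_info):
    """Flatten a user profile dict into a short text query for the retriever."""
    preferences = user_info.get("preferences", [])
    return "User interested in: " + ", ".join(preferences)


query_text = build_query({"user_id": 1, "preferences": ["hats", "scarves", "winter fashion"]})
print(query_text)
# → User interested in: hats, scarves, winter fashion
```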

Now build the retrieval chain and pass user_info to it:

In [None]:

# Use the same embedding model as at indexing time so queries and chunks share one vector space
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2",
                                   model_kwargs={'device': 'cpu'})
# Connect to the vector database
client = QdrantClient(url=qdrant_url, api_key=qdrant_api_key)
    
doc_store = Qdrant(
    client=client,
    collection_name="my_documents",
    embeddings=embeddings
)

qa = RetrievalQA.from_chain_type(llm=llm,
                                 chain_type='stuff',
                                 retriever=doc_store.as_retriever(search_kwargs={'k': 2}),
                                 return_source_documents=True,
                                 chain_type_kwargs={'prompt': prompt}
                                )

# The retriever embeds the query as text, so serialize the user profile to a string
response = qa({'query': json.dumps(user_info)})

****Viewing the Response****

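Lastly, examine the response from the recommendation system. It is a dictionary holding the query, the generated answer under result, and, because return_source_documents=True, the retrieved chunks under source_documents: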

In [None]:
response
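
The raw dictionary can be noisy, so a small helper that pulls out the answer and the supporting texts may be convenient. The sketch below runs on a stand-in response whose values are invented for illustration; a real response holds LangChain document objects, which expose the same page_content attribute used here. The helper summarize_response is hypothetical.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Return the answer text and the content of each supporting document."""
    answer = response["result"].strip()
    sources = [doc.page_content for doc in response.get("source_documents", [])]
    return answer, sources


# Stand-in for a real chain response; every value here is made up.
fake_response = {
    "query": "preferences: hats, scarves, winter fashion",
    "result": "  Show the wool beanie advertisement.\n",
    "source_documents": [
        SimpleNamespace(page_content="Ad 1: wool beanies", metadata={}),
        SimpleNamespace(page_content="Ad 2: silk scarves", metadata={}),
    ],
}
answer, sources = summarize_response(fake_response)
print(answer)
print(sources)
```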

This notebook covered the full process of building an advertisement recommendation system with LangChain and Qdrant: loading the dataset, chunking the text, creating embeddings, setting up Qdrant, defining the prompt, loading the language model, and testing with a sample user. Adapt the code to the requirements of your own application and dataset.