# Conversational bot for MadKudu support using RAG

RAG stands for Retrieval-Augmented Generation, a technique where the capabilities of a large language model (LLM) are augmented by retrieving information from other systems and inserting it into the LLM’s context window via a prompt.

RAG is cheaper than fine-tuning: fine-tuning a model is expensive because the model’s weights themselves must be adjusted. RAG is simply a series of vector/SQL queries and API calls, which cost tiny fractions of a cent.
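The retrieve-then-prompt idea can be sketched in a few lines of plain Python. This is only a toy illustration: the `embed` function below is a hypothetical bag-of-words stand-in for a real embedding model, and the three passages are made-up examples, not the contents of the support data.

```python
import math

# Toy corpus (made-up passages for illustration only)
passages = [
    "Likelihood to Buy is the output of a behavioral model.",
    "Customer Fit is a demographic model.",
    "Lead Grade combines Customer Fit and Likelihood to Buy.",
]

def embed(text):
    """Hypothetical stand-in for an embedding model: word counts."""
    counts = {}
    for word in text.lower().replace(".", "").replace("?", "").split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, k=2):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(question):
    """Insert the retrieved passages into the prompt sent to the LLM."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

rag_prompt = build_prompt("What is Lead Grade?")
print(rag_prompt)
```

The rest of the notebook replaces each toy piece with a real one: a sentence-transformers model for `embed`, FAISS for `retrieve`, and a LangChain chain for `build_prompt`.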

# 1. Importing the OpenAI key and the ChatGPT model

In [9]:
import os 
os.environ["OPENAI_API_KEY"] = "

In [10]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    # A cheap model was chosen to save credits
    model = "gpt-3.5-turbo-0125",
    # The closer to 0, the more deterministic and focused the output will be
    temperature = 0.4
)
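To see why a lower temperature gives more predictable output, here is a small sketch of temperature-scaled softmax, the usual textbook formulation of sampling temperature. The logits are hypothetical, and how the OpenAI API applies temperature internally is assumed rather than confirmed.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, dividing by the temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.4)  # low temperature
flat = softmax_with_temperature(logits, 1.5)   # high temperature

print(sharp)
print(flat)
```

With the low temperature, most of the probability mass goes to the top-scoring token, so repeated calls tend to return the same answer.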

In [11]:
llm.invoke("give me one synonym of phone")

AIMessage(content='telephone', response_metadata={'token_usage': {'completion_tokens': 1, 'prompt_tokens': 13, 'total_tokens': 14}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_3bc1b5746c', 'finish_reason': 'stop', 'logprobs': None})

# 2. Loading the MadKudu support data

In [12]:
from langchain.document_loaders import TextLoader
# Using UTF-8 because the file contains special characters
loader = TextLoader('all.txt', encoding='utf-8')
documents = loader.load()

# 3. Splitting the data into chunks

In [13]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Alternative option:
# text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)

text_splitter = RecursiveCharacterTextSplitter(
    # Maximum chunk length in characters; 500 was chosen because the dataset is small
    chunk_size = 500,
    # Number of characters shared between consecutive chunks
    chunk_overlap = 250,
    # Chunk length is measured with len, i.e. in characters
    length_function = len,
    # Store each chunk's starting character offset in its metadata
    add_start_index = True
)

chunks = text_splitter.split_documents(documents)

In [14]:
len(chunks)

113
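To make `chunk_size`, `chunk_overlap` and `add_start_index` concrete, here is a simplified sliding-window splitter. It is only a sketch: `RecursiveCharacterTextSplitter` first tries to split on separators such as paragraphs and line breaks, so its real chunk boundaries will differ from this fixed-width version.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Return (start_index, chunk) pairs from a fixed-width sliding window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return pieces

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=5)
for start, chunk in pieces:
    print(start, chunk)
```

Each chunk repeats the last 5 characters of the previous one, which is the role the 250-character overlap plays in the notebook: a sentence cut at a chunk boundary still appears whole in a neighbouring chunk.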

In [15]:
chunks[15].page_content

"Static enrichment: MadKudu partners with Clearbit, HG data, and PredictLeads to enrich your data. Are you already purchasing through ZoomInfo, Bombora, Datafox? No problem, MadKudu can also use this data. \nCRM information: actually any data from your CRM (Leads, Contacts, Accounts, Opportunities), be it provided by a 3rd party or from your Marketing and Sales input, can be pulled to MadKudu's platform."

# 4. Using Hugging Face to embed the chunks

In [17]:
! pip install sentence-transformers

Defaulting to user installation because normal site-packages is not writeable
Collecting safetensors>=0.4.1
  Using cached safetensors-0.4.2-cp311-none-win_amd64.whl (269 kB)
Installing collected packages: safetensors
  Attempting uninstall: safetensors
    Found existing installation: safetensors 0.3.2
    Uninstalling safetensors-0.3.2:
      Successfully uninstalled safetensors-0.3.2
Successfully installed safetensors-0.4.2





In [18]:
from langchain.embeddings import HuggingFaceEmbeddings
# Define the path to the pre-trained model you want to use
modelPath = "sentence-transformers/all-MiniLM-l6-v2"

# Create a dictionary with model configuration options, specifying to use the CPU for computations
model_kwargs = {'device':'cpu'}

# Create a dictionary with encoding options, specifically setting 'normalize_embeddings' to False
encode_kwargs = {'normalize_embeddings': False}

# Initialize an instance of HuggingFaceEmbeddings with the specified parameters
embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,     # Provide the pre-trained model's path
    model_kwargs=model_kwargs, # Pass the model configuration options
    encode_kwargs=encode_kwargs # Pass the encoding options
)
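The `normalize_embeddings` option controls whether vectors are scaled to unit length. The sketch below uses two made-up 2-dimensional vectors (real embeddings have hundreds of dimensions) to show what changes: without normalization the dot product depends on vector length, while after normalization it equals the cosine similarity.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(vec):
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length

raw_dot = dot(a, b)                               # depends on vector length
unit_dot = dot(l2_normalize(a), l2_normalize(b))  # depends on direction only

print(raw_dot, unit_dot, cosine(a, b))
```

Whether normalization matters in practice depends on the distance metric the vector store uses, so treat `False` here as one reasonable setting rather than a requirement.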

# 5. Store chunks in FAISS and embed them - Applying RAG

Here we first create our vector database with FAISS: each chunk is embedded and the resulting vectors are indexed for similarity search.

This retrieval step is what lets RAG avoid fine-tuning, which is expensive because the model’s weights must be adjusted; RAG only needs vector/SQL queries and API calls, which cost tiny fractions of a cent.

In [None]:
from langchain.vectorstores import FAISS

In [None]:
vectorstore = FAISS.from_documents(chunks, embeddings)
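Conceptually, querying the vector store means finding the stored vectors closest to the query vector. The brute-force sketch below assumes Euclidean (L2) distance, which is my understanding of the default for LangChain's FAISS wrapper; the vectors and chunk labels are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(query_vec, index, k=2):
    """index is a list of (vector, payload) pairs; return the k closest payloads."""
    ranked = sorted(index, key=lambda item: squared_l2(query_vec, item[0]))
    return [payload for _, payload in ranked[:k]]

# Hypothetical 2-dimensional embeddings for three chunks
toy_index = [
    ([0.9, 0.1], "chunk about Likelihood to Buy"),
    ([0.1, 0.9], "chunk about Customer Fit"),
    ([0.8, 0.3], "chunk about Lead Grade"),
]

hits = nearest([1.0, 0.0], toy_index, k=2)
print(hits)
```

FAISS does the same job with optimized index structures, which is what keeps the lookup fast as the number of chunks grows.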

# 6. Create a chain (pipeline) for inferences

A chain lets you process information in sequence. It can be used to create a pipeline of
- prompts
- models
- memory buffers

which makes it very powerful.
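The pipeline idea can be illustrated without LangChain: each step is a function, and the chain feeds the output of one step into the next. The step names and the `fake_model` function below are hypothetical stand-ins, not LangChain APIs.

```python
from functools import reduce

def make_chain(*steps):
    """Compose steps left to right: the output of one is the input of the next."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

def fill_prompt(question):
    return f"Question: {question}"

def fake_model(prompt):
    # Hypothetical stand-in for an LLM call
    return prompt.upper()

def parse_output(text):
    return text.strip()

toy_chain = make_chain(fill_prompt, fake_model, parse_output)
result = toy_chain("what is rag")
print(result)
```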

# 6.1 Create a **document chain** to pass documents
This chain will allow us to pass documents to our final chain.

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

template = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}
"""
prompt = ChatPromptTemplate.from_template(template)
# Creating a chain in its simplest form
document_chain = create_stuff_documents_chain(llm, prompt)
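"Stuffing" simply means that all the documents are joined together and placed in the `{context}` slot of the prompt. Here is a rough plain-Python equivalent, assuming the documents are joined with a blank line between them; `toy_template` and `stuff_documents` are illustrative names, not part of LangChain.

```python
toy_template = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}
"""

def stuff_documents(docs, question, separator="\n\n"):
    """Join all document texts and place them in the prompt's context slot."""
    context = separator.join(docs)
    return toy_template.format(context=context, input=question)

stuffed = stuff_documents(["First chunk.", "Second chunk."], "What is in the chunks?")
print(stuffed)
```

Because every document goes into a single prompt, this approach only works while the combined text fits in the model's context window, which is one reason the chunks are kept small.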

# 6.2 Add the retriever to a chain

In [None]:

from langchain.chains import create_retrieval_chain

retriever = vectorstore.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

In [None]:
response = retrieval_chain.invoke({
    "input": "what is likelyhood to buy"
})

response

{'input': 'what is likelyhood to buy',
 'context': [Document(page_content='Likelihood to Buy: refers to the output of a behavioral model predicting the level of engagement of a person or account. \nLift (Conversion lift): impact on conversion compared to the average population. \nA positive lift shows a positive impact on conversion of having this trait, or performing this action \nA negative life shows a negative impact on conversion of having this trait or performing this action', metadata={'source': 'all.txt', 'start_index': 25988}),
  Document(page_content='Likelihood to Buy: Looks at any interaction with your website, product, marketing team, sales team to identify patterns of your Leads and Accounts behaviors that lead to conversions. (What has this lead been doing recently?)\nLead Grade: Combines the Customer Fit and Likelihood to Buy into one score to help you simplify your workflows\nWe configure these models independently so if you’re not sure which ones are enabled in your a
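The response is a dictionary with the keys `input`, `context` and `answer`. The sketch below shows one way to pull out the answer together with the source of each retrieved chunk. It runs on a hypothetical mock response (placeholder texts, `SimpleNamespace` standing in for `Document`) so that it works without calling the model.

```python
from types import SimpleNamespace

# Hypothetical mock with the same shape as the response above
mock_response = {
    "input": "what is likelyhood to buy",
    "context": [
        SimpleNamespace(page_content="first retrieved chunk",
                        metadata={"source": "all.txt", "start_index": 25988}),
        SimpleNamespace(page_content="second retrieved chunk",
                        metadata={"source": "all.txt", "start_index": 15612}),
    ],
    "answer": "placeholder answer",
}

def summarize(response):
    """Return the answer plus (source, start_index) for each retrieved chunk."""
    sources = [(doc.metadata["source"], doc.metadata["start_index"])
               for doc in response["context"]]
    return response["answer"], sources

answer_text, sources = summarize(mock_response)
print(answer_text)
print(sources)
```

Thanks to `add_start_index = True` in the splitter, each source can be traced back to an exact character offset in `all.txt`.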

# 7. Creating a memory chain - Adding conversational memory to our model

1. First, we need to create a new conversational retrieval chain

In [None]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation")
])

retriever_chain = create_history_aware_retriever(llm, retriever, prompt)
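As I understand the documented behaviour of `create_history_aware_retriever`, it passes the input straight to the retriever when there is no chat history, and otherwise asks the LLM to rewrite the conversation into a standalone search query first. A minimal sketch of that branching, with hypothetical stand-ins for the LLM and the retriever:

```python
def history_aware_retrieve(user_input, chat_history, rephrase, retrieve):
    """Pick the search query, then run the retriever with it."""
    if not chat_history:
        query = user_input  # nothing to resolve, search with the raw input
    else:
        query = rephrase(chat_history, user_input)  # the LLM writes a standalone query
    return query, retrieve(query)

def fake_rephrase(chat_history, user_input):
    # Hypothetical stand-in for the LLM call
    return "likelihood to buy definition"

def fake_retrieve(query):
    # Hypothetical stand-in for the vector store retriever
    return [f"documents matching: {query}"]

history = [("human", "Can I find here what likelihood to buy is?"), ("ai", "Yes!")]

with_history = history_aware_retrieve("Tell me more about it!", history,
                                      fake_rephrase, fake_retrieve)
without_history = history_aware_retrieve("what is lead grade", [],
                                         fake_rephrase, fake_retrieve)
print(with_history)
print(without_history)
```

This is why a follow-up like "Tell me more about it!" can still retrieve the right chunks: the query sent to the vector store is the rewritten one, not the vague follow-up.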

Faking a conversation to create conversational memory

In [None]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history = [
    HumanMessage(content="Can i found here what likelihood to buy is?"),
    AIMessage(content="Yes!")
]
answer = retriever_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me more about it!"
})

In [None]:
answer

[Document(page_content='Likelihood to Buy: refers to the output of a behavioral model predicting the level of engagement of a person or account. \nLift (Conversion lift): impact on conversion compared to the average population. \nA positive lift shows a positive impact on conversion of having this trait, or performing this action \nA negative life shows a negative impact on conversion of having this trait or performing this action', metadata={'source': 'all.txt', 'start_index': 25988}),
 Document(page_content='the Likelihood to Buy model (Behavioral)\nhelps answers "how are they using our product?", "what actions are most correlated to conversion?", "is now a good time to reach out?"\nThe third category of prediction models, the Lead Grade, is simply a combination of the two others.\nThe Customer Fit model (demographic)', metadata={'source': 'all.txt', 'start_index': 15612}),
 Document(page_content='Likelihood to Buy: Looks at any interaction with your website, product, marketing team,

# 7.2 Scaling the chain to store all the conversations with a **document chain**
7.2.1 Add a document chain whose prompt receives the conversation history through a MessagesPlaceholder object


In [None]:
from langchain.chains import create_retrieval_chain

# Prompt that includes the past conversation history
prompt = ChatPromptTemplate.from_messages(
    [
    ("system", "Answer the user's questions based on the below context:\n\n{context}"),
    # Placeholder filled with the "chat_history" messages passed at invoke time
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}")
])

# Stuffs the retrieved documents into {context}; the history itself is supplied by the caller
document_chain = create_stuff_documents_chain(llm, prompt)

conversational_retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)

7.2.2 Test it with a new history

In [None]:
chat_history = [
    HumanMessage(content="Can you tell me what likelihood to buy is?"),
    AIMessage(content="Likelihood to Buy: refers to the output of a behavioral model predicting the level of engagement of a person or account ")
]

response = conversational_retrieval_chain.invoke({
    'chat_history': chat_history,
    "input": "What model do you use"
})

In [None]:
response

{'chat_history': [HumanMessage(content='Can you tell me what likelihood to buy is?'),
  AIMessage(content='Likelihood to Buy: refers to the output of a behavioral model predicting the level of engagement of a person or account ')],
 'input': 'What model do you use',
 'context': [Document(page_content='the Likelihood to Buy model (Behavioral)\nhelps answers "how are they using our product?", "what actions are most correlated to conversion?", "is now a good time to reach out?"\nThe third category of prediction models, the Lead Grade, is simply a combination of the two others.\nThe Customer Fit model (demographic)', metadata={'source': 'all.txt', 'start_index': 15612}),
  Document(page_content='Likelihood to Buy: refers to the output of a behavioral model predicting the level of engagement of a person or account. \nLift (Conversion lift): impact on conversion compared to the average population. \nA positive lift shows a positive impact on conversion of having this trait, or performing thi

In [None]:
response["answer"]

'We use a Likelihood to Buy model, which is a behavioral model that predicts the level of engagement of a person or account.'
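So far the `chat_history` list has been written by hand. In an actual bot it would be extended after every turn, so that the next question is answered with the full conversation in view. The sketch below shows that bookkeeping with a hypothetical `fake_chain_invoke` in place of `conversational_retrieval_chain.invoke` and plain tuples in place of `HumanMessage` / `AIMessage` objects.

```python
def fake_chain_invoke(payload):
    # Hypothetical stand-in for conversational_retrieval_chain.invoke
    return {"answer": f"(answer to: {payload['input']})"}

def ask(question, chat_history, invoke=fake_chain_invoke):
    """Run one turn, then record the question and the answer in the history."""
    response = invoke({"chat_history": chat_history, "input": question})
    chat_history.append(("human", question))
    chat_history.append(("ai", response["answer"]))
    return response["answer"]

history = []
first = ask("What is likelihood to buy?", history)
second = ask("How is it related to MQA?", history)

print(second)
print(len(history))
```

With the real chain, the two `append` calls would add `HumanMessage(content=question)` and `AIMessage(content=response["answer"])` instead of tuples.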

In [None]:
conversational_retrieval_chain.invoke({
    'chat_history': chat_history,
    "input": "how is it related to MQA?"
})

{'chat_history': [HumanMessage(content='Can you tell me what likelihood to buy is?'),
  AIMessage(content='Likelihood to Buy: refers to the output of a behavioral model predicting the level of engagement of a person or account ')],
 'input': 'how is it related to MQA?',
 'context': [Document(page_content='Likelihood to Buy: refers to the output of a behavioral model predicting the level of engagement of a person or account. \nLift (Conversion lift): impact on conversion compared to the average population. \nA positive lift shows a positive impact on conversion of having this trait, or performing this action \nA negative life shows a negative impact on conversion of having this trait or performing this action', metadata={'source': 'all.txt', 'start_index': 25988}),
  Document(page_content='the Likelihood to Buy model (Behavioral)\nhelps answers "how are they using our product?", "what actions are most correlated to conversion?", "is now a good time to reach out?"\nThe third category of 