# Background

This notebook demonstrates a question-answering (Q/A) chatbot built with Retrieval-Augmented Generation (RAG).

You can download the sample documents from my [GitHub](https://github.com/harshdhamecha/qa-chatbot/tree/main/sample). They are three AiSensy blog posts that I converted to PDFs.


# Setup

In [47]:
# Remove the unneeded sample_data directory, if it exists

!rm -rf /content/sample_data

## Install Packages

In [1]:
!pip install --upgrade --quiet langchain-pinecone langchain-openai langchain langchain-community pinecone-client tiktoken

In [2]:
!pip install unstructured -q
!pip install "unstructured[local-inference]" -q

## API Keys

In [3]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter OPENAI API Key\n")
os.environ['PINECONE_API_KEY'] = getpass.getpass("Enter Pinecone API Key\n")

Enter OPENAI API Key
··········
Enter Pinecone API Key
··········


# Data Preparation

In [4]:
from langchain_community.document_loaders import DirectoryLoader

def load_documents(path):
    # Load every supported file under `path` (parsing relies on `unstructured`)
    loader = DirectoryLoader(path)
    documents = loader.load()
    return documents

In [5]:
directory = '/content/'
documents = load_documents(directory)

In [6]:
len(documents)

3

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_documents(documents, chunk_size=400, chunk_overlap=30):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    chunks = text_splitter.split_documents(documents)
    return chunks

In [8]:
chunks = split_documents(documents)
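
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal, dependency-free sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter first tries to break on separators such as paragraphs and lines); `sliding_chunks` is a hypothetical helper that only illustrates the two parameters.

```python
def sliding_chunks(text, chunk_size=400, chunk_overlap=30):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; not part of LangChain.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 1000-character text, used only to show the window sizes
sample = "".join(str(i % 10) for i in range(1000))
pieces = sliding_chunks(sample, chunk_size=400, chunk_overlap=30)
print(len(pieces), [len(p) for p in pieces])
```

The last 30 characters of each window are repeated at the start of the next one, so a sentence cut at a boundary still appears whole in at least one chunk.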

## Embeddings

In [11]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')

## Create Vector Database

In [12]:
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(
    api_key=os.environ.get("PINECONE_API_KEY")
)

index_name = 'qa-langchain'

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric='euclidean',
        spec=ServerlessSpec(
            cloud='aws',
            region='us-west-2'
        )
    )

index = pc.Index(index_name)
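
A note on the `metric='euclidean'` choice: if the embeddings are unit length (OpenAI states that its embeddings are normalized to length 1), squared Euclidean distance and cosine similarity are tied by `d² = 2 - 2·cos`, so both metrics rank neighbours in the same order. The sketch below checks that identity on two made-up vectors; the helper names are hypothetical and nothing here calls Pinecone.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    # For unit vectors the dot product equals the cosine similarity
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy vectors, not real embeddings
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

lhs = squared_euclidean(a, b)
rhs = 2 - 2 * cosine(a, b)
print(lhs, rhs)
```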

In [13]:
from langchain_pinecone import PineconeVectorStore

docsearch = PineconeVectorStore.from_documents(chunks, embeddings, index_name=index_name)

In [14]:
def get_similar_docs(query):
    return docsearch.similarity_search(query)
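
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below reproduces that idea offline with Euclidean distance over a tiny in-memory list. The three-dimensional vectors and the texts are made up, and `top_k` is a hypothetical helper, not a Pinecone or LangChain call.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, store, k=2):
    """Return the k (text, distance) pairs closest to query_vec."""
    scored = [(text, euclidean(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy "index": (text, vector) pairs standing in for embedded chunks
toy_store = [
    ("WhatsApp broadcast tips", [0.9, 0.1, 0.0]),
    ("Email newsletter design", [0.1, 0.9, 0.0]),
    ("Cricket batting basics", [0.0, 0.0, 1.0]),
]
query_vec = [0.8, 0.2, 0.0]  # pretend embedding of a WhatsApp question
results = top_k(query_vec, toy_store, k=2)
print(results)
```

A smaller distance means a closer match, so the WhatsApp text ranks first and the cricket text is left out of the top two.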

## Prompt Design

In [46]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that provides answers based on the following context: {context}. "
            "Respond politely and in a humorous way when an off-topic or out-of-context question is asked."
        ),
        MessagesPlaceholder(variable_name="messages")
    ]
)

# Model

In [23]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model='gpt-4o')

In [35]:
from langchain_core.messages import SystemMessage, trim_messages

trimmer = trim_messages(
    max_tokens=200,
    strategy="last",
    token_counter=model,
    include_system=False,
    allow_partial=False,
    start_on="human",
)
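
The trimming settings above can be pictured as follows: walk the history from newest to oldest, keep whole messages while they fit in the token budget, then drop leading messages until the window starts on a human turn. This is a simplified sketch of that idea, not LangChain's implementation. `trim_last` and `count_tokens` are hypothetical, and the whitespace word count is an assumption (the notebook passes `model` as `token_counter`, so real counts will differ).

```python
def count_tokens(message):
    """Crude tokenizer stand-in: one token per whitespace-separated word."""
    _role, content = message
    return len(content.split())


def trim_last(messages, max_tokens):
    """Keep the newest whole messages within max_tokens, starting on a human turn."""
    kept = []
    budget = max_tokens
    for message in reversed(messages):
        cost = count_tokens(message)
        if cost > budget:
            break  # whole messages only, never partial content
        kept.append(message)
        budget -= cost
    kept.reverse()
    # Drop leading messages until the window begins with a human message
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return kept


# (role, content) pairs; the AI reply is abbreviated for the example
history = [
    ("human", "What is WhatsApp and Email Marketing?"),
    ("ai", "Email is the seasoned pro and WhatsApp is the new kid."),
    ("human", "Which among them is better?"),
]
print(trim_last(history, max_tokens=16))
```

With a budget of 16 words, the AI reply fits but the first question does not, so the leading AI message is dropped and only the latest question remains. A tight budget can therefore discard exactly the turns a follow-up question depends on.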

In [36]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter

chain = (
    RunnablePassthrough.assign(messages=itemgetter("messages") | trimmer)
    | prompt
    | model
)

In [37]:
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

In [38]:
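def get_answer(query, chain, messages):
    # Record the user's question in the running history (appended exactly once)
    messages.append(HumanMessage(content=query))
    # Retrieve the chunks most similar to the question
    context = get_similar_docs(query)
    response = chain.invoke(
        {
            "messages": messages,
            "context": context
        }
    )
    # Store the reply so that follow-up questions can refer to it
    messages.append(AIMessage(content=response.content))
    return response.content, messages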
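
`get_similar_docs` returns a list of `Document` objects, so `{context}` receives their string representation, metadata included. One option is to pass only the chunk text. The sketch below shows that with a hypothetical `format_context` helper; `Doc` is a stand-in for LangChain's `Document` so the example runs offline, and the sample texts are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (same two attributes), for offline use."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs, separator="\n\n"):
    """Join the text of the retrieved chunks, skipping empty ones."""
    texts = [doc.page_content.strip() for doc in docs]
    return separator.join(text for text in texts if text)


retrieved = [
    Doc("WhatsApp messages are short and conversational.", {"source": "a.pdf"}),
    Doc("   "),  # whitespace-only chunk, filtered out
    Doc("Email suits longer, designed content.", {"source": "b.pdf"}),
]
context_text = format_context(retrieved)
print(context_text)
```

In `get_answer`, this would mean passing `format_context(get_similar_docs(query))` as `"context"` instead of the raw list.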

# Test

In [39]:
messages = []
query = "What is WhatsApp and Email Marketing?"
answer, messages = get_answer(query, chain, messages)

In [40]:
print(answer)

Great question! 

**Email Marketing** is like the seasoned pro in the digital marketing world. It's been around for a while and is incredibly versatile and effective. Businesses use it to drive sales and enhance brand awareness by sending targeted emails to their audience. Think of it as a well-crafted letter delivered right to your inbox, filled with all the latest updates, offers, and important news from your favorite brands.

**WhatsApp Marketing**, on the other hand, is the cool new kid on the block. It's more casual and requires less effort in terms of design and content. With WhatsApp, you can share different types of content without worrying about file sizes. It's all about creating closer connections with your audience through more personal and instant conversations.

So, whether you prefer the classic charm of email or the instant, casual vibe of WhatsApp, both are fantastic platforms for engaging with your audience!


In [42]:
query = "Which among them is better?"
answer, messages = get_answer(query, chain, messages)

In [43]:
print(answer)

Ah, the classic showdown of WhatsApp vs. Email Marketing! According to the context, it really depends on your business goals and the nature of your company. However, why choose one when you can leverage the strengths of both? A balanced approach ensures swift engagement and long-term relationship-building. So, in this case, balance is the key to a robust marketing strategy! 🌟


Notice how the context of the previous messages was maintained: the follow-up never mentions WhatsApp or email, yet the answer compares the two.

In [44]:
# Out-of-context question
query = "How to play cricket?"
answer, messages = get_answer(query, chain, messages)

In [45]:
answer

"Ah, cricket! The sport that has people glued to their screens for hours, if not days. But, as much as I'd love to dive into the nitty-gritty of cricket, our context today is all about mastering WhatsApp content marketing strategies. Perhaps, instead of hitting sixes on the field, we focus on hitting those marketing home runs? 🏏📈\n\nHowever, if you're keen on learning cricket, I'd recommend checking out some beginner guides online or maybe even YouTube tutorials. They can teach you all about wickets, runs, and those fancy cricket shots!"

Notice how creatively it handled the out-of-context question.