## Product QnA - Code Explanation

In [1]:
from tqdm.autonotebook import tqdm

import os
import pinecone
import yaml

from langchain.llms import OpenAI

# PDF loader
from langchain.document_loaders import PyPDFLoader

# Text splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Pinecone database
from langchain.vectorstores import Pinecone

# Word embedding
from langchain.embeddings.openai import OpenAIEmbeddings

# QnA chains
from langchain.chains.question_answering import load_qa_chain

# Agent imports
from langchain.agents import initialize_agent

# Tool imports
from langchain.agents import Tool
from langchain import SerpAPIWrapper

# Prompt
from langchain import PromptTemplate

# LLMChain
from langchain import LLMChain

# Memory
from langchain.memory import ConversationBufferMemory



It is common for someone to ask about a specific description or specification, such as "What is Samsung Galaxy s23?" or "What type of processor does the Samsung Galaxy s23 use?". Here I ask the QnA chatbot an off-topic question instead: "Who is Tailor Swift?".

In [2]:
question = 'Who is Tailor Swift?'

Load the credentials from the params `*.yml` file, which makes them easier to manage and accessible from other code in this program.

In [3]:
with open('../params/credentials.yml', 'r') as f:
    credential = yaml.load(f, Loader=yaml.SafeLoader)

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY', credential['OPENAI_API_KEY'])
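One detail of the lookup above: `os.environ.get(name, credential[name])` evaluates `credential[name]` before the call, so a key missing from the YAML file raises `KeyError` even when the environment variable is set. A minimal sketch of a lazier fallback follows; the helper `get_secret` and the name `DEMO_API_KEY` are hypothetical and not part of the notebook.

```python
import os


def get_secret(name, fallback):
    """Return the environment variable `name`, else the value stored in `fallback`."""
    value = os.environ.get(name)
    if value:
        return value
    # The fallback mapping is only read when the environment has no value,
    # so a key missing from the YAML file is not an error if the variable is exported.
    return fallback[name]


# Hypothetical mapping standing in for the parsed credentials.yml
credential_example = {'DEMO_API_KEY': 'value-from-yaml'}
print(get_secret('DEMO_API_KEY', credential_example))
```

The environment variable still takes precedence, which matches the behaviour of the cell above.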

## Load PDF, Pinecone Database

Load the PDF file listed in the params `*.yml` file, which keeps it easy to manage and accessible from other code in this program. The PDF document is loaded with the LangChain module `PyPDFLoader`.

In [4]:
with open('../params/app.yml', 'r') as f:
    config = yaml.load(f, Loader=yaml.SafeLoader)

loader = PyPDFLoader(config['PDF_SOURCE'][0])
data = loader.load()

print(f'You have {len(data)} document(s) in your data')
print(f'There are {len(data[30].page_content)} characters in your document')

You have 196 document(s) in your data
There are 924 characters in your document


Split the data into smaller chunks with the LangChain module `RecursiveCharacterTextSplitter`, which makes the text easier to embed and vectorize.

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

print(f'Now you have {len(texts)} documents')

Now you have 200 documents
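
To illustrate what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window splitter. It is only a sketch: `RecursiveCharacterTextSplitter` first tries to break on separators such as paragraph and line breaks, which this hypothetical `split_text` helper ignores. Pages longer than `chunk_size` are cut into several pieces, which may be why the 196 pages above became 200 documents.

```python
def split_text(text, chunk_size, chunk_overlap=0):
    """Cut `text` into windows of at most `chunk_size` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError('chunk_overlap must be smaller than chunk_size')
    # Each new window starts `step` characters after the previous one,
    # so consecutive windows share `chunk_overlap` characters.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = 'abcdefghij'
print(split_text(sample, chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(split_text(sample, chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With `chunk_overlap=0`, as in the cell above, no text is repeated between chunks, so a sentence that straddles a boundary is split across two embeddings.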


In [6]:
texts[1]

Document(page_content='2\nTable of Contents\nGetting started\n5 D evice layout and functions\n11 Char\nging the battery\n16\n Nano-\nSIM card and eSIM\n18\n T\nurning the device on and off\n19\n Initial setup\n20\n Samsung ac\ncount\n21 T\nransferring data from your previous \ndevice (Smart Switch)\n23\n Und\nerstanding the screen\n31 Notif\nication panel\n33\n S\ncreen capture and screen record\n34\n E\nntering text\nApps and features\n37 Install ing or uninstalling apps38\n S P\nen (Galaxy S23 Ultra)\n54\n \nPhone\n57\n \nContacts\n59\n \nMessages\n60\n C\namera\n78\n Gall\nery\n82\n AR Z\none\n88\n \nBixby\n89\n Bix\nby Vision\n90\n Multi \nwindow (Using multiple apps \nat once)\n93\n Samsung I\nnternet94\n Samsung \nWallet\n96\n Samsung P\nay\n98\n Samsung Health\n99\n Samsung Not\nes\n104\n Samsung Memb\ners\n104\n Samsung K\nids\n105\n Samsung Gl\nobal Goals\n105\n Samsung TV\n Plus\n105\n Galax\ny Shop\n105\n Galax\ny Wearable\n106\n PENUP  (Galaxy S\n23 Ultra)\n106\n C\nalendar

Load the Pinecone API key and environment from the params `*.yml` file.

In [7]:
PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY', credential['PINECONE_API_KEY'])
PINECONE_API_ENV = os.environ.get('PINECONE_API_ENV', credential['PINECONE_API_ENV'])

Create an `OpenAIEmbeddings` instance, the LangChain wrapper for OpenAI embeddings, and use `text-embedding-ada-002` as the model to embed all the chunked documents.

In [8]:
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002', openai_api_key=OPENAI_API_KEY)

Initialize the Pinecone database.

In [9]:
# Initialize pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV  # next to api key in console
)

index_name = credential['PINECONE_INDEX'] # put in the name of your pinecone index here

<span style="color:red">**Warning**:</span> Skip or comment out this step if you have already upserted the data to the Pinecone database.

---

Upsert the embedded data as vectors to the Pinecone vector database.

In [10]:
vectorstore = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)

Run a test query by performing a **similarity search** with the `similarity_search` function of `Pinecone`, the LangChain module for querying vectors in the Pinecone database.

In [11]:
query = f"{question}"
vectors = vectorstore.similarity_search(query)

Check vector or data content as part of the similarity search result.

In [12]:
print(vectors[0].page_content[:450])
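
Conceptually, a similarity search ranks the stored vectors by how close they are to the query vector. The sketch below uses cosine similarity on toy three-dimensional vectors. Both the metric and the vectors are assumptions for illustration: the metric of a real Pinecone index is chosen when the index is created, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (text, score) pairs most similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors, made up for this example
toy_store = [
    ('Charging the battery', [0.9, 0.1, 0.0]),
    ('Camera', [0.1, 0.9, 0.1]),
    ('Samsung Wallet', [0.0, 0.2, 0.9]),
]
toy_query = [0.8, 0.2, 0.1]

for text, score in top_k(toy_query, toy_store):
    print(f'{score:.3f}  {text}')
```

The search always returns the nearest chunks, even when none of them is actually relevant to the question, so an off-topic question still retrieves something from the manual.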

---

## Indexing on Pinecone database and QnA Chain

Load the OpenAI module to perform **Question and Answer** by chaining the `similarity_search` results with the user question (the human input). The `temperature` parameter is set to `0`, which makes the model output mostly deterministic with little variability. This suits the QnA chatbot, because it performs one specific task: giving product information.

In [13]:
llm = OpenAI(temperature=0, model_name='text-davinci-003', openai_api_key=OPENAI_API_KEY)
chain = load_qa_chain(llm, chain_type="stuff")
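
The `"stuff"` chain type puts all retrieved documents into a single prompt, together with the question. A rough sketch of that idea follows. The prompt wording is illustrative only and is not LangChain's actual template, and `build_stuff_prompt` is a hypothetical helper.

```python
def build_stuff_prompt(chunks, question):
    """Concatenate every retrieved chunk into one prompt, followed by the question."""
    context = '\n\n'.join(chunks)
    return (
        "Use the following context to answer the question. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


example_chunks = ['Charging the battery ...', 'Turning the device on and off ...']
print(build_stuff_prompt(example_chunks, 'How do I charge the phone?'))
```

Because every chunk goes into one prompt, this approach only works while the combined text fits in the model's context window.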

Connect to the existing Pinecone index so that the stored vectors can be queried without upserting them again.

In [14]:
pinecone_index = pinecone.Index(index_name)
text_field = "text"
vectorstore = Pinecone(pinecone_index, embeddings.embed_query, text_field)

Perform a **similarity search** against the vectors stored in Pinecone.

In [15]:
query = f'{question}'
vectors = vectorstore.similarity_search(query)

Perform Question and Answer by chaining the `similarity_search` results with the user question (the human input).

In [16]:
from_pdf_qna = chain.run(input_documents=vectors, question=query)
print(from_pdf_qna)

 I don't know.


## Agent

Set up the environment for **SerpAPI** and point `REQUESTS_CA_BUNDLE` at the certifi SSL certificate bundle, so that Python can access the internet and **perform Google Search with SerpAPI**.

In [17]:
os.environ['SERPAPI_API_KEY'] = credential['SERPAPI_API_KEY']
os.environ['REQUESTS_CA_BUNDLE'] = credential['REQUESTS_CA_BUNDLE_NOTEBOOK']

Load the OpenAI model that the LangChain agent will use to perform the search.

In [18]:
llm = OpenAI(temperature=0, model_name = "text-davinci-003", openai_api_key=OPENAI_API_KEY)

Create the SerpAPI wrapper that performs the search.

In [19]:
search = SerpAPIWrapper()

Define the `Tool` from LangChain. The OpenAI model will use this Search tool to perform searches.

In [20]:
search_toolkit = [
    Tool(
        name = 'Search',
        func=search.run,
        description='useful for when you need to search google to answer questions about current events'
    )
]

Initialize the agent by passing the tool and the model that will perform the search, and set `verbose` to `True` so that every search step is shown.

In [21]:
agent = initialize_agent(search_toolkit, llm, agent='zero-shot-react-description', verbose=True, return_intermediate_steps=True)
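
A `zero-shot-react-description` agent alternates between model output (a thought plus an action) and tool calls (an observation) until the model produces a final answer. The toy loop below sketches that control flow only. `fake_llm` and `fake_search` are hypothetical stand-ins for the OpenAI model and the SerpAPI tool, and the parsing is much simpler than what LangChain does.

```python
import re


def run_react_loop(llm, tools, question, max_steps=5):
    """Alternate between model replies and tool calls until a final answer appears."""
    scratchpad = f'Question: {question}\n'
    for _ in range(max_steps):
        reply = llm(scratchpad)
        scratchpad += reply + '\n'
        final = re.search(r'Final Answer:\s*(.*)', reply, re.DOTALL)
        if final:
            return final.group(1).strip()
        action = re.search(r'Action:\s*(.*?)\nAction Input:\s*(.*)', reply, re.DOTALL)
        if action is None:
            raise ValueError('reply contained neither an action nor a final answer')
        tool_name = action.group(1).strip()
        tool_input = action.group(2).strip().strip('"')
        # Run the chosen tool and feed its result back to the model
        observation = tools[tool_name](tool_input)
        scratchpad += f'Observation: {observation}\nThought:'
    raise RuntimeError('no final answer within max_steps')


def fake_llm(prompt):
    # Hypothetical model: search once, then answer from the observation
    if 'Observation:' in prompt:
        return ' I now know the final answer\nFinal Answer: An American singer-songwriter.'
    return ' I need to search\nAction: Search\nAction Input: "Taylor Swift"'


def fake_search(query):
    # Hypothetical tool standing in for search.run
    return f'Result for {query}: an American singer-songwriter.'


print(run_react_loop(fake_llm, {'Search': fake_search}, 'Who is Taylor Swift?'))
```

The `Action`, `Action Input`, `Observation` and `Final Answer` labels mirror the ones printed in the verbose agent output.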

Perform search and get the search result.

In [22]:
response = agent({'input': f'{question}'})
from_google = response['output']
print(from_google)



> Entering new AgentExecutor chain...
 I need to find out who Taylor Swift is
Action: Search
Action Input: "Taylor Swift"
Observation: Taylor Alison Swift is an American singer-songwriter. She is a prominent cultural figure, widely recognized for her genre-spanning discography, songwriting and artistic reinventions.
Thought: I now know the final answer
Final Answer: Taylor Alison Swift is an American singer-songwriter. She is a prominent cultural figure, widely recognized for her genre-spanning discography, songwriting and artistic reinventions.

> Finished chain.
Taylor Alison Swift is an American singer-songwriter. She is a prominent cultural figure, widely recognized for her genre-spanning discography, songwriting and artistic reinventions.


Perform the search again, but this time also ask how the question correlates with the product. This covers the case where the user asks about something unrelated to the product.

In [23]:
response = agent({'input': f"{question} and what is the correlation with {config['PRODUCT']}"})
from_product = response['output']
print(from_product)



> Entering new AgentExecutor chain...
 I need to find out who Taylor Swift is and what her connection is to the Samsung Galaxy s23
Action: Search
Action Input: "Taylor Swift" "Samsung Galaxy s23"
Observation: taylor swift samsung s23 video · 7.6M views · Discover videos related to taylor swift samsung s23 video on TikTok.
Thought: I need to find out more about the video
Action: Search
Action Input: "Taylor Swift Samsung Galaxy s23 video"
Observation: taylor swift samsung s23 video · 7.6M views · Discover videos related to taylor swift samsung s23 video on TikTok.
Thought: I now know the final answer
Final Answer: Taylor Swift released a video for her song "ME!" in collaboration with Samsung Galaxy s23. The video has been viewed over 7.6 million times on TikTok.

> Finished chain.
Taylor Swift released a video for her song "ME!" in collaboration with Samsung Galaxy s23. The video has b

## Summarization with Prompt

Use the `PromptTemplate` module from LangChain to set up the instruction that tells the model to perform summarization.

In [24]:
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Convert the size metric to centimeters if exist.

%TEXT:
{text}
"""

# Create a LangChain prompt template that we can insert values to later
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)
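
With its default settings, filling a `PromptTemplate` behaves much like Python's own `str.format` on the template string. The sketch below shows that idea in plain Python. `fill_template` is a hypothetical helper, not a LangChain function, and the sample text is made up.

```python
template_example = """
%INSTRUCTIONS:
Please summarize the following piece of text.

%TEXT:
{text}
"""


def fill_template(template, input_variables, **values):
    """Check that every declared variable is supplied, then substitute the values."""
    missing = [name for name in input_variables if name not in values]
    if missing:
        raise KeyError(f'missing template variables: {missing}')
    return template.format(**values)


print(fill_template(template_example, ['text'], text='Some text to summarize.'))
```

Declaring the variables up front lets a missing value fail early with a clear message rather than reaching the model as a literal `{text}` placeholder.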

Combine the answer from the PDF content with the Google search results from SerpAPI. Together they give the OpenAI model more context about the current product, which was officially released in February 2023.

In [25]:
confusing_text = """{from_google}

{from_product}

{from_pdf_qna}
""".format(
    from_google = from_google.strip(),
    from_product = from_product.strip(),
    from_pdf_qna = from_pdf_qna.strip()
)

print(confusing_text)

Taylor Alison Swift is an American singer-songwriter. She is a prominent cultural figure, widely recognized for her genre-spanning discography, songwriting and artistic reinventions.

Taylor Swift released a video for her song "ME!" in collaboration with Samsung Galaxy s23. The video has been viewed over 7.6 million times on TikTok.

I don't know.



Here is the final prompt, which contains the instruction and the text to be summarized.

In [26]:
summarize_prompt = prompt.format(text=confusing_text)
print(summarize_prompt)


%INSTRUCTIONS:
Please summarize the following piece of text.
Convert the size metric to centimeters if exist.

%TEXT:
Taylor Alison Swift is an American singer-songwriter. She is a prominent cultural figure, widely recognized for her genre-spanning discography, songwriting and artistic reinventions.

Taylor Swift released a video for her song "ME!" in collaboration with Samsung Galaxy s23. The video has been viewed over 7.6 million times on TikTok.

I don't know.




Perform summarization with the OpenAI `text-davinci-003` model.

In [27]:
llm = OpenAI(temperature=0, model_name='text-davinci-003', openai_api_key=OPENAI_API_KEY)
summarize_output = llm(summarize_prompt)
print(summarize_output)


Taylor Swift is an American singer-songwriter who is well-known for her diverse music and creative reinventions. She recently released a video for her song "ME!" in collaboration with Samsung Galaxy s23, which has been viewed over 7.6 million times on TikTok.


## Chatbot with Chain and Memory

Create a prompt that instructs the model to act as a chatbot. The prompt includes the earlier summarization result, the human input, and the chat history. `ConversationBufferMemory`, usually just called memory, stores the previous turns, and `LLMChain` is the LangChain class that ties the prompt, the model, and the memory together, so that each new human input is answered with the previous chat in view.

In [31]:

# Setup prompt to perform as chatbot
command = """
%INSTRUCTIONS:
You are very helpful chatbot.
Your goal are help the users to know more about the {product}.
Please ignore if the question is not related with the Samsung Galaxy s23.
This following piece of text are some information that you need to consider too.

%TEXT: {text}
"""

prompt = PromptTemplate(
    input_variables=["product", "text"],
    template=command,
)

command_prompt = prompt.format(product=config['PRODUCT'], text=summarize_output)

# Set up the prompt to take the human input and the chat history.
chat = """
{chat_history}
Human: {human_input}
Chatbot:"""

template = command_prompt + chat

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], 
    template=template
)

memory = ConversationBufferMemory(memory_key="chat_history")

chat_chain = LLMChain(
    llm=OpenAI(model_name='text-davinci-003', openai_api_key=OPENAI_API_KEY),
    prompt=prompt, 
    verbose=True,
    memory=memory
)
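
The memory mechanism can be pictured as a list of past turns that is rendered into the `{chat_history}` slot before each call. Below is a minimal sketch of that idea in plain Python. `BufferMemory`, `chat_once` and `echo_llm` are hypothetical names, and the real `ConversationBufferMemory` offers more options than this.

```python
class BufferMemory:
    """Keep every (human, ai) turn and render the turns as plain text."""

    def __init__(self):
        self.turns = []

    def history(self):
        return '\n'.join(f'Human: {human}\nAI: {ai}' for human, ai in self.turns)

    def save(self, human_input, ai_output):
        self.turns.append((human_input, ai_output))


def chat_once(llm, memory, prefix, human_input):
    """Build the prompt from prefix, history and new input, then record the turn."""
    prompt = f'{prefix}\n{memory.history()}\nHuman: {human_input}\nChatbot:'
    answer = llm(prompt)
    memory.save(human_input, answer)
    return answer


def echo_llm(prompt):
    # Hypothetical model: reports how many earlier answers the prompt carries
    return f' I can see {prompt.count("AI:")} earlier answer(s).'


memory_example = BufferMemory()
chat_once(echo_llm, memory_example, 'You are a helpful chatbot.', 'Who is Taylor Swift?')
print(chat_once(echo_llm, memory_example, 'You are a helpful chatbot.', 'What did I ask?'))
```

Since the whole history is resent on every call, the prompt grows with each turn.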

Chat with the model by passing the human input.

In [32]:
chat_chain.predict(human_input=f'{question}')



> Entering new LLMChain chain...
Prompt after formatting:

%INSTRUCTIONS:
You are very helpful chatbot.
Your goal are help the users to know more about the Samsung Galaxy s23.
Please ignore if the question is not related with the Samsung Galaxy s23.
This following piece of text are some information that you need to consider too.

%TEXT: 
Taylor Swift is an American singer-songwriter who is well-known for her diverse music and creative reinventions. She recently released a video for her song "ME!" in collaboration with Samsung Galaxy s23, which has been viewed over 7.6 million times on TikTok.


H: Who is tailor swift?
Chatbot:

> Finished chain.


' Taylor Swift is an American singer-songwriter who is well-known for her diverse music and creative reinventions. She recently released a video for her song "ME!" in collaboration with Samsung Galaxy s23.'

## Test the Memory

Check the memory by asking about the previous chat.

In [33]:
chat_chain.predict(human_input="What was I asked you about?")



> Entering new LLMChain chain...
Prompt after formatting:

%INSTRUCTIONS:
You are very helpful chatbot.
Your goal are help the users to know more about the Samsung Galaxy s23.
Please ignore if the question is not related with the Samsung Galaxy s23.
This following piece of text are some information that you need to consider too.

%TEXT: 
Taylor Swift is an American singer-songwriter who is well-known for her diverse music and creative reinventions. She recently released a video for her song "ME!" in collaboration with Samsung Galaxy s23, which has been viewed over 7.6 million times on TikTok.

H: Who is tailor swift?
AI:  Taylor Swift is an American singer-songwriter who is well-known for her diverse music and creative reinventions. She recently released a video for her song "ME!" in collaboration with Samsung Galaxy s23.
Human: What was I asked you about?
Chatbot:

> Finished chain.


' You asked me about Taylor Swift.'