# **Question Answering Application for Customer Support**
<img src="https://drive.google.com/uc?id=13rkcrreujzgwY4uOnReoSnyyOqAiIojC" width="1000" height="500">



### Installing libraries



*   llama-index - a data framework for LLM-based applications that ingests, structures, and provides access to private or domain-specific data. https://www.llamaindex.ai/

*   pypdf - a free, open-source, pure-Python PDF library that can split, merge, crop, and transform the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and retrieve text and metadata from them. https://pypdf.readthedocs.io/en/stable/

*   google-generativeai - Google's Python client library for its generative AI models; this notebook uses it to reach the PaLM API. https://ai.google/discover/generativeai/

*   transformers - APIs and tools to easily download and train state-of-the-art pretrained models. https://huggingface.co/docs/transformers/index






In [None]:
!pip install -q llama-index==0.9.9
!pip install -q pypdf==3.17.1
!pip install -q google-generativeai==0.2.2
!pip install -q transformers==4.35.2


### Import necessary modules from llama-index


In [None]:
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms.palm import PaLM
from llama_index import ServiceContext
import os

### Creating a folder to add files.

In [None]:
!mkdir -p data

###  Add your txt or pdf files to the data folder.
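Before loading, it can help to confirm that the folder actually contains input files. The sketch below is a minimal check using only the standard library; the helper name `list_input_files` and the set `EXPECTED_SUFFIXES` are assumptions made for illustration and are not part of LlamaIndex.

```python
from pathlib import Path

# Extensions this notebook expects. This is an assumption for illustration,
# not the full list of formats SimpleDirectoryReader can parse.
EXPECTED_SUFFIXES = {".txt", ".pdf"}


def list_input_files(folder="data"):
    """Return the sorted .txt/.pdf files found directly inside `folder`."""
    root = Path(folder)
    if not root.is_dir():
        return []  # a missing folder simply means there is nothing to load
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in EXPECTED_SUFFIXES
    )


found = list_input_files("data")
print(f"Found {len(found)} input file(s):", [p.name for p in found])
```

If the list comes back empty, upload your files to the `data` folder before running the loading cell.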




### Loading Data...

#### SimpleDirectoryReader creates documents out of every file in a given directory. It is built into LlamaIndex and can read a variety of formats, including Markdown, PDFs, Word documents, PowerPoint decks, images, audio, and video.

In [None]:
# Load every supported file (txt, pdf, ...) from the 'data' folder
documents = SimpleDirectoryReader("./data").load_data()

In [None]:
documents

[Document(id_='50c15b08-41aa-4005-847c-296c1a72974a', embedding=None, metadata={'page_label': '1', 'file_name': 'products.pdf', 'file_path': 'data/products.pdf', 'file_type': 'application/pdf', 'file_size': 45566, 'creation_date': '2023-12-03', 'last_modified_date': '2023-12-03', 'last_accessed_date': '2023-12-03'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, hash='95be2ae1d59fb7d9777eb7b4e22406bbe6c5725dca2ce62edeaf2024101577ba', text='Products:\n●\nSmartphones:\n1.\niPhone\n14\nPro\nMax\n2.\nSamsung\nGalaxy\nS23\nUltra\n3.\nGoogle\nPixel\n7\nPro\n4.\nOnePlus\n11\nPro\n5.\nXiaomi\n13\nPro\nCommon\nCustomer\nSupport\nQuestions:\n1.\n"My\nsmartphone\'s\nbattery\ndrains\nquickly .\nCan\nyou\nhelp\nme\ntroubleshoot\nthe\nissue?"\nSupport\nAgent:\n"Sure,\nI\'d

### *PaLM2*

<img src="https://drive.google.com/uc?id=16AGWQnUczvUPJ_7H7ftpSFcSXYS7uXZo" width="600" height="400">




PaLM2 excels at advanced reasoning tasks, including code and math, classification and question answering, translation and multilingual proficiency, and natural language generation. https://ai.google/discover/palm2/


Get an API key for the PaLM API from https://developers.generativeai.google/products/palm and substitute it for the `YOUR_API_KEY` placeholder below.

In [None]:
# Set the Google API key for PaLM
os.environ['GOOGLE_API_KEY'] = 'YOUR_API_KEY'
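Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. As one possible alternative, the sketch below reads the key from the environment and only prompts when it is missing or still set to the placeholder. The helper `resolve_api_key` and its parameters are hypothetical names introduced here for illustration.

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, var_name="GOOGLE_API_KEY", prompt_fn=getpass):
    """Return the key from `env`, prompting only when it is absent or a placeholder."""
    key = env.get(var_name, "").strip()
    if not key or key == "YOUR_API_KEY":
        # getpass hides the typed value, so the key never appears in cell output
        key = prompt_fn(f"Enter {var_name}: ").strip()
    if not key:
        raise ValueError(f"{var_name} is empty")
    return key


# Interactive usage in the notebook:
# os.environ["GOOGLE_API_KEY"] = resolve_api_key()
```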


In [None]:
# Initialize the PaLM language model
llm = PaLM()

### Initialize the Hugging Face embedding model https://huggingface.co/BAAI/bge-large-en-v1.5



To learn more about embeddings:
* Video lecture: https://developers.google.com/machine-learning/crash-course/embeddings/
* word2vec tutorial: https://www.tensorflow.org/text/tutorials/word2vec

In [None]:
# Initialize the Hugging Face embedding model
embed_model = HuggingFaceEmbedding(model_name='BAAI/bge-large-en-v1.5')


### Using the service context to customize the LLM and embeddings, and to truncate and repack text chunks so they fit in the model context window.

The ServiceContext is a bundle of commonly used resources used during the indexing and querying stages of a LlamaIndex pipeline/application.

https://docs.llamaindex.ai/en/stable/module_guides/supporting_modules/service_context.html

In [None]:
# Create a service context for the index.
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model, chunk_size=800, chunk_overlap=20)

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.
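To build intuition for the `chunk_size` and `chunk_overlap` arguments passed above, here is a simplified sliding-window sketch. It is an illustration only: LlamaIndex counts tokens and tries to respect sentence boundaries, whereas this hypothetical `chunk_words` helper just slices a list of words.

```python
def chunk_words(words, chunk_size, chunk_overlap):
    """Split `words` into windows of `chunk_size` that share `chunk_overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample_words = [f"w{i}" for i in range(10)]
for chunk in chunk_words(sample_words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each window repeats the last word of the previous one, so a sentence that straddles a boundary is still partly visible in both chunks. A larger overlap preserves more context at the cost of more chunks to embed.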


## Indexing

### An Index is a data structure composed of Document objects, designed to enable querying by an LLM and to complement your querying strategy. The **VectorStoreIndex** takes your Documents and splits them up into Nodes. It then creates vector embeddings of the text of every node, ready to be queried by an LLM.



https://docs.llamaindex.ai/en/stable/api_reference/indices/vector_store.html

In [None]:
# Create a VectorStoreIndex from the documents and service context
index = VectorStoreIndex.from_documents(documents, service_context=service_context, show_progress=True)

Parsing nodes:   0%|          | 0/5 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/5 [00:00<?, ?it/s]

## Storing

Once you have data loaded and indexed, you will probably want to store it to avoid the time and cost of re-indexing it. By default, your indexed data is stored only in memory.
The simplest way to store your indexed data is to use the built-in .persist() method of every Index, which writes all the data to disk at the location specified (`./storage` when no location is given). This works for any type of index.


https://docs.llamaindex.ai/en/stable/understanding/storing/storing.html





In [None]:
# Persist the index to storage for later use
index.storage_context.persist()
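Persisting only pays off if a later session reloads the index instead of rebuilding it. The sketch below shows one way to do that with `StorageContext.from_defaults` and `load_index_from_storage` from llama-index 0.9.x. The wrapper function `load_persisted_index` is a hypothetical helper, and it assumes the index was persisted to the default `./storage` directory.

```python
import os


def load_persisted_index(persist_dir="./storage", service_context=None):
    """Reload a persisted index, or return None when nothing has been persisted yet."""
    if not os.path.isdir(persist_dir):
        return None
    # Imported lazily so the directory check above runs even without llama-index installed.
    from llama_index import StorageContext, load_index_from_storage

    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    # Pass the same service_context used at indexing time so that queries are
    # embedded with the same model as the stored vectors.
    return load_index_from_storage(storage_context, service_context=service_context)
```

In a fresh session, you could call `load_persisted_index(service_context=service_context)` and fall back to `VectorStoreIndex.from_documents(...)` only when it returns `None`.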

## Querying

At its simplest, querying is just a prompt to an LLM: it can be a question that gets an answer, a request for summarization, or a much more complex instruction.



The basis of all querying is the QueryEngine. The simplest way to get a QueryEngine is to get your index to create one.

### Stages of querying


* Retrieval is finding and returning the documents from your Index that are most relevant to your query. The most popular kind of retrieval is "top-k" semantic retrieval.

* Postprocessing is the optional step of reranking, transforming, or filtering the retrieved Nodes. For example, you can require that they carry particular metadata, such as keywords.

* Response synthesis combines your query, the most relevant retrieved data, and your prompt, and sends them to the LLM to produce a response.

https://docs.llamaindex.ai/en/stable/understanding/querying/querying.html
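The retrieval stage can be sketched in a few lines: embed the query, score every node by cosine similarity, and keep the k best. The vectors below are made-up 3-dimensional values chosen for illustration (real `BAAI/bge-large-en-v1.5` embeddings have 1024 dimensions), and `retrieve_top_k` is a hypothetical helper, not the LlamaIndex retriever.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, node_vecs, k=2):
    """Return (node_id, score) pairs for the k nodes most similar to the query."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings for three nodes and one query.
toy_node_vecs = {
    "wifi": [0.9, 0.1, 0.0],
    "battery": [0.1, 0.9, 0.0],
    "screen": [0.0, 0.2, 0.9],
}
toy_query_vec = [0.8, 0.2, 0.1]
print(retrieve_top_k(toy_query_vec, toy_node_vecs, k=2))
```

Only the selected nodes are passed on to postprocessing and response synthesis, which is why the choice of k directly affects how much context the LLM sees.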

In [None]:
# Create a query engine from the index
query_engine = index.as_query_engine()
response = query_engine.query(
    "trouble connecting to wifi network"
)

In [None]:
print(response)

Verify that you're connected to the correct network.
Enter the password correctly.
Ensure your router is functioning properly.
Restart your smartphone and router.
If the issue persists, check for any router firmware updates or contact your internet service provider.


### Customizing the RAG pipeline

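#### Using different response modes.
* tree_summarize - recursively summarizes the retrieved chunks in a tree and returns the root as the answer.
* refine - walks through the retrieved chunks one at a time, with a separate LLM call per chunk that refines the running answer.
* compact - like refine, but first packs as many chunks as fit into each prompt, which reduces the number of LLM calls.
* simple_summarize - truncates all chunks to fit a single prompt; quick, but detail may be lost.
* accumulate - applies the query to each chunk separately and concatenates the answers.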
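The difference between modes is mostly about how the retrieved chunks are fed to the LLM. The sketch below contrasts the control flow of `refine` and `accumulate` using a stub in place of a real model. It is a simplification under stated assumptions: `stub_llm`, `refine_mode`, and `accumulate_mode` are hypothetical names, and the real modes build their prompts from templates.

```python
def stub_llm(query, context, previous_answer):
    """Hypothetical stand-in for an LLM: appends the upper-cased context to the prior answer."""
    return (previous_answer + " " + context.upper()).strip()


def refine_mode(query, chunks, llm):
    """One call per chunk; each call sees the previous answer and may revise it."""
    answer = ""
    for chunk in chunks:
        answer = llm(query=query, context=chunk, previous_answer=answer)
    return answer


def accumulate_mode(query, chunks, llm):
    """One independent call per chunk; the per-chunk answers are joined together."""
    return "\n".join(
        llm(query=query, context=chunk, previous_answer="") for chunk in chunks
    )


toy_chunks = ["check the password", "restart the router"]
print(refine_mode("wifi trouble", toy_chunks, stub_llm))
print(accumulate_mode("wifi trouble", toy_chunks, stub_llm))
```

With `refine`, later chunks build on earlier ones and yield a single answer. With `accumulate`, each chunk produces its own answer, which suits questions that should be asked of every chunk separately.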







In [None]:
# Use the tree_summarize response mode
query_engine = index.as_query_engine(response_mode='tree_summarize')
response = query_engine.query(
    "trouble connecting to wifi network"
)

In [None]:
print(response)

To troubleshoot trouble connecting to wifi network, you can:
1. Verify that you're connected to the correct network.
2. Enter the password correctly.
3. Ensure your router is functioning properly.
4. Restart your smartphone and router.
5. Check for any router firmware updates.
6. Contact your internet service provider.


#### Retrieving more context

Given a user query (or chat message), retrievers are responsible for fetching the most relevant context.

A retriever can be defined on its own or built on top of an index. It is a fundamental component of query engines (and chat engines), which rely on it to obtain relevant context.

In [None]:
# similarity_top_k sets how many nodes are retrieved (the default is 2); raise it, e.g. to 5, to retrieve more context

query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query(
    "trouble connecting to wifi network"
)

In [None]:
print(response)

Verify that you're connected to the correct network.
Enter the password correctly.
Ensure your router is functioning properly.
Restart your smartphone and router.
If the issue persists, check for any router firmware updates or contact your internet service provider.


#### Custom query templates



In [None]:
from llama_index import Prompt
# Define a custom prompt
template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question and each answer should start with code word Doc chat:. And if the answer is not in given context should reply with sorry. {query_str}\n"
)
qa_template = Prompt(template)
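To see what the LLM ultimately receives, it helps to fill the two placeholders by hand. The sketch below uses plain `str.format` on a shortened copy of the template. The retrieved chunks are made-up examples, and joining them with blank lines is an assumption about how the context string is assembled.

```python
demo_template = (
    "We have provided context information below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)

# Hypothetical retrieved chunks standing in for the nodes a retriever would return.
demo_chunks = ["Restart your smartphone and router.", "Verify the Wi-Fi password."]

demo_prompt = demo_template.format(
    context_str="\n\n".join(demo_chunks),  # fills {context_str}
    query_str="trouble connecting to wifi network",  # fills {query_str}
)
print(demo_prompt)
```

Both `{context_str}` and `{query_str}` need to appear in a question-answering template; otherwise the retrieved context or the user's question never reaches the model.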

In [None]:
# Use the custom prompt when querying
query_engine = index.as_query_engine(text_qa_template=qa_template)
response = query_engine.query(
    "trouble connecting to wifi network"
)
print(response)

Doc chat: Let's check the Wi-Fi connection settings:

● Verify that you're connected to the correct network.
● Enter the password correctly.
● Ensure your router is functioning properly.
● Restart your smartphone and router.
● If the issue persists, check for any router firmware updates or contact your internet service provider.


In [None]:
!pip install -q gradio



In [None]:
import gradio as gr


In [None]:
textbox = gr.Textbox()

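# Generate a response for the user's query with the most recently created query engine
def generate_response(user_query):
    response = query_engine.query(user_query)
    # Convert the Response object to plain text for the "text" output component
    return str(response)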

# Gradio interface
demo = gr.Interface(fn=generate_response, inputs=textbox, outputs="text", title="Chat with Docs",
    description="Have a conversation with the bot!")

# Launch the interface
demo.launch()


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://0ec3f7226f259630fb.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
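For a demo that is shared publicly, it may be worth guarding the handler against blank input and runtime failures. The sketch below wraps any object with a `query` method. `make_handler` and `EchoEngine` are hypothetical names used only for illustration, with `EchoEngine` standing in for the LlamaIndex query engine.

```python
def make_handler(engine):
    """Wrap a query engine so the UI always receives a plain string."""
    def handle(user_query):
        if not user_query or not user_query.strip():
            return "Please enter a question."
        try:
            return str(engine.query(user_query.strip()))
        except Exception as exc:  # show a short message in the UI instead of a stack trace
            return f"Sorry, something went wrong: {exc}"
    return handle


class EchoEngine:
    """Hypothetical stand-in for a LlamaIndex query engine, used only for this demo."""

    def query(self, text):
        return f"echo: {text}"


demo_handler = make_handler(EchoEngine())
print(demo_handler("  trouble connecting to wifi network "))
print(demo_handler(""))
```

In the notebook, passing `make_handler(query_engine)` as the `fn` argument of `gr.Interface` would apply the same guards to the real engine.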


