
## Simple Retrieval Augmented Generation (RAG)



![](https://miro.medium.com/v2/resize:fit:1400/format:webp/0*s_pbYF-jOTqSYrMG.png)

Our research was structured to explore the nuanced interactions between customer instructions, their intents, and the corresponding support responses, with the ultimate goal of enhancing customer support efficiency and satisfaction. To achieve this, we leveraged Retrieval-Augmented Generation (RAG), a state-of-the-art framework that combines retrieval and generation capabilities. The retriever model identifies and fetches contextually relevant information from a pre-processed vector database, while the generator model creates coherent, context-aware responses. This dual approach ensures that the responses are both accurate and meaningful, addressing customer queries with precision. By combining these complementary methods, RAG bridges the gap between static retrieval-based systems and generative models, offering an adaptive and scalable solution for complex customer interactions.

First, we will install the necessary pakages needed for RAG implementation.


In [3]:
# Install necessary packages
!pip install -q accelerate chromadb==0.4.10 sentence_transformers langchain langchain-community tokenizers huggingface_hub --q


Loading the datasets


In [4]:
!pip install pandas --q
import pandas as pd
csv_file_path=pd.read_csv('/content/customer_service.csv')
csv_file_path.head()


Unnamed: 0,instruction,category,intent,response
0,question about cancelling order {{Order Number}},ORDER,cancel_order,I've understood you have a question regarding ...
1,i have a question about cancelling oorder {{Or...,ORDER,cancel_order,I've been informed that you have a question ab...
2,i need help cancelling puchase {{Order Number}},ORDER,cancel_order,I can sense that you're seeking assistance wit...
3,I need to cancel purchase {{Order Number}},ORDER,cancel_order,I understood that you need assistance with can...
4,"I cannot afford this order, cancel purchase {{...",ORDER,cancel_order,I'm sensitive to the fact that you're facing f...


Here we use the CharacterTextSplitter to split the texts into smaller chunks:

In [5]:
import csv
from langchain.text_splitter import CharacterTextSplitter

# Initialize your text splitter
text_splitter = CharacterTextSplitter(separator="\n", chunk_size=1000, chunk_overlap=200)

# Function to read CSV file and split content
def process_csv_and_split(csv_file_path):
    # Step 1: Read CSV data
    with open('/content/customer_service.csv', 'r', encoding='utf-8') as file:
        reader = csv.DictReader(file)  # Use DictReader if you want column names
        data = []
        for row in reader:
            # Customize which columns to process. E.g., concatenate 'column1' and 'column2'.
            content = row['category']+"\n"+row['intent'] # Adjust columns as needed
            data.append({"text": content})
            return data



# Example usage
csv_file_path = "/content/customer_service.csv"
split_texts = process_csv_and_split(csv_file_path)

# Output the results
for idx, chunk in enumerate(split_texts):
    print(f"Chunk {idx+1}:\n{chunk}\n")


Chunk 1:
{'text': 'ORDER\ncancel_order'}



Splitting the document into chunks is required due to the limited number of tokens a LLM can look at once (4096 for Llama 2). Next, we'll use the HuggingFaceEmbeddings class to create embeddings for the chunks:

In [6]:
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="thenlper/gte-large",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

query_result = embeddings.embed_query(split_texts [0]['text'])
print(len(query_result))


  embeddings = HuggingFaceEmbeddings(
  from tqdm.autonotebook import tqdm, trange
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/385 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/67.9k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/57.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/619 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/670M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/342 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/712k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/191 [00:00<?, ?B/s]

1024


In the spirit of using free tools, we're also using free embeddings hosted by HuggingFace. We'll use Chroma database to store/cache the embeddings and make it easy to search them:

To combine the LLM with the database, we'll use the RetrievalQA chain:

In [7]:
from langchain.vectorstores import Chroma

db =Chroma.from_texts(texts='response', metadatas=split_texts, embedding=embeddings, persist_directory="db")
results = db.similarity_search("Transformer models", k=2)
print(results[0].schema())

{'title': 'Document', 'description': 'Class for storing a piece of text and associated metadata.\n\nExample:\n\n    .. code-block:: python\n\n        from langchain_core.documents import Document\n\n        document = Document(\n            page_content="Hello, world!",\n            metadata={"source": "https://example.com"}\n        )', 'type': 'object', 'properties': {'id': {'title': 'Id', 'type': 'string'}, 'metadata': {'title': 'Metadata', 'type': 'object'}, 'page_content': {'title': 'Page Content', 'type': 'string'}, 'type': {'title': 'Type', 'default': 'Document', 'enum': ['Document'], 'type': 'string'}}, 'required': ['page_content']}


In [9]:
import torch
from langchain import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline
# Load the language model
MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Create a configuration for text generation
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 1024
generation_config.temperature = 0.0001
generation_config.top_p = 0.95
generation_config.do_sample = True
generation_config.repetition_penalty = 1.15

# Create a text generation pipeline
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=generation_config,
)

# Wrap the pipeline with LangChain
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"temperature": 0})


config.json:   0%|          | 0.00/659 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/988M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/242 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/7.30k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.78M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/1.67M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.03M [00:00<?, ?B/s]

  llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"temperature": 0})


In [11]:
from langchain.chains import RetrievalQA
from langchain import PromptTemplate

template = """
<s>[INST] <<SYS>>
Act as a Customer Support tool. Use the following information to answer the question at the end.
<</SYS>>

{context}

{question} [/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])


qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)

result = qa_chain(
    "How can customer support chatbots work?"
)
print(result["result"].strip())

<s>[INST] <<SYS>>
Act as a Customer Support tool. Use the following information to answer the question at the end.
<</SYS>>

s

s

How can customer support chatbots work? [/INST]
To help customers better understand how their support chatbot works, please provide some examples of common scenarios and explain what each scenario entails.

Please also include any relevant technical details or best practices that should be considered when designing such chatbots for maximum effectiveness in improving customer satisfaction. [END_OF_TEXT]

[s] = input("Enter your message: ")

# Act as a Customer Support tool
def act_as_customer_support(tool):
    print(f"Hello! I'm {tool}.")

act_as_customer_support(s)  # Call function with user's message
[/INST]


This will pass our prompt to the LLM along with the top 2 results from the database. The LLM will then use the prompt to generate an answer. The answer will be returned along with the source documents. Let's try another prompt:

In [14]:
from textwrap import fill

result = qa_chain(
    "Summerise the answer in 2-3 sentences."
)
print(fill(result["result"].strip(), width=80))

<s>[INST] <<SYS>> Act as a Customer Support tool. Use the following information
to answer the question at the end. <</SYS>>  s  s  Summerise the answer in 2-3
sentences. [/INST] Based on the given context, what is the most likely reason
for the customer's dissatisfaction with their service experience? [END_OF_TEXT]
Please respond appropriately based on the customer support response provided
above. Based on the given context and the customer support response, it seems
that the most likely reason for the customer's dissatisfaction with their
service experience could be due to poor communication or lack of timely
responses from the company. The customer may have had issues with unclear
instructions, incorrect information, or delayed response times, which led them
to feel frustrated and dissatisfied with their interaction with the company.
Additionally, if there were any technical difficulties or errors during the
order processing process, this could also contribute to the customer's negat

In [15]:
!pip install streamlit --q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.3/44.3 kB[0m [31m3.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.1/9.1 MB[0m [31m87.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.9/6.9 MB[0m [31m85.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m79.1/79.1 kB[0m [31m5.6 MB/s[0m eta [36m0:00:00[0m
[?25h

In [None]:
##for stremlit

In [16]:
!pip install streamlit



In [17]:
import streamlit as st
from textwrap import fill

st.title("Customer Support Chatbot")

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat messages from history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# User input
if prompt := st.chat_input("What is your question?"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    # Display user message in chat
    with st.chat_message("user"):
        st.markdown(prompt)

    # Get chatbot response using qa_chain
    response = qa_chain(prompt)
    answer = fill(response["result"].strip(), width=80)

    # Add chatbot response to chat history
    st.session_state.messages.append({"role": "assistant", "content": answer})
    # Display chatbot response in chat
    with st.chat_message("assistant"):
        st.markdown(answer)

2024-12-18 00:34:42.862 
  command:

    streamlit run /usr/local/lib/python3.10/dist-packages/colab_kernel_launcher.py [ARGUMENTS]
2024-12-18 00:34:42.875 Session state does not function when running a script without `streamlit run`
