### Retrieval Augmented Generation

Retrieval augmented generation (RAG) is a natural language processing (NLP) technique that combines the power of both generative and retrieval-based models to generate coherent and informative text.

RAG models consist of two components:
1. **Retrieval-based model**: an information retrieval (IR) model that ranks the documents in a corpus by their relevance to a given query.
2. **Generative model**: a large language model such as NeMo or Llama.


During the inference phase, the retrieval-based model is used to retrieve relevant passages or documents from a knowledge base, which are then fed into the generative model. The generative model then generates text that incorporates the retrieved information, resulting in text that is more informative and relevant to the given context.
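
To make the retrieve-then-generate flow concrete, here is a minimal sketch. It is not the pipeline built in this notebook: `retrieve` and `build_prompt` are hypothetical helpers, and plain word overlap stands in for a real retrieval model. The two passages are taken from the Starbucks FAQ data used later.

```python
# Toy retrieve-then-generate flow; word overlap stands in for a real retriever.
def retrieve(query, passages, top_k=1):
    """Rank passages by how many query words they share, best first."""
    query_words = set(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, retrieved):
    """Place the retrieved passages ahead of the question for the generator."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


passages = [
    "Check out our Store Finder for a store location near you.",
    "Internship opportunities are usually posted in late spring.",
]
query = "When are internship opportunities posted?"
prompt = build_prompt(query, retrieve(query, passages))
print(prompt)
```

The generative model would then receive `prompt` and complete the text after `Answer:`.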

### Config

In [None]:
import shutil

source = '/content/drive/MyDrive/A_RAG'
destination = '/content/'

shutil.move(source, destination)


'/content/A_RAG'

In [None]:
DATASET_NAME = 'Starbucks_Dataset'
SOURCE_DATA = f"./{DATASET_NAME}/Dataset/*.txt"
PROCESSED_DATA = f'./{DATASET_NAME}/Processed_Data/documents_v1.pkl'
TEST_DATA = f'./{DATASET_NAME}/Question.json'
DOCUMENT_STORE = f"./{DATASET_NAME}/inmemory_store/document_store.pkl"
RETRIEVAL_MODEL=f"./Model/save_retriever_{DATASET_NAME}"

# Data

Before we can start building our RAG application, we need to first create our vector DB that will contain our processed data sources.

-> Load Data

-> Write the data in the document store

-> Preprocess the data into chunks in the document store

-> Load the embedding model

-> Index the preprocessed chunks in the vector DB

### Load Data

The dataset consists of articles and question-answer pairs related to Starbucks.

We used the following datasets for this exercise:

1 - FAQ Data (https://customerservice.starbucks.com/sbux)

2 - Subset of Menu (https://www.starbucks.com/menu/food/hot-breakfast).

In [None]:
import glob

def read_data(SOURCE_DATA):
    txt_files = glob.glob(SOURCE_DATA)
    txt_list = []
    for text_file in txt_files:
        with open(text_file, 'r') as file:
            data = file.read()

        txt_list.append(data)
    return txt_list

In [None]:
data = read_data(SOURCE_DATA)

In [None]:
print('Total number of words present in the dataset ',len(('.'.join(data)).split()))


Total number of words present in the dataset  45028


Sample data collected from the references

In [None]:
print(data[0][101004:101585])

Published Sep 27, 2022
Click here to see a brief video tutorial on customizing your beverage in the Starbucks Mobile app.
I tried to install the Starbucks application on Microsoft Teams
on my phone or tablet but I do not see it?
Published Oct 26, 2022
You must first install the Starbucks® application on your web or desktop version of Microsoft Teams.
Once you have installed the application there, it will then become available on your phone or tablet.
How can I find a Starbucks® location near me?
Published Aug 29, 2022
Check out our Store Finder for a store location near you.


### Concatenate data from multiple documents

We collect the data from all documents into a single list so that it can be written to the document store at once.

In [None]:
def update_all_data(data):
    dict_database = []
    for idx, txt in enumerate(data):
        print(f"Doc {idx} length: ", len(txt.split()))
        dict_database.append({'content': txt, 'meta': {'_split_id': idx}})
    return dict_database

In [None]:
dict_database = update_all_data(data)

Doc 0 length:  45028


In [None]:
dict_database[0]['content'][:1000]

"How do I submit a data privacy request to Starbucks?\nPublished Sep 14, 2022\nAfter reviewing the Starbucks Privacy Statement, you may submit a data privacy request to\nStarbucks. To submit your request, click 'form' at the bottom of the Privacy Policy webpage, or by\nclicking here.\nAfter you successfully submit your request, please allow up to 45 days for a member of the\nStarbucks Privacy Team to contact you by email.\nWhere can I find out about internship opportunities?\nPublished Aug 29, 2022\nAll available internships will be listed on our Career Center site and will have “Intern” in the job title.\nInternship opportunities are usually posted in late spring.\nBring your ideas, work with the best\nOur Starbucks interns collaborate directly with leaders, have access to career-elevating\nseminars and enjoy curated local experiences. This immersive internship program\nincludes opportunities at our Starbucks Support Center and Starbucks Technology\nCenter, with varying openings and n

### Writing the data to the vector DB for preprocessing


In [None]:
!pip install farm-haystack==1.17.2

In [None]:
!haystack --version

haystack, version 1.17.2


In [None]:
from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
document_store.write_documents(dict_database)


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Preprocess Data into chunks in the Vector DB

We now have a list of sections (with the text and source of each section), but we shouldn't use them directly as context for our RAG application just yet. The sections vary in length, and many are quite large.

If we were to use these large sections, we'd be inserting a lot of noisy, unwanted context, and because all LLMs have a maximum context length, we wouldn't be able to fit much other relevant context. So instead, we're going to split the text within each section into smaller chunks. Intuitively, smaller chunks encapsulate a single concept or a few concepts and are less noisy than larger chunks. We're going to choose some typical text splitting values (e.g. a chunk size of 100 words) to create our chunks for now, but we'll experiment with a wider range of values later.
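
As a rough illustration of word-based chunking with overlap, here is a plain-Python sketch. `split_into_chunks` is a hypothetical helper, not part of Haystack, and it ignores sentence boundaries, which the preprocessor used in this notebook is configured to respect. The parameter names and default values only mirror the settings used in the next cell.

```python
def split_into_chunks(text, split_length=100, split_overlap=25):
    """Split text into word windows of split_length that share split_overlap words."""
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be smaller than split_length")
    words = text.split()
    step = split_length - split_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Synthetic 250-word text, used only to show the window arithmetic.
sample = " ".join(f"w{i}" for i in range(250))
chunks = split_into_chunks(sample)
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some words twice.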

In [None]:

from haystack.nodes import PreProcessor

def preprocessing_data(document_store):
    preprocessor = PreProcessor(
        clean_empty_lines=True,
        clean_whitespace=True,
        clean_header_footer=False,
        split_by="word",
        split_length=100,
        split_overlap=25,
        split_respect_sentence_boundary=True,
    )

    docs_default = preprocessor.process(document_store.get_all_documents())
    return docs_default


In [None]:
preprocessed_docs = preprocessing_data(document_store)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]



In [None]:
len(preprocessed_docs[:100])

100

In [None]:
print('1st Doc')
print(preprocessed_docs[-50].content)
print('-'*100)
print('2nd doc')
print(preprocessed_docs[-6].content)

1st Doc
These include efforts such as retrofitting store lighting to
LEDs; installing more energy-efficient appliances like Advansys dishwashers and ENERGY STAR
certified batch-type ice makers; and standardizing energy-use schedules and heating/cooling
temperatures.

Waste Diversion
Despite challenges to implementation, such as differing rules and availability in each city and
town, Starbucks recycles in more than 8,000 company-operated stores across the U.S. More
than 3,000 have both recycling and compost services. In Greener Stores, the goal is to align the
different rules around waste in each municipality with the workflow and best practices inside
each store. 
----------------------------------------------------------------------------------------------------
2nd doc
Located just nine blocks from the original Pike Place Store, the
Seattle Roastery is an immersive expression of passion for coffee and invites customers to
experience coffee from bean to cup. It is the fulfillment of a

### Indexing

Now that we have our chunks, we need to index (store) them somewhere so that we can retrieve them quickly at inference time. While there are many popular vector database options, we're going to use Haystack's `InMemoryDocumentStore` for its simplicity and performance.

In [None]:
import pickle

document_store = InMemoryDocumentStore(use_gpu=True, similarity='cosine')
document_store.delete_documents()
document_store.write_documents(preprocessed_docs[:100])

### Load the embedding model

Here we are using `all-mpnet-base-v2`.

In [None]:
from haystack.nodes import EmbeddingRetriever

em_retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-mpnet-base-v2",
    model_format="sentence_transformers",
    batch_size=16,
)

In [None]:
document_store.update_embeddings(retriever=em_retriever)

with open(DOCUMENT_STORE, "wb") as f:
        pickle.dump(document_store, f)

Updating Embedding:   0%|          | 0/89 [00:00<?, ? docs/s]

Batches:   0%|          | 0/6 [00:00<?, ?it/s]

# Retriever

With our embedded chunks indexed in our vector database, we're ready to perform retrieval for a given query. We'll start by using the same embedding model we used to embed our text chunks to now embed the incoming query.
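
The idea behind embedding-based retrieval can be sketched in a few lines. This is an illustration only: the 3-dimensional vectors are made up, `cosine_similarity` and `top_k_chunks` are hypothetical helpers, and real embeddings from the model above have many more dimensions. It assumes no vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k):
    """Return (chunk index, similarity) pairs for the k most similar chunks."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Made-up embeddings for three chunks and one query.
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
query_vec = [1.0, 0.2, 0.0]
hits = top_k_chunks(query_vec, chunk_vecs, k=2)
print(hits)
```

Because the query and the chunks are embedded by the same model, their vectors live in the same space and the similarity scores are comparable.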

### Training

Build question-answer samples to train the retriever model. To create the question-answer pairs we use the `doc2query/msmarco-t5-base-v1` model.

**QuestionGenerator** controls how many questions are generated from the documents.
Here we generate 3 questions from documents having 200 tokens.

**PseudoLabelGenerator** generates positive and negative answers for each question from the respective chunk of the document.

In [None]:
import os
from haystack.nodes.question_generator.question_generator import QuestionGenerator
from haystack.nodes.label_generator.pseudo_label_generator import PseudoLabelGenerator

qg = QuestionGenerator(model_name_or_path="doc2query/msmarco-t5-base-v1", max_length=256, split_length=200, batch_size=4, num_queries_per_doc=3, use_gpu=True)
psg = PseudoLabelGenerator(qg, em_retriever,batch_size= 4)

if not os.path.isdir(RETRIEVAL_MODEL):
    print('Training ...')
    output, _ = psg.run(documents=document_store.get_all_documents())
else:
    print('-'*40)
    print('          Model is Trained')
    print('-'*40)





Training ...


Generating questions:   0%|          | 0/89 [00:00<?, ?it/s]

Mine negatives:   0%|          | 0/67 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Score margin:   0%|          | 0/67 [00:00<?, ?it/s]

In [None]:
print('Question :',output['gpl_labels'][0]['question'])
print('-'*100)
print('Pos_ans_ :',output['gpl_labels'][0]['pos_doc'])
print('Neg_ans_ :',output['gpl_labels'][0]['neg_doc'])
print('-'*50)


Question : how to submit a privacy request to starbucks
----------------------------------------------------------------------------------------------------
Pos_ans_ : How do I submit a data privacy request to Starbucks?
Published Sep 14, 2022
After reviewing the Starbucks Privacy Statement, you may submit a data privacy request to
Starbucks. To submit your request, click 'form' at the bottom of the Privacy Policy webpage, or by
clicking here.
After you successfully submit your request, please allow up to 45 days for a member of the
Starbucks Privacy Team to contact you by email.
Where can I find out about internship opportunities?

Neg_ans_ : A brief interaction with
customers or between baristas is very low risk.
You can keep updated on the latest actions Starbucks is taking to prevent the spread of the virus at
Starbucks Stories & News.
How does my organization receive donations from Starbucks?
Published Aug 29, 2022
Starbucks continues to remain focused on funding communities throu

In [None]:
em_retriever.save(RETRIEVAL_MODEL)

### Retrieve n chunks from the database that are relevant to the query

The `top_k` value sets how many documents are extracted for a single query; the inference example below passes `K = 5`.

In [None]:
def unit_retrieve(query, retriever,K):
    return retriever.retrieve(query=query, top_k=K)

### Ranker

We need re-ranking because the first-stage retriever may be flawed. It may rank some irrelevant documents high, while some relevant documents might get lower scores. Thus, not all top-k documents are relevant, and not all relevant documents are in the top-k. Re-ranker refines these results and brings up the most relevant answers
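
The two-stage idea can be sketched without any model. In this illustration the candidate names and second-stage scores are made up, and `rerank` is a hypothetical helper; in the notebook the scores come from the cross-encoder ranker loaded below.

```python
def rerank(candidates, cross_scores, top_n):
    """Reorder first-stage candidates by a second-stage relevance score."""
    order = sorted(range(len(candidates)), key=lambda i: cross_scores[i], reverse=True)
    return [(candidates[i], cross_scores[i]) for i in order[:top_n]]


# Candidates in first-stage (retriever) order, with made-up second-stage scores.
candidates = ["chunk_a", "chunk_b", "chunk_c"]
cross_scores = [0.12, 0.97, 0.55]
reranked = rerank(candidates, cross_scores, top_n=2)
print(reranked)
```

Here the retriever put `chunk_a` first, but the second-stage scores promote `chunk_b` and drop `chunk_a` from the final selection.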

In [None]:
from haystack.nodes import EmbeddingRetriever, SentenceTransformersRanker

ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2")

def rank_documents(query, retrieved_documents, ranker):
    # Re-score the retrieved documents against the query with the cross-encoder ranker.
    results = ranker.predict(query=query, documents=retrieved_documents)
    score = [doc.score for doc in results]
    return results, score


### Inference

In [None]:
import pickle
with open(DOCUMENT_STORE, 'rb') as f:
        document_store = pickle.load(f)

em_retriever = EmbeddingRetriever(document_store=document_store,
                        embedding_model=RETRIEVAL_MODEL)


In [None]:
query= "How old do I have to be to work at Starbucks®?"
K = 5

retreived_ducuments = unit_retrieve(query, em_retriever,K)
results, score = rank_documents(query, retreived_ducuments, ranker)



Batches:   0%|          | 0/1 [00:00<?, ?it/s]

In [None]:
retreived_ducuments

[<Document: {'content': 'For more specifics on the rules, read our Terms and Conditions. Contact us to report an offensive\nposting.\nHow old do I have to be to work at Starbucks®?\nPublished Sep 27, 2022\nIn most states you must be at least 16 years old (14 in Montana) to work for Starbucks®.\nI have a Teavana Glass Tea Tumbler that was recalled, what\ndo I do?\nPublished May 1, 2023\nPlease call the our Customer Service team at 1-800-STARBUC (782-7282).\nFor additional information please visit\nhttps://www.cpsc.gov/Recalls/2013/Teavana-Recalls-Glass-Tea-Tumblers.\nWhy did I receive a refund for my purchase on a Starbucks\nStore credit card?\n', 'content_type': 'text', 'score': 0.9999407529830933, 'meta': {'_split_id': 58, '_split_overlap': [{'doc_id': 'a3008c44219c24c368f43cd518f500f0', 'range': (0, 153)}, {'doc_id': '68eb91bbfc1e09f28d33adfd57fe070b', 'range': (334, 613)}]}, 'id_hash_keys': ['content'], 'embedding': None, 'id': '768f437004d61561f8003fde5a0fe2c2'}>,
 <Document: {'con

In [None]:
results, score

([<Document: {'content': 'For more specifics on the rules, read our Terms and Conditions. Contact us to report an offensive\nposting.\nHow old do I have to be to work at Starbucks®?\nPublished Sep 27, 2022\nIn most states you must be at least 16 years old (14 in Montana) to work for Starbucks®.\nI have a Teavana Glass Tea Tumbler that was recalled, what\ndo I do?\nPublished May 1, 2023\nPlease call the our Customer Service team at 1-800-STARBUC (782-7282).\nFor additional information please visit\nhttps://www.cpsc.gov/Recalls/2013/Teavana-Recalls-Glass-Tea-Tumblers.\nWhy did I receive a refund for my purchase on a Starbucks\nStore credit card?\n', 'content_type': 'text', 'score': 0.9999407529830933, 'meta': {'_split_id': 58, '_split_overlap': [{'doc_id': 'a3008c44219c24c368f43cd518f500f0', 'range': (0, 153)}, {'doc_id': '68eb91bbfc1e09f28d33adfd57fe070b', 'range': (334, 613)}]}, 'id_hash_keys': ['content'], 'embedding': None, 'id': '768f437004d61561f8003fde5a0fe2c2'}>,
  <Document: {'c

In [None]:
threshold_score = 0.90
context=" ".join([results[i].content.strip().replace("\n\n", "").replace("\n", "") for i in range(3) if score[i] >= threshold_score])
print(context)

For more specifics on the rules, read our Terms and Conditions. Contact us to report an offensiveposting.How old do I have to be to work at Starbucks®?Published Sep 27, 2022In most states you must be at least 16 years old (14 in Montana) to work for Starbucks®.I have a Teavana Glass Tea Tumbler that was recalled, whatdo I do?Published May 1, 2023Please call the our Customer Service team at 1-800-STARBUC (782-7282).For additional information please visithttps://www.cpsc.gov/Recalls/2013/Teavana-Recalls-Glass-Tea-Tumblers.Why did I receive a refund for my purchase on a StarbucksStore credit card? Published Sep 14, 2022Our blogs are open forums for ideas and discussion. That said, we will not allow profanity or otherinappropriate conduct. Posts that are inappropriate will be removed from the site and users thatrepeatedly post inappropriate conduct will be asked not to post at all.At this time, we do not accept unsolicited business ideas from customers or vendors.For more specifics on the 

In [None]:
pip install openai==0.27.7

Collecting openai==0.27.7
  Downloading openai-0.27.7-py3-none-any.whl (71 kB)
Installing collected packages: openai
Successfully installed openai-0.27.7


In [None]:
def complete(prompt):
    # Query the gpt-3.5-turbo-instruct completion model
    res = openai.Completion.create(
        engine='gpt-3.5-turbo-instruct',
        prompt=prompt,
        temperature=1,
        max_tokens=3000,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None
    )
    # print(res)
    return res['choices'][0]['text'].strip()

In [None]:
import openai

In [None]:
openai.api_key = KEY

In [None]:
prompt = f'''Answer the question based on the below context
Context:
{context}

Question:{query}
Answer:'''
print(prompt)

Answer the question based on the below context
Context:
For more specifics on the rules, read our Terms and Conditions. Contact us to report an offensiveposting.How old do I have to be to work at Starbucks®?Published Sep 27, 2022In most states you must be at least 16 years old (14 in Montana) to work for Starbucks®.I have a Teavana Glass Tea Tumbler that was recalled, whatdo I do?Published May 1, 2023Please call the our Customer Service team at 1-800-STARBUC (782-7282).For additional information please visithttps://www.cpsc.gov/Recalls/2013/Teavana-Recalls-Glass-Tea-Tumblers.Why did I receive a refund for my purchase on a StarbucksStore credit card? Published Sep 14, 2022Our blogs are open forums for ideas and discussion. That said, we will not allow profanity or otherinappropriate conduct. Posts that are inappropriate will be removed from the site and users thatrepeatedly post inappropriate conduct will be asked not to post at all.At this time, we do not accept unsolicited business id

In [None]:
complete(prompt)

'In most states, you must be at least 16 years old (14 in Montana) to work for Starbucks®.'

# Generator  
We can now use the context to generate a response from our LLM. Without this relevant context that we retrieved, the LLM may not have been able to accurately answer our question. And as our data grows, we can just as easily embed and index any new data and be able to retrieve it to answer questions.

* **Top-k** -  Top-k tells the model that it has to keep the top k highest probability tokens, from which the next token is selected at random. Lower values reduce randomness as you are clipping off less likely tokens generating predictable text. If k is set to 0, Top-k is not used. When set to 1, it is always going to select the most probable token next.

* **MAX_LENGTH** - The LLM generates text one token at a time: it predicts the next token from the current sequence, appends it to the input, and repeats. You wouldn't want the LLM to go on and on, so `MAX_LENGTH` caps this loop. While NeMo models can for now accept between 2048 and 4096 tokens, hitting these limits is not recommended, as the model may generate off responses.

* **Temperature** - This parameter controls the creative ability of your model. While generating the next token, the model comes up with a probability distribution over candidate tokens. The temperature parameter adjusts the shape of this distribution, leading to more diversity in the generated text. At a lower temperature, the model is more conservative and is limited to choosing tokens with higher probabilities. As you increase the temperature, that limit becomes more lenient, allowing the model to choose less probable words, resulting in more unpredictable and creative text.

* **Top-p** - With top-p, the model picks at random from the highest-probability tokens whose probabilities sum to or exceed the top-p value. If top-p is set to 0.9, only the most probable tokens that together account for at least 90% of the probability mass remain candidates.

* **Repetition penalty** - This parameter can help penalize tokens based on how frequently they occur in the text, including the input prompt. A token that has already appeared five times is penalized more heavily than a token that has appeared only one time. A value of 1 means that there is no penalty and values larger than 1 discourage repeated tokens.
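
To see how temperature, top-k and top-p interact, here is a small self-contained sketch. It is an illustration under assumptions, not how any particular LLM server implements sampling: `next_token_distribution` is a hypothetical helper, the three-token vocabulary and its logits are made up, and the repetition penalty is left out.

```python
import math


def next_token_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return renormalised token probabilities after temperature, top-k and top-p."""
    # Temperature rescales the logits before the softmax.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    exp = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exp.values())
    probs = sorted(((p / total, tok) for tok, p in exp.items()), reverse=True)

    # Top-k keeps the k most probable tokens; 0 disables it.
    if top_k > 0:
        probs = probs[:top_k]

    # Top-p keeps the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, tok in probs:
        kept.append((p, tok))
        cumulative += p
        if cumulative >= top_p:
            break

    norm = sum(p for p, _ in kept)
    return {tok: p / norm for p, tok in kept}


# Made-up logits for a three-token vocabulary.
logits = {"coffee": 2.0, "tea": 1.0, "water": 0.0}
greedy = next_token_distribution(logits, top_k=1)
nucleus = next_token_distribution(logits, top_p=0.9)
sharp = next_token_distribution(logits, temperature=0.5)
flat = next_token_distribution(logits, temperature=2.0)
print(greedy, nucleus, sharp["coffee"], flat["coffee"])
```

With `top_k=1` only the most probable token survives, with `top_p=0.9` the least likely token is cut, and lowering the temperature concentrates probability on the top token while raising it spreads probability out.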

In [None]:
HUGGING_FACE_URL="http://20.24.60.60:8080/load_generator_starbucks"

In [None]:
import requests
query = 'How to redeem my gift cards in starbucks ? give the answer in 2 sentences '
#Context = " ".join([results[i].content.strip().replace("\n\n", "").replace("\n", "") for i in range(3) if score[i] >= 0.5])

Question='\nQuestion: '+ query + '\nAnswer :'
Prompt = ''
chat_history = ''

# Pass the settings as query parameters so that requests URL-encodes the question text.
# The context is left empty here, so the LLM answers without any retrieved passages.
params = {
    'MAX_LENGTH': 1000,
    'temperature': 0.1,
    'length_penalty': -0.2,
    'top_p': 0.1,
    'top_k': 0,
    'repetition_penalty': 1.176,
    'Question': Question,
    'chat_history': chat_history,
    'context': '',
}
llm = requests.get(HUGGING_FACE_URL, params=params)


In [None]:
print(llm.json()['result'])

  Sure! I'd be happy to help with that. To redeem your Starbucks gift card, simply present it to the barista at the time of purchase and they will apply the value to your order. Alternatively, you can also load your gift card to your Starbucks Rewards account and use it towards purchases made online or through the Starbucks mobile app.


## Evaluation

### Load Test dataset

In [None]:
import json

# Use a context manager so the file is closed after loading.
with open(TEST_DATA) as f:
    data = json.load(f)

Question = data['Question']
Ground_truth = data['Answer']

### Extract context for Test questions

In [None]:
import pandas as pd
df = pd.DataFrame()
unranked_context = []
ranked_context=[]
for query in Question :
    retreived_ducuments = unit_retrieve(query, em_retriever,5)
    results, score = rank_documents(query, retreived_ducuments, ranker)
    # Unranked context: the top 2 retrieved chunks, in retriever order.
    unranked_result = " ".join([doc.content.strip().replace("\n\n", "").replace("\n", "") for doc in retreived_ducuments[:2]])
    # Ranked context: up to 3 re-ranked chunks whose ranker score is at least 0.5.
    ranked_results = " ".join([doc.content.strip().replace("\n\n", "").replace("\n", "") for doc, s in zip(results[:3], score) if s >= 0.5])
    ranked_context.append(ranked_results)
    unranked_context.append(unranked_result)
df['query']=Question
df['answer']=Ground_truth
df['unranked_context']=unranked_context
df['ranked_context']=ranked_context

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

In [None]:
df.head()

| | query | answer | unranked_context | ranked_context |
|---|---|---|---|---|
| 0 | How old do I have to be to work at Starbucks®? | In most states you must be at least 16 years o... | For more specifics on the rules, read our Term... | For more specifics on the rules, read our Term... |
| 1 | Why did I receive a refund for my purchase on ... | If you make a return on items that you purchas... | Note that Starbucks Store Credit cards may be ... | Note that Starbucks Store Credit cards may be ... |
| 2 | How does my organization receive donations fro... | Starbucks continues to remain focused on fundi... | With more than $20million invested globally th... | A brief interaction withcustomers or between b... |
| 3 | What's the best way to store coffee? How long ... | Once roasted, coffee begins to lose its flavor... | How long will it stayfresh?Published Mar 12, 2... | All Starbucks stores can grind coffee to thiss... |
| 4 | How can I comment or give feedback about an ex... | We always welcome your feedback about our serv... | Published Aug 29, 2022While the usual space fo... | Published Aug 29, 2022While the usual space fo... |


### Extract responses from the LLM based on:
* Ranked context - query the LLM with the most highly ranked context and the query
* Unranked context - query the LLM with the top retrieved context (before re-ranking) and the query
* Only LLM - query the LLM with the query alone

In [None]:
import requests
ranked_response_list = []
unranked_response_list =[]
llm_list=[]

def ask_llm(question, context):
    # Send one request to the generator endpoint; requests URL-encodes the parameters.
    params = {
        'MAX_LENGTH': 1000,
        'temperature': 0.1,
        'length_penalty': -0.2,
        'top_p': 0.1,
        'top_k': 0,
        'repetition_penalty': 1.176,
        'Question': question,
        'chat_history': '',
        'context': context,
    }
    return requests.get(HUGGING_FACE_URL, params=params).json()['result']

# Read the query and the two contexts from their named columns.
for _, row in df.iterrows():
    ranked_response_list.append(ask_llm(row['query'], row['ranked_context']))
    unranked_response_list.append(ask_llm(row['query'], row['unranked_context']))
    llm_list.append(ask_llm(row['query'], ''))

df['ranked_response']=ranked_response_list
df['unranked_response']=unranked_response_list
df['llm_only']=llm_list

In [None]:
df.head()

| | query | answer | unranked_context | ranked_context | ranked_response | unranked_response | llm_only |
|---|---|---|---|---|---|---|---|
| 0 | How old do I have to be to work at Starbucks®? | In most states you must be at least 16 years o... | For more specifics on the rules, read our Term... | For more specifics on the rules, read our Term... | Hello! To work at Starbucks®, you typically ... | Hello! Thank you for reaching out to me with... | Hello! Thank you for reaching out to me with... |
| 1 | Why did I receive a refund for my purchase on ... | If you make a return on items that you purchas... | Note that Starbucks Store Credit cards may be ... | Note that Starbucks Store Credit cards may be ... | Hello! Thank you for reaching out to me toda... | Hello! Thank you for reaching out to us. Bas... | Hello! I'd be happy to help you with your qu... |
| 2 | How does my organization receive donations fro... | Starbucks continues to remain focused on fundi... | With more than $20million invested globally th... | A brief interaction withcustomers or between b... | Hello! As a Starbucks chatbot, I'm happy to ... | Hello! As a representative of Starbucks, I'm... | Hello! As a Starbucks Chatbot, I'd be happy ... |
| 3 | What's the best way to store coffee? How long ... | Once roasted, coffee begins to lose its flavor... | How long will it stayfresh?Published Mar 12, 2... | All Starbucks stores can grind coffee to thiss... | Hello! As a Starbucks enthusiast, I'd be hap... | Hello! I'd be happy to help with your questi... | Hello! I'd be happy to help with your questi... |
| 4 | How can I comment or give feedback about an ex... | We always welcome your feedback about our serv... | Published Aug 29, 2022While the usual space fo... | Published Aug 29, 2022While the usual space fo... | Hello! Thank you for reaching out to us. We ... | Hello! Thank you for reaching out to us. We ... | Hello! Thank you for reaching out to me toda... |


### The responses below show how RAG improves the answers

The response from the LLM alone is quite generic, whereas the RAG response, which is grounded in the retrieved context, is more specific and to the point.

In [None]:
n=7
print('*'*100)
print('Question : ',df['query'][n])
print('*'*100)
print()
print('LLM Response :',df['llm_only'][n])
print()
print('*'*100)
print()
print('RAG LLM Response :',df['ranked_response'][n])
print()
print('*'*100)
print('*'*100)

****************************************************************************************************
Question :  How is Starbucks® Pick Up different than my local Starbucks cafe?
****************************************************************************************************

LLM Response :   Hello! I'd be happy to help answer your question about Starbucks Pick Up compared to your local Starbucks cafe.

Starbucks Pick Up is a convenient way to order and pay for your favorite Starbucks drinks and food items online or through the Starbucks app, then pick them up at a designated time at a participating store. Here are some key differences between Starbucks Pick Up and visiting your local Starbucks cafe:

1. Convenience: With Starbucks Pick Up, you can place your order and pay ahead of time, so when you arrive to pick up your order, it's ready and waiting for you. This saves you time and allows you to skip the line at the cafe.
2. Customization: When ordering through Starbucks Pick Up,

In [None]:
n=5
print('*'*100)
print('Question : ',df['query'][n])
print('*'*100)
print()
print('LLM Response :',df['llm_only'][n])
print()
print('*'*100)
print()
print('RAG LLM Response :',df['ranked_response'][n])
print()
print('*'*100)
print('*'*100)

****************************************************************************************************
Question :  How do I reduce sugar in my beverage?
****************************************************************************************************

LLM Response :   Hello! I'd be happy to help you with reducing sugar in your Starbucks beverages. Here are some tips:

1. Ask for a "skinny" version of your drink. This will eliminate extra syrups and whipped cream, which can add a lot of sugar.
2. Choose a drink made with black coffee instead of flavored syrups. Black coffee has no added sugars.
3. Opt for a drink made with steamed milk instead of foam. Steamed milk contains less sugar than foam.
4. Consider using a natural sweetener like honey, agave nectar, or coconut sugar instead of refined white sugar. These sweeteners have fewer calories and can add a unique flavor to your drink.
5. If you prefer a sweeter drink, try asking for a "light" version instead of a "full-sugar" version. 
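
The two comparison cells above repeat the same printing code with a different `n`. A small helper can build the same report for any row. This is a minimal sketch: `format_comparison` is a hypothetical name, and the toy strings stand in for `df['query'][n]`, `df['llm_only'][n]` and `df['ranked_response'][n]`.

```python
def format_comparison(question, llm_response, rag_response, width=100):
    """Return a text report comparing the LLM-only and RAG responses."""
    bar = '*' * width
    return '\n'.join([
        bar,
        f'Question : {question}',
        bar,
        '',
        f'LLM Response : {llm_response}',
        '',
        bar,
        '',
        f'RAG LLM Response : {rag_response}',
        '',
        bar,
    ])

# Toy strings in place of real dataframe values.
report = format_comparison('How do I reduce sugar?', 'generic answer', 'grounded answer', width=20)
print(report)
```

With the notebook's dataframe, the call would be `print(format_comparison(df['query'][n], df['llm_only'][n], df['ranked_response'][n]))`.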

In [None]:
df.to_csv('output/rag.csv')

In [None]:
import datasets

# BERTScore compares each generated response with its reference answer.
bertscore_metric = datasets.load_metric('bertscore')

# Pass plain lists of strings so every row is scored against its own reference.
references = df['answer'].tolist()
ranked_llm_eval = bertscore_metric.compute(predictions=df['ranked_response'].tolist(), references=references, lang="en")
unranked_llm_eval = bertscore_metric.compute(predictions=df['unranked_response'].tolist(), references=references, lang="en")
llm_eval = bertscore_metric.compute(predictions=df['llm_only'].tolist(), references=references, lang="en")

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
# Average the F1 scores returned for each system.
def mean_f1(result):
    return sum(result['f1']) / len(result['f1'])
print('After Evaluating llm responses without RAG -> ', mean_f1(llm_eval))
print('After Evaluating unranked RAG responses   -> ', mean_f1(unranked_llm_eval))
print('After Evaluating ranked RAG responses     -> ', mean_f1(ranked_llm_eval))

After Evaluating llm responses without RAG ->  0.827800452709198
After Evaluating unranked RAG responses   ->  0.8511368036270142
After Evaluating ranked RAG responses     ->  0.8712872862815857
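
For intuition about what the F1 values above measure, BERTScore matches each token embedding in the candidate to its most similar token embedding in the reference (and vice versa) using cosine similarity, then combines the resulting precision and recall into an F1. The sketch below illustrates that greedy matching in plain Python. It is a simplification: the 2-d vectors are hypothetical toy values rather than model embeddings, and optional features such as idf weighting and baseline rescaling are left out.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def greedy_bertscore(cand_vecs, ref_vecs):
    """Return (precision, recall, f1) from token embeddings via greedy matching."""
    # Precision: each candidate token is matched to its best reference token.
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    # Recall: each reference token is matched to its best candidate token.
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy "embeddings": the candidate has one token matching the reference and one unrelated token.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0]]
p, r, f1 = greedy_bertscore(candidate, reference)
print(p, r, f1)
```

In this toy case recall is perfect because the single reference token is covered, while precision is halved by the extra unrelated candidate token. The same logic explains why long, generic answers can score lower than short, grounded ones.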
