# RAG : Retrieval Augmented Generation

**(with Open-source LLMs)**

OBJECTIVES:

For this project, we will develop a RAG system that will answer the questions based on
the knowledge base. We are going to use two assignment instruction documents from
Module 3 i.e. [M3: AST-1] and [M3: AST-2] as a knowledge base. You can choose any
other dataset/knowledge base as well. You are encouraged to use open-source LLMs
but you are free to use OpenAI API

## Introduction

**Example workflow with embedding model:**

<br>

<img src='https://www.researchgate.net/publication/381125820/figure/fig2/AS:11431281249185289@1717499737731/Illustration-of-a-Retrieval-Augmented-Generation-RAG-workflow-Documents-are-loaded-and.ppm'>

### Install Dependencies

In [1]:
%%capture
!pip -q install langchain-core
!pip -q install langchain-community
!pip -q install sentence-transformers
!pip -q install langchain-huggingface
!pip -q install langchain-chroma
!pip -q install chromadb
!pip -q install pypdf

### Import Required Packages

In [2]:
import os
import numpy as np
from getpass import getpass
from langchain_huggingface import HuggingFaceEndpoint
from langchain_community.document_loaders import PyPDFLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

#### **Authentication for Huggingface API**

In [3]:
import os
from getpass import getpass

hfapi_key = getpass("Enter you HuggingFace access token:")
os.environ["HF_TOKEN"] = hfapi_key
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hfapi_key

Enter you HuggingFace access token:··········


In [4]:
# If your access token is in a text file, use this code cell.


# import os
# f = open('/content/hfapi_key.txt')
# hfapi_key=f.read()
# os.environ["HF_TOKEN"] = hfapi_key
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = hfapi_key

### Prepare Open Source LLM

In [5]:
# importing HuggingFace model abstraction class from langchain
from langchain_huggingface import HuggingFaceEndpoint

In [6]:
llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",       # Model card: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
    task="text-generation",
    max_new_tokens = 512,
    top_k = 30,
    temperature = 0.011,
    repetition_penalty = 1.03,
)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [7]:
# General query
response = llm.invoke("How to learn programming? give 5 points")
print(response)

.

1. Start with the basics: Before diving into complex programming concepts, it’s essential to start with the basics. Learn the syntax and structure of the programming language you want to learn. This will help you build a strong foundation and make it easier to understand more advanced concepts later on.

2. Practice, practice, practice: The more you practice programming, the better you’ll become. Set aside time each day to work on coding projects, even if they’re small at first. This will help you build confidence and develop your skills over time.

3. Collaborate with others: Join online communities or attend local meetups to connect with other programmers. Collaborating with others can help you learn new techniques and approaches, as well as provide valuable feedback on your code.

4. Use resources wisely: There are countless resources available for learning programming, from online tutorials to books and courses. However, it’s essential to use these resources wisely. Focus on res

In [48]:
# Specific query
llm.invoke("What is production code?")

'\n\nProduction code is the final version of software that is released to end-users. It is the version that has been thoroughly tested, debugged, and optimized for performance and reliability. Production code is typically deployed to a production environment, which is a live system where users interact with the software.\n\nIn contrast, development code is the version of software that is being created or modified by developers. It is not yet ready for release to end-users and may contain bugs, errors, or incomplete features. Development code is typically tested in a development or staging environment, which is a simulated production environment used for testing and debugging.\n\nThe difference between production code and development code is significant because production code must meet higher standards of quality, reliability, and performance than development code. Production code must be able to handle large volumes of data, respond quickly to user requests, and be highly available an

### **Loading the documents**

[PDF Loader](https://python.langchain.com/docs/how_to/document_loader_pdf/)

Step 1: Load the document files (1 point)


In [9]:
# UPLOAD the Docs first to this notebook, then run this cell

from langchain_community.document_loaders import PyPDFLoader

# Load PDF
loaders = [
    PyPDFLoader("/content/M3-W2-AST2.pdf"),
    PyPDFLoader("/content/M3-W1-AST1.pdf"),
]

docs = []
for loader in loaders:
    docs.extend(loader.load())


In [10]:
len(docs)        # 7 pages were there in total from above documents

11

In [11]:
docs

[Document(metadata={'source': '/content/M3-W2-AST2.pdf', 'page': 0}, page_content="Module\n3:\nAST-2\nTITLE:\nTesting\nthe\nModules\n&\nPackaging \nLEARNING\nOBJECTIVES:\nAt\nthe\nend\nof\nthe\nexperiment,\nyou\nwill\nbe\nable\nto\ninclude\ntesting\naspects\nin\nthe\nproject\nand\nwrite\ntest\ncases\ncontinuing\nfrom\nAST1.\nFinally ,\nyou\nwill\nbe\nable\nto\ncreate\na\npython\npackage\nof\nthe\nmodel\nwhich\ncan\nbe\neasily\nconsumed\nby\nany\nAPI.\nYou\nwill\nbe\nable\nto\nunderstand\nand\nimplement\nthe\nfollowing\naspects:\n1.\nTesting\nconcept\nand\nautomated\ntesting\nusing\npytest\n \n2.\nPackaging\nof\nmodel\nINTRODUCTION\nTesting:\nSoftware\ntesting\nis\na\ncrucial\npart\nof\nthe\nsoftware\ndevelopment\nprocess.\nIt\ninvolves\nexecuting\na\nprogram\nor\nsystem\nwith\nthe\nintention\nof\nfinding\nerrors\nor\nverifying\nits\ncompliance\nwith\nspecified\nrequirements.\nThe\ngoal\nof\ntesting\nis\nto\nidentify\ndefects\nand\nensure\nthat\nthe\nsoftware\nfunctions\nas\nintended,\n

In [12]:
print(docs[0].page_content)

Module
3:
AST-2
TITLE:
Testing
the
Modules
&
Packaging 
LEARNING
OBJECTIVES:
At
the
end
of
the
experiment,
you
will
be
able
to
include
testing
aspects
in
the
project
and
write
test
cases
continuing
from
AST1.
Finally ,
you
will
be
able
to
create
a
python
package
of
the
model
which
can
be
easily
consumed
by
any
API.
You
will
be
able
to
understand
and
implement
the
following
aspects:
1.
Testing
concept
and
automated
testing
using
pytest
 
2.
Packaging
of
model
INTRODUCTION
Testing:
Software
testing
is
a
crucial
part
of
the
software
development
process.
It
involves
executing
a
program
or
system
with
the
intention
of
finding
errors
or
verifying
its
compliance
with
specified
requirements.
The
goal
of
testing
is
to
identify
defects
and
ensure
that
the
software
functions
as
intended,
meets
user
expectations,
and
operates
reliably
in
various
scenarios.
Testing
provides
several
benefits
to
software
development:
●
(i)
Error
Detection
:
Testing
helps
identify
bugs,
errors,
and
unexpected
behavior

### **Splitting of document**

[Recursively split by character](https://python.langchain.com/docs/how_to/recursive_text_splitter/)

[Split by character](https://python.langchain.com/docs/how_to/character_text_splitter/)

Step 2: Split the loaded documents into chunks (1 point)
Experiment with different chunking method and parameters and check the final
response

In [13]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [14]:
# Split
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50
)

In [15]:
splits = text_splitter.split_documents(docs)

print(len(splits))
print(len(splits[0].page_content) )
splits[0].page_content

38
493


'Module\n3:\nAST-2\nTITLE:\nTesting\nthe\nModules\n&\nPackaging \nLEARNING\nOBJECTIVES:\nAt\nthe\nend\nof\nthe\nexperiment,\nyou\nwill\nbe\nable\nto\ninclude\ntesting\naspects\nin\nthe\nproject\nand\nwrite\ntest\ncases\ncontinuing\nfrom\nAST1.\nFinally ,\nyou\nwill\nbe\nable\nto\ncreate\na\npython\npackage\nof\nthe\nmodel\nwhich\ncan\nbe\neasily\nconsumed\nby\nany\nAPI.\nYou\nwill\nbe\nable\nto\nunderstand\nand\nimplement\nthe\nfollowing\naspects:\n1.\nTesting\nconcept\nand\nautomated\ntesting\nusing\npytest\n \n2.\nPackaging\nof\nmodel\nINTRODUCTION\nTesting:\nSoftware'

In [16]:
splits[0]

Document(metadata={'source': '/content/M3-W2-AST2.pdf', 'page': 0}, page_content='Module\n3:\nAST-2\nTITLE:\nTesting\nthe\nModules\n&\nPackaging \nLEARNING\nOBJECTIVES:\nAt\nthe\nend\nof\nthe\nexperiment,\nyou\nwill\nbe\nable\nto\ninclude\ntesting\naspects\nin\nthe\nproject\nand\nwrite\ntest\ncases\ncontinuing\nfrom\nAST1.\nFinally ,\nyou\nwill\nbe\nable\nto\ncreate\na\npython\npackage\nof\nthe\nmodel\nwhich\ncan\nbe\neasily\nconsumed\nby\nany\nAPI.\nYou\nwill\nbe\nable\nto\nunderstand\nand\nimplement\nthe\nfollowing\naspects:\n1.\nTesting\nconcept\nand\nautomated\ntesting\nusing\npytest\n \n2.\nPackaging\nof\nmodel\nINTRODUCTION\nTesting:\nSoftware')

### **Embeddings**

Let's take our splits and embed them.

Step 3: Create embeddings and store in Vector Database (3 points)
Your are encouraged to use open source embedding models from
mteb/leaderboard [Massive Text Embedding Benchmark (MTEB) Leaderboard]


In [17]:
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")

Device: cpu


In [18]:
# Embedding Model

from langchain_huggingface import HuggingFaceEmbeddings

modelPath ="mixedbread-ai/mxbai-embed-large-v1"                  # Model card: https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
                                                                 # Find other Emb. models at: https://huggingface.co/spaces/mteb/leaderboard

# Create a dictionary with model configuration options, specifying to use the CPU for computations
model_kwargs = {'device': device}      # cuda/cpu

# Create a dictionary with encoding options, specifically setting 'normalize_embeddings' to False
encode_kwargs = {'normalize_embeddings': False}

embedding =  HuggingFaceEmbeddings(
    model_name=modelPath,     # Provide the pre-trained model's path
    model_kwargs=model_kwargs, # Pass the model configuration options
    encode_kwargs=encode_kwargs # Pass the encoding options
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/229 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/171 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/114k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/677 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/670M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.24k [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/695 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/297 [00:00<?, ?B/s]

In [19]:
embedding

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
), model_name='mixedbread-ai/mxbai-embed-large-v1', cache_folder=None, model_kwargs={'device': 'cpu'}, encode_kwargs={'normalize_embeddings': False}, multi_process=False, show_progress=False)

### **Vectorstores**

In [20]:
from langchain_chroma import Chroma       # Light-weight and in memory

In [21]:
persist_directory = 'docs/chroma/'
!rm -rf ./docs/chroma  # remove old database files if any

In [22]:
vectordb = Chroma.from_documents(
    documents=splits,                    # splits we created earlier
    embedding=embedding,
    persist_directory=persist_directory, # save the directory
)

In [23]:
print(vectordb._collection.count()) # same as number of splits

38


Step 4: Perform the retrieval augmented generation by integrating with LLM (3 points)
Experiment with and without MMR and check the final response. It is encouraged
to use open-source LLMs.

In [24]:
# Without MMR
question = "What is software testing?"
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
docs = retriever.invoke(question)
docs

[Document(metadata={'page': 0, 'source': '/content/M3-W2-AST2.pdf'}, page_content='Packaging\nof\nmodel\nINTRODUCTION\nTesting:\nSoftware\ntesting\nis\na\ncrucial\npart\nof\nthe\nsoftware\ndevelopment\nprocess.\nIt\ninvolves\nexecuting\na\nprogram\nor\nsystem\nwith\nthe\nintention\nof\nfinding\nerrors\nor\nverifying\nits\ncompliance\nwith\nspecified\nrequirements.\nThe\ngoal\nof\ntesting\nis\nto\nidentify\ndefects\nand\nensure\nthat\nthe\nsoftware\nfunctions\nas\nintended,\nmeets\nuser\nexpectations,\nand\noperates\nreliably\nin\nvarious\nscenarios.\nTesting\nprovides\nseveral\nbenefits\nto\nsoftware\ndevelopment:\n●\n(i)\nError\nDetection\n:'),
 Document(metadata={'page': 0, 'source': '/content/M3-W2-AST2.pdf'}, page_content="to\nsoftware\ndevelopment:\n●\n(i)\nError\nDetection\n:\nTesting\nhelps\nidentify\nbugs,\nerrors,\nand\nunexpected\nbehavior\nin\nthe\nsoftware\n●\n(ii)\nVerification\nand\nValidation\n:\nTesting\nvalidates\nthat\nthe\nsoftware\nmeets\nthe\nspecified\nrequirement

In [25]:
# With MMR
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 2, "fetch_k":5})
docs = retriever.invoke(question)
docs

[Document(metadata={'page': 0, 'source': '/content/M3-W2-AST2.pdf'}, page_content='Packaging\nof\nmodel\nINTRODUCTION\nTesting:\nSoftware\ntesting\nis\na\ncrucial\npart\nof\nthe\nsoftware\ndevelopment\nprocess.\nIt\ninvolves\nexecuting\na\nprogram\nor\nsystem\nwith\nthe\nintention\nof\nfinding\nerrors\nor\nverifying\nits\ncompliance\nwith\nspecified\nrequirements.\nThe\ngoal\nof\ntesting\nis\nto\nidentify\ndefects\nand\nensure\nthat\nthe\nsoftware\nfunctions\nas\nintended,\nmeets\nuser\nexpectations,\nand\noperates\nreliably\nin\nvarious\nscenarios.\nTesting\nprovides\nseveral\nbenefits\nto\nsoftware\ndevelopment:\n●\n(i)\nError\nDetection\n:'),
 Document(metadata={'page': 1, 'source': '/content/M3-W2-AST2.pdf'}, page_content='Types\nof\nTests\n:\nThree\ntypes\nof\ntests\ncan\nbe\ndesigned: \nUnit\ntest,\nIntegration\ntest \nand \nsystem\ntest.\nA\nunit\ntest\n is\nthe\nsmallest\nand\nsimplest\nform\nof\nsoftware\ntesting. \nThese\ntests\nare\nemployed\nto\nassess\na\nseparable\nunit\n

## **Augmentation**

In [26]:
from langchain_core.prompts import PromptTemplate                                    # To format prompts
from langchain_core.output_parsers import StrOutputParser                            # to transform the output of an LLM into a more usable format
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough          # Required by LCEL (LangChain Expression Language)

In [27]:
# Build prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""

QA_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

## **Creating final RAG Chain**

> <img src='https://www.pinecone.io/_next/image/?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fvr8gru94%2Fproduction%2F63f8a8482c9ec06a8d7d1041514f87c06dd108a9-3442x942.png&w=3840&q=75' width=1200px>

[[Image source](https://www.pinecone.io/learn/series/langchain/langchain-expression-language/)]

In [28]:
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 7, "fetch_k":15})
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x79c6e4c54e50>, search_type='mmr', search_kwargs={'k': 7, 'fetch_k': 15})

In [29]:
retrieval = RunnableParallel(
    {
        "context": RunnablePassthrough(context= lambda x: x["question"] | retriever),
        "question": RunnablePassthrough()
        }
    )

In [30]:
# RAG Chain

rag_chain = (retrieval                     # Retrieval
             | QA_PROMPT                   # Augmentation
             | llm                         # Generation
             | StrOutputParser()
             )

Step 5: Frame at least five logical questions relevant to the knowledge base and
demonstrate relevant answers from the RAG system (1 point)

In [31]:
response = rag_chain.invoke({"question": "What is software testing ?"})

response

' Software testing is the process of executing a software application with the intent of finding errors, bugs, and other issues. The goal of software testing is to ensure that the software meets the requirements specified by the customer and functions as expected in real-world scenarios. Testing is an essential part of the software development lifecycle, as it helps to identify and resolve issues early on, which can save time and resources in the long run. Some common types of software testing include functional testing, performance testing, security testing, and regression testing. Thanks for asking!'

In [32]:
response = rag_chain.invoke({"question": "What are the benefits of testing to software development ?"})

response

" Testing is an essential part of software development as it helps to identify and fix issues before the software is released to the end-users. Here are some benefits of testing to software development:\n\n1. Improved Quality: Testing ensures that the software meets the required quality standards and is free from defects. This results in a better user experience and reduces the likelihood of customer complaints.\n\n2. Reduced Costs: Early detection of issues through testing can help to reduce the overall cost of software development. This is because fixing issues during the testing phase is less expensive than fixing them during the production phase.\n\n3. Faster Time-to-Market: By identifying and addressing issues during testing, software development teams can release the software to the market faster. This is because there are fewer delays due to bug fixes and retesting.\n\n4. Enhanced Customer Satisfaction: By delivering high-quality software that meets the customer's needs, softwar

In [33]:
response = rag_chain.invoke({"question": "What should be considerations for the production code?"})

print(response)

 {'answer': "1. Performance: The production code should be optimized for performance to ensure that the application runs smoothly and efficiently. This may involve using techniques such as caching, lazy loading, and minimizing database queries.
2. Scalability: The production code should be designed to handle a large number of users and requests. This may involve using load balancing, horizontal scaling, and distributed computing techniques.
3. Security: The production code should include measures to protect user data and prevent common security threats such as SQL injection, cross-site scripting (XSS), and cross-site request forgery (CSRF).
4. Maintainability: The production code should be easy to understand, modify, and maintain over time. This may involve using clear and concise variable names, commenting code where necessary, and following a consistent coding style.
5. Documentation: The production code should be well-documented to help other developers understand how it works and h

In [47]:
# For queries that is not in documents
response = rag_chain.invoke({"question": "Where is TituMonPumping ?"})

print(response)           # It should return "I don't know. Thanks for asking!". The open-source model used is not that great.

 TituMonPumping is currently located in the city of Miami, which is in the state of Florida in the United States. Thank you for asking!


Step 6: Create a Gradio App where user can write the query and get the response from
the RAG system (1 point)

In [39]:
!pip -q install gradio

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/18.1 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━[0m[90m╺[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.6/18.1 MB[0m [31m17.2 MB/s[0m eta [36m0:00:02[0m[2K   [91m━━━━━━━━[0m[90m╺[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.8/18.1 MB[0m [31m54.1 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━[0m [32m9.8/18.1 MB[0m [31m93.6 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━[0m [32m16.6/18.1 MB[0m [31m183.2 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m18.1/18.1 MB[0m [31m180.9 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m18.1/18.1 MB[0m [31m180.9 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m18.1/18.1 MB[0m [31m

In [40]:
import gradio as gr

Step 7: Deploy the application on HF Spaces [Optional

### **Download the vector DB**

In [35]:
# Zip the entire folder
!zip -r /content/docs.zip /content/docs

  adding: content/docs/ (stored 0%)
  adding: content/docs/chroma/ (stored 0%)
  adding: content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/ (stored 0%)
  adding: content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/data_level0.bin (deflated 9%)
  adding: content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/link_lists.bin (stored 0%)
  adding: content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/header.bin (deflated 61%)
  adding: content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/length.bin (deflated 49%)
  adding: content/docs/chroma/chroma.sqlite3 (deflated 57%)


In [36]:
from google.colab import files
files.download("/content/docs.zip")

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

### **Upload the vector db from previous step and unzip**

In [37]:
!unzip /content/docs.zip  -d /

Archive:  /content/docs.zip
replace /content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/data_level0.bin? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: /content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/data_level0.bin  
replace /content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/link_lists.bin? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
 extracting: /content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/link_lists.bin  
replace /content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/header.bin? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: /content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/header.bin  
replace /content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/length.bin? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: /content/docs/chroma/7b760f42-286b-4009-96e5-79ccc8a8963a/length.bin  
  inflating: /content/docs/chroma/chroma.sqlite3  


In [38]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

embedding =  HuggingFaceEmbeddings(
    model_name="mixedbread-ai/mxbai-embed-large-v1",                             # Provide the pre-trained model's path
    model_kwargs={'device': "cuda" if torch.cuda.is_available() else "cpu"},     # Pass the model configuration options
    encode_kwargs={'normalize_embeddings': False}                                # Pass the encoding options
)

vectordb = Chroma(persist_directory = 'docs/chroma/',
                  embedding_function = embedding
                  )

In [43]:
def generate_query_response(prompt):
    embedding =  HuggingFaceEmbeddings(
    model_name="mixedbread-ai/mxbai-embed-large-v1",                             # Provide the pre-trained model's path
    model_kwargs={'device': "cuda" if torch.cuda.is_available() else "cpu"},     # Pass the model configuration options
    encode_kwargs={'normalize_embeddings': False}                                # Pass the encoding options
    )

    vectordb = Chroma(persist_directory = 'docs/chroma/',
                  embedding_function = embedding
                  )

    template = """Use the following pieces of context to answer the question at the end.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    Always say "thanks for asking!" at the end of the answer.
    {context}
    Question: {question}
    Helpful Answer:"""

    QA_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

    retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 3, "fetch_k":6})

    retrieval = RunnableParallel(
    {
        "context": RunnablePassthrough(context= lambda x: x["question"] | retriever),
        "question": RunnablePassthrough()
        }
    )

    # RAG Chain

    rag_chain = (retrieval                     # Retrieval
             | QA_PROMPT                   # Augmentation
             | llm                         # Generation
             | StrOutputParser()
             )

    response = rag_chain.invoke({"question": prompt})

    return response




In [44]:
# Output response
prompt = "What are the benefits of testing to software development?"
out_response = generate_query_response(prompt)
print(out_response)

 Testing is an essential part of software development as it helps to identify and fix issues before the software is released to the market. Here are some benefits of testing to software development:

1. Improved Quality: Testing ensures that the software meets the required quality standards and is free from defects. This results in a better user experience and reduces the likelihood of customer complaints.

2. Reduced Costs: Early detection of issues through testing can help to reduce the overall cost of software development. This is because fixing issues during the testing phase is less expensive than fixing them during the production phase.

3. Faster Time-to-Market: By identifying and fixing issues early in the development cycle, testing can help to accelerate the software release process. This results in a faster time-to-market and a competitive advantage.

4. Enhanced Customer Satisfaction: By delivering high-quality software that meets the customer's needs, testing can help to en

In [46]:
# Gradio interface to generate UI link
iface = gr.Interface(fn=generate_query_response,
                    inputs = [gr.Textbox(label="Enter your your prompt")],
                    outputs="textbox",
                    title = "RAG using OpenSource LLMs",
                    description = "RAG using OpenSource LLMs",
                    allow_flagging = 'never')

iface.launch(share = True,debug=True)
# YOUR CODE HERE to launch the interface

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://0e840c37b01b1e0439.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://0e840c37b01b1e0439.gradio.live


