<a href="https://colab.research.google.com/github/mun1shk/genAI/blob/main/RAG_using_Gemma%2C_Langchain_and_ChromaDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:


# install required libraries
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes
!pip install langchain
!pip install sentence-transformers
!pip install chromadb



  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
  Building wheel for transformers (pyproject.toml) ... [?25l[?25hdone
Collecting accelerate
  Downloading accelerate-0.28.0-py3-none-any.whl (290 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m290.1/290.1 kB[0m [31m7.3 MB/s[0m eta [36m0:00:00[0m
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m23.7/23.7 MB[0m [31m54.8 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m823.6/823.6 kB[0m [31m70.8 MB/s

Introduction

This notebook demonstrates how to build a retrieval augmented generation (RAG) system using Gemma as a large language model (LLM), Langchain for tools to process input files, and ChromaDB as vector database.
What is RAG?

Retriever augmented generation (RAG) is a system that improves the response generated by a LLM in two ways:

    First, the information is retrieved from a dataset that is stored in vector database; the query is used to perform similarity search in the documents stored in the vector database.
    Second, by restraining the context provided to the LLM to content that is similar with the initial query, stored in the vector database, we can reduce significantly (or even eliminate) LLM's halucinations, since the answer is provided from the context of the stored documents.

An important advantage of this approach is that we do not need to fine-tune the LLM with our custom data; instead, the data is ingested (cleaned, transformed, chunked, and indexed in the vector database).
Procedure

We create two classes:

    AIAgent - An AI Agent that query Gemma LLM using a custom prompt that instruct Gemma to generate and answer (from the query) by refering to the context (as well provided); the answer to the AI Agent query function is then returned.
    RAGSystem - initialized with the dataset with Data Science information, with an AIAgent object. In the init function of this class, we ingest the data from the dataset in the vector database. This class have as well a query member function. In this function we first perform similarity search with the query to the vector database. Then, we call the generate function of the ai agent object. Before returning the answer, we use a predefined template to compose the overal response from the question, answer and the context retrieved.


In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM

from langchain.document_loaders import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

from IPython.display import display, Markdown

In [2]:
class AIAgent:
    """Gemma 2b-it based assistant that replies given the retrieved documents"""
    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
        self.gemma_lm = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="cuda")

    def create_prompt(self, query, context):
        # prompt template
        prompt = f"""
        You are an AI Agent specialized to answer to questions about Data Science.
        Explain the concept or answer the question about Data Science.
        In order to create the answer, please only use the information from the
        context provided (Context). Do not include other information.
        Answer with simple words.
        If needed, include also explanations.
        Question: {query}
        Context: {context}
        Answer:
        """
        return prompt

    def generate(self, query, retrieved_info):
        prompt = self.create_prompt(query, retrieved_info)
        input_ids = self.tokenizer(query, return_tensors="pt").input_ids.to('cuda')
        # Answer generation
        answer = self.gemma_lm.generate(
            input_ids,
            max_length=512, # limit the answer to 512
        )
        # Decode and return the answer
        answer = self.tokenizer.decode(answer[0], skip_special_tokens=True)
        return answer

In [3]:
class RAGSystem:
    """Sentence embedding based Retrieval Based Augmented generation.
        Given database of pdf files, retriever finds num_retrieved_docs relevant documents"""
    def __init__(self, ai_agent, num_retrieved_docs=3):
        # load the data
        self.num_docs = 3
        self.ai_agent = ai_agent
        loader = CSVLoader("dataset.csv")
        documents = loader.load()
        self.template = "\n\nQuestion:\n{question}\n\nAnswer:\n{answer}\n\nContext:\n{context}"

        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=800,
            chunk_overlap=100)
        all_splits = text_splitter.split_documents(documents)
        # create a vectorstore database
        embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
        self.vector_db = Chroma.from_documents(documents=all_splits,
                                               embedding=embeddings,
                                               persist_directory="chroma_db")
        self.retriever = self.vector_db.as_retriever()

    def retrieve(self, query):
        # retrieve top k similar documents to query
        docs = self.retriever.get_relevant_documents(query)
        return docs

    def query(self, query):
        # generate the answer
        context = self.retrieve(query)
        answer = self.ai_agent.generate(query, context)

        return self.template.format(question=query,
                                   answer=answer,
                                   context=context)

In [4]:
def colorize_text(text):
    for word, color in zip(["Question", "Answer", "Context"], ["blue", "red", "green"]):
        text = text.replace(f"\n\n{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text

In [12]:
!pip install -U transformers==4.38.2

Collecting transformers==4.38.2
  Downloading transformers-4.38.2-py3-none-any.whl (8.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m8.5/8.5 MB[0m [31m35.7 MB/s[0m eta [36m0:00:00[0m
Collecting huggingface-hub<1.0,>=0.19.3 (from transformers==4.38.2)
  Downloading huggingface_hub-0.22.2-py3-none-any.whl (388 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m388.9/388.9 kB[0m [31m34.7 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: huggingface-hub, transformers
  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface-hub 0.17.3
    Uninstalling huggingface-hub-0.17.3:
      Successfully uninstalled huggingface-hub-0.17.3
  Attempting uninstall: transformers
    Found existing installation: transformers 4.35.0
    Uninstalling transformers-4.35.0:
      Successfully uninstalled transformers-4.35.0
Successfully installed huggingface-hub-0.22.2 transformers-4.38.2


In [5]:
ai_agent = AIAgent()
rag_system = RAGSystem(ai_agent)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [6]:
answer = rag_system.query("What is SVM?")
display(Markdown(colorize_text(answer)))



**<font color='blue'>Question:</font>**
What is SVM?

**<font color='red'>Answer:</font>**
What is SVM?

SVM stands for Support Vector Machine. It is a supervised machine learning algorithm used for both classification and regression tasks.

**Key Concepts of SVM:**

* **Support Vector Machines (SVMs):** These are hyperplanes in a high-dimensional space that best separate the different classes of data.
* **Kernels:** These are functions that determine the distance between data points in the feature space. Different kernels can be used to define the distance metric.
* **Cost Function:** This is a penalty term that is added to the loss function to encourage the SVM to find a hyperplane that is as far from the data points as possible.
* **Optimization Algorithm:** This algorithm is used to find the optimal hyperplane that minimizes the cost function.

**How SVMs Work:**

1. **Data Preparation:** The data is first transformed into a high-dimensional feature space using a kernel function.
2. **Finding the Hyperplane:** An SVM is found by solving a linear optimization problem. The goal is to find the hyperplane that best separates the data points into different classes.
3. **Classification:** New data points are then classified by checking which side of the hyperplane they fall on.

**Advantages of SVMs:**

* High accuracy
* Robust to noise and outliers
* Can be used for both classification and regression

**Disadvantages of SVMs:**

* Can be sensitive to the choice of kernel function
* Not suitable for high-dimensional data
* Can be computationally expensive to train

**Applications of SVMs:**

* Credit card fraud detection
* Medical diagnosis
* Image recognition
* Natural language processing

**<font color='green'>Context:</font>**
[Document(page_content='question: What’s singular value decomposition? How is it typically used for machine learning? \u200d⭐️\nanswer: * Singular Value Decomposition (SVD) is a general matrix decomposition method that factors a matrix X into three matrices L (left singular values), Σ (diagonal matrix) and R^T (right singular values).\n* For machine learning, Principal Component Analysis (PCA) is typically used. It is a special type of SVD where the singular values correspond to the eigenvectors and the values of the diagonal matrix are the squares of the eigenvalues. We use these features as they are statistically descriptive.', metadata={'row': 141, 'source': 'dataset.csv'}), Document(page_content='question: What’s singular value decomposition? How is it typically used for machine learning? \u200d⭐️\nanswer: * Singular Value Decomposition (SVD) is a general matrix decomposition method that factors a matrix X into three matrices L (left singular values), Σ (diagonal matrix) and R^T (right singular values).\n* For machine learning, Principal Component Analysis (PCA) is typically used. It is a special type of SVD where the singular values correspond to the eigenvectors and the values of the diagonal matrix are the squares of the eigenvalues. We use these features as they are statistically descriptive.', metadata={'row': 141, 'source': 'dataset.csv'}), Document(page_content='question: What’s singular value decomposition? How is it typically used for machine learning? \u200d⭐️\nanswer: * Singular Value Decomposition (SVD) is a general matrix decomposition method that factors a matrix X into three matrices L (left singular values), Σ (diagonal matrix) and R^T (right singular values).\n* For machine learning, Principal Component Analysis (PCA) is typically used. It is a special type of SVD where the singular values correspond to the eigenvectors and the values of the diagonal matrix are the squares of the eigenvalues. We use these features as they are statistically descriptive.', metadata={'row': 141, 'source': 'dataset.csv'}), Document(page_content='question: What is supervised machine learning? 👶\nanswer: Supervised learning is a type of machine learning in which our algorithms are trained using well-labeled training data, and machines predict the output based on that data. Labeled data indicates that the\xa0input data has already been tagged with the appropriate output. Basically, it is the task of learning a function that maps the input set and returns an output. Some of its examples are: Linear Regression, Logistic Regression, KNN, etc.', metadata={'row': 0, 'source': 'dataset.csv'})]