<a href="https://colab.research.google.com/github/muthuraman2002/RAG-system-in-colab/blob/main/Chatbot_using_RAG_and_LangChain_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Implement a RAG system using the Ollama GPT-2 model.

## Install necessary libraries

### Subtask:
Install the libraries required for the RAG system: `transformers`, `sentence-transformers`, `langchain`, and `faiss-cpu`.


**Reasoning**:
Install the required libraries using pip.



In [11]:
%pip install transformers sentence-transformers langchain faiss-cpu



## Load and process data

### Subtask:
Load the documents for the RAG system and process them into a suitable format for embedding.


**Reasoning**:
Load the CSV rows as LangChain `Document` objects, then split them into chunks suitable for embedding with `RecursiveCharacterTextSplitter`.



In [33]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
import pandas as pd


# 1. Load the CSV file
df = pd.read_csv("../content/healthcare_dataset.csv")

# 2. Convert each row into one LangChain Document of "column: value" pairs
documents = []
for _, row in df.iterrows():
    content = " ".join(f"{col}: {row[col]}" for col in df.columns)
    documents.append(Document(page_content=content))

# Alternative toy corpus, left commented out. These are plain strings, so if
# you use them, pass the list straight to create_documents below.
# documents = [
#     "This is the first document. It talks about the basics of Large Language Models and their applications.",
#     "The second document discusses the architecture of the GPT series of models, including GPT-2.",
#     "Document three focuses on Retrieval Augmented Generation (RAG) systems and how they combine retrieval and generation techniques.",
#     "The fourth document explores the use of vector databases like FAISS for efficient similarity search in RAG systems.",
#     "This is the fifth and final document. It provides an overview of the Langchain framework and its components for building LLM applications."
# ]

# 3. Instantiate a RecursiveCharacterTextSplitter.
#    Sizes are measured in characters, so a 100-character chunk holds only part of a row.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20
)

# 4. Extract the page_content string from each Document and split it into chunks
text_content = [doc.page_content for doc in documents]
docs = text_splitter.create_documents(text_content)

# Display the first two chunks
print(docs[:2])

[Document(metadata={}, page_content='Name: Bobby JacksOn Age: 30 Gender: Male Blood Type: B- Medical Condition: Cancer Date of Admission:'), Document(metadata={}, page_content='Date of Admission: 2024-01-31 Doctor: Matthew Smith Hospital: Sons and Miller Insurance Provider:')]
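
The printed chunks show the effect of `chunk_overlap`: the first chunk ends with `Date of Admission:` and the second begins with it. The sketch below is a simplified, hypothetical sliding-window chunker (`chunk_text` is not part of LangChain) that illustrates only the size and overlap arithmetic. `RecursiveCharacterTextSplitter` additionally prefers to break on separators such as newlines and spaces, so its chunk boundaries will differ.

```python
def chunk_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; not the LangChain algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij" * 25  # 250 characters
    for piece in chunk_text(sample):
        print(len(piece), piece[:20])
```

With the notebook's settings, each window advances by 80 characters, so consecutive chunks share 20 characters. That shared text gives each chunk some of its neighbor's context.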


## Create embeddings

### Subtask:
Generate embeddings for the processed documents using a sentence transformer model.


**Reasoning**:
Generate embeddings for the processed documents using a sentence transformer model.



**Reasoning**:
The previous command failed because the `langchain_community` module was not found. It ships in a separate package, `langchain-community`, which the earlier install step did not include, so install it before retrying.



In [15]:
%pip install langchain-community

Collecting langchain-community
  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.10.1-py3-none-any.whl.metadata (3.4 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.1-py3-none-any.whl.metadata (9.4 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain-community)
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 k

**Reasoning**:
Now that `langchain-community` is installed, retry generating embeddings for the processed documents using a sentence transformer model.



In [16]:
from langchain_community.embeddings import SentenceTransformerEmbeddings

# Instantiate SentenceTransformerEmbeddings
# Ensure you have internet access to download the model if not already cached
try:
    embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
    print("SentenceTransformerEmbeddings model loaded successfully.")
except Exception as e:
    print(f"Error loading SentenceTransformerEmbeddings model: {e}")
    embeddings = None


# Generate embeddings for the processed documents
if embeddings is not None and docs: # Use 'docs' which contains the split documents
    print(f"Generating embeddings for {len(docs)} documents...")
    try:
        doc_embeddings = embeddings.embed_documents([doc.page_content for doc in docs])
        print(f"Generated {len(doc_embeddings)} embeddings.")
        if doc_embeddings:
            print(f"Example embedding (first 10 dimensions): {doc_embeddings[0][:10]}")
    except Exception as e:
        print(f"Error generating embeddings: {e}")
        doc_embeddings = None
else:
    print("Skipping embedding generation due to issues with embeddings model or processed documents.")
    doc_embeddings = None

Generated 9 embeddings.
Example embedding (first 10 dimensions): [-0.04937281832098961, -0.04599956423044205, 0.02887227199971676, 0.01174787525087595, -0.02238904871046543, 0.016524719074368477, -0.08935360610485077, 0.06632869690656662, -0.00850805826485157, 0.02113102748990059]
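
Each embedding is a list of floats (384 of them for `all-MiniLM-L6-v2`), and texts with similar meaning should map to vectors that point in similar directions. As a rough illustration, the hypothetical helper below computes cosine similarity between two such vectors in plain Python. It is a sketch for building intuition, not something the notebook itself calls.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real embeddings
    print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))
    print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

In the notebook, the same function could be applied to any two entries of `doc_embeddings` to compare two chunks.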


## Build a vector store

### Subtask:
Build a vector store (e.g., using FAISS) from the document embeddings.


**Reasoning**:
Import the necessary FAISS class and create a FAISS index from the documents and embeddings.



In [17]:
from langchain_community.vectorstores import FAISS

# Create a FAISS index from the documents and embeddings
vectorstore = None
if docs and embeddings: # Use 'docs' which contains the split documents
    try:
        vectorstore = FAISS.from_documents(docs, embeddings)
        print("FAISS vectorstore created successfully.")
    except Exception as e:
        print(f"Error creating FAISS vectorstore: {e}")
else:
    print("Skipping FAISS vectorstore creation due to missing processed documents or embeddings.")

FAISS vectorstore created successfully.
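
Conceptually, the vector store answers one question: which stored vectors are closest to the query vector? The sketch below is a brute-force version of that search using squared Euclidean (L2) distance. It assumes the store uses an exact L2 index, which is my understanding of the LangChain FAISS wrapper's default. The function name `top_k_l2` and the toy vectors are hypothetical.

```python
def top_k_l2(query: list[float], vectors: list[list[float]], k: int = 4) -> list[tuple[int, float]]:
    """Return (index, squared L2 distance) for the k vectors closest to query."""
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first
    return [(idx, dist) for dist, idx in scored[:k]]


if __name__ == "__main__":
    stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # toy "document" vectors
    for idx, dist in top_k_l2([0.9, 0.9], stored, k=2):
        print(idx, round(dist, 2))
```

On the real store, the corresponding call is `vectorstore.similarity_search(query, k=4)`, which embeds the query with the same embedding model before searching.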


## Set up the RAG chain

### Subtask:
Configure the RAG chain using LangChain, combining the vector store and the GPT-2 model for generation.


**Reasoning**:
Configure the RAG chain in three steps: wrap the loaded GPT-2 model and tokenizer in a LangChain LLM instance, create a retriever from the FAISS vector store, and instantiate `RetrievalQA` with the retriever and LLM.
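
No cell for this step appears in this export, yet later cells rely on a `qa_chain` object. The following is a minimal sketch of what such a cell might look like, not the author's original code. It assumes GPT-2 is loaded through Hugging Face `transformers` under the model id `gpt2`, and it uses the import paths of the LangChain 0.3 line. The helper name `build_qa_chain` and its defaults (`k`, `max_new_tokens`) are assumptions made for illustration.

```python
def build_qa_chain(vectorstore, model_name: str = "gpt2", k: int = 4, max_new_tokens: int = 100):
    """Assemble a RetrievalQA chain from a vector store and a local causal LM.

    Hypothetical helper: the names and defaults here are assumptions.
    """
    # Imported inside the function so that defining the helper does not
    # trigger the heavy library imports or any model download.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
    from langchain_community.llms import HuggingFacePipeline
    from langchain.chains import RetrievalQA

    # 1. Load GPT-2 and wrap it in a text-generation pipeline
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    generator = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=max_new_tokens,
    )
    llm = HuggingFacePipeline(pipeline=generator)

    # 2. Expose the FAISS store as a retriever returning the k closest chunks
    retriever = vectorstore.as_retriever(search_kwargs={"k": k})

    # 3. "stuff" places all retrieved chunks into a single prompt
    return RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)


# In the notebook, after the FAISS cell has run:
# qa_chain = build_qa_chain(vectorstore)
```

GPT-2 has a 1024-token context window, so `k` and the chunk size together need to leave room for the question and the generated answer.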



## Implement the RAG query function

### Subtask:
Create a function that takes a user query, retrieves relevant documents from the vector store, and uses the RAG chain to generate a response.


**Reasoning**:
Define a function `answer_query` that takes a query string, calls the `qa_chain` with the query, and returns the result.



In [19]:
def answer_query(query: str) -> dict:
    """
    Answers a user query using the configured RAG chain.

    Args:
        query: The user's question.

    Returns:
        The dict returned by the RAG chain; the generated text is stored
        under the "result" key.
    """
    # qa_chain is the RetrievalQA chain configured in the previous step
    return qa_chain.invoke({"query": query})

# Example usage (optional, for testing)
# query = "What is RAG?"
# response = answer_query(query)
# print(response)

## Test the RAG system

### Subtask:
Test the implemented RAG system with sample queries to ensure it is working correctly.


**Reasoning**:
Define sample queries and call the answer_query function for each, then print the results.



In [34]:
# Define sample queries
sample_queries = [
    "What is a cancer percent?",
    "Tell me about the architecture of GPT models.",
    "How do RAG systems work?",
    "What is the role of vector databases like FAISS in RAG?",
    "What is Langchain used for?"
]

# Test the RAG system with sample queries
for query in sample_queries:
    print(f"Query: {query}")
    response = answer_query(query)
    print(f"Response: {response['result']}\n")

Query: What is a cancer percent?
Response: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

similarity search in RAG systems.

how they combine retrieval and generation techniques.

framework and its components for building LLM applications.

Models and their applications.

Question: What is a cancer percent?
Helpful Answer: Cancer percent is the percentage of the overall population that will die each year. Cancer percent is a measure of all the cancers that are on the Earth, and it is a measure of human health. Cancer percent is also used by the public to calculate the health of the population, and the population is considered the healthiest part of the population. Cancer percent is also used to calculate the number of people who are physically or mentally physically unable to live their lives.

What is a cancer percent? Cancer

Query: Tell me about the architecture of GP
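
The response above begins with the instruction text and the retrieved context, followed by `Question:` and `Helpful Answer:`. In other words, the printed result contains the full prompt as well as the model's continuation. The sketch below reconstructs that prompt layout from the printed output and shows one way to keep only the text after the answer marker. `build_stuff_prompt` and `extract_answer` are hypothetical helpers, not LangChain functions.

```python
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and place them in the prompt template."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def extract_answer(generated: str, marker: str = "Helpful Answer:") -> str:
    """Return the text after the first marker, or the whole text if absent."""
    _, found, tail = generated.partition(marker)
    return tail.strip() if found else generated.strip()


if __name__ == "__main__":
    prompt = build_stuff_prompt(
        ["similarity search in RAG systems.", "Models and their applications."],
        "What is a cancer percent?",
    )
    print(prompt)
    print(extract_answer(prompt + " I don't know."))
```

In the test loop, `extract_answer(response['result'])` would print only the generated answer. Note too that the context retrieved for the first query consists of fragments of the commented-out sample sentences rather than rows from the healthcare CSV.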
