<a href="https://colab.research.google.com/github/shirazkk/Generative_Ai_Projects/blob/main/Project_2_LangChain_RAG_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **LangChain RAG with Google Gemini Flash and Pinecone**

## **Overview**
This project implements a **Retrieval-Augmented Generation (RAG)** system using **LangChain**, **Google Gemini Flash**, and **Pinecone**. The system retrieves relevant information from a document store and generates contextual responses using an AI model. The key components include:
1. Storing vectorized documents in **Pinecone**.
2. Using Google's `embedding-001` model to generate embeddings and **Google Gemini Flash** to generate responses.
3. Integrating **LangChain** to build the RAG workflow.

## **Prerequisites**
Before you begin, ensure you have the following:

1. **Python 3.8+** installed.
2. **API keys** for:
   - Google Gemini Flash (`gemini-1.5-flash`)
   - Pinecone

3. The following Python libraries, installed with `pip`:
  

---



In [27]:
!pip install langchain langchain-community langchain-google-genai pinecone-client google-generativeai python-dotenv tqdm



## **Step 1: Set Up Environment Variables**
To securely manage API keys, store them in a `.env` file in your project directory. The `.env` file should look like this:

```
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_pinecone_environment
GOOGLE_API_KEY=your_google_api_key
```

Next, use the following script to load these environment variables:




In [28]:
import os
from dotenv import load_dotenv

load_dotenv()
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_ENVIRONMENT = os.getenv("PINECONE_ENVIRONMENT")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
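
`os.getenv` returns `None` when a variable is missing, so a typo in the `.env` file would only surface later as an authentication error. A minimal sketch of a fail-fast check, assuming a hypothetical helper named `require_env` (it is not part of `python-dotenv`):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper for this walkthrough,
    not a python-dotenv function.
    """
    value = os.getenv(name)
    if not value:
        # Covers both a missing variable and an empty string
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

After `load_dotenv()`, a call such as `PINECONE_API_KEY = require_env("PINECONE_API_KEY")` could replace the plain `os.getenv` lookup.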


## **Step 2: Initialize Pinecone**
Pinecone is used to store and retrieve vectorized documents. The following code sets up Pinecone and creates an index for storing your documents:

In [29]:
import pinecone
import os

# Create a Pinecone client; the v3+ client only needs the API key
pinecone_client = pinecone.Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create the index if it does not exist yet
index_name = "new-rag-project"
if index_name not in pinecone_client.list_indexes().names():
    pinecone_client.create_index(
        name=index_name,
        dimension=768,  # Must match the embedding size
        # Assumption: a serverless index on AWS us-east-1; adjust cloud and region to your account.
        # For a pod-based index, use pinecone.PodSpec(environment=PINECONE_ENVIRONMENT) instead.
        spec=pinecone.ServerlessSpec(cloud="aws", region="us-east-1"),
    )

# Connect to the index
index = pinecone_client.Index(index_name)

## **Step 3: Use LangChain for RAG Workflow**

### **1. Set Up Embedding Model**
To convert your documents into vector embeddings, we use the Google Generative AI embedding model `embedding-001`. It maps each piece of text to a 768-dimensional numerical vector, matching the index dimension set in Step 2. These vectors are stored in Pinecone and compared against the query vector at retrieval time.


In [39]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    api_key=GOOGLE_API_KEY
)

### **2. Set Up Document Loader**
Load the source text file and split it into overlapping chunks of up to 500 characters, so that each chunk can be embedded and retrieved on its own.

In [43]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load documents
loader = TextLoader("/content/schedule of web development.txt")
documents = loader.load()

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
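
The effect of `chunk_size=500` and `chunk_overlap=50` can be illustrated with a simplified fixed-window splitter. This is a sketch under assumptions: `chunk_text` is a hypothetical helper, and it does not reproduce `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraph and line breaks before falling back to smaller units.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: a simplification of what a text splitter does.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # Each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The current window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))  # 1200 characters of dummy text
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [500, 500, 300]
print(chunks[0][-50:] == chunks[1][:50])  # True: neighbours share 50 characters
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one of the two neighbouring chunks, as long as it is shorter than the overlap.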

### **3. Embed and Store Documents in Pinecone**
Now, embed the document chunks and store them in Pinecone:





In [44]:
from tqdm import tqdm

# Embed each chunk and upload it to Pinecone
for i, doc in enumerate(tqdm(docs)):
    # Generate the vector (embedding) for this chunk
    vector = embeddings.embed_query(doc.page_content)

    # Store the chunk text and its source alongside the vector
    metadata = {
        "text": doc.page_content,  # Read back by the retriever via text_key="text"
        "source": doc.metadata["source"],  # Path of the file the chunk came from
    }

    # Give every chunk a unique ID; reusing the source path as the ID
    # would make each upsert overwrite the previous chunk
    chunk_id = f"{doc.metadata['source']}-{i}"
    index.upsert([(chunk_id, vector, metadata)])



100%|██████████| 10/10 [00:03<00:00,  3.20it/s]
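
An upsert replaces any existing record that has the same ID, so every chunk needs its own ID, and sending several records per call reduces the number of round trips. A minimal sketch of both ideas, assuming two hypothetical helpers (`make_chunk_id` and `batched`) and an illustrative batch size:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def make_chunk_id(source: str, position: int) -> str:
    """Build a per-chunk ID from the source path and the chunk's position."""
    return f"{source}#chunk-{position}"


def batched(items: Sequence[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items.

    The default of 100 is an assumption for illustration, not a Pinecone limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


ids = [make_chunk_id("notes.txt", i) for i in range(5)]
for batch in batched(ids, batch_size=2):
    print(batch)
```

Each batch of `(id, vector, metadata)` tuples could then be passed to a single `index.upsert(...)` call instead of one call per chunk.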


### **4. Set Up Retriever**
Now, set up the Pinecone retriever using LangChain:


In [45]:
from langchain.vectorstores import Pinecone

# Wrap the Pinecone index in a LangChain vector store;
# text_key names the metadata field that holds the chunk text
vectorstore = Pinecone(index, embedding=embeddings, text_key="text")

# Expose the vector store as a retriever with the default search settings
retriever = vectorstore.as_retriever()
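
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are most similar to it. The sketch below shows that idea with a brute-force scan over an in-memory list. It assumes cosine similarity as the metric, and `cosine_similarity`, `top_k`, and the toy vectors are all hypothetical; Pinecone itself uses an index structure rather than scanning every vector.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions
toy_store = [
    ("HTML basics", [1.0, 0.0, 0.0]),
    ("CSS layout", [0.0, 1.0, 0.0]),
    ("HTML forms", [0.9, 0.1, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], toy_store, k=2)
print([text for text, _ in results])  # ['HTML basics', 'HTML forms']
```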

## **Step 4: Set Up Google Gemini Flash Chat Model**

To generate responses, we use Google's `gemini-1.5-flash` chat model. The model receives the user's query together with the context retrieved from Pinecone and generates an answer grounded in that context.


In [46]:
# Set up the Google chat model
from langchain_google_genai.chat_models import ChatGoogleGenerativeAI

google_gemini = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",  # Gemini model name
    api_key=GOOGLE_API_KEY,  # API key loaded from .env in Step 1
    temperature=0.8,  # Higher values give more varied responses; lower values are more deterministic
)
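
The `temperature` setting is easier to reason about with the textbook formulation, in which the model's scores (logits) are divided by the temperature before being turned into probabilities. The sketch below is a generic illustration under that assumption, not a description of Gemini's internals; `softmax_with_temperature` and the example logits are hypothetical.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        # This sketch only handles positive temperatures
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # Subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # Made-up scores for three candidate tokens
for t in (0.2, 0.8, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures spread it more evenly across candidates.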



## **Step 5: Combine Retriever and LLM**

Integrate the **retriever** and **Google Gemini Flash** using LangChain’s **RetrievalQA** chain:

* **Retriever:** Finds the most relevant document chunks for the query.
* **LLM (`gemini-1.5-flash`):** Uses the query and the retrieved chunks to generate a context-aware response.




In [47]:
from langchain.chains import RetrievalQA

# Set up the RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=google_gemini,
    chain_type="stuff",  # Put all retrieved chunks into a single prompt
    retriever=retriever,  # The retriever created with as_retriever() in Step 3
)
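
With `chain_type="stuff"`, all retrieved chunks are placed into one prompt together with the question. A rough sketch of that assembly step follows; `build_stuff_prompt` is a hypothetical helper and the template wording is illustrative, not the prompt LangChain actually sends.

```python
from typing import List


def build_stuff_prompt(question: str, chunks: List[str]) -> str:
    """Concatenate all retrieved chunks into a single prompt.

    Hypothetical helper; the template text is an assumption for illustration.
    """
    context = "\n\n".join(chunks)  # Separate chunks with a blank line
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "give me intro of html",
    ["HTML stands for HyperText Markup Language.", "Tags structure content."],
)
print(prompt)
```

Because every chunk goes into the same prompt, the prompt grows with the number and size of retrieved chunks, and the total has to fit within the model's context window.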


## **Step 6: Query the RAG System**

Test the system by asking a question. The RAG pipeline retrieves the relevant chunks and generates a response:


In [48]:
query = "give me intro of html"
# invoke() returns a dict; the generated answer is under the "result" key
response = qa_chain.invoke({"query": query})["result"]

print("Question:", query)
print("Response:", response)

Question: give me intro of html
Response: HTML stands for HyperText Markup Language.  It's the foundation of all web pages.  You use HTML tags to structure content like headings, paragraphs, images, and links.  Browsers interpret these tags to display the content correctly.


## **Conclusion**
This project demonstrates how to create a **Retrieval-Augmented Generation (RAG)** system using **LangChain**, **Google Gemini Flash**, and **Pinecone**. The system retrieves the document chunks most relevant to a user's query from Pinecone and uses the `gemini-1.5-flash` model to generate a response grounded in them.

By following the steps in this notebook, you can set up a similar RAG system for your own use cases and adapt it to other models and data sources.