<a href="https://colab.research.google.com/github/lakshanravi/LangChain_RAG-Application/blob/main/02_simple_rag_langchain_openai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Simple RAG Application** using LangChain and Gemini

In [1]:
# Install the necessary packages
!pip install langchain langchain-chroma -qU
!pip install langchain-google-genai -qU

In [2]:
# Import necessary libraries
import os
from google.colab import userdata

### Initialize Google LLM

In [4]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Set the Google API key (read from Colab's secret storage)
os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')

# Initialize the Gemini chat model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash-lite",
    temperature=0
)
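
The cell above reads the key through `google.colab.userdata`, which only exists inside Colab. As a minimal sketch for running the notebook elsewhere, the key could be taken from the environment with an interactive prompt as a fallback. The helper name `load_api_key` is hypothetical and not part of the original notebook.

```python
import os
from getpass import getpass


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(name)
    if not key:
        # Hypothetical fallback for local runs; in Colab, userdata.get() can stay as is.
        key = getpass(f"Enter {name}: ")
        os.environ[name] = key
    return key
```

Outside Colab, `os.environ['GOOGLE_API_KEY'] = load_api_key()` would then take the place of the `userdata.get(...)` line.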

### Initialize Embedding Model

The embedding model is the `GoogleGenerativeAIEmbeddings` class from the `langchain-google-genai` package.

In [9]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embedding_model = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")

### Create and Embed Documents

In [10]:
# Wrap the raw text in LangChain Document objects. Nothing is loaded from external files (documents, Excel sheets) here; the content is defined inline.
from langchain_core.documents import Document

# Define a list of documents with content and metadata
documents = [
    Document(
        page_content="The T20 World Cup 2024 is in full swing, bringing excitement and drama to cricket fans worldwide.India's team, captained by Rohit Sharma, is preparing for a crucial match against Ireland, with standout player Jasprit Bumrah expected to play a pivotal role in their campaign.The tournament has already seen controversy, particularly concerning the pitch conditions at Nassau County International Cricket Stadium in New York, which came under fire after a low-scoring game between Sri Lanka and South Africa.",
        metadata={"source": "cricket news"},
    ),
    Document(
        page_content="The world of football is buzzing with excitement as major tournaments and league matches continue to captivate fans globally.In the UEFA Champions League, the semi-final matchups have been set, with defending champions Real Madrid set to face Manchester City, while Bayern Munich will take on Paris Saint-Germain.Both ties promise thrilling encounters, featuring some of the best talents in world football.",
        metadata={"source": "football news"},
    ),
    Document(
        page_content="As election season heats up, the latest developments reveal a highly competitive atmosphere across several key races.The presidential election has seen intense campaigning from all major candidates, with recent polls indicating a tight race.Incumbent President Jane Doe is seeking re-election on a platform of economic stability and healthcare reform, while her main rival, Senator John Smith, focuses on education and climate change initiatives.",
        metadata={"source": "election news"},
    ),
    Document(
        page_content="The AI revolution continues to transform industries and reshape the global economy.Significant advancements in artificial intelligence have led to breakthroughs in healthcare, with AI-driven diagnostics improving patient outcomes and reducing costs.Autonomous systems are becoming increasingly prevalent in logistics and transportation, enhancing efficiency and safety.",
        metadata={"source": "ai revolution news"},
    ),
]

In [11]:
# Create a vector store from the documents and the embedding model. Chroma embeds each document and stores the resulting vectors.
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents,
    embedding=embedding_model,
)

### Perform Similarity Search

Use the vector store to retrieve the documents most similar to a search query.
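
To make "most similar" concrete, here is a minimal sketch of ranking by cosine similarity. The three-dimensional vectors are hand-written stand-ins for real embeddings, and cosine similarity is only one common choice; the metric a vector store actually uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Illustrative values only: real embeddings have thousands of dimensions.
toy_index = {
    "cricket news": [0.9, 0.1, 0.0],
    "football news": [0.7, 0.3, 0.1],
    "ai revolution news": [0.0, 0.2, 0.9],
}


def rank_sources(query_vector, index):
    """Return source names ordered from most to least similar to the query."""
    return sorted(
        index,
        key=lambda name: cosine_similarity(query_vector, index[name]),
        reverse=True,
    )


# A query vector that points mostly along the third axis ranks the AI entry first.
print(rank_sources([0.1, 0.1, 0.95], toy_index))
```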

In [12]:
results = vectorstore.similarity_search("test match")

for result in results:
  print("------------------------")
  print(result.page_content)
  print(result.metadata)

------------------------
The T20 World Cup 2024 is in full swing, bringing excitement and drama to cricket fans worldwide.India's team, captained by Rohit Sharma, is preparing for a crucial match against Ireland, with standout player Jasprit Bumrah expected to play a pivotal role in their campaign.The tournament has already seen controversy, particularly concerning the pitch conditions at Nassau County International Cricket Stadium in New York, which came under fire after a low-scoring game between Sri Lanka and South Africa.
{'source': 'cricket news'}
------------------------
The world of football is buzzing with excitement as major tournaments and league matches continue to captivate fans globally.In the UEFA Champions League, the semi-final matchups have been set, with defending champions Real Madrid set to face Manchester City, while Bayern Munich will take on Paris Saint-Germain.Both ties promise thrilling encounters, featuring some of the best talents in world football.
{'source'

In [15]:
results = vectorstore.similarity_search("machine learning")

for result in results:
  print("------------------------")
  print(result.page_content)
  print(result.metadata)

# The AI revolution article now ranks first.

------------------------
The AI revolution continues to transform industries and reshape the global economy.Significant advancements in artificial intelligence have led to breakthroughs in healthcare, with AI-driven diagnostics improving patient outcomes and reducing costs.Autonomous systems are becoming increasingly prevalent in logistics and transportation, enhancing efficiency and safety.
{'source': 'ai revolution news'}
------------------------
As election season heats up, the latest developments reveal a highly competitive atmosphere across several key races.The presidential election has seen intense campaigning from all major candidates, with recent polls indicating a tight race.Incumbent President Jane Doe is seeking re-election on a platform of economic stability and healthcare reform, while her main rival, Senator John Smith, focuses on education and climate change initiatives.
{'source': 'election news'}
------------------------
The T20 World Cup 2024 is in full swing, bringi

### Embed Query and Perform Similarity Search by Vector

We can also search with an embedding vector directly, using `vectorstore.similarity_search_by_vector`.
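
The relationship between the two search styles can be sketched with a toy model: a text search is the same as embedding the query first and then searching by vector. The names `toy_embed`, `search_by_vector`, and `search_by_text` and the four-word vocabulary are all assumptions made for illustration; they are not LangChain or Chroma APIs.

```python
from collections import Counter

VOCAB = ["football", "cricket", "election", "ai"]  # tiny made-up vocabulary


def toy_embed(text):
    """Count vocabulary words: a crude stand-in for a real embedding model."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


def search_by_vector(query_vector, docs, k=1):
    """Return the k docs whose toy embeddings have the largest dot product with the query."""
    def score(doc):
        return sum(q * d for q, d in zip(query_vector, toy_embed(doc)))
    return sorted(docs, key=score, reverse=True)[:k]


def search_by_text(query, docs, k=1):
    """Text search: embed the query, then delegate to the vector search."""
    return search_by_vector(toy_embed(query), docs, k)


docs = ["football semi-final news", "cricket world cup news", "ai and election news"]
query_vector = toy_embed("football ")  # computed once, reusable across searches
print(search_by_vector(query_vector, docs))
```

Searching by vector is mainly useful when the query embedding is already available, for example when it was computed earlier or comes from another system.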

In [22]:
# Embed a query using the embedding model
query_embedding = embedding_model.embed_query("football ")

query_embedding[:10]

[-0.029430116,
 0.0044759414,
 0.016414464,
 -0.027015027,
 -0.00094792974,
 -0.0005082724,
 0.015635641,
 -0.020168222,
 0.010700943,
 0.0015359898]

In [23]:
# Length of the query embedding. Every query text is represented as a vector of numbers with this many dimensions.
len(query_embedding)

3072

In [21]:
results = vectorstore.similarity_search_by_vector(query_embedding)

# This time we search with the precomputed query vector instead of raw text.

for result in results:
  print("------------------------")
  print(result.page_content)
  print(result.metadata)

------------------------
The world of football is buzzing with excitement as major tournaments and league matches continue to captivate fans globally.In the UEFA Champions League, the semi-final matchups have been set, with defending champions Real Madrid set to face Manchester City, while Bayern Munich will take on Paris Saint-Germain.Both ties promise thrilling encounters, featuring some of the best talents in world football.
{'source': 'football news'}
------------------------
The AI revolution continues to transform industries and reshape the global economy.Significant advancements in artificial intelligence have led to breakthroughs in healthcare, with AI-driven diagnostics improving patient outcomes and reducing costs.Autonomous systems are becoming increasingly prevalent in logistics and transportation, enhancing efficiency and safety.
{'source': 'ai revolution news'}
------------------------
The T20 World Cup 2024 is in full swing, bringing excitement and drama to cricket fans 

### Create Retriever

In [29]:
# Create a retriever from the vector store
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2},
)
# k is the number of documents to return per query

# Run batch retrieval: one call handles a list of user queries
batch_results = retriever.batch(["machine learning", "test match"])

for result in batch_results:
  print("------------------------")
  for doc in result:
    print(doc.page_content)
    print(doc.metadata)

------------------------
The AI revolution continues to transform industries and reshape the global economy.Significant advancements in artificial intelligence have led to breakthroughs in healthcare, with AI-driven diagnostics improving patient outcomes and reducing costs.Autonomous systems are becoming increasingly prevalent in logistics and transportation, enhancing efficiency and safety.
{'source': 'ai revolution news'}
As election season heats up, the latest developments reveal a highly competitive atmosphere across several key races.The presidential election has seen intense campaigning from all major candidates, with recent polls indicating a tight race.Incumbent President Jane Doe is seeking re-election on a platform of economic stability and healthcare reform, while her main rival, Senator John Smith, focuses on education and climate change initiatives.
{'source': 'election news'}
------------------------
The T20 World Cup 2024 is in full swing, bringing excitement and drama t
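
The roles of `k` and `batch` can be sketched with a toy retriever. The `ToyRetriever` class below is hypothetical and scores by keyword overlap rather than embeddings; it only mirrors the shape of the result, which is one list of at most `k` documents per query, in query order.

```python
class ToyRetriever:
    """Keyword-overlap retriever; a stand-in for a vector-store retriever."""

    def __init__(self, docs, k=2):
        self.docs = docs
        self.k = k  # maximum number of documents returned per query

    def invoke(self, query):
        words = set(query.lower().split())
        # Score each document by how many query words it contains.
        ranked = sorted(
            self.docs,
            key=lambda d: len(words & set(d.lower().split())),
            reverse=True,
        )
        return ranked[: self.k]

    def batch(self, queries):
        # One result list per query, in the same order as the input.
        return [self.invoke(q) for q in queries]


toy_docs = [
    "machine learning and ai news",
    "test match cricket news",
    "football league news",
]
toy_retriever = ToyRetriever(toy_docs, k=2)
toy_results = toy_retriever.batch(["machine learning", "test match"])
for result in toy_results:
    print(result)
```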

### Create Prompt Template

In [30]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Define a message template for the chatbot
message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

# Create a chat prompt template from the message
prompt = ChatPromptTemplate.from_messages([("human", message)])
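
To see what the model will eventually receive, the placeholders can be filled by hand. This sketch uses plain `str.format` as a rough stand-in for what the prompt template does with `{question}` and `{context}`; the question and context strings are made up for illustration.

```python
template = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

# Substitute both placeholders, as the prompt template would at invoke time.
filled = template.format(
    question="Who captains India?",
    context="India's team, captained by Rohit Sharma, is preparing for a crucial match against Ireland.",
)
print(filled)
```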

### Chain Retriever and Prompt Template with LLM

In [27]:
chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
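
The chain above can be read as three steps: the dict sends the same input to the retriever and to `RunnablePassthrough()` (which returns it unchanged), the prompt fills its placeholders from the resulting dict, and the LLM answers. The sketch below imitates that flow with plain functions. `Doc`, `fake_retriever`, `build_prompt`, `fake_llm`, and `run_chain` are hypothetical stand-ins, and `format_docs` is an optional refinement that is not in the original chain, which passes the retrieved documents to the prompt as they are.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fake_retriever(question):
    # Hypothetical retriever: always returns the same two documents.
    return [
        Doc("Cricket text.", {"source": "cricket news"}),
        Doc("Football text.", {"source": "football news"}),
    ]


def format_docs(docs):
    """Join page contents so the prompt receives plain text, not object reprs."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(inputs):
    return (
        "Answer this question using the provided context only.\n\n"
        f"{inputs['question']}\n\nContext:\n{inputs['context']}"
    )


def fake_llm(prompt_text):
    # Hypothetical model: reports the prompt length instead of calling an API.
    return f"(model saw {len(prompt_text)} characters)"


def run_chain(question):
    # Step 1: fan the same input out to both branches of the dict.
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    # Steps 2 and 3: fill the prompt, then pass it to the model.
    return fake_llm(build_prompt(inputs))


print(run_chain("current state of 2024 t20 world cup"))
```

In the real chain, a similar effect could likely be had by using `retriever | format_docs` as the `"context"` value, so that only the page text reaches the prompt.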

In [31]:
response = chain.invoke("current state of 2024 t20 world cup")
# The query string above is the value that RunnablePassthrough forwards as "question"
print(response.content)

The T20 World Cup 2024 is currently in full swing, generating excitement and drama for cricket fans globally. India, led by captain Rohit Sharma, is gearing up for an important match against Ireland, with Jasprit Bumrah anticipated to be a key player. The tournament has already faced controversy regarding the pitch conditions at Nassau County International Cricket Stadium in New York, which received criticism after a low-scoring match between Sri Lanka and South Africa.
