##### Copyright 2025 Google LLC.

In [1]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini API: Question Answering using LangChain and Pinecone

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/langchain/Gemini_LangChain_QA_Pinecone_WebLoad.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

## Overview

[Gemini](https://ai.google.dev/models/gemini) is a family of generative AI models that lets developers generate content and solve problems. These models are designed and trained to handle both text and images as input.

[LangChain](https://www.langchain.com/) is a data framework designed to make integration of Large Language Models (LLM) like Gemini easier for applications.

[Pinecone](https://www.pinecone.io/) is a cloud-first vector database that allows users to search across billions of embeddings with ultra-low query latency.

In this notebook, you'll learn how to create an application that answers questions using data from a website with the help of Gemini, LangChain, and Pinecone.

## Setup

First, you must install the packages and set the necessary environment variables.

### Installation

Install LangChain's Python library, `langchain`, and LangChain's integration package for Gemini, `langchain-google-genai`. Next, install LangChain's integration package for Pinecone, `langchain-pinecone`, and `pinecone`, which is Pinecone's Python SDK (formerly published as `pinecone-client`). Finally, install `langchain-community` to access the `WebBaseLoader` module.

In [None]:
%pip install --quiet langchain-core
%pip install --quiet langchain
%pip install --quiet langchain-google-genai
%pip install --quiet -U langchain-community
%pip install --quiet langchain-pinecone
%pip install --quiet pinecone

## Configure your API key

The following cell reads your API key from the `GOOGLE_API_KEY` environment variable, so the key must be set there before you run it. If you don't already have an API key, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.


In [1]:
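import os

# Read the Gemini API key from the environment and fail early if it is missing.
# (Assigning None to os.environ would raise a confusing TypeError otherwise.)
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("Set the GOOGLE_API_KEY environment variable before running this notebook.")

os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY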

### Setup Pinecone

To use Pinecone in your application, you need an API key, which requires a Pinecone account. Visit [Pinecone's app page](https://app.pinecone.io/) and sign up or log in. Then navigate to the "API Keys" section and copy your API key.

For more detailed instructions on getting the API key, you can read Pinecone's [Quickstart documentation](https://docs.pinecone.io/docs/quickstart#2-get-your-api-key).

Set the environment variable `PINECONE_API_KEY` to configure Pinecone to use your API key.


In [2]:
PINECONE_API_KEY=os.environ.get('PINECONE_API_KEY')

## Basic steps
LLMs are trained offline on a large corpus of public data. Hence they cannot answer questions based on custom or private data accurately without additional context.

If you want to make use of LLMs to answer questions based on private data, you have to provide the relevant documents as context alongside your prompt. This approach is called Retrieval Augmented Generation (RAG).

You will use this approach to create a question-answering assistant using a Gemini model integrated through LangChain. In this notebook, the assistant answers questions about book text fetched from a web API. That text is what supplies the additional context.

In this tutorial, you'll implement the two main components in an RAG-based architecture:

1. Retriever

    Based on the user's query, the retriever retrieves relevant snippets that add context from the document. In this tutorial, the document is the website data.
    The relevant snippets are passed as context to the next stage - "Generator".

2. Generator

    The relevant snippets from the website data are passed to the LLM along with the user's query to generate accurate answers.

You'll learn more about these stages in the upcoming sections while implementing the application.
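The two stages can be sketched without any external service. The following is a minimal illustration only: the word-overlap scoring and the function names `retrieve` and `build_prompt` are hypothetical stand-ins for the embedding-based retriever and the LangChain prompt used later in this notebook.

```python
def retrieve(query: str, snippets: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank snippets by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        snippets,
        key=lambda s: len(query_words & set(s.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Generator input: the retrieved snippets plus the user's question."""
    joined = "\n\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


snippets = [
    "Pinecone stores embedding vectors in an index.",
    "LangChain provides document loaders.",
    "Bananas are yellow.",
]
question = "Where does Pinecone store vectors?"
top = retrieve(question, snippets, k=1)
print(build_prompt(question, top))
```

In the real pipeline, the prompt built in the last step is sent to the LLM, and the ranking is done by vector similarity rather than shared words.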

## Import the required libraries

In [3]:
import requests  # used later to fetch page text from the web API

from langchain import hub
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate, format_document
from langchain_core.runnables import RunnablePassthrough
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_pinecone import PineconeVectorStore

from pinecone import PodSpec



## Retriever

In this stage, you will perform the following steps:

1. Read and parse the website data using LangChain.

2. Create embeddings of the website data.

    Embeddings are numerical representations (vectors) of text, produced so that text with similar meaning has similar embedding vectors. You'll make use of Gemini's embedding model to create the embedding vectors of the website data.

3. Store the embeddings in Pinecone's vector store.
    
    Pinecone is a vector database. The Pinecone vector store helps in the efficient retrieval of similar vectors. Thus, to add context to the prompt for the LLM, the stored text whose embeddings best match the user's question can be retrieved easily using Pinecone.

4. Create a Retriever from the Pinecone vector store.

    The retriever will be used to pass the relevant website text to the LLM along with user queries.
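To make "similar embedding vectors" concrete, here is a small sketch of cosine similarity and top-k ranking. The three-dimensional vectors and their labels are invented for illustration. Real embedding vectors have far more dimensions, and Pinecone performs this search for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" keyed by the text they represent.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def top_k(query: list[float], k: int) -> list[str]:
    """Return the k stored keys whose vectors are most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )
    return ranked[:k]


print(top_k([0.85, 0.15, 0.05], k=2))
```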

### Read and parse the website data

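LangChain provides a wide variety of document loaders, such as `WebBaseLoader` for reading a web page as a document. In this notebook, the page text is fetched from a JSON API with `requests` in a later cell, and each page is wrapped in a LangChain `Document`.

To know more about how to read and parse input data from different sources using the document loaders of LangChain, read LangChain's [document loaders guide](https://python.langchain.com/docs/integrations/document_loaders).

The following cell initializes the Gemini embedding model, `gemini_embeddings`, which is used later to embed the fetched text.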

In [4]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-exp-03-07")

### Store the data using Pinecone


To create a Pinecone vector database, first, you have to initialize your Pinecone client connection using the API key you set previously.

In Pinecone, vector embeddings have to be stored in indexes. An index represents the vector data's top-level organizational unit. The vectors in any index must have the same dimensionality and distance metric for calculating similarity. You can read more about indexes in [Pinecone's Indexes documentation](https://docs.pinecone.io/docs/indexes).
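The rule that every vector in an index shares one dimensionality can be illustrated with a toy in-memory class. This is not Pinecone's API: `ToyIndex` and its `upsert` method are hypothetical names used only to show the constraint.

```python
class ToyIndex:
    """In-memory stand-in for a vector index with a fixed dimension."""

    def __init__(self, dimension: int):
        self.dimension = dimension
        self.vectors: dict[str, list[float]] = {}

    def upsert(self, vector_id: str, values: list[float]) -> None:
        # Reject vectors whose length differs from the index dimension.
        if len(values) != self.dimension:
            raise ValueError(f"expected {self.dimension} values, got {len(values)}")
        self.vectors[vector_id] = values


index_sketch = ToyIndex(dimension=3)
index_sketch.upsert("page-8", [0.1, 0.2, 0.3])
try:
    index_sketch.upsert("page-9", [0.1, 0.2])  # wrong length, so it is rejected
except ValueError as err:
    print(err)
```

In practice this means the index dimension has to match the output size of the embedding model you choose.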

First, you'll create an index using Pinecone's `create_index` function. Pinecone allows you to create two types of indexes, Serverless indexes and Pod-based indexes. Pinecone's free starter plan lets you create only one project and one pod-based starter index with sufficient resources to support 100,000 vectors. For this tutorial, you have to create a pod-based starter index. To know more about different indexes and how they can be created, read Pinecone's [create indexes guide](https://docs.pinecone.io/docs/new-api#creating-indexes).


Next, you'll insert the documents into the index. LangChain's `PineconeVectorStore.from_documents` can do this in one call: under the hood, it creates embeddings from the documents using any specified embedding model and inserts them into the specified index in a Pinecone vector database.

When invoking `from_documents`, you specify the `docs` you created from the source data (for example with LangChain's `WebBaseLoader`) and `gemini_embeddings` as the embedding model.

The cells below take a slightly different route: they connect to an existing index by name and host, wrap it in a `PineconeVectorStore`, and add one `Document` per fetched page with `add_documents`.

In [13]:
from pinecone import Pinecone

# Initialize the Pinecone client with the key read from the environment earlier.
# Never hard-code API keys in a notebook.
pine_client = Pinecone(api_key=PINECONE_API_KEY)

# Connect to an existing index by name and host.
index_name = "seerat-ul-mahdi"
index = pine_client.Index(index_name, "https://seerat-ul-mahdi-vm3wi2f.svc.aped-4627-b74a.pinecone.io")

# Wrap the index so LangChain embeds documents with Gemini before storing them.
vectorstore = PineconeVectorStore(
    index=index,
    embedding=gemini_embeddings
)


In [43]:
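from time import sleep

import requests

book_id = "1342-12463"

# Fetch pages 8 through 436 one at a time and store each page as a Document.
for i in range(8, 437):
    url = f"https://new.alislam.org/api/books/text?id={book_id}&pages={i}"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing a bad body
    page = response.json()[0]  # parse the JSON body once
    sleep(1)  # pause between requests to avoid overloading the API
    vectorstore.add_documents([
        Document(
            page_content=page["content"],
            metadata={"chunk_index": i, "volume": 1, "title": "seerat-ul-mahdi", "page": page["printPageNum"]},
        )
    ])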
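The loop above sends one document per `add_documents` call. Because that method takes a list, one possible refinement is to group several pages per call. The helper below is a sketch of the grouping step only. The name `batched` and the batch size of 4 are arbitrary choices for illustration, and no vector store is contacted.

```python
from typing import Iterable, Iterator


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


page_numbers = list(range(8, 18))
for group in batched(page_numbers, 4):
    print(group)
```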


### Create a retriever using Pinecone

You'll now create a retriever that fetches, from the Pinecone vector store, the stored documents most similar to a query. This retriever is used later to pass the retrieved text to the LLM as additional context for answering the user's queries.

Invoke the `as_retriever` function of the vector store you initialized in the last step, to create a retriever.

In [44]:

retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
# Check that the retriever works by fetching the documents most similar to a
# sample query. A length greater than zero means the retriever returned results;
# with k=3, up to three documents are returned.
print(len(retriever.invoke("mufti muhammad sadiq")))

3


## Generator

The Generator prompts the LLM for an answer when the user asks a question. The retriever you created in the previous stage from the Pinecone vector store will be used to pass relevant text from the stored data to the LLM, giving it more context for the user's query.

You'll perform the following steps in this stage:

1. Chain together the following:
    * A retriever for extracting the relevant documents from the vector store.
    * A prompt template for answering any question using LangChain.
    * A Gemini LLM for generating the answer.
    
2. Run the created chain with a question as input to prompt the model for an answer.


### Initialize Gemini

You must import `ChatGoogleGenerativeAI` from LangChain to initialize your model.
In this example, you will use **gemini-2.5-flash-preview-05-20** to generate answers from the retrieved text. To know more about the text model, read Google AI's [language documentation](https://ai.google.dev/models/gemini).

You can configure the model parameters such as ***temperature*** or ***top_p***,  by passing the appropriate values when initializing the `ChatGoogleGenerativeAI` LLM.  To learn more about the parameters and their uses, read Google AI's [concepts guide](https://ai.google.dev/docs/concepts#model_parameters).

In [45]:
from langchain_google_genai import ChatGoogleGenerativeAI

# To configure model parameters use the `generation_config` parameter.
# eg. generation_config = {"temperature": 0.7, "topP": 0.8, "topK": 40}
# If you only want to set a custom temperature for the model use the
# "temperature" parameter directly.

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-preview-05-20")
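As background on the ***temperature*** parameter mentioned above, the sketch below shows temperature scaling of a softmax distribution as it is commonly described for language models. It is a generic illustration with made-up scores, not a description of how Gemini samples internally.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert scores to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring option, while higher temperatures spread it more evenly.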

### Create prompt templates

You'll use LangChain's [PromptTemplate](https://python.langchain.com/docs/how_to/#prompt-templates) to generate prompts to the LLM for answering questions.

In the `llm_prompt`, the variable `question` will be replaced later by the input question, and the variable `book` will be replaced by the relevant text retrieved from the Pinecone vector store.

In [92]:
# Prompt template to query Gemini
llm_prompt_template = """You are an Ahmadi scholar who writes counter arguments against Non-Ahmadi scholars.
Use the following books of the Promised Messiah A.S to write counter arguments.
Add references to the sources of the argument for example: Ruhani Khazain Vol. X Pg. X

Question: {question}
Books: {book}
Answer:"""

llm_prompt = PromptTemplate.from_template(llm_prompt_template)

print(llm_prompt)

input_variables=['book', 'question'] input_types={} partial_variables={} template='You are an Ahmadi scholar who writes counter arguments against Non-Ahmadi scholars.\nUse the following books to write arguments.\nAdd reference to the sources of the argument.\n\nQuestion: {question}\nBooks: {book}\nAnswer:'
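The placeholder substitution can be reproduced with plain Python string formatting. This sketch uses the standard library rather than LangChain's `PromptTemplate`, and a shortened template, to show how `{question}` and `{book}` are discovered and filled.

```python
from string import Formatter

template = """Question: {question}
Books: {book}
Answer:"""

# Collect the placeholder names that appear in the template.
placeholders = sorted(
    {field for _, field, _, _ in Formatter().parse(template) if field}
)
print(placeholders)

# Fill the placeholders with sample values.
print(template.format(question="<user question>", book="<retrieved text>"))
```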


In [60]:
# Prompt template to query Gemini
llm_prompt_template = """You are an Ahmadi scholar who answers general questions.
Use the following books to answer questions.
Add references to the sources of the answer, for example: Ruhani Khazain Vol. X Pg. X, Seerat ul Mahdi Vol. X Pg. X

Question: {question}
Books: {book}
Answer:"""

llm_prompt = PromptTemplate.from_template(llm_prompt_template)

print(llm_prompt)

input_variables=['book', 'question'] input_types={} partial_variables={} template='You are an Ahmadi scholar who answer general questions.\nUse the following books of to answer questions.\nAdd references to the sources of the answer for example: Ruhani Khazain Vol. X Pg. X, Seerat ul Mahdi Pg. X\n\nQuestion: {question}\nBooks: {book}\nAnswer:'


### Create a stuff documents chain

LangChain provides [Chains](https://python.langchain.com/docs/modules/chains/) for chaining together LLMs with each other or other components for complex applications. You will create a **stuff documents chain** for this application. A stuff documents chain lets you combine all the relevant documents, insert them into the prompt, and pass that prompt to the LLM.

You can create a stuff documents chain using the [LangChain Expression Language (LCEL)](https://python.langchain.com/docs/expression_language).

To learn more about different types of document chains, read LangChain's [chains guide](https://python.langchain.com/docs/modules/chains/document/).

The stuff documents chain for this application retrieves the relevant website data and passes it as the context to an LLM prompt along with the input question.

In [53]:
# Combine the retrieved documents into a single readable string.
def format_docs(docs):
    print(docs)  # debug output: show which documents were retrieved
    return "\n\n".join(doc.page_content for doc in docs)

# Create stuff documents chain using LCEL.
# This is called a chain because you are chaining
# together different elements with the LLM.
# In the following example, to create a stuff chain,
# you will combine content, prompt, LLM model, and
# output parser together like a chain using LCEL.
#
# The chain implements the following pipeline:
# 1. Retrieve the relevant documents, join them, and save the result to the variable `book`.
# 2. Use the `RunnablePassthrough` option to provide question during invoke.
# 3. The `book` and `question` are then passed to the prompt and
#    input variables in the prompt are populated.
# 4. The prompt is then passed to the LLM (the Gemini model configured in `llm`).
# 5. Output from the LLM is passed through an output parser
#    to structure the model response.
rag_chain = (
    {"book": retriever | format_docs, "question": RunnablePassthrough()}
    | llm_prompt
    | llm
    | StrOutputParser()
)
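Because a stuff documents chain inserts every retrieved document into one prompt, the combined text can grow large. The following sketch shows one way to cap it. The function name `stuff_documents` and the character budget are assumptions for illustration and are not part of the chain above.

```python
def stuff_documents(texts: list[str], max_chars: int) -> str:
    """Join texts with blank lines, stopping before the budget is exceeded."""
    kept = []
    used = 0
    for text in texts:
        extra = len(text) + (2 if kept else 0)  # 2 accounts for the "\n\n" separator
        if used + extra > max_chars:
            break
        kept.append(text)
        used += extra
    return "\n\n".join(kept)


sample_pages = ["first page text", "second page text", "third page text"]
print(stuff_documents(sample_pages, max_chars=35))
```

With `k=3` in the retriever, at most three documents reach `format_docs`, so a cap like this matters mainly when pages are long or `k` is raised.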

### Prompt the model

You can now query the LLM by passing any question to the `invoke()` function of the stuff documents chain you created previously.

In [110]:
rag_chain.invoke("""*Argument (Made by Muhammad Imtiaz against MGA): QURAAN IS FULL OF SWEARS
Replying to the allegations of using bad language against hindu, christians and muslim opponents, Mirza Ghulam Qadiani says if his language is considered as the abusive language then all the Quraan is full of swears. (NAUZUBILLAH Min Zalik.)
Give a response.""")

[Document(id='3ea5823b-757e-4b00-91eb-98ed6a319ecd', metadata={'chunk_index': 9.0, 'page': 110.0, 'title': 'اِزالہ اوھام', 'volume': '۳'}, page_content='روحانی خزائن ۔ کمپیوٹرائزڈوہ الفاظ بیان نہیں فرمائے جو اس عاجز نے بزعم ان کے اپنی تالیفات میں استعمال کئے ہیں اور درحقیقت سب و شتم میں داخل ہیں۔میں سچ سچ کہتا ہوں کہ جہاں تک مجھے معلوم ہے میں نے ایک لفظ بھی ایسا استعمال نہیں کیا جس کو دشنام دہی کہا جائے بڑے دھوکہ کی با ت یہ ہے کہ اکثر لوگ دشنام دہی اور بیان واقعہ کو ایک ہی صورت میں سمجھ لیتے ہیں اور ان دونوں مختلف مفہوموں میں فرق کرنا نہیں جانتے بلکہ ایسی ہریک بات کو جو دراصل ایک واقعی امر کا اظہار ہو اور اپنے محل پر چسپاں ہو محض اس کی کسی قدر مرارت کی وجہ سے جو حق گوئی کے لازم حال ہوا کرتی ہے دشنام ہی تصور کر لیتے ہیں حالانکہ دشنام اور سب اور شتم فقط اس مفہوم کا نام ہے جو خلاف واقعہ اور دروغ کے طور پر محض آزار رسانی کی غرض سے استعمال کیا جائے اور اگر ہریک سخت اور آزار دہ تقریر کو محض بوجہ اس کے مرارت اور تلخی اور ایذارسانی کے دشنام کے مفہوم میں داخل کر سکتے ہیں تو پھر اقرار کرنا پڑے گ

'The allegation made by Muhammad Imtiaz against the Promised Messiah, Hazrat Mirza Ghulam Ahmad (peace be upon him), regarding the use of "bad language" and the claim that he stated "all the Quran is full of swears," is a grave misrepresentation and a distortion of his words.\n\nThe Promised Messiah (peace be upon him) categorically denied using any language that could be termed as \'abusive\' or \'swearing\' (dushnam-dehi). He clarified that there is a fundamental difference between true abuse and a statement of fact, which many people mistakenly conflate.\n\nHe states:\n"I truly say that as far as I know, I have not used a single word that can be called abusive. It is a great deception that most people consider abusive language and a statement of fact to be the same, and they do not know how to differentiate between these two distinct concepts. Rather, they consider every such statement, which is actually an expression of a factual matter and relevant to its context, as abuse merely 

In [59]:
# Romanized Urdu for "What is the incident of the red drops?"
rag_chain.invoke("surkh qatro ka kya waqya hai")

[Document(id='8d7d1670-a12c-4a35-b2ef-7ee4a9fcf7e8', metadata={'chunk_index': 79.0, 'page': '73', 'title': 'seerat-ul-mahdi', 'volume': 1.0}, page_content='سیرت المہدی\n73\nحصہ اوّل\nکہنی آپ نے سر کے نیچے بطور تکیہ کے رکھ لی اور دوسری اسی صورت میں سر کے اوپر ڈھانک لی۔میں پاؤں\nدبانے بیٹھ گیا۔وہ رمضان\nکا مہینہ تھا اور ستائیس تاریخ تھی اور جمعہ کا دن تھا اس لئے میں دل میں بہت\nمسرور تھا کہ میرے لئے ایسے مبارک موقعے جمع ہیں۔یعنی حضرت صاحب جیسے مبارک انسان کی خدمت\nکر رہا ہوں وقت فجر کا ہے جو مبارک وقت ہے مہینہ رمضان کا ہے جو مبارک مہینہ ہے۔تاریخ ستائیس اور\nجمعہ کا دن ہے اور گزشتہ شب شب قدر تھی کیونکہ میں نے حضرت صاحب سے سنا ہوا تھا کہ جب رمضان کی\nستائیس تاریخ اور جمعہ مل جاویں تو وہ رات یقیناً شب قدر ہوتی ہے۔میں انہی باتوں کا خیال کر کے دل میں\nمسرور ہو رہا تھا کہ حضرت صاحب کا بدن یکلخت کانپا اور اس کے بعد حضور نے آہستہ سے اپنے اوپر کی کہنی\nذرا ہٹا کر میری طرف دیکھا اس وقت میں نے دیکھا کہ حضرت صاحب کی آنکھوں میں آنسو بھرے ہوئے\nتھے اس کے بعد آپ نے پھر اسی طرح اپنی کہنی رکھ لی۔میں دبات

'**سرخ قطروں کا واقعہ (The Incident of Red Drops)**\n\nسرخ قطروں کا واقعہ، حضرت اقدس مسیح موعود علیہ السلام کی زندگی کا ایک نہایت اہم اور روحانی واقعہ ہے جو آپ کے کشوف اور الہامات کی صداقت اور ان کے ظاہری وجود میں ظاہر ہونے کا ایک بین ثبوت ہے۔ اس واقعہ کی تفصیل میاں عبداللہ صاحب سنوری رضی اللہ عنہ کی روایت سے سیرت المہدی میں بیان کی گئی ہے۔\n\n**واقعہ کی تفصیل:**\n\nیہ واقعہ رمضان المبارک کی ستائیسویں شب (جو شب قدر بھی ہو سکتی تھی) کو فجر کے وقت پیش آیا، جب میاں عبداللہ صاحب سنوری رضی اللہ عنہ حضرت مسیح موعود علیہ السلام کے پاؤں دبا رہے تھے۔\n\n1.  **سرخ قطروں کا مشاہدہ:** میاں عبداللہ صاحب نے سب سے پہلے حضرت صاحب کے پاؤں پر ٹخنے کے نیچے ایک سرخ قطرہ دیکھا جو تازہ گرے ہونے کی وجہ سے بستہ تھا۔ انہوں نے اسے انگلی سے لگا کر سونگھا لیکن اس میں کوئی خوشبو نہیں تھی۔ بعد ازاں، انہوں نے حضرت صاحب کے کرتے پر پسلیوں کے پاس ایک اور بڑا سرخ قطرہ دیکھا۔ انہیں حیرانی ہوئی کہ یہ سرخی کہاں سے آئی ہے، اور انہوں نے چھت وغیرہ میں تلاش کی لیکن کہیں کوئی خارجی ذریعہ نظر نہ آیا۔\n\n2.  **حضرت مسیح موعود علی