Log in to https://aistudio.google.com/app/apikey and click "Get API key". This generates an API key for the Gemini API; the same key works for every model used below (gemini-2.0-flash, embedding-001, and gemini-1.5-flash). Keep the key secure and treat it as confidential.

Making your first API call to the Gemini model

In [None]:
import requests
import json

# Replace with your actual API key
api_key = "<Enter Key>"

# API endpoint URL
url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=" + api_key

# Request headers
headers = {
    "Content-Type": "application/json"
}

# Request data (prompt)
data = {
    "contents": [
        {
            "parts": [
                {"text": "Explain how AI works"}
            ]
        }
    ]
}

try:
  # Send POST request to Gemini API
  response = requests.post(url, headers=headers, json=data)

  # Check for successful request
  response.raise_for_status()

  # Process the JSON response
  response_json = response.json()

  # Extract and print the generated text from the first candidate
  generated_text = response_json["candidates"][0]["content"]["parts"][0]["text"]
  print(generated_text)

except requests.exceptions.RequestException as e:
  print(f"An error occurred: {e}")
except (KeyError, IndexError) as e:
  print(f"Error parsing response: {e}")
  print(f"Full response: {response.text}")
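
The `generateContent` response nests the generated text under `candidates[0].content.parts[...].text`. The following is a minimal sketch of a helper that pulls the text out defensively. The function name `extract_text` and the sample payload are illustrative assumptions, not part of the API.

```python
def extract_text(response_json):
    """Return the concatenated text parts of the first candidate, or "" if none."""
    candidates = response_json.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    # Some parts may carry no "text" field, so fall back to an empty string.
    return "".join(part.get("text", "") for part in parts)


# Hypothetical payload shaped like a generateContent response.
sample = {
    "candidates": [
        {
            "content": {
                "parts": [{"text": "AI learns "}, {"text": "patterns from data."}],
                "role": "model",
            }
        }
    ]
}
print(extract_text(sample))
```

Returning an empty string instead of raising keeps the calling cell simple, at the cost of hiding malformed responses; raising may be preferable in production code.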


In [None]:
!pip install pdfplumber
!pip install google-generativeai
!pip install langchain
!pip install faiss-cpu
!pip install -U langchain-google-genai
!pip install -U langchain-community

import pdfplumber
import google.generativeai as genai
from langchain_google_genai import GoogleGenerativeAIEmbeddings # Import GoogleGenerativeAIEmbeddings from langchain_google_genai
from langchain_community.vectorstores import FAISS  # Integrations live in langchain-community
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_google_genai import GoogleGenerativeAI
import os


### **Installing Dependencies:**
pdfplumber: Extracts text from PDF files.
google-generativeai: Provides access to Google's Generative AI models.
langchain: A framework for working with LLMs (large language models).
faiss-cpu: A library for efficient similarity search and clustering.
langchain-google-genai: Integration of LangChain with Google Generative AI.
langchain-community: Community-maintained LangChain integrations, including vector stores and document loaders.

### **Importing Required Libraries:**

**pdfplumber:** Reads and extracts text from PDF documents.
**google.generativeai:** Connects to Google's Generative AI models.
**GoogleGenerativeAIEmbeddings:** Generates vector embeddings from text using Google's AI.
**FAISS:** A vector store for storing and retrieving similar text chunks.
**TextLoader:** Handles loading text-based documents (not used in this script).
**RecursiveCharacterTextSplitter:** Splits text into smaller chunks for better processing.
**Document:** Represents a piece of text together with optional metadata.
**GoogleGenerativeAI:** Utilizes Google's AI model for generating responses.
**os:** Used for setting environment variables (like API keys).

In [None]:
# Set up Gemini API Key (replace with your actual API key)
os.environ["GOOGLE_API_KEY"] = "<ENTER KEY>"

def extract_text_from_pdf(pdf_path):
    """Extracts text from a given PDF file using pdfplumber."""
    text = ""
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text()  # Extract once per page
            if page_text:  # Skip pages with no extractable text (None or empty)
                text += page_text + "\n"
    return text.strip()




# **Key Functionalities**
# **1. API Key Configuration**
The Gemini API key is set as an environment variable using os.environ["GOOGLE_API_KEY"].
The LangChain Google integrations read the key from this variable, so it does not have to be passed to each client. Replace the placeholder with your own key and avoid committing it to version control.
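
Hard-coding the key in a notebook cell makes it easy to leak when the notebook is shared. One alternative is to set the variable outside the notebook and only read it in code. The sketch below assumes that setup; the helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key or key.startswith("<"):
        # Either the variable is empty or it still holds a placeholder
        # such as "<ENTER KEY>".
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

Calling `load_api_key()` at the top of the notebook fails fast with a clear message instead of surfacing later as an authentication error.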
# **2. Extracting Text from PDFs** (extract_text_from_pdf)
Uses pdfplumber to read and extract text from a given PDF file.
Iterates through each page of the document and retrieves its text.
Skips pages for which extract_text() returns no text, so None is never concatenated to the result.
Returns the cleaned text as a single string.
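
The page-skipping logic can be tried without a PDF. In this sketch, `FakePage` is a hypothetical stand-in for a pdfplumber page and `join_page_text` is an illustrative name; only the `extract_text()` method mirrors the real page interface.

```python
class FakePage:
    """Hypothetical stand-in for a pdfplumber page, used only for this demo."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


def join_page_text(pages):
    """Join the text of all pages, skipping pages that yield None or ""."""
    collected = []
    for page in pages:
        page_text = page.extract_text()  # call once per page
        if page_text:
            collected.append(page_text)
    return "\n".join(collected).strip()


# The middle page simulates a page with no extractable text.
pages = [FakePage("Intro"), FakePage(None), FakePage("Summary")]
print(join_page_text(pages))
```

Collecting the page texts in a list and joining once also avoids repeated string concatenation on long documents.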
# **3. Processing and Vectorizing the Extracted Text** (vectorize_pdf)
Calls extract_text_from_pdf(pdf_path) to retrieve the text content of a PDF.
Splits the text into chunks, embeds them, and stores the vectors in FAISS. The function is defined in the next cell.

In [None]:
def vectorize_pdf(pdf_path):
    """Processes and vectorizes the text from a PDF file."""
    text = extract_text_from_pdf(pdf_path)

    # Split text into smaller chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_text(text)

    # Convert chunks into Document objects
    documents = [Document(page_content=chunk) for chunk in chunks]

    # Initialize Google Gemini Embeddings
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

    # Store in FAISS vector database
    vectorstore = FAISS.from_documents(documents, embeddings)
    return vectorstore


# **Key Components**
# **1. Splitting Text into Chunks**

Why split the text? Embedding a whole document as a single vector blurs its topics together, and long inputs can exceed model limits. Smaller chunks let the search return only the passages relevant to a question.
RecursiveCharacterTextSplitter:
chunk_size=500: Each text chunk contains at most 500 characters.
chunk_overlap=50: Consecutive chunks share up to 50 characters, so a sentence cut at a chunk boundary keeps some context.
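
To see how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-window sketch. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break at separators such as paragraphs and spaces, so its chunks will differ); the function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Small numbers make the overlap easy to see.
demo = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(demo, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each printed chunk starts with the last three characters of the previous one, which is the overlap the splitter parameters describe.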
# **2. Converting Chunks into Document Objects**

Each chunk is wrapped inside a Document object, which is the input type that FAISS.from_documents expects.
# **3. Initializing Google Gemini Embeddings**

GoogleGenerativeAIEmbeddings converts text chunks into numerical vector representations.
These vectors allow the system to perform similarity searches and retrieve relevant content.
# **4. Storing Vectors in FAISS**

FAISS (Facebook AI Similarity Search) is an efficient indexing system for fast similarity searches.
It stores document vectors, enabling quick and accurate retrieval when querying the document later.
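
The idea behind vector retrieval can be sketched with a brute-force search in plain Python. This is only an illustration: the 3-dimensional vectors are made up, the names `cosine_similarity` and `top_k` are hypothetical, and FAISS uses optimized indexes whose default distance metric may differ from the cosine similarity used here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=3):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]


# Made-up "embeddings"; real embedding vectors have hundreds of dimensions.
indexed = [
    ("revenue grew in Q3", [0.9, 0.1, 0.0]),
    ("the office moved", [0.0, 0.2, 0.9]),
    ("profit margins rose", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], indexed, k=2))
```

The two finance-related texts rank first because their vectors point in nearly the same direction as the query vector.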

In [None]:
def query_pdf(vectorstore, query):
    """Retrieves relevant information from the vectorstore and generates a response."""
    # Search for relevant documents
    docs = vectorstore.similarity_search(query, k=3)
    context = "\n".join(doc.page_content for doc in docs)

    # Initialize the Gemini 1.5 Flash model
    llm = GoogleGenerativeAI(model="gemini-1.5-flash")

    # Generate response based on context
    prompt = f"Using the following extracted information from a PDF, answer the user's question:\n\n{context}\n\nQuestion: {query}\n\nAnswer:"
    response = llm.invoke(prompt)
    return response


# **1. Retrieving Relevant Information**

similarity_search(query, k=3): Searches the FAISS database for the top 3 most relevant text chunks related to the user’s query.
Joins retrieved document chunks into a single string (context) to provide meaningful context for the LLM.
# **2. Initializing the Google Gemini Model**

Loads the Gemini 1.5 Flash model, a fast, low-latency generative model that suits interactive question answering.
# **3. Creating the Prompt for AI Response Generation**

Prompt Engineering:
Provides retrieved context from the PDF.
Clearly defines the user’s question to guide the AI model.
Encourages the model to ground its answer in the document content. The prompt does not forbid outside knowledge, so this is guidance rather than a guarantee.
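
A stricter variant of the prompt can tell the model what to do when the context lacks the answer. This is a sketch under that assumption; the function name `build_prompt` and the exact wording are illustrative, not the notebook's own prompt.

```python
def build_prompt(chunks, question):
    """Assemble a grounded question-answering prompt from retrieved chunks."""
    context = "\n".join(chunks)
    return (
        "Using only the following extracted information from a PDF, "
        "answer the user's question. If the answer is not in the text, "
        "say that you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\n\nAnswer:"
    )


prompt = build_prompt(["Sales rose 4%.", "Costs were flat."], "How did sales change?")
print(prompt)
```

Keeping prompt construction in its own function makes it easy to inspect the exact text sent to the model before calling it.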
# **4. Generating and Returning the Response**

llm.invoke(prompt): Uses the AI model to generate an answer based on the context.
Returns the AI-generated response to the user.

In [None]:
# Example usage
pdf_path = "/content/weekly-report-7.pdf"  # Provide the path to your PDF file
vectorstore = vectorize_pdf(pdf_path)

while True:
    query = input("\nAsk a question (or type 'exit' to quit): ")
    if query.lower() == "exit":
        break
    answer = query_pdf(vectorstore, query)
    print("\nResponse:", answer)

# **Key Components**
# **1. Defining the PDF Path and Vectorizing Its Content**

pdf_path: Specifies the location of the PDF file to be processed.
vectorize_pdf(pdf_path): Extracts, chunks, and embeds the PDF text into the FAISS vector database for efficient retrieval.
# **2. User Input Loop for Querying the PDF**

Starts an infinite loop to continuously accept user questions.
Allows users to type queries dynamically.
Includes an exit condition to terminate the program when "exit" is entered.
# **3. Querying the Vector Database and Generating Responses**

Checks if the user input is "exit" (case insensitive) and breaks the loop if true.
Calls query_pdf(vectorstore, query) to retrieve the most relevant text chunks and generate a response using the AI model.
Prints the AI-generated response for the user.
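
The loop can also be written as a function with injectable input and output, which makes it testable without a keyboard. This is one possible sketch; `run_query_loop` and its parameters are hypothetical names.

```python
def run_query_loop(answer_fn, input_fn=input, output_fn=print):
    """Ask for questions until the user types 'exit'; return how many were answered."""
    answered = 0
    while True:
        query = input_fn("\nAsk a question (or type 'exit' to quit): ").strip()
        if query.lower() == "exit":
            break
        if not query:
            continue  # ignore empty input instead of sending it to the model
        output_fn("\nResponse:", answer_fn(query))
        answered += 1
    return answered
```

In the notebook it could be started with `run_query_loop(lambda q: query_pdf(vectorstore, q))`, which behaves like the loop above but skips blank questions.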