# Colab Notebook for Processing PDFs and Answering Questions with Gemini

**Description:**

This Colab notebook demonstrates a workflow for processing a collection of PDFs: extracting their text, splitting it into chunks, embedding the chunks and indexing them with FAISS, and then using Gemini to answer questions about the content.

### **Set up the environment**

In [None]:
# LangChain core and community integrations, the Gemini SDK and LangChain wrapper,
# the PDF parser used by PyPDFLoader, and the CPU build of FAISS
%pip install --upgrade --quiet langchain langchain-community langchain-google-genai google-generativeai pypdf faiss-cpu


### **Set up Google credentials**

In [2]:
# Access your Gemini API key

import google.generativeai as genai
from google.colab import userdata

gemini_api_secret_name = 'GoogleAIStudio'  # @param {type: "string"}

try:
    GOOGLE_API_KEY = userdata.get(gemini_api_secret_name)
    genai.configure(api_key=GOOGLE_API_KEY)
except userdata.SecretNotFoundError:
    print(f'''Secret not found\n\nThis expects you to create a secret named {gemini_api_secret_name} in Colab\n\nVisit https://makersuite.google.com/app/apikey to create an API key\n\nStore that in the secrets section on the left side of the notebook (key icon)\n\nName the secret {gemini_api_secret_name}''')
    raise
except userdata.NotebookAccessError:
    print(f'''You need to grant this notebook access to the {gemini_api_secret_name} secret in order for the notebook to access Gemini on your behalf.''')
    raise
except Exception:
    # unknown error
    print(f"There was an unknown error. Ensure you have a secret {gemini_api_secret_name} stored in Colab and it's a valid key from https://makersuite.google.com/app/apikey")
    raise
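The cell above only works inside Colab, because `google.colab.userdata` is not available anywhere else. If you run the notebook locally, one minimal fallback is to read the key from an environment variable. The sketch below assumes the variable is named `GOOGLE_API_KEY`; the helper `get_gemini_api_key` and its parameters are hypothetical, not part of any library.

```python
import os


def get_gemini_api_key(secret_name: str = "GoogleAIStudio",
                       env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the Gemini API key from Colab secrets, else from an environment variable.

    Assumption (hypothetical convention for this sketch): outside Colab the key
    has been exported under the name given by `env_var`.
    """
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
        return userdata.get(secret_name)
    except ImportError:
        pass  # Not running in Colab: fall back to the environment.
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: set the Colab secret {secret_name!r} "
            f"or the environment variable {env_var!r}."
        )
    return key
```

Usage would look like `GOOGLE_API_KEY = get_gemini_api_key()` followed by `genai.configure(api_key=GOOGLE_API_KEY)`. Inside Colab, a missing secret still raises the `userdata` errors handled in the cell above.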


### **Collect the PDFs in a Google Drive folder**

Create a folder in Google Drive, upload your PDFs to it, and paste the folder's path below:

In [None]:
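# Mount Google Drive so the folder is visible under /content/drive
from google.colab import drive
drive.mount('/content/drive')

# Define the folder path containing the PDFs
folder_path = "/content/drive/MyDrive/books_pdfs" # @param {type: "string"}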
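Before running the extraction, it can help to confirm that the folder exists and actually contains PDFs, since a wrong path or an unmounted Drive otherwise surfaces later as an empty index. A minimal sketch using only the standard library; the helper name `list_pdfs` is hypothetical, and matching `.pdf` case-insensitively is a choice made here, not something the notebook requires.

```python
from pathlib import Path


def list_pdfs(folder: str) -> list[Path]:
    """Return the PDF files directly inside `folder`, sorted by path.

    The extension check is case-insensitive, so both `a.PDF` and `b.pdf` match.
    """
    path = Path(folder)
    if not path.is_dir():
        # On Colab this usually means Drive is not mounted or the path has a typo.
        raise FileNotFoundError(f"Folder not found: {folder}")
    return sorted(p for p in path.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")
```

For example, `print([p.name for p in list_pdfs(folder_path)])` shows which files the next cell will process.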

In [3]:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Extract the text of a single PDF, page by page
def get_pdf_text(filepath):
    loader = PyPDFLoader(filepath)
    # load() returns one Document per page; load_and_split() would chunk the pages
    # here, only for the chunks to be re-joined and split again below
    pdf_pages = loader.load()
    return "\n".join(page.page_content for page in pdf_pages)

# Concatenate the text of every PDF in the folder, labelled by filename
def process_all_pdfs(folder_path):
    all_text = ""
    for filename in sorted(os.listdir(folder_path)):
        if filename.lower().endswith(".pdf"):
            filepath = os.path.join(folder_path, filename)
            text = get_pdf_text(filepath)
            all_text += f"--- Text from {filename} ---\n{text}\n"
    return all_text

# Split the combined text into overlapping chunks for embedding
def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=250)
    return text_splitter.split_text(text)


# Run the extraction and chunking
all_text = process_all_pdfs(folder_path)
chunks = get_text_chunks(all_text)
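The parameters `chunk_size=1000` and `chunk_overlap=250` mean each chunk holds at most 1000 characters and neighbouring chunks share some text, so a sentence cut at a boundary still appears whole in one of them. The sketch below illustrates only that size/overlap arithmetic with a plain fixed-width sliding window. It is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks), and the function name is hypothetical.

```python
def fixed_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 250) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows overlap by exactly chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the notebook's settings each window advances by 750 characters, so a 2000-character text yields windows starting at 0, 750 and 1500, of lengths 1000, 1000 and 500.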


### **Create a FAISS vector store locally**

In [5]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS

# Embed the chunks with Gemini embeddings, index them with FAISS, and save the index to disk
def get_vector_store(text_chunks):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=GOOGLE_API_KEY)
    db = FAISS.from_texts(text_chunks, embeddings)
    db.save_local("faiss_index")
    return db

get_vector_store(chunks)
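Conceptually, the index stores one embedding vector per chunk, and a query is answered by finding the stored vectors closest to the query's embedding. As far as we know, LangChain's `FAISS.from_texts` builds an exact flat L2 index by default, and FAISS reports squared Euclidean distances for it. The pure-Python sketch below reproduces that exact search on toy vectors; it is an illustration of brute-force nearest-neighbour search, not FAISS itself, and `l2_search` is a hypothetical name.

```python
def l2_search(query: list[float], vectors: list[list[float]], k: int = 4) -> list[tuple[int, float]]:
    """Exact nearest-neighbour search over every stored vector.

    Returns (index, squared_L2_distance) pairs for the k closest vectors,
    nearest first.
    """
    def squared_l2(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]
```

A flat index compares the query against every vector, which is exact and simple; for a few thousand chunks from a handful of books that cost is usually negligible.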

### **Create a question-answering chain with Gemini**

In [25]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate

def get_conversational_chain():

    prompt_template = """
    Answer the question as detailed as possible from the provided context,
    making sure to provide all the details. If the answer is not available
    in the context, just say "answer is not available in the context,"
    and do not provide the wrong answer.

    Context:

    {context}

    Question:

    {question}

    Answer:
    """

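    # Model names change over time; if "gemini-pro" is no longer served,
    # substitute a current model name from the Gemini API model list
    model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.3,
                                   google_api_key=GOOGLE_API_KEY)

    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)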

    return chain
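With `chain_type="stuff"`, all retrieved chunks are "stuffed" into the single `{context}` slot of the prompt and sent to the model in one call. The sketch below mimics that step with plain string formatting, assuming the chunks are joined with a blank line (our understanding of LangChain's default separator for this chain). The shortened template and the helper `stuff_prompt` are hypothetical illustrations, not LangChain's code.

```python
PROMPT_TEMPLATE = """Context:

{context}

Question:

{question}

Answer:"""


def stuff_prompt(doc_texts: list[str], question: str,
                 template: str = PROMPT_TEMPLATE, separator: str = "\n\n") -> str:
    """Join every retrieved chunk into one context string and fill the template."""
    context = separator.join(doc_texts)
    return template.format(context=context, question=question)
```

Because everything goes into one prompt, the total size of the retrieved chunks must fit in the model's context window; that is the main trade-off of the "stuff" strategy compared with chain types that process documents one at a time.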



In [31]:
user_input = "What is PEFT?" # @param {type: "string"}

# Reload the saved index with the same embedding model used to build it
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=GOOGLE_API_KEY)
# Recent langchain-community versions refuse to unpickle a local index unless you opt in;
# only do this for index files you created yourself
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)

# Retrieve the chunks most similar to the question (4 by default)
docs = new_db.similarity_search(user_input)

# Prepare the input dictionary with context and question
input_data = {"input_documents": docs, "question": user_input}

chain = get_conversational_chain()

# Run the chain and print the generated answer
response = chain.invoke(input_data)
print(response["output_text"])




    Parameter-efficient fine-tuning (PEFT) is a set of techniques that allow you to fine-tune your models while utilizing less compute resources. It involves freezing the parameters of the pre-trained model and fine-tuning a smaller set of parameters.
