<a href="https://colab.research.google.com/github/kdhaw6/LLM-RAG-based-system/blob/main/Capstone_Project_LLM_RAG_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this capstone project, we build an industry-specific Large Language Model (LLM) bot on top of pre-trained models from sources such as Hugging Face or Google Gemini. The goal is a bot that answers user questions and provides insights specific to a chosen industry. Along the way, the project builds technical skills and a working understanding of that industry's nuances, challenges, and trends.

The chosen industry is Technology and Information Technology (IT). Because this is a retrieval-augmented generation (RAG) system that answers from the uploaded documents, PDFs from any industry can be used.
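
To make the RAG idea concrete: the bot first retrieves the parts of the PDF most similar to the question, then hands only those parts to the model as context. The following is a minimal sketch of that flow, assuming a toy bag-of-words similarity in place of the Gemini embeddings and FAISS index used later in this notebook. The helpers `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical, and the sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Place the retrieved chunks in front of the question for the LLM."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer:"


# Hypothetical chunks; in the notebook these come from the uploaded PDF.
chunks = [
    "Attention maps a query and key-value pairs to an output.",
    "The invoice is due within thirty days of receipt.",
    "Kubernetes schedules containers across a cluster of nodes.",
]
top = retrieve("What does attention map a query to?", chunks, k=1)
print(build_prompt("What does attention map a query to?", top))
```

Because the model only sees whatever `retrieve` returns, swapping in a PDF from a different industry changes the answers without changing any code.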

Installing the Libraries

In [2]:
!pip install streamlit PyPDF2 langchain langchain_google_genai faiss-cpu google-generativeai python-dotenv langchain_community

Collecting streamlit
  Downloading streamlit-1.44.1-py3-none-any.whl.metadata (8.9 kB)
Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Collecting langchain_google_genai
  Downloading langchain_google_genai-2.1.2-py3-none-any.whl.metadata (4.7 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Collecting langchain_community
  Downloading langchain_community-0.3.21-py3-none-any.whl.metadata (2.4 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl.metadata (44 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.3/44.3 kB[0m [31m2.1 MB/s[0m eta [36m0:00:00[0m
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Collecting filetype<2.0.0,>=1.2.0 (from langchain_goo

In [3]:
!pip install -U google-generativeai langchain-google-genai

Collecting langchain-google-genai
  Using cached langchain_google_genai-2.1.2-py3-none-any.whl.metadata (4.7 kB)
INFO: pip is looking at multiple versions of langchain-google-genai to determine which version is compatible with other requirements. This could take a while.
  Using cached langchain_google_genai-2.1.1-py3-none-any.whl.metadata (4.7 kB)
  Using cached langchain_google_genai-2.1.0-py3-none-any.whl.metadata (3.6 kB)
  Using cached langchain_google_genai-2.0.11-py3-none-any.whl.metadata (3.6 kB)


Importing all the required libraries

In [4]:
import streamlit as st
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import google.generativeai as genai
from langchain_community.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from google.colab import files
import tempfile

Configuring the Gemini API key to access the Google Gemini model

In [7]:
# Run this with your own key. The value below is a placeholder to avoid misuse.
GEMINI_API_KEY = 'asfasfdsfasfc'

In [8]:
# For Google Colab, we set the API key directly instead of loading it from a .env file
api_key = GEMINI_API_KEY
os.environ["GOOGLE_API_KEY"] = api_key
genai.configure(api_key=api_key)
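
Hardcoding a key in a notebook cell makes it easy to leak when the notebook is shared. Below is a minimal sketch of an alternative, assuming the key is either already set in the `GOOGLE_API_KEY` environment variable or typed at a hidden prompt. `resolve_api_key` is a hypothetical helper, not part of any Google library.

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt_fn=getpass):
    """Return the key from GOOGLE_API_KEY, else ask for it without echoing."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        key = prompt_fn("Enter your Gemini API key: ").strip()
    if not key:
        raise ValueError("No API key provided")
    return key


# Usage in the notebook (commented out so the sketch does not prompt when run):
# api_key = resolve_api_key()
# os.environ["GOOGLE_API_KEY"] = api_key
# genai.configure(api_key=api_key)
```

The `env` and `prompt_fn` parameters exist only so the helper can be exercised without a real environment or keyboard input.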

In [9]:
# Extract the text of every page from each PDF
def get_pdf_text(pdf_files):
    text = ""
    for pdf_file in pdf_files:
        pdf_reader = PdfReader(pdf_file)
        for page in pdf_reader.pages:
            # Guard against pages with no extractable text (e.g. scanned images)
            text += page.extract_text() or ""
    return text

# Split the text into smaller, overlapping chunks
def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
    chunks = text_splitter.split_text(text)
    return chunks

# Embed the chunks as vectors and save the FAISS index locally
def get_vector_store(text_chunks):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
    vector_store.save_local("faiss_index")
    return vector_store

# Build the prompt that tells the model how to behave
def get_conversational_chain():
    prompt_template = """
    Answer the question in as much detail as possible using only the provided context. If the answer is not in the
    provided context, just say "answer is not available in the context"; do not provide a wrong answer.\n\n
    Context:\n {context}\n
    Question: \n{question}\n

    Answer:
    """

    model = ChatGoogleGenerativeAI(model="gemini-1.5-pro-latest", temperature=0.3)
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)
    return chain
# Take a question as input and return the model's answer
def process_question(user_question, vector_store):
    docs = vector_store.similarity_search(user_question)
    chain = get_conversational_chain()

    response = chain(
        {"input_documents": docs, "question": user_question},
        return_only_outputs=True
    )

    print(response["output_text"])
    return response["output_text"]

# For Colab, we'll use a simpler approach instead of Streamlit
print("PDF Chat with Gemini")
print("Upload your PDF files:")

uploaded = files.upload()
pdf_files = []

# Process uploaded files
for filename, content in uploaded.items():
    with tempfile.NamedTemporaryFile(delete=False) as temp_file:
        temp_file.write(content)
        pdf_files.append(temp_file.name)

# Process PDFs
if pdf_files:
    print("Processing PDFs...")
    raw_text = get_pdf_text(pdf_files)
    text_chunks = get_text_chunks(raw_text)
    vector_store = get_vector_store(text_chunks)
    print("Processing complete! You can now ask questions.")

    # Simple question loop
    while True:
        question = input("\nAsk a question (or type 'exit' to quit): ")
        if question.lower() == 'exit':
            break
        answer = process_question(question, vector_store)
        print(f"Answer: {answer}")
else:
    print("No PDF files were uploaded.")

# Clean up temporary files
for file_path in pdf_files:
    if os.path.exists(file_path):
        os.unlink(file_path)

PDF Chat with Gemini
Upload your PDF files:


Saving NIPS-2017-attention-is-all-you-need-Paper.pdf to NIPS-2017-attention-is-all-you-need-Paper.pdf
Processing PDFs...
Processing complete! You can now ask questions.

Ask a question (or type 'exit' to quit): what is attention and why is it required


stuff: https://python.langchain.com/docs/versions/migrating_chains/stuff_docs_chain
map_reduce: https://python.langchain.com/docs/versions/migrating_chains/map_reduce_chain
refine: https://python.langchain.com/docs/versions/migrating_chains/refine_chain
map_rerank: https://python.langchain.com/docs/versions/migrating_chains/map_rerank_docs_chain

See also guides on retrieval and question-answering here: https://python.langchain.com/docs/how_to/#qa-with-rag
  chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)
  response = chain(


Attention is a function that maps a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors.  The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key. It is required because it allows modeling of dependencies without regard to their distance in the input or output sequences.  In the Transformer model, attention entirely replaces the recurrent layers typically used in encoder-decoder architectures.
Answer: Attention is a function that maps a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors.  The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key. It is required because it allows modeling of dependencies without regard to their distance in the 
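
The `chunk_size=10000` and `chunk_overlap=1000` arguments passed to `RecursiveCharacterTextSplitter` in the cell above control how much text each chunk holds and how much neighbouring chunks share. The sketch below illustrates the overlap idea with a plain fixed-size sliding window. It is a simplified stand-in: the real splitter prefers to break on separators such as paragraph and line breaks, so its chunks are usually not exact windows. `sliding_chunks` is a hypothetical helper, and the tiny sizes are chosen only to keep the output readable.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.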

In [None]:
import google.generativeai as genai

# After configuring your API key
for model in genai.list_models():
    if "gemini" in model.name.lower():
        print(model.name)

models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/gemini-2.5-pro-exp-03-25
models/gemini-2.5-pro-preview-03-25
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-flash-lite-preview
models/gemini-2.0-pro-exp
models/gemini-2.0-pro-exp-02-05
models/gemini-exp-1206
models/gemini-2.0-flash-thinking-exp-01-21
models/gemini-2.0-flash-thinking-exp
models/gemini-2.0-flash-thinki
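
The listing above mixes chat, vision, tuning, and experimental variants. A possible refinement, sketched here under the assumption that each object returned by `genai.list_models()` exposes `name` and `supported_generation_methods`, is to keep only the Gemini models that advertise `generateContent`. `chat_capable_gemini_models` is a hypothetical helper, and the stand-in objects below exist only so the example runs without an API key.

```python
from types import SimpleNamespace


def chat_capable_gemini_models(models):
    """Keep Gemini models that advertise the generateContent method."""
    return [
        m.name
        for m in models
        if "gemini" in m.name.lower()
        and "generateContent" in m.supported_generation_methods
    ]


# Stand-in objects so the sketch runs offline; with a configured key,
# pass genai.list_models() instead.
fake_models = [
    SimpleNamespace(name="models/gemini-1.5-pro-latest",
                    supported_generation_methods=["generateContent", "countTokens"]),
    SimpleNamespace(name="models/embedding-001",
                    supported_generation_methods=["embedContent"]),
]
print(chat_capable_gemini_models(fake_models))
```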