<a href="https://colab.research.google.com/github/sahilmehta1205/Stack-Story/blob/main/LangChain_Chroma_Chatbot_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚀 LangChain + Chroma Chatbot Workshop
This notebook demonstrates how to build a **retrieval-based chatbot** using:

- **LangChain**: Framework for connecting LLMs to tools & data  
- **Chroma**: Vector database for embeddings & retrieval  
- **Open/Free LLMs**: e.g., Zephyr, Mistral  
- **PDF Upload**: Query your own documents interactively  

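**Goal:** Upload a PDF and ask questions about its content using an LLM.

## 1️⃣ Install Dependencies
Install LangChain, Chroma, the embedding and model libraries, Gradio, and the PDF parser.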

In [2]:
!pip install -q langchain langchain-community chromadb sentence-transformers transformers accelerate bitsandbytes gradio pypdf

## 2️⃣ Upload Your PDF
Use the file uploader to provide a PDF document.  

- You can upload **any PDF**: research paper, policy doc, or internal report.  
- The notebook will automatically extract text and split it into chunks for better retrieval.


In [3]:
from google.colab import files
print("📤 Please upload a PDF file to analyze...")
uploaded = files.upload()
pdf_path = list(uploaded.keys())[0]
print(f"✅ Uploaded file: {pdf_path}")

📤 Please upload a PDF file to analyze...


Saving Bajaj Finance Q1 FY26 Investor Presentation.pdf to Bajaj Finance Q1 FY26 Investor Presentation.pdf
✅ Uploaded file: Bajaj Finance Q1 FY26 Investor Presentation.pdf
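
Because any file can be selected in the upload dialog, it may help to confirm that the upload really is a PDF before parsing it. The sketch below is a minimal illustration and not part of the original notebook: the helper names `looks_like_pdf` and `first_pdf` are hypothetical, and `demo_upload` stands in for the filename-to-bytes dictionary that `files.upload()` returns. It only checks for the `%PDF-` marker at the start of the file, which is a quick heuristic rather than full validation.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF header marker '%PDF-'."""
    return data.startswith(b"%PDF-")


def first_pdf(uploaded: dict) -> str:
    """Return the name of the first uploaded file that looks like a PDF."""
    for name, data in uploaded.items():
        if looks_like_pdf(data):
            return name
    raise ValueError("No PDF found among the uploaded files")


# Hypothetical stand-in for the dict returned by files.upload()
demo_upload = {"notes.txt": b"hello", "report.pdf": b"%PDF-1.7\n..."}
print(first_pdf(demo_upload))
```

In the notebook itself, `pdf_path = first_pdf(uploaded)` could replace the `list(uploaded.keys())[0]` lookup if this check is wanted.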


## 3️⃣ Load and Split the PDF
- `PyPDFLoader` extracts the text of each PDF page  
- `RecursiveCharacterTextSplitter` splits that text into overlapping chunks  

Splitting lets the retriever hand the LLM a few small, relevant pieces of context instead of the whole document.


In [4]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader(pdf_path)
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_documents(docs)

print(f"✅ Loaded {len(chunks)} chunks from {pdf_path}")

✅ Loaded 112 chunks from Bajaj Finance Q1 FY26 Investor Presentation.pdf
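
To build intuition for what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window splitter in plain Python. It is an illustration only: `split_with_overlap` is a hypothetical helper, and `RecursiveCharacterTextSplitter` behaves differently in practice because it tries to break on separators such as paragraphs and sentences rather than at fixed character offsets.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each printed chunk begins with the last three characters of the previous one. That shared region is what the overlap buys: a sentence that straddles a boundary still appears intact in at least one chunk.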


## 4️⃣ Create Embeddings & Store in Chroma
- Convert each text chunk into a **vector embedding** using `SentenceTransformerEmbeddings` with the `all-MiniLM-L6-v2` model  
- Store the vectors in **Chroma** for fast similarity-based retrieval  
- `retriever` returns the 3 most similar chunks (`k=3`) for each query


In [5]:
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = SentenceTransformerEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = Chroma.from_documents(chunks, embedding=embeddings)
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
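
Similarity-based retrieval can be sketched without any vector database: score every stored vector against the query vector and keep the best `k`. The example below uses cosine similarity on tiny hand-written 2-D vectors. The function names and toy data are assumptions for illustration; Chroma's actual distance metric and index structure are configurable and may differ from this brute-force version.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy "embeddings": real ones from all-MiniLM-L6-v2 have many more dimensions
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
toy_query = [0.9, 0.1]
best = top_k(toy_query, toy_docs, k=2)
print(best)
```

The query points mostly along the first axis, so the first vector ranks highest and the diagonal one comes second.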


## 5️⃣ Load Free/Open LLM
- Using `HuggingFacePipeline` to wrap a model for LangChain  
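- Recommended models for Colab: `HuggingFaceH4/zephyr-7b-alpha` or a Mistral-7B-Instruct release such as `mistralai/Mistral-7B-Instruct-v0.2`  
- 8-bit quantization reduces the memory footprint. Recent `transformers` releases expect it to be passed as a `BitsAndBytesConfig` in `quantization_config`; the bare `load_in_8bit=True` argument is deprecated.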


In [None]:
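from langchain_community.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "HuggingFaceH4/zephyr-7b-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Request 8-bit weights through BitsAndBytesConfig; passing load_in_8bit=True
# directly to from_pretrained is deprecated in recent transformers releases.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place layers on the available GPU/CPU automatically
    dtype="auto",       # named `torch_dtype` in older transformers releases
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

# Wrap the model in a text-generation pipeline so LangChain can call it
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=pipe)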
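
The effect of quantization on memory can be estimated with simple arithmetic: weight storage is roughly the number of parameters times the bytes per parameter. The sketch below assumes a round figure of 7 billion parameters for a "7B" model and counts weights only; activations, the KV cache, and framework overhead come on top, so treat the numbers as lower bounds. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 7e9  # assumed round figure for a "7B" model
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 8-bit weights halves the estimate from about 14 GB to about 7 GB, which is why 8-bit loading is attractive on a memory-limited Colab GPU.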


## 6️⃣ Build Retrieval-QA Chain
- Combines the **retriever** with the **LLM**  
- Workflow: retrieve the most relevant chunks, insert them into the prompt as context, and let the LLM generate an answer


In [None]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
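
The retrieve-then-generate workflow can be sketched in a few lines of plain Python. This is a conceptual illustration, not LangChain's implementation: `answer_with_context`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins, and the prompt wording is an assumption rather than the template `RetrievalQA` actually uses.

```python
from typing import Callable, List


def answer_with_context(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Fetch relevant chunks, place them in one prompt, and ask the model."""
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)


seen_prompts = []  # records what the stand-in model receives


def fake_retrieve(question: str) -> List[str]:
    """Stand-in for the retriever: always returns two fixed chunks."""
    return ["Chunk A about revenue.", "Chunk B about costs."]


def fake_generate(prompt: str) -> str:
    """Stand-in for the LLM: stores the prompt and returns a canned reply."""
    seen_prompts.append(prompt)
    return "stub answer"


result = answer_with_context("What drove revenue?", fake_retrieve, fake_generate)
print(result)
print(seen_prompts[0])
```

Printing the captured prompt shows that the model never sees the whole PDF, only the retrieved chunks plus the question, so answer quality depends heavily on what the retriever returns.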

## 7️⃣ Try Asking a Question
- Ask **any question** about your uploaded PDF  
- Example query: "Summarize the document briefly."


In [None]:
query = "Summarize the document briefly."
print("❓ Query:", query)
# invoke() returns a dict; the generated answer is under the "result" key
print("💬 Answer:", qa_chain.invoke({"query": query})["result"])

## 8️⃣ Interactive Chat UI
- Launch a Gradio interface for interactive queries  
- Users can type questions and get answers from the PDF in real time


In [None]:
import gradio as gr

def chat_fn(query):
    # invoke() returns a dict; the generated answer is under the "result" key
    return qa_chain.invoke({"query": query})["result"]

iface = gr.Interface(fn=chat_fn, inputs="text", outputs="text", title="📚 Ask Your PDF (LangChain + Chroma)")
iface.launch()
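
An interactive UI will receive empty submissions and occasional model errors, so it can be useful to wrap the answer function before handing it to Gradio. The sketch below is one possible approach, not part of the original notebook: `make_chat_fn` is a hypothetical helper, and the lambda stands in for the real call into `qa_chain`.

```python
from typing import Callable


def make_chat_fn(answer: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap an answer function with basic input checks and error handling."""

    def chat(query: str) -> str:
        query = (query or "").strip()
        if not query:
            return "Please enter a question."
        try:
            return answer(query)
        except Exception as exc:  # show the error in the UI instead of failing silently
            return f"Something went wrong: {exc}"

    return chat


# Stand-in answer function; in the notebook this would call qa_chain
demo_chat = make_chat_fn(lambda q: f"echo: {q}")
print(demo_chat("   "))
print(demo_chat("What is the loan growth?"))
```

The wrapped function has the same one-string-in, one-string-out shape as `chat_fn`, so it could be passed to `gr.Interface(fn=...)` unchanged.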