In [49]:
import os
from dotenv import load_dotenv
load_dotenv()

groq_api=os.getenv("FELLOWSHIP_GROQ_KEY")
pinecone_api=os.getenv("PINECONE_API_KEY")

if groq_api:
    print("API loaded successfully.")
else:
    print("API not loaded.")
if pinecone_api:
    print("API loaded successfully.")
else:
    print("API not loaded.")

API loaded successfully.
API loaded successfully.


In [50]:
# initialize GROQ chat model
from langchain_groq import ChatGroq
chat=ChatGroq(
    groq_api_key=groq_api,
    model_name="Llama-3.3-70B-Versatile"
)

In [51]:
from langchain.schema import AIMessage, HumanMessage, SystemMessage
messages=[
    SystemMessage(content="You are a helpful assistant"),
    HumanMessage(content="Who is the founder of Pakistan?"),
    AIMessage(content="Quaid-e-Azam Muhammad Ali Jinnah was the founder of Pakistan."),
    HumanMessage(content="I'd like to know about the history of Pakistan.")
]

In [52]:
res=chat.invoke(messages)
print(res.content)

The history of Pakistan is a rich and complex one, spanning over a thousand years. Here's a brief overview:

**Ancient History (3300 BCE - 500 CE):**
The region that is now Pakistan has been inhabited by various civilizations, including the Indus Valley Civilization, which is considered one of the oldest civilizations in the world. The region was later conquered by the Persians, Greeks, and Arabs, who introduced Islam to the region.

**Medieval History (500 - 1500 CE):**
In the medieval period, Pakistan was ruled by various Muslim dynasties, including the Ghaznavids, Ghurids, and Delhi Sultanate. The region was a major center of trade, culture, and learning, with cities like Lahore and Multan becoming important hubs.

**Mughal Era (1526 - 1756 CE):**
The Mughal Empire, founded by Babur, ruled Pakistan for over two centuries. During this period, the region experienced a golden age of culture, architecture, and art. The Mughals built many iconic monuments, including the Badshahi Mosque a

In [53]:
messages.append(res)


In [54]:
messages

[SystemMessage(content='You are a helpful assistant', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Who is the founder of Pakistan?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Quaid-e-Azam Muhammad Ali Jinnah was the founder of Pakistan.', additional_kwargs={}, response_metadata={}),
 HumanMessage(content="I'd like to know about the history of Pakistan.", additional_kwargs={}, response_metadata={}),
 AIMessage(content="The history of Pakistan is a rich and complex one, spanning over a thousand years. Here's a brief overview:\n\n**Ancient History (3300 BCE - 500 CE):**\nThe region that is now Pakistan has been inhabited by various civilizations, including the Indus Valley Civilization, which is considered one of the oldest civilizations in the world. The region was later conquered by the Persians, Greeks, and Arabs, who introduced Islam to the region.\n\n**Medieval History (500 - 1500 CE):**\nIn the medieval period, Pakistan was ruled by vari

In [55]:
prompt=HumanMessage(content="Who was the first prime minister of Pakistan?")
messages.append(prompt)
res=chat.invoke(messages)
print(res.content)

The first Prime Minister of Pakistan was Liaquat Ali Khan. He served as the Prime Minister from August 14, 1947, to October 16, 1951. Liaquat Ali Khan was a close associate of Muhammad Ali Jinnah, the founder of Pakistan, and played a key role in the country's early years. He was instrumental in shaping the country's constitution, economy, and foreign policy. Unfortunately, his tenure was cut short when he was assassinated on October 16, 1951, in Rawalpindi.


In [56]:
%pip install PyMuPDF

Note: you may need to restart the kernel to use updated packages.


In [57]:
# Load PDF Document as Knowledge Source
import fitz
pdf_path="text.pdf"
text=""
with fitz.open(pdf_path) as doc:
    for page in doc:
        text+=page.get_text("text")
    
print(text)

Title: A Short History of Pakistan
Pakistan was created on August 14, 1947, after the partition of British India. 
The demand for a separate homeland for Muslims was led by Quaid-e-Azam 
Muhammad Ali Jinnah. 
He envisioned Pakistan as a country where Muslims could freely practice their 
religion and culture.
The new country faced major challenges from the beginning — a shortage of 
resources, migration of millions of refugees, and administrative difficulties. 
Despite these hardships, Pakistan quickly established its government 
institutions.
Liaquat Ali Khan became the first Prime Minister of Pakistan, while Muhammad 
Ali Jinnah served as the first Governor-General.
In 1956, Pakistan adopted its first constitution, officially becoming the Islamic 
Republic of Pakistan. 
The country experienced periods of democracy and military rule throughout its 
history.
In 1971, East Pakistan separated and became the independent country of 
Bangladesh after a civil war.
Today, Pakistan is known for

In [58]:
# Split PDF text into small chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter=RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts=splitter.split_text(text)
print(texts)

['Title: A Short History of Pakistan\nPakistan was created on August 14, 1947, after the partition of British India. \nThe demand for a separate homeland for Muslims was led by Quaid-e-Azam \nMuhammad Ali Jinnah. \nHe envisioned Pakistan as a country where Muslims could freely practice their \nreligion and culture.\nThe new country faced major challenges from the beginning — a shortage of \nresources, migration of millions of refugees, and administrative difficulties.', 'resources, migration of millions of refugees, and administrative difficulties. \nDespite these hardships, Pakistan quickly established its government \ninstitutions.\nLiaquat Ali Khan became the first Prime Minister of Pakistan, while Muhammad \nAli Jinnah served as the first Governor-General.\nIn 1956, Pakistan adopted its first constitution, officially becoming the Islamic \nRepublic of Pakistan. \nThe country experienced periods of democracy and military rule throughout its \nhistory.', 'The country experienced peri
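
The output above shows the second chunk starting with text that also ends the first one; that repetition is what `chunk_overlap` produces. The sketch below illustrates the idea with plain fixed-width character windows. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraphs, lines, and spaces); the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(demo)
```

Each window repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.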

In [59]:
# Convert into DataFrame for batching
import pandas as pd
data=pd.DataFrame({"chunks":texts})
print(f"Loaded and split PDF into {len(data)} chunks")


Loaded and split PDF into 3 chunks


In [60]:
# Create Pinecone Index & Embeddings
from pinecone import Pinecone, AwsRegion, ServerlessSpec, Metric, CloudProvider
from langchain_huggingface import HuggingFaceEmbeddings
from tqdm.auto import tqdm

pc=Pinecone(api_key=pinecone_api)
index_name="pakistan-history"


In [None]:
# Delete if exists
if index_name in [i["name"] for i in pc.list_indexes()]:
    pc.delete_index(index_name)

In [62]:
# Create new index
pc.create_index(
    name=index_name,
    metric=Metric.DOTPRODUCT,
    dimension=384,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS,
        region=AwsRegion.US_EAST_1
    )
)

{
    "name": "pakistan-history",
    "metric": "dotproduct",
    "host": "pakistan-history-15e932d.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "vector_type": "dense",
    "dimension": 384,
    "deletion_protection": "disabled",
    "tags": null
}

In [63]:
# Connect to the index
index=pc.Index(index_name)

In [64]:
# Embedding Model
embed_model=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
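
The index was created with `dimension=384`, which matches the 384-dimensional vectors that `all-MiniLM-L6-v2` produces, and with the dot-product metric. Dot product gives the same ranking as cosine similarity only when the vectors have unit length; whether this embedding setup returns normalized vectors is an assumption worth checking. A minimal sketch of that relationship, using toy 2-d vectors and hypothetical helper names:

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: dot product divided by both lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = normalize(a), normalize(b)

raw_dot = dot(a, b)             # depends on vector length
cos_sim = cosine(a, b)          # independent of vector length
unit_dot = dot(a_unit, b_unit)  # equals the cosine once both are unit length
print(raw_dot, cos_sim, unit_dot)
```

If the stored vectors were not normalized, longer vectors would score higher under dot product regardless of direction, so the metric choice and the embedding model should be considered together.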


In [67]:
batch_size = 100
for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i + batch_size)
    batch = data.iloc[i:i_end]
    ids = [f"chunk-{j}" for j in range(i, i_end)]
    texts = batch["chunks"].tolist()
    embeds = embed_model.embed_documents(texts)
    metadata = [{"text": x} for x in texts]
    index.upsert(vectors=zip(ids, embeds, metadata))

print("All chunks uploaded to Pinecone!")

100%|██████████| 1/1 [00:00<00:00,  1.15it/s]

All chunks uploaded to Pinecone!
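
The upload loop above builds IDs, embeddings, and metadata per batch and zips them into `(id, vector, metadata)` tuples. The same pattern can be sketched as a small generator that is testable without Pinecone. The names `batched_records` and `fake_embed` are hypothetical, and `fake_embed` only stands in for `embed_model.embed_documents`.

```python
from typing import Callable, Iterator, List, Tuple

Record = Tuple[str, List[float], dict]


def batched_records(
    chunks: List[str],
    embed: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,
) -> Iterator[List[Record]]:
    """Yield lists of (id, vector, metadata) tuples, one list per batch."""
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        vectors = embed(batch)  # one vector per chunk in this batch
        yield [
            (f"chunk-{start + offset}", vector, {"text": text})
            for offset, (text, vector) in enumerate(zip(batch, vectors))
        ]


def fake_embed(texts: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for a real embedding model: 1-d vector of text length.
    return [[float(len(t))] for t in texts]


batches = list(batched_records(["a", "bb", "ccc"], fake_embed, batch_size=2))
print(batches)
```

In the notebook, each yielded list could be passed as `index.upsert(vectors=batch)`. Because the IDs are derived from chunk position, re-running the upload overwrites the same records instead of adding new ones.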





In [73]:
# Create VectorStore
from langchain_pinecone import PineconeVectorStore
text_field="text"
vectorstore=PineconeVectorStore(
    index=index, 
    embedding=embed_model,
    text_key=text_field
)

In [74]:
def augment_prompt(query:str):
    results=vectorstore.similarity_search(query,k=3)
    source_knowledge="\n".join([x.page_content for x in results])
    augmented_prompt=f"""Using the context below, answer the query.
Context:
{source_knowledge}
Query:
{query}
"""
    return augmented_prompt
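
`augment_prompt` depends on the live `vectorstore`, which makes it hard to try out offline. One possible variation, sketched under the assumption that the search step can be passed in as a function, keeps the same prompt layout and also drops duplicate passages. `build_augmented_prompt`, `fake_search`, and `PASSAGES` are hypothetical names for illustration only.

```python
from typing import Callable, List

PASSAGES = [
    "Liaquat Ali Khan became the first Prime Minister of Pakistan.",
    "In 1956, Pakistan adopted its first constitution.",
]


def fake_search(query: str, k: int) -> List[str]:
    # Hypothetical keyword scorer standing in for vectorstore.similarity_search.
    words = {w.lower().strip("?.,") for w in query.split()}
    ranked = sorted(PASSAGES, key=lambda p: -sum(w in p.lower() for w in words))
    return ranked[:k]


def build_augmented_prompt(
    query: str, search: Callable[[str, int], List[str]], k: int = 3
) -> str:
    """Retrieve up to k passages, drop duplicates, and wrap them around the query."""
    passages = list(dict.fromkeys(search(query, k)))  # order-preserving de-dup
    context = "\n".join(passages)
    return (
        "Using the context below, answer the query.\n"
        f"Context:\n{context}\n"
        f"Query:\n{query}\n"
    )


prompt_text = build_augmented_prompt("Who was the first Prime Minister?", fake_search, k=1)
print(prompt_text)
```

With the real store, the `search` argument could be `lambda q, k: [d.page_content for d in vectorstore.similarity_search(q, k=k)]`.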

In [None]:
query = "What are the main events in Pakistan’s independence movement?"
prompt=HumanMessage(content=augment_prompt(query))
messages.append(prompt)
res=chat.invoke(messages)
res.content

"The context provided does not explicitly mention the main events in Pakistan's independence movement. However, it does mention the following key events and figures that contributed to Pakistan's independence:\n\n1. The demand for a separate homeland for Muslims was led by Quaid-e-Azam Muhammad Ali Jinnah.\n2. Pakistan was created on August 14, 1947, after the partition of British India.\n\nTo provide a more comprehensive answer, some of the main events in Pakistan's independence movement include:\n\n* The Pakistan Resolution (1940): The Muslim League, led by Muhammad Ali Jinnah, passed a resolution demanding a separate homeland for Muslims.\n* The Partition of British India (1947): The British Indian Empire was divided into two separate countries: India and Pakistan.\n* The role of key figures: Quaid-e-Azam Muhammad Ali Jinnah, Liaquat Ali Khan, and other leaders played important roles in the independence movement.\n\nNote that the provided context is limited, and a more detailed acco

In [77]:
prompt = HumanMessage(content=augment_prompt("Who was Pakistan's first Governor-General?"))
res = chat.invoke(messages + [prompt])
res.content

"Muhammad Ali Jinnah served as Pakistan's first Governor-General."

In [78]:
# Optional: Delete index after testing
# pc.delete_index(index_name)

#### New Work

In [80]:
import os
from dotenv import load_dotenv
load_dotenv()

groq_api=os.getenv("FELLOWSHIP_GROQ_KEY")
pinecone_api=os.getenv("PINECONE_API_KEY")
index_name="pakistan-history"
pdf_path="text.pdf"
dimension=384


if not pinecone_api:
    raise ValueError("Set PINECONE_API_KEY in your .env")
if not groq_api:
    raise ValueError("Set FELLOWSHIP_GROQ_KEY in your .env")
if not os.path.exists(pdf_path):
    raise FileNotFoundError(f"PDF file not found: {pdf_path}")

In [81]:
import fitz
import pandas as pd
from tqdm.auto import tqdm

from langchain_groq import ChatGroq
from langchain.schema import SystemMessage, AIMessage, HumanMessage

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings

from langchain_pinecone import PineconeVectorStore
from pinecone import AwsRegion, CloudProvider, Pinecone, ServerlessSpec, Metric


In [82]:
chat=ChatGroq(
    groq_api_key=groq_api,
    model_name="Llama-3.3-70B-Versatile"
)

In [84]:
def load_pdf_split(path, chunk_size=500, chunk_overlap=100):
    """Extract text from the PDF at `path` and split it into overlapping chunks."""
    text=""
    with fitz.open(path) as doc:
        for page in doc:
            text+=page.get_text("text") + "\n"
    # Use the function arguments rather than hard-coded values
    splitter=RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    chunks=splitter.split_text(text)
    return chunks

print("Loading and splitting PDF...")
chunks=load_pdf_split(pdf_path)
data=pd.DataFrame({"chunks":chunks})
print(f"Loaded and split PDF into {len(data)} chunks.")
            

Loading and splitting PDF...
Loaded and split PDF into 3 chunks.


In [85]:
pc=Pinecone(api_key=pinecone_api)
index_name="pakistan-history"

In [None]:
if index_name in [i["name"] for i in pc.list_indexes()]:
    pc.delete_index(index_name)

In [None]:
# Create new index
pc.create_index(
    name=index_name,
    metric=Metric.DOTPRODUCT,
    dimension=384,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS,
        region=AwsRegion.US_EAST_1
    )
)

In [None]:
index=pc.Index(index_name)

In [None]:
embed_model=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


In [86]:
# if index has no vectors, upload; otherwise skip to avoid duplicates.
stats=index.describe_index_stats()
total_vectors = stats.get("namespaces", {}).get("", {}).get("vector_count", 0) if stats else 0


In [88]:
if total_vectors == 0:
    print("Index is empty — uploading chunks...")
    batch_size = 100
    for i in tqdm(range(0, len(data), batch_size)):
        i_end = min(len(data), i + batch_size)
        batch = data.iloc[i:i_end]
        ids = [f"chunk-{j}" for j in range(i, i_end)]
        texts = batch["chunks"].tolist()
        embeds = embed_model.embed_documents(texts)
        metadata = [{"text": t} for t in texts]
        # Pinecone expects iterable of (id, vector, metadata)
        index.upsert(vectors=zip(ids, embeds, metadata))
    print("Upload complete.")
else:
    print(f"Index already contains vectors ({total_vectors}) — skipping upload.")


Index already contains vectors (3) — skipping upload.


In [89]:
vectorstore=PineconeVectorStore(
    index=index, 
    embedding=embed_model,
    text_key="text"  # metadata field that stores each chunk's text
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})


In [90]:
system_prompt=SystemMessage(content="You are a helpful assistant. Use the provided document context to answer user queries. If answer is not in the context, say you don't know.")
message_history=[system_prompt]


In [94]:
def get_context_for_query(query: str, k: int = 3) -> str:
    """Return the top-k retrieved chunks joined by blank lines."""
    docs = vectorstore.similarity_search(query, k=k)
    texts = [d.page_content for d in docs]
    return "\n\n".join(texts)

In [95]:
def build_user_prompt_with_context(query: str) -> HumanMessage:
    context = get_context_for_query(query)
    augmented = f"Context:\n{context}\n\nQuestion:\n{query}\n\nAnswer using ONLY the context above. If the context does not contain the answer, say 'I don't know from the provided documents.'"
    return HumanMessage(content=augmented)

In [None]:
print("\nConversational RAG Chatbot ready. Type your question and press Enter.")
print("Type 'exit' or 'quit' to stop.\n")

while True:
    try:
        user_input = input("You: ").strip()
    except (KeyboardInterrupt, EOFError):
        print("\nExiting...")
        break
    if not user_input:
        continue
    if user_input.lower() in ("exit", "quit", "bye"):
        print("Goodbye")
        break

    # Build prompt with retrieved context
    user_msg = build_user_prompt_with_context(user_input)

    # Append user message to history
    message_history.append(user_msg)

    # Send the full conversation history (system + prior messages + this user message) to Groq
    response = chat.invoke(message_history)
    bot_text = response.content

    # Print assistant response
    print("Bot:", bot_text)

    # Append assistant reply to history (so next turn has context)
    message_history.append(AIMessage(content=bot_text))



Conversational RAG Chatbot ready. Type your question and press Enter.
Type 'exit' or 'quit' to stop.

Bot: I don't know from the provided documents.
Bot: Pakistan was created on August 14, 1947, after the partition of British India, led by Quaid-e-Azam Muhammad Ali Jinnah, who envisioned it as a country where Muslims could freely practice their religion and culture. The country faced major challenges from the beginning, including a shortage of resources, migration of millions of refugees, and administrative difficulties. Despite these hardships, Pakistan quickly established its government institutions, with Liaquat Ali Khan as the first Prime Minister and Muhammad Ali Jinnah as the first Governor-General. In 1956, Pakistan adopted its first constitution, becoming the Islamic Republic of Pakistan. The country experienced periods of democracy and military rule throughout its history. In 1971, East Pakistan separated and became Bangladesh after a civil war. Today, Pakistan is known for i
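
The loop above appends the context-augmented prompt to `message_history`, so every later turn resends all previously retrieved passages. One alternative, similar to the earlier `chat(messages + [prompt])` cell, is to send the augmented message for the current turn only and store just the raw question and answer. The sketch below uses a small hypothetical `Msg` dataclass in place of the LangChain message classes so it runs on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Msg:
    # Hypothetical stand-in for SystemMessage / HumanMessage / AIMessage.
    role: str
    content: str


def messages_for_turn(history: List[Msg], question: str, context: str) -> List[Msg]:
    """Build the list sent to the model: stored history plus one augmented message."""
    augmented = Msg("user", f"Context:\n{context}\n\nQuestion:\n{question}")
    return history + [augmented]  # new list; the stored history is not modified


def record_turn(history: List[Msg], question: str, answer: str) -> None:
    """Store only the raw question and the answer, without retrieved context."""
    history.append(Msg("user", question))
    history.append(Msg("assistant", answer))


history = [Msg("system", "You are a helpful assistant.")]
question = "Who was the first Governor-General?"
outgoing = messages_for_turn(
    history, question, "Muhammad Ali Jinnah served as the first Governor-General."
)
record_turn(history, question, "Muhammad Ali Jinnah.")
print(len(outgoing), len(history))
```

The trade-off is that the model no longer sees earlier retrieved passages on later turns, only the earlier questions and answers, which keeps the prompt size roughly constant per turn.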