### **Extracting Text from PDF**

In [4]:
# In current LangChain releases the PDF loaders live in the langchain_community package
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("Handouts_Lecture.pdf")
docs = loader.load()

In [5]:
for doc in docs:
    print(doc.page_content)

COMSATS University Islamabad, Virtual Campus 
HUM111 Pakistan studies 
Lecture 17 Handouts 
 
Page 1 of 6 
Constitution of 1973 
Background 
Abrogation of the 1962 Constitution on March 25, 1969 led to second martial law in the 
country.  Zulfiqar Ali Bhutto became first civilian chief martial law administrator of Pakistan 
on 20th December, 1971. After assuming power the most important task for Zulfiqar Ali 
Bhutto was to frame a new constitution. He was in favor of presidential form of government 
as this would give him more power but due to conflicting opinions within the Pakistan 
People’s Party he had to settle for parliamentary system. National  Assembly approved an 
Interim Constitution, which was enforced on April 21, 1972. 
Constitution Making 
Constitutional Committee comprised National Assembly (NA) members from all parties was 
set up in April 1972. Law Minister was the Chairman of this Committ ee. All parties agreed 
on the future political system in October 1972. The Comm

### **Splitting into chunks:**

In [6]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [7]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)
chunks = splitter.split_documents(docs) 

In [8]:
len(chunks)

14
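
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width splitter. It is only an illustration: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and newlines, so its chunks will not line up with these. The function name `chunk_with_overlap` is hypothetical and not part of LangChain.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive fixed-width splitter, for illustration only (not LangChain's algorithm)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return pieces


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each piece starts with the last three characters of the previous one, which is how an overlap keeps a sentence that straddles a boundary visible in both chunks.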

In [9]:
chunks

[Document(metadata={'producer': 'Microsoft® Word for Office 365', 'creator': 'Microsoft® Word for Office 365', 'creationdate': '2018-10-26T13:14:05+05:00', 'title': 'COMSATS University Islamabad, Virtual Campus', 'author': 'mudassir', 'subject': 'HUM111 Pakistan studies', 'moddate': '2018-10-26T13:14:05+05:00', 'source': 'Handouts_Lecture.pdf', 'total_pages': 6, 'page': 0, 'page_label': '1'}, page_content='COMSATS University Islamabad, Virtual Campus \nHUM111 Pakistan studies \nLecture 17 Handouts \n \nPage 1 of 6 \nConstitution of 1973 \nBackground \nAbrogation of the 1962 Constitution on March 25, 1969 led to second martial law in the \ncountry.  Zulfiqar Ali Bhutto became first civilian chief martial law administrator of Pakistan \non 20th December, 1971. After assuming power the most important task for Zulfiqar Ali \nBhutto was to frame a new constitution. He was in favor of presidential form of government \nas this would give him more power but due to conflicting opinions within t

### **ChatGroq**

In [10]:
from langchain_groq import ChatGroq
import os
from dotenv import load_dotenv

In [11]:
# Load variables from .env
load_dotenv()

# Get API key from environment
api_key = os.getenv("GROQ_API_KEY")

if not api_key:
    raise ValueError("❌ GROQ_API_KEY not found. Please set it in your .env file.")


In [15]:
# Create the LLM client
llm = ChatGroq(
    temperature=0.1,
    groq_api_key=api_key,
    model_name="llama-3.3-70b-versatile"
)

# Example query
response = llm.invoke("Who was the first person to land on the moon?")
print(response.content)

The first person to land on the moon was Neil Armstrong. He stepped out of the lunar module Eagle and onto the moon's surface on July 20, 1969, during the Apollo 11 mission. Armstrong famously declared, "That's one small step for man, one giant leap for mankind," as he became the first human to set foot on the moon. He was followed by fellow astronaut Edwin "Buzz" Aldrin, who also walked on the moon's surface during the mission.


### **What's Happening?**
The user uploads a PDF document. Its text is extracted and split into chunks.

The user can choose different modes:
1) QNA
2) MCQS
3) Flashcards
4) Summary

For any of the first three modes, the user must also select the number of questions and a difficulty (easy, medium, or hard). \
For example, the user chooses:
- mode: MCQS
- number of questions: 5
- difficulty: easy

For each chunk, 5 easy MCQs are generated (**chunk-wise results**). All of these MCQs are then passed to a final prompt that merges them, removes repetition, and limits the result to 5 MCQs in total (**merged output from all chunks**).

In Summary mode, a summary is generated for each chunk, and the per-chunk summaries are then passed to a final prompt that produces a single overall summary.
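
The two-stage flow described above (one call per chunk, then one call that merges the partial results) can be sketched without any API key. This is a rough illustration under stated assumptions: `map_reduce` and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in that only records prompts instead of calling a real model.

```python
from typing import Callable


def map_reduce(
    chunks: list[str],
    ask: Callable[[str], str],
    map_instruction: str,
    reduce_instruction: str,
) -> str:
    # Map step: one model call per chunk.
    partials = [ask(f"{map_instruction}\n\nText:\n{chunk}") for chunk in chunks]
    # Reduce step: a single call that sees every partial result.
    combined = "\n\n".join(partials)
    return ask(f"{reduce_instruction}\n\n{combined}")


calls = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: records the prompt and returns a numbered tag."""
    calls.append(prompt)
    return f"<output {len(calls)}>"


final = map_reduce(
    ["chunk one", "chunk two", "chunk three"],
    fake_llm,
    map_instruction="Create 5 easy MCQs.",
    reduce_instruction="Merge these into 5 MCQs without duplicates.",
)
print(len(calls))  # 4: three map calls plus one reduce call
print(final)       # <output 4>
```

With 14 chunks, as in this notebook, the same pattern costs 15 model calls per request, which is worth keeping in mind for rate limits.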

### **Chunk-wise Results:**

In [18]:
def generate_prompt(mode, num_questions, difficulty, chunk_text):
    if mode == "QNA":
        if not num_questions or not difficulty:
            raise ValueError("For QNA, 'num_questions' and 'difficulty' must be provided.")
        prompt = f"""
        You are a knowledgeable teacher. Generate {num_questions} question-answer pairs from the text below.

        Requirements:
        - Questions should be {difficulty} level (easy, medium, hard)
        - Each answer should be complete, concise, and accurate
        - Include only factual information from the text
        - Format as:
           Q:1 ...
           A: ...
           Q:2 ...
           A: ...

        Text:
        {chunk_text}"""
    
    elif mode == "MCQS":
        if not num_questions or not difficulty:
            raise ValueError("For MCQS, 'num_questions' and 'difficulty' must be provided.")
        prompt = f""" 
        You are an educational content creator. Create {num_questions} multiple-choice questions from the text below.

        Requirements:
        - Questions should be {difficulty} level (easy, medium, hard)
        - Each question should have 4 options labeled A, B, C, D
        - Clearly indicate the correct option
        - Avoid ambiguous or trick questions
        - Format as:
        1. Question: ...
           A) ...
           B) ...
           C) ...
           D) ...
           Answer: B

        Text:
        {chunk_text}
        """

    elif mode == "Flashcards":
        if not num_questions or not difficulty:
            raise ValueError("For Flashcards, 'num_questions' and 'difficulty' must be provided.")
        prompt = f"""You are an expert educator. Create {num_questions} flashcards from the text below.

        Requirements:
        - Questions should be {difficulty} level (easy, medium, hard)
        - Each flashcard should have a “Question” and an “Answer”
        - Keep answers concise but informative
        - Format as:
           Q:1 ...
           A: ...
           Q:2 ...
           A: ...

        Text:
        {chunk_text}"""

    elif mode == "Summary":
        prompt = f"""
        You are an expert summarizer. Create a clear, concise, and accurate summary of the following text.

        Requirements:
        - Highlight the main ideas and key points
        - Avoid repetition and unnecessary detail
        - Keep it objective and fact-based
        - Aim for readability

        Text:
        {chunk_text}
        """
    else:
        raise ValueError("Invalid mode. Choose from: QNA, MCQS, Flashcards, Summary")

    return llm.invoke(prompt)
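
The three question modes above repeat the same `num_questions`/`difficulty` check. One possible way to centralise it, and to reject bad selections before any model call is made, is sketched below. The helper `validate_options` and its constants are hypothetical, and restricting `difficulty` to the three levels named in the prompts is an assumption rather than something the notebook enforces.

```python
QUESTION_MODES = {"QNA", "MCQS", "Flashcards"}
ALL_MODES = QUESTION_MODES | {"Summary"}
DIFFICULTIES = {"easy", "medium", "hard"}  # assumed from the prompt wording


def validate_options(mode, num_questions=None, difficulty=None):
    """Raise ValueError for selections that the prompts above cannot handle."""
    if mode not in ALL_MODES:
        raise ValueError(f"Invalid mode {mode!r}. Choose from: {sorted(ALL_MODES)}")
    if mode in QUESTION_MODES:
        if not isinstance(num_questions, int) or num_questions < 1:
            raise ValueError(f"For {mode}, 'num_questions' must be a positive integer.")
        if difficulty not in DIFFICULTIES:
            raise ValueError(f"For {mode}, 'difficulty' must be one of {sorted(DIFFICULTIES)}.")


# Both of these pass silently; Summary needs no extra options.
validate_options("Summary")
validate_options("MCQS", 5, "easy")
```

Calling it once at the top of the pipeline would surface an invalid mode immediately instead of after the first chunk has been sent.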

### **Merged Output from All Chunks:**

In [None]:

def consolidated_result(mode, num_questions=None, difficulty=None):
    output = []
    # generating result for each chunk
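    for chunk in chunks:
        # pass the chunk's text, not the Document object (its string form includes metadata)
        result = generate_prompt(mode, num_questions, difficulty, chunk.page_content)
        output.append(result)

    # combining results into one block of text rather than a Python list repr
    joined_outputs = "\n\n".join(o.content for o in output)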

    # passing the combined result to the prompt to get final result
    prompt = f"You are an expert educator. Here are {mode} generated from different chunks:\n\n{joined_outputs}\n\n"

    if mode in ["MCQS", "QNA", "Flashcards"]:
        prompt += "- Remove duplicates and similar questions\n"
        prompt += f"- Ensure all items are {difficulty} level\n"
        prompt += f"- Limit the final output to {num_questions} items\n"
        prompt += "- Number them sequentially and keep formatting clean\n"
    elif mode == "Summary":
        prompt += "- Merge into a coherent and concise summary\n- Avoid repetition\n"
    else:
        raise ValueError("Mode must be MCQS, QNA, Flashcards, or Summary")

    prompt += "Return only the final polished output."

    return llm.invoke(prompt).content

In [None]:
# difficulty and num_questions can be omitted here because both default to None
result = consolidated_result("Summary")

In [None]:
result = consolidated_result("MCQS",5,"easy")

In [None]:
result