In [55]:
import os
import json
import traceback
import pandas as pd
import PyPDF2
from datetime import datetime

- os → for environment variable handling (API keys).
- json → serialize/deserialize LLM outputs (since model outputs are text, but we want structured JSON).
- traceback → helpful for printing detailed error logs.
- pandas (pd) → creating structured tables (CSV export).
- PyPDF2 → optional; not directly used in this snippet, but typically for extracting text from PDFs.
- datetime → timestamp filenames for versioning.
- dotenv.load_dotenv() → loads API keys from a .env file, keeping secrets out of code.

In [56]:
from dotenv import load_dotenv
load_dotenv()

True

In [57]:
# from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.callbacks import get_openai_callback
from langchain.schema.runnable import RunnableMap
from langchain_openai import ChatOpenAI

- PromptTemplate → dynamic templates with variables ({text}, {subject}, etc.) so we don’t hardcode prompts.
- get_openai_callback → tracks token usage (cost monitoring).
- RunnableMap → runs multiple chains in parallel (maps input → multiple outputs).
- ChatOpenAI → wrapper around OpenAI’s Chat Completions API → abstracts raw API requests and plugs directly into LangChain chains.

# Load API key

In [58]:
KEY=os.getenv("OPENAI_API_KEY")
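
os.getenv returns None when the variable is missing, and the problem then only surfaces at the first API call. A minimal sketch of an early check, assuming a hypothetical helper named require_env (it is not part of the notebook or of any library) and a placeholder variable name for the demo:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper for illustration; the notebook itself just calls os.getenv.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value


# Placeholder variable and value, used only so the demo runs without a real key
os.environ.setdefault("DEMO_API_KEY", "sk-demo")
demo_key = require_env("DEMO_API_KEY")
print("key loaded:", bool(demo_key))
```

In the notebook this would be called with "OPENAI_API_KEY" right after load_dotenv().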

# LLM Initialisation

In [59]:
llm = ChatOpenAI(
    openai_api_key=KEY,  
    model_name="gpt-3.5-turbo",
    temperature=0.5
)


- ChatOpenAI: LangChain’s wrapper for OpenAI models (gpt-3.5-turbo).
- Parameters:
> * openai_api_key → key loaded earlier.
> * model_name → selects which LLM.
> * temperature → randomness in outputs (0.5 = moderate creativity).

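- Under the hood, this calls the OpenAI Chat Completions API (client.chat.completions.create() in the v1 openai SDK; the legacy openai.ChatCompletion.create() was removed in v1).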
- Instead of raw JSON responses, LangChain wraps results into AIMessage objects.
- LLM is now callable like a function (llm.invoke(input)).
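
To make the temperature parameter concrete: a common textbook picture is that the model's raw token scores are divided by the temperature before being turned into probabilities. The sketch below uses made-up scores for three candidate tokens and only illustrates that idea; it is not OpenAI's actual sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by a temperature > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, and higher ones spread it out, which is why 0.5 sits between near-deterministic and more varied output.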

In [60]:
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x00000209ED1826A0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x00000209ED182E20>, root_client=<openai.OpenAI object at 0x00000209ED0CDDF0>, root_async_client=<openai.AsyncOpenAI object at 0x00000209ED1823A0>, temperature=0.5, openai_api_key=SecretStr('**********'), openai_proxy='')

# Response JSON schema

In [61]:
RESPONSE_JSON = {
    "1": {"mcq": "multiple choice question",
          "options": {"a": "choice here", "b": "choice here", "c": "choice here", "d": "choice here"},
          "correct": "correct answer"},
    "2": {"mcq": "multiple choice question",
          "options": {"a": "choice here", "b": "choice here", "c": "choice here", "d": "choice here"},
          "correct": "correct answer"},
    "3": {"mcq": "multiple choice question",
          "options": {"a": "choice here", "b": "choice here", "c": "choice here", "d": "choice here"},
          "correct": "correct answer"}
}

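- This sample structure is passed into the prompt so the LLM is guided to produce MCQs in the same shape. It is a guide, not a guarantee: the output still has to be parsed and checked.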
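
Since the model is only guided toward this structure, it can help to check a parsed quiz against it before using it. A minimal sketch, assuming a hypothetical validate_quiz helper and the four option keys a to d from the sample above:

```python
def validate_quiz(quiz: dict) -> list:
    """Return a list of problems found; an empty list means the structure matches."""
    problems = []
    for key, item in quiz.items():
        if not isinstance(item, dict):
            problems.append(f"{key}: entry is not an object")
            continue
        for field in ("mcq", "options", "correct"):
            if field not in item:
                problems.append(f"{key}: missing '{field}'")
        options = item.get("options", {})
        if not isinstance(options, dict) or set(options) != {"a", "b", "c", "d"}:
            problems.append(f"{key}: options must have exactly the keys a, b, c, d")
    return problems


# Made-up entries: one well-formed, one missing "correct" and two options
good = {"1": {"mcq": "Q?", "options": {"a": "w", "b": "x", "c": "y", "d": "z"}, "correct": "a"}}
bad = {"1": {"mcq": "Q?", "options": {"a": "w", "b": "x"}}}
print(validate_quiz(good))
print(validate_quiz(bad))
```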

# Templates

In [62]:
TEMPLATE = """
Text:{text}
You are an expert MCQ maker. Given the above text, it is your job to \
create a quiz of {number} multiple choice questions for {subject} students in {tone} tone. 
Make sure the questions are not repeated and check all the questions to be conforming to the text as well.
Make sure to format your response like RESPONSE_JSON below and use it as a guide. \
Ensure to make {number} MCQs.

### RESPONSE_JSON
{response_json}
"""

quiz_generation_prompt = PromptTemplate(
    input_variables=["text", "number", "subject", "tone", "response_json"],
    template=TEMPLATE
)
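
With its default format, PromptTemplate fills {placeholders} in the style of Python's str.format. The sketch below uses plain str.format on a shortened, made-up template (not the notebook's TEMPLATE) to show why the JSON guide is passed in as a variable: braces inside a substituted value are left alone, whereas literal braces written into the template itself would be read as placeholders.

```python
import json

# Shortened stand-in for the generation prompt (illustrative only)
demo_template = (
    "Text:{text}\n"
    "Create {number} MCQs for {subject} students in a {tone} tone.\n"
    "### RESPONSE_JSON\n"
    "{response_json}\n"
)

schema = {"1": {"mcq": "multiple choice question", "correct": "correct answer"}}

prompt = demo_template.format(
    text="Cells are the basic unit of life.",
    number=1,
    subject="biology",
    tone="simple",
    response_json=json.dumps(schema),  # braces inside the value are not re-parsed
)
print(prompt)
```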

In [63]:
TEMPLATE2 = """
You are an expert English grammarian and writer. 
Given a Multiple Choice Quiz for {subject} students, you need to evaluate the complexity 
of the question and give a complete analysis of the quiz. Use at most 50 words for complexity analysis. 

If the quiz is not at par with the cognitive and analytical abilities of the students,
update the quiz questions that need to be changed and adjust the tone such that it perfectly fits the student abilities.

Quiz_MCQs:
{quiz}

Check from an expert English Writer of the above quiz:
"""

quiz_evaluation_prompt = PromptTemplate(
    input_variables=["subject", "quiz"],
    template=TEMPLATE2
)

# Chaining in LangChain

In [64]:
quiz_chain = quiz_generation_prompt | llm
review_chain = quiz_evaluation_prompt | llm

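- The | operator = pipeline chaining.
> - quiz_chain: fills the generation prompt → runs it on the LLM → returns an AIMessage whose .content holds the quiz text.
> - review_chain: fills the evaluation prompt → runs it on the LLM → returns an AIMessage whose .content holds the review.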

- This is function composition in LangChain.
- Think of it like Unix pipes → each stage passes output → next stage.
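
The pipe idea can be illustrated in plain Python. The toy Step class below is a hypothetical stand-in, not LangChain's implementation; it only shows how overloading | lets one stage's output become the next stage's input.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new Step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


fill_prompt = Step(lambda d: f"Write {d['number']} questions about {d['subject']}.")
fake_llm = Step(lambda prompt: prompt.upper())  # pretend model: just upper-cases the prompt

chain = fill_prompt | fake_llm
print(chain.invoke({"number": 3, "subject": "biology"}))
```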

# RunnableMap (Parallel Execution)

In [None]:
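generate_evaluate_chain = RunnableMap({
    # Run quiz generation and keep only the message text (a plain string)
    "quiz": quiz_chain | (lambda msg: msg.content),
    # Carry the subject forward; the review prompt needs it
    "subject": lambda x: x["subject"],
}).assign(
    # Add a "review" key by running the evaluation chain on the generated quiz
    review=lambda x: review_chain.invoke({"subject": x["subject"], "quiz": x["quiz"]}).content
)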
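
A plain-Python sketch of the data flow through a map step followed by an assign step. The helper names run_map and run_assign are hypothetical and the "chains" are ordinary functions. The point it illustrates: the map step's output contains only the keys it declares, so anything a later step needs (such as the subject) has to be declared there too, and the review runs after the quiz rather than alongside it.

```python
def run_map(steps: dict, inputs: dict) -> dict:
    """Call every step with the same inputs; the result has only the declared keys."""
    return {name: step(inputs) for name, step in steps.items()}


def run_assign(new_keys: dict, current: dict) -> dict:
    """Keep the current dict and add keys computed from it."""
    return {**current, **{name: step(current) for name, step in new_keys.items()}}


inputs = {"text": "Cells divide.", "subject": "biology"}

# Only "quiz" is declared, so "subject" is dropped after the map step
quiz_only = run_map({"quiz": lambda x: f"quiz about: {x['text']}"}, inputs)
print(sorted(quiz_only))

# Declaring "subject" as well carries it forward for the review step
after_map = run_map(
    {"quiz": lambda x: f"quiz about: {x['text']}", "subject": lambda x: x["subject"]},
    inputs,
)
result = run_assign({"review": lambda x: f"review of {x['subject']} quiz"}, after_map)
print(result["review"])
```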

# Load Input Text

In [66]:
file_path = r"D:\SACHI\GEN-AI\mcq-generator\data.txt"
with open(file_path, 'r') as file:
    TEXT = file.read()

NUMBER = 5
SUBJECT = "biology"
TONE = "simple"

# Run with Token Usage Tracking

In [67]:
with get_openai_callback() as cb:
    response = generate_evaluate_chain.invoke({
        "text": TEXT,
        "number": NUMBER,
        "subject": SUBJECT,
        "tone": TONE,
        "response_json": json.dumps(RESPONSE_JSON)
    })
    
    
    # Debug prints
    print("Generated Quiz:\n", response["quiz"])
    print("\nQuiz Review:\n", response["review"])

    # Token usage report
    print("\n--- Token Usage ---")
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost: ${cb.total_cost:.6f}")

RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

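- invoke → executes the chain. In the run shown above it stopped with a 429 insufficient_quota error, so response was never assigned (hence the NameError in the CSV cell further down). Retrying does not help with this error; the account's quota or billing has to be fixed first.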

> cb (callback) tracks:
> - total_tokens
> - prompt_tokens
> - completion_tokens
> - total_cost

- Useful for monitoring OpenAI usage.
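
The arithmetic behind these numbers is simple: total tokens are prompt tokens plus completion tokens, and cost is token count times a per-token price. A small sketch with placeholder prices (assumptions for illustration, not current OpenAI rates) and a hypothetical estimate_cost helper:

```python
def estimate_cost(prompt_tokens, completion_tokens, prompt_price_per_1k, completion_price_per_1k):
    """Estimate cost in dollars from token counts and per-1,000-token prices."""
    prompt_cost = prompt_tokens / 1000 * prompt_price_per_1k
    completion_cost = completion_tokens / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Placeholder numbers for illustration; check the current pricing page for real rates
cost = estimate_cost(
    prompt_tokens=1200,
    completion_tokens=400,
    prompt_price_per_1k=0.001,
    completion_price_per_1k=0.002,
)
print(f"Total Tokens: {1200 + 400}")
print(f"Estimated cost: ${cost:.6f}")
```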

# Extract Quiz and Save to CSV

In [None]:
try:
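    quiz_text = getattr(response["quiz"], "content", response["quiz"])  # AIMessage → its text; plain string → unchanged
    quiz_json = json.loads(quiz_text)  # Convert string (generated quiz) → dict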

    quiz_table_data = []
    for key, value in quiz_json.items():
        mcq = value["mcq"]
        options = " | ".join([f"{opt}: {ans}" for opt, ans in value["options"].items()])
        correct = value["correct"]
        quiz_table_data.append({"MCQ": mcq, "Choices": options, "Correct": correct})

    quiz_df = pd.DataFrame(quiz_table_data)

    # Timestamped filename
    timestamp = datetime.now().strftime('%m_%d_%Y_%H_%M_%S')
    filename = f"quiz_{SUBJECT}_{timestamp}.csv"
    quiz_df.to_csv(filename, index=False)

    print(f"\n✅ Quiz saved as: {filename}")

except Exception as e:
    print("⚠️ Error while processing quiz JSON:", str(e))
    traceback.print_exc()

⚠️ Error while processing quiz JSON: name 'response' is not defined


Traceback (most recent call last):
  File "C:\Users\Saachi\AppData\Local\Temp\ipykernel_11420\21759903.py", line 2, in <module>
    quiz_json = json.loads(response["quiz"])  # Convert string → dict
NameError: name 'response' is not defined


- Iterates through quiz questions.
- Normalizes into tabular format.
- Saves as timestamped CSV for uniqueness.
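
Chat models sometimes wrap JSON in a Markdown code fence, which makes a direct json.loads fail. The sketch below shows one way to tolerate that before flattening the quiz into rows. The helper names parse_quiz_text and quiz_to_rows are hypothetical, and the sample string is made up.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def parse_quiz_text(raw: str) -> dict:
    """Parse model output as JSON, tolerating a surrounding Markdown code fence."""
    text = raw.strip()
    if len(text) >= 2 * len(FENCE) and text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)].strip()
        if text.lower().startswith("json"):  # drop an optional language tag
            text = text[len("json"):]
    return json.loads(text)


def quiz_to_rows(quiz: dict) -> list:
    """Flatten the parsed quiz into one row per question."""
    rows = []
    for item in quiz.values():
        choices = " | ".join(f"{opt}: {ans}" for opt, ans in item["options"].items())
        rows.append({"MCQ": item["mcq"], "Choices": choices, "Correct": item["correct"]})
    return rows


body = '{"1": {"mcq": "Q?", "options": {"a": "x", "b": "y"}, "correct": "a"}}'
sample = f"{FENCE}json\n{body}\n{FENCE}"  # made-up model output wrapped in a fence
rows = quiz_to_rows(parse_quiz_text(sample))
print(rows)
```

The resulting list of dicts has the same shape as quiz_table_data, so it can be handed to pd.DataFrame in the same way.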

> 🔑 LLM Concepts at Work

- Prompt Engineering → instructing the LLM with strict formatting.
- PromptTemplate → avoids hardcoding, reusability.
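- Format by Example → the model imitates the RESPONSE_JSON sample in the prompt; no chain-of-thought is requested, and valid JSON is likely but not guaranteed.
- Temperature → trades output variety against consistency (0.5 is a middle setting).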
- Chaining (|) → modular pipelines, easy composition.
- RunnableMap → multi-output orchestration.
- Callback → token usage monitoring.
- Post-Processing → json.loads → structured → CSV.

> ✅ Flow of Work (Step by Step):

- Load .env → fetch API key.
- Initialize ChatOpenAI LLM wrapper.
- Define JSON schema for quiz.
- Build prompt templates for generation & evaluation.
- Create LangChain pipelines (chains).
- Use RunnableMap + assign → generate the quiz, then review it, in a single invoke call.
- Read input text.
- Invoke pipeline with input variables.
- Print quiz + review.
- Track tokens/cost.
- Parse quiz JSON → tabularize → save to CSV.
- Handle errors gracefully with traceback.