In [1]:
import os 
import json
import pandas as pd
import getpass
import traceback

In [21]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

In [90]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain.chains import LLMChain, SequentialChain
from langchain.callbacks import get_openai_callback

import PyPDF2

In [212]:
llm = ChatOpenAI(
    model="gpt-4.1-nano",
    temperature=0.5,
    max_tokens=256,
    timeout=None,
    max_retries=1,
    api_key=os.environ["OPENAI_API_KEY"],
    # organization="...",
    # other params...
)

## Note for new chain usage

In [225]:
RESPONSE_JSON = {
    "subject":"A",
    "quiz":{
        "1": {
            "mcq": "multiple choice question",
            "options": {
                "A": "option A",
                "B": "option B",
                "C": "option C",
                "D": "option D"
            },
            "answer": "A"
        },
        "2": {
            "mcq": "multiple choice question",
            "options": {
                "A": "option A",
                "B": "option B",
                "C": "option C",
                "D": "option D"
            },
            "answer": "B"
        }
    }
}

In [142]:
TEMPLATE = """
Text:{text}
You are a quiz generation AI. Your task is to generate a quiz of {number} choice questions for {subject} students in {tone} tone\
based on the provided text. \
Make sure the questions are not repeated and check all the questions to be conforming the text as well. \
Make sure to format your response like RESPONSE_JSON below and use it as a guide. \
Ensure to make {number} MCQs
### RESPONSE_JSON
{response_json}
"""

In [143]:
quiz_generation_prompt = ChatPromptTemplate.from_template(TEMPLATE)

In [144]:
TEMPLATE2 = """
You are an expert english grammarian and writer. Given a multiple choice quiz for {subject} students.\
Your task is to analyze the quiz and provide feedback on its grammatical correctness, clarity, and overall complexity.Only use at max 50 words for complexity.\
Please ensure that your feedback is constructive and aimed at helping the quiz creator improve their work.\
Update the quiz questions which needs to be changed and change the tone such that it perfectly fits the student abilities.\
Quiz_MCQs:
{quiz}
"""

In [145]:
quiz_review_prompt= ChatPromptTemplate.from_template(TEMPLATE2)

### Case 1: Leverage RunnablePassthrough

- Sequential Chain is similar to Sequential Model in Keras that it will take the output of previous chain as input of the next chain

- Input of Chain is in form of dictionary ex:
    ```
        input_params = {
            "text": "What is the capital of France",
            "subject": "Geography", 
            "tone": "Formal",
            "number": 1,
            "response_json": RESPONSE_JSON
        }
    ```

- ```JsonOutputParser``` will cast a JSON object to dictionary in Python so it is a need between 2 chains if the previous chain output JSON object. I consider it as a bonding

- ```RunnablePassthrough``` is used to customize the input to the chain like adding key ```quiz_result``` and ```original_input``` to the output of first chain and then use ```RunnableLambda``` and function ```add_quiz_to_params``` to create a suittable input that required by the next chain.

- ```json.dumps()``` to serialize python dictionary to JSON-formatted string

In [158]:
def add_quiz_to_params(result):
    # This function adds the quiz result back to the original parameters
    original_input = result["original_input"]
    quiz_data = result["quiz_result"]
    print(quiz_data)
    return {
        "quiz": json.dumps(quiz_data, indent=2),
        "subject": original_input["subject"]
        }

In [159]:
parser = JsonOutputParser()
generate_review_chain = (
        RunnablePassthrough.assign(
            quiz_result=(
                quiz_generation_prompt 
                | llm 
                | parser
            )
        ).assign(
            original_input=lambda x: x
        )
        | RunnableLambda(add_quiz_to_params)
        | quiz_review_prompt
        | llm
)
    

### Case 2: Raw usage

- If we already required the LLMs to format the output of the first chain following the requirements of the input of the second chain we can use it directly

- However ```Case 1 ``` offer us more flexible implementation as this case do not allow us to view the output of intermediate chain but last chain. While ```Case 1 ``` with the RunnableLambda we can do that by simply adding ```print``` to the custom function.

In [213]:
parser = JsonOutputParser()
generate_review_chain = (
    quiz_generation_prompt 
    | llm 
    | parser  # Parse JSON string to dictionary
    | RunnablePassthrough.assign(
            quiz_result=(
                quiz_review_prompt
                | llm
        )
    )
)

### Invoke

In [182]:
input_params = {
    "text": "What is the capital of France",
    "subject": "Geography", 
    "tone": "Formal",
    "number": 1,
    "response_json": RESPONSE_JSON
}

In [183]:
result = generate_review_chain.invoke(input_params)

In [190]:
print(result.keys())
print(result['quiz'])
print(result['quiz_result'].content)

dict_keys(['subject', 'quiz', 'quiz_result'])
{'1': {'mcq': 'What is the capital of France?', 'options': {'A': 'Paris', 'B': 'London', 'C': 'Berlin', 'D': 'Rome'}}}
Feedback: 
The quiz question is clear and simple, suitable for Geography students. However, it could be improved by adding more diverse and challenging questions to test the students' knowledge further. Consider incorporating questions about physical geography or world cultures to enhance the quiz.


## Callbacks to track TOKEN usage

In [201]:
file_path = "C:/Users/khang/Desktop/mcqgen/experiment/data.txt"

In [203]:
with open(file_path, "r") as file:
    TEXT = file.read()
print(TEXT)

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalise to unseen data, and thus perform tasks without explicit instructions.[1] Within a subdiscipline in machine learning, advances in the field of deep learning have allowed neural networks, a class of statistical algorithms, to surpass many previous machine learning approaches in performance.[2]

ML finds application in many fields, including natural language processing, computer vision, speech recognition, email filtering, agriculture, and medicine. The application of ML to business problems is known as predictive analytics.

Statistics and mathematical optimisation (mathematical programming) methods comprise the foundations of machine learning. Data mining is a related field of study, focusing on exploratory data analysis (EDA) via unsupervised learning.[4][5]

From a theoretical viewpoint, probably approximately c

In [226]:
SUBJECT = 'machine learning'
TONE = 'Formal'
NUMBER = 1

input_params = {
    "text": TEXT,
    "subject": SUBJECT, 
    "tone": TONE,
    "number": NUMBER,
    "response_json": RESPONSE_JSON
}

with get_openai_callback() as cb:
    result = generate_review_chain.invoke(input_params)
    print(cb)

Tokens Used: 814
	Prompt Tokens: 581
		Prompt Tokens Cached: 0
	Completion Tokens: 233
		Reasoning Tokens: 0
Successful Requests: 2
Total Cost (USD): $0.0001513


In [232]:
print(json.dumps(result['quiz'], indent=4))
print(result['quiz_result'].content)

{
    "1": {
        "mcq": "Which of the following best describes the primary goal of machine learning?",
        "options": {
            "A": "To develop algorithms that can learn from data and generalise to unseen data without explicit instructions",
            "B": "To manually program all possible tasks for a computer to perform",
            "C": "To replace all human decision-making processes with fixed rules",
            "D": "To analyze only structured data without making predictions"
        },
        "answer": "A"
    }
}
The question is clear and grammatically correct, but options could be simplified for student clarity. Option D is slightly confusing; clarify that it refers to limited data types. 

Revised question:
**Which best describes the main goal of machine learning?**  
A) To create algorithms that learn from data and make predictions on new, unseen data.  
B) To manually program every task a computer should perform.  
C) To replace all human decisions with fixe