<a href="https://colab.research.google.com/github/mzohaibnasir/GenAI/blob/main/05_LangchainProject_MCQGenerator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
! pip install openai langchain streamlit python-dotenv PyPDF2

In [15]:
import os
import json
import pandas as pd
import traceback
from dotenv import load_dotenv  # loads key-value pairs from a .env file into environment variables

In [16]:
from langchain.chat_models import ChatOpenAI   # to access openai api
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SequentialChain
from langchain.callbacks import get_openai_callback

import PyPDF2

In [6]:
from google.colab import userdata

OPENAIAPIKEY = userdata.get("OPENAIAPIKEY")
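# Outside Colab, read the key from the environment instead:
# OPENAIAPIKEY = os.getenv("OPENAIAPIKEY")
#
# Calling load_dotenv() first loads every variable in the .env file into the
# environment, so none of them has to be declared by hand.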
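Outside Colab, `userdata` is not available, so the key has to come from the environment (which `load_dotenv()` can populate from a `.env` file). A minimal sketch of that lookup, assuming the variable is named `OPENAIAPIKEY` as in the cell above; `read_api_key` is a hypothetical helper, not part of any library:

```python
import os


def read_api_key(name="OPENAIAPIKEY", env=None):
    """Return the key stored under `name`, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return key


# Example with an explicit mapping instead of the real environment.
print(read_api_key(env={"OPENAIAPIKEY": "sk-example"}))
```

Failing early with a named error is easier to debug than passing an empty key to the client and getting an authentication error later.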


In [14]:

llm = ChatOpenAI(
    api_key=OPENAIAPIKEY,
    model_name="gpt-3.5-turbo",
    temperature=0.5
)
# llm

# 1. Design input and output prompts


In [21]:
# RESPONSE FORMAT
RESPONSE_JSON = {
    "1": {
        "mcq": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
    "2": {
        "mcq": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
    "3": {
        "mcq": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
}
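The three entries in `RESPONSE_JSON` are identical apart from their key, so the same guide could be generated for any number of questions. A small sketch under that assumption; `build_response_template` is a hypothetical helper and is not used elsewhere in this notebook:

```python
import json


def build_response_template(n_questions, option_keys=("a", "b", "c", "d")):
    """Build a placeholder quiz dict with the same shape as RESPONSE_JSON."""
    template = {}
    for i in range(1, n_questions + 1):
        template[str(i)] = {
            "mcq": "multiple choice question",
            "options": {key: "choice here" for key in option_keys},
            "correct": "correct answer",
        }
    return template


print(json.dumps(build_response_template(3), indent=2))
```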

In [22]:

TEMPLATE = """
Text:{text}
You are an expert MCQ maker. Given the above text, it is your job to \
create a quiz of {number} multiple choice questions for {subject} students in {tone} tone.
Make sure the questions are not repeated and check that every question conforms to the text.
Make sure to format your response like RESPONSE_JSON below and use it as a guide. \
Ensure that you make {number} MCQs.
### RESPONSE_JSON
{response_json}

"""



In [23]:
quiz_generation_prompt = PromptTemplate(
    input_variables=['text', 'number', 'subject', 'tone', 'response_json'],  # vars will be input by user
    template=TEMPLATE
)
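With its default settings, `PromptTemplate` fills the `{placeholders}` in much the same way as Python's `str.format`. The sketch below uses plain `str.format` on a shortened, hypothetical template (`MINI_TEMPLATE`) to show what the substitution produces; it illustrates the idea and does not call LangChain:

```python
import json

# Shortened stand-in for TEMPLATE, using the same five placeholder names.
MINI_TEMPLATE = (
    "Text:{text}\n"
    "Create a quiz of {number} multiple choice questions "
    "for {subject} students in {tone} tone.\n"
    "### RESPONSE_JSON\n"
    "{response_json}\n"
)

values = {
    "text": "Biology is the scientific study of life.",
    "number": 2,
    "subject": "biology",
    "tone": "simple",
    # The JSON guide is passed as a value, so its braces are not treated as placeholders.
    "response_json": json.dumps({"1": "example"}),
}

prompt = MINI_TEMPLATE.format(**values)
print(prompt)
```

Leaving out any of the five values raises a `KeyError` here, which mirrors why every name in `input_variables` has to be supplied when the chain runs.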

# 2. Use LLMChain to connect LLM and prompt

In [25]:
quiz_chain = LLMChain(
    llm=llm,
    prompt = quiz_generation_prompt,
    output_key="quiz",
    verbose=True
)
# quiz_chain

# 3. prompt to evaluate generated quiz

In [26]:
TEMPLATE2="""
You are an expert English grammarian and writer. Given a Multiple Choice Quiz for {subject} students, \
you need to evaluate the complexity of the questions and give a complete analysis of the quiz. Use at most 50 words for the complexity analysis.
If the quiz is not on par with the cognitive and analytical abilities of the students, \
update the quiz questions that need to be changed and change the tone so that it fits the students' abilities.
Quiz_MCQs:
{quiz}

Check from an expert English Writer of the above quiz:
"""

In [27]:
quiz_evaluation_prompt = PromptTemplate(
    input_variables=['subject', 'quiz'],  # 'subject' comes from the user; 'quiz' is the output of quiz_chain
    template=TEMPLATE2
)

In [28]:
review_chain = LLMChain(
    llm=llm,
    prompt = quiz_evaluation_prompt,
    output_key="review",
    verbose=True
)
# review_chain

# 4. Connect both chains using SequentialChain

In [34]:
generate_evaluate_chain = SequentialChain(
    chains=[quiz_chain, review_chain],
    input_variables=['text', 'number', 'subject', 'tone', 'response_json'],
    output_variables=['quiz', 'review'],
    verbose=True,
)
# generate_evaluate_chain
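The two chains are linked by key names: the first chain writes its result under `quiz`, and the second chain reads `quiz` (plus `subject`) as input. The plain-Python sketch below imitates that data flow with stand-in functions. It is an illustration only, not how LangChain is implemented, and `run_sequential`, `fake_quiz_step`, and `fake_review_step` are hypothetical names:

```python
def run_sequential(steps, inputs):
    """Run (function, output_key) pairs in order, sharing one dict of values."""
    state = dict(inputs)
    for step, output_key in steps:
        # Each step sees the original inputs plus every earlier output.
        state[output_key] = step(state)
    return state


def fake_quiz_step(state):
    return f"{state['number']} questions on {state['subject']}"


def fake_review_step(state):
    return f"Reviewed: {state['quiz']}"


result = run_sequential(
    [(fake_quiz_step, "quiz"), (fake_review_step, "review")],
    {"number": 5, "subject": "biology"},
)
print(result["quiz"])
print(result["review"])
```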

# 5. Getting data

The quiz is generated from a source text. This notebook reads that text from a plain-text file; if the material is in a PDF or another format, load the file and extract its text first.

In [40]:
file_path = r'./data.txt'

In [41]:
with open(file_path, 'r') as file:
    TEXT = file.read()
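The cell above handles plain text only. One way to cover PDFs as well is to branch on the file extension and use `PyPDF2` (imported earlier in this notebook) for `.pdf` files. This is a sketch, assuming a PyPDF2 release that provides `PdfReader`; `load_text` is a hypothetical helper, and the demo exercises only the plain-text branch:

```python
import os
import tempfile
from pathlib import Path


def load_text(path):
    """Return the text of a .pdf (via PyPDF2) or of any other file read as UTF-8."""
    path = Path(path)
    if path.suffix.lower() == ".pdf":
        import PyPDF2  # imported lazily so plain-text use needs no extra package

        reader = PyPDF2.PdfReader(str(path))
        # extract_text() may yield nothing for image-only pages, hence the fallback.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    return path.read_text(encoding="utf-8")


# Demo with a temporary plain-text file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "data.txt")
    Path(demo_path).write_text("Biology is the scientific study of life.", encoding="utf-8")
    demo_text = load_text(demo_path)

print(demo_text)
```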

In [42]:
print(TEXT)

Biology is the scientific study of life.[1][2][3] It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field.[1][2][3] For instance, all organisms are made up of cells that process hereditary information encoded in genes, which can be transmitted to future generations. Another major theme is evolution, which explains the unity and diversity of life.[1][2][3] Energy processing is also important to life as it allows organisms to move, grow, and reproduce.[1][2][3] Finally, all organisms are able to regulate their own internal environments.[1][2][3][4][5]

Biologists are able to study life at multiple levels of organization,[1] from the molecular biology of a cell to the anatomy and physiology of plants and animals, and evolution of populations.[1][6] Hence, there are multiple subdisciplines within biology, each defined by the nature of their research questions and the tools that they use.[7][8][9] Like other scientists, bio

In [43]:
# Serialize the Python dictionary into a JSON-formatted string
json.dumps(RESPONSE_JSON)


'{"1": {"mcq": "multiple choice question", "options": {"a": "choice here", "b": "choice here", "c": "choice here", "d": "choice here"}, "correct": "correct answer"}, "2": {"mcq": "multiple choice question", "options": {"a": "choice here", "b": "choice here", "c": "choice here", "d": "choice here"}, "correct": "correct answer"}, "3": {"mcq": "multiple choice question", "options": {"a": "choice here", "b": "choice here", "c": "choice here", "d": "choice here"}, "correct": "correct answer"}}'

# 6. Token tracking in LangChain
  How to set up token usage tracking in LangChain

In [44]:
NUMBER = 5
SUBJECT = "machine learning"  # note: the source text loaded above is about biology
TONE = "simple"

In [55]:
with get_openai_callback() as cb:
    response = generate_evaluate_chain(
        {
            "text": TEXT,
            "number": NUMBER,
            "subject": SUBJECT,
            "tone": TONE,
            "response_json": json.dumps(RESPONSE_JSON),
        }
    )

    print(cb)


# generate_evaluate_chain:



> Entering new SequentialChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Text:Biology is the scientific study of life.[1][2][3] It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field.[1][2][3] For instance, all organisms are made up of cells that process hereditary information encoded in genes, which can be transmitted to future generations. Another major theme is evolution, which explains the unity and diversity of life.[1][2][3] Energy processing is also important to life as it allows organisms to move, grow, and reproduce.[1][2][3] Finally, all organisms are able to regulate their own internal environments.[1][2][3][4][5]

Biologists are able to study life at multiple levels of organization,[1] from the molecular biology of a cell to the anatomy and physiology of plants and animals, and evolution of populations.[1][6] Hence, there are multiple subdiscipline

In [56]:
print(f"Total Tokens:{cb.total_tokens}")
print(f"Prompt Tokens:{cb.prompt_tokens}")
print(f"Completion Tokens:{cb.completion_tokens}")
print(f"Total Cost:{cb.total_cost}")




Total Tokens:1497
Prompt Tokens:1077
Completion Tokens:420
Total Cost:0.0024555
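The printed cost can be reproduced by hand from the two token counts. The worked sketch below assumes rates of $0.0015 per 1,000 prompt tokens and $0.002 per 1,000 completion tokens. These rates are not stated in the notebook; they are assumptions chosen because they reproduce the figure printed above, and current pricing may differ:

```python
PROMPT_RATE_PER_1K = 0.0015      # assumed USD per 1,000 prompt tokens
COMPLETION_RATE_PER_1K = 0.002   # assumed USD per 1,000 completion tokens


def estimate_cost(prompt_tokens, completion_tokens):
    """Estimate the USD cost of one run from its token counts."""
    return (
        prompt_tokens * PROMPT_RATE_PER_1K
        + completion_tokens * COMPLETION_RATE_PER_1K
    ) / 1000


prompt_tokens, completion_tokens = 1077, 420
cost = estimate_cost(prompt_tokens, completion_tokens)

print(f"Total tokens: {prompt_tokens + completion_tokens}")
print(f"Estimated cost: {cost:.7f}")
```

Prompt tokens dominate here because the full source text is sent to the model, and the generated quiz is sent again as part of the review prompt.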


In [60]:
response

{'text': 'Biology is the scientific study of life.[1][2][3] It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field.[1][2][3] For instance, all organisms are made up of cells that process hereditary information encoded in genes, which can be transmitted to future generations. Another major theme is evolution, which explains the unity and diversity of life.[1][2][3] Energy processing is also important to life as it allows organisms to move, grow, and reproduce.[1][2][3] Finally, all organisms are able to regulate their own internal environments.[1][2][3][4][5]\n\nBiologists are able to study life at multiple levels of organization,[1] from the molecular biology of a cell to the anatomy and physiology of plants and animals, and evolution of populations.[1][6] Hence, there are multiple subdisciplines within biology, each defined by the nature of their research questions and the tools that they use.[7][8][9] Like other sci

In [61]:
print(response.get('quiz'))


{
  "1": {
    "mcq": "What is biology?",
    "options": {
      "a": "The study of rocks",
      "b": "The scientific study of life",
      "c": "The study of outer space",
      "d": "The study of economics"
    },
    "correct": "b"
  },
  "2": {
    "mcq": "What is a major theme in biology that explains the unity and diversity of life?",
    "options": {
      "a": "Evolution",
      "b": "Chemistry",
      "c": "Physics",
      "d": "Geology"
    },
    "correct": "a"
  },
  "3": {
    "mcq": "What allows organisms to move, grow, and reproduce?",
    "options": {
      "a": "Energy processing",
      "b": "Sleeping",
      "c": "Meditation",
      "d": "Reading"
    },
    "correct": "a"
  },
  "4": {
    "mcq": "What is a key component that all organisms are made up of?",
    "options": {
      "a": "Atoms",
      "b": "Cells",
      "c": "Rocks",
      "d": "Clouds"
    },
    "correct": "b"
  },
  "5": {
    "mcq": "How do biologists study life at different levels of organizat

In [62]:
quiz = response.get('quiz')
quiz

'\n{\n  "1": {\n    "mcq": "What is biology?",\n    "options": {\n      "a": "The study of rocks",\n      "b": "The scientific study of life",\n      "c": "The study of outer space",\n      "d": "The study of economics"\n    },\n    "correct": "b"\n  },\n  "2": {\n    "mcq": "What is a major theme in biology that explains the unity and diversity of life?",\n    "options": {\n      "a": "Evolution",\n      "b": "Chemistry",\n      "c": "Physics",\n      "d": "Geology"\n    },\n    "correct": "a"\n  },\n  "3": {\n    "mcq": "What allows organisms to move, grow, and reproduce?",\n    "options": {\n      "a": "Energy processing",\n      "b": "Sleeping",\n      "c": "Meditation",\n      "d": "Reading"\n    },\n    "correct": "a"\n  },\n  "4": {\n    "mcq": "What is a key component that all organisms are made up of?",\n    "options": {\n      "a": "Atoms",\n      "b": "Cells",\n      "c": "Rocks",\n      "d": "Clouds"\n    },\n    "correct": "b"\n  },\n  "5": {\n    "mcq": "How do biologists

In [65]:
quiz = json.loads(quiz)
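`json.loads` works here because the model returned nothing but a JSON object (the leading newline is harmless). Model replies sometimes wrap the JSON in extra prose, in which case a direct `json.loads` call raises an error. A defensive sketch, assuming the reply contains exactly one top-level JSON object; `parse_quiz` is a hypothetical helper:

```python
import json


def parse_quiz(raw):
    """Parse the JSON object found between the first '{' and the last '}' in `raw`."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])


sample = 'Here is the quiz:\n{"1": {"mcq": "What is biology?", "correct": "b"}}\nDone.'
parsed = parse_quiz(sample)
print(parsed["1"]["correct"])
```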

# 7. Creating a DataFrame

In [67]:
quiz_table_data = []
for key, value in quiz.items():
    mcq = value["mcq"]
    # Flatten the options dict into a single "a: ... | b: ..." string
    options = " | ".join(
        f"{option}: {option_value}"
        for option, option_value in value["options"].items()
    )
    correct = value["correct"]
    quiz_table_data.append({"MCQ": mcq, "Choices": options, "Correct": correct})
quiz_table_data

[{'MCQ': 'What is biology?',
  'Choices': 'a: The study of rocks | b: The scientific study of life | c: The study of outer space | d: The study of economics',
  'Correct': 'b'},
 {'MCQ': 'What is a major theme in biology that explains the unity and diversity of life?',
  'Choices': 'a: Evolution | b: Chemistry | c: Physics | d: Geology',
  'Correct': 'a'},
 {'MCQ': 'What allows organisms to move, grow, and reproduce?',
  'Choices': 'a: Energy processing | b: Sleeping | c: Meditation | d: Reading',
  'Correct': 'a'},
 {'MCQ': 'What is a key component that all organisms are made up of?',
  'Choices': 'a: Atoms | b: Cells | c: Rocks | d: Clouds',
  'Correct': 'b'},
 {'MCQ': 'How do biologists study life at different levels of organization?',
  'Choices': 'a: By staring at the sky | b: By asking random people | c: By using the scientific method | d: By guessing',
  'Correct': 'c'}]
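Every `Correct` value in this run is one of the option keys, but nothing in the pipeline enforces that. A small check could flag questions whose answer key is missing from their options before the table is saved. A sketch, assuming the parsed quiz has the same shape as `RESPONSE_JSON`; `find_invalid_answers` is a hypothetical helper:

```python
def find_invalid_answers(quiz):
    """Return the ids of questions whose 'correct' value is not one of the option keys."""
    return [
        question_id
        for question_id, item in quiz.items()
        if item.get("correct") not in item.get("options", {})
    ]


sample_quiz = {
    "1": {"mcq": "What is biology?", "options": {"a": "Rocks", "b": "Life"}, "correct": "b"},
    "2": {"mcq": "Broken item", "options": {"a": "x", "b": "y"}, "correct": "e"},
}
print(find_invalid_answers(sample_quiz))
```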

In [68]:
quiz = pd.DataFrame(quiz_table_data)
quiz

Unnamed: 0,MCQ,Choices,Correct
0,What is biology?,a: The study of rocks | b: The scientific stud...,b
1,What is a major theme in biology that explains...,a: Evolution | b: Chemistry | c: Physics | d: ...,a
2,"What allows organisms to move, grow, and repro...",a: Energy processing | b: Sleeping | c: Medita...,a
3,What is a key component that all organisms are...,a: Atoms | b: Cells | c: Rocks | d: Clouds,b
4,How do biologists study life at different leve...,a: By staring at the sky | b: By asking random...,c


In [69]:
quiz.to_csv("machinelearning.csv",index=False)

In [70]:
from datetime import datetime
datetime.now().strftime('%m_%d_%Y_%H_%M_%S')

'03_16_2024_05_49_17'
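The notebook does not say what this timestamp is for. One plausible use, offered here only as an assumption, is to make each saved file name unique so that repeated runs do not overwrite one another. In the sketch below, `timestamped_filename` is a hypothetical helper, and the fixed date is the one shown in the output above:

```python
from datetime import datetime


def timestamped_filename(subject, now=None, extension="csv"):
    """Build a file name such as 'machinelearning_03_16_2024_05_49_17.csv'."""
    now = datetime.now() if now is None else now
    stamp = now.strftime("%m_%d_%Y_%H_%M_%S")
    slug = subject.lower().replace(" ", "")
    return f"{slug}_{stamp}.{extension}"


name = timestamped_filename("machine learning", datetime(2024, 3, 16, 5, 49, 17))
print(name)
```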