### This is an experimental notebook, used for code experimentation before creating the end-to-end Python file

Just going to use OpenAI, but not run any of the code. Just going to type out the code and write comments for notes. The same will be done for the end-to-end modularized Python file.

#### ENVIRONMENT SET UP 

In [41]:
import os
import pandas as pd 
import json 
import traceback 
import numpy as np


Use the following code cells if using GPT 3.5 Turbo: 

In [83]:
from langchain.chat_models import ChatOpenAI ##Lets us access OpenAI with LangChain
from dotenv import load_dotenv


load_dotenv() ##Needed to take environment variables from .env (so we can use os.getenv())


True

In [86]:
KEY = os.getenv('OPENAI_API_KEY') ##key is stored in .env file

llm = ChatOpenAI(openai_api_key = KEY, 
                 model_name = 'gpt-3.5-turbo', 
                 temperature = 0.3) ##Call OpenAI and get the LLM model
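`os.getenv` returns `None` when the variable is missing, and the resulting error only shows up later inside the OpenAI client. A minimal sketch of failing fast instead; the helper name `require_env` is hypothetical, not part of LangChain or python-dotenv:

```python
import os


def require_env(name: str) -> str:
    """Return environment variable `name`, or raise a clear error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value


# Usage, after load_dotenv() has run:
# KEY = require_env('OPENAI_API_KEY')
```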



In [88]:
#KEY

#### NOW THAT ENV IS SET UP, WE CAN DO THE ACTUAL CODE

In [50]:
##import everything we need that we haven't yet: 

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SequentialChain
from langchain.callbacks import get_openai_callback
import PyPDF2



Now that we have our LLM, we will have to do the following:
* Design the input prompt that will be passed into the model.
* Design the output format the model should produce in response to the input prompt. This basically means we're designing how the output will look, kinda like creating an outline of the output with "fill in the blank" slots, which the model will fill in with the result.
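LangChain's `PromptTemplate` uses Python f-string-style `{placeholder}` syntax by default, so the "fill in the blank" idea can be seen with plain `str.format`. A minimal sketch with a toy template (the names are illustrative only):

```python
# A toy template (illustrative only; not the notebook's TEMPLATE)
toy_template = "Text: {text}\nWrite {number} questions in a {tone} tone."

# str.format fills each {placeholder} by name, like filling in blanks
prompt = toy_template.format(text="The sky is blue.", number=2, tone="simple")
print(prompt)
```

This is also why the JSON outline is passed in through a `{response_json}` variable rather than pasted directly into the template: literal braces in an f-string-style template would have to be escaped as `{{` and `}}`.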



#### WHAT IS OUR PROJECT? 
To create a multiple choice generator, in which the input will be some text (like a story, article, etc.), and the output will be multiple choice questions that can be used for a quiz.

In [51]:
## FIRST: Design prompt using prompt template 
##First is to define the template to be used

TEMPLATE = """
Text: {text}
You are an expert MCQ maker. Given the above text, it is your job to \
create a quize o {number} multiple choice questions for {subject} students in {tone} tone.
Make sure the questions are not repeated and check all the questions to be conforming the text as well.
Make sure to format your response like RESPONSE_JSON below and use it as a guide. \
Ensure to make {number} MCQs. 

### RESPONSE_JSON
{response_json}

"""

##Think of the above kinda as something you'd type into ChatGPT

### TONE means DIFFICULTY LEVEL (easy, intermediate, difficult, etc)

In [52]:
##LET'S ALSO DEFINE RESPONSE_JSON (ie, what the output should kinda look like)
RESPONSE_JSON = {
    "1": {
        "mcq": "multiple choice question", 
        "options": {
            "a": "choice here", 
            "b": "choice here", 
            "c": "choice here", 
            "d": "choice here"
        }, 
        "correct": "correct answer"
    },
    "2": {
        "mcq": "multiple choice question", 
        "options": {
            "a": "choice here", 
            "b": "choice here", 
            "c": "choice here", 
            "d": "choice here"
        }, 
        "correct": "correct answer"
    },
    "3": {
        "mcq": "multiple choice question", 
        "options": {
            "a": "choice here", 
            "b": "choice here", 
            "c": "choice here", 
            "d": "choice here"
        }, 
        "correct": "correct answer"
    }
}
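The hand-written RESPONSE_JSON has three entries, while the run later in this notebook asks for 5 questions; the template's "Ensure to make {number} MCQs" line is what covers the gap. One option is to generate the outline so it always matches the requested count. A sketch with a hypothetical helper name:

```python
def make_response_json(n: int) -> dict:
    """Build a RESPONSE_JSON-style outline with n numbered MCQ entries."""
    return {
        str(i): {
            "mcq": "multiple choice question",
            "options": {letter: "choice here" for letter in "abcd"},
            "correct": "correct answer",
        }
        for i in range(1, n + 1)
    }


# e.g. RESPONSE_JSON = make_response_json(NUMBER)
```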




In [53]:
##input variables are what the user will pass

quiz_generation_prompt = PromptTemplate(
    input_variables=['text', 'number', 'subject', 'tone', 'response_json'],
    template=TEMPLATE,
)

In [54]:
##now create llm chain (to connect llm and prompt)

quiz_chain = LLMChain(llm = llm, prompt = quiz_generation_prompt, 
                      output_key = 'quiz', verbose = True)




In [55]:
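##NOTICE IN RESPONSE_JSON THAT WE ALSO HAVE A "CORRECT ANSWER" FIELD, SO THE GENERATED QUIZ SHOULD BE CHECKED
### For this, we define one more template: a review step that evaluates the generated quiz and rewrites questions if needed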

SECOND_TEMPLATE = """ 
You are an expert English grammarian and writer. Given a Multiple Choice Quiz for {subject} students, \
you need to evauate the complexity of the question and give a complete analysis of the quiz. Only use at max 50 words for complexity. \
If the quiz is not at par with cognitive and analytical abilities of the students, \
update the quiz questions which needs to be changed, and change the tone such that it perfectly fits the student's ability. \
Quiz_MCQs: 
{quiz}

Check from an expert English writer of the above quiz: 

"""


In [56]:
##Create another prompt template using SECOND_TEMPLATE

quiz_evaluation_prompt = PromptTemplate(input_variables = ['subject', 'quiz'], 
                                        template = SECOND_TEMPLATE)




In [57]:
##Now using the second prompt template, make one more chain connecting the llm 
##with the second prompt template
review_chain = LLMChain(llm = llm, prompt = quiz_evaluation_prompt, 
                        output_key = 'review', verbose = True)





In [58]:
##Now connect the two LLM chains with Sequential Chain 

generate_evaluate_chain = SequentialChain(chains = [quiz_chain, review_chain], 
                                          input_variables = ['text', 'number', 'subject', 'tone', 'response_json'],
                                          output_variables = ['quiz', 'review'], verbose = True)
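Conceptually, `SequentialChain` runs the chains in order over a shared dictionary: `quiz_chain` writes its `output_key` (`quiz`), and `review_chain` then reads `quiz` (plus `subject`) as its inputs. A simplified plain-Python model of that data flow; this is not LangChain's implementation, it omits `response_json`, and the `fake_*` functions are hypothetical stand-ins for the LLM calls:

```python
from typing import Any, Callable, Dict, List, Tuple

# One step = (keys it reads, key it writes, function that computes the output)
Step = Tuple[List[str], str, Callable[..., Any]]


def run_sequential(steps: List[Step], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order over a shared dict; each output is visible to later steps."""
    state = dict(inputs)
    for input_keys, output_key, fn in steps:
        state[output_key] = fn(**{key: state[key] for key in input_keys})
    return state


# Stand-ins for the two LLM calls (no API is called here)
def fake_quiz(text, number, subject, tone):
    return f"{number} {tone} {subject} questions based on: {text}"


def fake_review(subject, quiz):
    return f"Review ({subject}): {quiz}"


steps = [
    (["text", "number", "subject", "tone"], "quiz", fake_quiz),  # like quiz_chain
    (["subject", "quiz"], "review", fake_review),                # like review_chain
]
result = run_sequential(steps, {"text": "Polycystic ovary syndrome", "number": 5,
                                "subject": "PCOS", "tone": "simple"})
print(result["review"])
```

Because every output lands in the same dictionary, the final result contains the original inputs plus `quiz` and `review`, which matches the `response` dictionary printed further down.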



Now that we have our chain, we need some data to use to generate our quiz. 

In data.txt, I copy-pasted info about PCOS from https://www.mayoclinic.org/diseases-conditions/pcos/symptoms-causes/syc-20353439


Let's read the data

In [59]:
##Read the data from data.txt
file_path = '/Users/test/MCQGenerator/data.txt'

with open(file_path, 'r') as file: 
    TEXT = file.read()
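The absolute path above only works on one machine, and `open` without an `encoding` argument uses the platform default, which may not be UTF-8 (text copied from a web page can contain characters like curly quotes). A sketch using `pathlib` with an explicit encoding; the helper name and the relative `data.txt` location are assumptions:

```python
from pathlib import Path


def read_source_text(path) -> str:
    """Read a UTF-8 text file, raising a clear error if it does not exist."""
    path = Path(path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"source text not found: {path}")
    return path.read_text(encoding="utf-8")


# Usage, assuming data.txt sits next to the notebook:
# TEXT = read_source_text("data.txt")
```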




In [60]:
print(TEXT)

Polycystic ovary syndrome (PCOS) is a problem with hormones that happens during the reproductive years. If you have PCOS, you may not have periods very often. Or you may have periods that last many days. You may also have too much of a hormone called androgen in your body.

With PCOS, many small sacs of fluid develop along the outer edge of the ovary. These are called cysts. The small fluid-filled cysts contain immature eggs. These are called follicles. The follicles fail to regularly release eggs.

The exact cause of PCOS is unknown. Early diagnosis and treatment along with weight loss may lower the risk of long-term complications such as type 2 diabetes and heart disease.

Symptoms
Symptoms of PCOS often start around the time of the first menstrual period. Sometimes symptoms develop later after you have had periods for a while.

The symptoms of PCOS vary. A diagnosis of PCOS is made when you have at least two of these:

Irregular periods. Having few menstrual periods or having period

RESPONSE_JSON is a dictionary. We want to serialize it into a JSON-formatted string, since the prompt template inserts text, not Python objects.

In [61]:
json.dumps(RESPONSE_JSON)


'{"1": {"mcq": "multiple choice question", "options": {"a": "choice here", "b": "choice here", "c": "choice here", "d": "choice here"}, "correct": "correct answer"}, "2": {"mcq": "multiple choice question", "options": {"a": "choice here", "b": "choice here", "c": "choice here", "d": "choice here"}, "correct": "correct answer"}, "3": {"mcq": "multiple choice question", "options": {"a": "choice here", "b": "choice here", "c": "choice here", "d": "choice here"}, "correct": "correct answer"}}'
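`json.dumps` also accepts an `indent` argument, which produces a multi-line string that is easier for a person (and possibly the model) to read; both forms parse back to the same dictionary. A small sketch with a shortened, illustrative outline:

```python
import json

# Shortened outline for illustration (not the full RESPONSE_JSON)
skeleton = {"1": {"mcq": "multiple choice question", "correct": "correct answer"}}

compact = json.dumps(skeleton)           # one line, as used in this notebook
pretty = json.dumps(skeleton, indent=2)  # same data, spread over several lines

# Both strings round-trip to the original dict
assert json.loads(compact) == json.loads(pretty) == skeleton
print(pretty)
```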

Now we will put all of our work together and run the chain inside get_openai_callback. get_openai_callback tracks how many tokens are used, which determines how much you pay to use GPT.



In [62]:
##set the input variable values that we don't have yet:
NUMBER = 5
SUBJECT = 'PCOS'
TONE = 'simple'

In [63]:
#How to set up Token Usage tracking in langchain: 
#https://python.langchain.com/docs/modules/model_io/llms/token_usage_tracking

with get_openai_callback() as cb: 
    response = generate_evaluate_chain(
        {
            'text': TEXT, 
            'number': NUMBER, 
            'subject': SUBJECT, 
            'tone': TONE, 
            'response_json': json.dumps(RESPONSE_JSON)

        }
    )

    



> Entering new SequentialChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
[32;1m[1;3m
Text: Polycystic ovary syndrome (PCOS) is a problem with hormones that happens during the reproductive years. If you have PCOS, you may not have periods very often. Or you may have periods that last many days. You may also have too much of a hormone called androgen in your body.

With PCOS, many small sacs of fluid develop along the outer edge of the ovary. These are called cysts. The small fluid-filled cysts contain immature eggs. These are called follicles. The follicles fail to regularly release eggs.

The exact cause of PCOS is unknown. Early diagnosis and treatment along with weight loss may lower the risk of long-term complications such as type 2 diabetes and heart disease.

Symptoms
Symptoms of PCOS often start around the time of the first menstrual period. Sometimes symptoms develop later after you have had periods for a while.

The symptoms of PCO

In [66]:
##Before seeing our actual output, let's look at everything like tokens and cost
##Using our get_openai_callback: 

print(f"Total Tokens: {cb.total_tokens}")
print(f'Prompt Tokens: {cb.prompt_tokens}')
print(f'Completion tokens: {cb.completion_tokens}')
print(f'Total Cost: {cb.total_cost}')

Total Tokens: 2132
Prompt Tokens: 1673
Completion tokens: 459
Total Cost: 0.0034275
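The reported numbers are internally consistent: 1673 prompt tokens + 459 completion tokens = 2132 total tokens, and the cost can be reproduced from the two counts. The per-1K-token prices below are assumptions (they match the cost above for gpt-3.5-turbo at the time of this run; OpenAI's prices change over time):

```python
# ASSUMED gpt-3.5-turbo prices in USD per 1K tokens; check current pricing before reuse
PROMPT_PRICE_PER_1K = 0.0015
COMPLETION_PRICE_PER_1K = 0.002


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a call from its prompt and completion token counts."""
    return (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K + \
           (completion_tokens / 1000) * COMPLETION_PRICE_PER_1K


cost = estimate_cost(1673, 459)
print(round(cost, 7))  # 0.0034275
```

Completion tokens are priced higher than prompt tokens here, so asking for more questions (a longer reply) raises the cost faster than a slightly longer input text would.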


Now let's look at our response (it will be a dictionary)

In [67]:
response

{'text': "Polycystic ovary syndrome (PCOS) is a problem with hormones that happens during the reproductive years. If you have PCOS, you may not have periods very often. Or you may have periods that last many days. You may also have too much of a hormone called androgen in your body.\n\nWith PCOS, many small sacs of fluid develop along the outer edge of the ovary. These are called cysts. The small fluid-filled cysts contain immature eggs. These are called follicles. The follicles fail to regularly release eggs.\n\nThe exact cause of PCOS is unknown. Early diagnosis and treatment along with weight loss may lower the risk of long-term complications such as type 2 diabetes and heart disease.\n\nSymptoms\nSymptoms of PCOS often start around the time of the first menstrual period. Sometimes symptoms develop later after you have had periods for a while.\n\nThe symptoms of PCOS vary. A diagnosis of PCOS is made when you have at least two of these:\n\nIrregular periods. Having few menstrual per

In [69]:
##let's look at the 'quiz' key specifically:

quiz = response.get('quiz')
quiz

'\n{\n    "1": {\n        "mcq": "What is one of the common signs of PCOS?",\n        "options": {\n            "a": "Regular periods",\n            "b": "Fewer than nine periods a year",\n            "c": "Periods occurring every 28 days",\n            "d": "Periods lasting less than 3 days"\n        },\n        "correct": "b"\n    },\n    "2": {\n        "mcq": "What hormone may be present in excess in the body of someone with PCOS?",\n        "options": {\n            "a": "Estrogen",\n            "b": "Progesterone",\n            "c": "Androgen",\n            "d": "Insulin"\n        },\n        "correct": "c"\n    },\n    "3": {\n        "mcq": "What is a possible cause of PCOS according to the text?",\n        "options": {\n            "a": "Excessive vitamin intake",\n            "b": "Low-grade inflammation",\n            "c": "Regular exercise",\n            "d": "High levels of vitamin D"\n        },\n        "correct": "b"\n    },\n    "4": {\n        "mcq": "What is a potent

In [71]:
##Parse the quiz string into a Python dict with json.loads so it's readable (and usable):
quiz = json.loads(quiz)

quiz

{'1': {'mcq': 'What is one of the common signs of PCOS?',
  'options': {'a': 'Regular periods',
   'b': 'Fewer than nine periods a year',
   'c': 'Periods occurring every 28 days',
   'd': 'Periods lasting less than 3 days'},
  'correct': 'b'},
 '2': {'mcq': 'What hormone may be present in excess in the body of someone with PCOS?',
  'options': {'a': 'Estrogen',
   'b': 'Progesterone',
   'c': 'Androgen',
   'd': 'Insulin'},
  'correct': 'c'},
 '3': {'mcq': 'What is a possible cause of PCOS according to the text?',
  'options': {'a': 'Excessive vitamin intake',
   'b': 'Low-grade inflammation',
   'c': 'Regular exercise',
   'd': 'High levels of vitamin D'},
  'correct': 'b'},
 '4': {'mcq': 'What is a potential complication of PCOS?',
  'options': {'a': 'Hypertension',
   'b': 'Osteoporosis',
   'c': 'Gallstones',
   'd': 'Infertility'},
  'correct': 'd'},
 '5': {'mcq': 'What is a possible long-term complication of PCOS related to the liver?',
  'options': {'a': 'Liver cancer',
   'b':
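`json.loads` works here because the model returned only JSON (plus whitespace). Chat models sometimes wrap the JSON in extra prose or code fences, and then `json.loads` raises `json.JSONDecodeError`. A minimal fallback that keeps only the outermost `{...}` span; the helper name is hypothetical, and it assumes the reply contains exactly one top-level JSON object:

```python
import json


def extract_json_object(raw: str) -> dict:
    """Parse the span from the first '{' to the last '}' of a model reply as JSON.

    Raises ValueError if no braces are found; json.JSONDecodeError (a ValueError
    subclass) is raised if the span is not valid JSON.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])


reply = 'Here is your quiz:\n{"1": {"mcq": "Q?", "correct": "b"}}\nGood luck!'
print(extract_json_object(reply)["1"]["correct"])  # b
```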

In [73]:
##create a dataframe using the quiz: 

quiz_table_data = [] 
for key, value in quiz.items():
    mcq = value['mcq']
    options = " | ".join(
        [
            f"{option}: {option_value}"
            for option, option_value in value['options'].items()
        ]
    )
    correct = value['correct']
    quiz_table_data.append({'MCQ': mcq, 'Choices': options, 'Correct': correct})

    

In [74]:
quiz_table_data

[{'MCQ': 'What is one of the common signs of PCOS?',
  'Choices': 'a: Regular periods | b: Fewer than nine periods a year | c: Periods occurring every 28 days | d: Periods lasting less than 3 days',
  'Correct': 'b'},
 {'MCQ': 'What hormone may be present in excess in the body of someone with PCOS?',
  'Choices': 'a: Estrogen | b: Progesterone | c: Androgen | d: Insulin',
  'Correct': 'c'},
 {'MCQ': 'What is a possible cause of PCOS according to the text?',
  'Choices': 'a: Excessive vitamin intake | b: Low-grade inflammation | c: Regular exercise | d: High levels of vitamin D',
  'Correct': 'b'},
 {'MCQ': 'What is a potential complication of PCOS?',
  'Choices': 'a: Hypertension | b: Osteoporosis | c: Gallstones | d: Infertility',
  'Correct': 'd'},
 {'MCQ': 'What is a possible long-term complication of PCOS related to the liver?',
  'Choices': 'a: Liver cancer | b: Nonalcoholic steatohepatitis | c: Cirrhosis | d: Hepatitis A',
  'Correct': 'b'}]
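The loop above trusts the model's output. Since `correct` is meant to be one of the option letters, a quick sanity check before building the table can catch malformed entries (for example, a model that writes the answer text instead of the letter). A hypothetical helper, assuming the RESPONSE_JSON structure:

```python
def find_quiz_problems(quiz: dict) -> list:
    """Return readable problems found in a parsed quiz dict (empty list = looks OK).

    Checks that each entry has 'mcq', 'options' and 'correct' keys,
    and that 'correct' is one of the option letters.
    """
    problems = []
    for number, entry in quiz.items():
        missing = {"mcq", "options", "correct"} - set(entry)
        if missing:
            problems.append(f"question {number}: missing {sorted(missing)}")
            continue
        if entry["correct"] not in entry["options"]:
            problems.append(f"question {number}: answer {entry['correct']!r} is not an option")
    return problems
```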

In [75]:
##now put the above into a DF: 

quiz_df = pd.DataFrame(quiz_table_data)

quiz_df 

|   | MCQ | Choices | Correct |
|---|-----|---------|---------|
| 0 | What is one of the common signs of PCOS? | a: Regular periods \| b: Fewer than nine period... | b |
| 1 | What hormone may be present in excess in the b... | a: Estrogen \| b: Progesterone \| c: Androgen \| ... | c |
| 2 | What is a possible cause of PCOS according to ... | a: Excessive vitamin intake \| b: Low-grade inf... | b |
| 3 | What is a potential complication of PCOS? | a: Hypertension \| b: Osteoporosis \| c: Gallsto... | d |
| 4 | What is a possible long-term complication of P... | a: Liver cancer \| b: Nonalcoholic steatohepati... | b |


In [76]:
##Now put this quiz into a CSV: 

quiz_df.to_csv('pcos_mc_quiz.csv', index = False)
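The `Choices` column flattens each options dict into one `" | "`-separated string, which keeps the CSV at one row per question. If the CSV is loaded back later (e.g. with `pd.read_csv('pcos_mc_quiz.csv')`), a hypothetical helper can rebuild the options dict, assuming option text never contains `" | "` and each letter is followed by `": "`:

```python
def parse_choices(choices: str) -> dict:
    """Turn 'a: X | b: Y' (the Choices column format) back into {'a': 'X', 'b': 'Y'}."""
    options = {}
    for part in choices.split(" | "):
        # Split only at the first ': ' so option text may itself contain ': '
        letter, _, text = part.partition(": ")
        options[letter] = text
    return options


print(parse_choices("a: Estrogen | b: Progesterone | c: Androgen | d: Insulin"))
```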


#### NOW WE CAN USE ALL OF THE ABOVE IN A WEB APP!

Maybe utilize GPT-2?

In [9]:
from transformers import pipeline, set_seed

In [12]:
from dotenv import load_dotenv ## needed to access any keys that are in .env file

load_dotenv() ##Loads all values from .env file

KEY = os.getenv('HF_ACCESS_TOKEN') ##key is stored in .env file


In [14]:
#Check to see if key loaded (without printing the secret itself)
print('HF key loaded:', KEY is not None)

HF key loaded: True
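Printing a raw access token copies the full secret into the notebook's saved output, where it can end up in version control or screenshots. If the goal is only to confirm which key loaded, a masked preview is enough. A hypothetical helper, shown with an example string rather than a real token:

```python
def mask_secret(secret, visible: int = 3) -> str:
    """Show only the first `visible` characters of a secret; the rest become '*'."""
    if not secret:
        return "<not set>"
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


print(mask_secret("hf_exampletoken"))
```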
