# Evaluating Curriculum Rigor
In my experience with high school curriculum, I have found a wide variation in the rigor of course material.  This project seeks to develop a tool for evaluating the alignment of a particular curriculum to the College Board's AP Computer Science A framework.  As a first step, this project poses two different questions:

1. Can a TF-IDF vectorization of College Board questions, paired with a logistic regression classifier, successfully classify an assessment question by Computational Thinking Practice?
2. If ChatGPT is supplied with the College Board Framework for Computational Thinking, can it successfully identify the particular thinking practice being assessed by a question?

### Initial Conclusions:
1. TF-IDF with Logistic Regression classifies questions with 74% accuracy.
2. ChatGPT, supplied with the College Board Framework, classifies questions with 47% accuracy.

### Next Steps:
1. Supply the entire assessment question, not just the question prompt, to each classifier to help with classification.
2. Determine whether the classifier can also identify the "Essential Knowledge" assessed by the question, not just the computational skill.
3. Attempt to generalize the classifiers to classify non-assessment questions such as lecture material, lab questions, and homework problems.
4. Create a visualization that shows the distribution of thinking skills and content assessed over the course of the curriculum.

## Read in all questions from multiple documents

In [None]:
from docx import Document
import pandas as pd
import os

document = Document("Data/U1Ll, TST-AP Comp Sci A_Unit 1_Week 4_L6_Unit 1 Test.docx")

In [None]:
data = []
# Split the paragraph on whitespace; an empty-string separator raises ValueError.
document.paragraphs[9].text.split()

In [None]:
data

In [None]:
pd.DataFrame(data)

In [None]:
import re
text = "Hi Jeff!  How are you?"

result = re.split(r"[.!?]", text)
result

In [None]:
p = re.compile('[A-Za-z ]+[?]')
p.findall(text)


In [None]:
questions = []

for filename in os.listdir("Data/"):
  document = Document("Data/" + filename)
  for paragraph in document.paragraphs:
    result = p.findall(paragraph.text)
    if len(result) > 0:
      questions.append(result[0])
questions

## Read in College Board Questions and Areas of Interest/Data Types

A quick first attempt that classifies each question independently does not work well (30% accuracy).
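For reference, treating each question as its own document could look like the minimal sketch below. The prompts and labels are invented for illustration, and the use of `make_pipeline` is an assumption rather than the approach taken in the cells that follow.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy prompts and labels; the real data comes from the College Board CSV files.
prompts = [
    "Which of the following can replace the missing code so the method works as intended?",
    "Which statement should replace the missing code to complete the loop?",
    "What is printed as a result of executing the code segment?",
    "What is the value of x after the code segment is executed?",
]
labels = ["1.B", "1.B", "2.B", "2.B"]

# Each prompt is its own document, so every row of the TF-IDF matrix is one question.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())
model.fit(prompts, labels)

prediction = model.predict(["What is printed when the code segment is executed?"])
print(prediction)
```

With only a handful of prompts per class, each row of the matrix is very sparse, which is one plausible reason this variant scores poorly.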

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd
from sklearn.model_selection import train_test_split

corpus = ["This is the first document.", "This second document is the second document.", "And this is the third one.",
          "Is this the first document?" ]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(corpus)
vectorizer.get_feature_names_out()
print(X.toarray())
pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())


In [None]:
df = pd.read_csv("Data/CollegeBoard/SamplePrompts-PracticeExam2020.csv")

In [None]:
df["Classification"].unique()

In [None]:
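X_train, X_test, y_train, y_test = train_test_split(df["Prompt"], df["Classification"])

temp2 = pd.DataFrame(zip(X_train, y_train), columns=["Prompt", "Classification"])

# Build one document per class by concatenating that class's prompts.
# The prompts are drawn from the full df, not only from the training split.
temp = []
for question_class in temp2["Classification"].unique():
  class_prompts = df.loc[df.Classification == question_class, "Prompt"].values
  # Join with spaces so the end of one prompt does not run into the start of the next.
  temp.append({"class":question_class, "text": " ".join(class_prompts)})
temp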

In [None]:
df2 = pd.DataFrame(temp)
df2
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(df2["text"])
df2 = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out(), index=df2["class"])
#df3 = df3.drop(["10", "12", "15", "16", "987654321", "b1", "b2"], axis=1)
df2


In [None]:
import numpy as np
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(df2, df2.index)


In [None]:
lr.score(df2, df2.index)

In [None]:
df4 = pd.read_csv("Data/CollegeBoard/SamplePrompts-CED.csv")

In [None]:
df4

In [None]:
X_test_vect = vectorizer.transform(df4["Prompt"])
X_test_vect = pd.DataFrame(X_test_vect.toarray(), columns=vectorizer.get_feature_names_out())
#X_test2_vect = X_test2_vect.drop(["10", "12", "15", "16", "987654321", "b1", "b2"], axis=1)

y_test = df4["Classification"]

In [None]:
lr.score(X_test_vect, y_test)

In [None]:
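# Predict on the CED prompts vectorized above (X_test_vect, y_test).
y_test2_pred = lr.predict(X_test_vect)
y_test2_pred

In [None]:
y_test.values

In [None]:
df4.loc[0, "Prompt"]

In [None]:
from sklearn.metrics import confusion_matrix

# confusion_matrix expects the true labels first, then the predictions.
confusion_matrix(y_test, y_test2_pred)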

In [None]:
df_2014 = pd.read_csv("Data/CollegeBoard/SamplePrompts-PracticeExam2014.csv")

In [None]:
X_test = vectorizer.transform(df_2014["Prompt"])
X_test = pd.DataFrame(X_test.toarray(), columns=vectorizer.get_feature_names_out())
y_test = df_2014["Classification"]

In [None]:
y_test

In [None]:
lr.score(X_test, y_test)


## Try training with both 2020 and 2014 questions, and then test on CED questions

In [None]:
df_2014 = pd.read_csv("Data/CollegeBoard/SamplePrompts-PracticeExam2014.csv")
#df_2020 = pd.read_csv("Data/CollegeBoard/SamplePrompts-PracticeExam2020.csv")
#df = pd.concat([df_2014, df_2020])
classifiers_unique = df_2014["Classification"].unique()

In [None]:
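# Build one training document per classification from the 2014 prompts.
training_text = []
for x in classifiers_unique:
    class_prompts = df_2014.loc[df_2014.Classification == x, "Prompt"]
    # Join with spaces so adjacent prompts do not run together.
    training_text.append({"Classification":x, "Prompt":" ".join(class_prompts)})
training_text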
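If both exams are combined as the section heading describes, the per-class documents can also be built with `pd.concat` and `groupby`. This is a minimal sketch under that assumption; the two small DataFrames are hypothetical stand-ins for the 2014 and 2020 prompt files.

```python
import pandas as pd

# Hypothetical stand-ins for the 2014 and 2020 prompt files.
df_2014 = pd.DataFrame({
    "Prompt": ["What is printed?", "Which code completes the method?"],
    "Classification": ["2.B", "1.B"],
})
df_2020 = pd.DataFrame({
    "Prompt": ["What is the output?", "Which line fixes the error?"],
    "Classification": ["2.B", "4.B"],
})

combined = pd.concat([df_2014, df_2020], ignore_index=True)

# One document per classification: join that class's prompts with spaces.
class_docs = combined.groupby("Classification")["Prompt"].agg(" ".join).reset_index()
print(class_docs)
```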


In [None]:
df = pd.DataFrame(training_text)
df

In [None]:
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(df["Prompt"])
X = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out(), index=df["Classification"])

In [None]:
lr = LogisticRegression()
lr.fit(X, X.index )

### Test Logistic Regression

In [None]:
# Note: df_CED holds the 2020 practice exam prompts.
df_CED = pd.read_csv("Data/CollegeBoard/SamplePrompts-PracticeExam2020.csv")
df_CED=df_CED[~df_CED["Classification"].isin(["2.D", "5.C", "5.D"])]
X_test = vectorizer.transform(df_CED["Prompt"])
X_test = pd.DataFrame(X_test.toarray(), columns=vectorizer.get_feature_names_out())
y_test_pred = lr.predict(X_test)
# Print the accuracy explicitly; only the last expression in a cell is displayed.
print(lr.score(X_test, df_CED["Classification"]))

In [None]:
confusion_matrix(df_CED["Classification"], y_test_pred)

In [None]:
from sklearn.metrics import classification_report
print(classification_report(df_CED["Classification"], y_test_pred))

## OpenAI

In [None]:
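from openai import OpenAI

# Create the client; by default it reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[{"role": "user", "content": "Why is Notre Dame football so famous?"}]
)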

In [None]:
print(response.choices[0].message.content)

In [None]:
prompt_start = "Here are the categories for AP questions. \
1.B: Determine code that would be used to complete code segments \
1.C: Determine code that would be used to interact with completed program code. \
2.A: Apply the meaning of specific operators \
2.B: Determine the result or output based on statement execution order in a code segment without method calls (other than output) \
2.C: Determine the result or output based on the statement execution order in a code segment containing method calls. \
2.D: Determine the number of times a code segment will execute. \
4.A: Use test-cases to find errors or validate results. \
4.B: Identify errors in program code. \
4.C: Determine if two or more code segments yield equivalent results. \
5.A: Determine the behavior of a given segment of program code. \
5.B: Explain why a code segment will not compile or work as intended \
5.C: Explain how the result of program code changes, given a change to the initial code. \
5.D: Describe the initial conditions that must be met for a program segment to work as intended or described. \
Which of the categories above best classifies this question prompt below? "

In [None]:
prompt_start

In [None]:
def gpt_guess(prompt):
  response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[{"role": "user", "content": prompt_start + prompt}]
  )
  return response.choices[0].message.content


In [None]:
df = pd.read_csv("Data/CollegeBoard/SamplePrompts-PracticeExam2020.csv")

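for i in df.index:
    df.loc[i,"GPT_Pred"] = gpt_guess(df.loc[i,"Prompt"])

# Save the responses so they can be reloaded without calling the API again.
df.to_csv("Data/output1.csv", index=False)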

In [None]:
import pandas as pd
df = pd.read_csv("Data/output1.csv")

In [None]:
for i in df.index:
  print(df.loc[i, "GPT_Pred"])
  code = input("What is the code?")
  df.loc[i, "GPT_Code"] = code
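Coding each reply by hand is slow. A regular expression can pull out the first practice code when the reply contains one. This is a minimal sketch; `extract_code` is a hypothetical helper, the sample replies are invented, and the pattern is deliberately loose (it would also accept codes such as 1.A that are not in the category list).

```python
import re

# Matches codes shaped like those in the category list, e.g. "1.B" or "5.D".
CODE_PATTERN = re.compile(r"\b([1245]\.[A-D])\b")

def extract_code(reply):
    """Return the first practice code found in a reply, or None if there is none."""
    match = CODE_PATTERN.search(reply)
    return match.group(1) if match else None

# Hypothetical replies, written for illustration only.
first = extract_code("The best category is 2.B: Determine the result or output.")
second = extract_code("I am not sure which category applies.")
print(first, second)
```

Replies that return `None`, or that mention several codes, would still need a manual check.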

In [None]:
df.loc[0, "GPT_Code"] = "1.B"

In [None]:
df

In [None]:
from sklearn.metrics import classification_report

print(classification_report(df["Classification"], df["GPT_Code"]))

In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay

cm = confusion_matrix(df["Classification"], df["GPT_Code"])
(ConfusionMatrixDisplay(cm)).plot()
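A labeled table can be easier to read than the raw matrix. The following is a minimal sketch, assuming a DataFrame with the same `Classification` and `GPT_Code` columns; the rows here are invented for illustration.

```python
import pandas as pd

# Hypothetical labels standing in for the real Classification and GPT_Code columns.
df = pd.DataFrame({
    "Classification": ["1.B", "2.B", "2.B", "5.A"],
    "GPT_Code":       ["1.B", "2.B", "5.A", "2.B"],
})

# Rows are the true practice codes, columns are the codes assigned from the GPT replies.
table = pd.crosstab(df["Classification"], df["GPT_Code"],
                    rownames=["Actual"], colnames=["Predicted"])
print(table)

# Overall accuracy is the fraction of rows where the two codes agree.
accuracy = (df["Classification"] == df["GPT_Code"]).mean()
print(accuracy)
```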

### Try to use simplified prompt with work

In [None]:
prompt_start2 = "Here are the categories for AP questions. \
1.B: Determine code that would be used to complete code segments \
1.C: Determine code that would be used to interact with completed program code. \
2.A: Apply the meaning of specific operators \
2.B: Determine the result or output based on statement execution order in a code segment without method calls (other than output) \
2.C: Determine the result or output based on the statement execution order in a code segment containing method calls. \
2.D: Determine the number of times a code segment will execute. \
4.A: Use test-cases to find errors or validate results. \
4.B: Identify errors in program code. \
4.C: Determine if two or more code segments yield equivalent results. \
5.A: Determine the behavior of a given segment of program code. \
5.B: Explain why a code segment will not compile or work as intended \
5.C: Explain how the result of program code changes, given a change to the initial code. \
5.D: Describe the initial conditions that must be met for a program segment to work as intended or described. "

prompt_question = "Using the categories previously listed, determine the category for this question prompt:"

In [None]:

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[{"role": "user", "content": prompt_start2},
            {"role": "user", "content": prompt_question+df.loc[0,"Prompt"]},
            {"role": "user", "content": prompt_question+df.loc[1,"Prompt"]},
            {"role": "user", "content": prompt_question+df.loc[2,"Prompt"]}
  ]
  )

# The API returns a single choice by default, even when several user messages are sent.
response.choices[0].message.content

In [None]:
response

## Try Improving TF-IDF by Including Full Question Text in Document

## Try Extracting Text from PDF Automatically

In [None]:
import PyPDF2

def extract_text_from_pdf(pdf_path):
    text = "" 
    with open(pdf_path, "rb") as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        for page_num in range(len(pdf_reader.pages)):
            page = pdf_reader.pages[page_num]
            text += page.extract_text()
    return text

if __name__ == "__main__":
    pdf_path = "Data/CollegeBoard/ap-computer-science-a-2014-practice-exam.pdf"
    extract_text = extract_text_from_pdf(pdf_path)

In [None]:
extract_text = extract_text.replace("\n","")

In [None]:
questions = []

for number in range(1,40):
  # Each question starts at "N." and runs up to the next marker, "N+1.".
  begin = extract_text.find(str(number)+".")
  end = extract_text.find(str(number+1)+".")

  question = extract_text[begin:end]
  # Drop everything from answer choice (E) onward.
  option_e = question.find("(E)")
  question = question[:option_e]
  questions.append({"number":number, "text":question})
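Searching for each marker from the start of the text can match a number that appears earlier in the document. A variant that searches forward from the previous question is sketched below. `split_questions` is a hypothetical helper and the sample string is invented; numbers inside a question body (for example "10." matching the marker "0.") can still confuse it.

```python
def split_questions(text, first=1, last=3):
    """Split text into numbered questions by scanning forward for "N." markers."""
    questions = []
    position = 0
    for number in range(first, last + 1):
        begin = text.find(f"{number}.", position)
        if begin == -1:
            break  # marker not found, stop rather than guess
        end = text.find(f"{number + 1}.", begin + 1)
        if end == -1:
            end = len(text)  # last question runs to the end of the text
        questions.append({"number": number, "text": text[begin:end].strip()})
        position = end
    return questions

# Hypothetical sample text, written for illustration only.
sample = ("1. What is printed? (A) 0 (B) 1 "
          "2. Which loop runs 10 times? (A) for (B) while "
          "3. What is x? (A) 2 (B) 3")
parsed = split_questions(sample)
for q in parsed:
    print(q)
```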

In [None]:
df_questions = pd.DataFrame(questions)

In [None]:
import pandas as pd 

df = pd.read_csv("Data/CollegeBoard/SamplePrompts-PracticeExam2014.csv")

In [None]:
df["Question_Num"] = df["Source"].str.slice(14).astype(int)


In [None]:
df = df.merge(df_questions, left_on="Question_Num", right_on="number")
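# index=False avoids adding an extra unnamed index column each time the file is rewritten.
df.to_csv("Data/CollegeBoard/SamplePrompts-PracticeExam2014.csv", index=False)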

## Try Extracting Text from 2020 Test as well

In [None]:
def extract_text_from_pdf(pdf_path):
    text = "" 
    with open(pdf_path, "rb") as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        for page_num in range(len(pdf_reader.pages)):
            page = pdf_reader.pages[page_num]
            text += page.extract_text()
    return text

if __name__ == "__main__":
    pdf_path = "Data/CollegeBoard/ap-computer-science-a-2020-practice-exam-and-notes-1.pdf"
    extract_text = extract_text_from_pdf(pdf_path)

In [None]:
extract_text = extract_text.replace("\n","")

questions = []

for number in range(1,40):
  # Each question starts at "N." and runs up to the next marker, "N+1.".
  begin = extract_text.find(str(number)+".")
  end = extract_text.find(str(number+1)+".")

  question = extract_text[begin:end]
  # Drop everything from answer choice (E) onward.
  option_e = question.find("(E)")
  question = question[:option_e]
  questions.append({"number":number, "text":question})

df_questions = pd.DataFrame(questions)

df = pd.read_csv("Data/CollegeBoard/SamplePrompts-PracticeExam2020.csv")



In [None]:
df["Question_Num"] = df["Source"].str.slice(14).astype(int)
df = df.merge(df_questions, left_on="Question_Num", right_on="number")
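# index=False avoids adding an extra unnamed index column each time the file is rewritten.
df.to_csv("Data/CollegeBoard/SamplePrompts-PracticeExam2020.csv", index=False)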

## Try Training Model on 2020 and Classify 2014