# Evaluating Curriculum Rigor
## Background
In my experience with high school curricula, I have found wide variation in the rigor of course material. This project seeks to develop a tool for evaluating the rigor of a curriculum by measuring its alignment to the corresponding College Board AP course. It focuses on the College Board's AP Computer Science A course, which covers a first-year Java and object-oriented design course.

For this course, the College Board defines a set of "Computational Thinking Practices" (skills) and content that are assessed on a year-end summative exam to determine students' mastery of the course.

There are 5 main Computational Thinking Practices identified by the College Board, which it then breaks down into subskills:  

<img src="Reports/Images/Skills-List.png" width=600px> 

In addition, the College Board defines a set of "Essential Knowledge" (the content) to be assessed in the course, which it organizes under 5 "Big Ideas."  For example, the content for a lesson on iteration is: 

<img src="Reports/Images/Content-Sample.png" width=400px>

Every question on the College Board's end-of-course summative exam is aligned to a particular computational thinking skill and essential knowledge.  As a note, some school networks have found the College Board's standards to be very complete, and "backwards plan" their middle school and pre-AP high school courses to prepare students for the AP level work. 

As a first step, this project focuses on the assessment questions used in a particular curriculum and measures how well they align to the College Board's Computational Thinking Practices and Curriculum Framework. (AP classes in most subjects have an analogous set of thinking practices and framework standards, so this work may one day be generalized to assess curricula in other subject areas.)

Two questions to assess are:  

1. Can a TF-IDF vectorization of College Board question prompts, paired with a Logistic Regression classifier, successfully classify an assessment question by Computational Thinking Practice?
2. If ChatGPT is supplied only with the College Board Framework for Computational Thinking, can it successfully identify the particular thinking practice being assessed by a question prompt?
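
The second question can be framed as a prompt assembled from the framework text plus one question, with the model's reply parsed back into a skill code. The sketch below shows only that framing and the scoring; it calls no model API. The names `FRAMEWORK`, `build_prompt`, `parse_label`, and `accuracy` are hypothetical, and the framework text is limited to the two skill descriptions quoted later in this document.

```python
import re

# Assumption: a two-skill excerpt of the framework, using the wording
# quoted later in this document. The real framework has more skills.
FRAMEWORK = {
    "2.A": "Apply the meaning of specific operators.",
    "2.B": "Determine the results or output based on statement execution "
           "order in a code segment without method calls (except for output).",
}

def build_prompt(framework, question):
    """Assemble a classification prompt from skill descriptions and one question."""
    skills = "\n".join(f"{code}: {text}" for code, text in framework.items())
    return (
        "Classify the question below by Computational Thinking Practice.\n"
        f"Skills:\n{skills}\n\n"
        f"Question:\n{question}\n\n"
        "Answer with the skill code only."
    )

def parse_label(reply, framework):
    """Return the first skill-like code in the reply if it is a known code, else None."""
    match = re.search(r"\b(\d\.[A-Z])\b", reply)
    if match and match.group(1) in framework:
        return match.group(1)
    return None

def accuracy(predicted, actual):
    """Fraction of predictions that match the hand-assigned labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

Parsing the reply into a fixed set of codes keeps the scoring comparable with the Logistic Regression classifier, since both then produce one label per question.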

### Initial Conclusions:
1. The TF-IDF and Logistic Regression together classify questions with a 74% accuracy rate when trained on just prompts.
2. ChatGPT, supplied with the College Board Framework, classifies questions with 47% accuracy.
3. Logistic Regression does not benefit from being given the entire test question, because it picks up spurious words as signals. More test data is needed.

### Next Steps:
1. Determine whether the classifier can also identify the "Essential Knowledge" assessed by the question, not just the computational skill.
2. Attempt to generalize the classifiers to classify non-assessment questions such as lecture material, lab questions, and homework problems.
3. Create a visualization that shows the distribution of thinking skills and content assessed over the course of the curriculum.
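
As a rough illustration of the third step, the distribution could start from a cross-tabulation of units against predicted skills. The sketch below uses a made-up `tagged` table; the column names `unit` and `skill` are assumptions, not part of the project's data.

```python
import pandas as pd

# Hypothetical tagged questions: a unit number plus a predicted skill code.
tagged = pd.DataFrame({
    "unit": [1, 1, 2, 2, 2, 3],
    "skill": ["2.A", "2.A", "2.A", "2.B", "2.B", "2.B"],
})

# Rows are units, columns are skills, cells are question counts.
counts = pd.crosstab(tagged["unit"], tagged["skill"])
print(counts)

# Share of each skill within a unit, which is easier to compare across units.
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(2))
```

With matplotlib installed, `counts.plot(kind="bar", stacked=True)` would turn the same table into a stacked bar chart.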

## Classifying Questions Using Logistic Regression
As a first step, this section tries to classify prompts as assessing one of these two AP Computational Thinking Practices (CTP):
1. **CTP 2.A**: Apply the meaning of specific operators.  For example:  
*Consider the following code segment.*  
```
int x = 7;
int y = 3;
if ((x < 10) && (y < 0))
  System.out.println("Value is: " + x * y);
else
  System.out.println("Value is: " + x / y);
```
*What is printed as a result of executing the code segment?*

2. **CTP 2.B**: Determine the results or output based on statement execution order in a code segment without method calls (except for output).  For example:  

*Consider the following code segment.* 
```
int[] arr = {7, 2, 5, 3, 0, 10};
for (int k = 0; k < arr.length - 1; k++) {
  if (arr[k] > arr[k + 1])
    System.out.print(k + " " + arr[k] + " ");
}
``` 
*What will be printed as a result of executing the code segment?*  

## Preprocessing the Data
In this first step, we will:
1. Read in example prompts for each category.
2. Preprocess the data: tokenize, lemmatize, and look at the token frequency distribution by question category.
3. Build and test a Logistic Regression Classifier.

In [39]:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import RegexpTokenizer
from nltk import FreqDist
```

In [40]:
```python
df1 = pd.read_csv("Data/Synthetic/Synthetic2A.csv")
df2 = pd.read_csv("Data/Synthetic/Synthetic2B.csv")
df = pd.concat([df1, df2])
```

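In [41]:
```python
# Same pattern as scikit-learn's default: tokens of two or more word characters.
basic_token_pattern = r"(?u)\b\w\w+\b"

tokenizer = RegexpTokenizer(basic_token_pattern)
lemmatizer = WordNetLemmatizer()

def lemmatize_text(text):
    """Lowercase, tokenize, and lemmatize one question."""
    temp = tokenizer.tokenize(text.lower())
    return [lemmatizer.lemmatize(w) for w in temp]

# The custom tokenizer replaces the vectorizer's own token pattern; English stop
# words are removed after tokenizing, and only the 10 most frequent terms are kept.
tfidf = TfidfVectorizer(strip_accents='ascii', tokenizer=lemmatize_text, stop_words="english", max_features=10)

result = tfidf.fit_transform(df["Question"])

# Dense feature matrix: one row per question, one column per retained term.
X = pd.DataFrame(result.toarray(), columns=tfidf.get_feature_names_out())
```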
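
For readers who want to see what the TF-IDF step computes, the sketch below reproduces scikit-learn's default weighting by hand on three made-up documents: raw term counts, multiplied by a smoothed inverse document frequency `ln((1 + n) / (1 + df)) + 1`, then L2-normalized per row. The documents are illustrative assumptions, not project data, and the sketch uses the vectorizer's defaults rather than the custom tokenizer above.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up documents, used only to compare the two computations.
docs = ["loop prints value", "operator value value", "loop loop operator"]

# Step 1: raw term counts (columns are the alphabetically sorted vocabulary).
counts = CountVectorizer().fit_transform(docs).toarray()

# Step 2: smoothed inverse document frequency, as in scikit-learn's default.
n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1

# Step 3: weight the counts and scale each row to unit Euclidean length.
weighted = counts * idf
manual = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)

# The library result with default settings should agree with the manual one.
auto = TfidfVectorizer().fit_transform(docs).toarray()
print(np.allclose(manual, auto))
```

Terms that appear in every question, such as "consider" or "segment", receive the lowest IDF, which is why they contribute little to separating the two categories.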





In [42]:
```python
df["Question_lem"] = df["Question"].apply(lemmatize_text)
df["Question_lem"]
```

```
0    [consider, the, following, code, segment, assu...
1    [consider, the, following, code, segment, assu...
2    [consider, the, following, code, segment, assu...
3    [consider, the, following, code, segment, assu...
4    [consider, the, following, code, segment, assu...
5    [consider, the, following, code, segment, assu...
6    [consider, the, following, code, segment, assu...
7    [consider, the, following, code, segment, assu...
8    [consider, the, following, code, segment, assu...
9    [consider, the, following, code, segment, assu...
0    [consider, the, following, code, segment, int,...
1    [consider, the, following, code, segment, int,...
2    [consider, the, following, code, segment, int,...
3    [consider, the, following, code, segment, int,...
4    [consider, the, following, code, segment, stri...
5    [consider, the, following, code, segment, int,...
6    [consider, the, following, code, segment, char...
7    [consider, the, following, code, segment, int,...
8    [cons
```
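
The token frequency distribution by question category, mentioned in the preprocessing steps, could be computed along the following lines. This is a sketch on a made-up `toy` frame that mirrors the `Question_lem` and `Classification` columns; it uses `collections.Counter` instead of the `FreqDist` imported earlier (which is a `Counter` subclass, so it should be close to a drop-in). The helper name `token_counts_by_category` is hypothetical.

```python
from collections import Counter
import pandas as pd

# Made-up stand-in for df, with the same two column names.
toy = pd.DataFrame({
    "Question_lem": [
        ["consider", "the", "code", "segment", "operator"],
        ["consider", "the", "code", "segment", "loop"],
        ["consider", "the", "loop", "output"],
    ],
    "Classification": ["2.A", "2.B", "2.B"],
})

def token_counts_by_category(frame):
    """Map each category to a Counter of its lemmatized tokens."""
    result = {}
    for category, token_lists in frame.groupby("Classification")["Question_lem"]:
        result[category] = Counter(tok for toks in token_lists for tok in toks)
    return result

freq = token_counts_by_category(toy)
for category, counter in freq.items():
    print(category, counter.most_common(3))
```

Comparing the most common tokens per category gives a quick view of which words are shared boilerplate and which are specific to one skill.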

## Build and Test a Logistic Regression Classifier on the Questions

In [43]:
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
```

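In [47]:
```python
# Default 75/25 split; no random_state is set, so the split differs between runs.
X_train, X_test, y_train, y_test = train_test_split(X, df["Classification"])

lr = LogisticRegression()
lr.fit(X_train, y_train)
# Accuracy on the training data only, not on held-out questions.
lr.score(X_train, y_train)
```

```
1.0
```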
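
Because the TF-IDF vocabulary above is fitted on every question before the split, the held-out questions influence the features. One common way to avoid that, sketched below under the assumption that raw question strings are passed in, is to put the vectorizer and classifier in a single pipeline and to stratify the split. The prompts are made up, and the default vectorizer stands in for the lemmatizing one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Made-up prompts standing in for df["Question"] and df["Classification"].
questions = [
    "what value results from applying the modulus operator",
    "what does the division expression evaluate to",
    "which operator gives the stored value",
    "what value is assigned after the compound operator",
    "what is printed when the loop finishes",
    "what is printed after the nested loop runs",
    "what output appears after the if statement executes",
    "what is printed as the array is traversed",
]
labels = ["2.A"] * 4 + ["2.B"] * 4

# Stratifying keeps both skills present in the train and test portions.
X_train, X_test, y_train, y_test = train_test_split(
    questions, labels, test_size=0.25, stratify=labels, random_state=0
)

# The vectorizer is fitted on the training text only, inside the pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# Cross-validation refits the whole pipeline on each fold.
scores = cross_val_score(model, questions, labels, cv=4)
print("cross-validated accuracy:", scores.mean())
```

With only a handful of questions per category, the cross-validated mean is likely a steadier estimate than any single split.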

In [50]:
```python
y_test_pred = lr.predict(X_test)
print(classification_report(y_test, y_test_pred))
```

```
              precision    recall  f1-score   support

         2.A       1.00      1.00      1.00         2
         2.B       1.00      1.00      1.00         3

    accuracy                           1.00         5
   macro avg       1.00      1.00      1.00         5
weighted avg       1.00      1.00      1.00         5
```
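
`confusion_matrix` is imported above but not used in this section. A minimal sketch of how it could complement the report follows; the `y_true` and `y_pred` lists are hypothetical labels chosen to show misclassifications, not results from this project.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels, not results from this project.
y_true = ["2.A", "2.A", "2.B", "2.B", "2.B"]
y_pred = ["2.A", "2.B", "2.B", "2.B", "2.A"]

# Fixing the label order keeps rows and columns stable across runs.
labels = ["2.A", "2.B"]
matrix = confusion_matrix(y_true, y_pred, labels=labels)

# Rows are the true skill, columns are the predicted skill.
table = pd.DataFrame(
    matrix,
    index=[f"true {label}" for label in labels],
    columns=[f"pred {label}" for label in labels],
)
print(table)
```

On a test set of five questions, the raw counts in the matrix are arguably more informative than percentages, since a single misclassified question changes accuracy by 20 points.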

