In [44]:
import json
import os
import random
import tqdm
from copy import deepcopy

from our_datasets.tutoreval_dataset import TutorEval
from utils.config import Config

In [2]:


tutor_eval = TutorEval(config=Config('config.yaml', dataset_dir='data'))

In [3]:
tutor_eval.load()

In [4]:
samples = tutor_eval.get_test_samples()

In [5]:
closed = [s for s in samples if s['closed_book']]

In [6]:
len(samples), len(closed)

(834, 370)

In [7]:
# Drop questions flagged as misleading
print("Length before filtering", len(closed))
closed = [c for c in closed if c['misleading_question'] == False]
print("Length after filtering", len(closed))

Length before filtering 370
Length after filtering 322


In [8]:
shuffled_closed = random.sample(closed, len(closed))
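`random.sample(closed, len(closed))` returns a shuffled copy, but the order changes on every run, so `shuffled_closed[0]` will differ between sessions. A minimal sketch of a reproducible variant, assuming a fixed seed is acceptable here; `shuffled_copy` and the seed value are illustrative and not part of the notebook:

```python
import random


def shuffled_copy(items: list, seed: int = 0) -> list:
    """Return a shuffled copy of items; the same seed gives the same order."""
    rng = random.Random(seed)  # private generator, leaves the global state untouched
    return rng.sample(items, len(items))


demo = list(range(10))
print(shuffled_copy(demo, seed=0) == shuffled_copy(demo, seed=0))  # True
```

With this helper the cell above would read `shuffled_closed = shuffled_copy(closed)`.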

In [9]:
shuffled_closed[0]['question']

'How can we determine if global warming is due to human activities, not natural cycles?'

In [10]:
open_ai_key = '???'
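Rather than pasting the key into the notebook, it can be read from the environment. A sketch, assuming the key has been exported as `OPENAI_API_KEY` (the variable the `openai` client also falls back to by default); `read_api_key` is a hypothetical helper, not part of the original code:

```python
import os


def read_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is missing or empty."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The assignment above would then become `open_ai_key = read_api_key()`.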

In [11]:
from openai import OpenAI

client = OpenAI(api_key=open_ai_key)


def ask_openai(prompt: str) -> str:
    """Send a single-turn user prompt to gpt-4o and return the reply text."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": prompt}
        ],
    )

    return completion.choices[0].message.content


In [12]:
my_samples = deepcopy(shuffled_closed)
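for c in my_samples:
    # key_points may arrive as one '-'-bulleted string; turn it into a list.
    # Note: splitting on '-' also splits at any hyphen inside a key point.
    if isinstance(c['key_points'], str):
        key_points = [k.strip() for k in c['key_points'].split('-') if k != '']
        c['key_points'] = key_points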
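Splitting on every `-` would cut a key point in two wherever it contains a hyphenated word. A sketch of a line-based alternative, assuming each key point sits on its own line and starts with a `-` bullet; `split_key_points` is a hypothetical helper, and switching to it could change the key-point counts reported in this notebook:

```python
def split_key_points(raw: str) -> list:
    """Split a '-'-bulleted, one-item-per-line string into a list of key points."""
    points = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith('-'):
            line = line[1:].strip()  # drop only the leading bullet marker
        if line:
            points.append(line)
    return points


# Made-up input for illustration, not taken from the dataset
raw = "- Compare file-based systems\n- Give example (e.g. algebraic tangle)"
print(raw.split('-'))         # the hyphen in "file-based" produces an extra piece
print(split_key_points(raw))  # two key points
```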

In [14]:
from collections import Counter

Counter([len(c['key_points']) for c in my_samples])

Counter({2: 143, 3: 66, 1: 55, 4: 40, 5: 10, 6: 4, 7: 3, 9: 1})

In [27]:
idx = 10
print(shuffled_closed[idx]['question'], '\n\n', my_samples[idx]['key_points'],'\n\n', shuffled_closed[idx]['key_points'])

So far, we have been dealing with rational tangles. Are there other, more complicated types of tangles? 

 ['Point out the general definition of a tangle.', 'Give example of another class of tangles (e.g. algebraic tangle)'] 

 - Point out the general definition of a tangle.
- Give example of another class of tangles (e.g. algebraic tangle)


In [31]:
def rephrase_prompt(question: str, key_points: list) -> str:
    # Join outside the f-string: backslashes inside f-string expressions
    # are a SyntaxError before Python 3.12.
    joined_key_points = '\n'.join(key_points)
    return f"""
    You are given the following key points: {joined_key_points}

    Those are the key points to answer the question: {question}

    You are NOT allowed to answer the question by yourself. Use only the information from the key points.

    Reformulate the given key points to imitate the answer to the question.

    Here is the answer with only provided key points:
    """

steps = 0
for c in my_samples:
    # randomly shuffle keypoints

    key_points = random.sample(c['key_points'], len(c['key_points']))

    r_prompt = rephrase_prompt(c['question'], key_points)

    print("Question:", c['question'])
    print("Keypoint's:", '\n', '\n'.join(c['key_points']))

    response = ask_openai(r_prompt)

    print("Response:", response)

    print('=' * 20)

    steps += 1
    if steps > 5:
        break

Question: How can we determine if global warming is due to human activities, not natural cycles?
Keypoint's: 
 Understanding the historical patterns of natural cycles helps to identify the differences


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Response: To determine if global warming is due to human activities rather than natural cycles, we need to understand the historical patterns of these natural cycles. By identifying the differences between past natural climate changes and current climate trends, we can assess the likelihood that recent global warming is primarily the result of human activities rather than natural processes.
Question: What is the easiest way to make a while loop that never stops executing?
Keypoint's: 
 explain that while loop will never stop executing if the condition is always satisfied
the easiest example is while (1 == 1), which is always true
an even easier example is while True, which is always true


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Response: The easiest way to make a while loop that never stops executing is to use a condition that is always true. The simplest example is `while (1 == 1)`, as the condition `1 == 1` is always true. An even more straightforward example is `while True`, as the condition `True` is inherently always true. In both cases, since the condition is constantly satisfied, the while loop will never stop executing.
Question: Does the underlying implementation of arrays in C++ have any underlying data stucture?
Keypoint's: 
 Arrays are implemented as contiguous blocks of memory.
Arrays do not depend on complicated data structures.


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Response: The underlying implementation of arrays in C++ consists of contiguous blocks of memory and does not rely on complicated data structures.
Question: What are the disadvantages of file-based systems?
Keypoint's: 
 Discuss the data redundancy and inconsistency
Discuss the data isolation, integrity problems, and security issues.


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Response: File-based systems often suffer from data redundancy and inconsistency. This occurs when the same piece of data is stored in multiple places, leading to unnecessary duplication and potential conflicts if any of the copies are updated independently of one another.

Additionally, data isolation can be a significant problem in file-based systems. Data is often stored in separate files, making it difficult to access and integrate information across different datasets. This fragmentation complicates data retrieval and hinders the ability to perform comprehensive data analysis.

Integrity problems are also a notable disadvantage. With data dispersed across multiple files, maintaining accuracy and consistency in data becomes challenging. The lack of centralized control can lead to discrepancies, making it difficult to ensure the data remains reliable over time.

Finally, security issues are another concern with file-based systems. The decentralized nature of storing data in various 

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Response: One of the main challenges in the study of Fourier series involves establishing the conditions under which these series converge. This is crucial because convergence determines whether or not the series accurately represents a function. Furthermore, it's also important to rigorously demonstrate when functions can be expressed as Fourier series, as this foundational aspect underpins much of the theory and application of Fourier analysis.

By addressing these challenges, students and researchers can gain significant geometric insights into the nature of function spaces. Understanding these spaces can provide deeper knowledge about the structure and behavior of functions, which is essential for advancing both mathematical theory and practical applications.
Question: Can artificial intelligence (AI) help improve solid waste management?
Keypoint's: 
 Yes, it's possible. In fact, they are already many examples in application.
Think about the sensorable properties of solid waste and

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Response: Yes, it's possible for artificial intelligence (AI) to help improve solid waste management. In fact, there are already numerous examples of AI being applied in this field. Consider the sensorable properties of solid waste and the overarching objectives of solid waste management to understand how AI can contribute to advancements.


In [32]:
multi_kps_qs = [s for s in my_samples if len(s['key_points']) > 1]

print(len(multi_kps_qs))

267


In [37]:
multi_kps_qs[0].keys()

dict_keys(['chapter', 'question', 'key_points', 'closed_book', 'answer_in_chapter', 'misleading_question', 'difficulty', 'domain', 'path_to_chapter'])

In [60]:
abcd = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

samples_to_grade = []

for s in multi_kps_qs:
    key_points = s['key_points']

    for masked in range(len(key_points)):
        # key_points[masked] will be masked
        key_points_mask = list(abcd[:len(key_points)])
        key_points_mask[masked] = '_'

        key_points_masked = deepcopy(key_points)
        key_points_masked.pop(masked)

        sample = deepcopy(s)
        sample['original_key_points'] = key_points
        sample['key_points'] = key_points_masked
        sample['key_points_mask'] = key_points_mask

        samples_to_grade.append(sample)
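Each question with n key points yields n leave-one-out variants, so the size of `samples_to_grade` can be predicted from the key-point histogram printed earlier. A quick arithmetic check using those counts, restricted to questions with more than one key point:

```python
# Histogram from the Counter output: {number of key points: number of questions}
histogram = {2: 143, 3: 66, 1: 55, 4: 40, 5: 10, 6: 4, 7: 3, 9: 1}

# Only questions with at least two key points are masked
multi = {n: q for n, q in histogram.items() if n > 1}

n_questions = sum(multi.values())
n_masked_samples = sum(n * q for n, q in multi.items())

print(n_questions)       # 267
print(n_masked_samples)  # 748
```

Both numbers agree with the notebook: 267 is the length of `multi_kps_qs`, and 748 is the total shown by the progress bar of the generation loop.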

In [45]:
result_samples_to_grade = []

for s in tqdm.tqdm(samples_to_grade):
    r_prompt = rephrase_prompt(s['question'], s['key_points'])

    response = ask_openai(r_prompt)

    sample = deepcopy(s)

    sample['answer'] = response

    result_samples_to_grade.append(sample)

  0%|          | 0/748 [00:00<?, ?it/s]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
  0%|          | 1/748 [00:01<14:07,  1.13s/it]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
  0%|          | 2/748 [00:02<12:32,  1.01s/it]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
  0%|          | 3/748 [00:02<11:27,  1.08it/s]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
  1%|          | 4/748 [00:03<10:55,  1.14it/s]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
  1%|          | 5/748 [00:04<12:47,  1.03s/it]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
  1%|          | 6/748 [00:06<14:09,  1.15s/it]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
  1%|          | 7/748 [00:07<16:07,  1.31
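The loop above issues 748 sequential requests with no error handling, so a single failed call would abort the run and discard the collected answers. A sketch of a generic retry wrapper with exponential backoff; `call_with_retries` and its parameters are hypothetical, and catching every `Exception` is a simplification that a real run would likely narrow:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on an exception wait base_delay * 2**k seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Inside the loop the request would then be written as `response = call_with_retries(lambda: ask_openai(prompt))`, where `prompt` is the string built by `rephrase_prompt`.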

In [64]:
for s_orig, s in zip(samples_to_grade, result_samples_to_grade):
    s['original_key_points'] = s_orig['original_key_points']

In [68]:
with open('result_samples_to_grade.json', 'w') as f:
    json.dump(result_samples_to_grade, f)

In [72]:
result_samples_to_grade[0]

{'chapter': 'Introduction to Test Before Loops\n\nThere are two commonly used test before loops in the iteration (or repetition) category of control structures. They are: while and for. This module covers the: while.\n\nUnderstanding Iteration in General - while\n\nThe concept of iteration is connected to possibly wanting to repeat an action. Like all control structures we ask a question to control the execution of the loop. The term loop comes from the circular looping motion that occurs when using flowcharting. The basic form of the while loop is as follows:\n\n```initialization of the flag\nwhile the answer to the question is true then do\nsome statements or action\nsome statements or action\nsome statements or action\nupdate the flag\n```\n\nIn almost all languages the question (called a test expression) is a Boolean expression. The Boolean data type has two values – true and false. Let\'s rewrite the structure to consider this:\n\n```initialization of the flag\nwhile the expressio

In [70]:
def grader_prompt(question: str, key_points: list, key_point_labels: list, answer: str) -> str:
    # Label each key point ("A: ...") and join outside the f-string:
    # backslashes inside f-string expressions are a SyntaxError before Python 3.12.
    key_point_repr = '\n'.join(f"{label}: {p}" for label, p in zip(key_point_labels, key_points))

    return f"""
    You are the grader AI, your job is to review the student's answer and determine which key points it contains.

    You are given the following key points: {key_point_repr}

    Those are the key points to answer the question: {question}

    Here is the student's answer: {answer}

    Return the list of the key points that are mentioned in the answer in the following format: [A, C, ..]
    """
steps = 0
for s in result_samples_to_grade:
    key_points = s['original_key_points']
    key_points_labels = list(abcd[:len(key_points)])
    real_labels = s['key_points_mask']

    print(key_points_labels, real_labels)

    g_prompt = grader_prompt(s['question'], key_points, key_points_labels, s['answer'])

    response = ask_openai(g_prompt)

    print(g_prompt)

    print(response)

    print('=' * 20)

    steps += 1
    if steps > 5:
        break
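The loop only prints the grader's reply next to `key_points_mask`; to score it, the reply has to be parsed and checked against the withheld label. A sketch, assuming the reply contains one bracketed list of single uppercase labels such as `[A, C]` (the model is not guaranteed to follow that format); `parse_grader_labels` and `masked_point_leaked` are hypothetical helpers:

```python
import re


def parse_grader_labels(reply: str) -> list:
    """Extract single-letter labels from a reply such as '[A, C]'."""
    match = re.search(r'\[([^\]]*)\]', reply)
    if match is None:
        return []
    return re.findall(r'\b[A-Z]\b', match.group(1))


def masked_point_leaked(reply: str, key_points_mask: list) -> bool:
    """True if the grader reports the label whose key point was withheld."""
    all_labels = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    masked_label = all_labels[key_points_mask.index('_')]
    return masked_label in parse_grader_labels(reply)


print(parse_grader_labels('[A, B, C]'))                   # ['A', 'B', 'C']
print(masked_point_leaked('[A, B, C]', ['_', 'B', 'C']))  # True
print(masked_point_leaked('[A, C]', ['A', '_', 'C']))     # False
```

The two `masked_point_leaked` calls reuse the first two reply and mask pairs printed by this cell: in the first, the grader credits key point A even though it was withheld from the answer.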


['A', 'B', 'C'] ['_', 'B', 'C']


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



    You are the grader AI, your job is to review student answer and determine which key points in contains.

    You are given the following key points: A: explain that while loop will never stop executing if the condition is always satisfied
B: the easiest example is while (1 == 1), which is always true
C: an even easier example is while True, which is always true

    Those are the key points to answer the question: What is the easiest way to make a while loop that never stops executing?

    Here is the student's answer: One way to create a while loop that never stops executing is by using `while (1 == 1)`, as this condition is always true. An even simpler example is `while True`, which is inherently always true and thus results in an infinite loop.

    Return the list of the key points that are mentioned in the answer in the following format: [A, C, ..]
    
[A, B, C]
['A', 'B', 'C'] ['A', '_', 'C']


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



    You are the grader AI, your job is to review student answer and determine which key points in contains.

    You are given the following key points: A: explain that while loop will never stop executing if the condition is always satisfied
B: the easiest example is while (1 == 1), which is always true
C: an even easier example is while True, which is always true

    Those are the key points to answer the question: What is the easiest way to make a while loop that never stops executing?

    Here is the student's answer: A while loop will never stop executing if the condition is always satisfied. An even easier example of a loop that never stops is using `while True`, which is always true.

    Return the list of the key points that are mentioned in the answer in the following format: [A, C, ..]
    
[A, C]
['A', 'B', 'C'] ['A', 'B', '_']


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



    You are the grader AI, your job is to review student answer and determine which key points in contains.

    You are given the following key points: A: explain that while loop will never stop executing if the condition is always satisfied
B: the easiest example is while (1 == 1), which is always true
C: an even easier example is while True, which is always true

    Those are the key points to answer the question: What is the easiest way to make a while loop that never stops executing?

    Here is the student's answer: To create a while loop that never stops executing, ensure the condition is always satisfied. The simplest example is `while (1 == 1)`, which is perpetually true.

    Return the list of the key points that are mentioned in the answer in the following format: [A, C, ..]
    
[A, B]
['A', 'B'] ['_', 'B']


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



    You are the grader AI, your job is to review student answer and determine which key points in contains.

    You are given the following key points: A: Arrays are implemented as contiguous blocks of memory.
B: Arrays do not depend on complicated data structures.

    Those are the key points to answer the question: Does the underlying implementation of arrays in C++ have any underlying data stucture?

    Here is the student's answer: The implementation of arrays in C++ does not rely on any complicated data structures.

    Return the list of the key points that are mentioned in the answer in the following format: [A, C, ..]
    
[B]
['A', 'B'] ['A', '_']


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



    You are the grader AI, your job is to review student answer and determine which key points in contains.

    You are given the following key points: A: Arrays are implemented as contiguous blocks of memory.
B: Arrays do not depend on complicated data structures.

    Those are the key points to answer the question: Does the underlying implementation of arrays in C++ have any underlying data stucture?

    Here is the student's answer: The underlying implementation of arrays in C++ consists of contiguous blocks of memory.

    Return the list of the key points that are mentioned in the answer in the following format: [A, C, ..]
    
[A]
['A', 'B'] ['_', 'B']


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



    You are the grader AI, your job is to review student answer and determine which key points in contains.

    You are given the following key points: A: Discuss the data redundancy and inconsistency
B: Discuss the data isolation, integrity problems, and security issues.

    Those are the key points to answer the question: What are the disadvantages of file-based systems?

    Here is the student's answer: File-based systems often suffer from data isolation issues, meaning the data is stored in separate files, making it challenging to access, share, or integrate data from different sources. This isolation can lead to problems with data integrity, where maintaining accurate and consistent data across files becomes difficult due to the lack of centralized control. Furthermore, these systems face security issues, as they typically do not provide comprehensive mechanisms for data protection, leading to potential vulnerabilities and unauthorized access.

    Return the list of the key p