1. Use bi-encoder to find the top 25 misconceptions given question

bi-encoder model candidates:

*   sentence-transformers/all-MiniLM-L6-v2 - lightweight model
*   sentence-transformers/msmarco-MiniLM-L6-cos-v5 - optimized for semantic search and retrieval, understanding context
*   sentence-transformers/paraphrase-MiniLM-L6-v2 - fine-tuned on paraphrase data, effective at identifying similar meanings

things to try:

*   try different question text arrangements -> including Subject and Construct gave better results
*   try all 3 bi-encoder models -> the misconception embeddings are computed only once (about 1 min), so evaluating the test data takes only about 1 min. the first model (all-MiniLM-L6-v2) was best
*   if the cross-encoder can take more data, we can increase 25 to 50 -> because each misconception is encoded together with the question, the cross-encoder takes about 8 secs per row even with 25 candidates (about 1h for all test data)
*   the simple algorithm performs better; try clustering misconceptions and pre-filtering before passing them to the bi-encoder
*   try fine-tuning the bi-encoder
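
The retrieval step in the notes above can be sketched without any model, assuming the embeddings have already been computed. `top_k_by_cosine` is a hypothetical helper and the vectors are toy values, not real misconception embeddings:

```python
import numpy as np

def top_k_by_cosine(query_vec, doc_matrix, k=25):
    """Return indices of the k rows of doc_matrix most similar to query_vec."""
    # L2-normalise so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = d @ q
    k = min(k, len(sims))
    # argpartition selects the top k without a full sort; then order those k.
    top = np.argpartition(-sims, k - 1)[:k]
    return top[np.argsort(-sims[top])]

# Toy stand-ins for misconception embeddings and one query embedding.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([2.0, 0.1])
ranked = top_k_by_cosine(query, docs, k=2)
```

The document matrix only needs to be normalised once, which is why encoding the misconceptions up front keeps the per-question cost low.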


evaluation: on the test set, count how often the selected 25 misconceptions contain the correct one (can be compared with the simple algorithm in no_llm.ipynb)
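
A minimal sketch of this hit-rate check, assuming predictions are lists of misconception ids. `hit_rate_at_k` is a hypothetical name and the ids below are made up for illustration:

```python
import numpy as np

def hit_rate_at_k(actual_ids, predicted_lists, k=25):
    """Fraction of rows whose first k predictions contain the true id."""
    hits = [int(a) in [int(p) for p in preds[:k]]
            for a, preds in zip(actual_ids, predicted_lists)]
    return float(np.mean(hits))

actual = [3, 7, 1]
preds = [[3, 5, 9], [2, 4, 6], [8, 1, 0]]
rate = hit_rate_at_k(actual, preds, k=3)
```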


2. Use cross-encoder to rerank the 25 misconceptions

cross-encoder model candidates:

*   cross-encoder/ms-marco-MiniLM-L-12-v2 - designed for question answering and relevance tasks
*   cross-encoder/stsb-roberta-base - fine-tuned for detailed semantic matching
*   cross-encoder/nli-deberta-v3-small - strong performance on natural language inference tasks

evaluation: use mapk score directly

3. Use the result from #2 directly, use it as examples in an LLM prompt for verification/steps, or use it to generate/retrieve questions based on a misconception (this is an expanded use case that can be included as project motivation for maths education)

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
#read data
import pandas as pd
import numpy as np
miscon_df = pd.read_csv('misconception_mapping.csv')
train_df = pd.read_csv('train_data.csv')
train_df = train_df.sort_values(by='QuestionId')
test_df = pd.read_csv('test_data.csv')
test_df = test_df.sort_values(by='QuestionId')

In [3]:
#process data to obtain question-answer pairs
def process_df(data_df):
  df = pd.DataFrame()

  misconception_map = pd.Series(miscon_df.MisconceptionName.values, index=miscon_df.MisconceptionId).to_dict()
  rows = []
  for _, row in data_df.iterrows():
      incorrect_answers = {
          'A': (row['AnswerAText'], row['MisconceptionAId']),
          'B': (row['AnswerBText'], row['MisconceptionBId']),
          'C': (row['AnswerCText'], row['MisconceptionCId']),
          'D': (row['AnswerDText'], row['MisconceptionDId'])
      }
      for answer_key in ['A', 'B', 'C', 'D']:
          if answer_key == row['CorrectAnswer']:
              continue

          answer_text, misconception_id = incorrect_answers[answer_key]

          misconception_name = misconception_map.get(misconception_id, "Unknown")

          if not misconception_name == "Unknown":
            rows.append({
                'QuestionId': row['QuestionId'],
                'SubjectName': row['SubjectName'],
                'ConstructName': row['ConstructName'],
                'QuestionText': row['QuestionText'],
                'AnswerText': answer_text,
                'MisconceptionId': misconception_id,
                'MisconceptionName': misconception_name
            })

  df = pd.DataFrame(rows)

  return df

In [4]:
train_df = process_df(train_df)


In [12]:
test_df = process_df(test_df)


In [13]:
pd.options.display.max_rows = None
print(train_df.iloc[7]['MisconceptionName'])

Does not know the properties of a rectangle


Step 1: Use bi-encoder to find the top 25 misconceptions given question

test 1 row of data:

QuestionText: The angles highlighted on this rectangle with different length sides can never be... !A rectangle with the diagonals drawn in. The angle on the right hand side at the centre is highlighted in red and the angle at the bottom at the centre is highlighted in yellow.()

MisconceptionName: Does not know the properties of a rectangle

AnswerText: acute

SubjectName: Properties of Quadrilaterals

ConstructName: Recall and use the intersecting diagonals properties of a rectangle



In [44]:
from transformers import AutoTokenizer, AutoModel
import torch
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def get_miscon_embeddings(model_name):
    #model_name = "sentence-transformers/all-MiniLM-L6-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    misconceptions = miscon_df['MisconceptionName'].tolist()

    #encode misconceptions
    with torch.no_grad():
        misconception_embeddings = []
        for misconception in misconceptions:
            inputs = tokenizer(misconception, return_tensors="pt", padding=True)
            embedding = model(**inputs).last_hidden_state.mean(dim=1)  # Mean pooling
            misconception_embeddings.append(embedding.squeeze().numpy())

    return model, np.array(misconception_embeddings)
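
The `mean(dim=1)` pooling above averages over every token position, which is fine when one sentence is encoded at a time. If the misconceptions were batched instead, padded positions would need to be excluded. A rough NumPy sketch of mask-aware mean pooling, with `masked_mean_pool` as a hypothetical helper and toy arrays in place of real model outputs:

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, hidden)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# One sequence of three tokens; the last token is padding and must be ignored.
emb = np.array([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = masked_mean_pool(emb, mask)
```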


In [42]:
mini_l6_v2_model, mini_l6_v2_miscon_embeddings = get_miscon_embeddings('sentence-transformers/all-MiniLM-L6-v2')

In [45]:
mini_l6_v5_model, mini_l6_v5_miscon_embeddings = get_miscon_embeddings('sentence-transformers/msmarco-MiniLM-L6-cos-v5')



In [55]:
para_l6_v2_model, para_l6_v2_miscon_embeddings = get_miscon_embeddings('sentence-transformers/paraphrase-MiniLM-L6-v2')



In [50]:
def bi_encoder_top_25_miscons(row, model, misconception_embeddings):
    #query = row['SubjectName'] + '. ' + row['ConstructName'] + '. ' + row['QuestionText'] + ' The student thinks the answer is ' + row['AnswerText']
    query = row['SubjectName'] + '. ' + row['ConstructName'] + '. The question is ' + row['QuestionText'] + ' The student thinks the answer is ' + row['AnswerText']
    #print(f'query: {query}\n\n')
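    #NOTE: `tokenizer` is not defined in this function, so it must already exist as a global
    #and should match `model` (get_miscon_embeddings does not return its tokenizer)
    with torch.no_grad():
        inputs = tokenizer(query, return_tensors="pt", padding=True)
        query_embedding = model(**inputs).last_hidden_state.mean(dim=1).squeeze().numpy()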

    #calculate similarities and retrieve top-k misconceptions
    similarities = cosine_similarity([query_embedding], misconception_embeddings).flatten()
    top_k_indices = np.argsort(similarities)[-25:][::-1]

    #print("Top-K Relevant Misconceptions:")
    #for index in top_k_indices:
        #print(f"{misconceptions[index]} - Similarity: {similarities[index]:.4f}")
    actual = row['MisconceptionName']
    #print(f'\n\nActual misconception: {actual}')
    #if int(row['MisconceptionId']) in top_k_indices:
        #print('\n\npredicted list contains correct id')
    #print('\n\n\n\n')

    return top_k_indices

In [83]:
def apk(actual, predicted, k=25):
    """Average precision at k for a single true misconception id."""
    #`if not actual` would also reject a valid misconception id of 0
    if actual is None:
        return 0.0

    actual = [actual]
    #comment below line if predicted is already a list
    #predicted = list(map(int, predicted.split()))

    if len(predicted) > k:
        predicted = predicted[:k]

    score = 0.0
    num_hits = 0.0

    for i, p in enumerate(predicted):
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i + 1.0)
    #print(score / min(len(actual), k))
    return score / min(len(actual), k)

def mapk(actual, predicted, k=25):
    return np.mean([apk(a, p, k) for a, p in zip(actual, predicted)])
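
Because each row has exactly one correct misconception, `apk` above reduces to the reciprocal of the rank at which that misconception appears, or 0 if it is absent. A minimal sketch of that special case; `reciprocal_rank` is a hypothetical helper, not part of the notebook:

```python
def reciprocal_rank(actual_id, predicted, k=25):
    """1 / rank of actual_id within the first k predictions, else 0.0."""
    for rank, p in enumerate(predicted[:k], start=1):
        if p == actual_id:
            return 1.0 / rank
    return 0.0

scores = [
    reciprocal_rank(7, [7, 2, 3]),  # hit at rank 1
    reciprocal_rank(7, [2, 7, 3]),  # hit at rank 2
    reciprocal_rank(7, [2, 3, 4]),  # miss
]
mean_score = sum(scores) / len(scores)
```

Read this way, a per-row score such as 0.0714 corresponds to the correct misconception sitting at rank 14.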

In [51]:
def evaluate_bi_encoder(model, miscon_embeddings, label='mini_l6_v2'):
  #label only affects the printed messages; pass the name of the model being evaluated
  contains_count = 0
  preds = []
  actual = []
  for i, row in test_df.iterrows():
    pred_list = bi_encoder_top_25_miscons(row, model, miscon_embeddings)
    preds.append(pred_list)
    actual.append(row['MisconceptionId'])
    if int(row['MisconceptionId']) in pred_list:
      contains_count += 1
  contains_ratio = contains_count / len(test_df)
  print(f'ratio of {label} bi-encoder\'s top 25 containing correct misconception: {contains_ratio}')
  mapk_score = mapk(actual, preds)
  print(f'mapk of {label} bi-encoder\'s top 25: {mapk_score}')

In [58]:
evaluate_bi_encoder(mini_l6_v2_model, mini_l6_v2_miscon_embeddings)

ratio of mini_l6_v2 bi-encoder's top 25 containing correct misconception: 0.5388127853881278
mapk of mini_l6_v2 bi-encoder's top 25: 0.16730411196539172


In [54]:
evaluate_bi_encoder(mini_l6_v5_model, mini_l6_v5_miscon_embeddings)

ratio of mini_l6_v2 bi-encoder's top 25 containing correct misconception: 0.408675799086758
mapk of mini_l6_v2 bi-encoder's top 25: 0.13426757579103438


In [57]:
evaluate_bi_encoder(para_l6_v2_model, para_l6_v2_miscon_embeddings)

ratio of mini_l6_v2 bi-encoder's top 25 containing correct misconception: 0.4246575342465753
mapk of mini_l6_v2 bi-encoder's top 25: 0.12419211061113306


Step 2. Use cross-encoder to rerank misconceptions

In [117]:
from transformers import AutoModelForSequenceClassification

ms_marco_model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/ms-marco-MiniLM-L-12-v2')
roberta_model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/stsb-roberta-base')
deberta_model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/nli-deberta-v3-small')


In [150]:
from transformers import AutoModelForSequenceClassification

def cross_encoder_rank_miscons(row, misconceptions, model_name):
    #model_name = "cross-encoder/ms-marco-MiniLM-L-12-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    #row = train_df.iloc[7]
    #query = row['SubjectName'] + '. ' + row['ConstructName'] + '. The question is ' + row['QuestionText'] + ' The student thinks the answer is ' + row['AnswerText']
    #misconceptions = miscon_df['MisconceptionName'].tolist()

    scores = []
    for misconception_idx in misconceptions:
        misconception_text = miscon_df['MisconceptionName'].tolist()[misconception_idx]
        #print(misconception_idx)
        query = f"[Question] {row['SubjectName'] + '. ' + row['ConstructName'] + '. ' + row['QuestionText']} [SEP] [Wrong Answer] {row['AnswerText']} [SEP] [Misconception] {misconception_text}"
        #print(query)
        inputs = tokenizer(query, return_tensors="pt")

        with torch.no_grad():
            outputs = model(**inputs)
            #print(outputs)
            #print(type(outputs))
            score = outputs.logits.item()
            scores.append((misconception_idx, score))

    ranked_misconceptions = sorted(scores, key=lambda x: x[1], reverse=True)
    #print("Ranked Misconceptions by Relevance:")
    #for misconception, score in ranked_misconceptions:
        #print(f"{misconception}: {score}")
    return [item[0] for item in ranked_misconceptions]


In [145]:
from torch.nn.functional import softmax

def cross_encoder_logits_rank_miscons(row, misconceptions, model_name):
    #tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    #row = train_df.iloc[7]
    #query = row['SubjectName'] + '. ' + row['ConstructName'] + '. The question is ' + row['QuestionText'] + ' The student thinks the answer is ' + row['AnswerText']
    #misconceptions = miscon_df['MisconceptionName'].tolist()
    #question_context = row['SubjectName'] + '. ' + row['ConstructName'] + '. The question is ' + row['QuestionText']

    scores = []
    for misconception_idx in misconceptions:
        misconception_text = miscon_df['MisconceptionName'].tolist()[misconception_idx]
        #query = f"Determine if the misconception explains the wrong answer given the question context. [Question] {question_context} [SEP] [Wrong Answer] {row['AnswerText']} [SEP] [Misconception] {misconception_text}"
        query = f"For {row['QuestionText']}, a math question on {row['SubjectName']} {row['ConstructName']}, the student {misconception_text}"
        candidate = f"The student derives wrong answer {row['AnswerText']}"
        input_query = f'[CLS] {query} [SEP] {candidate} [SEP]'
        print(input_query)

        inputs = tokenizer(input_query, return_tensors="pt")

        with torch.no_grad():
            outputs = model(**inputs)
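            #process output logits (dim must be given explicitly to avoid the softmax warning)
            probs = softmax(outputs.logits, dim=-1)
            #this model's card lists the label order as (contradiction, entailment, neutral), so entailment is index 1
            entailment_prob = probs[0][1].item()
            scores.append((misconception_idx, entailment_prob))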

    ranked_misconceptions = sorted(scores, key=lambda x: x[1], reverse=True)
    #print("Ranked Misconceptions by Relevance:")
    #for misconception, score in ranked_misconceptions:
        #print(f"{misconception}: {score}")
    return [item[0] for item in ranked_misconceptions]
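
The ranking logic of the NLI variant can be checked in isolation. This is a sketch under assumptions: the logits are made up, `rank_by_class_prob` is a hypothetical helper, and `class_index` must be set to whichever index the model's label mapping (for example `model.config.id2label`) assigns to entailment:

```python
import numpy as np

def softmax_rows(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def rank_by_class_prob(candidate_ids, logits, class_index):
    """Sort candidates by the probability of one class, highest first."""
    probs = softmax_rows(np.asarray(logits, dtype=float))[:, class_index]
    order = np.argsort(-probs)
    return [candidate_ids[i] for i in order]

# Made-up logits for three candidate misconceptions over three NLI classes.
ids = [101, 202, 303]
logits = [[2.0, 0.1, 0.0], [0.0, 3.0, 0.0], [0.0, 1.0, 0.5]]
ranked = rank_by_class_prob(ids, logits, class_index=1)
```

Picking the wrong class index silently inverts or scrambles the ranking, so it is worth printing the label mapping once before trusting the scores.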

In [148]:
def evaluate_cross_encoder(model_name, sample_size):
  model_preds = []
  actual = []
  for i, row in test_df.head(sample_size).iterrows():
    pred_list = bi_encoder_top_25_miscons(row, mini_l6_v2_model, mini_l6_v2_miscon_embeddings)
    #print('pred list: ')
    #print(pred_list)
    #print('\n\n')
    #skip cross encoder if bi encoder res didn't even include correct answer to save run time
    if int(row['MisconceptionId']) in pred_list:
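      if model_name == 'cross-encoder/nli-deberta-v3-small':
        #the NLI model outputs 3 class logits, so rank by class probability instead of a single logit
        cross_encoder_ranked = cross_encoder_logits_rank_miscons(row, pred_list, model_name)
        print(cross_encoder_ranked)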
      else:
        cross_encoder_ranked = cross_encoder_rank_miscons(row, pred_list, model_name)
      model_preds.append(cross_encoder_ranked)
    else:
      model_preds.append([-1])
    actual.append(int(row['MisconceptionId']))
  apk_score = mapk(actual, model_preds)
  print(f'mapk of mini_l6_v2 bi-encoder\'s top 25: {apk_score}')

In [152]:
print(len(test_df))

438


In [153]:
evaluate_cross_encoder('cross-encoder/ms-marco-MiniLM-L-12-v2', 438)

[per-question apk scores omitted; the output is truncated before the final mapk line]

In [116]:
evaluate_cross_encoder('cross-encoder/stsb-roberta-base', 50)

[per-question apk scores omitted]
mapk of mini_l6_v2 bi-encoder's top 25: 0.11111069947013547


In [146]:
evaluate_cross_encoder('cross-encoder/nli-deberta-v3-small', 20)

[CLS] For Convert this percentage to a fraction
\( 62 \% \), a math question on Converting between Fractions and Percentages Convert two digit integer percentages to fractions, the student Converts a fraction to a percentage by placing a decimal point between the numerator and denominator [SEP] The student derives wrong answer \( \frac{6}{2} \) [SEP]


[remaining output omitted: the same debug print repeats for every question-misconception pair, interleaved with softmax warnings, and the output is truncated]


The out-of-the-box bi-encoder reached a best mapk score of 0.167 (only about 54% of its top-25 lists contained the correct misconception). When those candidates were fed into a cross-encoder for reranking, the score dropped to 0.146 at best.
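
Since the cross-encoder only reorders the 25 retrieved candidates, the bi-encoder's hit rate caps what any reranker can achieve. A short worked calculation using the figures printed earlier in the notebook; the variable names are illustrative:

```python
hit_rate = 0.5388127853881278          # bi-encoder top-25 hit rate reported above
bi_encoder_mapk = 0.16730411196539172  # bi-encoder mapk reported above

# A perfect reranker puts the correct misconception first whenever it is
# among the 25 candidates (score 1.0) and scores 0.0 for every other row.
best_possible_mapk = hit_rate * 1.0 + (1 - hit_rate) * 0.0
headroom = best_possible_mapk - bi_encoder_mapk
```

So reranking can at most lift mapk to about 0.54, and raising the hit rate itself requires improving the retrieval stage.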

The following section attempts to fine-tune the bi-encoder to raise both the top-25 hit rate and the mapk score.

In [154]:
!pip install sentence-transformers



In [155]:
#prepare data
from sentence_transformers import InputExample

# Assumption: use the processed training rows from process_df instead of placeholder data
data = train_df[['QuestionText', 'AnswerText', 'MisconceptionName']]

# Create training examples for bi-encoder
train_examples = []

for _, row in data.iterrows():
    # Query text: Concatenate question and answer text
    query_text = f"{row['QuestionText']} [SEP] {row['AnswerText']}"
    positive_example = row['MisconceptionName']

    # Create positive pair
    train_examples.append(InputExample(texts=[query_text, positive_example], label=1.0))

    # Create negative samples by choosing a random misconception that does not match the question
    negative_samples = data[data['MisconceptionName'] != row['MisconceptionName']].sample(1)
    for _, neg_row in negative_samples.iterrows():
        negative_example = neg_row['MisconceptionName']
        train_examples.append(InputExample(texts=[query_text, negative_example], label=0.0))


In [156]:
#configure model
from sentence_transformers import SentenceTransformer, losses, util
from torch.utils.data import DataLoader

# Load the pre-trained bi-encoder model
fine_tune_mini_l6_v2_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Define a DataLoader for training
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Define the training loss (CosineSimilarityLoss fits the pair's cosine similarity to the 0/1 label)
train_loss = losses.CosineSimilarityLoss(fine_tune_mini_l6_v2_model)
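
A rough sketch of what this loss computes, assuming its default mean-squared-error objective between the cosine similarity of each pair and its label. It uses plain NumPy on toy vectors, and `cosine_similarity_mse` is a hypothetical name:

```python
import numpy as np

def cosine_similarity_mse(emb_a, emb_b, labels):
    """Mean squared error between pairwise cosine similarity and labels."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = (a * b).sum(axis=1)
    return float(np.mean((cos - labels) ** 2))

# Toy query and misconception embeddings: one matching pair, one orthogonal pair.
queries = np.array([[1.0, 0.0], [1.0, 0.0]])
miscons = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([1.0, 0.0])
loss = cosine_similarity_mse(queries, miscons, labels)
```

With label 1.0 for the true misconception and 0.0 for a random one, minimising this pulls matching pairs together and pushes sampled negatives toward orthogonality.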


In [160]:
!pip install datasets


In [161]:
from datasets import Dataset

In [163]:
!pip install -U sentence-transformers



In [165]:
#train model
fine_tune_mini_l6_v2_model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    warmup_steps=int(len(train_dataloader) * 0.1),
    show_progress_bar=True
)

NameError: name 'Dataset' is not defined