<a href="https://colab.research.google.com/github/ranga-godhandaraman/LLM-Benchmark/blob/main/GSM8K.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import torch
from transformers import AlbertForQuestionAnswering, AlbertTokenizer, DistilBertForQuestionAnswering, DistilBertTokenizer

# Hand-written sample problems in the GSM8K answer format (rationale, then '#### <final answer>')
custom_dataset = [
    {
        'question': 'Samantha has 25 marbles. She gave 8 marbles to her friend. How many marbles does Samantha have now?',
        'answer': 'How many marbles does Samantha have now? ** Samantha has 25 - 8 = <<25-8=17>>17 marbles now.\n#### 17'
    },
    {
        'question': 'Alex had 48 chocolates. He gave half of them to his sister. How many chocolates does Alex have now?',
        'answer': 'How many chocolates does Alex have now? ** Alex had 48/2 = <<48/2=24>>24 chocolates left.\n#### 24'
    },
    {
        'question': 'A box contained 60 pens. If 15 pens were taken out, how many pens remain in the box?',
        'answer': 'How many pens remain in the box? ** There are 60 - 15 = <<60-15=45>>45 pens remaining.\n#### 45'
    },
    {
        'question': 'There were 80 students in a class. If 25 students went on a field trip, how many students remained in the class?',
        'answer': 'How many students remained in the class? ** There were 80 - 25 = <<80-25=55>>55 students remaining.\n#### 55'
    },
    {
        'question': 'Rachel had 36 apples. She gave 12 apples to her brother. How many apples does Rachel have now?',
        'answer': 'How many apples does Rachel have now? ** Rachel has 36 - 12 = <<36-12=24>>24 apples now.\n#### 24'
    },
    {
        'question': 'There were 50 balloons. If 20 balloons popped, how many balloons were left?',
        'answer': 'How many balloons were left? ** There were 50 - 20 = <<50-20=30>>30 balloons left.\n#### 30'
    },
    {
        'question': 'Tommy bought 60 candies. He ate 25 candies. How many candies are left with Tommy?',
        'answer': 'How many candies are left with Tommy? ** Tommy has 60 - 25 = <<60-25=35>>35 candies left.\n#### 35'
    },
    {
        'question': 'A jar had 100 cookies. If 40 cookies were eaten, how many cookies are left in the jar?',
        'answer': 'How many cookies are left in the jar? ** There are 100 - 40 = <<100-40=60>>60 cookies left.\n#### 60'
    },
    {
        'question': 'Nancy had 80 books. She gave 30 books to her friend. How many books does Nancy have now?',
        'answer': 'How many books does Nancy have now? ** Nancy has 80 - 30 = <<80-30=50>>50 books now.\n#### 50'
    },
    {
        'question': 'There were 90 oranges in a basket. If 25 oranges were taken out, how many oranges are left in the basket?',
        'answer': 'How many oranges are left in the basket? ** There are 90 - 25 = <<90-25=65>>65 oranges left.\n#### 65'
    },
    {
        'question': 'David had 70 candies. He ate half of them. How many candies are left with David?',
        'answer': 'How many candies are left with David? ** David has 70/2 = <<70/2=35>>35 candies left.\n#### 35'
    },
    {
        'question': 'There were 120 pencils in a box. If 50 pencils were used, how many pencils are remaining in the box?',
        'answer': 'How many pencils are remaining in the box? ** There are 120 - 50 = <<120-50=70>>70 pencils remaining.\n#### 70'
    },
    {
        'question': 'Emma had 50 stickers. She gave 15 stickers to her sister. How many stickers does Emma have now?',
        'answer': 'How many stickers does Emma have now? ** Emma has 50 - 15 = <<50-15=35>>35 stickers now.\n#### 35'
    },
    {
        'question': 'A shelf contained 80 books. If 30 books were borrowed, how many books are left on the shelf?',
        'answer': 'How many books are left on the shelf? ** There are 80 - 30 = <<80-30=50>>50 books left.\n#### 50'
    },
    {
        'question': 'There were 100 candies in a jar. If 40 candies were taken out, how many candies remained in the jar?',
        'answer': 'How many candies remained in the jar? ** There were 100 - 40 = <<100-40=60>>60 candies left.\n#### 60'
    },
    {
        'question': 'John had 70 toys. He gave away 25 toys. How many toys does John have now?',
        'answer': 'How many toys does John have now? ** John has 70 - 25 = <<70-25=45>>45 toys now.\n#### 45'
    },
    {
        'question': 'A basket had 120 apples. If 50 apples were rotten, how many apples were good?',
        'answer': 'How many apples were good? ** There were 120 - 50 = <<120-50=70>>70 good apples.\n#### 70'
    },
    {
        'question': 'Emily had 60 pencils. She lost 20 pencils. How many pencils does Emily have now?',
        'answer': 'How many pencils does Emily have now? ** Emily has 60 - 20 = <<60-20=40>>40 pencils now.\n#### 40'
    },
    {
        'question': 'A box contained 150 chocolates. If 60 chocolates were eaten, how many chocolates are left in the box?',
        'answer': 'How many chocolates are left in the box? ** There are 150 - 60 = <<150-60=90>>90 chocolates left.\n#### 90'
    },
    {
        'question': 'Sarah had 80 balloons. She gave 35 balloons to her friend. How many balloons does Sarah have now?',
        'answer': 'How many balloons does Sarah have now? ** Sarah has 80 - 35 = <<80-35=45>>45 balloons now.\n#### 45'
    }
]

# Load ALBERT model and tokenizer
albert_model_name = 'albert-base-v2'
albert_tokenizer = AlbertTokenizer.from_pretrained(albert_model_name)
albert_model = AlbertForQuestionAnswering.from_pretrained(albert_model_name)

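def answer_question(question, model, tokenizer, context=None):
    # Extractive QA selects a span from a context passage. These problems have no
    # separate passage, so the question text doubles as the context unless one is supplied.
    if context is None:
        context = question
    inputs = tokenizer(question, context, return_tensors='pt', padding=True, truncation=True)
    with torch.no_grad():  # inference only, so gradients are not needed
        outputs = model(**inputs)
    start_logits, end_logits = outputs.start_logits, outputs.end_logits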

    # Get the most probable start and end indices
    start_index = torch.argmax(start_logits)
    end_index = torch.argmax(end_logits) + 1

    # Get the tokens corresponding to the answer
    answer_tokens = inputs['input_ids'][0][start_index:end_index]

    # Decode the tokens into a string answer
    predicted_answer = tokenizer.decode(answer_tokens)

    return predicted_answer


# Perform inference on the dataset
for data in custom_dataset:
    question = data['question']
    answer = data['answer']
    predicted_answer = answer_question(question, albert_model, albert_tokenizer)
    print("Question:", question)
    print("Expected Answer:", answer)
    print("Predicted Answer:", predicted_answer)
    print()

Some weights of AlbertForQuestionAnswering were not initialized from the model checkpoint at albert-base-v2 and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Question: Samantha has 25 marbles. She gave 8 marbles to her friend. How many marbles does Samantha have now?
Expected Answer: How many marbles does Samantha have now? ** Samantha has 25 - 8 = <<25-8=17>>17 marbles now.
#### 17
Predicted Answer: 25 marbles. she gave 8 marbles to her friend. how many

Question: Alex had 48 chocolates. He gave half of them to his sister. How many chocolates does Alex have now?
Expected Answer: How many chocolates does Alex have now? ** Alex had 48/2 = <<48/2=24>>24 chocolates left.
#### 24
Predicted Answer: 

Question: A box contained 60 pens. If 15 pens were taken out, how many pens remain in the box?
Expected Answer: How many pens remain in the box? ** There are 60 - 15 = <<60-15=45>>45 pens remaining.
#### 45
Predicted Answer: 60

Question: There were 80 students in a class. If 25 students went on a field trip, how many students remained in the class?
Expected Answer: How many students remained in the class? ** There were 80 - 25 = <<80-25=55>>55 stud
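
The empty "Predicted Answer" in the output above is what happens when the start and end positions are chosen by independent argmax and the end lands before the start: the slice is empty. A minimal sketch of one common alternative follows, scoring start/end pairs jointly under the constraint that the end is not before the start. The names `best_span` and `max_answer_len` are hypothetical and not part of this notebook; the logits are plain Python lists here (with tensors, something like `start_logits[0].tolist()` would produce them).

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logits[start] + end_logits[end],
    subject to start <= end < start + max_answer_len. `end` is inclusive."""
    best, best_score = (0, 0), float("-inf")
    for start, start_score in enumerate(start_logits):
        stop = min(len(end_logits), start + max_answer_len)
        for end in range(start, stop):
            score = start_score + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


# Made-up logits: independent argmax would give start=3 and end=1 (an empty slice).
start_logits = [0.1, 0.2, 0.3, 2.0, 0.0]
end_logits = [0.0, 1.5, 0.2, 0.4, 1.0]
start, end = best_span(start_logits, end_logits)
print(start, end)  # prints: 3 4
```

Because `end` is inclusive, the answer tokens would be sliced as `input_ids[start:end + 1]`. The search is quadratic in sequence length, which is acceptable for short inputs like these word problems.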

In [None]:
# Load DistilBERT model and tokenizer
distilbert_model_name = 'distilbert-base-uncased'
distilbert_tokenizer = DistilBertTokenizer.from_pretrained(distilbert_model_name)
distilbert_model = DistilBertForQuestionAnswering.from_pretrained(distilbert_model_name)

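def answer_question(question, model, tokenizer, context=None):
    # Extractive QA selects a span from a context passage. These problems have no
    # separate passage, so the question text doubles as the context unless one is supplied.
    if context is None:
        context = question
    inputs = tokenizer(question, context, return_tensors='pt', padding=True, truncation=True)
    with torch.no_grad():  # inference only, so gradients are not needed
        outputs = model(**inputs)
    start_logits, end_logits = outputs.start_logits, outputs.end_logits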

    # Get the most probable start and end indices
    start_index = torch.argmax(start_logits)
    end_index = torch.argmax(end_logits) + 1

    # Get the tokens corresponding to the answer
    answer_tokens = inputs['input_ids'][0][start_index:end_index]

    # Decode the tokens into a string answer
    predicted_answer = tokenizer.decode(answer_tokens)

    return predicted_answer

# Perform inference on the dataset
for data in custom_dataset:
    question = data['question']
    answer = data['answer']
    predicted_answer = answer_question(question, distilbert_model, distilbert_tokenizer)
    print("Question:", question)
    print("Expected Answer:", answer)
    print("Predicted Answer:", predicted_answer)
    print()

Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Question: Samantha has 25 marbles. She gave 8 marbles to her friend. How many marbles does Samantha have now?
Expected Answer: How many marbles does Samantha have now? ** Samantha has 25 - 8 = <<25-8=17>>17 marbles now.
#### 17
Predicted Answer: 25 marbles. she gave 8 marbles to her friend. how many

Question: Alex had 48 chocolates. He gave half of them to his sister. How many chocolates does Alex have now?
Expected Answer: How many chocolates does Alex have now? ** Alex had 48/2 = <<48/2=24>>24 chocolates left.
#### 24
Predicted Answer: 

Question: A box contained 60 pens. If 15 pens were taken out, how many pens remain in the box?
Expected Answer: How many pens remain in the box? ** There are 60 - 15 = <<60-15=45>>45 pens remaining.
#### 45
Predicted Answer: 60

Question: There were 80 students in a class. If 25 students went on a field trip, how many students remained in the class?
Expected Answer: How many students remained in the class? ** There were 80 - 25 = <<80-25=55>>55 stud
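
The expected answers printed above embed calculator annotations such as `<<25-8=17>>`. Since the sample problems are written by hand, it may be worth checking that each annotation's arithmetic is consistent. The sketch below does that for simple two-operand annotations only; `CALC_RE`, `OPS`, and `check_annotations` are hypothetical names introduced for illustration, and longer expressions are not matched at all.

```python
import operator
import re

# Matches annotations like <<25-8=17>> or <<48/2=24>>: number, operator, number, '=', result.
NUM = r"(\d+(?:\.\d+)?)"
CALC_RE = re.compile(r"<<\s*" + NUM + r"\s*([-+*/])\s*" + NUM + r"\s*=\s*" + NUM + r"\s*>>")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def check_annotations(answer):
    """Return (annotation_text, is_consistent) for each two-operand annotation found."""
    results = []
    for match in CALC_RE.finditer(answer):
        left, op, right, claimed = match.groups()
        if op == "/" and float(right) == 0:
            # Division by zero can never equal the claimed result.
            results.append((match.group(0), False))
            continue
        computed = OPS[op](float(left), float(right))
        results.append((match.group(0), abs(computed - float(claimed)) < 1e-9))
    return results


sample = 'Alex had 48/2 = <<48/2=24>>24 chocolates left.\n#### 24'
print(check_annotations(sample))  # prints: [('<<48/2=24>>', True)]
```

Running it over every `answer` in `custom_dataset` and flagging any `False` entry would catch arithmetic typos before the data is used for scoring.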

In [None]:
def evaluate_model(model, tokenizer, dataset):
    correct_predictions = 0
    total_questions = len(dataset)

    for data in dataset:
        question = data['question']
        expected_answer = data['answer']
        predicted_answer = answer_question(question, model, tokenizer)

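        # The final result follows '####' in each answer; compare against it rather than the full rationale
        expected_final = expected_answer.split('####')[-1].strip()
        if predicted_answer.strip() == expected_final: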
            correct_predictions += 1

    accuracy = (correct_predictions / total_questions)
    return accuracy

# Evaluate ALBERT model
albert_accuracy = evaluate_model(albert_model, albert_tokenizer, custom_dataset)
print("ALBERT Model Accuracy:", albert_accuracy)

# Evaluate DistilBERT model
distilbert_accuracy = evaluate_model(distilbert_model, distilbert_tokenizer, custom_dataset)
print("DistilBERT Model Accuracy:", distilbert_accuracy)

ALBERT Model Accuracy: 0.0
DistilBERT Model Accuracy: 0.0
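
Each expected answer carries its final result after a `####` marker, while a predicted span is free text that may contain extra words. A more lenient scoring rule than exact string equality could compare only the numbers. The sketch below is one such rule under stated assumptions: `final_answer` and `numeric_match` are hypothetical helpers, and treating the last number in the prediction as the model's answer is an assumption, not something this notebook defines.

```python
import re


def final_answer(answer):
    """Return the text after the last '####' marker, or None if the marker is missing."""
    if "####" not in answer:
        return None
    return answer.rsplit("####", 1)[1].strip()


def numeric_match(predicted, expected_answer):
    """True if the last number in `predicted` equals the expected final answer."""
    target = final_answer(expected_answer)
    numbers = re.findall(r"-?\d+(?:\.\d+)?", predicted.replace(",", ""))
    if target is None or not numbers:
        return False
    try:
        return float(numbers[-1]) == float(target.replace(",", ""))
    except ValueError:
        # The expected final answer was not a plain number.
        return False


expected = 'There are 60 - 15 = <<60-15=45>>45 pens remaining.\n#### 45'
print(final_answer(expected))              # prints: 45
print(numeric_match("60", expected))       # prints: False
print(numeric_match("45 pens", expected))  # prints: True
```

Even with lenient scoring, a question-answering head that has not been fine-tuned can only copy a span from its input, so it cannot produce a result such as 45 that never appears in the question text.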
