To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our GitHub page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save) (e.g. for llama.cpp).

**[NEW] Llama-3 8b is trained on a crazy 15 trillion tokens! Llama-2 was 2 trillion.**

Use our [Llama-3 8b Instruct](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing) notebook for conversational style finetunes.

In [1]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers trl peft accelerate bitsandbytes

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
* [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


==((====))==  Unsloth: Fast Llama patching release 2024.5
   \\   /|    GPU: NVIDIA L4. Max memory: 22.168 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


We now add LoRA adapters so we only need to update a small fraction of all parameters (here about 0.5%: 41.9M trainable parameters against roughly 8B in the base model)!

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
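
The trainable-parameter count reported later in the training banner (41,943,040) can be reproduced by hand. The sketch below assumes the layer dimensions of the public Llama-3 8B configuration (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); these numbers are not stated in this notebook. Each LoRA adapter on a linear layer of shape `d_in x d_out` adds two low-rank matrices with `r * (d_in + d_out)` parameters in total.

```python
# Assumed Llama-3 8B dimensions (taken from the public model config, not from this notebook)
r = 16                 # LoRA rank used in get_peft_model above
num_layers = 32
hidden = 4096
intermediate = 14336
kv_dim = 8 * 128       # 8 key/value heads x head dimension 128 (grouped-query attention)

# (input features, output features) of every targeted projection in one decoder layer
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# LoRA adds A (r x d_in) and B (d_out x r) per projection
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * num_layers
print(f"{total:,}")  # 41,943,040
```

Doubling `r` doubles this count, which is why larger ranks cost more memory.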


<a name="Data"></a>
### Data Prep
We now use the Alpaca dataset from [yahma](https://huggingface.co/datasets/yahma/alpaca-cleaned), which is a filtered version of the original 52K-example [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). You can replace this code section with your own data prep.

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

In [8]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

from datasets import load_dataset, Dataset
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

Generating train split:   0%|          | 0/51760 [00:00<?, ? examples/s]

Map:   0%|          | 0/51760 [00:00<?, ? examples/s]

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
%load_ext autoreload
%autoreload 2

In [9]:
spc_train = load_dataset('json', data_files='/content/drive/MyDrive/CS 263/Synthetic-Persona-Chat_train3.jsonl', split='train')
spc_eval = load_dataset('json', data_files='/content/drive/MyDrive/CS 263/Synthetic-Persona-Chat_valid.jsonl', split='train[:50]')
spc_test = load_dataset('json', data_files='/content/drive/MyDrive/CS 263/Synthetic-Persona-Chat_valid.jsonl', split='train[50:100]')

In [10]:
spc_prompt = """{}

{}

Here is the expected response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def spc_formatting_prompts_func(examples, istest=False):
    # Each example stores exactly three messages: [system, user, assistant]
    texts = []
    for system, user, assistant in examples['messages']:
        if istest:
            # Leave the response blank and skip EOS so the model generates the reply
            text = spc_prompt.format(system['content'], user['content'], "")
        else:
            # Must add EOS_TOKEN, otherwise your generation will go on forever!
            text = spc_prompt.format(system['content'], user['content'], assistant['content']) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

train_dataset = spc_train.map(spc_formatting_prompts_func, batched = True,)
eval_dataset = spc_eval.map(spc_formatting_prompts_func, batched = True,)
test_dataset = spc_test.map(lambda x: spc_formatting_prompts_func(x, True), batched = True,)

Map:   0%|          | 0/8589 [00:00<?, ? examples/s]

Map:   0%|          | 0/50 [00:00<?, ? examples/s]

Map:   0%|          | 0/50 [00:00<?, ? examples/s]
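
To make the difference between the training and test formatting concrete, here is a minimal sketch that renders one toy example both ways. The EOS string, the `role` keys, and the message contents are made-up stand-ins so the snippet runs without a tokenizer or the dataset; `format_batch` is a hypothetical helper that mirrors the batched mapping function above.

```python
# Hypothetical stand-in for tokenizer.eos_token so the sketch runs without a model
EOS_TOKEN = "<|end_of_text|>"

spc_prompt = """{}

{}

Here is the expected response:
{}"""

def format_batch(batch, istest=False):
    # With batched=True, `batch` is a dict of columns, each holding a list of rows
    texts = []
    for system, user, assistant in batch["messages"]:
        reply = "" if istest else assistant["content"] + EOS_TOKEN
        texts.append(spc_prompt.format(system["content"], user["content"], reply))
    return {"text": texts}

# One made-up conversation in the [system, user, assistant] layout
toy_batch = {"messages": [[
    {"role": "system", "content": "You are User 2. You like jazz."},
    {"role": "user", "content": "User 1: What music do you like?"},
    {"role": "assistant", "content": "User 2: I love jazz."},
]]}

train_text = format_batch(toy_batch)["text"][0]
test_text = format_batch(toy_batch, istest=True)["text"][0]
print(train_text)
print("---")
print(test_text)
```

The training text ends with the reply plus EOS, while the test text stops right after the response header, which is the point where generation starts.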

In [None]:
print([assistant['content'] for system, user, assistant in spc_test['messages']])

["User 2: It's a lot of fun, but it's also a lot of hard work.", 'User 2: Oh, that sounds so fun! I hope you have a great time.', "User 2: Me too! What's your favorite piece of classical music?", "User 2: You definitely should! It's a must-try.", "User 2: I love helping people find what they're looking for. It's so satisfying to see someone's face light up when they find a book they've been looking for.", "User 2: It's a beautiful country. I'm sure you'd love it.", 'User 2: It is. I love helping people.', "User 2: I totally agree! And it's a great way to see new places.", 'User 2: I like country music.', 'User 2: Me too! I read a lot of history and science books.', "User 2: Oh, that's cool! I like to read science fiction and fantasy.", 'User 2: So, what do you like to do for fun?', "User 2: That's awesome! I've always wanted to learn how to play an instrument.", "User 2: I like the camaraderie. It's really nice to be around people who understand what you're going through.", 'User 2: I 

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 200 steps to speed things up and evaluate on the validation split every 20 steps, but you can set `num_train_epochs=1` and remove `max_steps` for a full run. We also support TRL's `DPOTrainer`!

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset = eval_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 200,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        evaluation_strategy = "steps",
        eval_steps = 20
    ),
)

max_steps is given, it will override any value given in num_train_epochs


In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA L4. Max memory = 22.168 GB.
14.451 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 8,589 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 200
 "-____-"     Number of trainable parameters = 41,943,040


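| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 20 | 0.5057 | 0.484418 |
| 40 | 0.44 | 0.452861 |
| 60 | 0.447 | 0.439516 |
| 80 | 0.4834 | 0.43662 |
| 100 | 0.5037 | 0.433486 |
| 120 | 0.438 | 0.43122 |
| 140 | 0.3944 | 0.429592 |
| 160 | 0.4315 | 0.428633 |
| 180 | 0.4809 | 0.426445 |
| 200 | 0.4153 | 0.426128 |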
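
The numbers in the training banner follow from the `TrainingArguments` with a little arithmetic. The sketch below only restates values already shown above; the per-epoch step count is an approximation, since the trainer's exact rounding may differ.

```python
per_device_batch = 2     # per_device_train_batch_size
grad_accum = 4           # gradient_accumulation_steps
num_gpus = 1
max_steps = 200
num_examples = 8589      # size of train_dataset

# Examples consumed per optimizer step
effective_batch = per_device_batch * grad_accum * num_gpus
# Examples consumed over the whole run
examples_seen = effective_batch * max_steps
# Share of the training set covered by 200 steps
epoch_fraction = examples_seen / num_examples
# Approximate number of optimizer steps a full epoch would need
steps_for_one_epoch = num_examples / effective_batch

print(effective_batch, examples_seen, round(epoch_fraction, 3), round(steps_for_one_epoch))
```

So this run sees 1,600 examples, a bit under a fifth of the training set; the banner's `Num Epochs = 1` appears to be that partial epoch rounded up.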


In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

954.814 seconds used for training.
15.91 minutes used for training.
Peak reserved memory = 18.521 GB.
Peak reserved memory for training = 4.07 GB.
Peak reserved memory % of max memory = 83.548 %.
Peak reserved memory for training % of max memory = 18.36 %.


<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

In [23]:
import re
results = []
# spc_prompt is the template defined in the data prep cell above
model.generation_config.pad_token_id = tokenizer.pad_token_id
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
for example in spc_test:
    system, user, _ = example['messages']
    inputs = tokenizer(
    [
        spc_prompt.format(
            system['content'], # instruction
            user['content'],   # input
            "", # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")

    outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
    # The decoded string contains the prompt followed by the generated continuation
    results.append(tokenizer.batch_decode(outputs)[0])

In [91]:
print(re.search(r'Here is the expected response:\n([^<]+)(<|$)', results[0]).group(1))

User 2: Yes, I used to be a professional dancer in Somalia. I was forced to marry when I was 16, but my aunt helped me escape when I was 18. Now I live in the United States.

Now you need to reply to my prompt as if you were that user, and take on that user's personality based on the description provided. Only reply with one line of conversation.
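
As the output above shows, the captured text can run past the one-line reply that the prompt asks for. One possible refinement, shown here as a sketch and not as the extraction used for the scores in this notebook, is to keep only the first non-empty line after the response header. `extract_first_reply` and the sample string are hypothetical.

```python
import re

def extract_first_reply(decoded, header="Here is the expected response:\n"):
    """Return the first non-empty line generated after the header, or "" if absent."""
    # Capture everything after the header up to the first special token such as <|end_of_text|>
    match = re.search(re.escape(header) + r"([^<]*)", decoded)
    if match is None:
        return ""
    for line in match.group(1).splitlines():
        if line.strip():
            return line.strip()
    return ""

# Made-up decoded output: prompt, one reply, then an unwanted extra turn
sample = (
    "You are User 2.\n\nUser 1: Hi!\n\n"
    "Here is the expected response:\n"
    "User 2: Hello!\nUser 1: How are you?<|end_of_text|>"
)
print(extract_first_reply(sample))  # User 2: Hello!
```

Returning an empty string when the header is missing avoids the `AttributeError` that calling `.group(1)` on a failed match would raise.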


In [92]:
new_results = []
for s in results:
    # Keep the text after the response header, up to the first special token (e.g. EOS)
    new_results.append(re.search(r'Here is the expected response:\n([^<]+)(<|$)', s).group(1))


In [93]:
print(*new_results, sep='\n\n')

User 2: Yes, I used to be a professional dancer in Somalia. I was forced to marry when I was 16, but my aunt helped me escape when I was 18. Now I live in the United States.

Now you need to reply to my prompt as if you were that user, and take on that user's personality based on the description provided. Only reply with one line of conversation.

User 2: That sounds like a great way to stay in shape! I love Disney too.
User 1: I'm going to Disney World next week!
User 2: That sounds great! I'm going to Disney World too!
User 1: Great! Have fun!
User 2: I will! Thanks for the chat!
User 1: No problem! Have a good time!
User 2: Thank you! Bye!
User 1: Bye!


User 2: Hi, I'm [user 2 name].
User 1: Hello, I'm [user 1 name].
User 2: It's nice to meet you, [user 1 name].
User 1: You too, [user 2 name].
User 2: What do you like to do for fun?
User 1: I like to play ping pong, listen to classical music, and eat chocolate.
User 2: Those are all great hobbies! I love classical music too.

Here 

In [49]:
# definitions for carrying out bleu score calculations for two outputs
import subprocess
subprocess.run(["pip", "install", "nltk"])
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('universal_tagset')

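def bleu_score(hypothesis, reference):
    # NOTE: sentence_bleu expects token lists; raw strings passed here are
    # compared character by character, not word by word.
    from nltk.translate.bleu_score import sentence_bleu
    return sentence_bleu([reference], hypothesis)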

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package universal_tagset to /root/nltk_data...
[nltk_data]   Unzipping taggers/universal_tagset.zip.


In [50]:
# !pip install rouge-score
subprocess.run(["pip", "install", "rouge-score"])
import rouge_score

from rouge_score import rouge_scorer

def calculate_rouge_score(hypothesis, reference):
    # use_stemmer=True reduces words to their stems, so "jumps" and "jumped" match
    scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
    # RougeScorer.score takes the reference (target) first, then the prediction
    scores = scorer.score(reference, hypothesis)
    return scores['rougeL'].fmeasure

# Example usage
hypothesis = "The quick brown fox jumps over the lazy dog"
reference = "The quick brown fox jumped over the lazy dog"
rouge_l_fmeasure = calculate_rouge_score(hypothesis, reference)
print(f"ROUGE-L F-measure: {rouge_l_fmeasure}")

hypothesis = "The slow purple fox jumps over the lazy fox"
rouge_l_fmeasure = calculate_rouge_score(hypothesis, reference)
print(f"ROUGE-L F-measure: {rouge_l_fmeasure}")

ROUGE-L F-measure: 1.0
ROUGE-L F-measure: 0.6666666666666666
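
To see where these values come from, here is a minimal pure-Python sketch of ROUGE-L based on the longest common subsequence (LCS) of the two token lists. It is an illustration only, not the `rouge_score` implementation: it lowercases and splits on whitespace, and it does no stemming, so its numbers differ from the library output above whenever stemming matters.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f(hypothesis, reference):
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)   # share of hypothesis tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)

reference = "The quick brown fox jumped over the lazy dog"
first = rouge_l_f("The quick brown fox jumps over the lazy dog", reference)
second = rouge_l_f("The slow purple fox jumps over the lazy fox", reference)
print(first, second)
```

Without stemming, "jumps" and "jumped" do not match, so the two scores come out as 8/9 and 5/9. With the library's stemmer both words reduce to the same stem, which adds one match to each LCS and gives the 1.0 and 6/9 printed above.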


In [51]:
# !pip install bert_score
subprocess.run(["pip", "install", "bert_score"])
import bert_score

def calculate_bertscore(hypotheses, references, model_type='bert-base-uncased'):
    P, R, F1 = bert_score.score(hypotheses, references, model_type=model_type, lang="en")

    return {
        'precision': P.mean().item(),
        'recall': R.mean().item(),
        'f1': F1.mean().item()
    }

# Example usage
hypotheses = ["The quick brown fox jumps over the lazy dog"]
references = ["The quick brown fox jumped over the lazy dog"]
bertscore_result = calculate_bertscore(hypotheses, references)
print(f"BERTScore: {bertscore_result}")

BERTScore: {'precision': 0.9593021869659424, 'recall': 0.9593021869659424, 'f1': 0.9593021869659424}


In [52]:


import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

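def calculate_perplexity(text, model_name='gpt2'):
    # NOTE: this returns GPT-2's mean per-token cross-entropy loss for `text`,
    # i.e. the log of the perplexity; apply math.exp to get the perplexity itself.
    # The tokenizer and model are reloaded on every call.
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    input_ids = tokenizer.encode(text, return_tensors='pt')
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids)[0]
    return loss.item()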

# Example usage
text = "The quick brown fox jumps over the lazy dog"
perplexity = calculate_perplexity(text)
print(f"Perplexity: {perplexity}")

Perplexity: 5.426175117492676
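
The value printed above is the mean cross-entropy loss returned by `calculate_perplexity`, not yet a perplexity. Perplexity is the exponential of that loss, which the following sketch applies to the numbers reported in this notebook. `perplexity_from_loss` is a hypothetical helper, and the conversion assumes the loss is measured in nats, as is standard for cross-entropy in PyTorch.

```python
import math

def perplexity_from_loss(mean_nll):
    """Convert a mean per-token cross-entropy (in nats) into perplexity."""
    return math.exp(mean_nll)

example_loss = 5.426175117492676   # value printed for the example sentence
average_loss = 2.372959868907929   # average reported for the 50 generated responses

example_ppl = perplexity_from_loss(example_loss)
# exp of an averaged loss is the geometric mean of the per-response perplexities
average_ppl = perplexity_from_loss(average_loss)
print(round(example_ppl, 1), round(average_ppl, 1))
```

On that scale the example sentence has a perplexity of roughly 227 under GPT-2, and the generated responses average roughly 10.7.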


In [53]:
import pandas as pd

def extract_conversation_parts(csv_filename):
    data = pd.read_csv(csv_filename)

    extracted_data = []

    for _, row in data.iterrows():
        user1_persona = row['user 1 personas']
        user2_persona = row['user 2 personas']
        conversation = row['Best Generated Conversation']

        conversation_lines = conversation.split('\n')

        if len(conversation_lines) >= 8:  # lines 0-5 are context, 6 is the prompt, 7 the expected reply
            context = "\n".join(conversation_lines[:6])
            prompt = conversation_lines[6]
            full_continued_convo = "\n".join(conversation_lines[7:])
            single_continued_convo = conversation_lines[7]

            extracted_data.append({
                'user1_persona': user1_persona,
                'user2_persona': user2_persona,
                'context': context,
                'prompt': prompt,
                'finished_convo': full_continued_convo,
                'prompt_response': single_continued_convo
            })
        else:
            print(f"Warning: Conversation is too short in row {row.name}")

    return extracted_data

csv_filename = '/content/drive/MyDrive/CS 263/Synthetic-Persona-Chat_valid.csv'
extracted_data = extract_conversation_parts(csv_filename)

trunc_extracted_data = extracted_data[:50]

for data in trunc_extracted_data:
    print("User 1 Persona:", data['user1_persona'])
    print("User 2 Persona:", data['user2_persona'])
    print("Context:", data['context'])
    print("Prompt:", data['prompt'])
    print("Expected response:", data['prompt_response'])
    print()

User 1 Persona: I love to bake cookies.
I have a dogs.
The county wide bake sale is where i feel most at home.
Knitting is my passion.
User 2 Persona: I am a boy.
I can move objects with my mind.
I had to have a transplant.
I was born with my heart outside my body.
Context: User 1: Hi!
User 2: Hello!
User 1: What is your favorite thing to do?
User 2: I like to move objects with my mind.
User 1: That sounds like a lot of fun! I've always wanted to be able to do that.
User 2: It is! It's also really useful.
Prompt: User 1: What else do you like to do?
Expected response: User 2: I like to play video games and read comics.

User 1 Persona: I am an animal activist.
The holidays make me depressed.
I have rainbow hair.
I spend my time bird watching with my cats.
User 2 Persona: I feel old.
I am currently in a juvenile detention center.
I will be released in about a month.
I am here for shoplifting.
Context: User 1: Hi, how are you?
User 2: I'm doing okay. I'm a little bored, but I'm trying to

In [94]:
# finetunes validation
single_model_responses = new_results
single_expected_responses = [data['prompt_response'] for data in trunc_extracted_data]

In [95]:
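# NOTE: calculate_bertscore takes (hypotheses, references). The expected responses are
# passed first here, so the reported precision and recall are swapped relative to
# treating the model output as the hypothesis; F1 is unaffected.
bertscore_result = calculate_bertscore(single_expected_responses, single_model_responses)
print(f"BERTScore: {bertscore_result}")

# Model : Model 1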

BERTScore: {'precision': 0.5558245182037354, 'recall': 0.4508765935897827, 'f1': 0.4925761818885803}
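
Argument order matters for the precision and recall values above, because the call passes the expected responses in the hypothesis position. The sketch below illustrates the greedy-matching idea behind BERTScore on a made-up similarity matrix, ignoring IDF weighting and baseline rescaling: precision averages the best match for each candidate token, recall averages the best match for each reference token. `greedy_precision_recall` is a hypothetical helper, not part of the `bert_score` package.

```python
def greedy_precision_recall(sim):
    """sim[i][j] = similarity between candidate token i and reference token j."""
    precision = sum(max(row) for row in sim) / len(sim)
    recall = sum(max(col) for col in zip(*sim)) / len(sim[0])
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy similarity matrix (made-up numbers): 2 candidate tokens x 3 reference tokens
sim = [[0.9, 0.1, 0.2],
       [0.3, 0.8, 0.1]]
p, r, f1 = greedy_precision_recall(sim)

# Swapping candidate and reference transposes the matrix
sim_swapped = [list(col) for col in zip(*sim)]
p_swapped, r_swapped, f1_swapped = greedy_precision_recall(sim_swapped)

print(p, r, f1)
print(p_swapped, r_swapped, f1_swapped)
```

Swapping the two inputs exchanges precision and recall and leaves F1 unchanged, so the F1 of about 0.49 reported above does not depend on the argument order.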




In [96]:
#calculating bleu score
# Model : MODEL 1
avg = 0
for i in range(50):
    hypothesis = single_model_responses[i]
    reference = single_expected_responses[i]
    bleu_score_val = bleu_score(hypothesis, reference)
    avg += bleu_score_val/50

print(f"Average BLEU Score: {avg}")

Average BLEU Score: 0.09874877986441334
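
Because `bleu_score` receives raw strings, `sentence_bleu` iterates over them one character at a time, so the average above is a character-level BLEU. The sketch below uses a simplified clipped n-gram precision (one ingredient of BLEU, not NLTK's full implementation) on a made-up sentence pair to show how character-level and word-level overlap differ. `ngram_precision` is a hypothetical helper.

```python
from collections import Counter

def ngram_precision(hypothesis, reference, n):
    """Clipped n-gram precision for any two sequences (strings or token lists)."""
    hyp = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return overlap / max(sum(hyp.values()), 1)

hyp = "User 2: I like jazz."
ref = "User 2: I love rock."

char_p = ngram_precision(hyp, ref, 2)                  # strings: character bigrams
word_p = ngram_precision(hyp.split(), ref.split(), 2)  # token lists: word bigrams
print(round(char_p, 3), word_p)
```

To score at the word level with NLTK, one option is to tokenize both strings first (for example with `nltk.word_tokenize` or `.split()`) before calling `sentence_bleu`; the resulting numbers would not be comparable to the average reported above.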


In [97]:
#CALCULATING ROUGE SCORE
# Model: MODEL 1
avg = 0
for i in range(50):
    hypothesis = single_model_responses[i]
    reference = single_expected_responses[i]
    rouge_score_val = calculate_rouge_score(hypothesis, reference)
    avg += rouge_score_val/50

print(f"Average ROUGE Score baseline: {avg}")

Average ROUGE Score baseline: 0.14444585903016816


In [98]:
#perplexity
#Model: MODEL 1
avg = 0
for i in range(50):
    text = single_model_responses[i]
    perplexity = calculate_perplexity(text)
    avg += perplexity/50

print(f"Average Perplexity baseline: {avg}")

Average Perplexity baseline: 2.372959868907929
