# LLM Finetuning

Data is 10-row sustainability data with prompt+completion.
Model is EleutherAI/pythia-70m. We use AutoTokenizer for tokenization and AutoModelForCausalLM for model training. 


- Data Preparation
    - Collect data
    - Tokenize data (pad - truncate)
    - Split data into train test
- Use Base Model
- Train
    - Train, save model
- Inference
    - Load model
    - Make predictions
- Evaluation
    - Load model
    - Calculate bleu score on test data

## Data Preparation

**Collect prompt completion pairs and create a jsonl file**

In [1]:
import pandas as pd
import numpy as np
import datasets

In [2]:
df = pd.read_excel("data_30.xlsx")
df.head()

Unnamed: 0,prompt,completion
0,What is gender equality?,"Gender equality refers to the equal rights, re..."
1,Why is gender equality important in the workpl...,Gender equality in the workplace is crucial be...
2,How does gender equality benefit society?,Gender equality benefits society by promoting ...
3,What are some common misconceptions about gend...,Some common misconceptions about gender equali...
4,How can education play a role in promoting gen...,Education is a powerful tool for promoting gen...


In [3]:
# Define the output JSONL file name
filename = 'output.jsonl'

# Iterate through the rows and write each row as a JSON object to the JSONL file
with open(filename, 'w') as jsonl_file:
    for _, row in df.iterrows():
        json_data = row.to_json(orient='columns')
        jsonl_file.write(json_data + '\n')

**Create a tokenizer**

In [4]:
#!pip install transformers
from transformers import AutoTokenizer

In [5]:
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")

**Tokenize the jsonl data**

In [6]:
def tokenize_function(examples):
    if "question" in examples and "answer" in examples:
        text = examples["question"][0] + examples["answer"][0]
    elif "input" in examples and "output" in examples:
        text = examples["input"][0] + examples["output"][0]
    elif "prompt" in examples and "completion" in examples:
        text = examples["prompt"][0] + examples["completion"][0]
    else:
        text = examples["text"][0]

    # Add 0 for short sentences
    tokenizer.pad_token = tokenizer.eos_token
    tokenized_inputs = tokenizer(
        text,
        return_tensors="np",
        padding=True,
    )
    
    # find the max length after padding, select the min
    max_length = min(
        tokenized_inputs["input_ids"].shape[1],
        2048
    )
    
    # truncate if the sentence is longer than 2048
    tokenizer.truncation_side = "left"
    tokenized_inputs = tokenizer(
        text,
        return_tensors="np",
        truncation=True,
        max_length=max_length
    )

    return tokenized_inputs

In [7]:
finetuning_dataset_loaded = datasets.load_dataset("json", data_files=filename, split="train")

tokenized_dataset = finetuning_dataset_loaded.map(
    tokenize_function,
    batched=True,
    batch_size=1,
    drop_last_batch=True
)

print(tokenized_dataset)

Using custom data configuration default-47a5361835ced184


Downloading and preparing dataset json/default to C:/Users/pelin/.cache/huggingface/datasets/json/default-47a5361835ced184/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...


Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Dataset json downloaded and prepared to C:/Users/pelin/.cache/huggingface/datasets/json/default-47a5361835ced184/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.


  0%|          | 0/30 [00:00<?, ?ba/s]

Dataset({
    features: ['prompt', 'completion', 'input_ids', 'attention_mask'],
    num_rows: 30
})


In [8]:
tokenized_dataset = tokenized_dataset.add_column("labels", tokenized_dataset["input_ids"])

**Analyse tokenized dataset**

In [9]:
tokenized_dataset

Dataset({
    features: ['prompt', 'completion', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 30
})

In [10]:
tokenized_dataset["prompt"][0]

'What is gender equality?'

In [11]:
tokenized_dataset["completion"][0]

'Gender equality refers to the equal rights, responsibilities, and opportunities of all individuals, regardless of their gender. It implies that the interests, needs, and priorities of both women and men are taken into consideration, recognizing the diversity of different groups of women and men.'

In [12]:
tokenized_dataset["input_ids"][0]

[1276,
 310,
 8645,
 13919,
 32,
 40945,
 13919,
 10770,
 281,
 253,
 4503,
 3570,
 13,
 19715,
 13,
 285,
 9091,
 273,
 512,
 4292,
 13,
 10159,
 273,
 616,
 8645,
 15,
 733,
 8018,
 326,
 253,
 6284,
 13,
 3198,
 13,
 285,
 23971,
 273,
 1097,
 2255,
 285,
 1821,
 403,
 2668,
 715,
 8180,
 13,
 26182,
 253,
 9991,
 273,
 1027,
 2390,
 273,
 2255,
 285,
 1821,
 15]

In [13]:
tokenized_dataset["attention_mask"][0]

[1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1]

**Train test split**

In [14]:
split_dataset = tokenized_dataset.train_test_split(test_size=0.1, shuffle=True, seed=123)
print(split_dataset)

DatasetDict({
    train: Dataset({
        features: ['prompt', 'completion', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 27
    })
    test: Dataset({
        features: ['prompt', 'completion', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 3
    })
})


In [15]:
train_dataset = split_dataset["train"]
test_dataset = split_dataset["test"]

print(train_dataset)
print(test_dataset)

Dataset({
    features: ['prompt', 'completion', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 27
})
Dataset({
    features: ['prompt', 'completion', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 3
})


**Push to hub**

In [16]:
# This is how to push your own dataset to your Huggingface hub
# !pip install huggingface_hub
# !huggingface-cli login
# split_dataset.push_to_hub(dataset_path_hf)

##  Use Base Model 

In [17]:
import datasets
import logging
import random
import logging
import torch
import transformers
import pandas as pd

from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
from transformers import TrainingArguments, Trainer

  device: Optional[torch.device] = torch.device("cuda"),


In [18]:
model_name = "EleutherAI/pythia-70m"

In [19]:
base_model = AutoModelForCausalLM.from_pretrained(model_name)

device_count = torch.cuda.device_count()
if device_count > 0:
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
    
base_model.to(device)
print(device)

cpu


In [20]:
test_text = test_dataset[0]['prompt']
max_input_tokens = 1000
max_output_tokens=100
# Tokenize
input_ids = tokenizer.encode(
      test_text,
      return_tensors="pt",
      truncation=True,
      max_length=max_input_tokens
)

# Generate
device = base_model.device
generated_tokens_with_prompt = base_model.generate(input_ids=input_ids.to(device), max_length=max_output_tokens)

# Decode
generated_text_with_prompt = tokenizer.batch_decode(generated_tokens_with_prompt, skip_special_tokens=True)

# Strip the prompt
generated_text_answer = generated_text_with_prompt[0][len(test_text):]


print("Question input (test):", test_text)
print(f"Correct answer from docs: {test_dataset[0]['completion']}")
print("Model's answer: ")
print(generated_text_answer)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Question input (test): How does gender equality benefit society?
Correct answer from docs: Gender equality benefits society by promoting social cohesion, economic growth, and sustainable development. When both men and women have equal opportunities to contribute, societies can tap into a broader range of talents, ideas, and perspectives, leading to more comprehensive solutions to complex challenges.
Model's answer: 


A:

The only way to get rid of this is to use the "gender equality" option.

A:

The only way to get rid of this is to use the "gender equality" option.

A:

You can use the "gender equality" option.

A:

You can use the "gender equality" option.

A:

You can use the "gender equality


## Train

In [56]:
from transformers import TrainingArguments, Trainer

In [57]:
max_steps = 100

trained_model_name = f"lamini_docs_{max_steps}_steps"
output_dir = trained_model_name

In [58]:
training_args = TrainingArguments(

  # Learning rate
  learning_rate=1.0e-5,

  # Number of training epochs
  num_train_epochs=1,

  # Max steps to train for (each step is a batch of data)
  # Overrides num_train_epochs, if not -1
  max_steps=max_steps,

  # Batch size for training
  per_device_train_batch_size=1,

  # Directory to save model checkpoints
  output_dir=output_dir,

  # Other arguments
  overwrite_output_dir=False, # Overwrite the content of the output directory
  disable_tqdm=False, # Disable progress bars
  eval_steps=120, # Number of update steps between two evaluations
  save_steps=120, # After # steps model is saved
  warmup_steps=1, # Number of warmup steps for learning rate scheduler
  per_device_eval_batch_size=1, # Batch size for evaluation
  evaluation_strategy="steps",
  save_strategy="steps",
  logging_strategy="steps",
  logging_steps=1,
  optim="adafactor",
  gradient_accumulation_steps = 4,
  gradient_checkpointing=False,

  # Parameters for early stopping
  load_best_model_at_end=True,
  save_total_limit=1,
  metric_for_best_model="eval_loss",
  greater_is_better=False
)



trainer = Trainer(
    model=base_model,
    # model_flops=model_flops,
    # total_steps=max_steps,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)


PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
max_steps is given, it will override any value given in num_train_epochs


In [59]:
# Uncomment the next line 
training_output = trainer.train()

The following columns in the training set don't have a corresponding argument in `GPTNeoXForCausalLM.forward` and have been ignored: completion, prompt. If completion, prompt are not expected by `GPTNeoXForCausalLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 27
  Num Epochs = 17
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 4
  Total optimization steps = 100
  Number of trainable parameters = 70426624


Step,Training Loss,Validation Loss




Training completed. Do not forget to share your model on huggingface.co/models =)




**Save Model**

In [60]:
# Uncomment the next 3 lines
save_dir = f'{output_dir}/final'
trainer.save_model(save_dir)
print("Saved model to:", save_dir)

Saving model checkpoint to lamini_docs_100_steps/final
Configuration saved in lamini_docs_100_steps/final\config.json
Configuration saved in lamini_docs_100_steps/final\generation_config.json
Model weights saved in lamini_docs_100_steps/final\pytorch_model.bin


Saved model to: lamini_docs_100_steps/final


**Load Model**

In [61]:
max_steps = 100
device_count = torch.cuda.device_count()
if device_count > 0:
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

trained_model_name = f"lamini_docs_{max_steps}_steps"
output_dir = trained_model_name
save_dir = f'{output_dir}/final'



finetuned_slightly_model = AutoModelForCausalLM.from_pretrained(save_dir, local_files_only=True)
finetuned_slightly_model.to(device) 

base_model = AutoModelForCausalLM.from_pretrained(model_name, local_files_only=True)
base_model.to(device) 

loading configuration file lamini_docs_100_steps/final\config.json
Model config GPTNeoXConfig {
  "_name_or_path": "lamini_docs_100_steps/final",
  "architectures": [
    "GPTNeoXForCausalLM"
  ],
  "bos_token_id": 0,
  "eos_token_id": 0,
  "hidden_act": "gelu",
  "hidden_size": 512,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 2048,
  "model_type": "gpt_neox",
  "num_attention_heads": 8,
  "num_hidden_layers": 6,
  "rotary_emb_base": 10000,
  "rotary_pct": 0.25,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.26.1",
  "use_cache": true,
  "use_parallel_residual": true,
  "vocab_size": 50304
}

loading weights file lamini_docs_100_steps/final\pytorch_model.bin
Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

All model checkpoint weights were used when initializing GPTNeoXForCausalLM.

All the weights of GPTNeoXFo

GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50304, 512)
    (layers): ModuleList(
      (0): GPTNeoXLayer(
        (input_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (attention): GPTNeoXAttention(
          (rotary_emb): RotaryEmbedding()
          (query_key_value): Linear(in_features=512, out_features=1536, bias=True)
          (dense): Linear(in_features=512, out_features=512, bias=True)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=512, out_features=2048, bias=True)
          (dense_4h_to_h): Linear(in_features=2048, out_features=512, bias=True)
          (act): GELUActivation()
        )
      )
      (1): GPTNeoXLayer(
        (input_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (

## Inference

In [62]:
# Prediction
test_question = test_dataset[0]['prompt']
print("Question input (test):", test_question)

# Tokenize
input_ids = tokenizer.encode(
      test_question,
      return_tensors="pt",
      truncation=True,
      max_length=max_input_tokens
)

# Generate
device = finetuned_slightly_model.device
generated_tokens_with_prompt = finetuned_slightly_model.generate(input_ids=input_ids.to(device), max_length=max_output_tokens)

# Decode
generated_text_with_prompt = tokenizer.batch_decode(generated_tokens_with_prompt, skip_special_tokens=True)

# Strip the prompt
generated_text_answer = generated_text_with_prompt[0][len(test_question):]


print(f"Correct answer from docs: {test_dataset[0]['completion']}")
print(" ")
print("Fine-tuned Model's answer: ")
print(generated_text_answer)


Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Question input (test): How does gender equality benefit society?
Correct answer from docs: Gender equality benefits society by promoting social cohesion, economic growth, and sustainable development. When both men and women have equal opportunities to contribute, societies can tap into a broader range of talents, ideas, and perspectives, leading to more comprehensive solutions to complex challenges.
 
Fine-tuned Model's answer: 
Adefactually,abusive and degrading her traditional gender norms has resulted in a series of drastic changes in society, including gender-based gender reassignment surgery, gender parity surgery, and gender parity-adjustment surgery. By contrast, gender-based gender reassignment surgery has yielded a more equitable and equitable treatment for women and girls, according to the World Health Organization.

In 2022, the United Nations General Assembly passed a resolution condemning gender


In [63]:
generated_tokens_with_prompt_base = base_model.generate(input_ids=input_ids.to(device), max_length=max_output_tokens)
generated_tokens_with_prompt_base = tokenizer.batch_decode(generated_tokens_with_prompt_base, skip_special_tokens=True)
generated_text_answer_base = generated_tokens_with_prompt_base[0][len(test_question):]
print("Base Model's answer: ")
print(generated_text_answer_base)
print(" ")

Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Base Model's answer: 


A:

The only way to get rid of this is to use the "gender equality" option.

A:

The only way to get rid of this is to use the "gender equality" option.

A:

You can use the "gender equality" option.

A:

You can use the "gender equality" option.

A:

You can use the "gender equality
 


In [64]:
# Prediction from scratch
test_question = "What is the studies on tech companies in terms of gender equality?"
print("Question input (test):", test_question)

# Tokenize
input_ids = tokenizer.encode(
      test_question,
      return_tensors="pt",
      truncation=True,
      max_length=max_input_tokens
)

# Generate
device = finetuned_slightly_model.device
generated_tokens_with_prompt = finetuned_slightly_model.generate(input_ids=input_ids.to(device), max_length=max_output_tokens)

# Decode
generated_text_with_prompt = tokenizer.batch_decode(generated_tokens_with_prompt, skip_special_tokens=True)

# Strip the prompt
generated_text_answer = generated_text_with_prompt[0][len(test_question):]
print("Model's answer: ")
print(generated_text_answer)

Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Question input (test): What is the studies on tech companies in terms of gender equality?
Model's answer: 
Research articles on tech companies in terms of gender equality found that more women are gynaecologists and engineers than ever before. In 2022, tech companies in 2023 reached the top of the female lead in women's tech. And just like most tech companies, tech companies in terms of gender equality are still finding ways to make sure they have the resources they need to fight for gender equality. And just like most tech companies, tech


In [65]:
generated_tokens_with_prompt_base = base_model.generate(input_ids=input_ids.to(device), max_length=max_output_tokens)
generated_tokens_with_prompt_base = tokenizer.batch_decode(generated_tokens_with_prompt_base, skip_special_tokens=True)
generated_text_answer_base = generated_tokens_with_prompt_base[0][len(test_question):]
print("Base Model's answer: ")
print(generated_text_answer_base)
print(" ")

Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Base Model's answer: 


The research is being conducted in the UK, with the aim of helping to understand the impact of gender equality on the UK’s economy.

The research is being conducted in the UK, with the aim of helping to understand the impact of gender equality on the UK’s economy.

The research is being conducted in the UK, with the aim of helping to understand the impact of gender equality on the UK’
 


## Evaluate

In [66]:
def generate_output(test_question, model):

    # Tokenize
    input_ids = tokenizer.encode(
          test_question,
          return_tensors="pt",
          truncation=True,
          max_length=max_input_tokens
    )

    # Generate
    device = model.device
    generated_tokens_with_prompt = model.generate(input_ids=input_ids.to(device), max_length=max_output_tokens)

    # Decode
    generated_text_with_prompt = tokenizer.batch_decode(generated_tokens_with_prompt, skip_special_tokens=True)

    # Strip the prompt
    generated_text_answer = generated_text_with_prompt[0][len(test_question):]
    return generated_text_answer

In [67]:
test_q = test_dataset[0]['prompt']
completion_q = test_dataset[0]['completion']
predicted_text = generate_output(test_question, finetuned_slightly_model)
base_predicted_text = generate_output(test_question, base_model)

print('Question:')
print(test_q)
print("--------------------------------------")
print('Actual Completion:')
print(completion_q)
print("--------------------------------------")
print('Fine-tuned prediction')
print(predicted_text)
print("--------------------------------------")
print('Base prediction:')
print(base_predicted_text)
print("--------------------------------------")

Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Question:
How does gender equality benefit society?
--------------------------------------
Actual Completion:
Gender equality benefits society by promoting social cohesion, economic growth, and sustainable development. When both men and women have equal opportunities to contribute, societies can tap into a broader range of talents, ideas, and perspectives, leading to more comprehensive solutions to complex challenges.
--------------------------------------
Fine-tuned prediction
Research articles on tech companies in terms of gender equality found that more women are gynaecologists and engineers than ever before. In 2022, tech companies in 2023 reached the top of the female lead in women's tech. And just like most tech companies, tech companies in terms of gender equality are still finding ways to make sure they have the resources they need to fight for gender equality. And just like most tech companies, tech
--------------------------------------
Base prediction:


The research is bein

In [68]:
test_q = test_dataset[2]['prompt']
completion_q = test_dataset[2]['completion']
predicted_text = generate_output(test_question, finetuned_slightly_model)
base_predicted_text = generate_output(test_question, base_model)

print('Question:')
print(test_q)
print("--------------------------------------")
print('Actual Completion:')
print(completion_q)
print("--------------------------------------")
print('Fine-tuned prediction')
print(predicted_text)
print("--------------------------------------")
print('Base prediction:')
print(base_predicted_text)
print("--------------------------------------")

Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Question:
What role do tech companies play in promoting gender equality within the industry?
--------------------------------------
Actual Completion:
Tech companies play a crucial role in shaping industry norms. By implementing inclusive hiring practices, offering mentorship programs, and promoting women in leadership roles, they can set a standard for gender equality. Additionally, by addressing workplace cultures that may perpetuate bias, tech companies can foster more inclusive environments.
--------------------------------------
Fine-tuned prediction
Research articles on tech companies in terms of gender equality found that more women are gynaecologists and engineers than ever before. In 2022, tech companies in 2023 reached the top of the female lead in women's tech. And just like most tech companies, tech companies in terms of gender equality are still finding ways to make sure they have the resources they need to fight for gender equality. And just like most tech companies, tech

In [69]:
# Collect the predictions
tuned_predicted_text_list = []
actual_test_list = []
base_predicted_text_list = []
for i in range(len(test_dataset)):
    test_q = test_dataset[i]['prompt']
    completion_q = test_dataset[i]['completion']
    predicted_text = generate_output(test_question, finetuned_slightly_model)
    base_predicted_text = generate_output(test_question, base_model)
    actual_test_list.append(completion_q)
    tuned_predicted_text_list.append(predicted_text)
    base_predicted_text_list.append(base_predicted_text)

Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain 

**Calculate the bleu score**

In [70]:
# !pip install evaluate

import evaluate
bleu = evaluate.load("bleu")

results = bleu.compute(predictions=base_predicted_text_list, references=actual_test_list)
print("Base Model Predictions Results")
print(results)

results = bleu.compute(predictions=tuned_predicted_text_list, references=actual_test_list)
print("Fine Tuned Model Predictions Results")
print(results)

Base Model Predictions Results
{'bleu': 0.0, 'precisions': [0.12280701754385964, 0.0044444444444444444, 0.0, 0.0], 'brevity_penalty': 1.0, 'length_ratio': 1.5724137931034483, 'translation_length': 228, 'reference_length': 145}
Fine Tuned Model Predictions Results
{'bleu': 0.022615029629288786, 'precisions': [0.1934156378600823, 0.025, 0.012658227848101266, 0.004273504273504274], 'brevity_penalty': 1.0, 'length_ratio': 1.6758620689655173, 'translation_length': 243, 'reference_length': 145}


## Examine the Process

**How to read the jsonl file?**

In [71]:
instruction_dataset_df = pd.read_json(filename, lines=True)
instruction_dataset_df

Unnamed: 0,prompt,completion
0,What is gender equality?,"Gender equality refers to the equal rights, re..."
1,Why is gender equality important in the workpl...,Gender equality in the workplace is crucial be...
2,How does gender equality benefit society?,Gender equality benefits society by promoting ...
3,What are some common misconceptions about gend...,Some common misconceptions about gender equali...
4,How can education play a role in promoting gen...,Education is a powerful tool for promoting gen...
5,What is the difference between gender equality...,"While both terms aim for fairness, gender equa..."
6,How do cultural norms impact gender equality?,Cultural norms play a significant role in shap...
7,Why is it essential to involve men and boys in...,Involving men and boys in the fight for gender...
8,How does gender equality relate to other forms...,Gender equality is interconnected with other f...
9,What are some actionable steps individuals can...,Individuals can promote gender equality by edu...


**Turn the file into dict**

In [72]:
examples = instruction_dataset_df.to_dict()
examples['prompt']

{0: 'What is gender equality?',
 1: 'Why is gender equality important in the workplace?',
 2: 'How does gender equality benefit society?',
 3: 'What are some common misconceptions about gender equality?',
 4: 'How can education play a role in promoting gender equality?',
 5: 'What is the difference between gender equality and gender equity?',
 6: 'How do cultural norms impact gender equality?',
 7: 'Why is it essential to involve men and boys in the fight for gender equality?',
 8: 'How does gender equality relate to other forms of equality?',
 9: 'What are some actionable steps individuals can take to promote gender equality in their communities?',
 10: 'How does media representation impact perceptions of gender roles?',
 11: 'What role do governments play in ensuring gender equality?',
 12: 'How does economic empowerment relate to gender equality?',
 13: 'What challenges do LGBTQ+ individuals face in the context of gender equality?',
 14: 'How can organizations foster a culture of ge

In [73]:
finetuning_dataset_loaded['prompt']

['What is gender equality?',
 'Why is gender equality important in the workplace?',
 'How does gender equality benefit society?',
 'What are some common misconceptions about gender equality?',
 'How can education play a role in promoting gender equality?',
 'What is the difference between gender equality and gender equity?',
 'How do cultural norms impact gender equality?',
 'Why is it essential to involve men and boys in the fight for gender equality?',
 'How does gender equality relate to other forms of equality?',
 'What are some actionable steps individuals can take to promote gender equality in their communities?',
 'How does media representation impact perceptions of gender roles?',
 'What role do governments play in ensuring gender equality?',
 'How does economic empowerment relate to gender equality?',
 'What challenges do LGBTQ+ individuals face in the context of gender equality?',
 'How can organizations foster a culture of gender equality?',
 'What is the significance of int