**Instruction fine tuning on News Question Answer Pairs Dataset**


1. Choose an LLM that can be fine-tuned on any freely available GPU-based machine or environment.
2. Prepare a dataset from the QA pairs generated in assignment 2, in line with an existing instruction-tuning dataset format such as Alpaca.
3. Fine-tune the model on different dataset sizes, for example 1000, 2000, or 5000 questions. There must be at least 4 variants.
4. Compare the outputs of the original LLM and of all fine-tuned checkpoints on the same inputs; log the outputs and share them with the assignment report.
5. The assignment report must detail the model selected, the fine-tuning strategy, and the outputs.
6. Select any evaluation benchmark available online and evaluate the model on it.


Mount Google Drive to store checkpoints and logs

In [None]:
# mount to the google drive
from google.colab import drive
drive.mount('/gdrive')

Mounted at /gdrive


Installation of required packages

In [None]:
!pip install transformers
!pip install datasets
!pip install wandb
!pip install rouge_score

In [None]:
%%capture
# a cell magic such as %%capture must be the first line of the cell
# install Unsloth for loading and fine-tuning the Llama 3 model (may take 1-2 minutes)
# Xformers (Flash Attention) and other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes

Path and working directory set up

In [None]:
# make sure the path exists
# path to the QA dataset generated in assignment 2
qa_pairs_path = "/gdrive/MyDrive/qa_pairs_gpt35_turbo1.csv" # path to your assignment 2 QA csv file
import os
if os.path.exists(qa_pairs_path):
  print('the path exists!!')
else:
  print('the path does not exist')

the path exists!!


In [None]:
# create the assignment 3 directory
root_dir = "/gdrive/MyDrive/"
directory_name = "genai_assignment3"
assignment3_dir_path = os.path.join(root_dir, directory_name)

# create the directory only if it does not already exist
os.makedirs(assignment3_dir_path, exist_ok = True)

# navigate to the assignment3 directory
os.chdir(assignment3_dir_path)

In [None]:
%pwd

'/gdrive/MyDrive/genai_assignment3'

In [None]:
%ls

In [None]:
%rm -rf huggingface_tokenizers_cache/ outputs/

Chunk the dataset into different sizes such as 1k, 2k, 5k, or any other value, and then create an instruction dataset in the style of the Stanford Alpaca dataset

In [None]:
# chunk the QA pairs into subsets of 1000, 3000, or any other size
from typing import Tuple, List, Union
from pathlib import Path
import json
import csv
import pandas as pd
import random
import sys
import time

# Chunker: reads the QA csv, shuffles it, and returns the first chunk_size rows
class Chunker:

  def __init__(self, data_file: Path = None) -> None:
    self.data_file = data_file

  # read and return a pandas data frame
  def read_dataset(self):
    try:
      df = pd.read_csv(self.data_file)
      return df
    except FileNotFoundError:
      print('File does not exist')
      sys.exit()

  # return a shuffled copy of the data frame with the index reset
  def randomize_dataset(self):
    df = self.read_dataset()
    random_df = df.sample(frac=1).reset_index(drop=True)
    return random_df

  # chunk the dataset now
  def get_data_chunk(self, chunk_size: int, save: bool = True):
    random_data = self.randomize_dataset()
    chunked_data = random_data.iloc[:chunk_size]
    if save:
      os.makedirs("chunked_data", exist_ok = True)
      chunked_data.to_csv(f"chunked_data/qa_chunked_data_{chunk_size}.csv", index=False)
    return chunked_data


# specify chunk size and run
chunk_size = 1000 # tip: set the chunk size to 8500 so you can later save a checkpoint after every 2000 examples, giving four variants
chunker = Chunker(qa_pairs_path)
df = chunker.get_data_chunk(chunk_size)
print(len(df))


1000


In [None]:
df.head()

Unnamed: 0,Question,Answer
0,What was Wanindu Hasaranga's bowling figures i...,Wanindu Hasaranga took 4 wickets for 45 runs.
1,What is the current status of the comments sec...,The comments section is undergoing an overhaul...
2,What did activists do to a portrait at the Uni...,Activists sprayed color on a Balfour portrait ...
3,What are the possible ways in which the four p...,"They can act individually, collusively, or col..."
4,Who are now in parliament as referred to in th...,Some of the lawyers who contested the general ...


In [None]:
# create Alpaca like dataset

def create_dataset(chunked_data: Path):
  """
  Prepare the QA dataset in the Alpaca instruction fine-tuning format
  -------------------------------------------------------------------

  Parameters
  ----------
  chunked_data: path to the chunked QA pairs csv file (1000, 2000, or any other size)
  """
  alpaca_format_data = []

  # read a csv file and store question answer pairs in the dataset list
  with open(chunked_data, 'r', encoding='utf-8') as file:
      reader = csv.reader(file)
      # skip header
      next(reader)

      # in this case we don't have any input so keep it empty
      for row in reader:
          question, answer = row
          alpaca_format_data.append({
              "instruction": question,
              "input": "",
              "output": answer
          })

  # Write data to a json file with the same base name as the chunked file
  # (Path.stem works for both str and Path inputs)
  save_data = Path(chunked_data).stem + '.json'
  fine_tune_dataset_dir = "fine_tuning_dataset"
  os.makedirs(fine_tune_dataset_dir, exist_ok = True)
  with open(os.path.join(fine_tune_dataset_dir, save_data), 'w', encoding='utf-8') as file:
      json.dump(alpaca_format_data, file, ensure_ascii=False, indent=4)

  print("Dataset successfully converted and written to", save_data)


Specify the chunked dataset path and create the instruction QA dataset, stored in a json file

In [None]:
# test dataset preparation code
chunked_dataset_path = "/gdrive/MyDrive/genai_assignment3/chunked_data/qa_chunked_data_1000.csv" # path to chunked data
create_dataset(chunked_dataset_path)

Dataset successfully converted and written to qa_chunked_data_1000.json


read the instruction fine tuning QA dataset

In [None]:
fine_tunning_dataset = "/gdrive/MyDrive/genai_assignment3/fine_tuning_dataset/qa_chunked_data_1000.json" # path to the fine-tuning data created in Alpaca format
with open(fine_tunning_dataset, 'r') as f:
  dataset = json.load(f)

# show first example
print(dataset[0])
print(len(dataset))

{'instruction': "What was Wanindu Hasaranga's bowling figures in the match?", 'input': '', 'output': 'Wanindu Hasaranga took 4 wickets for 45 runs.'}
1000


By following the same technique, the dataset can be converted to Alpaca format with a different chunk size.
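A minimal sketch of how several size variants could be produced in one pass, using only the standard library. The function name `make_alpaca_variants`, the fixed `seed`, and the toy QA pairs are assumptions for illustration and are not part of the notebook. Unlike the `Chunker` above, which reshuffles without a fixed seed on every call, this sketch shuffles once so that each smaller variant is a prefix of the larger ones.

```python
import random

def make_alpaca_variants(qa_pairs, sizes, seed=0):
    """Shuffle once, then build one Alpaca-style record list per requested size."""
    shuffled = list(qa_pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumed) keeps the variants reproducible
    variants = {}
    for size in sizes:
        subset = shuffled[:size]  # prefix of the same shuffle, so variants are nested
        variants[size] = [
            {"instruction": question, "input": "", "output": answer}
            for question, answer in subset
        ]
    return variants

# toy stand-in for the (Question, Answer) rows of the QA csv (hypothetical data)
toy_pairs = [(f"Question {i}?", f"Answer {i}.") for i in range(10)]
variants = make_alpaca_variants(toy_pairs, sizes=[2, 4, 8])
for size, records in variants.items():
    print(size, len(records), records[0])
```

With the real data, the sizes would be the chunk sizes chosen for the assignment (for example 1000, 2000, 3000, and 5000), and each list could be written out with `json.dump` as in `create_dataset`.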

Log the loss and other metrics to Weights & Biases

Log in to Hugging Face to create a repo, push the model and checkpoints, and access them

In [None]:
# log results to Weights & Biases for visualizations of the loss and of resource utilization
import wandb

# start a new wandb run to track this script
wandb.init(
    # set the wandb project where this run will be logged
    project="llama3 fine tunning",

    # track hyperparameters and run metadata
    config={
    "learning_rate": 2e-5,
    "architecture": "llama3",
    "dataset": "QA in alpaca format",
    "epochs": 1,
    }
)


wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [None]:
# from huggingface_hub import login
# # log in to Hugging Face using your Google account, go to Settings, copy an access token, and paste it as the token below
# login(token = "your hugging face login token")

**Loading an LLM - Llama 3 8B with QLoRA**


In [None]:
# load model in 4 bit
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048
dtype = None
load_in_4bit = True # Use 4bit quantization to reduce memory usage

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
tokenizer.padding_side = 'right'

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.



==((====))==  Unsloth: Fast Llama patching release 2024.5
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.25.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0,  Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,  # rank-stabilized LoRA
    loftq_config = None,
)

Unsloth 2024.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
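As a sanity check on the LoRA configuration above, the number of trainable parameters can be worked out by hand. The sketch below assumes the published Llama 3 8B dimensions (hidden size 4096, 8 key/value heads of 128 dimensions, MLP size 14336, 32 layers); these values are not stated in the notebook. A rank-`r` adapter on a linear layer adds two matrices, `r x in` and `out x r`.

```python
# Llama 3 8B dimensions (assumed from the published model config, not from this notebook)
hidden = 4096
kv_dim = 1024          # 8 key/value heads x 128 dims each
intermediate = 14336   # MLP width
num_layers = 32
r = 16                 # LoRA rank used in get_peft_model above

# (in_features, out_features) of each adapted projection
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# each adapter contributes r * (in + out) weights
per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
total = per_layer * num_layers
print(f"{per_layer:,} per layer, {total:,} in total")
# 1,310,720 per layer, 41,943,040 in total
```

Under these assumptions the total agrees with the "Number of trainable parameters = 41,943,040" line that the trainer prints when fine-tuning starts.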


Load, preprocess, and split the dataset into training and evaluation sets, holding out 2% of the data for evaluation

In [None]:
# prompt
QA_prompt = """
Below is an instruction that describes a task.  Write a response that appropriately completes the request.

### Instruction:
{}


### Response:
{}
"""

EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    outputs      = examples["output"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        # EOS_TOKEN added, otherwise the model will generate forever
        text = QA_prompt.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

from datasets import load_dataset
# tip: with 8500 examples, set test_size to 0.01 to keep the test set small, since Colab has usage limits
dataset = load_dataset("json", data_files= fine_tunning_dataset, split = "train").train_test_split(test_size = 0.02)
dataset = dataset.map(formatting_prompts_func, batched = True,)


In [None]:
print(dataset['train']) # data properties

Dataset({
    features: ['output', 'instruction', 'input', 'text'],
    num_rows: 980
})


In [None]:
print(dataset['test'])

Dataset({
    features: ['output', 'instruction', 'input', 'text'],
    num_rows: 20
})
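To make the `text` column concrete, here is a small sketch of what `formatting_prompts_func` produces for a single record. It uses no tokenizer: the `EOS` string is an assumed stand-in for `tokenizer.eos_token`, chosen to match the `<|end_of_text|>` marker that appears in the decoded output near the end of this notebook, and the record is the first example printed earlier.

```python
# same template as QA_prompt in the notebook
QA_PROMPT = """
Below is an instruction that describes a task.  Write a response that appropriately completes the request.

### Instruction:
{}


### Response:
{}
"""

EOS = "<|end_of_text|>"  # assumed stand-in for tokenizer.eos_token

record = {
    "instruction": "What was Wanindu Hasaranga's bowling figures in the match?",
    "input": "",
    "output": "Wanindu Hasaranga took 4 wickets for 45 runs.",
}

# training text: filled template followed by the end-of-sequence marker
text = QA_PROMPT.format(record["instruction"], record["output"]) + EOS
print(text)
```

At inference time the same template is filled with an empty response, so the model continues from the line after `### Response:`.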


In [None]:
# show some evaluation samples for testing the model with and without fine-tuning
# to see how close its output gets to the actual output
for i, data in enumerate(dataset['test']):
  print(f"Instruction: {data['instruction']}")
  print(f"Output: {data['output']}")
  print()

  if i >=5:
    break


Instruction: When is the current nine-month IMF facility expected to end?
Output: The current nine-month facility is expected to end soon.

Instruction: What is the sentiment of younger voters in general towards the current political climate?
Output: Younger voters in general are becoming more aware and are making decisions based on the reality of the situation rather than fear.

Instruction: How has a hung parliament impacted the PTI's position?
Output: The hung parliament has increased the PTI's leverage to play tough and negotiate better deals.

Instruction: Will the comments section be accessible soon?
Output: Yes, the comments section is undergoing maintenance and will return shortly.

Instruction: What is journalist Umber Khairi's opinion on Israel's actions in the Gaza war?
Output: Umber Khairi stated that Isreal was obliterating traces of Palestinian life and culture in the Gaza war.

Instruction: What happened a month before the rally when farmers tried to march outside New De

Run the model before fine-tuning to check its generation and compare it with the actual output

Model output before fine-tuning

In [None]:
# run inference on the LLM before fine-tuning on the dataset
import warnings
warnings.filterwarnings("ignore")

# question
instruction = "What is the capital of Pakistan?"

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    QA_prompt.format(
        f"{instruction}", # instruction
        "",
    )
], return_tensors = "pt").to("cuda")
# model.generation_config.pad_token_ids = tokenizer.pad_token_id
outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True, pad_token_id=tokenizer.eos_token_id)


response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)



Below is an instruction that describes a task.  Write a response that appropriately completes the request.

### Instruction:
What is the capital of Pakistan?


### Response:

Pakistan's capital is Islamabad.



In [None]:
# post-process the model response to keep only the response text
def get_response_text(text):
  """
  This function extracts the text within the ### Response: section, stopping at the next section marker (###).

  Args:
      text: The text containing the response section.

  Returns:
      The extracted response text, or an empty string if no response is found.
  """
  lines = text.splitlines()
  response_start = None

  for i, line in enumerate(lines):
    if line.startswith("### Response:"):
      response_start = i + 1
      break

  if response_start is not None:
    # The response ends at the next line that starts with "###",
    # or at the last line if there is no further section marker.
    # Initializing here avoids an undefined response_end when nothing follows the marker.
    response_end = len(lines)
    for j in range(response_start, len(lines)):
      if lines[j].startswith("###"):
        response_end = j
        break

    # Extract the response text between the start and end lines.
    return "\n".join(lines[response_start:response_end])
  else:
    # No response section found.
    return ""

In [None]:
# clean the model response to only extract desired output
model_output = get_response_text(response)
print(model_output)


Pakistan's capital is Islamabad.


In [None]:
total_epochs = 1 # tip: one epoch is enough, don't increase it, otherwise Colab will crash

tips:
- chunk your dataset into 1k, 2k, 3k, and 5k (four variants)
- if the dataset is small, 'total_epochs' should be higher, between 5 and 10
- if the dataset is large, keep it low to avoid crashing Colab (the free GPU has a usage limit); a total_epochs value of 3 should be fine
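To judge how the epoch count trades off against Colab time, it helps to estimate the number of optimizer steps before training. The sketch below is a rough estimate, assuming steps per epoch are the number of batches floor-divided by the gradient accumulation steps; that assumption reproduces the 122 total steps reported in the training log for 980 examples, but other Trainer versions may round differently. The helper name and the larger dataset sizes (2% held out of 2000, 3000, and 5000) are illustrative.

```python
def total_optimizer_steps(num_examples, batch_size, grad_accum, epochs):
    """Rough estimate of optimizer steps for a single-GPU run."""
    batches_per_epoch = -(-num_examples // batch_size)  # ceiling division
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * epochs

# batch size 2 and 4 accumulation steps, as configured in this notebook
for n in (980, 1960, 2940, 4900):
    for epochs in (1, 3):
        print(f"{n} examples, {epochs} epoch(s): {total_optimizer_steps(n, 2, 4, epochs)} steps")
```

Since the 122-step run took roughly 10 minutes on the T4, the step count gives a first-order idea of how long a larger variant or more epochs would take.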



In [None]:
# eval model
import torch
# note: load_metric is deprecated in recent `datasets` releases; this cell assumes a version that still provides it
from datasets import load_metric

# Load evaluation metrics
rouge = load_metric("rouge")
bleu = load_metric("bleu")

def evaluate_model(model, tokenizer, test_dataset):
    model.eval()
    predictions = []
    references = []

    for example in test_dataset:
        instruction = example["instruction"]
        reference = example["output"]

        inputs = tokenizer(
            [QA_prompt.format(instruction, "")],  # leave the response empty for generation
            return_tensors = "pt",
        ).to("cuda")

        outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True, pad_token_id=tokenizer.eos_token_id)

        prediction = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
        prediction = get_response_text(prediction)
        predictions.append(prediction)
        references.append(reference)
    # calculate scores now
    rouge_result = rouge.compute(predictions=predictions, references=references)
    bleu_result = bleu.compute(predictions=[pred.split() for pred in predictions],
                               references=[[ref.split()] for ref in references])


    results = {
        "rouge": rouge_result,
        "bleu": bleu_result
    }

    return results

# wrapper that unpacks (model, tokenizer, test_dataset) and calls evaluate_model
def compute_metrics(eval_preds):
    model, tokenizer, test_dataset = eval_preds
    return evaluate_model(model, tokenizer, test_dataset)
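To clarify what the ROUGE-1 numbers measure, here is a simplified, standard-library sketch of unigram overlap. It is not the `rouge_score` implementation used by the metric above: it assumes lowercased whitespace tokenization with no stemming or punctuation handling, and the function name `rouge1` is illustrative.

```python
from collections import Counter

def rouge1(prediction, reference):
    """Unigram-overlap precision, recall and F1 (simplified ROUGE-1)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # clipped overlap: each reference token can be matched at most once
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

# a prediction that is longer than the reference but contains all of it
p, r, f = rouge1("the capital of pakistan is islamabad", "islamabad is the capital")
print(round(p, 3), round(r, 3), round(f, 3))
# 0.667 1.0 0.8
```

The example shows the pattern to expect when generations are longer than the references: recall stays high while precision drops, which is consistent with a `length_ratio` well above 1 in the BLEU output.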


In [None]:
# adding custom callback to save checkpoints at different example intervals

from transformers import TrainerCallback, TrainerState, TrainerControl
import os
import math

class CustomCheckpointCallback(TrainerCallback):
    def __init__(self, examples_interval,  test_dataset):
        self.examples_interval = examples_interval  # Number of examples between checkpoints
        self.test_dataset = test_dataset
        # self.tokenizer = tokenizer
        self.steps_to_save = []  # To be calculated based on batch size and accumulation steps
        self.results = []

    def on_train_begin(self, args, state, control, **kwargs):
        # Calculate the steps at which to save checkpoints based on examples
        max_examples = state.max_steps * (args.per_device_train_batch_size * args.gradient_accumulation_steps)
        self.steps_to_save = [math.ceil(i / (args.per_device_train_batch_size * args.gradient_accumulation_steps))
                              for i in range(self.examples_interval, max_examples + 1, self.examples_interval)]

    def on_step_end(self, args, state: TrainerState, control: TrainerControl, **kwargs):
        if state.global_step in self.steps_to_save:
            control.should_save = True  # Save checkpoint at this step
        else:
            control.should_save = False  # Do not save checkpoint at this step

        return control

    def on_save(self, args, state, control, **kwargs):
        # Rename the checkpoint directory after saving
        step = state.global_step
        if step in self.steps_to_save:
            # Calculate the example count based on the current step
            example_count = step * (args.per_device_train_batch_size * args.gradient_accumulation_steps)
            checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
            new_checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{example_count}")

            if os.path.exists(checkpoint_dir):
                os.rename(checkpoint_dir, new_checkpoint_dir)
                print(f"Checkpoint saved and renamed to: {new_checkpoint_dir}")

                # Evaluate the current state of the model
                results = evaluate_model(kwargs['model'], kwargs['tokenizer'], self.test_dataset)
                print(f"Evaluation results for checkpoint-{example_count}: {results}")

                # Store results with checkpoint name
                self.results.append({
                    "checkpoint": f"checkpoint-{example_count}",
                    "results": results
                })

    def on_train_end(self, args, state, control, **kwargs):
        # Save results to a file or return them as needed
        results_file = os.path.join(args.output_dir, "evaluation_results.json")
        with open(results_file, "w") as f:
            json.dump(self.results, f, indent=4)
        print(f"Saved evaluation results to {results_file}")


# Define the interval for saving checkpoints based on examples processed
examples_interval = 250 # tip: for the 8500 examples you chunked, set the interval to 2000 so a checkpoint is saved after every 2000 examples
batch_size = 2
gradient_accumulation_steps = 4
# Calculate the number of steps to save checkpoints based on the interval
effective_batch_size = batch_size * gradient_accumulation_steps
steps_per_checkpoint = math.ceil(examples_interval / effective_batch_size)
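The step arithmetic in `on_train_begin` can be checked in isolation. The sketch below repeats the same calculation as a plain function (the name `checkpoint_schedule` is illustrative) and assumes 122 total steps, the value the trainer reports for 980 training examples with an effective batch size of 8.

```python
import math

def checkpoint_schedule(max_steps, batch_size, grad_accum, examples_interval):
    """Return (optimizer step, examples seen) for every checkpoint the callback would save."""
    effective = batch_size * grad_accum
    max_examples = max_steps * effective
    steps = [
        math.ceil(n / effective)
        for n in range(examples_interval, max_examples + 1, examples_interval)
    ]
    return [(step, step * effective) for step in steps]

print(checkpoint_schedule(max_steps=122, batch_size=2, grad_accum=4, examples_interval=250))
# [(32, 256), (63, 504), (94, 752)]
```

Because the interval is rounded up to a whole optimizer step, checkpoints are named after the examples actually processed (256, 504, 752) rather than the nominal 250, 500, 750.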

In [None]:
# supervised fine-tuning trainer (SFTTrainer) from Hugging Face TRL
from trl import SFTTrainer
from transformers import TrainingArguments

custom_callback = CustomCheckpointCallback(
    examples_interval=examples_interval,
    test_dataset=dataset['test'],
)


trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset['train'],
    eval_dataset = dataset['test'],
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        num_train_epochs = total_epochs,
        per_device_train_batch_size = batch_size,
        gradient_accumulation_steps = gradient_accumulation_steps,
        warmup_steps = 5,
        learning_rate = 2e-5,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        report_to = "wandb",
        output_dir = "outputs",
        save_steps=steps_per_checkpoint,  # Default value, actual saving handled by callback
        save_total_limit=5,
    ),
    callbacks=[custom_callback],
)
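The settings `warmup_steps = 5`, `learning_rate = 2e-5`, and `lr_scheduler_type = "linear"` imply a learning rate that ramps up linearly and then decays linearly to zero. A minimal sketch of that shape follows, assuming 122 total steps (the value reported in the training log); `linear_lr` is an illustrative helper, not a library function.

```python
def linear_lr(step, peak_lr=2e-5, warmup_steps=5, total_steps=122):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)

for step in (0, 1, 5, 61, 122):
    print(step, f"{linear_lr(step):.2e}")
```

With only 122 steps, the rate is at or near its peak for a small part of the run, which is one reason the training loss moves slowly over the first few logged steps.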

In [None]:
# Show current memory stats before fine tuning
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
5.92 GB of memory reserved.


In [None]:
# start fine tuning, run when everything is ready
run_trainer = True
if run_trainer:
  trainer_stats = trainer.train()


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 980 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 122
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,3.4381
2,3.3702
3,3.6869
4,3.4337
5,3.6276
6,3.6417
7,3.4763
8,3.5709
9,3.6952
10,3.5886


Checkpoint saved and renamed to: outputs/checkpoint-256
Evaluation results for checkpoint-256: {'rouge': {'rouge1': AggregateScore(low=Score(precision=0.15654146993893603, recall=0.3865647481603365, fmeasure=0.21867097511251327), mid=Score(precision=0.23365967539906432, recall=0.48702328717034593, fmeasure=0.29937807782880577), high=Score(precision=0.32459718746214483, recall=0.5908896173271174, fmeasure=0.39455445706748354)), 'rouge2': AggregateScore(low=Score(precision=0.06766396715689575, recall=0.1729091678338002, fmeasure=0.09265387787906192), mid=Score(precision=0.13724042629485583, recall=0.27635727018079953, fmeasure=0.17319010813309338), high=Score(precision=0.21942184939644208, recall=0.38284342373312963, fmeasure=0.267563991844835)), 'rougeL': AggregateScore(low=Score(precision=0.13548812177352423, recall=0.3422256230289319, fmeasure=0.19073594316939022), mid=Score(precision=0.21501960619926988, recall=0.4348104775126834, fmeasure=0.2740878461962327), high=Score(precision=0.

In [None]:
# Show final memory and time stats after fine-tuning
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory/max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)


print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

590.604 seconds used for training.
9.84 minutes used for training.
Peak reserved memory = 7.105 GB.
Peak reserved memory for training = 1.185 GB.
Peak reserved memory % of max memory = 48.176 %.
Peak reserved memory for training % of max memory = 8.035 %.


In [None]:
from transformers import TextStreamer

def stream(instruction, model, tokenizer):

    FastLanguageModel.for_inference(model) # Enable native 2x faster inference
    inputs = tokenizer([
        QA_prompt.format(
            f"{instruction}", # instruction
            "",
        )], return_tensors = "pt").to("cuda")

    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

    # Despite returning the usual output, the streamer will also print the generated text to stdout.
    _ = model.generate(**inputs, streamer=streamer, max_new_tokens=64)


stream("Where is islamabad located?", model, tokenizer)  # prints the generated text; stream() returns None

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Islamabad is located in Pakistan.




The model output is close to the actual label. Checkpoints and logs are saved to the genai_assignment3 directory with model_save_name.

Load checkpoints and the tokenizer from the directory; the working directory should be the model root directory, i.e. the assignment3 directory

In [None]:
%pwd

'/gdrive/MyDrive/genai_assignment3'

Load and tabulate the evaluation results for the different saved checkpoints

In [None]:
import json
import pandas as pd

# json path that stores results with different checkpoints
results_file = "/gdrive/MyDrive/genai_assignment3/outputs/evaluation_results.json"# path to saved results for evaluation


with open(results_file, "r") as f:
    data = json.load(f)

# convert the json results to a pandas DataFrame
rows = []
for entry in data:
    checkpoint = entry["checkpoint"]
    results = entry["results"]

    rouge = results["rouge"]
    bleu = results["bleu"]

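    # each ROUGE entry is serialized as [low, mid, high], and each of those as [precision, recall, fmeasure];
    # as written, *_precision reads the low bound, *_recall the mid estimate, and *_f1 the high bound
    # (use [1][0], [1][1], [1][2] to read all three from the mid estimate)
    row = {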
        "checkpoint": checkpoint,
        "rouge1_precision": rouge["rouge1"][0][0],
        "rouge1_recall": rouge["rouge1"][1][1],
        "rouge1_f1": rouge["rouge1"][2][2],
        "rouge2_precision": rouge["rouge2"][0][0],
        "rouge2_recall": rouge["rouge2"][1][1],
        "rouge2_f1": rouge["rouge2"][2][2],
        "rougeL_precision": rouge["rougeL"][0][0],
        "rougeL_recall": rouge["rougeL"][1][1],
        "rougeL_f1": rouge["rougeL"][2][2],
        "rougeLsum_precision": rouge["rougeLsum"][0][0],
        "rougeLsum_recall": rouge["rougeLsum"][1][1],
        "rougeLsum_f1": rouge["rougeLsum"][2][2],
        "bleu_score": bleu["bleu"],
        "bleu_precision_1": bleu["precisions"][0],
        "bleu_precision_2": bleu["precisions"][1],
        "bleu_precision_3": bleu["precisions"][2],
        "bleu_precision_4": bleu["precisions"][3],
        "brevity_penalty": bleu["brevity_penalty"],
        "length_ratio": bleu["length_ratio"],
        "translation_length": bleu["translation_length"],
        "reference_length": bleu["reference_length"]
    }
    rows.append(row)

df = pd.DataFrame(rows)
df.head()


Unnamed: 0,checkpoint,rouge1_precision,rouge1_recall,rouge1_f1,rouge2_precision,rouge2_recall,rouge2_f1,rougeL_precision,rougeL_recall,rougeL_f1,...,rougeLsum_f1,bleu_score,bleu_precision_1,bleu_precision_2,bleu_precision_3,bleu_precision_4,brevity_penalty,length_ratio,translation_length,reference_length
0,checkpoint-256,0.156541,0.487023,0.394554,0.067664,0.276357,0.267564,0.135488,0.43481,0.373951,...,0.371041,0.067811,0.175841,0.08189,0.048701,0.030151,1.0,2.564706,654,255
1,checkpoint-504,0.191801,0.517457,0.447206,0.109355,0.30365,0.330719,0.168087,0.477158,0.415697,...,0.421841,0.070118,0.173217,0.088323,0.052388,0.030159,1.0,2.694118,687,255
2,checkpoint-752,0.146409,0.502901,0.372702,0.061441,0.268931,0.237017,0.118072,0.438423,0.334316,...,0.332876,0.056161,0.154667,0.068399,0.04073,0.023088,1.0,2.941176,750,255
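The indexing used to build this table depends on how the ROUGE result is serialized. The metric returns named tuples, and `json.dump` writes named tuples as plain lists, so each entry becomes `[low, mid, high]` with each element a `[precision, recall, fmeasure]` list. The sketch below demonstrates this with stand-in named tuples that mirror the printed `AggregateScore` and `Score` fields; the values are the ROUGE-1 numbers logged for checkpoint-256.

```python
import json
from collections import namedtuple

# stand-ins mirroring the field names shown in the evaluation log
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
AggregateScore = namedtuple("AggregateScore", ["low", "mid", "high"])

rouge1 = AggregateScore(
    low=Score(0.15654146993893603, 0.3865647481603365, 0.21867097511251327),
    mid=Score(0.23365967539906432, 0.48702328717034593, 0.29937807782880577),
    high=Score(0.32459718746214483, 0.5908896173271174, 0.39455445706748354),
)

# round-trip through JSON: field names are lost, only positions remain
loaded = json.loads(json.dumps({"rouge1": rouge1}))["rouge1"]

LOW, MID, HIGH = 0, 1, 2
PRECISION, RECALL, F1 = 0, 1, 2
print("mid precision:", loaded[MID][PRECISION])
print("mid recall:   ", loaded[MID][RECALL])
print("mid F1:       ", loaded[MID][F1])
```

Comparing with the table, the `rouge1_precision` value for checkpoint-256 (0.156541) corresponds to the low bound and `rouge1_f1` (0.394554) to the high bound, while only `rouge1_recall` (0.487023) is the mid estimate; indexing `[MID]` for all three would give comparable columns.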


In [None]:
# load pretrained model
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference


# QA_prompt must already be defined (see the prompt cell above)
inputs = tokenizer(
[
    QA_prompt.format(
        "What is a famous tall tower in Paris?", # instruction
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
# tokenizer.batch_decode(outputs)
print("\nModel Response:")
print((tokenizer.batch_decode(outputs)[0]))


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Model Response:
<|begin_of_text|>
Below is an instruction that describes a task.  Write a response that appropriately completes the request.

### Instruction:
What is a famous tall tower in Paris?


### Response:

The Eiffel Tower is a famous tall tower in Paris.
<|end_of_text|>


Checkpoints and logs will be stored in the genai_assignment3 directory.

In [None]:
import cv2
import torch
import matplotlib.pyplot as plt

# Load the small MiDaS depth model from torch.hub
# (kept under its own name so it does not overwrite the fine-tuned `model` above)
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()

# Use the preprocessing that ships with MiDaS; it resizes and normalizes the image for the small model
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform

# Load the image and convert it from BGR (OpenCV default) to RGB
image = cv2.imread("image_path.jpg")
if image is None:
    raise FileNotFoundError("image_path.jpg could not be read")
input_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
input_batch = transform(input_image)  # already includes the batch dimension

# Predict depth map
with torch.no_grad():
    depth_map = midas(input_batch)

# Normalize to [0, 1]; MiDaS predicts relative inverse depth, so the distance scaling is only illustrative
depth_map = depth_map.squeeze().numpy()
depth_map = (depth_map - depth_map.min()) / (depth_map.max() - depth_map.min())
max_distance = 10.0  # assumed placeholder; set it from known camera parameters
depth_map = depth_map * max_distance

# Display depth map
plt.imshow(depth_map, cmap="plasma")
plt.colorbar()
plt.show()
