# Task 0 - Fine tuning

The objective of this task is to fine-tune any transformer based model(LLM included) of your choice on the following [dataset](https://www.kaggle.com/datasets/marawanxmamdouh/dialogsum). Perform Rouge and bleu evaluation.


# Task 1 - Model tweaking

Post obtaining the scores improve the scores further by implementing feature engineering / Data engineering strategies on the dataset. Showcase atleaset 10% improvement from the base score. And explain in breif why the particular stratey was chosen.


# Task 2 - Model Optimisation
The objective of this task is to optimize your model and bring down its latency and size to atleast 40 percent of its base model (tweaked model). You are free to use any optimization strategy and framework.

It is also your responsibility to showcase and compare the latency and model size of between the base model and optimized model using any python tool of your choice.

Also perform Rouge and bleu evaluation on the optimized model  and compare it with the base model (tweaked model) .

# Loading Data

In [7]:
## Importing required libraries for EDA
import os
import numpy as np
import pandas as pd
from glob import glob
from tqdm import tqdm_notebook


from transformers import set_seed
seed = 42
set_seed(seed)

In [8]:
BASE_PATH = "/Documents/assignments/"
DATA_PATH = f"{BASE_PATH}data/CSV/"



In [9]:
DATA = {}
for data_path in glob(DATA_PATH+"**.csv"):
    data_name = data_path.split("/")[-1].replace(".csv", "")
    DATA[data_name] = pd.read_csv(data_path)

In [10]:
DATA["train"]

Unnamed: 0,id,dialogue,summary,topic
0,train_0,"#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. ...","Mr. Smith's getting a check-up, and Doctor Haw...",get a check-up
1,train_1,"#Person1#: Hello Mrs. Parker, how have you bee...",Mrs Parker takes Ricky for his vaccines. Dr. P...,vaccines
2,train_2,"#Person1#: Excuse me, did you see a set of key...",#Person1#'s looking for a set of keys and asks...,find keys
3,train_3,#Person1#: Why didn't you tell me you had a gi...,#Person1#'s angry because #Person2# didn't tel...,have a girlfriend
4,train_4,"#Person1#: Watsup, ladies! Y'll looking'fine t...",Malik invites Nikki to dance. Nikki agrees if ...,dance
...,...,...,...,...
12455,train_12455,#Person1#: Excuse me. You are Mr. Green from M...,Tan Ling picks Mr. Green up who is easily reco...,pick up someone
12456,train_12456,#Person1#: Mister Ewing said we should show up...,#Person1# and #Person2# plan to take the under...,conference center
12457,train_12457,#Person1#: How can I help you today?\n#Person2...,#Person2# rents a small car for 5 days with th...,rent a car
12458,train_12458,#Person1#: You look a bit unhappy today. What'...,#Person2#'s mom lost her job. #Person2# hopes ...,job losing


## Approach:
Since this is an summary generation, it is better to use LLM fine tuning. We'll first check if the LLM model gives expected results from zero shot, one shot fine tuning and consider those prompt fine tuned results as base results.

# Seting Up Mflow Tracking For Model Fine Tuning

## Base Model

In [11]:
## Importing model finetuning related packages

import torch
from huggingface_hub import snapshot_download
import transformers
import evaluate
import mlflow

from mlflow.models.signature import ModelSignature, infer_signature
from mlflow.types import DataType, Schema, ColSpec, ParamSchema, ParamSpec


In [12]:
DEVICE =  torch.device("cuda" if torch.cuda.is_available() else "cpu")
MODEL_NAME = "google/flan-t5-small"


In [13]:
# Download the MPT-7B instruct model and tokenizer to a local directory cache
MODEL_NAME = "google/flan-t5-small"

snapshot_location = snapshot_download(repo_id=MODEL_NAME, local_dir="../mlflow_practice/models/flan_t5/")

Fetching 12 files:   0%|          | 0/12 [00:00<?, ?it/s]

In [8]:
# SOURCE : MLFlow Documentation
class Summarization(mlflow.pyfunc.PythonModel):
    def __init__(self, model_name):
        self.model_name = model_name
        
    def load_context(self, context):
        """
        This method initializes the tokenizer.
        """
        
        # Initialize tokenizer and language model
        self.tokenizer = transformers.AutoTokenizer.from_pretrained(
            context.artifacts["snapshot"], padding_side="left"
        )

        config = transformers.AutoConfig.from_pretrained(
            context.artifacts["snapshot"], trust_remote_code=True
        )
      
        self.model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
            context.artifacts["snapshot"],
            config=config,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )

         # If running on a GPU-compatible environment if available.   
        self.model.to(device=DEVICE)

        self.model.eval()


    def predict_each(self, context, model_input, params=None):
        """
        This method generates prediction for the given input.
        """
        if params is None:
            params = {}
        conv_log = model_input

        # Retrieve or use default values for temperature and max_tokens
        temperature = params.get("temperature") or 0.2
        max_tokens = params.get("max_tokens") or 1000

        print(f"using temperature {temperature} and max_tokens {max_tokens}")

        # Build the prompt
        prompt = conv_log

        # Encode the input and generate prediction
        # NB: Sending the tokenized inputs to the GPU here explicitly will not work if your system does not have CUDA support.
        # If attempting to run this with GPU support, change 'cpu' to 'cuda' for maximum performance
        
        encoded_input = self.tokenizer.encode(prompt, return_tensors="pt").to(DEVICE)
        output = self.model.generate(
            encoded_input,
            do_sample=True,
            temperature=temperature,
            max_new_tokens=max_tokens,
        )

        
        # Decode the prediction to text
        generated_text = self.tokenizer.decode(output[0], skip_special_tokens=True)

        # Removing the prompt from the generated text
        prompt_length = len(self.tokenizer.encode(prompt, return_tensors="pt")[0])
        generated_response = self.tokenizer.decode(
            output[0], skip_special_tokens=True
        )

        return generated_response

    def predict(self, context, model_input, params=None):
        """
        This method generates prediction for the given input.
        """
        generated_response = []
        
        for inputs in model_input["dialogue"].values:
            
            pred_ = self.predict_each(context, inputs, params)
            generated_response.append(pred_)

        print(f'returning generated response as : {generated_response}')
        return {"candidates": generated_response}

In [9]:
signature = infer_signature(DATA["train"]["dialogue"],DATA["train"]["summary"])
# Define input example
input_example = pd.DataFrame({"prompt": ["#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n#Person2#: I found it would be a good idea to get a check-up.\n#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.\n#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?\n#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\n#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.\n#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.\n#Person2#: Ok, thanks doctor."]})


In [10]:
approach_name="base"
base_tracker = mlflow.set_experiment(experiment_name=approach_name)


2024/01/26 19:04:58 INFO mlflow.tracking.fluent: Experiment with name 'base' does not exist. Creating a new experiment.


## Tracking Model

#### Logging base model metrics

In [11]:
## Evaluation the base model without finetuning

import numpy as np
from bleu import list_bleu
from nltk.translate import bleu_score
from datasets import load_metric


bleu_metric = evaluate.load("bleu")
rouge_metric = load_metric("rouge")

def eval_fn(predictions, targets, metrics):
    predictions = list(predictions.values)

    for i, pred in enumerate(predictions):
        if pred:
            if (len(pred) == 0) or len(pred.strip()) == 0:
                predictions[i] = '<unk>'
        else:
            predictions[i] = '<unk>'
    targets = list(targets.values)
    if len(targets) == 1:
        targets = [targets]
    else:
        targets = [[i] for i in targets]
        
    print(f'comput bleu pred: {predictions}, true  ; {targets}')
    bleu_metric.add_batch(predictions=predictions, references=targets)
    report = bleu_metric.compute()
    
    # report = mlflow.metrics.MetricValue(
    #     scores= report,
    #     # aggregate_results={"mean": np.mean(scores), "sum": np.sum(scores)},
    # )

    report.pop("precisions")
    return report


# Below metric can be added to mlflow extra_metrics
# bleu_score_metric = mlflow.metrics.make_metric(eval_fn=eval_fn, greater_is_better=True, name="bleu_score")


  rouge_metric = load_metric("rouge")


In [12]:
# Get the current base version of torch that is installed, without specific version modifiers
torch_version = torch.__version__.split("+")[0]
t5_model=Summarization("t5_model")

In [13]:

# Start an MLflow run context and log the model wrapper along with the param-included signature to
# allow for overriding parameters at inference time
with mlflow.start_run(experiment_id=base_tracker.experiment_id, run_name="model"):
    model_info = mlflow.pyfunc.log_model(
        "model",
        python_model=t5_model,
        # NOTE: the artifacts dictionary mapping is critical! This dict is used by the load_context() method in our MPT() class.
        artifacts={"snapshot": snapshot_location},
        pip_requirements=[
            f"torch=={torch_version}",
            f"transformers=={transformers.__version__}"],
        input_example=input_example,
        signature=signature
    )

Downloading artifacts:   0%|          | 0/36 [00:00<?, ?it/s]

2024/01/26 19:05:02 INFO mlflow.store.artifact.artifact_repo: The progress bar can be disabled by setting the environment variable MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR to false


In [14]:
def basic_prompt(df):
    INSTRUCTION = "### Instruction:  For given below conversation between Person1 and Person2 generate summary"
    SAMPLE_RESPONSE = "### SUMMARY:"
    return  INSTRUCTION + df.dialogue + "\n" + SAMPLE_RESPONSE + df.summary

In [15]:
temp_data = DATA['test']
temp_data['prompt'] = basic_prompt(temp_data)

In [16]:
temp_data

Unnamed: 0,id,dialogue,summary,topic,prompt
0,test_0_1,"#Person1#: Ms. Dawson, I need you to take a di...",Ms. Dawson helps #Person1# to write a memo to ...,communication method,### Instruction: For given below conversation...
1,test_0_2,"#Person1#: Ms. Dawson, I need you to take a di...",In order to prevent employees from wasting tim...,company policy,### Instruction: For given below conversation...
2,test_0_3,"#Person1#: Ms. Dawson, I need you to take a di...",Ms. Dawson takes a dictation for #Person1# abo...,dictation,### Instruction: For given below conversation...
3,test_1_1,#Person1#: You're finally here! What took so l...,#Person2# arrives late because of traffic jam....,public transportation,### Instruction: For given below conversation...
4,test_1_2,#Person1#: You're finally here! What took so l...,#Person2# decides to follow #Person1#'s sugges...,transportation,### Instruction: For given below conversation...
...,...,...,...,...,...
1495,test_498_2,#Person1#: Matthew? Hi!\n#Person2#: Steve! Hav...,Matthew and Steve meet after a long time. Stev...,finding a house,### Instruction: For given below conversation...
1496,test_498_3,#Person1#: Matthew? Hi!\n#Person2#: Steve! Hav...,Steve has been looking for a place to live. Ma...,find a house,### Instruction: For given below conversation...
1497,test_499_1,"#Person1#: Hey, Betsy, did you hear the great ...",Frank invites Besty to the party to celebrate ...,party invitation,### Instruction: For given below conversation...
1498,test_499_2,"#Person1#: Hey, Betsy, did you hear the great ...",Frank invites Betsy to the big promotion party...,promotion party invitation,### Instruction: For given below conversation...


In [17]:
with mlflow.start_run(experiment_id=base_tracker.experiment_id, run_name="zero_shot"):
    results = mlflow.evaluate(
            model_info.model_uri,
            temp_data.iloc[:20],
            targets="summary",
            model_type="text-summarization",
        )

    zeroshot_results = results.tables['eval_results_table'][['summary', 'candidates']]
    zeroshot_bleu_score = eval_fn(zeroshot_results.candidates, zeroshot_results.summary, {})
    mlflow.log_metrics(zeroshot_bleu_score)
    mlflow.end_run()

  string_columns = trimmed_df.columns[(df.applymap(type) == str).all(0)]
  data = data.applymap(_hash_array_like_element_as_bytes)
  data = data.applymap(_hash_array_like_element_as_bytes)
2024/01/26 19:05:38 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/01/26 19:05:38 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.


using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000


2024/01/26 19:05:57 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...


returning generated response as : ['#Person2#: Thank you for your help.', '#Person2#: Thank you.', '#Person2#: Thank you for your help.', "It's not good for you, but for the environment.", "It's not good for me, but for the environment.", "It's not good for you, but for the environment.", 'After the divorce, they will split up.', "It's early in the New Year.", "It's a good time to start a new year.", "Brian, I'm so happy you remember, Brian, you look beautiful today.", 'Brian, thanks for the party.', "Brian, I'm so happy you had a great party.", '#Person1#: Oh, I saw it!', "#Person1#: Oh, I'm sorry. I'm not sure.", "#Person1#: Well, I'm going to take a picture of the Olympic park.", '#Person1#: I am going to start a business and I want to start a business.', "#Person2#: I think I 'll just stick to my old job and save myself all the hassle of trying to start a business!", '#Person1#: I am going to start a business! I am going to write up a business plan! I am going to write up a busines

Using default facebook/roberta-hate-speech-dynabench-r4-target checkpoint
2024/01/26 19:06:04 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: token_count
2024/01/26 19:06:04 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: toxicity
2024/01/26 19:06:06 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: flesch_kincaid_grade_level
2024/01/26 19:06:06 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ari_grade_level
2024/01/26 19:06:06 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rouge1
2024/01/26 19:06:06 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rouge2
2024/01/26 19:06:06 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rougeL
2024/01/26 19:06:06 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rougeLsum


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

comput bleu pred: ['#Person2#: Thank you for your help.', '#Person2#: Thank you.', '#Person2#: Thank you for your help.', "It's not good for you, but for the environment.", "It's not good for me, but for the environment.", "It's not good for you, but for the environment.", 'After the divorce, they will split up.', "It's early in the New Year.", "It's a good time to start a new year.", "Brian, I'm so happy you remember, Brian, you look beautiful today.", 'Brian, thanks for the party.', "Brian, I'm so happy you had a great party.", '#Person1#: Oh, I saw it!', "#Person1#: Oh, I'm sorry. I'm not sure.", "#Person1#: Well, I'm going to take a picture of the Olympic park.", '#Person1#: I am going to start a business and I want to start a business.', "#Person2#: I think I 'll just stick to my old job and save myself all the hassle of trying to start a business!", '#Person1#: I am going to start a business! I am going to write up a business plan! I am going to write up a business plan! I am goi

In [19]:
results.tables['eval_results_table']

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Unnamed: 0,id,dialogue,topic,prompt,summary,candidates,token_count,toxicity/v1/score,rouge1/v1/score,rouge2/v1/score,rougeL/v1/score,rougeLsum/v1/score
0,test_0_1,"#Person1#: Ms. Dawson, I need you to take a di...",communication method,### Instruction: For given below conversation...,Ms. Dawson helps #Person1# to write a memo to ...,#Person2#: Thank you for your help.,10,0.000284,0.0,0.0,0.0,0.0
1,test_0_2,"#Person1#: Ms. Dawson, I need you to take a di...",company policy,### Instruction: For given below conversation...,In order to prevent employees from wasting tim...,#Person2#: Thank you.,7,0.001147,0.0,0.0,0.0,0.0
2,test_0_3,"#Person1#: Ms. Dawson, I need you to take a di...",dictation,### Instruction: For given below conversation...,Ms. Dawson takes a dictation for #Person1# abo...,#Person2#: Thank you for your help.,10,0.000284,0.060606,0.0,0.060606,0.060606
3,test_1_1,#Person1#: You're finally here! What took so l...,public transportation,### Instruction: For given below conversation...,#Person2# arrives late because of traffic jam....,"It's not good for you, but for the environment.",12,0.000142,0.125,0.066667,0.125,0.125
4,test_1_2,#Person1#: You're finally here! What took so l...,transportation,### Instruction: For given below conversation...,#Person2# decides to follow #Person1#'s sugges...,"It's not good for me, but for the environment.",12,0.000145,0.068966,0.0,0.068966,0.068966
5,test_1_3,#Person1#: You're finally here! What took so l...,discuss transportation,### Instruction: For given below conversation...,#Person2# complains to #Person1# about the tra...,"It's not good for you, but for the environment.",12,0.000142,0.074074,0.0,0.074074,0.074074
6,test_2_1,"#Person1#: Kate, you never believe what's happ...",divorce,### Instruction: For given below conversation...,#Person1# tells Kate that Masha and Hero get d...,"After the divorce, they will split up.",9,0.000143,0.076923,0.0,0.076923,0.076923
7,test_2_2,"#Person1#: Kate, you never believe what's happ...",divorce,### Instruction: For given below conversation...,#Person1# tells Kate that Masha and Hero are g...,It's early in the New Year.,8,0.000244,0.0,0.0,0.0,0.0
8,test_2_3,"#Person1#: Kate, you never believe what's happ...",discuss divorce,### Instruction: For given below conversation...,#Person1# and Kate talk about the divorce betw...,It's a good time to start a new year.,11,0.000194,0.0,0.0,0.0,0.0
9,test_3_1,"#Person1#: Happy Birthday, this is for you, Br...",birthday party,### Instruction: For given below conversation...,#Person1# and Brian are at the birthday party ...,"Brian, I'm so happy you remember, Brian, you l...",16,0.000137,0.133333,0.0,0.133333,0.133333


## FineTuning

### Prompt Fine Tuning

#### Zero Shot

In [20]:
fine_tune_tracker = mlflow.set_experiment(experiment_name="summarization_prompt_fine_tune")

2024/01/26 19:09:16 INFO mlflow.tracking.fluent: Experiment with name 'summarization_prompt_fine_tune' does not exist. Creating a new experiment.


In [21]:
def tuned_prompt(df):
    """
    """
    INSTRUCTION = "Given a conversation between two persons. Generate summary of the conversation that includes key points from the conversation which helps reader to understand the gist of conversation"
    SAMPLE_RESPONSE = "### SUMMARY:\n"
    return   INSTRUCTION + df.dialogue + "\n" + SAMPLE_RESPONSE + df.summary

In [22]:
temp_data = DATA['test']
temp_data['prompt'] = tuned_prompt(temp_data)

In [23]:

with mlflow.start_run(experiment_id=fine_tune_tracker.experiment_id, run_name="zero_shot_prompt"):
    
    results = mlflow.evaluate(
            model_info.model_uri,
            temp_data.iloc[:20],
            targets="summary",  # specify which column corresponds to the expected output
            model_type="text-summarization",  # model type indicates which metrics are relevant for this task
            # extra_metrics=[bleu_score_metric]
        )

    mlflow.log_table(results.tables['eval_results_table'][[
        "dialogue", "summary", "candidates"
    ]],artifact_file="zero_shot_results.json")

    zeroshot_results = results.tables['eval_results_table'][['summary', 'candidates']]
    zeroshot_bleu_score = eval_fn(zeroshot_results.candidates, zeroshot_results.summary, {})
    mlflow.log_metrics(zeroshot_bleu_score)
    mlflow.end_run()
    

  string_columns = trimmed_df.columns[(df.applymap(type) == str).all(0)]
  data = data.applymap(_hash_array_like_element_as_bytes)
  data = data.applymap(_hash_array_like_element_as_bytes)
2024/01/26 19:09:31 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/01/26 19:09:31 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.


using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000


2024/01/26 19:09:36 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2024/01/26 19:09:36 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: token_count
2024/01/26 19:09:36 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: toxicity


returning generated response as : ['#Person2#: I will get this memo sent to all employees before 4 pm.', '#Person2#: I will be sending the memo to all employees before 4 pm.', '#Person2#: Thank you for your information.', "#Person1#: I'm not going to drive to work.", "It's not good for me, but for the environment.", "#Person1#: I'm not going to drive to work, but I'm going to be a little more relaxed.", 'The New Year is coming.', "It's not the first time that the couple are divorced.", 'The New Year is over.', '#Person1#: Happy Birthday, Brian.', 'Brian, thank you for coming in.', '#Person1#: Happy Birthday, Brian.', '#Person1#: I think it is a good sign.', '#Person1#: I think it is a good sign for foreign visitors.', "#Person1#: I think it's a good sign.", "#Person2#: I think I 'll just stick to my old job and save myself all the hassle of trying to start a business!", "#Person2#: I think I 'll be able to start a business.", "#Person1#: I think I 'll just stick to my old job and save 

2024/01/26 19:09:38 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: flesch_kincaid_grade_level
2024/01/26 19:09:38 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ari_grade_level
2024/01/26 19:09:38 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rouge1
2024/01/26 19:09:38 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rouge2
2024/01/26 19:09:38 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rougeL
2024/01/26 19:09:38 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rougeLsum


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

comput bleu pred: ['#Person2#: I will get this memo sent to all employees before 4 pm.', '#Person2#: I will be sending the memo to all employees before 4 pm.', '#Person2#: Thank you for your information.', "#Person1#: I'm not going to drive to work.", "It's not good for me, but for the environment.", "#Person1#: I'm not going to drive to work, but I'm going to be a little more relaxed.", 'The New Year is coming.', "It's not the first time that the couple are divorced.", 'The New Year is over.', '#Person1#: Happy Birthday, Brian.', 'Brian, thank you for coming in.', '#Person1#: Happy Birthday, Brian.', '#Person1#: I think it is a good sign.', '#Person1#: I think it is a good sign for foreign visitors.', "#Person1#: I think it's a good sign.", "#Person2#: I think I 'll just stick to my old job and save myself all the hassle of trying to start a business!", "#Person2#: I think I 'll be able to start a business.", "#Person1#: I think I 'll just stick to my old job and save myself all the h

#### One Shot Prompt

In [26]:
def one_shot_prompt(df):
    """
    """
    print(f"predicting with one shot tuned prompt")
    INSTRUCTION = "Below is the conversation between two persons. Generate summary of the conversation,"
    INSTRUCTION2 = "Below is summary of another conversation, Generate the summary of the conversation "
    SAMPLE_RESPONSE = "### SUMMARY:\n"
    EXAMPLE_CONV = DATA["test"].iloc[5].dialogue
    EXAMPLE_SUMM = DATA["test"].iloc[5].summary
    return   INSTRUCTION + "\n" + EXAMPLE_CONV + "\n" + SAMPLE_RESPONSE +  "\n\n"  + EXAMPLE_SUMM + "\n\n" + INSTRUCTION2 + "\n\n" + df.summary + "\n\n"  + SAMPLE_RESPONSE


In [27]:
temp_data = DATA['test']
temp_data['prompt'] = one_shot_prompt(temp_data)

predicting with one shot tuned prompt


In [28]:

with mlflow.start_run(experiment_id=fine_tune_tracker.experiment_id, run_name="one_shot_prompt"):
    
    results = mlflow.evaluate(
            model_info.model_uri,
            temp_data.iloc[:20],
            targets="summary",  # specify which column corresponds to the expected output
            model_type="text-summarization",  # model type indicates which metrics are relevant for this task
        )

    mlflow.log_table(results.tables['eval_results_table'][[
        "dialogue", "summary", "candidates"
    ]],artifact_file="one_shot_results.json")

    oneshot_results = results.tables['eval_results_table'][['summary', 'candidates']]
    oneshot_bleu_score = eval_fn(oneshot_results.candidates, oneshot_results.summary, {})
    mlflow.log_metrics(zeroshot_bleu_score)
    
    mlflow.end_run()

  string_columns = trimmed_df.columns[(df.applymap(type) == str).all(0)]
  data = data.applymap(_hash_array_like_element_as_bytes)
  data = data.applymap(_hash_array_like_element_as_bytes)
2024/01/26 19:10:16 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/01/26 19:10:16 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.


using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000
using temperature 0.2 and max_tokens 1000


2024/01/26 19:10:40 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2024/01/26 19:10:40 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: token_count
2024/01/26 19:10:40 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: toxicity


returning generated response as : ['#Person2#: Thank you for your help.', '#Person2#: Thank you for your time.', '#Person2#: I will send you a copy of the memo.', "It's not good for me, but for the environment.", "It's not good for you or for the environment.", "It's not good for me, but for the environment.", 'The New Year is coming.', 'The New Year is approaching.', 'The New Year is coming.', "Brian, I'm so happy you had a wonderful party.", "Brian, I'm so happy you have a good time.", 'Brian, thanks for coming in and enjoying the party.', "#Person1#: I think it is a good sign, but I don't think it is a good sign.", '#Person1#: Oh, I thought it would be great!', '#Person1#: I think it is a great sign.', '#Person1#: I am going to start a business and I am going to start a business.', '#Person1#: I am not going to start a business. I am going to start a business.', "#Person2#: I think I 'll just stick to my old job and save myself all the hassle of trying to start a business!", "#Perso

2024/01/26 19:10:42 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: flesch_kincaid_grade_level
2024/01/26 19:10:42 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ari_grade_level
2024/01/26 19:10:42 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rouge1
2024/01/26 19:10:42 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rouge2
2024/01/26 19:10:42 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rougeL
2024/01/26 19:10:42 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rougeLsum


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

comput bleu pred: ['#Person2#: Thank you for your help.', '#Person2#: Thank you for your time.', '#Person2#: I will send you a copy of the memo.', "It's not good for me, but for the environment.", "It's not good for you or for the environment.", "It's not good for me, but for the environment.", 'The New Year is coming.', 'The New Year is approaching.', 'The New Year is coming.', "Brian, I'm so happy you had a wonderful party.", "Brian, I'm so happy you have a good time.", 'Brian, thanks for coming in and enjoying the party.', "#Person1#: I think it is a good sign, but I don't think it is a good sign.", '#Person1#: Oh, I thought it would be great!', '#Person1#: I think it is a great sign.', '#Person1#: I am going to start a business and I am going to start a business.', '#Person1#: I am not going to start a business. I am going to start a business.', "#Person2#: I think I 'll just stick to my old job and save myself all the hassle of trying to start a business!", "#Person1#: I'm not a b

In [29]:
results.tables['eval_results_table']

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Unnamed: 0,id,dialogue,topic,prompt,summary,candidates,token_count,toxicity/v1/score,rouge1/v1/score,rouge2/v1/score,rougeL/v1/score,rougeLsum/v1/score
0,test_0_1,"#Person1#: Ms. Dawson, I need you to take a di...",communication method,Below is the conversation between two persons....,Ms. Dawson helps #Person1# to write a memo to ...,#Person2#: Thank you for your help.,10,0.000284,0.0,0.0,0.0,0.0
1,test_0_2,"#Person1#: Ms. Dawson, I need you to take a di...",company policy,Below is the conversation between two persons....,In order to prevent employees from wasting tim...,#Person2#: Thank you for your time.,10,0.000304,0.047619,0.0,0.047619,0.047619
2,test_0_3,"#Person1#: Ms. Dawson, I need you to take a di...",dictation,Below is the conversation between two persons....,Ms. Dawson takes a dictation for #Person1# abo...,#Person2#: I will send you a copy of the memo.,14,0.00151,0.162162,0.0,0.162162,0.162162
3,test_1_1,#Person1#: You're finally here! What took so l...,public transportation,Below is the conversation between two persons....,#Person2# arrives late because of traffic jam....,"It's not good for me, but for the environment.",12,0.000145,0.125,0.066667,0.125,0.125
4,test_1_2,#Person1#: You're finally here! What took so l...,transportation,Below is the conversation between two persons....,#Person2# decides to follow #Person1#'s sugges...,It's not good for you or for the environment.,11,0.000139,0.068966,0.0,0.068966,0.068966
5,test_1_3,#Person1#: You're finally here! What took so l...,discuss transportation,Below is the conversation between two persons....,#Person2# complains to #Person1# about the tra...,"It's not good for me, but for the environment.",12,0.000145,0.074074,0.0,0.074074,0.074074
6,test_2_1,"#Person1#: Kate, you never believe what's happ...",divorce,Below is the conversation between two persons....,#Person1# tells Kate that Masha and Hero get d...,The New Year is coming.,6,0.00016,0.083333,0.0,0.083333,0.083333
7,test_2_2,"#Person1#: Kate, you never believe what's happ...",divorce,Below is the conversation between two persons....,#Person1# tells Kate that Masha and Hero are g...,The New Year is approaching.,6,0.000168,0.0,0.0,0.0,0.0
8,test_2_3,"#Person1#: Kate, you never believe what's happ...",discuss divorce,Below is the conversation between two persons....,#Person1# and Kate talk about the divorce betw...,The New Year is coming.,6,0.00016,0.076923,0.0,0.076923,0.076923
9,test_3_1,"#Person1#: Happy Birthday, this is for you, Br...",birthday party,Below is the conversation between two persons....,#Person1# and Brian are at the birthday party ...,"Brian, I'm so happy you had a wonderful party.",12,0.00014,0.142857,0.0,0.142857,0.142857


##### One Shot Topic Stratagy(Please Ignore)

`Given an dialouge, lets try to give one example summary of same topic and see if we yield better results `

In [35]:
def one_shot_stratagy_prompt(df):
    """
    """
    topic = "shopping"
    print(f"predicting with one shot tuned prompt")
    INSTRUCTION = "### INSTRUCTION Below is the conversation between two persons. Generate summary of the conversation,"
    INSTRUCTION2 = "### INSTRUCTION2: Below is summary of another conversation, Generate the summary of the conversation "
    SAMPLE_RESPONSE = "### SUMMARY:\n"
    if topic in list(DATA['train'].topic.values):
        print(f"trying to {topic} prompt from training data")
        EXAMPLE_CONV = DATA['train'][DATA['train']['topic'] == topic].iloc[0].dialogue
        EXAMPLE_SUMM = DATA['train'][DATA['train']['topic'] == topic].iloc[0].summary
    else:
        EXAMPLE_CONV = DATA["train"].iloc[0].dialogue
        EXAMPLE_SUMM = DATA["train"].iloc[0].summary
    
    return   INSTRUCTION + "\n" + EXAMPLE_CONV + "\n" + SAMPLE_RESPONSE +  "\n\n"  + EXAMPLE_SUMM + "\n\n" + INSTRUCTION2 + "\n\n" + df.dialogue + "\n\n" + df.summary + "\n\n"  + SAMPLE_RESPONSE


In [36]:
temp_data = DATA['test']
temp_data = temp_data[temp_data.topic == 'shopping'].reset_index()
temp_data.drop(['index'], axis=1,inplace=True)
temp_data['prompt'] = one_shot_stratagy_prompt(temp_data)

predicting with one shot tuned prompt
trying to shopping prompt from training data


In [82]:
# fine_tuned_model._build_tuned_prompt = one_shot_stratagy_prompt

# # Start an MLflow run context and log the MPT-7B model wrapper along with the param-included signature to
# # allow for overriding parameters at inference time
# with mlflow.start_run(experiment_id=fine_tune_tracker.experiment_id, run_name="model"):
#     model_info = mlflow.pyfunc.log_model(
#         "model",
#         python_model=fine_tuned_model,
#         # NOTE: the artifacts dictionary mapping is critical! This dict is used by the load_context() method in our MPT() class.
#         artifacts={"snapshot": snapshot_location},
#         pip_requirements=[
#             f"torch=={torch_version}",
#             f"transformers=={transformers.__version__}"],
#         input_example=input_example,
#         signature=signature
#     )
    

In [37]:
temp_data.prompt.iloc[0]

"### INSTRUCTION Below is the conversation between two persons. Generate summary of the conversation,\n#Person1#: Ten sheets of rice paper, 25 brushes, two boxes of oil color and two boxes of water color. All these come up to $ 35. 50, sir.\n#Person2#: Ok, here is $ 50. Oh, can you make out an invoice for me?\n#Person1#: Sure, just a minute. Are you an artist, sir?\n#Person2#: No, I am a teacher. I teach art.\n#Person1#: That must be a very interesting job.\n#Person2#: It is. You must be new here. I do my shopping here regularly, once a week.\n#Person1#: Do you? Nice to meet you! And here is the invoice and your change.\n#Person2#: Thank you. Nice to meet you, too.\n### SUMMARY:\n\n\n#Person2# buys some drawing tools and asks for an invoice with #Person1#'s assistance.\n\n### INSTRUCTION2: Below is summary of another conversation, Generate the summary of the conversation \n\n#Person1#: Can I help you?\n#Person2#: I'd like to buy a new mobile phone please.\n#Person1#: Ok, would you like

In [84]:
temp_data

Unnamed: 0,id,dialogue,summary,topic,prompt
0,test_19_2,#Person1#: Can I help you?\n#Person2#: I'd lik...,#Person2# wants to buy a new mobile phone from...,shopping,### INSTRUCTION Below is the conversation betw...
1,test_19_3,#Person1#: Can I help you?\n#Person2#: I'd lik...,#Person2# wants to buy a new mobile phone from...,shopping,### INSTRUCTION Below is the conversation betw...
2,test_72_2,#Person1#: We need to do a group report tomorr...,#Person1# and #Person2# make a shopping list a...,shopping,### INSTRUCTION Below is the conversation betw...
3,test_164_1,#Person1#: Does it look like a good fit?\n#Per...,#Person1# buys some nice clothes by credit car...,shopping,### INSTRUCTION Below is the conversation betw...
4,test_233_1,#Person1#: What do you think of my new suit?\n...,#Person1# bought a new suit with $ 150 and #Pe...,shopping,### INSTRUCTION Below is the conversation betw...
5,test_249_2,#Person1#: How can I help you?\n#Person2#: wel...,#Person2# goes to shop for clothes and is told...,shopping,### INSTRUCTION Below is the conversation betw...
6,test_249_3,#Person1#: How can I help you?\n#Person2#: wel...,#Person2# buys summer clothes with 20% off at ...,shopping,### INSTRUCTION Below is the conversation betw...
7,test_268_2,"#Person1#: Can I help you, sir?\n#Person2#: Ye...",#Person2# is surprised at a low price of produ...,shopping,### INSTRUCTION Below is the conversation betw...
8,test_268_3,"#Person1#: Can I help you, sir?\n#Person2#: Ye...",#Person2# buys gifts for his children and wife...,shopping,### INSTRUCTION Below is the conversation betw...
9,test_283_2,#Person1#: How do you like this brown dress? I...,#Person1# gives #Person2# a few suggestions on...,shopping,### INSTRUCTION Below is the conversation betw...


In [38]:
with mlflow.start_run(experiment_id=fine_tune_tracker.experiment_id, run_name="one_shot_prompt_similar_one_shot_prompt_example"):
    
    one_results = mlflow.evaluate(
            model_info.model_uri,
            temp_data.iloc[:20],
            targets="summary",  # specify which column corresponds to the expected output
            model_type="text-summarization",  # model type indicates which metrics are relevant for this task
        )

    mlflow.log_table(one_results.tables['eval_results_table'][[
        "dialogue", "summary", "candidates"
    ]],artifact_file="one_shot_results_with_examples.json")

    oneshot_results = one_results.tables['eval_results_table'][['summary', 'candidates']]
    oneshot_bleu_score = eval_fn(oneshot_results.candidates, oneshot_results.summary, {})
    mlflow.log_metrics(zeroshot_bleu_score)

    mlflow.end_run()

  string_columns = trimmed_df.columns[(df.applymap(type) == str).all(0)]
  data = data.applymap(_hash_array_like_element_as_bytes)
  data = data.applymap(_hash_array_like_element_as_bytes)
2024/01/26 11:18:04 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/01/26 11:18:04 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2024/01/26 11:18:08 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2024/01/26 11:18:08 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: token_count
2024/01/26 11:18:08 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: toxicity


returning generated response as : ["#Person2#: I'd like a phone with a camera and MP3 player.", "#Person2#: I'd like a phone with a camera and MP3 player.", "You're not sighted.", "#Person1#: I'm sorry, I'm not sure.", "I don't think it's a good bargain.", 'You are welcome.', 'You are welcome.', "#Person1#: Okay, I'll take it.", "#Person1#: Okay, I'll take it.", 'What color would you like?', 'What color would you like?', "#Person1#: I'm sorry. I forgot to show it to you.", "#Person1#: I'm sorry. I forgot to show it to you.", "#Person2#: I'm going to buy a pair of shoes.", "#Person2#: I'll charge you.", "#Person2#: I'll charge you for that.", "#Person1#: I'm sorry, but I'm not interested in the traditional Chinese arts and crafts.", "#Person1#: I'm sorry.", "#Person1#: I'm sorry. I'm not interested in the sandalwood fan."]


2024/01/26 11:18:09 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: flesch_kincaid_grade_level
2024/01/26 11:18:09 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ari_grade_level
2024/01/26 11:18:09 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rouge1
2024/01/26 11:18:09 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rouge2
2024/01/26 11:18:09 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rougeL
2024/01/26 11:18:09 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: rougeLsum


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

comput bleu pred: ["#Person2#: I'd like a phone with a camera and MP3 player.", "#Person2#: I'd like a phone with a camera and MP3 player.", "You're not sighted.", "#Person1#: I'm sorry, I'm not sure.", "I don't think it's a good bargain.", 'You are welcome.', 'You are welcome.', "#Person1#: Okay, I'll take it.", "#Person1#: Okay, I'll take it.", 'What color would you like?', 'What color would you like?', "#Person1#: I'm sorry. I forgot to show it to you.", "#Person1#: I'm sorry. I forgot to show it to you.", "#Person2#: I'm going to buy a pair of shoes.", "#Person2#: I'll charge you.", "#Person2#: I'll charge you for that.", "#Person1#: I'm sorry, but I'm not interested in the traditional Chinese arts and crafts.", "#Person1#: I'm sorry.", "#Person1#: I'm sorry. I'm not interested in the sandalwood fan."], true  ; [['#Person2# wants to buy a new mobile phone from #Person1#.'], ['#Person2# wants to buy a new mobile phone from #Person1#.'], ['#Person1# and #Person2# make a shopping list

In [39]:
results.tables['eval_results_table']

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Unnamed: 0,id,dialogue,topic,prompt,summary,candidates,token_count,toxicity/v1/score,rouge1/v1/score,rouge2/v1/score,rougeL/v1/score,rougeLsum/v1/score
0,test_0_1,"#Person1#: Ms. Dawson, I need you to take a di...",communication method,Below is the conversation between two persons....,Ms. Dawson helps #Person1# to write a memo to ...,#Person2#: Thank you for your help.,10,0.000284,0.0,0.0,0.0,0.0
1,test_0_2,"#Person1#: Ms. Dawson, I need you to take a di...",company policy,Below is the conversation between two persons....,In order to prevent employees from wasting tim...,#Person2#: Thank you for your help.,10,0.000284,0.0,0.0,0.0,0.0
2,test_0_3,"#Person1#: Ms. Dawson, I need you to take a di...",dictation,Below is the conversation between two persons....,Ms. Dawson takes a dictation for #Person1# abo...,#Person2#: Thank you for your help.,10,0.000284,0.060606,0.0,0.060606,0.060606
3,test_1_1,#Person1#: You're finally here! What took so l...,public transportation,Below is the conversation between two persons....,#Person2# arrives late because of traffic jam....,"It's not good for me, but for the environment.",12,0.000145,0.125,0.066667,0.125,0.125
4,test_1_2,#Person1#: You're finally here! What took so l...,transportation,Below is the conversation between two persons....,#Person2# decides to follow #Person1#'s sugges...,"It's not good for me, but for the environment.",12,0.000145,0.068966,0.0,0.068966,0.068966
5,test_1_3,#Person1#: You're finally here! What took so l...,discuss transportation,Below is the conversation between two persons....,#Person2# complains to #Person1# about the tra...,"It's not good for you, but for the environment.",12,0.000142,0.074074,0.0,0.074074,0.074074
6,test_2_1,"#Person1#: Kate, you never believe what's happ...",divorce,Below is the conversation between two persons....,#Person1# tells Kate that Masha and Hero get d...,The New Year is coming.,6,0.00016,0.083333,0.0,0.083333,0.083333
7,test_2_2,"#Person1#: Kate, you never believe what's happ...",divorce,Below is the conversation between two persons....,#Person1# tells Kate that Masha and Hero are g...,The New Year is coming.,6,0.00016,0.0,0.0,0.0,0.0
8,test_2_3,"#Person1#: Kate, you never believe what's happ...",discuss divorce,Below is the conversation between two persons....,#Person1# and Kate talk about the divorce betw...,It's early in the New Year.,8,0.000244,0.071429,0.0,0.071429,0.071429
9,test_3_1,"#Person1#: Happy Birthday, this is for you, Br...",birthday party,Below is the conversation between two persons....,#Person1# and Brian are at the birthday party ...,"Brian, I'm so happy you had a great party.",12,0.000141,0.214286,0.0,0.142857,0.142857


In [58]:
DATA['train'].topic.value_counts()

topic
shopping                 174
job interview            161
daily casual talk        125
phone call                89
order food                79
                        ... 
eat ice creams             1
marriage predicaments      1
ways of commuting          1
food comment               1
baggage pack               1
Name: count, Length: 7434, dtype: int64

id                                                  train_166
dialogue    #Person1#: Ten sheets of rice paper, 25 brushe...
summary     #Person2# buys some drawing tools and asks for...
topic                                                shopping
Name: 166, dtype: object

In [43]:
from transformers import BitsAndBytesConfig, Trainer
from datasets import load_dataset,  Dataset, DatasetDict, load_metric

from torch import float16

from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments
from trl import SFTTrainer

In [9]:
# https://huggingface.co/docs/transformers/v4.20.1/en/main_classes/callback#transformers.integrations.MLflowCallback

os.environ["MLFLOW_EXPERIMENT_NAME"] = "lora_summarization_model"
os.environ["MLFLOW_FLATTEN_PARAMS"] = "1"

In [175]:
DATA.keys()

dict_keys(['hiddentest_dialogue', 'hiddentest_topic', 'test', 'train', 'validation'])

In [184]:
def clean_summary(summary):
    summary = summary.strip()
    summary = summary.replace("#'s", "s")
    summary = summary.replace("#", "")
    return summary
    
for key, value in DATA.items():
    if key in ['train', 'test', 'validation']:
        value.summary = value.summary.apply(lambda x:clean_summary(x))
    DATA[key] = value
    

### Lora Fine Tuning

In [14]:
import transformers
from transformers import BitsAndBytesConfig
from datasets import load_dataset,  Dataset, DatasetDict, load_metric
import evaluate
from torch import float16
import mlflow
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments, Seq2SeqTrainingArguments
from trl import SFTTrainer


lora_tracker = mlflow.set_experiment(experiment_name="lora_summarization")

In [15]:
from transformers import BitsAndBytesConfig
from torch import float16
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize


# Our 4-bit configuration to load the LLM with less GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,  # 4-bit quantization
    bnb_4bit_quant_type='nf4',  # Normalized float 4
    bnb_4bit_use_double_quant=True,  # Second quantization after the first
    bnb_4bit_compute_dtype=float16  # Computation type
)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [16]:
# Below is to load directly from HuggingFace or to load from cache directory
flan_model = transformers.AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME, device_map="auto", cache_dir="../mlflow_practice/models/flan_t5/")
tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME,cache_dir="../mlflow_practice/models/flan_t5/")


In [17]:
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

In [18]:
peft_config = LoraConfig(
    lora_alpha=32,
    lora_dropout = 0.05,
    r=32,
    bias="none",
    task_type="CAUSAL_LM",
     target_modules=[
        "q",
        "v",
    ],
)


In [19]:
model = prepare_model_for_kbit_training(flan_model)
lora_model = get_peft_model(model, peft_config)

In [20]:
TrainingArguments(
    num_train_epochs=10,
    output_dir="flan_summary",
    per_device_train_batch_size=8,
    per_device_eval_batch_size = 16,
    warmup_steps=500,
    logging_steps=2,
    save_steps=5,
    save_strategy="epoch",
    evaluation_strategy="epoch",
    # eval_steps=1,
    learning_rate=2e-4,
    bf16=True,
    lr_scheduler_type="constant",    
    # eval_accumulation_steps = 1,
)


TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_l

In [21]:
training_args = TrainingArguments(
    num_train_epochs=10,
    output_dir="flan_summary",
    per_device_train_batch_size=6,
    per_device_eval_batch_size = 2,
    warmup_steps=500,
    logging_steps=2,
    save_steps=5,
    save_strategy="epoch",
    evaluation_strategy="epoch",
    # eval_steps=1,
    learning_rate=2e-4,
    bf16=True,
    lr_scheduler_type="constant",    
    eval_accumulation_steps = 1,
)

# training_args = Seq2SeqTrainingArguments(
#     output_dir="flan_summary",
#     max_steps=100,
#     per_device_train_batch_size=2,
#     per_device_eval_batch_size = 1,
#     warmup_steps=0.03,
#     logging_steps=2,
#     save_steps=5,
#     save_strategy="epoch",
#     evaluation_strategy="steps",
#     eval_steps=4,
#     learning_rate=2e-4,
#     bf16=True,
#     lr_scheduler_type="constant",    
#     eval_accumulation_steps = 20,
#     predict_with_generate = True
# )

# training_args.set_training(batch_size=8)
training_args

TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=1,
eval_delay=0,
eval_steps=None,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_leng

In [22]:
def formatting_func(data_point):
    print(type(data_point))
    inputs =  data_point['prompt_data']
    print(f"returning inputs as : {inputs} {type(inputs)} {len(inputs)}")
    return inputs

In [23]:
def prompt_for_finetuning(data_df):
    """
    """
    
    INSTRUCTION = "### INSTRUCTION: \nBelow is the conversation between two persons. Generate summary of the conversation\n\n"
    SAMPLE_RESPONSE = "### SUMMARY:\n"
    END_OF_SUMMARY = " <END>"
    data_df["prompt_data"] = INSTRUCTION + data_df["dialogue"] +  SAMPLE_RESPONSE +  data_df["summary"] + END_OF_SUMMARY
    return data_df

In [24]:
train_data = Dataset.from_pandas(prompt_for_finetuning(DATA["train"]))
test_data = Dataset.from_pandas(prompt_for_finetuning(DATA["test"].iloc[:20]))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_df["prompt_data"] = INSTRUCTION + data_df["dialogue"] +  SAMPLE_RESPONSE +  data_df["summary"] + END_OF_SUMMARY


In [25]:
from functools import partial

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    conf = model.config
    max_length = None
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(model.config, length_setting, None)
        if max_length:
            print(f"Found max lenth: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length


def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["prompt_data"],
        max_length=max_length,
        truncation=True,
        return_tensors="pt",
        padding=True
    )

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def preprocess_dataset(tokenizer, max_length: int,seed, dataset):
    """Format & tokenize it so it is ready for training
    :param tokenizer (AutoTokenizer): Model Tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    """
    
    # Apply preprocessing to each batch of the dataset & and remove 'instruction', 'context', 'response', 'category' fields
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=['id', 'topic', 'dialogue', 'summary'],
    )
    # Filter out samples that have input_ids exceeding max_length
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) <= max_length)
    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)

    return dataset

In [26]:
max_length = get_max_length(flan_model)
print(max_length)



train_dataset = preprocess_dataset(tokenizer, max_length,seed, train_data)
eval_dataset = preprocess_dataset(tokenizer, max_length,seed, test_data)

Found max lenth: 512
512


Map:   0%|          | 0/12460 [00:00<?, ? examples/s]

Filter:   0%|          | 0/12460 [00:00<?, ? examples/s]

Map:   0%|          | 0/20 [00:00<?, ? examples/s]

Filter:   0%|          | 0/20 [00:00<?, ? examples/s]

In [27]:

logits_labels = {}

bleu_metric = evaluate.load("bleu")
rouge_metric = load_metric("rouge")

def compute_bleu(y_pred, y_true):
    print('comput bleu')
    bleu_metric.add_batch(predictions=y_pred, references=y_true)
    report = bleu_metric.compute()
    return report
    
def compute_rouge(predictions, actual):
    
    # Compute ROUGE scores
    result = rouge_metric.compute(
        predictions=predictions, references=actual, use_stemmer=True
    )
    # Extract the median scores
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
    return {k: round(v, 4) for k, v in result.items()}
    

def compute_metrics(eval_pred):
    print(f"computing metrics")
    logits, labels = eval_pred
    logits_labels['logits'] = logits
    logits_labels['labels'] = labels
    
    # predictions = np.argmax(logits, axis=-1)
    predictions = np.argmax(logits[0], -1)

    # Decode generated summaries into text
    decoded_preds = tokenizer.batch_decode(predictions)
    # Replace -100 in the labels as we can't decode them
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    # Decode reference summaries into text
    decoded_labels = tokenizer.batch_decode(labels)
    # ROUGE expects a newline after each sentence
    decoded_preds = ["\n".join(sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = [["\n".join(sent_tokenize(label.strip()))] for label in decoded_labels]

    logits_labels["decoded_preds"] = decoded_preds
    logits_labels["decoded_labels"] = decoded_labels
    
    bleu_score = compute_bleu(decoded_preds, decoded_labels)
    
    rouge_score = compute_rouge(decoded_preds, decoded_labels)
    scores =  {**bleu_score,
               **rouge_score}
    logits_labels["score"] = scores
    print(f"===========actual {decoded_labels} \n\n =============Pred {decoded_preds}\n\n\n\n")
    # print(f"returning : {scores}")
    return scores

  rouge_metric = load_metric("rouge")


In [28]:
max_seq_len = 500
trainer = SFTTrainer(
    args=training_args,
    model = lora_model,
    peft_config=peft_config,
    max_seq_length=max_seq_len,
    tokenizer=tokenizer,
    # packing=True,
    formatting_func= formatting_func,
    train_dataset=train_dataset,
    eval_dataset = eval_dataset,
    compute_metrics=compute_metrics
)


Map:   0%|          | 0/12460 [00:00<?, ? examples/s]

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



<class 'datasets.formatting.formatting.LazyBatch'>


IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



Map:   0%|          | 0/20 [00:00<?, ? examples/s]

<class 'datasets.formatting.formatting.LazyBatch'>


In [29]:
with mlflow.start_run(experiment_id=lora_tracker.experiment_id, run_name="lora_finetuning"):
    trainer.train()
    mlflow.end_run()

You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Bleu,Precisions,Brevity Penalty,Length Ratio,Translation Length,Reference Length,Rouge1,Rouge2,Rougel,Rougelsum
1,0.0606,0.003671,0.989382,"[0.9954476479514416, 0.993916349809886, 0.9923780487804879, 0.9908326967150497]",0.996214,0.996221,1318,1323,93.1358,85.2093,93.1358,88.7756
2,0.0473,0.001006,0.993178,"[0.9984802431610942, 0.9984767707539984, 0.9984732824427481, 0.9984697781178271]",0.994695,0.994709,1316,1323,93.322,85.4892,93.322,88.9618
3,0.0314,0.000502,0.990137,"[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]",0.992413,0.992441,1313,1323,93.2723,85.4356,93.2723,88.9101
4,0.0169,0.000427,0.990137,"[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]",0.992413,0.992441,1313,1323,93.2723,85.4356,93.2723,88.9101
5,0.0308,0.000321,0.990137,"[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]",0.992413,0.992441,1313,1323,93.2723,85.4356,93.2723,88.9101
6,0.0149,0.000298,0.990137,"[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]",0.992413,0.992441,1313,1323,93.2723,85.4356,93.2723,88.9101
7,0.0172,0.000256,0.990137,"[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]",0.992413,0.992441,1313,1323,93.2723,85.4356,93.2723,88.9101
8,0.0278,0.000191,0.990137,"[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]",0.992413,0.992441,1313,1323,93.2723,85.4356,93.2723,88.9101
9,0.0132,0.000151,0.990137,"[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]",0.992413,0.992441,1313,1323,93.2723,85.4356,93.2723,88.9101
10,0.0121,0.000137,0.989376,"[0.9977134146341463, 0.9977081741787625, 0.9977029096477795, 0.9976976208749041]",0.991651,0.991686,1312,1323,93.1788,85.3418,93.1788,88.8166


computing metrics
comput bleu


Trainer is attempting to log a value of "[0.9954476479514416, 0.993916349809886, 0.9923780487804879, 0.9908326967150497]" of type <class 'list'> for key "eval_precisions" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
Trainer is attempting to log a value of "[0.9954476479514416, 0.993916349809886, 0.9923780487804879, 0.9908326967150497]" of type <class 'list'> for key "eval/precisions" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.







computing metrics
comput bleu


Trainer is attempting to log a value of "[0.9984802431610942, 0.9984767707539984, 0.9984732824427481, 0.9984697781178271]" of type <class 'list'> for key "eval_precisions" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
Trainer is attempting to log a value of "[0.9984802431610942, 0.9984767707539984, 0.9984732824427481, 0.9984697781178271]" of type <class 'list'> for key "eval/precisions" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.







computing metrics
comput bleu


Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval_precisions" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval/precisions" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.







computing metrics
comput bleu


Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval_precisions" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval/precisions" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.







computing metrics
comput bleu


Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval_precisions" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval/precisions" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.







computing metrics
comput bleu


Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval_precisions" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval/precisions" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.







computing metrics
comput bleu


Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval_precisions" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval/precisions" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.







computing metrics
comput bleu


Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval_precisions" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval/precisions" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.







computing metrics
comput bleu


Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval_precisions" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
Trainer is attempting to log a value of "[0.9977151561309977, 0.9977099236641221, 0.9977046671767407, 0.9976993865030674]" of type <class 'list'> for key "eval/precisions" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.







computing metrics
comput bleu


Trainer is attempting to log a value of "[0.9977134146341463, 0.9977081741787625, 0.9977029096477795, 0.9976976208749041]" of type <class 'list'> for key "eval_precisions" as a metric. MLflow's log_metric() only accepts float and int types so we dropped this attribute.
Trainer is attempting to log a value of "[0.9977134146341463, 0.9977081741787625, 0.9977029096477795, 0.9976976208749041]" of type <class 'list'> for key "eval/precisions" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.









## Saving Model

In [30]:
lora_model_path="lora_model2/"

trainer.model.save_pretrained(lora_model_path)
tokenizer.save_pretrained(lora_model_path)

('lora_model2/tokenizer_config.json',
 'lora_model2/special_tokens_map.json',
 'lora_model2/spiece.model',
 'lora_model2/added_tokens.json',
 'lora_model2/tokenizer.json')