# Model Inference and Evaluation on Validation Set. Inference on Test Set.

Evaluation on the validation set:
1. Run inference with the base model (Mistral-7B-Instruct-v0.3) and the fine-tuned model on the validation set.
2. Compute BLEU, ROUGE, and BERTScore for each model's responses against the human responses.

For the test set:
1. Run inference with the base model (Mistral-7B-Instruct-v0.3) and the fine-tuned model on the test set generated using GPT-4o (see notebook 'pqr').
2. Write the models' responses to a file.
3. In notebook 'xyz', use GPT-5 as an LLM-as-a-Judge to compare the base model and the fine-tuned model.


In [None]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U git+https://github.com/huggingface/trl.git
!pip install flash-attn --no-build-isolation
!pip install quanto
!pip install evaluate bert-score
!pip install rouge_score


In [None]:
# Mount GDrive - will prompt authentication
from google.colab import drive
drive.mount('/content/drive')

## Read the Hugging Face access token
with open("/content/drive/MyDrive/ColabNotebooks/ParentPalAI/data/hftoken.txt") as f:
    HF_TOKEN = f.read().strip()


BASE_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"


Mounted at /content/drive


In [None]:
import re
from datasets import load_dataset

# build prompt
# We modify the system instructions a bit
def build_prompt(prompt):
  return f'''<s>[INST] You are a friendly parenting companion who gives helpful advice like a fellow parent would. You sound warm and practical — not robotic or formal.
Keep your answers under 150 words.
Stay focused on the user’s question.
Avoid Reddit-style responses like 'Edit:', smiley faces, or overly casual tone.
User's Prompt: {prompt} [/INST]'''


# Clean input
def get_q_and_a(text):
  qasplit = text.split("[/INST]")
  q = qasplit[0]
  q = q.replace('[INST]', '').replace('<s>', '').strip()
  if len(qasplit)>1:
    a = qasplit[1]
    a = a.replace('</s>', '').strip()
  else:
    a = None
  return (q, a)

# Clean Output
def clean_text(text):
    # Cut anything after 'Source'
    text = re.split(r'\b(Source:)', text)[0].strip()
    # Remove the last incomplete sentence (if any)
    sentences = re.split(r'(?<=[.!?])\s+', text)
    if sentences and not text.strip().endswith(('.', '!', '?')):
        sentences = sentences[:-1]  # remove the last incomplete part
    return ' '.join(sentences)


In [None]:
# Load the validation set (100 samples) as a Hugging Face dataset.
# With a single data file, load_dataset puts every row in the 'train' split, hence split='train'.
validation_df = load_dataset('json', data_files='/content/drive/MyDrive/ColabNotebooks/ParentPalAI/data/reddit_dataset_val_100.jsonl', split='train')
print (f"Number of validation examples: {len(validation_df)}")


In [None]:
# Load the GPT-generated test set (100 samples) as a Hugging Face dataset.
# With a single data file, load_dataset puts every row in the 'train' split, hence split='train'.
test_df = load_dataset('json', data_files='/content/drive/MyDrive/ColabNotebooks/ParentPalAI/data/gpt_reddit_test_samples.jsonl', split='train')
print (f"Number of test examples: {len(test_df)}")


Number of test examples: 100


In [None]:
# Load Eval Metrics
import evaluate
import numpy as np
import pandas as pd


bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



# Load the Base Model and Evaluate It


**BLEU Score**

{'bleu': 0.006949776517672412, 'precisions': [0.16613720271314503, 0.014029618082618862, 0.0022713374683323143, 0.0004406451044328897], 'brevity_penalty': 1.0, 'length_ratio': 1.588949522510232, 'translation_length': 11647, 'reference_length': 7330}

**ROUGE SCORE**

{'rouge1': np.float64(0.17742945129434573), 'rouge2': np.float64(0.01612920672542411), 'rougeL': np.float64(0.09766495290266788), 'rougeLsum': np.float64(0.1039609198966609)}


**BERTScore**

- BERT avg precision: 0.8333736592531205
- BERT avg recall: 0.8439619988203049
- BERT avg f1: 0.8385502642393112




In [None]:
# LOAD THE BASE MODEL IN 4-BIT PRECISION WITH DOUBLE QUANTIZATION

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer

torch.backends.cuda.matmul.allow_tf32 = True
torch.set_float32_matmul_precision("high")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, # loads base model in 4-bit precision
    bnb_4bit_use_double_quant=True, # double quantization saves VRAM
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2", # FA2 is fastest on A100
    token=HF_TOKEN # login to hugging face
)





In [None]:
# load tokenizer - we will infer for 1 prompt at a time - so no padding needed
tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL_ID,
    add_bos_token=False,
    add_eos_token=False,
    token = HF_TOKEN
)
tokenizer.pad_token = tokenizer.eos_token



## Evaluating the Base Model on the Validation Set

In [None]:
# Run inference on all prompts in the validation set.
# And compute BLEU, ROUGE, BERTScore with human responses.
import time

predictions = []
references = []
ques_list = []
for i, ex in enumerate(validation_df):
  print (f"Running for example {i+1}")
  st = time.time()
  ques, human_answer = get_q_and_a(ex["text"])
  with torch.no_grad():
    inputs = tokenizer(build_prompt(ques), return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.15,
        eos_token_id=tokenizer.eos_token_id,
    )
    raw_output = tokenizer.decode(outputs[0], skip_special_tokens=False)
    _, ai_answer = get_q_and_a(raw_output)
    ai_answer = clean_text(ai_answer)
    predictions.append(ai_answer)
    references.append(human_answer)
    ques_list.append(ques)
    print (f"Time taken for example {i+1}: {(time.time()-st)/60.0}")



In [None]:
bleu_results = bleu.compute(predictions=predictions, references=references)
print (bleu_results)

rouge_results = rouge.compute(predictions=predictions, references=references)
print (rouge_results)

bert_results = bertscore.compute(predictions=predictions, references=references, lang='en')
print (bert_results)

print (f"BERT avg precision: {np.mean(bert_results['precision'])}")
print (f"BERT avg recall: {np.mean(bert_results['recall'])}")
print (f"BERT avg f1: {np.mean(bert_results['f1'])}")


In [None]:
# save predictions
validation_df_pd = validation_df.to_pandas()
validation_df_pd['user_prompt'] = ques_list
validation_df_pd['ai_answer'] = predictions
validation_df_pd['human_answer'] = references
validation_df_pd["bertscore_f1"] = bert_results["f1"]
validation_df_pd["bertscore_precision"] = bert_results["precision"]
validation_df_pd["bertscore_recall"] = bert_results["recall"]
validation_df_pd.to_json('/content/drive/MyDrive/ColabNotebooks/ParentPalAI/data/reddit_dataset_val_100_base_response.jsonl', orient='records', lines=True)


## Running Inference with the Base Model on the GPT-Generated Test Set

These responses are later evaluated with an LLM-as-a-Judge.

In [None]:
# Run inference on all prompts in the test set.
# No metrics are computed here; the responses are saved for the LLM-as-a-Judge comparison.
import time

predictions = []
ques_list = []
for i, ex in enumerate(test_df):
  print (f"Running for example {i+1}")
  st = time.time()
  ques, _ = get_q_and_a(ex["text"])
  with torch.no_grad():
    inputs = tokenizer(build_prompt(ques), return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.15,
        eos_token_id=tokenizer.eos_token_id,
    )
    raw_output = tokenizer.decode(outputs[0], skip_special_tokens=False)
    _, ai_answer = get_q_and_a(raw_output)
    ai_answer = clean_text(ai_answer)
    predictions.append(ai_answer)
    ques_list.append(ques)
    print (f"Time taken for example {i+1}: {(time.time()-st)/60.0}")



Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

Running for example 1
Time taken for example 1: 0.23644356727600097
Running for example 2
Time taken for example 2: 0.20345929861068726
Running for example 3
Time taken for example 3: 0.20603098074595133
Running for example 4
Time taken for example 4: 0.20357153018315632
Running for example 5
Time taken for example 5: 0.20414671500523884
Running for example 6
Time taken for example 6: 0.2051494280497233
Running for example 7
Time taken for example 7: 0.20295066038767498
Running for example 8
Time taken for example 8: 0.20422160625457764
Running for example 9
Time taken for example 9: 0.20366268555323283
Running for example 10
Time taken for example 10: 0.2033764918645223
Running for example 11
Time taken for example 11: 0.20396262009938557
Running for example 12
Time taken for example 12: 0.20346210797627767
Running for example 13
Time taken for example 13: 0.2047345240910848
Running for example 14
Time taken for example 14: 0.20469858249028525
Running for example 15
Time taken for example 15: 0.18796231349309286
Running for example 16
Time taken for example 16: 0.2048474351565043
Running for example 17
Time taken for example 17: 0.20364723205566407
Running for example 18
Time taken for example 18: 0.2043881098429362
Running for example 19
Time taken for example 19: 0.20467071533203124
Running for example 20
Time taken for example 20: 0.20365196863810223
Running for example 21
Time taken for example 21: 0.20387011369069416
Running for example 22
Time taken for example 22: 0.20481015841166178
Running for example 23
Time taken for example 23: 0.2046429673830668
Running for example 24
Time taken for example 24: 0.20529175996780397
Running for example 25
Time taken for example 25: 0.2040270249048869
Running for example 26
Time taken for example 26: 0.20328095356623332
Running for example 27
Time taken for example 27: 0.20440402428309123
Running for example 28
Time taken for example 28: 0.2039291739463806
Running for example 29
Time taken for example 29: 0.20385926961898804
Running for example 30
Time taken for example 30: 0.2040327310562134
Running for example 31
Time taken for example 31: 0.20463963349660239
Running for example 32
Time taken for example 32: 0.2049741268157959
Running for example 33
Time taken for example 33: 0.20472443103790283
Running for example 34
Time taken for example 34: 0.2040330688158671
Running for example 35
Time taken for example 35: 0.20410813490549723
Running for example 36
Time taken for example 36: 0.20474815766016644
Running for example 37
Time taken for example 37: 0.20378491878509522
Running for example 38
Time taken for example 38: 0.20426583290100098
Running for example 39
Time taken for example 39: 0.20341067711512248
Running for example 40
Time taken for example 40: 0.2037330985069275
Running for example 41
Time taken for example 41: 0.20522861083348592
Running for example 42
Time taken for example 42: 0.20461246172587078
Running for example 43
Time taken for example 43: 0.20400940974553425
Running for example 44
Time taken for example 44: 0.20505865414937338
Running for example 45
Time taken for example 45: 0.20467219750086466
Running for example 46
Time taken for example 46: 0.2039954900741577
Running for example 47
Time taken for example 47: 0.2037189245223999
Running for example 48
Time taken for example 48: 0.20350040992101034
Running for example 49
Time taken for example 49: 0.20340958833694459
Running for example 50
Time taken for example 50: 0.20414713621139527
Running for example 51
Time taken for example 51: 0.20485878785451253
Running for example 52
Time taken for example 52: 0.20476671059926352
Running for example 53
Time taken for example 53: 0.20357400973637899
Running for example 54
Time taken for example 54: 0.20436669985453287
Running for example 55
Time taken for example 55: 0.20475477774937947
Running for example 56
Time taken for example 56: 0.20414079030354818
Running for example 57
Time taken for example 57: 0.20302375952402751
Running for example 58
Time taken for example 58: 0.20310880343119303
Running for example 59
Time taken for example 59: 0.20360588630040485
Running for example 60
Time taken for example 60: 0.20386915206909179
Running for example 61
Time taken for example 61: 0.20389184951782227
Running for example 62
Time taken for example 62: 0.20475250482559204
Running for example 63
Time taken for example 63: 0.20485320091247558
Running for example 64
Time taken for example 64: 0.2038296858469645
Running for example 65
Time taken for example 65: 0.20421618620554607
Running for example 66
Time taken for example 66: 0.20315723816553752
Running for example 67
Time taken for example 67: 0.20434638659159343
Running for example 68
Time taken for example 68: 0.2042916973431905
Running for example 69
Time taken for example 69: 0.2030983050664266
Running for example 70
Time taken for example 70: 0.20387378136316936
Running for example 71
Time taken for example 71: 0.20305878321329754
Running for example 72
Time taken for example 72: 0.20450908740361531
Running for example 73
Time taken for example 73: 0.20438723564147948
Running for example 74
Time taken for example 74: 0.20344038009643556
Running for example 75
Time taken for example 75: 0.20546515782674155
Running for example 76
Time taken for example 76: 0.20500051577885944
Running for example 77
Time taken for example 77: 0.20389140049616497
Running for example 78
Time taken for example 78: 0.2036415974299113
Running for example 79
Time taken for example 79: 0.2042702595392863
Running for example 80
Time taken for example 80: 0.20356504917144774
Running for example 81
Time taken for example 81: 0.20477003653844197
Running for example 82
Time taken for example 82: 0.20419395367304485
Running for example 83
Time taken for example 83: 0.20417921543121337
Running for example 84
Time taken for example 84: 0.20433381001154582
Running for example 85
Time taken for example 85: 0.2032723863919576
Running for example 86
Time taken for example 86: 0.20340210994084676
Running for example 87
Time taken for example 87: 0.20358201265335082
Running for example 88
Time taken for example 88: 0.2044443686803182
Running for example 89
Time taken for example 89: 0.20510170857111612
Running for example 90
Time taken for example 90: 0.20382037957509358
Running for example 91
Time taken for example 91: 0.2051255663235982
Running for example 92
Time taken for example 92: 0.2036635438601176
Running for example 93
Time taken for example 93: 0.20374617179234822
Running for example 94
Time taken for example 94: 0.20486133098602294
Running for example 95
Time taken for example 95: 0.2040680448214213
Running for example 96
Time taken for example 96: 0.2042020797729492
Running for example 97
Time taken for example 97: 0.20438342094421386
Running for example 98
Time taken for example 98: 0.20314055681228638
Running for example 99
Time taken for example 99: 0.20354623794555665
Running for example 100
Time taken for example 100: 0.20428076187769573


In [None]:
# save predictions
test_df_pd = test_df.to_pandas()
test_df_pd['user_prompt'] = ques_list
test_df_pd['ai_answer'] = predictions
test_df_pd.to_json('/content/drive/MyDrive/ColabNotebooks/ParentPalAI/data/gpt_reddit_test_samples_base_response.jsonl', orient='records', lines=True)


# Load the Fine-Tuned Model and Evaluate It

In [None]:
# Load the adapter weights for the fine tuned model
from peft import PeftModel

run_name = "parentpalai"
output_dir = "/content/drive/MyDrive/ColabNotebooks/ParentPalAI/model/" + run_name
model = PeftModel.from_pretrained(model, output_dir+"/checkpoint-1600")

model.config.use_cache = True
model.eval() # puts the model in evaluation mode: disables dropout (including the LoRA dropout layers); LayerNorm is unaffected



PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32768, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_pro

In [None]:
# # Lets also push the model to HF
# new_name = 'PretendParentAI'
# model.push_to_hub(new_name, token=HF_TOKEN)
# tokenizer.push_to_hub(new_name, token=HF_TOKEN)



In [None]:
## Run inference on example prompt
prompt = '''How to respond to 'I don't love you' -
My 2.5 year-old daughter has pretty much always been a daddy's girl. It's understandable because her dad is great, and much more patient than I am. But in the last month or so it's escalated to her saying 'I don't want you, Mama', 'I don't like you, Mama', and 'I don't love you, Mama', sometimes followed by 'I love Dada'. I find this really hard to deal with and I don't think I'm responding in the best way. Sometimes I try to ignore it, but sometimes I make a sad face and ask why, but of course she's not able to articulate that at this age.

Yesterday I broke down crying because she'd been saying it a lot. Her dad encouraged her to give me a hug. Today she was doing it again and her dad said no, we don't say that. Then I said, 'why do you say that? It makes me sad because I love you very much. Remember when Mama was crying, why do you think she was crying?' She didn't seem to have a response.

I know it's problematic to make children feel like they're responsible for a parent's emotional well-being. But is it appropriate for them to know they've made you sad and why? What is the best way to respond to these kinds of comments from a toddler?
'''
inputs = tokenizer(build_prompt(prompt), return_tensors="pt").to(model.device)

# top_p: controls nucleus sampling.
## Instead of always picking the single most likely next token, the model samples from the smallest set of top-ranked tokens whose cumulative probability reaches top_p (here 0.9).
## This keeps the model from always picking the most boring “safe” token.
## Lower top_p (e.g., 0.7) = more focused, less varied.
## Higher top_p (e.g., 0.95–1.0) = more diverse and creative, but with a higher risk of nonsense.
## 0.9 is a balanced default for natural-sounding answers.


# repetition_penalty: discourages the model from repeating the same words/phrases.
## It re-weights the scores of tokens that have already appeared so they become less likely.
## 1.0 = no penalty (default).
## >1.0 = stronger penalty → less repetition.
## Too high (e.g., >1.5) = the model may avoid words it legitimately needs to reuse, such as “the”.
## 1.15 is a gentle nudge to stop the “looping answers” problem.


with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150, # lets go for shorter answers
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.15,
        eos_token_id=tokenizer.eos_token_id,
    )

raw_output = tokenizer.decode(outputs[0], skip_special_tokens=False)
raw_output = raw_output.split("</s>")[0]

ques, ans = get_q_and_a(raw_output)
print ("\nQuestion")
print (ques)

print ("\nCleaned Answer")
ans_clean = clean_text(ans)
print (ans_clean)


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



Question
You are a friendly parenting companion who gives helpful advice like a fellow parent would. You sound warm and practical — not robotic or formal.
Keep your answers under 150 words.
Stay focused on the user’s question.
Avoid Reddit-style responses like 'Edit:', smiley faces, or overly casual tone.
User's Prompt: How to respond to 'I don't love you' -
My 2.5 year-old daughter has pretty much always been a daddy's girl. It's understandable because her dad is great, and much more patient than I am. But in the last month or so it's escalated to her saying 'I don't want you, Mama', 'I don't like you, Mama', and 'I don't love you, Mama', sometimes followed by 'I love Dada'. I find this really hard to deal with and I don't think I'm responding in the best way. Sometimes I try to ignore it, but sometimes I make a sad face and ask why, but of course she's not able to articulate that at this age.

Yesterday I broke down crying because she'd been saying it a lot. Her dad encouraged her t
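
The `top_p` and `repetition_penalty` settings passed to `model.generate` in the cell above can be illustrated on a toy distribution. The sketch below is a simplified, pure-Python illustration of the two ideas; the function names are made up for this example, the penalty rule (divide positive scores, multiply negative ones) is an assumption about how such a penalty is typically applied, and none of this is the code that `transformers` runs internally.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Make already-generated tokens less likely (assumed rule: shrink positive scores, grow negative ones)."""
    out = list(logits)
    for token_id in set(seen_token_ids):
        out[token_id] = out[token_id] / penalty if out[token_id] > 0 else out[token_id] * penalty
    return out


def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalise so the surviving tokens form a valid distribution to sample from.
    return {i: probs[i] / total for i in kept}


# Toy next-token distribution over a 4-token vocabulary (hypothetical numbers).
nucleus = top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.9)
print(sorted(nucleus))  # token ids that survive the filter

# Toy logits where tokens 0 and 2 were already generated.
penalized = apply_repetition_penalty([2.0, 1.0, -1.0], seen_token_ids=[0, 2], penalty=1.15)
print(penalized)
```

With `top_p=0.9`, the least likely token is dropped because the first three already cover at least 90% of the probability mass; lowering `top_p` shrinks the pool further.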

## Evaluating the Fine-Tuned Model on the Validation Set

**BLEU (Bilingual Evaluation Understudy):**
BLEU measures n-gram overlap between the generated text and the reference text. It was designed for machine translation, where outputs are short and stay close to the reference. The `precisions` list holds the precision for unigrams, bigrams, trigrams, and 4-grams. The BLEU score is very low, but BLEU is not an appropriate metric for this task: many different answers to a parenting question are valid, so exact n-gram matches with a single human response are rare.

The generated text is 1.73x the length of the reference, so the model produces longer responses than the human answers. A brevity penalty of 1.0 means no penalty was applied, because the generated outputs are not shorter than the references.

{'bleu': 0.0062420097799231605, 'precisions': [0.17395070477990393, 0.016906103659020556, 0.0016001280102408192, 0.00032260666182756673], 'brevity_penalty': 1.0, 'length_ratio': 1.732469304229195, 'translation_length': 12699, 'reference_length': 7330}
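
As a rough cross-check of the numbers above, BLEU can be recomputed from the reported `precisions` and lengths: it is the geometric mean of the four n-gram precisions multiplied by the brevity penalty. The sketch below assumes the standard uniform-weight BLEU-4 definition with no smoothing; the helper name `bleu_from_stats` is illustrative and is not part of the `evaluate` library.

```python
import math


def bleu_from_stats(precisions, translation_length, reference_length):
    """Recombine n-gram precisions and lengths into BLEU (uniform weights, no smoothing)."""
    ratio = translation_length / reference_length
    # The brevity penalty only kicks in when the output is shorter than the reference.
    brevity_penalty = 1.0 if ratio > 1.0 else math.exp(1.0 - 1.0 / ratio)
    if min(precisions) <= 0:
        return 0.0, brevity_penalty, ratio
    # Geometric mean of the precisions, computed in log space.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty * math.exp(log_mean), brevity_penalty, ratio


# Values copied from the fine-tuned model's BLEU output above.
precisions = [0.17395070477990393, 0.016906103659020556,
              0.0016001280102408192, 0.00032260666182756673]
score, bp, ratio = bleu_from_stats(precisions, translation_length=12699, reference_length=7330)
print(f"BLEU={score:.6f}  brevity_penalty={bp}  length_ratio={ratio:.3f}")
```

Because the geometric mean is dominated by its smallest terms, the tiny trigram and 4-gram precisions pull the overall score down even though the unigram precision is about 0.17.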

**ROUGE (Recall-Oriented Understudy for Gisting Evaluation)**
ROUGE measures the overlap between the generated text and human-written reference texts. It is most commonly used for summarization and is less strict than BLEU. ROUGE-1 measures the overlap of unigrams (individual words). ROUGE-2 measures the overlap of bigrams (two-word sequences). ROUGE-L is based on the longest common subsequence, so it rewards words that appear in the same order even when they are not adjacent; it captures sentence-level structure, not semantic similarity.

{'rouge1': np.float64(0.2068109926799131), 'rouge2': np.float64(0.021508314761075942), 'rougeL': np.float64(0.10574799893449807), 'rougeLsum': np.float64(0.1058553007217165)}

- ROUGE-1 (unigram overlap) is about 21%, which is moderate.
- ROUGE-2 (bigram overlap) is 2.15%, which is very low.
- ROUGE-L (based on the longest common subsequence) is about 0.11, which is very low.
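
To make these three scores concrete, here is a minimal pure-Python sketch of ROUGE-N and ROUGE-L as F-measures on one sentence pair. It assumes simple lowercased whitespace tokenization, so it will not reproduce the library's numbers exactly. The function names and the example sentences are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def f_measure(overlap, pred_total, ref_total):
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)

def rouge_n(prediction, reference, n):
    pred = Counter(ngrams(prediction.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    return f_measure(overlap, sum(pred.values()), sum(ref.values()))

def lcs_length(a, b):
    # Dynamic-programming longest common subsequence over token lists.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(prediction, reference):
    pred, ref = prediction.lower().split(), reference.lower().split()
    return f_measure(lcs_length(pred, ref), len(pred), len(ref))

# Toy sentences (assumption, not taken from the dataset).
prediction = "the cat lay on the mat"
reference = "the cat sat on the mat"
r1 = rouge_n(prediction, reference, 1)
r2 = rouge_n(prediction, reference, 2)
rl = rouge_l(prediction, reference)
print(r1, r2, rl)
```

A single changed word leaves five of six unigrams matching but breaks two of five bigrams, which is why ROUGE-2 drops much faster than ROUGE-1 on free-form answers.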


**BERTScore**
For each sample, each token in the prediction and the reference is embedded with a BERT-style model, and the cosine similarity is computed between every prediction token and every reference token.

Precision: How well do tokens in the prediction match those in the reference? For each prediction token, find the most similar reference token and take that cosine similarity, then average over the prediction tokens.

Recall: How well do tokens in the reference match those in the prediction? For each reference token, find the most similar prediction token and take that cosine similarity, then average over the reference tokens.

F1: Harmonic mean of precision and recall.

We average precision, recall, and F1 across all samples to get the following:

- BERT avg precision: 0.8323308438062668
- BERT avg recall: 0.8462447559833527
- BERT avg f1: 0.8391281771659851
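
The greedy matching described above can be sketched without a model by using toy vectors in place of token embeddings. This is only an illustration under that assumption: the real metric uses contextual embeddings from a pretrained model, and the function names and 2-d vectors below are made up.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def greedy_bertscore(pred_vecs, ref_vecs):
    # Pairwise similarities: rows are prediction tokens, columns are reference tokens.
    sim = [[cosine(p, r) for r in ref_vecs] for p in pred_vecs]
    # Precision: best reference match for each prediction token, averaged.
    precision = sum(max(row) for row in sim) / len(pred_vecs)
    # Recall: best prediction match for each reference token, averaged.
    recall = sum(max(row[j] for row in sim) for j in range(len(ref_vecs))) / len(ref_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy "embeddings" (assumption): two prediction tokens, one reference token.
pred_vecs = [[1.0, 0.0], [0.0, 1.0]]
ref_vecs = [[1.0, 0.0]]
precision, recall, f1 = greedy_bertscore(pred_vecs, ref_vecs)
print(precision, recall, f1)
```

Here the extra prediction token has no counterpart in the reference, so precision falls while recall stays at 1. The same effect explains why longer generated answers tend to show recall above precision.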

In [None]:
# Run inference on all prompts in the validation set.
# And compute BLEU, ROUGE, BERTScore with human responses.
import time

predictions = []
references = []
ques_list = []
for i, ex in enumerate(validation_df):
  print (f"Running for example {i+1}")
  st = time.time()
  ques, human_answer = get_q_and_a(ex["text"])
  with torch.no_grad():
    inputs = tokenizer(build_prompt(ques), return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.15,
        eos_token_id=tokenizer.eos_token_id,
    )
    raw_output = tokenizer.decode(outputs[0], skip_special_tokens=False)
    _, ai_answer = get_q_and_a(raw_output)
    ai_answer = clean_text(ai_answer)
    predictions.append(ai_answer)
    references.append(human_answer)
    ques_list.append(ques)
    # Elapsed time is reported in minutes.
    print (f"Time taken for example {i+1}: {(time.time()-st)/60.0}")


In [None]:
bleu_results = bleu.compute(predictions=predictions, references=references)
print (bleu_results)

rouge_results = rouge.compute(predictions=predictions, references=references)
print (rouge_results)

bert_results = bertscore.compute(predictions=predictions, references=references, lang='en')
print (bert_results)

print (f"BERT avg precision: {np.mean(bert_results['precision'])}")
print (f"BERT avg recall: {np.mean(bert_results['recall'])}")
print (f"BERT avg f1: {np.mean(bert_results['f1'])}")


In [None]:
# save predictions
validation_df_pd = validation_df.to_pandas()
validation_df_pd['user_prompt'] = ques_list
validation_df_pd['ai_answer'] = predictions
validation_df_pd['human_answer'] = references
validation_df_pd["bertscore_f1"] = bert_results["f1"]
validation_df_pd["bertscore_precision"] = bert_results["precision"]
validation_df_pd["bertscore_recall"] = bert_results["recall"]
validation_df_pd.to_json('/content/drive/MyDrive/ColabNotebooks/ParentPalAI/data/reddit_dataset_val_100_ft_response.jsonl', orient='records', lines=True)


## Running Inference with Fine Tuned Model on GPT Generated Test Set

The saved responses are later compared against the base model's responses using LLM-as-a-Judge (see notebook 'xyz').

In [None]:
# Run inference on all prompts in the test set.
# No reference-based metrics are computed here; responses are saved for LLM-as-a-Judge evaluation.
import time

predictions = []
ques_list = []
for i, ex in enumerate(test_df):
  print (f"Running for example {i+1}")
  st = time.time()
  ques, _ = get_q_and_a(ex["text"])
  with torch.no_grad():
    inputs = tokenizer(build_prompt(ques), return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.15,
        eos_token_id=tokenizer.eos_token_id,
    )
    raw_output = tokenizer.decode(outputs[0], skip_special_tokens=False)
    _, ai_answer = get_q_and_a(raw_output)
    ai_answer = clean_text(ai_answer)
    predictions.append(ai_answer)
    ques_list.append(ques)
    # Elapsed time is reported in minutes.
    print (f"Time taken for example {i+1}: {(time.time()-st)/60.0}")


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Running for example 1


Time taken for example 1: 0.3536958773930868
Running for example 2


Time taken for example 2: 0.35196306705474856
Running for example 3


Time taken for example 3: 0.3506628672281901
Running for example 4


Time taken for example 4: 0.350769050916036
Running for example 5


Time taken for example 5: 0.35127911567687986
Running for example 6


Time taken for example 6: 0.3515407681465149
Running for example 7


Time taken for example 7: 0.3515092333157857
Running for example 8


Time taken for example 8: 0.3500020742416382
Running for example 9


Time taken for example 9: 0.3523138364156087
Running for example 10


Time taken for example 10: 0.3513926386833191
Running for example 11


Time taken for example 11: 0.35193126598993935
Running for example 12


Time taken for example 12: 0.3546976685523987
Running for example 13


Time taken for example 13: 0.35150832732518517
Running for example 14


Time taken for example 14: 0.35156930685043336
Running for example 15


Time taken for example 15: 0.3522643526395162
Running for example 16


Time taken for example 16: 0.35222819248835247
Running for example 17


Time taken for example 17: 0.35304178794225055
Running for example 18


Time taken for example 18: 0.35116061369578044
Running for example 19


Time taken for example 19: 0.35016566117604575
Running for example 20


Time taken for example 20: 0.35040254990259806
Running for example 21


Time taken for example 21: 0.3512661814689636
Running for example 22


Time taken for example 22: 0.3507504343986511
Running for example 23


Time taken for example 23: 0.3533252477645874
Running for example 24


Time taken for example 24: 0.3517361283302307
Running for example 25


Time taken for example 25: 0.3515665292739868
Running for example 26


Time taken for example 26: 0.35101839701334636
Running for example 27


Time taken for example 27: 0.3515737613042196
Running for example 28


Time taken for example 28: 0.351093053817749
Running for example 29


Time taken for example 29: 0.35030699570973717
Running for example 30


Time taken for example 30: 0.35096171299616497
Running for example 31


Time taken for example 31: 0.35049935976664226
Running for example 32


Time taken for example 32: 0.3517666935920715
Running for example 33


Time taken for example 33: 0.34945869048436484
Running for example 34


Time taken for example 34: 0.3501705527305603
Running for example 35


Time taken for example 35: 0.3510085662206014
Running for example 36


Time taken for example 36: 0.35020361344019574
Running for example 37


Time taken for example 37: 0.35071099201838174
Running for example 38


Time taken for example 38: 0.3513977249463399
Running for example 39


Time taken for example 39: 0.3515122294425964
Running for example 40


Time taken for example 40: 0.3523275891939799
Running for example 41


Time taken for example 41: 0.3536101738611857
Running for example 42


Time taken for example 42: 0.35175034602483113
Running for example 43


Time taken for example 43: 0.35039949814478555
Running for example 44


Time taken for example 44: 0.3525212645530701
Running for example 45


Time taken for example 45: 0.35112574497858684
Running for example 46


Time taken for example 46: 0.35006097952524823
Running for example 47


Time taken for example 47: 0.3509403467178345
Running for example 48


Time taken for example 48: 0.35129257043202716
Running for example 49


Time taken for example 49: 0.3502541383107503
Running for example 50


Time taken for example 50: 0.3497222344080607
Running for example 51


Time taken for example 51: 0.35186320543289185
Running for example 52


Time taken for example 52: 0.35474373896916706
Running for example 53


Time taken for example 53: 0.35119032859802246
Running for example 54


Time taken for example 54: 0.3486607074737549
Running for example 55


Time taken for example 55: 0.3502657969792684
Running for example 56


Time taken for example 56: 0.353179387251536
Running for example 57


Time taken for example 57: 0.35326516230901084
Running for example 58


Time taken for example 58: 0.3533243974049886
Running for example 59


Time taken for example 59: 0.3519928534825643
Running for example 60


Time taken for example 60: 0.35207772652308145
Running for example 61


Time taken for example 61: 0.35206359227498374
Running for example 62


Time taken for example 62: 0.35163845618565875
Running for example 63


Time taken for example 63: 0.35132569471995034
Running for example 64


Time taken for example 64: 0.353077510992686
Running for example 65


Time taken for example 65: 0.35284379720687864
Running for example 66


Time taken for example 66: 0.3506434679031372
Running for example 67


Time taken for example 67: 0.35243544181187947
Running for example 68


Time taken for example 68: 0.35186682144800824
Running for example 69


Time taken for example 69: 0.35443209012349447
Running for example 70


Time taken for example 70: 0.35423417488733927
Running for example 71


Time taken for example 71: 0.3527738094329834
Running for example 72


Time taken for example 72: 0.35398500760396323
Running for example 73


Time taken for example 73: 0.35831979910532635
Running for example 74


Time taken for example 74: 0.35148399670918784
Running for example 75


Time taken for example 75: 0.35145312547683716
Running for example 76


Time taken for example 76: 0.35530589818954467
Running for example 77


Time taken for example 77: 0.35386195182800295
Running for example 78


Time taken for example 78: 0.3517716288566589
Running for example 79


Time taken for example 79: 0.35625036160151163
Running for example 80


Time taken for example 80: 0.35318496227264407
Running for example 81


Time taken for example 81: 0.3535060127576192
Running for example 82


Time taken for example 82: 0.35564159154891967
Running for example 83


Time taken for example 83: 0.35279502868652346
Running for example 84


Time taken for example 84: 0.3530072569847107
Running for example 85


Time taken for example 85: 0.3533025622367859
Running for example 86


Time taken for example 86: 0.3535926739374797
Running for example 87


Time taken for example 87: 0.35377471446990966
Running for example 88


Time taken for example 88: 0.3538569649060567
Running for example 89


Time taken for example 89: 0.35490253766377766
Running for example 90


Time taken for example 90: 0.35459553003311156
Running for example 91


Time taken for example 91: 0.35388033390045165
Running for example 92


Time taken for example 92: 0.35663623015085855
Running for example 93


Time taken for example 93: 0.3594473679860433
Running for example 94


Time taken for example 94: 0.35513832171758014
Running for example 95


Time taken for example 95: 0.35259710947672523
Running for example 96


Time taken for example 96: 0.35435737768809
Running for example 97


Time taken for example 97: 0.35326074759165443
Running for example 98


Time taken for example 98: 0.3535870631535848
Running for example 99


Time taken for example 99: 0.3519122322400411
Running for example 100
Time taken for example 100: 0.3508267879486084


In [None]:
# save predictions
test_df_pd = test_df.to_pandas()
test_df_pd['user_prompt'] = ques_list
test_df_pd['ai_answer'] = predictions
test_df_pd.to_json('/content/drive/MyDrive/ColabNotebooks/ParentPalAI/data/gpt_reddit_test_samples_ft_response.jsonl', orient='records', lines=True)
