# Fine-tuning LLMs for News Category (Evaluation)

Inspired by [Kshitiz Sahay's blog](https://medium.com/@kshitiz.sahay26/fine-tuning-llama-2-for-news-category-prediction-a-step-by-step-comprehensive-guide-to-fine-tuning-48c06dee28a9)

A step-by-step tutorial for fine-tuning any LLM (Large Language Model).

This notebook covers the evaluation part of the guide:

**Part 3: Evaluate Model**
1. Load the model
2. Create test text

In [1]:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

In [2]:
"""
if torch.cuda.is_available():
    # Get the number of CUDA devices
    device_count = torch.cuda.device_count()
    print(f"CUDA is available with {device_count} CUDA device(s)!")
    
    # Get the name of each CUDA device
    for i in range(device_count):
        print(f"Device {i}: {torch.cuda.get_device_name(i)}")
else:
    print("CUDA is not available. Running on CPU.")
"""
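import os
import torch
import wandb

# Set the notebook name before wandb.init() so wandb can pick it up.
os.environ['WANDB_NOTEBOOK_NAME'] = "NewsClassificationEval"
wandb.init(mode="disabled")  # Weights & Biases logging is not needed for evaluation
torch.cuda.empty_cache()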

BASE_DIR = '/mlx_devbox/users/haidong.shao/playground/'
#model_path = 'openlm-research/open_llama_3b_v2'
model_path = "Qwen/Qwen2-0.5B"

TORCH_DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


In [3]:
tokenizer = AutoTokenizer.from_pretrained(model_path)
#tokenizer.add_special_tokens({'pad_token': '[PAD]'})

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [4]:
#model = LlamaForCausalLM.from_pretrained(
model = AutoModelForCausalLM.from_pretrained(
    model_path, #load_in_8bit=True,
    device_map='auto',
)
model.to(TORCH_DEVICE)

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 896)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm()
        (post_attention_layernorm): Qwen2RMSNorm()
      )
    )
    (norm): Qwen2RMSNorm()
  )
  (lm_head): Linear(in_features=8

In [5]:
peft_model_id = BASE_DIR + 'results/checkpoint-600'
peft_model = PeftModel.from_pretrained(model, peft_model_id)

## New Model
Copy some text from Google News to use as test input.

In [6]:
test_strings = ["The result of Sunday’s parliamentary election runoff comes as a huge surprise, with France appearing to be on the verge of a major political shift – but not the one everyone was expecting.No pollster predicted before Sunday that a left-wing alliance would win and that the far right would come in third place. This is a shocking reversal of the outcome of the first round of voting, if tonight’s results match the projections. For now, France seems ungovernable. With no party projected to get close to clinching a majority, the parliament will be in a state of paralysis, split between three blocs.",
"Earlier this year, the South Korean tech giant, which is the largest phone maker in the world by shipment volume, announced the Galaxy Ring, its first finger-worn health tracker and a direct competitor to Oura's popular Oura Ring wellness device. With Samsung's Unpacked event coming up on July 10, we're expecting to learn a lot more about the company's big challenger to the Oura Ring, including its price, when it'll launch and additional details on its health-tracking capabilities. Given Samsung's massive presence in the consumer electronics space, you'd think Oura would be concerned about the Galaxy Ring's arrival. ",
"On the first day of production on the New Zealand location of “Avatar: The Way of Water,” actor Cliff Curtis asked if he could bring his family to the film’s home base to give a traditional blessing. Curtis showed up with 43 people and led an elaborate Maori blessing in front of the entire crew, then gave gifts to the crew members. Producer Jon Landau’s gift was a carved wooden oar — as Landau told TheWrap a couple of years later, “It was to help steer the ship as we were going into production. I still have it hanging on my wall",
"Shares of industrial and transportation companies rose after inflation data. The consumer-price inflation index slowed to a 3% annual rate, significantly softer than economists had forecast, and likely setting up a Federal Reserve rate cut in September. Shares of Joby Aviation rallied after the maker of emission-free aircraft reported a successful 523-mile flight of an electrical aircraft."            
]

chinese_test_strings = ["Xbox、ABathing Ape 將推聯乘產品？大家沒有看錯啊！早前美國 A Bathing Ape IG 已經 post 了 Xbox 與 A Bathing Ape 推出聯乘產品的預告短片，從中更可見當中有何產品推出的。.",
"今屆港姐別開生面，由觀眾投票決定正式入圍佳麗名單，昨晚（7日）無綫播出節目《誰是入圍者》讓觀眾在投票前對參賽者們進一步了解，當中有部分來自內地及海外，其中有「河北黃婉佩（Race@2R）」之稱的陳甜甜，樣子甜美，不過一開口就嚇親人，內地口音非常重，幸好字幕拯救了觀眾，來自河北的她坦言來港不到一年，現時每日上2至3小時堂學廣東話，最欣賞的女星是胡杏兒，雖然發音不標準，但她仍十分落力全程以廣東話對答，誠意可嘉。",
"英格蘭隊昨晨於歐洲國家盃8強法定時間內1：1逼和瑞士隊，至互射12碼階段靠門將比克福特救出敵衛文路爾艾簡治首輪極刑，加上隊友5輪全中，終贏5：3晉級，將面對荷蘭隊力爭連續兩屆殺入決賽。賽後比克福特的「提水」水樽成焦點外，是役膺全場最佳的翼鋒布卡約沙卡在極列戰中鵠，為上屆決賽射失「贖罪」，亦獲主帥修夫基點名稱讚"]

test_strings = test_strings+chinese_test_strings

predictions = []
for test in test_strings:
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request..

  ### Instruction:
  Categorize the news article into one of the 5 categories:\n\ntech\nbusiness\nsport\nentertainment\npolitics
  
  Input:
  {}

  ### Response:""".format(test)
  input_ids_temp = tokenizer(prompt, return_tensors="pt")
  input_ids = input_ids_temp.input_ids.to(TORCH_DEVICE)

  generation_output = peft_model.generate(
      input_ids=input_ids, 
      max_new_tokens=156,
      pad_token_id=tokenizer.eos_token_id,
      attention_mask=input_ids_temp['attention_mask'].to(TORCH_DEVICE)
  )
  predictions.append(tokenizer.decode(generation_output[0]))
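
The prompt above is built inline and is repeated later in the notebook, so the wording can drift between cells. A minimal sketch of a shared helper follows. `build_prompt`, `PROMPT_TEMPLATE` and `CATEGORIES` are hypothetical names, not part of the original notebook. The sketch also drops the two leading spaces that the indented triple-quoted string above puts on each line; if the adapter was trained on the indented version, keep that whitespace instead.

```python
# Hypothetical helper: one place to build the classification prompt.
CATEGORIES = ["tech", "business", "sport", "entertainment", "politics"]

# Wording copied from the cell above (including the doubled period),
# but without the per-line indentation of the original string literal.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request..\n\n"
    "### Instruction:\n"
    "Categorize the news article into one of the {n} categories:\n\n"
    "{labels}\n\n"
    "Input:\n"
    "{article}\n\n"
    "### Response:"
)


def build_prompt(article, categories=CATEGORIES):
    """Fill the template with the label list and the article text."""
    return PROMPT_TEMPLATE.format(
        n=len(categories),
        labels="\n".join(categories),
        article=article,
    )


print(build_prompt("Shares of industrial companies rose after inflation data."))
```

Passing a different `categories` list (for example the `technology`/`sports` spelling used in the dataset cell) keeps the rest of the prompt identical.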

In [7]:
def extract_response_text(input_string):
    """Return the text between '### Response:' and the next '###' marker, or None."""
    start_marker = '### Response:'
    end_marker = '###'

    start_index = input_string.find(start_marker)
    if start_index == -1:
        return None

    start_index += len(start_marker)

    end_index = input_string.find(end_marker, start_index)
    if end_index == -1:
        # No closing marker: return the rest, stripped like the other branch.
        return input_string[start_index:].strip()

    return input_string[start_index:end_index].strip()


for text, pred in zip(test_strings, predictions):
    #print(pred)
    print(text + '\n')
    print(extract_response_text(pred))
    print('--------')
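
Because the generated text is decoded without `skip_special_tokens=True`, the string returned by `extract_response_text` may still carry an end-of-sequence token or extra words after the label. A rough sketch of a normalization step is shown below. `normalize_label` and `LABELS` are hypothetical names, and the special-token pattern (`<|...|>` or `</s>`) is an assumption about what the tokenizer emits, not something confirmed by the output above.

```python
import re

# Assumed label set, matching the prompt used for the fine-tuned model.
LABELS = ("tech", "business", "sport", "entertainment", "politics")


def normalize_label(raw, labels=LABELS):
    """Map raw model output to one of the known labels, or None."""
    if raw is None:
        return None
    # Remove special tokens such as <|endoftext|> or </s> (assumed formats).
    cleaned = re.sub(r"<\|.*?\|>|</?s>", " ", raw).strip().lower()
    if not cleaned:
        return None
    # Keep only the first run of letters and require an exact label match.
    first_word = re.split(r"[^a-z]+", cleaned, maxsplit=1)[0]
    return first_word if first_word in labels else None


print(normalize_label("politics<|endoftext|>"))
print(normalize_label(" Sport.\nMore text"))
print(normalize_label("weather"))
```

Returning `None` for anything outside the label set keeps malformed generations visible instead of silently counting them as a category.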

The result of Sunday’s parliamentary election runoff comes as a huge surprise, with France appearing to be on the verge of a major political shift – but not the one everyone was expecting.No pollster predicted before Sunday that a left-wing alliance would win and that the far right would come in third place. This is a shocking reversal of the outcome of the first round of voting, if tonight’s results match the projections. For now, France seems ungovernable. With no party projected to get close to clinching a majority, the parliament will be in a state of paralysis, split between three blocs.

politics
--------
Earlier this year, the South Korean tech giant, which is the largest phone maker in the world by shipment volume, announced the Galaxy Ring, its first finger-worn health tracker and a direct competitor to Oura's popular Oura Ring wellness device. With Samsung's Unpacked event coming up on July 10, we're expecting to learn a lot more about the company's big challenger to the Oura

## Base Model
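I could not find a good prompt that gets the base model to generate reasonable outputs. For reference, the next cell first runs the fine-tuned model (`peft_model`) over the BBC dataset and counts mismatches; the base model is reloaded and prompted in the cells after it.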

In [None]:
import pandas as pd

dataset_name = "ft-lora/bbc-text.csv"
df = pd.read_csv(BASE_DIR+dataset_name)
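# Rename labels to match the prompt below; assign back instead of using
# inplace=True on a column selection, which newer pandas versions warn about.
df['category'] = df['category'].replace({'tech': 'technology', 'sport': 'sports'})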

predictions = []
cnt = 0
#for test in df['text']:
num_rows = len(df)

for i in range(num_rows - 1, -1, -1):
  row = df.iloc[i]
  category = row['category']
  test = row['text']
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request..

  ### Instruction:
  Categorize the news article into one of the 5 categories:\n\ntechnology\nbusiness\nsports\nentertainment\npolitics
  
  Input:
  {}

  ### Response:""".format(test)
  input_ids_temp = tokenizer(prompt, return_tensors="pt")
  input_ids = input_ids_temp.input_ids.to(TORCH_DEVICE)

  generation_output = peft_model.generate(
      input_ids=input_ids, 
      max_new_tokens=156,
      pad_token_id=tokenizer.eos_token_id,
      attention_mask=input_ids_temp['attention_mask'].to(TORCH_DEVICE)
  )
  eval_category = extract_response_text(tokenizer.decode(generation_output[0]))
  if category != eval_category:
    msg = "Error {},  ##eval## {},  ##category## {}".format(test, eval_category, category)
    predictions.append(msg)
    #print(msg)
  cnt = cnt + 1
  if cnt % 50 == 0:
    print("progress {} / {}, errors {}".format(cnt, num_rows, len(predictions)))

progress 50 / 2225, errors 6
progress 100 / 2225, errors 6
progress 150 / 2225, errors 10
progress 200 / 2225, errors 14
progress 250 / 2225, errors 15
progress 300 / 2225, errors 16
progress 350 / 2225, errors 17
progress 400 / 2225, errors 19
progress 450 / 2225, errors 25
progress 500 / 2225, errors 28
progress 550 / 2225, errors 32
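
The progress lines can be turned into a running accuracy figure. The sketch below parses the last line printed above; `running_accuracy` is a hypothetical helper. Note that the run shown covers only 550 of the 2225 rows, and that the loop counts every string mismatch as an error, so outputs that differ only in formatting may be included in the error count.

```python
import re

# Last progress line copied from the output above.
LAST_LINE = "progress 550 / 2225, errors 32"


def running_accuracy(line):
    """Parse one 'progress N / M, errors E' line and return (N, accuracy so far)."""
    match = re.match(r"progress (\d+) / (\d+), errors (\d+)", line)
    if match is None:
        raise ValueError(f"unrecognized progress line: {line!r}")
    seen, _total, errors = (int(group) for group in match.groups())
    return seen, 1 - errors / seen


seen, accuracy = running_accuracy(LAST_LINE)
print(f"{seen} rows seen, accuracy so far: {accuracy:.1%}")  # 94.2%
```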


In [None]:
#df_copy = df.copy()
#df_copy['eval_category'] = predictions
#df_copy
#predictions
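
The commented-out cell above would not work as written: `predictions` only holds messages for mismatched rows, so its length differs from the dataframe's. One possible alternative, sketched here under that assumption, is to record an (expected, predicted) pair for every row and tally them. `tally` and the sample pairs are hypothetical and are not results from the notebook.

```python
from collections import Counter


def tally(pairs):
    """Count (expected, predicted) pairs and return (counts, accuracy)."""
    confusion = Counter(pairs)
    total = sum(confusion.values())
    correct = sum(n for (gold, pred), n in confusion.items() if gold == pred)
    return confusion, (correct / total if total else 0.0)


# Made-up pairs, only to show the shape of the data.
example_pairs = [
    ("sports", "sports"),
    ("technology", "business"),
    ("politics", "politics"),
    ("technology", "technology"),
]

confusion, accuracy = tally(example_pairs)
print(f"accuracy: {accuracy:.2f}")
for (gold, pred), count in sorted(confusion.items()):
    print(f"{gold} -> {pred}: {count}")
```

With one pair per row, the predicted labels can also be assigned to a new dataframe column, since the lengths then match.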


In [None]:
# Reload the plain base model (no adapter) for comparison.
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, #load_in_8bit=True,
    device_map='auto',
)

In [None]:
predictions = []
for test in test_strings:
    prompt = 'News text:\n\n ' + test + '\n The closest category for the above text should be (select one from tech, business, sport, entertainment, or politics):'
    input_ids_temp = tokenizer(prompt, return_tensors="pt")
    input_ids = input_ids_temp.input_ids.to(TORCH_DEVICE)
    generation_output = model.generate(
        input_ids=input_ids,
        max_new_tokens=156,
        pad_token_id=tokenizer.eos_token_id,
        attention_mask=input_ids_temp['attention_mask'].to(TORCH_DEVICE)
    )
    predictions.append(tokenizer.decode(generation_output[0]))

for pred in predictions:
    print(pred)
    print('--------')
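
The base model returns the prompt followed by a free-form continuation, so there is no `### Response:` marker to split on. One rough way to score such output, sketched below, is to cut the decoded text at the end of the prompt and take the first category mentioned afterwards. `first_mentioned_label` and `PROMPT_END` are hypothetical, and the sample string is made up rather than actual model output.

```python
LABELS = ("tech", "business", "sport", "entertainment", "politics")

# Tail of the base-model prompt used above; everything after it is the continuation.
PROMPT_END = "or politics):"


def first_mentioned_label(decoded, marker=PROMPT_END, labels=LABELS):
    """Return the label that appears earliest after the prompt, or None."""
    _, found, continuation = decoded.partition(marker)
    if not found:
        return None  # prompt tail missing, cannot locate the continuation
    lowered = continuation.lower()
    # Substring match, so "sports" counts as "sport" and "technology" as "tech".
    hits = [(lowered.find(label), label) for label in labels if label in lowered]
    return min(hits)[1] if hits else None


sample = (
    "News text:\n\n Some article\n The closest category for the above text "
    "should be (select one from tech, business, sport, entertainment, "
    "or politics): business\nNews text:"
)
print(first_mentioned_label(sample))
```

This is a heuristic: a continuation that lists several categories is scored by whichever comes first, which may not reflect what the model meant.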