
# Emotion Label Generation with Qwen3 + QLoRA
 ======================================================

 In this notebook, we fine-tune the Qwen3 language model using the QLoRA technique
 to automatically generate emotion labels for tweets. This is a label generation task
 where the model predicts a set of emotions based on the given tweet text.

  Key Steps in This Notebook:
 - Environment Setup with 4-bit Quantization (QLoRA)
 - Fine-tuning the Qwen/Qwen3-0.6B model for label generation
 - Training with formatted prompts combining tweets and their associated emotion labels
 - Model evaluation using mean token accuracy
 - Final model saving and Weights & Biases (W&B) logging

# 📊 W&B Run Link:
https://wandb.ai/mourlayetraore120-the-university-of-texas-at-dallas/nlp-emotion-classification/table?nw=nwusermourlayetraore120



# Environment Set Up

In [None]:
!pip install -q torch transformers datasets peft accelerate bitsandbytes trl wandb evaluate scikit-learn


In [None]:
# Standard Python libraries
from pathlib import Path
import re
import gc
import time
from typing import Dict, List, Union, Optional
from tqdm import tqdm
import itertools
import json
import joblib
import ast
from datetime import datetime
from difflib import get_close_matches

# Data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import multilabel_confusion_matrix, precision_score, recall_score, f1_score

# Pytorch
import torch
import torch.nn as nn

# Hugging Face libraries
import evaluate
from datasets import load_dataset, DatasetDict, Dataset, ClassLabel
from trl import SFTConfig, SFTTrainer, DataCollatorForCompletionOnlyLM
from transformers import (
    TrainingArguments,
    Trainer,
    set_seed,
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    AutoConfig,
    pipeline,
    BitsAndBytesConfig,
)
from peft import (
    TaskType,
    LoraConfig,
    prepare_model_for_kbit_training,
    get_peft_model,
    AutoPeftModelForCausalLM,
    PeftConfig
)

from huggingface_hub import login

# Logging and secrets
import wandb
from google.colab import userdata

In [None]:
# Mount Google Drive to access the dataset
from google.colab import drive
drive.mount('/content/drive')



Mounted at /content/drive


In [None]:
# Load the training dataset (pandas is already imported as pd above)
df = pd.read_csv("/content/drive/My Drive/NLP_CLASS/HOMEWORK/HW5/train.csv")
print(" Dataset loaded:")
display(df.head())


 Dataset loaded:


Unnamed: 0,ID,Tweet,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,2017-21441,“Worry is a down payment on a problem you may ...,0,1,0,0,0,0,1,0,0,0,1
1,2017-31535,Whatever you decide to do make sure it makes y...,0,0,0,0,1,1,1,0,0,0,0
2,2017-21068,@Max_Kellerman it also helps that the majorit...,1,0,1,0,1,0,1,0,0,0,0
3,2017-31436,Accept the challenges so that you can literall...,0,0,0,0,1,0,1,0,0,0,0
4,2017-22195,My roommate: it's okay that we can't spell bec...,1,0,1,0,0,0,0,0,0,0,0


In [None]:

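# Log in to Weights & Biases (W&B)
# Read the API key from Colab secrets instead of hardcoding it in the notebook.
# The secret name "WANDB_API_KEY" is an assumption; use the name you stored it under.
wandb_api_key = userdata.get('WANDB_API_KEY')

if wandb_api_key:
    wandb.login(key=wandb_api_key)
    print("Successfully logged in to W&B")
else:
    print("WANDB key not found.")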



[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mmourlayetraore120[0m ([33mmourlayetraore120-the-university-of-texas-at-dallas[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Successfully logged in to W&B


In [None]:
# Set project for grouping runs
%env WANDB_PROJECT=nlp-emotion-classification

# Initialize the run with a specific name
wandb.init(project="nlp-emotion-classification", name="hw7-qwen3-label-gen", reinit=True)


env: WANDB_PROJECT=nlp-emotion-classification




# Data

In [None]:
# 1. Basic Info
print(f"Shape: {df.shape}\nColumns: {df.columns.tolist()}\n")

# 2. Sample Rows (focus on text + labels)
print("Sample rows:")
display(df.head(2))

# 3. Label distribution (skipped here: the "labels" column is only created in the next cell)
if "labels" in df.columns:
    print("\nLabel distribution:")
    print(df["labels"].apply(lambda x: len(x) if isinstance(x, list) else 1).value_counts())

Shape: (7724, 13)
Columns: ['ID', 'Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust']

Sample rows:


Unnamed: 0,ID,Tweet,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,2017-21441,“Worry is a down payment on a problem you may ...,0,1,0,0,0,0,1,0,0,0,1
1,2017-31535,Whatever you decide to do make sure it makes y...,0,0,0,0,1,1,1,0,0,0,0


In [None]:
# First ensure we've created the labels column in the original DataFrame
emotion_cols = ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'love',
                'optimism', 'pessimism', 'sadness', 'surprise', 'trust']
df['labels'] = df[emotion_cols].apply(
    lambda row: ', '.join([col for col, val in zip(emotion_cols, row) if val == 1]),
    axis=1
)

In [None]:
# Convert your DataFrame to Hugging Face Dataset
emotion_dataset = Dataset.from_pandas(df)

# Select and rename columns to match expected format
selected_columns = {
    'text': emotion_dataset['Tweet'],  # Your text input
    'tag': emotion_dataset['labels']   # The labels we created earlier
}

# Create new dataset with selected columns
emotion_selected_columns = Dataset.from_dict(selected_columns)

# Convert to pandas for easier processing (optional step)
emotion_selected_columns.set_format(type='pandas')
df_emotion = emotion_selected_columns[:]

# Format labels as a JSON list string, e.g. '["joy", "love"]'.
# Tweets with no emotions have an empty tag; ''.split(', ') would give [''],
# so map the empty string to an empty list explicitly.
df_emotion['label'] = df_emotion['tag'].apply(
    lambda x: json.dumps(x.split(', ') if x else [])
)

# Create final dataset with only text and label columns
df_final = df_emotion[['text', 'label']]

# Emotion classes (same order as the label columns in the CSV)
class_names = ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'love',
               'optimism', 'pessimism', 'sadness', 'surprise', 'trust']
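
To make the label format concrete, here is a minimal standalone sketch of the conversion from one-hot emotion flags to the JSON label string used in the prompts. The helper name `row_to_label` and the toy rows are illustrative assumptions and are not part of the notebook's pipeline.

```python
import json

EMOTIONS = ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'love',
            'optimism', 'pessimism', 'sadness', 'surprise', 'trust']

def row_to_label(row):
    """Turn a dict of 0/1 emotion flags into a JSON list string (hypothetical helper)."""
    active = [name for name in EMOTIONS if row.get(name, 0) == 1]
    return json.dumps(active)

# A tweet tagged with three emotions, and a tweet with none
print(row_to_label({'joy': 1, 'love': 1, 'optimism': 1}))
print(row_to_label({}))
```

A tweet with no active emotions maps to an empty JSON list, which keeps the target parseable at inference time.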

## Split Dataset

In [None]:
# First convert your pandas DataFrame to Hugging Face Dataset
from datasets import Dataset
hf_dataset = Dataset.from_pandas(df_final)

# Now perform the splits correctly
train_valtest = hf_dataset.train_test_split(test_size=0.3, seed=42)
val_test = train_valtest['test'].train_test_split(test_size=0.5, seed=42)

dataset_splits = DatasetDict({
    "train": train_valtest['train'],
    "valid": val_test['train'],
    "test": val_test['test']
})

print(f"Splits created - Train: {len(dataset_splits['train'])}, Val: {len(dataset_splits['valid'])}, Test: {len(dataset_splits['test'])}")



Splits created - Train: 5406, Val: 1159, Test: 1159
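
The split sizes above follow from the two-stage 70/15/15 split. The sketch below reproduces the arithmetic, assuming the held-out portion is rounded up at each stage; the function name `split_sizes` is hypothetical and the rounding convention is an assumption about how the library computes sizes.

```python
import math

def split_sizes(n, test_frac=0.3, val_share=0.5):
    """Train/valid/test sizes when each held-out portion is rounded up (assumed convention)."""
    held_out = math.ceil(n * test_frac)      # first split: 30% held out
    train = n - held_out
    test = math.ceil(held_out * val_share)   # second split: half of the held-out rows
    valid = held_out - test
    return train, valid, test

print(split_sizes(7724))
```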


# GPU Memory Helper

In [None]:
def free_gpu_memory():
    """Release cached GPU memory held by PyTorch's allocator."""
    try:
        # Iterating over locals() here would only see this function's own
        # (empty) scope, so callers must `del` their tensors before calling.
        gc.collect()
        torch.cuda.empty_cache()
        time.sleep(2)
        print("GPU memory freed.")
    except Exception as e:
        print(f"Error freeing GPU memory: {e}")

In [None]:
free_gpu_memory()

GPU memory freed.


In [None]:
# Qwen3 model checkpoint
checkpoint = "Qwen/Qwen3-0.6B"  # or "Qwen/Qwen3-0.6B-Base" for base model

# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    checkpoint,
    trust_remote_code=True,  # Not strictly needed when transformers supports Qwen3 natively
    pad_token="<|endoftext|>",  # Explicitly set pad token (distinct from EOS)
    padding_side="right"  # Right padding for training; batched generation needs left padding
)

# Verify special tokens (needed for generation formatting)
print(f"EOS token: {tokenizer.eos_token}")  # '<|im_end|>' for this checkpoint
print(f"PAD token: {tokenizer.pad_token}")  # Must match what we set above


EOS token: <|im_end|>
PAD token: <|endoftext|>


# <font color = 'indianred'> **Create Prompts**</font>



In [None]:
def format_emotion_prompts(examples, tokenizer):
    """HW7-specific prompt formatter for emotion label generation"""
    prompts = []
    for i in range(len(examples['text'])):
        # HW7 prompt structure (simpler than teacher's version)
        prompt = (
            f"Generate emotion labels for: {examples['text'][i].strip()}\n"
            f"Labels: {examples['label'][i]}{tokenizer.eos_token}"
        )
        prompts.append(prompt.strip())
    return {"text": prompts}

In [None]:
from functools import partial
from pprint import pprint

# HW7-optimized prompt formatter
def format_emotion_example(example, tokenizer):
    """Simplified formatter for emotion labels"""
    return (
        f"Identify emotions in this text:\n"
        f"Text: {example['text'].strip()}\n"
        f"Labels: {example['label'].strip()}{tokenizer.eos_token}"
    )

# Create batched version
def format_emotion_batch(examples, tokenizer):
    return {"text": [format_emotion_example({"text": t, "label": l}, tokenizer)
                    for t, l in zip(examples['text'], examples['label'])]}

# Test with correct dataset slicing
samples = format_emotion_batch(dataset_splits['train'][0:3], tokenizer)
pprint(samples['text'][0])  # Verify first sample

('Identify emotions in this text:\n'
 'Text: If you think reason will prevail in this election remember that Hitler '
 "was elected by what was then a wholly 'reasonable' society. #fear\n"
 'Labels: ["fear"]<|im_end|>')
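
Because the model only sees this template during training, the inference prompt should be the same string with the label part left off. A minimal sketch of one shared template follows; the helper name `build_prompt` and the sample tweet are hypothetical and not taken from the notebook.

```python
def build_prompt(text, label=None, eos_token=""):
    """Shared template: pass a label for training, omit it for inference (hypothetical helper)."""
    prompt = f"Identify emotions in this text:\nText: {text.strip()}\nLabels:"
    if label is not None:
        # Training example: append the JSON label list and the EOS token
        prompt += f" {label.strip()}{eos_token}"
    return prompt

train_prompt = build_prompt("So happy today!", '["joy"]', eos_token="<|im_end|>")
infer_prompt = build_prompt("So happy today!")

# The inference prompt is an exact prefix of the training prompt
assert train_prompt.startswith(infer_prompt)
print(train_prompt)
```

Deriving both prompts from one function avoids a silent mismatch between the text the model was trained on and the text it is asked to complete.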


##  <font color = 'indianred'> **Filter Longer Sequences**</font>

In [None]:
def filter_emotion_by_length(examples, tokenizer, max_length=512):
    """
    Length filter for emotion label prompts.
    Returns: {'keep': [bool]} indicating which examples are within length limit
    """
    # Build the same prompt used for training so the measured length matches
    prompts = format_emotion_batch(examples, tokenizer)['text']

    # Tokenize all at once (faster)
    tokenized = tokenizer(prompts, add_special_tokens=False)
    return {
        'keep': [len(ids) <= max_length for ids in tokenized['input_ids']]
    }

# Apply to dataset (batched for efficiency)
dataset_splits = dataset_splits.map(
    filter_emotion_by_length,
    batched=True,
    fn_kwargs={'tokenizer': tokenizer, 'max_length': 512}
)

# Filter and clean up
dataset_splits_filtered = dataset_splits.filter(lambda x: x['keep']).remove_columns(['keep'])
print(f"Filtered counts - Train: {len(dataset_splits_filtered['train'])}, Val: {len(dataset_splits_filtered['valid'])}")


Filtered counts - Train: 5406, Val: 1159


##  <font color = 'indianred'> **Data Collator**</font>

In [None]:
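# Data collator for causal LM training.
# Note: with mlm=False the labels are a copy of input_ids (padding masked to -100),
# so the loss covers the whole prompt (instruction and tweet), not only the emotion labels.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # Causal LM
    pad_to_multiple_of=8  # Pad sequence lengths to a multiple of 8
)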

# Enhanced Verification
def verify_emotion_tokens(tokenizer):
    """Comprehensive check for emotion label processing"""
    test_cases = [
        ("joy", "single emotion"),
        ("joy, love", "multiple emotions"),
        ("", "no emotion")  # Edge case
    ]

    print("=== HW7 Tokenization Validation ===")
    print(f"EOS Token: {tokenizer.eos_token} (ID: {tokenizer.eos_token_id})")
    print(f"PAD Token: {tokenizer.pad_token} (ID: {tokenizer.pad_token_id})\n")

    for labels, desc in test_cases:
        prompt = f"Text: Test input\nLabels: {labels}{tokenizer.eos_token}"
        tokens = tokenizer.tokenize(prompt)
        ids = tokenizer.encode(prompt, add_special_tokens=False)

        print(f"Case: {desc:<15} | Tokens: {tokens}")
        print(f"IDs: {ids}\nDecoded: {tokenizer.decode(ids)}\n")

# Execute verification
verify_emotion_tokens(tokenizer)



=== HW7 Tokenization Validation ===
EOS Token: <|im_end|> (ID: 151645)
PAD Token: <|endoftext|> (ID: 151643)

Case: single emotion  | Tokens: ['Text', ':', 'ĠTest', 'Ġinput', 'Ċ', 'Labels', ':', 'Ġjoy', '<|im_end|>']
IDs: [1178, 25, 3393, 1946, 198, 23674, 25, 15888, 151645]
Decoded: Text: Test input
Labels: joy<|im_end|>

Case: multiple emotions | Tokens: ['Text', ':', 'ĠTest', 'Ġinput', 'Ċ', 'Labels', ':', 'Ġjoy', ',', 'Ġlove', '<|im_end|>']
IDs: [1178, 25, 3393, 1946, 198, 23674, 25, 15888, 11, 2948, 151645]
Decoded: Text: Test input
Labels: joy, love<|im_end|>

Case: no emotion      | Tokens: ['Text', ':', 'ĠTest', 'Ġinput', 'Ċ', 'Labels', ':', 'Ġ', '<|im_end|>']
IDs: [1178, 25, 3393, 1946, 198, 23674, 25, 220, 151645]
Decoded: Text: Test input
Labels: <|im_end|>
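
The token IDs printed above can also illustrate completion-only loss masking, an alternative to training on every token. `DataCollatorForLanguageModeling` with `mlm=False` uses the full sequence as labels; the sketch below instead masks everything up to and including the `Labels:` marker with -100, so only the label tokens and EOS would contribute to the loss. The function name `mask_prompt_labels` is hypothetical, and this is not what the notebook's collator does.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss

def mask_prompt_labels(input_ids, marker_ids):
    """Return labels with every token up to and including the marker masked out."""
    n, m = len(input_ids), len(marker_ids)
    for start in range(n - m + 1):
        if input_ids[start:start + m] == marker_ids:
            cut = start + m
            return [IGNORE_INDEX] * cut + list(input_ids[cut:])
    # Marker not found: mask everything so the example contributes no loss
    return [IGNORE_INDEX] * n

# IDs of "Text: Test input\nLabels: joy<|im_end|>" from the validation output;
# [23674, 25] are the tokens for "Labels" and ":".
ids = [1178, 25, 3393, 1946, 198, 23674, 25, 15888, 151645]
print(mask_prompt_labels(ids, marker_ids=[23674, 25]))
```

With this kind of masking, loss and token accuracy would reflect label prediction only, which is usually the quantity of interest for a label generation task.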



#  <font color = 'indianred'> **Model Training Set Up**</font>

In [None]:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch
from peft import prepare_model_for_kbit_training

# 1. QLoRA quantization config (must be defined before loading the model)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16  # fp16 compute, matching fp16=True in the training arguments
)

# 2. Model Loading with Fallback
try:
    # Attempt with existing packages
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen3-0.6B",
        quantization_config=bnb_config,
        torch_dtype=torch.float16,
        trust_remote_code=True,  # Not strictly needed when transformers supports Qwen3 natively
        device_map="auto"
    )
    print("✅ Model loaded successfully with existing packages")

except ImportError as e:
    print(f"⚠️ ImportError: {e}\nInstalling required packages...")

    # Install dependencies
    !pip install -q bitsandbytes transformers accelerate
    !pip install -q git+https://github.com/huggingface/peft.git

    # Retry loading
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen3-0.6B",
        quantization_config=bnb_config,
        torch_dtype=torch.float16,
        trust_remote_code=True,
        device_map="auto"
    )
    print("✅ Model loaded after package installation")

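# 3. Final Preparation
# prepare_model_for_kbit_training freezes the base weights and, by default,
# already enables gradient checkpointing; the explicit call is redundant but harmless.
model = prepare_model_for_kbit_training(model)
model.gradient_checkpointing_enable()

# Verification (trainable params are expected to be 0 until the LoRA adapters are attached)
print("\n=== HW7 Model Ready ===")
print(f"Device: {model.device}")
print(f"4-bit Quantized: {model.is_quantized}")
print(f"Trainable params: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")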

✅ Model loaded successfully with existing packages

=== HW7 Model Ready ===
Device: cuda:0
4-bit Quantized: True
Trainable params: 0


##  <font color = 'indianred'> **PEFT Setup**</font>

In [None]:
model

Qwen3ForCausalLM(
  (model): Qwen3Model(
    (embed_tokens): Embedding(151936, 1024)
    (layers): ModuleList(
      (0-27): 28 x Qwen3DecoderLayer(
        (self_attn): Qwen3Attention(
          (q_proj): Linear4bit(in_features=1024, out_features=2048, bias=False)
          (k_proj): Linear4bit(in_features=1024, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=1024, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=2048, out_features=1024, bias=False)
          (q_norm): Qwen3RMSNorm((128,), eps=1e-06)
          (k_norm): Qwen3RMSNorm((128,), eps=1e-06)
        )
        (mlp): Qwen3MLP(
          (gate_proj): Linear4bit(in_features=1024, out_features=3072, bias=False)
          (up_proj): Linear4bit(in_features=1024, out_features=3072, bias=False)
          (down_proj): Linear4bit(in_features=3072, out_features=1024, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen3RMSNorm((1024,), eps=1e-06)
        (po

In [None]:
# Optimized LoRA Setup for Qwen3
from peft import LoraConfig, get_peft_model

# 1. Verify target modules (Qwen3-specific)
target_modules = ["q_proj", "k_proj", "v_proj"]  # From model inspection
print(f"HW7 Target Modules: {target_modules}")

HW7 Target Modules: ['q_proj', 'k_proj', 'v_proj']


In [None]:
# 2. LoRA Config (HW7-optimized)
peft_config = LoraConfig(
    task_type="CAUSAL_LM",  # Direct string (no need for TaskType import)
    r=8,                   # HW7: Reduced from 128 for efficiency
    lora_alpha=32,         # Scaled down proportionally
    lora_dropout=0.05,     # Default for regularization
    target_modules=target_modules,  # Only attention projections
    bias="none"            # Disabled for QLoRA
)



In [None]:
peft_config

LoraConfig(task_type='CAUSAL_LM', peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path='Qwen/Qwen3-0.6B', revision=None, inference_mode=False, r=8, target_modules={'q_proj', 'k_proj', 'v_proj'}, exclude_modules=None, lora_alpha=32, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', trainable_token_indices=None, loftq_config={}, eva_config=None, corda_config=None, use_dora=False, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False), lora_bias=False)

In [None]:
# 3. Apply to model
lora_model = get_peft_model(model, peft_config)
lora_model.print_trainable_parameters()

trainable params: 1,605,632 || all params: 597,655,552 || trainable%: 0.2687
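
The trainable parameter count can be checked by hand. For each targeted linear layer, LoRA adds two low-rank matrices with `r * (in_features + out_features)` weights in total. The sketch below applies that formula to the `q_proj`, `k_proj`, and `v_proj` shapes from the model printout; the function name `lora_param_count` is hypothetical.

```python
def lora_param_count(shapes, r, num_layers):
    """LoRA adds A (r x in) and B (out x r) per target module: r * (in + out) weights."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * num_layers

# (in_features, out_features) for q_proj, k_proj, v_proj from the model printout
shapes = [(1024, 2048), (1024, 1024), (1024, 1024)]
total = lora_param_count(shapes, r=8, num_layers=28)
print(f"{total:,}")  # 1,605,632
```

The result matches the `trainable params` figure reported by `print_trainable_parameters()`.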


## <font color = 'indianred'> **Training Arguments**</font>







In [None]:
from pathlib import Path
from transformers import TrainingArguments  # Passed to SFTTrainer below

# 1. Output Directory (HW7-specific)
model_folder = Path("qwen3_emotion_qlora")
model_folder.mkdir(exist_ok=True, parents=True)

# 2. Training Arguments (Optimized for HW7)
training_args = TrainingArguments(
    output_dir=str(model_folder),
    run_name="qwen3_emotion_hw7",  # For W&B
    num_train_epochs=3,  # HW7: Increased from 2 for better convergence
    per_device_train_batch_size=4,  # Adjusted for Colab
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # Reduced from 8 for stability
    learning_rate=2e-4,  # Increased from 2e-5 for QLoRA
    optim="paged_adamw_8bit",  # HW7: Better for QLoRA
    logging_steps=20,
    eval_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    fp16=True,  # Force fp16 for Colab compatibility
    bf16=False,
    gradient_checkpointing=True,
    report_to="wandb"  # HW7: Required for submission
)

# 3. Model Config
model.config.use_cache = False  # KV cache is not used with gradient checkpointing
model.config.pretraining_tp = 1  # Llama-specific setting; likely has no effect on Qwen3
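
These settings determine the effective batch size (4 x 4 = 16 sequences per optimizer step) and the total number of optimizer steps. The sketch below works through the arithmetic for the 5,406 training examples, assuming a single GPU and rounding up at each stage; the function name `total_optimizer_steps` is hypothetical.

```python
import math

def total_optimizer_steps(n_examples, batch_size, grad_accum, epochs):
    """Optimizer steps for single-device training (assumed: round up at each stage)."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs

print(total_optimizer_steps(5406, batch_size=4, grad_accum=4, epochs=3))
```

With `eval_steps=50` and `save_steps=50`, this total also fixes how many evaluations and checkpoints the run produces.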



In [None]:
model.config

Qwen3Config {
  "_attn_implementation_autoset": true,
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 40960,
  "max_window_layers": 28,
  "model_type": "qwen3",
  "num_attention_heads": 16,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "float16",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbyt

##  <font color = 'indianred'> **Initialize Trainer**</font>



In [None]:
from trl import SFTTrainer

## 1. Prepare Dataset with Pre-formatted Prompts
def format_emotion_prompt(example):
    """Build the full training prompt: instruction, tweet, and JSON labels."""
    full_prompt = (
        f"Identify emotions in this text:\n"
        f"Text: {example['text']}\n"
        f"Labels: {example['label']}{tokenizer.eos_token}"
    )
    return {"text": full_prompt}

# Apply formatting to datasets
formatted_train = dataset_splits_filtered["train"].map(format_emotion_prompt)
formatted_val = dataset_splits_filtered["valid"].map(format_emotion_prompt)

## 2. Version-Compatible Trainer Setup
trainer_args = {
    "model": lora_model,
    "args": training_args,
    "train_dataset": formatted_train,
    "eval_dataset": formatted_val,
    "data_collator": data_collator,
    "processing_class": tokenizer,  # Replaces the deprecated `trainer.tokenizer` attribute
}

try:
    # peft_config is an SFTTrainer-only argument, so it is passed separately
    trainer = SFTTrainer(**trainer_args, peft_config=peft_config)
except TypeError as e:
    print(f"Initialization error: {e}")
    print("Trying alternative approach...")

    # Fallback: plain transformers Trainer, which does not accept peft_config
    from transformers import Trainer
    trainer = Trainer(**trainer_args)

## 3. Critical Post-Init Configuration
trainer.model.config.use_cache = False

print("Trainer successfully initialized!")
print(f"Tokenizer assigned: {trainer.processing_class is not None}")
print(f"Train dataset size: {len(formatted_train)}")


No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
Trainer.tokenizer is now deprecated. You should use `Trainer.processing_class = processing_class` instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.


✅ Trainer successfully initialized!
Tokenizer assigned: True
Train dataset size: 5406


## <font color = 'indianred'> **Setup WandB**</font>

In [None]:
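# 1. Environment Setup (HW7 Requirement)
# Note: a run was already started above with wandb.init(project="nlp-emotion-classification"),
# so training logs go to that run; this variable only affects runs created afterwards.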
%env WANDB_PROJECT=qwen3_emotion_hw7

env: WANDB_PROJECT=qwen3_emotion_hw7


##  <font color = 'indianred'> **Training**</font>

In [None]:
# 1. Training with Memory Management
try:
    print("Starting training...")
    trainer.train()
    print("Training completed successfully!")

    # HW7: Save best model checkpoint info
    if trainer.state.best_model_checkpoint:
        best_model_checkpoint_step = trainer.state.best_model_checkpoint.split('-')[-1]
        print(f"Best model at step: {best_model_checkpoint_step}")
    else:
        print("No best model checkpoint found")

except RuntimeError as e:
    if 'CUDA out of memory' in str(e):
        print("⚠️ CUDA OOM - Attempting recovery...")
        free_gpu_memory()

        # HW7: Reduced batch size fallback
        trainer.args.per_device_train_batch_size = max(1, trainer.args.per_device_train_batch_size // 2)
        print(f"Retrying with reduced batch size: {trainer.args.per_device_train_batch_size}")

        trainer.train()  # Final attempt
    else:
        raise e


Starting training...


Step,Training Loss,Validation Loss
50,3.0284,2.921077
100,2.7976,2.852086
150,2.8281,2.820377
200,2.7255,2.806708
250,2.7366,2.793375
300,2.7137,2.78008
350,2.7027,2.774419
400,2.6468,2.76831
450,2.6841,2.760072
500,2.639,2.757352


Training completed successfully!
Best model at step: 1000


In [None]:

#  HW7: Mandatory Final Save
trainer.save_model("qwen3_emotion_final")
tokenizer.save_pretrained("qwen3_emotion_final")
print("📁 Model and tokenizer saved for submission")

📁 Model and tokenizer saved for submission


In [None]:
eval_results = trainer.evaluate()

In [None]:
eval_results

{'eval_loss': 2.734558582305908,
 'eval_runtime': 41.5734,
 'eval_samples_per_second': 27.878,
 'eval_steps_per_second': 6.976}

In [None]:
test_df = pd.read_csv("/content/drive/My Drive/NLP_CLASS/HOMEWORK/HW5/test.csv")
test_df

Unnamed: 0,ID,Tweet,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,2018-01559,@Adnan__786__ @AsYouNotWish Dont worry Indian ...,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
1,2018-03739,"Academy of Sciences, eschews the normally sobe...",NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
2,2018-00385,I blew that opportunity -__- #mad,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
3,2018-03001,This time in 2 weeks I will be 30... 😥,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
4,2018-01988,#Deppression is real. Partners w/ #depressed p...,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3254,2018-03848,shaft abrasions from panties merely shifted to...,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
3255,2018-00416,@lomadia heard of Remothered? Indie horror gam...,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
3256,2018-03717,All this fake outrage. Y'all need to stop 🤣,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE
3257,2018-03504,Would be ever so grateful if you could record ...,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE


In [None]:
# Convert test_df to Hugging Face Dataset
from datasets import Dataset
test_dataset = Dataset.from_pandas(test_df)

# Apply the SAME prompt formatting used in training
formatted_test = test_dataset.map(
    lambda x: {"text": f"Identify emotions in this text:\nText: {x['Tweet']}\nLabels:"},
    remove_columns=test_dataset.column_names
)


Map:   0%|          | 0/3259 [00:00<?, ? examples/s]

In [None]:
free_gpu_memory()

GPU memory freed.


In [None]:
import torch
from tqdm import tqdm

# 1. Free memory aggressively
def clear_memory():
    torch.cuda.empty_cache()
    gc.collect()

clear_memory()

# 2. Process in smaller batches
test_texts = test_df["Tweet"].tolist()  # Was undefined in the original cell
tokenizer.padding_side = "left"  # Decoder-only generation needs left padding
batch_size = 4  # Reduce if needed
predictions = []

for i in tqdm(range(0, len(test_texts), batch_size)):
    # Prepare batch with the SAME prompt template used in training
    batch_texts = test_texts[i:i+batch_size]
    batch_prompts = [
        f"Identify emotions in this text:\nText: {t}\nLabels:" for t in batch_texts
    ]

    # Tokenize (pad only to the longest prompt in the batch) and move to GPU
    inputs = tokenizer(
        batch_prompts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt"
    ).to("cuda")

    # Generate greedily, then decode only the newly generated tokens
    with torch.no_grad():
        outputs = trainer.model.generate(
            **inputs,
            max_new_tokens=15,  # Strict limit
            do_sample=False,  # Greedy decoding for deterministic output
            pad_token_id=tokenizer.pad_token_id
        )
        new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
        batch_preds = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)
        predictions.extend([p.strip() for p in batch_preds])

    # Clean up
    del inputs, outputs, new_tokens, batch_preds
    clear_memory()

# 3. Verify
print(f"First prediction: {predictions[0]}")
print(f"Prediction count: {len(predictions)}/{len(test_texts)}")

100%|██████████| 815/815 [33:59<00:00,  2.50s/it]

First prediction: : ["anger", "disgust", "fear"] #India
Prediction count: 3259/3259





In [None]:
# Save raw predictions with IDs and text
predictions_df = pd.DataFrame({
    "ID": test_df["ID"],
    "Text": test_df["Tweet"],
    "Predicted_Labels": predictions  # Raw model outputs (e.g., '["joy", "anger"]' plus any trailing text)
})

predictions_df.to_csv("raw_predictions.csv", index=False)
print("✅ Raw predictions saved to raw_predictions.csv")

✅ Raw predictions saved to raw_predictions.csv


In [None]:
from google.colab import files
files.download("raw_predictions.csv")


In [None]:
# Convert predictions to Kaggle format (binary columns)
kaggle_submission = pd.DataFrame({
    "ID": test_df["ID"],
    **{col: [int(col.lower() in pred.lower()) for pred in predictions] for col in emotion_cols}
})
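
Substring matching can produce false positives when the raw output contains text after the label list (for example, a generated word such as "distrust" would switch on `trust`). As a hedged alternative, the sketch below parses only the first JSON list in each prediction and keeps known classes. The helper names `parse_labels` and `to_binary_row` are hypothetical, and the sample string mirrors the first prediction printed earlier.

```python
import json
import re

EMOTIONS = ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'love',
            'optimism', 'pessimism', 'sadness', 'surprise', 'trust']

def parse_labels(prediction, classes=EMOTIONS):
    """Extract the first JSON list from raw model output and keep only known classes."""
    match = re.search(r"\[.*?\]", prediction, flags=re.DOTALL)
    if not match:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    found = {str(item).strip().lower() for item in items}
    return [c for c in classes if c in found]

def to_binary_row(prediction, classes=EMOTIONS):
    """Map one raw prediction to a dict of 0/1 flags, one per emotion column."""
    labels = set(parse_labels(prediction, classes))
    return {c: int(c in labels) for c in classes}

raw = ': ["anger", "disgust", "fear"] #India'
print(parse_labels(raw))
print(to_binary_row('["joy"] I distrust this'))
```

Predictions with no parseable list fall back to all zeros, which is a design choice worth revisiting if many outputs are malformed.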


In [None]:

# Save
kaggle_submission.to_csv("kaggle_submission.csv", index=False)
print("Submission file saved!")

Submission file saved!


In [None]:
from google.colab import files
files.download("kaggle_submission.csv")


In [None]:
wandb.finish()

Run history:
eval/loss,█▅▄▄▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁
eval/mean_token_accuracy,▁▃▄▅▅▆▆▆▆▇▇▇▇███████
eval/num_tokens,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
eval/runtime,█▁▂▂▂▁▃▂▂▂▂▂▁▂▂▂▂▂▂▂█
eval/samples_per_second,▁█▇▇▇█▆▆▇▇▇▇█▆▆▇▇▇▇▇▁
eval/steps_per_second,▁█▇▇▇█▆▆▇▇▇▇█▆▆▇▇▇▇▇▁
train/epoch,▁▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇█████
train/global_step,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▆▇▇▇▇▇███
train/grad_norm,▇▅▅▄▃▃▅▂▅▂▂▁▃▁▃▅▄▄▅▄▄▄▆▆▅▅▄▅▇▅▅▅▅▇▆█▆▇█▇
train/learning_rate,████▇▇▇▇▇▇▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁

Run summary:
eval/loss,2.73456
eval/mean_token_accuracy,0.54312
eval/num_tokens,756873.0
eval/runtime,41.5734
eval/samples_per_second,27.878
eval/steps_per_second,6.976
total_flos,2673191207043072.0
train/epoch,3.0
train/global_step,1014.0
train/grad_norm,2.09336



#  Conclusion – Emotion Label Generation Completed
 ======================================================
 Final Evaluation Results:
- Eval Loss: 2.73456
- Eval Mean Token Accuracy: 54.312%
- Training Loss: 2.6456

 Observations:
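- The QLoRA approach enabled us to fine-tune the 0.6B-parameter Qwen3 model efficiently using limited GPU resources, training only about 0.27% of its parameters.
- The mean token accuracy is computed over every token in the prompt (instruction, tweet, and labels), so it indicates how well the model fits the full sequence rather than how accurately it predicts the emotion labels. A label-level metric such as F1 on the held-out test split would measure that directly.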
- Further improvements could include prompt engineering and experimenting with larger Qwen models.
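
As a rough aid for interpreting the evaluation loss, the mean per-token cross-entropy can be converted to perplexity by exponentiating it. The sketch below assumes the reported `eval_loss` is an average in nats over all unmasked tokens; the function name `perplexity` is illustrative.

```python
import math

def perplexity(mean_cross_entropy):
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)

# Eval loss reported in the run summary
print(f"{perplexity(2.73456):.1f}")
```

Since the loss here covers tweet text as well as labels, this perplexity mostly reflects how predictable the tweets are, not the difficulty of the labeling task.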

#  W&B Run Link (Results & Logs):
https://wandb.ai/mourlayetraore120-the-university-of-texas-at-dallas/nlp-emotion-classification/table?nw=nwusermourlayetraore120

