# Quantifying the Environmental Cost of AI: Carbon Emissions in Language Model Fine-Tuning for Question Answering

> ### **Project Goal**
>
> As language models play a growing role in natural language processing, their environmental impact has become an important concern. Most research in this area focuses on improving model accuracy, while the energy use and carbon footprint of training are often overlooked or poorly documented. This project explores that imbalance by studying how gains in model performance relate to the environmental cost of fine-tuning.

# Training Strategy 1: Full Fine-Tuning (Model DistilBERT)

In [1]:
!pip install transformers
!pip install datasets
!pip install accelerate
!pip install codecarbon
!pip install evaluate



In [2]:
# Import the required libraries
import json
import os
from collections import defaultdict

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import torch
from codecarbon import EmissionsTracker
from datasets import Dataset, load_dataset
from google.colab import drive
from plotly.subplots import make_subplots
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    default_data_collator,
    pipeline,
)

drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## STEP 1: Loading The Stanford Question Answering Dataset (SQuAD)

In [3]:
squad = load_dataset("squad_v2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [4]:
print("SQuAD Format: ",squad)
print(f"\nFull training set size: {len(squad['train'])}")
print(f"\nValidation set size: {len(squad['validation'])}")

SQuAD Format:  DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 130319
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 11873
    })
})

Full training set size: 130319

Validation set size: 11873


In [5]:
train_data = pd.DataFrame(squad['train'])
train_data.head()

Unnamed: 0,id,title,context,question,answers
0,56be85543aeaaa14008c9063,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,When did Beyonce start becoming popular?,"{'text': ['in the late 1990s'], 'answer_start'..."
1,56be85543aeaaa14008c9065,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,What areas did Beyonce compete in when she was...,"{'text': ['singing and dancing'], 'answer_star..."
2,56be85543aeaaa14008c9066,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,When did Beyonce leave Destiny's Child and bec...,"{'text': ['2003'], 'answer_start': [526]}"
3,56bf6b0f3aeaaa14008c9601,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,In what city and state did Beyonce grow up?,"{'text': ['Houston, Texas'], 'answer_start': [..."
4,56bf6b0f3aeaaa14008c9602,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,In which decade did Beyonce become famous?,"{'text': ['late 1990s'], 'answer_start': [276]}"


## STEP 2: Tokenization Function For The Model

In [6]:
# AutoTokenizer automatically selects the correct tokenizer for the given model

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

In [7]:
def preprocess_function(examples):
    """Convert raw SQuAD examples into model-ready training features."""
    questions = [q.strip() for q in examples["question"]]
    contexts = [c.strip() for c in examples["context"]]

    # Tokenize
    tokenized = tokenizer(
        questions,
        contexts,
        max_length=384,
        stride=128,
        padding="max_length",
        truncation="only_second",         # Truncate only the context, never the question
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
    )

    # Mapping back to original samples
    sample_mapping = tokenized.pop("overflow_to_sample_mapping")
    offset_mapping = tokenized["offset_mapping"]

    start_positions = []
    end_positions = []

    for i, offsets in enumerate(offset_mapping):
        sample_idx = sample_mapping[i]
        answers = examples["answers"][sample_idx]

        # No-answer case: label both positions as token 0 ([CLS])
        if len(answers["answer_start"]) == 0:
            start_positions.append(0)
            end_positions.append(0)
            continue

        start_char = answers["answer_start"][0]
        end_char = start_char + len(answers["text"][0])

        seq_ids = tokenized.sequence_ids(i)

        # Find context section
        context_start = seq_ids.index(1) if 1 in seq_ids else 0
        context_end = len(seq_ids) - 1 - seq_ids[::-1].index(1) if 1 in seq_ids else len(seq_ids) - 1

        # If answer not inside context - mark no answer
        if not (offsets[context_start][0] <= start_char and offsets[context_end][1] >= end_char):
            start_positions.append(0)
            end_positions.append(0)
            continue

        # Find start token
        token_start = context_start
        while token_start <= context_end and offsets[token_start][0] <= start_char:
            token_start += 1
        start_positions.append(token_start - 1)

        # Find end token - move forward until we pass answer end
        token_end = context_start
        while token_end <= context_end and offsets[token_end][1] < end_char:
            token_end += 1
        end_positions.append(token_end)

    tokenized["start_positions"] = start_positions
    tokenized["end_positions"] = end_positions

    return tokenized
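
The label-alignment step inside `preprocess_function` can be illustrated on a toy input. The following is a minimal sketch rather than the notebook's code: it assumes a whitespace tokenizer in place of the DistilBERT tokenizer, and `char_span_to_token_span` is a hypothetical helper name. It runs the same two scans over character offsets to turn an answer's character span into inclusive start and end token indices.

```python
def char_span_to_token_span(offsets, start_char, end_char):
    """Map a character span to inclusive (start_token, end_token) indices.

    `offsets` is a list of (char_start, char_end) pairs, one per token.
    Returns (0, 0) when the span is not fully covered, mirroring the
    no-answer convention used in preprocess_function.
    """
    if not offsets or offsets[0][0] > start_char or offsets[-1][1] < end_char:
        return 0, 0

    # Last token that starts at or before the answer's first character
    start = 0
    while start < len(offsets) and offsets[start][0] <= start_char:
        start += 1
    start -= 1

    # First token that ends at or after the answer's last character
    end = 0
    while end < len(offsets) and offsets[end][1] < end_char:
        end += 1

    return start, end


# Toy whitespace "tokenizer" (an assumption made only for illustration)
text = "rose to fame in the late 1990s"
offsets, pos = [], 0
for word in text.split(" "):
    offsets.append((pos, pos + len(word)))
    pos += len(word) + 1

answer = "in the late 1990s"
start_char = text.index(answer)
span = char_span_to_token_span(offsets, start_char, start_char + len(answer))
tokens = text.split(" ")[span[0]:span[1] + 1]
print(span, tokens)
```

Both indices are inclusive, which is why the decoding check in the notebook slices up to `end_positions + 1`.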

In [8]:
print("========== Data Format Within SQuAD Training Set ==========")
print("\nQuestion at Index[0]: ", squad["train"][0]['question'])
print("\nContext at Index[0]: ", squad["train"][0]['context'])
print("\nAnswers at Index[0]: ", squad["train"][0]['answers'])

# Test preprocess_function on a single example
sample = {
    "question": [squad["train"][0]['question']],
    "context": [squad["train"][0]['context']],
    "answers": [squad["train"][0]['answers']]
}

output = preprocess_function(sample)
print("\n========== Data Format After Preprocessing ==========")

for k, v in output.items():
    print('\n', k, ":", v[:5] if isinstance(v, list) else v)

# Decode the labeled token span to check the start/end position mapping
labeled_answer = tokenizer.decode(output['input_ids'][0][output['start_positions'][0]:output['end_positions'][0]+1])
print(f"\nLabeled Answer Span: '{labeled_answer}'")


Question at Index[0]:  When did Beyonce start becoming popular?

Context at Index[0]:  Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".

Answers at Index[0]:  {'text': ['in the late 1990s'], 'answer_start': [269]}


 input_ids : [[101, 2043, 2106, 20773, 2707, 3352, 2759, 1029, 102, 20773, 21025, 19358, 22815, 1011, 5708, 1006, 1013, 12170, 23432, 297

In [9]:
# Preprocess validation set (full)
print("\nPreprocessing validation set...")
tokenized_validation = squad["validation"].map(
    preprocess_function,
    batched=True,
    remove_columns=squad["validation"].column_names
)


Preprocessing validation set...


In [10]:
tokenized_validation.features

{'input_ids': List(Value('int32')),
 'attention_mask': List(Value('int8')),
 'offset_mapping': List(List(Value('int64'))),
 'start_positions': Value('int64'),
 'end_positions': Value('int64')}
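
Because the tokenizer is called with `return_overflowing_tokens=True` and `stride=128`, a long context can produce several features, so the number of tokenized rows can exceed the number of questions. The sketch below shows the idea under simplifying assumptions: it treats `stride` as the number of tokens shared by consecutive windows, and it ignores the special tokens and the question prefix that also occupy part of `max_length`. `sliding_windows` is a hypothetical helper, not a library function.

```python
def sliding_windows(n_tokens, capacity, overlap):
    """Split n_tokens context tokens into windows of at most `capacity`
    tokens, where consecutive windows share `overlap` tokens."""
    if capacity <= overlap:
        raise ValueError("capacity must be larger than overlap")
    spans = []
    start = 0
    while True:
        end = min(start + capacity, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += capacity - overlap  # advance by the non-overlapping part
    return spans


# Toy numbers, chosen only for illustration
toy = sliding_windows(n_tokens=10, capacity=4, overlap=1)
print(toy)

# A 600-token context with roughly 370 free slots per window
long_context = sliding_windows(n_tokens=600, capacity=370, overlap=128)
print(len(long_context), long_context)
```

An answer that falls outside a given window is labeled as no-answer for that feature, which is what the containment check in `preprocess_function` handles.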

In [11]:
# Prepare a tokenized subset whose size is a fraction of the training data.

def prepare_dataset(train_data, size_fraction, preprocess_fn):
    """Create and preprocess a subset containing the first `size_fraction` of the training rows."""
    num_samples = int(len(train_data) * size_fraction)
    train_subset = train_data.select(range(num_samples))

    print(f"Preprocessing {num_samples} training samples...")
    tokenized_train = train_subset.map(
        preprocess_fn,
        batched=True,
        remove_columns=train_subset.column_names
    )

    return tokenized_train, num_samples
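
`prepare_dataset` takes the first `num_samples` rows, and the preview above shows consecutive rows drawn from the same article, so a prefix may cover fewer topics than a random sample of the same size. If a random subset is preferred, one option is to draw seeded indices and pass them to `select`. This is a sketch of an alternative, not what the notebook does; `subset_indices` and the seed value are assumptions.

```python
import random


def subset_indices(total, fraction, seed=42):
    """Return a sorted, reproducible random sample of row indices."""
    num_samples = int(total * fraction)
    rng = random.Random(seed)  # local generator, leaves global state untouched
    return sorted(rng.sample(range(total), num_samples))


# 130319 is the training set size printed earlier in the notebook
indices = subset_indices(total=130319, fraction=0.25)
print(len(indices), indices[:5])

# Hypothetical usage with the notebook's dataset object:
# train_subset = squad["train"].select(indices)
```

The subset sizes stay identical to the prefix-based version, so energy and emissions figures remain comparable across the two sampling schemes.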

## STEP 3: Functions For Training The DistilBERT Model

In [12]:
#Model Architecture:
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
print(f"\n{'='*80}")
print("\nDistilBERT Model Architecture:")
print(f"\n{'='*80}")
print("\nTransformer layers:",model.config.n_layers)
print("\nHidden size:",model.config.dim)
print('\nIntermediate feed-forward size:',model.config.hidden_dim)
print("\nAttention heads:",model.config.n_heads)
print("\nMax positional embeddings:", model.config.max_position_embeddings)
print("\nVocabulary size:", model.config.vocab_size)

# Parameter Count
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print("\nTotal parameters:", f"{total_params:,}")
print("\nTrainable parameters:", f"{trainable_params:,}")


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.




DistilBERT Model Architecture:


Transformer layers: 6

Hidden size: 768

Intermediate feed-forward size: 3072

Attention heads: 12

Max positional embeddings: 512

Vocabulary size: 30522

Total parameters: 66,364,418

Trainable parameters: 66,364,418
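
The printed parameter total can be rebuilt from the configuration values above. The sketch assumes the usual DistilBERT layout: word and position embeddings followed by a LayerNorm, then per layer four attention projections, a two-layer feed-forward block and two LayerNorms, plus a two-output linear QA head (`qa_outputs`). The variable names are illustrative.

```python
# Configuration values as printed by the cell above
vocab_size, max_positions = 30522, 512
dim, hidden_dim, n_layers = 768, 3072, 6


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


layer_norm = 2 * dim  # scale and shift vectors

# Word embeddings + position embeddings + embedding LayerNorm
embeddings = vocab_size * dim + max_positions * dim + layer_norm

# Query, key, value and output projections
attention = 4 * linear(dim, dim)

# Feed-forward block: dim -> hidden_dim -> dim
feed_forward = linear(dim, hidden_dim) + linear(hidden_dim, dim)

per_layer = attention + feed_forward + 2 * layer_norm

# Start/end logits head added for question answering
qa_head = linear(dim, 2)

total = embeddings + n_layers * per_layer + qa_head
print(f"Embeddings: {embeddings:,}")
print(f"Per transformer layer: {per_layer:,}")
print(f"QA head: {qa_head:,}")
print(f"Total: {total:,}")
```

Only the 1,538 parameters of the QA head are newly initialized, which matches the `qa_outputs.weight` and `qa_outputs.bias` warning; under full fine-tuning every parameter is trainable.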


In [13]:
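# Custom compute_metrics function: exact match and F1 over predicted vs. labeled
# token positions, computed per tokenized feature (not the official text-level SQuAD metric)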
def compute_metrics(pred):
    predictions, labels = pred
    start_preds = np.argmax(predictions[0], axis=1)
    end_preds = np.argmax(predictions[1], axis=1)

    start_true = labels[0]
    end_true = labels[1]

    # Calculate exact match
    exact_matches = ((start_preds == start_true) & (end_preds == end_true)).sum()
    exact_match = exact_matches / len(start_true)

    # Calculate F1 score (token overlap)
    f1_scores = []
    for start_p, end_p, start_t, end_t in zip(start_preds, end_preds, start_true, end_true):
        pred_tokens = set(range(start_p, end_p + 1))
        true_tokens = set(range(start_t, end_t + 1))

        if len(pred_tokens) == 0 and len(true_tokens) == 0:
            f1_scores.append(1.0)
        elif len(pred_tokens) == 0 or len(true_tokens) == 0:
            f1_scores.append(0.0)
        else:
            overlap = len(pred_tokens & true_tokens)
            precision = overlap / len(pred_tokens)
            recall = overlap / len(true_tokens)
            f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
            f1_scores.append(f1)

    avg_f1 = np.mean(f1_scores)

    return {
        "exact_match": exact_match,
        "f1": avg_f1
    }
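
A worked example makes the position-overlap F1 concrete. The sketch below repeats the per-feature arithmetic of `compute_metrics` for single spans; `span_f1` is a hypothetical helper and the spans are made-up values.

```python
def span_f1(pred_start, pred_end, true_start, true_end):
    """F1 between two inclusive token-index spans, using the same
    set-overlap arithmetic as compute_metrics."""
    pred_tokens = set(range(pred_start, pred_end + 1))
    true_tokens = set(range(true_start, true_end + 1))

    if not pred_tokens and not true_tokens:
        return 1.0
    if not pred_tokens or not true_tokens:
        return 0.0

    overlap = len(pred_tokens & true_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


# Prediction covers tokens 3..6, label covers tokens 4..6:
# overlap = 3, precision = 3/4, recall = 3/3
partial = span_f1(3, 6, 4, 6)

# Identical spans, including the (0, 0) no-answer label
exact = span_f1(0, 0, 0, 0)

# Predicted end before predicted start gives an empty prediction
inverted = span_f1(7, 5, 4, 6)

print(round(partial, 4), exact, inverted)
```

Since a predicted end index smaller than the start index yields an empty set, such predictions score zero against any non-empty label.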

In [14]:
def train_model(tokenized_train, tokenized_eval, tokenizer, compute_metrics_fn, size_fraction, model_name):

    # Load fresh model
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Setup output directory
    output_dir = f"results_distilbert_{int(size_fraction*100)}pct"

    # Training arguments
    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        learning_rate=3e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=2,
        weight_decay=0.01,
        fp16=torch.cuda.is_available(),
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        greater_is_better=True,
        push_to_hub=False,
        logging_steps=100,
        # Disable logging integrations (e.g. W&B) so that no interactive
        # login prompt runs while emissions are being tracked
        report_to="none"
    )

    # Initialize trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        tokenizer=tokenizer,
        data_collator=default_data_collator,
        compute_metrics=compute_metrics_fn
    )

    # Start carbon tracking
    tracker = EmissionsTracker(
        project_name=f"DistilBERT_{int(size_fraction*100)}pct",
        output_dir=output_dir
    )
    tracker.start()

    # Train; stop carbon tracking even if training raises
    print("Training model...")
    try:
        train_results = trainer.train()
    finally:
        tracker.stop()

    emissions_data = tracker.final_emissions_data

    return trainer, train_results, emissions_data, output_dir


## STEP 4: Functions For Evaluating And Saving The Results

> We train the model on subsets of the SQuAD training set of different sizes.
>
> Training Data Variation: [25%, 50%, 80%]

In [15]:
def evaluate_and_save(trainer, train_results, emissions_data, output_dir, size_fraction, num_samples):
    """Evaluate model, print results, and save artifacts."""

    # Evaluate
    print("Evaluating model...")
    eval_results = trainer.evaluate()

    # Count trainable parameters (all of them under full fine-tuning)
    model = trainer.model
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in model.parameters())
    trainable_percentage = 100 * trainable_params / total_params

    # Compile results
    result_entry = {
        "training_method": "Full Fine-Tuning",
        "model_name": "DistilBERT",
        "dataset_size%": int(size_fraction*100),
        "train_samples": num_samples,
        "valid_samples": len(trainer.eval_dataset),
        "trainable_params": trainable_params,
        "total_params": total_params,
        "trainable_percentage": trainable_percentage,

        # Performance metrics
        "f1_score": eval_results["eval_f1"],
        "exact_match": eval_results["eval_exact_match"],
        "eval_loss": eval_results["eval_loss"],
        "training_time_hours": train_results.metrics["train_runtime"] / 3600,

        # Emissions data
        "emissions_rate_kg_per_s": emissions_data.emissions_rate,
        "emissions_kg": emissions_data.emissions,
        "timestamp": emissions_data.timestamp,
        "duration_seconds": emissions_data.duration,
        "duration_hours": emissions_data.duration / 3600,

        # Energy consumption
        "energy_consumed_kwh": emissions_data.energy_consumed,
        "cpu_energy_kwh": emissions_data.cpu_energy,
        "gpu_energy_kwh": emissions_data.gpu_energy,
        "ram_energy_kwh": emissions_data.ram_energy,

        # Power draw
        "cpu_power_w": emissions_data.cpu_power,
        "gpu_power_w": emissions_data.gpu_power,
        "ram_power_w": emissions_data.ram_power,

        # Location and system info
        "country_name": emissions_data.country_name,
        "country_iso_code": emissions_data.country_iso_code,
        "region": emissions_data.region,
        "cloud_provider": emissions_data.cloud_provider,
        "cloud_region": emissions_data.cloud_region,
        "on_cloud": emissions_data.on_cloud,

        # System specifications
        "os": emissions_data.os,
        "python_version": emissions_data.python_version,
        "cpu_count": emissions_data.cpu_count,
        "cpu_model": emissions_data.cpu_model,
        "gpu_count": emissions_data.gpu_count,
        "gpu_model": emissions_data.gpu_model,
        "ram_total_size_gb": emissions_data.ram_total_size,

        # Additional metrics
        "pue": emissions_data.pue,
        "codecarbon_version": emissions_data.codecarbon_version,
    }

    # Print summary
    print(f"\n{'='*80}")
    print(f"\nFINE-TUNING RESULTS SUMMARY FOR {size_fraction*100}% DATASET:")
    print(f"{'='*80}")
    print(f"Training Method: Full Fine-Tuning")
    print(f"Model: DistilBERT")

    print(f"\nModel Parameters:")
    print(f"Total Parameters: {total_params:,}")
    print(f"Trainable Parameters: {trainable_params:,}")
    print(f"Trainable Percentage: {trainable_percentage:.2f}%")


    print(f"\nPerformance Metrics:")
    print(f"F1 Score: {eval_results['eval_f1']:.4f}")
    print(f"Exact Match: {eval_results['eval_exact_match']:.4f}")
    print(f"Eval Loss: {eval_results['eval_loss']:.4f}")

    print(f"\nEnergy Consumption:")
    print(f"Total Energy: {emissions_data.energy_consumed:.6f} kWh")
    print(f"CPU Energy: {emissions_data.cpu_energy:.6f} kWh ({emissions_data.cpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"GPU Energy: {emissions_data.gpu_energy:.6f} kWh ({emissions_data.gpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"RAM Energy: {emissions_data.ram_energy:.6f} kWh ({emissions_data.ram_energy/emissions_data.energy_consumed*100:.1f}%)")

    print(f"\nAverage Power Draw:")
    print(f"CPU Power: {emissions_data.cpu_power:.2f} W")
    print(f"GPU Power: {emissions_data.gpu_power:.2f} W")
    print(f"RAM Power: {emissions_data.ram_power:.2f} W")
    print(f"Total Power: {emissions_data.cpu_power + emissions_data.gpu_power + emissions_data.ram_power:.2f} W")

    print(f"\nCarbon Footprint:")
    print(f"Total CO2 Emissions: {emissions_data.emissions:.6f} kg")
    print(f"Emissions Rate: {emissions_data.emissions_rate:.9f} kg/s")
    print(f"Duration: {emissions_data.duration/3600:.2f} hours")
    print(f"Training Time (Trainer): {train_results.metrics['train_runtime']/3600:.2f} hours")

    print(f"\nLocation & Infrastructure:")
    print(f"Country: {emissions_data.country_name} ({emissions_data.country_iso_code})")
    print(f"Region: {emissions_data.region}")
    print(f"On Cloud: {emissions_data.on_cloud}")
    print(f"PUE (Power Usage Effectiveness): {emissions_data.pue}")

    print(f"\nSystem Specifications:")
    print(f"OS: {emissions_data.os}")
    print(f"CPU: {emissions_data.cpu_model} ({emissions_data.cpu_count} cores)")
    if emissions_data.gpu_count and emissions_data.gpu_model:
        print(f"GPU: {emissions_data.gpu_model} (Count: {emissions_data.gpu_count})")
    else:
        print(f"GPU: None detected")
    print(f"RAM: {emissions_data.ram_total_size:.2f} GB")
    print(f"Python: {emissions_data.python_version}")

    print(f"\n{'='*80}")

    # Save model
    trainer.save_model(f"{output_dir}/final_model")

    # Clear GPU memory
    del trainer.model
    del trainer
    torch.cuda.empty_cache()

    return result_entry

In [16]:
def run_experiment(size_fraction, train_data, eval_data, tokenizer, preprocess_fn, compute_metrics_fn, model_name):

    print(f"\n{'='*60}")
    print(f"Training with {size_fraction*100}% of training data")
    print(f"{'='*60}")

    # Step 1: Prepare dataset
    tokenized_train, num_samples = prepare_dataset(train_data, size_fraction, preprocess_fn)

    # Step 2: Train model
    trainer, train_results, emissions_data, output_dir = train_model(
        tokenized_train, eval_data, tokenizer, compute_metrics_fn,
        size_fraction, model_name
    )

    # Step 3: Evaluate and save
    result_entry = evaluate_and_save(trainer, train_results, emissions_data, output_dir, size_fraction, num_samples)

    return result_entry

In [17]:
# Store results
results_summary = []

In [18]:
%%time
# Use 25% of the data to train the model
print("\n" + "="*80)
print("EXPERIMENT 1: FULL FINE-TUNING WITH 25.0% TRAINING DATASET")
print("="*80)
result1 = run_experiment(
        size_fraction=0.25,
        train_data=squad["train"],
        eval_data=tokenized_validation,
        tokenizer=tokenizer,
        preprocess_fn=preprocess_function,
        compute_metrics_fn=compute_metrics,
        model_name="distilbert-base-uncased"
    )

results_summary.append(result1)


EXPERIMENT 1: FULL FINE-TUNING WITH 25.0% TRAINING DATASET

Training with 25.0% of training data
Preprocessing 32579 training samples...


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(
[codecarbon INFO @ 03:42:40] [setup] RAM Tracking...
[codecarbon INFO @ 03:42:40] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 03:42:41] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 03:42:41] [setup] GPU Tracking...
[codecarbon INFO @ 03:42:41] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 03:42:41] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: py

Training model...


[codecarbon INFO @ 03:42:57] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 03:42:57] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:42:57] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 03:42:57] Energy consumed for all GPUs : 0.000235 kWh. Total GPU Power : 56.44876751472431 W
[codecarbon INFO @ 03:42:57] 0.000571 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:43:12] Energy consumed for RAM : 0.000317 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 03:43:12] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:43:12] Energy consumed for All CPU : 0.000354 kWh
[codecarbon INFO @ 03:43:12] Energy consumed for all GPUs : 0.000471 kWh. Total GPU Power : 56.607699745992754 W
[codecarbon INFO @ 03:43:12] 0.001142 kWh of electricity and 0.000000 L of water were used since the beginning.

[codecarbon INFO @ 03:43:27] Energy consumed for RAM : 0.000475 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 03:43:27] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:43:27] Energy consumed for All CPU : 0.000531 kWh
[codecarbon INFO @ 03:43:27] Energy consumed for all GPUs : 0.000709 kWh. Total GPU Power : 57.023969482846525 W
[codecarbon INFO @ 03:43:27] 0.001715 kWh of electricity and 0.000000 L of water were used since the beginning.


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.3151,1.758805,0.380419,0.448923
2,0.9479,2.111456,0.368469,0.447256


[codecarbon INFO @ 03:43:42] Energy consumed for RAM : 0.000633 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 03:43:42] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:43:42] Energy consumed for All CPU : 0.000708 kWh
[codecarbon INFO @ 03:43:42] Energy consumed for all GPUs : 0.001562 kWh. Total GPU Power : 204.89020364158264 W
[codecarbon INFO @ 03:43:42] 0.002903 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:43:43] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 03:43:43] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:43:43] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 03:43:43] Energy consumed for all GPUs : 0.000863 kWh. Total GPU Power : 206.94350789592767 W
[codecarbon INFO @ 03:43:43] 0.001198 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:

Evaluating model...




FINE-TUNING RESULTS SUMMARY FOR 25.0% DATASET:
Training Method: Full Fine-Tuning
Model: DistilBERT

Model Parameters:
Total Parameters: 66,364,418
Trainable Parameters: 66,364,418
Trainable Percentage: 100.00%

Performance Metrics:
F1 Score: 0.4489
Exact Match: 0.3804
Eval Loss: 1.7588

Energy Consumption:
Total Energy: 0.016602 kWh
CPU Energy: 0.002620 kWh (15.8%)
GPU Energy: 0.011639 kWh (70.1%)
RAM Energy: 0.002343 kWh (14.1%)

Average Power Draw:
CPU Power: 42.50 W
GPU Power: 188.50 W
RAM Power: 38.00 W
Total Power: 269.00 W

Carbon Footprint:
Total CO2 Emissions: 0.007816 kg
Emissions Rate: 0.000035197 kg/s
Duration: 0.06 hours
Training Time (Trainer): 0.06 hours

Location & Infrastructure:
Country: Singapore (SGP)
Region: 
On Cloud: N
PUE (Power Usage Effectiveness): 1.0

System Specifications:
OS: Linux-6.6.105+-x86_64-with-glibc2.35
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (12 cores)
GPU: 1 x NVIDIA A100-SXM4-40GB (Count: 1)
RAM: 83.47 GB
Python: 3.12.12

CPU times: user 3min 10s,

In [19]:
%%time
# Use 50% of the data to train the model
print("\n" + "="*80)
print("EXPERIMENT 2: FULL FINE-TUNING WITH 50.0% TRAINING DATASET")
print("="*80)
result2 = run_experiment(
        size_fraction=0.5,
        train_data=squad["train"],
        eval_data=tokenized_validation,
        tokenizer=tokenizer,
        preprocess_fn=preprocess_function,
        compute_metrics_fn=compute_metrics,
        model_name="distilbert-base-uncased"
    )
results_summary.append(result2)


EXPERIMENT 2: FULL FINE-TUNING WITH 50.0% TRAINING DATASET

Training with 50.0% of training data
Preprocessing 65159 training samples...


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(
[codecarbon INFO @ 03:47:17] [setup] RAM Tracking...
[codecarbon INFO @ 03:47:17] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 03:47:18] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 03:47:18] [setup] GPU Tracking...
[codecarbon INFO @ 03:47:18] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 03:47:18] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: py

Training model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.2324,1.468583,0.451376,0.530516
2,0.9194,1.446385,0.491594,0.570179


[codecarbon INFO @ 03:47:35] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 03:47:35] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:47:35] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 03:47:35] Energy consumed for all GPUs : 0.000909 kWh. Total GPU Power : 218.01188906121504 W
[codecarbon INFO @ 03:47:35] 0.001245 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:47:35] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 03:47:35] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:47:35] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 03:47:35] Energy consumed for all GPUs : 0.000931 kWh. Total GPU Power : 223.15882601651322 W
[codecarbon INFO @ 03:47:35] 0.001266 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:

Evaluating model...




FINE-TUNING RESULTS SUMMARY FOR 50.0% DATASET:
Training Method: Full Fine-Tuning
Model: DistilBERT

Model Parameters:
Total Parameters: 66,364,418
Trainable Parameters: 66,364,418
Trainable Percentage: 100.00%

Performance Metrics:
F1 Score: 0.5702
Exact Match: 0.4916
Eval Loss: 1.4464

Energy Consumption:
Total Energy: 0.028109 kWh
CPU Energy: 0.003889 kWh (13.8%)
GPU Energy: 0.020743 kWh (73.8%)
RAM Energy: 0.003477 kWh (12.4%)

Average Power Draw:
CPU Power: 42.50 W
GPU Power: 226.55 W
RAM Power: 38.00 W
Total Power: 307.05 W

Carbon Footprint:
Total CO2 Emissions: 0.013233 kg
Emissions Rate: 0.000040148 kg/s
Duration: 0.09 hours
Training Time (Trainer): 0.09 hours

Location & Infrastructure:
Country: Singapore (SGP)
Region: 
On Cloud: N
PUE (Power Usage Effectiveness): 1.0

System Specifications:
OS: Linux-6.6.105+-x86_64-with-glibc2.35
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (12 cores)
GPU: 1 x NVIDIA A100-SXM4-40GB (Count: 1)
RAM: 83.47 GB
Python: 3.12.12

CPU times: user 7min 6s, 

In [20]:
%%time
# Use 80% of the data to train the model
print("\n" + "="*80)
print("EXPERIMENT 3: FULL FINE-TUNING WITH 80.0% TRAINING DATASET")
print("="*80)
result3 = run_experiment(
        size_fraction=0.8,
        train_data=squad["train"],
        eval_data=tokenized_validation,
        tokenizer=tokenizer,
        preprocess_fn=preprocess_function,
        compute_metrics_fn=compute_metrics,
        model_name="distilbert-base-uncased"
    )
results_summary.append(result3)


EXPERIMENT 3: FULL FINE-TUNING WITH 80.0% TRAINING DATASET

Training with 80.0% of training data
Preprocessing 104255 training samples...


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(
[codecarbon INFO @ 03:54:05] [setup] RAM Tracking...
[codecarbon INFO @ 03:54:05] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 03:54:06] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 03:54:06] [setup] GPU Tracking...
[codecarbon INFO @ 03:54:06] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 03:54:06] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: py

Training model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.2574,1.224591,0.521675,0.593698
2,0.895,1.299863,0.530246,0.609413


[codecarbon INFO @ 03:54:23] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 03:54:23] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:54:23] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 03:54:23] Energy consumed for all GPUs : 0.000894 kWh. Total GPU Power : 214.5089546613244 W
[codecarbon INFO @ 03:54:23] 0.001230 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:54:23] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 03:54:23] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:54:23] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 03:54:23] Energy consumed for all GPUs : 0.000917 kWh. Total GPU Power : 220.09134996871038 W
[codecarbon INFO @ 03:54:23] 0.001253 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:5

Evaluating model...




FINE-TUNING RESULTS SUMMARY FOR 80.0% DATASET:
Training Method: Full Fine-Tuning
Model: DistilBERT

Model Parameters:
Total Parameters: 66,364,418
Trainable Parameters: 66,364,418
Trainable Percentage: 100.00%

Performance Metrics:
F1 Score: 0.6094
Exact Match: 0.5302
Eval Loss: 1.2999

Energy Consumption:
Total Energy: 0.044046 kWh
CPU Energy: 0.006057 kWh (13.8%)
GPU Energy: 0.032574 kWh (74.0%)
RAM Energy: 0.005415 kWh (12.3%)

Average Power Draw:
CPU Power: 42.50 W
GPU Power: 226.81 W
RAM Power: 38.00 W
Total Power: 307.31 W

Carbon Footprint:
Total CO2 Emissions: 0.020736 kg
Emissions Rate: 0.000040396 kg/s
Duration: 0.14 hours
Training Time (Trainer): 0.14 hours

Location & Infrastructure:
Country: Singapore (SGP)
Region: 
On Cloud: N
PUE (Power Usage Effectiveness): 1.0

System Specifications:
OS: Linux-6.6.105+-x86_64-with-glibc2.35
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (12 cores)
GPU: 1 x NVIDIA A100-SXM4-40GB (Count: 1)
RAM: 83.47 GB
Python: 3.12.12

CPU times: user 11min 5s,
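
The summary above can be cross-checked with a little arithmetic. The sketch below is an illustrative addition rather than part of the original run, and it uses only the figures printed above. It assumes that emissions are derived as energy multiplied by a grid carbon intensity, so the intensity is back-computed from the reported totals rather than looked up. It also treats the printed power values as averages over the run, which may not hold exactly.

```python
# Figures copied from the 80% full fine-tuning summary above.
cpu_kwh, gpu_kwh, ram_kwh = 0.006057, 0.032574, 0.005415
total_kwh = 0.044046
emissions_kg = 0.020736
emissions_rate_kg_per_s = 0.000040396
total_power_w = 42.50 + 226.81 + 38.00

# 1) The component energies should add up to the reported total.
component_sum_kwh = cpu_kwh + gpu_kwh + ram_kwh

# 2) Implied grid carbon intensity (kg CO2 per kWh), assuming
#    emissions = energy * intensity.
implied_intensity = emissions_kg / total_kwh

# 3) Implied run duration from the emissions rate, and the energy that the
#    reported power draw would produce over that duration.
duration_s = emissions_kg / emissions_rate_kg_per_s
energy_from_power_kwh = total_power_w * duration_s / 3600 / 1000

print(f"Component sum:     {component_sum_kwh:.6f} kWh (reported {total_kwh:.6f})")
print(f"Implied intensity: {implied_intensity:.4f} kg CO2/kWh")
print(f"Implied duration:  {duration_s:.1f} s ({duration_s / 3600:.2f} h)")
print(f"Power x time:      {energy_from_power_kwh:.6f} kWh")
```

On these numbers the implied intensity is roughly 0.47 kg CO2/kWh, and power multiplied by time lands within about 1% of the reported total energy, so the summary is internally consistent.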

### STEP 4.1: Results and Analysis

In [21]:
# Create summary DataFrame
results_df = pd.DataFrame(results_summary)

print("\n" + "="*60)
print("FINAL RESULTS SUMMARY")
print("="*60)
print(results_df.to_string(index=False))


FINAL RESULTS SUMMARY
 training_method model_name  dataset_size%  train_samples  valid_samples  trainable_params  total_params  trainable_percentage  f1_score  exact_match  eval_loss  training_time_hours  emissions_rate_kg_per_s  emissions_kg           timestamp  duration_seconds  duration_hours  energy_consumed_kwh  cpu_energy_kwh  gpu_energy_kwh  ram_energy_kwh  cpu_power_w  gpu_power_w  ram_power_w country_name country_iso_code region cloud_provider cloud_region on_cloud                                   os python_version  cpu_count                      cpu_model  gpu_count                 gpu_model  ram_total_size_gb  pue codecarbon_version
Full Fine-Tuning DistilBERT             25          32579          12134          66364418      66364418                 100.0  0.448923     0.380419   1.758805             0.061552                 0.000035      0.007816 2025-12-11T03:46:25        222.059212        0.061683             0.016602        0.002620        0.011639        0.002343   

In [22]:
results_df.to_csv("/content/drive/MyDrive/distilbert_dataset_size_results.csv", index=False)

In [23]:
# Load the dataset
full_ft_results = pd.read_csv("/content/drive/MyDrive/distilbert_dataset_size_results.csv")

print("Data loaded successfully!")
print(f"Total experiments: {len(full_ft_results)}")
print("\nExperiments:")
print(full_ft_results[['train_samples', 'dataset_size%', 'f1_score', 'emissions_kg']])


Data loaded successfully!
Total experiments: 3

Experiments:
   train_samples  dataset_size%  f1_score  emissions_kg
0          32579             25  0.448923      0.007816
1          65159             50  0.570179      0.013233
2         104255             80  0.609413      0.020736
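
The three rows above make it possible to ask how much F1 each additional gram of CO2 buys as the training set grows. The following is a minimal sketch added for illustration; the helper name `marginal_gains` is hypothetical, and the numbers are copied from the table above rather than read from the CSV.

```python
# Rows copied from the table above: (dataset %, train samples, F1, CO2 kg).
runs = [
    (25, 32579, 0.448923, 0.007816),
    (50, 65159, 0.570179, 0.013233),
    (80, 104255, 0.609413, 0.020736),
]

def marginal_gains(rows):
    """Return (from %, to %, delta F1, delta CO2 in g, F1 points per g) per step."""
    out = []
    for (p0, _, f0, e0), (p1, _, f1, e1) in zip(rows, rows[1:]):
        d_f1 = f1 - f0
        d_g = (e1 - e0) * 1000  # kg -> g
        out.append((p0, p1, d_f1, d_g, 100 * d_f1 / d_g))
    return out

steps = marginal_gains(runs)
for p0, p1, d_f1, d_g, per_g in steps:
    print(f"{p0}% -> {p1}%: F1 {d_f1:+.4f}, CO2 {d_g:+.3f} g, {per_g:.2f} F1 points per g")
```

On these figures, moving from 25% to 50% of the data yields about 2.2 F1 points per extra gram of CO2, while moving from 50% to 80% yields about 0.5, which suggests diminishing returns for full fine-tuning at this scale.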


In [24]:
# PLOT 1: Energy Consumption vs Dataset Size (Stacked Bar)
df_sorted = full_ft_results.sort_values('train_samples')

fig = go.Figure()

fig.add_trace(go.Bar(
    name='CPU Energy',
    x=df_sorted['dataset_size%'],
    y=df_sorted['cpu_energy_kwh'],
    marker_color='#FF6B6B',
    hovertemplate='<b>CPU Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.add_trace(go.Bar(
    name='GPU Energy',
    x=df_sorted['dataset_size%'],
    y=df_sorted['gpu_energy_kwh'],
    marker_color='#4ECDC4',
    hovertemplate='<b>GPU Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.add_trace(go.Bar(
    name='RAM Energy',
    x=df_sorted['dataset_size%'],
    y=df_sorted['ram_energy_kwh'],
    marker_color='#95E1D3',
    hovertemplate='<b>RAM Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.update_layout(
    title=dict(text="Energy Consumption Scaling with Dataset Size", font=dict(size=18)),
    xaxis_title='Dataset Size (%)',
    yaxis_title='Energy Consumption (kWh)',
    barmode='stack',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='x unified'
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_energy_scaling.html")

In [25]:
# PLOT 2: Performance & Emissions Growth (Dual Y-axis)
df_sorted = full_ft_results.sort_values('train_samples')

fig = make_subplots(specs=[[{"secondary_y": True}]])

# F1 Score line
fig.add_trace(
    go.Scatter(
        x=df_sorted['dataset_size%'],
        y=df_sorted['f1_score'],
        name='F1 Score',
        mode='lines+markers',
        line=dict(color='#4ECDC4', width=3),
        marker=dict(size=12, line=dict(width=2, color='white')),
        hovertemplate='<b>F1 Score</b>: %{y:.4f}<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=False
)

# Exact Match line
fig.add_trace(
    go.Scatter(
        x=df_sorted['dataset_size%'],
        y=df_sorted['exact_match'],
        name='Exact Match',
        mode='lines+markers',
        line=dict(color='#95E1D3', width=3, dash='dash'),
        marker=dict(size=10),
        hovertemplate='<b>Exact Match</b>: %{y:.4f}<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=False
)

# CO2 Emissions bar
fig.add_trace(
    go.Bar(
        x=df_sorted['dataset_size%'],
        y=df_sorted['emissions_kg'],
        name='CO₂ Emissions',
        marker_color='#FF6B6B',
        opacity=0.6,
        hovertemplate='<b>CO₂</b>: %{y:.6f} kg<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=True
)

fig.update_xaxes(title_text="Dataset Size (%)")
fig.update_yaxes(title_text="Performance Score", secondary_y=False)
fig.update_yaxes(title_text="CO₂ Emissions (kg)", secondary_y=True)

fig.update_layout(
    title=dict(text="Performance vs Carbon Emissions by Dataset Size", font=dict(size=18)),
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_performance_emissions.html")

# Training Strategy 2: LoRA (Low-Rank Adaptation) Fine-Tuning (Model DistilBERT)

In [26]:
# Import PEFT for LoRA
from peft import LoraConfig, get_peft_model, TaskType, PeftModel

## STEP 5: Creating And Training LoRA Model

In [27]:
def create_lora_model(model_name, r, lora_alpha, lora_dropout=0.1):
    """Load the base QA model and wrap it with LoRA adapters on the query/value projections."""
    # Load base model
    base_model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Configure LoRA
    lora_config = LoraConfig(
        task_type=TaskType.QUESTION_ANS,  # Task type for QA
        r=r,                               # Rank of update matrices
        lora_alpha=lora_alpha,             # Scaling factor
        lora_dropout=lora_dropout,         # Dropout probability
        target_modules=["q_lin", "v_lin"], # Which layers to apply LoRA to
        bias="none",                       # Don't train biases
        inference_mode=False,              # Training mode
    )

    # Apply LoRA to model
    lora_model = get_peft_model(base_model, lora_config)

    # Print trainable parameters
    lora_model.print_trainable_parameters()

    return lora_model
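
To make the effect of `r` concrete: LoRA adds two low-rank matrices to each targeted linear layer, `A` of shape `r x d_in` and `B` of shape `d_out x r`, so each adapted layer contributes `r * (d_in + d_out)` trainable parameters. The sketch below is an illustrative estimate, not part of the original notebook. It assumes DistilBERT-base's 6 transformer layers with hidden size 768, the two targets per layer (`q_lin`, `v_lin`) configured above, and that the QA head (`qa_outputs`) is trained alongside the adapters. The last point is an assumption about how PEFT handles the head for `TaskType.QUESTION_ANS`; the function name `lora_trainable_params` is hypothetical.

```python
def lora_trainable_params(r, hidden=768, n_layers=6, targets_per_layer=2, n_labels=2):
    """Estimate trainable parameters for LoRA on DistilBERT question answering.

    Assumes every targeted layer is a square hidden x hidden projection and
    that the QA head (weights + biases) is trained alongside the adapters.
    """
    per_module = r * (hidden + hidden)          # A: r x hidden, B: hidden x r
    adapters = n_layers * targets_per_layer * per_module
    qa_head = hidden * n_labels + n_labels      # qa_outputs.weight + qa_outputs.bias
    return adapters + qa_head

for rank in (4, 8, 16):
    print(f"r={rank:>2}: {lora_trainable_params(rank):,} trainable parameters")
```

Under these assumptions the estimate gives 75,266, 148,994, and 296,450 parameters for ranks 4, 8, and 16, which agrees with the counts that `print_trainable_parameters()` reports in the runs recorded in this notebook.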

In [28]:
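def train_lora_model(tokenized_train, tokenized_eval, tokenizer, compute_metrics_fn, size_fraction, lora_rank):
    """Train a LoRA-adapted DistilBERT under CodeCarbon tracking; return the trainer, train metrics, emissions data, output directory, and model."""
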
    # Create LoRA model
    print(f"\nCreating LoRA model (rank={lora_rank})...")
    lora_model = create_lora_model(
        model_name="distilbert-base-uncased",
        r=lora_rank,
        lora_alpha=lora_rank * 2,  # Common practice: alpha = 2*r
        lora_dropout=0.1
    )

    # Setup output directory
    output_dir = f"results_distilbert_lora_r{lora_rank}_{int(size_fraction*100)}pct"

    # Training arguments (can use higher learning rate for LoRA)
    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        learning_rate=3e-4,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=2,
        weight_decay=0.01,
        fp16=torch.cuda.is_available(),
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        push_to_hub=False,
        logging_steps=100,
        greater_is_better=True
    )

    # Initialize trainer
    trainer = Trainer(
        model=lora_model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        tokenizer=tokenizer,  # deprecated for Trainer; newer transformers releases expect `processing_class`
        data_collator=default_data_collator,
        compute_metrics=compute_metrics_fn
    )

    # Start carbon tracking
    tracker = EmissionsTracker(
        project_name=f"DistilBERT_LoRA_r{lora_rank}_{int(size_fraction*100)}pct",
        output_dir=output_dir,
        save_to_file=True,
        log_level="info"
    )
    tracker.start()

    # Train
    print("Training LoRA model...")
    train_results = trainer.train()

    # Stop tracking and get detailed emissions data
    emissions_kg = tracker.stop()

    # Get full emissions data object
    emissions_data = tracker.final_emissions_data

    return trainer, train_results, emissions_data, output_dir, lora_model
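
In `train_lora_model`, `tracker.stop()` is only reached if `trainer.train()` returns normally, so a failed run would leave the tracker running. A minimal sketch of one way to guard against that with `try`/`finally` follows. The names `tracked` and `DummyTracker` are hypothetical and introduced here for illustration; `DummyTracker` stands in for `EmissionsTracker` so the example runs without CodeCarbon, and it assumes only the `start()` and `stop()` methods used above.

```python
from contextlib import contextmanager

@contextmanager
def tracked(tracker):
    """Start `tracker`, yield it, and always stop it, even on errors."""
    tracker.start()
    try:
        yield tracker
    finally:
        tracker.stop()

class DummyTracker:
    """Hypothetical stand-in exposing the start()/stop() calls used above."""
    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")
        return 0.0  # placeholder for the emissions value

dummy = DummyTracker()
try:
    with tracked(dummy):
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass

print(dummy.events)
```

In the notebook, the same pattern would wrap the training call, for example `with tracked(tracker): train_results = trainer.train()`.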

## STEP 6: Evaluating The LoRA Model On Different Rank Sizes

> We train the LoRA model at several adapter ranks on the SQuAD dataset, keeping the training set fixed at 80% of the data.
>
> LoRA Rank Variation: [4, 8, 16]

In [29]:
def evaluate_and_save_lora(trainer, train_results, emissions_data, output_dir, size_fraction, num_samples, lora_model):
    """Evaluate LoRA model and save results with detailed emissions."""
    print("Evaluating LoRA model...")
    eval_results = trainer.evaluate()

    # Count trainable parameters
    trainable_params = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in lora_model.parameters())
    trainable_percentage = 100 * trainable_params / total_params

    # Extract emissions data from EmissionsData object
    result_entry = {
        "training_method": "LoRA",
        "model_name": "DistilBERT",
        "dataset_size%": int(size_fraction*100),
        "lora_rank": lora_model.peft_config['default'].r,
        "train_samples": num_samples,
        "valid_samples": len(trainer.eval_dataset),  # use the trainer's eval set instead of a notebook global
        "trainable_params": trainable_params,
        "total_params": total_params,
        "trainable_percentage": trainable_percentage,

        # Performance metrics
        "f1_score": eval_results["eval_f1"],
        "exact_match": eval_results["eval_exact_match"],
        "eval_loss": eval_results["eval_loss"],
        "training_time_hours": train_results.metrics["train_runtime"] / 3600,

        # Emissions data (direct access to EmissionsData attributes)
        "emissions_rate_kg_per_s": emissions_data.emissions_rate,
        "emissions_kg": emissions_data.emissions,
        "timestamp": emissions_data.timestamp,
        "duration_seconds": emissions_data.duration,
        "duration_hours": emissions_data.duration / 3600,

        # Energy consumption
        "energy_consumed_kwh": emissions_data.energy_consumed,
        "cpu_energy_kwh": emissions_data.cpu_energy,
        "gpu_energy_kwh": emissions_data.gpu_energy,
        "ram_energy_kwh": emissions_data.ram_energy,

        # Power draw
        "cpu_power_w": emissions_data.cpu_power,
        "gpu_power_w": emissions_data.gpu_power,
        "ram_power_w": emissions_data.ram_power,

        # Location and system info
        "country_name": emissions_data.country_name,
        "country_iso_code": emissions_data.country_iso_code,
        "region": emissions_data.region,
        "cloud_provider": emissions_data.cloud_provider,
        "cloud_region": emissions_data.cloud_region,
        "on_cloud": emissions_data.on_cloud,

        # System specifications
        "os": emissions_data.os,
        "python_version": emissions_data.python_version,
        "cpu_count": emissions_data.cpu_count,
        "cpu_model": emissions_data.cpu_model,
        "gpu_count": emissions_data.gpu_count,
        "gpu_model": emissions_data.gpu_model,
        "ram_total_size_gb": emissions_data.ram_total_size,

        # Additional metrics
        "pue": emissions_data.pue,  # Power Usage Effectiveness
        "codecarbon_version": emissions_data.codecarbon_version,
    }

    # Print detailed summary
    print(f"\n{'='*80}")
    print(f"LoRA RESULTS SUMMARY (Rank {result_entry['lora_rank']})")
    print(f"{'='*80}")

    print(f"\nModel Configuration:")
    print(f"Training Method: LoRA")
    print(f"LoRA Rank: {result_entry['lora_rank']}")
    print(f"Trainable Parameters: {trainable_params:,} ({trainable_percentage:.2f}%)")
    print(f"Total Parameters: {total_params:,}")
    print(f"Dataset Size: {size_fraction*100}%")

    print(f"\nPerformance Metrics:")
    print(f"F1 Score: {eval_results['eval_f1']:.4f}")
    print(f"Exact Match: {eval_results['eval_exact_match']:.4f}")
    print(f"Eval Loss: {eval_results['eval_loss']:.4f}")

    print(f"\nEnergy Consumption:")
    print(f"Total Energy: {emissions_data.energy_consumed:.6f} kWh")
    print(f"CPU Energy: {emissions_data.cpu_energy:.6f} kWh ({emissions_data.cpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"GPU Energy: {emissions_data.gpu_energy:.6f} kWh ({emissions_data.gpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"RAM Energy: {emissions_data.ram_energy:.6f} kWh ({emissions_data.ram_energy/emissions_data.energy_consumed*100:.1f}%)")

    print(f"\nAverage Power Draw:")
    print(f"CPU Power: {emissions_data.cpu_power:.2f} W")
    print(f"GPU Power: {emissions_data.gpu_power:.2f} W")
    print(f"RAM Power: {emissions_data.ram_power:.2f} W")
    print(f"Total Power: {emissions_data.cpu_power + emissions_data.gpu_power + emissions_data.ram_power:.2f} W")

    print(f"\nCarbon Footprint:")
    print(f"Total CO2 Emissions: {emissions_data.emissions:.6f} kg")
    print(f"Emissions Rate: {emissions_data.emissions_rate:.9f} kg/s")
    print(f"Duration: {emissions_data.duration/3600:.2f} hours")
    print(f"Training Time (Trainer): {train_results.metrics['train_runtime']/3600:.2f} hours")

    print(f"\nLocation & Infrastructure:")
    print(f"Country: {emissions_data.country_name} ({emissions_data.country_iso_code})")
    print(f"Region: {emissions_data.region}")
    print(f"On Cloud: {emissions_data.on_cloud}")
    print(f"PUE (Power Usage Effectiveness): {emissions_data.pue}")

    print(f"\nSystem Specifications:")
    print(f"OS: {emissions_data.os}")
    print(f"CPU: {emissions_data.cpu_model} ({emissions_data.cpu_count} cores)")
    if emissions_data.gpu_count and emissions_data.gpu_model:
        print(f"GPU: {emissions_data.gpu_model} (Count: {emissions_data.gpu_count})")
    else:
        print(f"GPU: None detected")
    print(f"RAM: {emissions_data.ram_total_size:.2f} GB")
    print(f"Python: {emissions_data.python_version}")

    print(f"\n{'='*80}")

    # Save LoRA adapters
    lora_model.save_pretrained(f"{output_dir}/lora_adapters")
    # NOTE: `tokenizer` here is the notebook-level tokenizer, not a function argument
    tokenizer.save_pretrained(f"{output_dir}/lora_adapters")
    print(f"LoRA adapters saved to {output_dir}/lora_adapters")

    # Clean up
    del trainer.model
    del trainer
    torch.cuda.empty_cache()

    return result_entry
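
One caveat about the clean-up step above: `del trainer` removes only the local name inside `evaluate_and_save_lora`. The calling function still holds its own references to the trainer and the model, so the objects probably stay alive until the caller returns, and `torch.cuda.empty_cache()` can release only memory that is no longer referenced. The sketch below illustrates this Python behaviour with a hypothetical `FakeModel` class and hypothetical helper names; it does not use PyTorch and assumes CPython's reference counting.

```python
import gc
import weakref

class FakeModel:
    """Hypothetical stand-in for a large model object."""

def cleanup(model):
    # Mirrors the `del` pattern above: this removes only the local name.
    del model
    gc.collect()

def run_experiment_sketch():
    model = FakeModel()
    ref = weakref.ref(model)
    cleanup(model)
    alive_inside = ref() is not None   # the caller still holds `model`
    return ref, alive_inside

ref, alive_inside = run_experiment_sketch()
gc.collect()
alive_after = ref() is not None        # the caller's frame is gone now

print(f"Alive right after cleanup(): {alive_inside}")
print(f"Alive after the caller returned: {alive_after}")
```

If GPU memory between experiments becomes a concern, one option is to run the clean-up in the outermost scope, after `run_lora_experiment` has returned.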


In [30]:
def run_lora_experiment(size_fraction, train_data, eval_data, tokenizer, preprocess_fn, compute_metrics_fn, lora_rank):
    """Prepare the data subset, train a LoRA model at the given rank, and return its result record."""

    print(f"\n{'='*60}")
    print(f"LoRA Training with {size_fraction*100}% of training data")
    print(f"{'='*60}")

    # Step 1: Prepare dataset
    tokenized_train, num_samples = prepare_dataset(train_data, size_fraction, preprocess_fn)

    # Step 2: Train LoRA model
    trainer, train_results, emissions_data, output_dir, lora_model = train_lora_model(
        tokenized_train, eval_data, tokenizer, compute_metrics_fn,
        size_fraction, lora_rank
    )

    # Step 3: Evaluate and save
    result_entry = evaluate_and_save_lora(trainer, train_results, emissions_data, output_dir, size_fraction, num_samples, lora_model)

    return result_entry

In [31]:
result_lora = []

In [32]:
%%time
print("\n" + "="*80)
print("EXPERIMENT 1: LoRA with Rank 4")
print("="*80)

result_r4 = run_lora_experiment(
    size_fraction=0.8,  # 80% of training data
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    lora_rank=4
)
result_lora.append(result_r4)


EXPERIMENT 1: LoRA with Rank 4

LoRA Training with 80.0% of training data
Preprocessing 104255 training samples...

Creating LoRA model (rank=4)...


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 04:03:00] [setup] RAM Tracking...
[codecarbon INFO @ 04:03:00] [setup] CPU Tracking...


trainable params: 75,266 || all params: 66,439,684 || trainable%: 0.1133


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 04:03:01] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 04:03:01] [setup] GPU Tracking...
[codecarbon INFO @ 04:03:01] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 04:03:01] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 04:03:01] >>> Tracker's metadata:
[codecarbon INFO @ 04:03:01]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 04:03:01]   Python version: 3.12.12
[codecarbon INFO @ 04:03:01]   CodeCarbon version: 3.2.0
[codecarbon INFO @ 04:03:01]   Available RAM : 83.474 GB
[codecarbon INFO @ 04:03:01]   CPU count: 12 thread(s) in 1 physical CPU(s)
[codecarbon INFO @ 04:03:0

Training LoRA model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.7295,1.489805,0.408522,0.463459
2,1.5664,1.417477,0.429784,0.492617


[codecarbon INFO @ 04:03:18] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:03:18] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:03:18] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:03:18] Energy consumed for all GPUs : 0.000801 kWh. Total GPU Power : 191.96821292558965 W
[codecarbon INFO @ 04:03:18] 0.001136 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:03:18] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:03:18] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:03:18] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:03:18] Energy consumed for all GPUs : 0.000826 kWh. Total GPU Power : 198.07502937632427 W
[codecarbon INFO @ 04:03:18] 0.001161 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:

Evaluating LoRA model...



LoRA RESULTS SUMMARY (Rank 4)

Model Configuration:
Training Method: LoRA
LoRA Rank: 4
Trainable Parameters: 75,266 (0.11%)
Total Parameters: 66,439,684
Dataset Size: 80.0%

Performance Metrics:
F1 Score: 0.4926
Exact Match: 0.4298
Eval Loss: 1.4175

Energy Consumption:
Total Energy: 0.039610 kWh
CPU Energy: 0.006026 kWh (15.2%)
GPU Energy: 0.028198 kWh (71.2%)
RAM Energy: 0.005387 kWh (13.6%)

Average Power Draw:
CPU Power: 42.50 W
GPU Power: 195.46 W
RAM Power: 38.00 W
Total Power: 275.96 W

Carbon Footprint:
Total CO2 Emissions: 0.018648 kg
Emissions Rate: 0.000036517 kg/s
Duration: 0.14 hours
Training Time (Trainer): 0.14 hours

Location & Infrastructure:
Country: Singapore (SGP)
Region: 
On Cloud: N
PUE (Power Usage Effectiveness): 1.0

System Specifications:
OS: Linux-6.6.105+-x86_64-with-glibc2.35
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (12 cores)
GPU: 1 x NVIDIA A100-SXM4-40GB (Count: 1)
RAM: 83.47 GB
Python: 3.12.12

LoRA adapters saved to results_distilbert_lora_r4_80pct/lora_ad

In [33]:
%%time
print("\n" + "="*80)
print("EXPERIMENT 2: LoRA with Rank 8")
print("="*80)

result_r8 = run_lora_experiment(
    size_fraction=0.8,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    lora_rank=8
)
result_lora.append(result_r8)


EXPERIMENT 2: LoRA with Rank 8

LoRA Training with 80.0% of training data
Preprocessing 104255 training samples...

Creating LoRA model (rank=8)...


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 04:11:49] [setup] RAM Tracking...
[codecarbon INFO @ 04:11:49] [setup] CPU Tracking...


trainable params: 148,994 || all params: 66,513,412 || trainable%: 0.2240


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 04:11:50] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 04:11:50] [setup] GPU Tracking...
[codecarbon INFO @ 04:11:50] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 04:11:50] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 04:11:50] >>> Tracker's metadata:
[codecarbon INFO @ 04:11:50]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 04:11:50]   Python version: 3.12.12
[codecarbon INFO @ 04:11:50]   CodeCarbon version: 3.2.0
[codecarbon INFO @ 04:11:50]   Available RAM : 83.474 GB
[codecarbon INFO @ 04:11:50]   CPU count: 12 thread(s) in 1 physical CPU(s)
[codecarbon INFO @ 04:11:5

Training LoRA model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.6428,1.428119,0.420636,0.482259
2,1.4575,1.371605,0.443135,0.51334


[codecarbon INFO @ 04:12:07] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:12:07] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:12:07] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:12:07] Energy consumed for all GPUs : 0.000785 kWh. Total GPU Power : 188.33203605345582 W
[codecarbon INFO @ 04:12:07] 0.001121 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:12:07] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:12:07] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:12:07] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:12:07] Energy consumed for all GPUs : 0.000803 kWh. Total GPU Power : 192.6033199157516 W
[codecarbon INFO @ 04:12:07] 0.001138 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:1

Evaluating LoRA model...



LoRA RESULTS SUMMARY (Rank 8)

Model Configuration:
Training Method: LoRA
LoRA Rank: 8
Trainable Parameters: 148,994 (0.22%)
Total Parameters: 66,513,412
Dataset Size: 80.0%

Performance Metrics:
F1 Score: 0.5133
Exact Match: 0.4431
Eval Loss: 1.3716

Energy Consumption:
Total Energy: 0.039636 kWh
CPU Energy: 0.006085 kWh (15.4%)
GPU Energy: 0.028111 kWh (70.9%)
RAM Energy: 0.005440 kWh (13.7%)

Average Power Draw:
CPU Power: 42.50 W
GPU Power: 195.44 W
RAM Power: 38.00 W
Total Power: 275.94 W

Carbon Footprint:
Total CO2 Emissions: 0.018660 kg
Emissions Rate: 0.000036185 kg/s
Duration: 0.14 hours
Training Time (Trainer): 0.14 hours

Location & Infrastructure:
Country: Singapore (SGP)
Region: 
On Cloud: N
PUE (Power Usage Effectiveness): 1.0

System Specifications:
OS: Linux-6.6.105+-x86_64-with-glibc2.35
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (12 cores)
GPU: 1 x NVIDIA A100-SXM4-40GB (Count: 1)
RAM: 83.47 GB
Python: 3.12.12

LoRA adapters saved to results_distilbert_lora_r8_80pct/lora_a

In [34]:
%%time
print("\n" + "="*80)
print("EXPERIMENT 3: LoRA with Rank 16")
print("="*80)

result_r16 = run_lora_experiment(
    size_fraction=0.8,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    lora_rank=16
)
result_lora.append(result_r16)


EXPERIMENT 3: LoRA with Rank 16

LoRA Training with 80.0% of training data
Preprocessing 104255 training samples...

Creating LoRA model (rank=16)...


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 04:20:42] [setup] RAM Tracking...
[codecarbon INFO @ 04:20:42] [setup] CPU Tracking...


trainable params: 296,450 || all params: 66,660,868 || trainable%: 0.4447


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 04:20:44] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 04:20:44] [setup] GPU Tracking...
[codecarbon INFO @ 04:20:44] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 04:20:44] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 04:20:44] >>> Tracker's metadata:
[codecarbon INFO @ 04:20:44]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 04:20:44]   Python version: 3.12.12
[codecarbon INFO @ 04:20:44]   CodeCarbon version: 3.2.0
[codecarbon INFO @ 04:20:44]   Available RAM : 83.474 GB
[codecarbon INFO @ 04:20:44]   CPU count: 12 thread(s) in 1 physical CPU(s)
[codecarbon INFO @ 04:20:4

Training LoRA model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.5596,1.364956,0.45014,0.516906
2,1.3702,1.309308,0.473957,0.544032


[codecarbon INFO @ 04:21:00] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:21:00] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:21:00] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:21:00] Energy consumed for all GPUs : 0.000792 kWh. Total GPU Power : 189.95579271230116 W
[codecarbon INFO @ 04:21:00] 0.001127 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:21:01] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:21:01] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:21:01] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:21:01] Energy consumed for all GPUs : 0.000817 kWh. Total GPU Power : 195.88843608355364 W
[codecarbon INFO @ 04:21:01] 0.001152 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:

Evaluating LoRA model...



LoRA RESULTS SUMMARY (Rank 16)

Model Configuration:
Training Method: LoRA
LoRA Rank: 16
Trainable Parameters: 296,450 (0.44%)
Total Parameters: 66,660,868
Dataset Size: 80.0%

Performance Metrics:
F1 Score: 0.5440
Exact Match: 0.4740
Eval Loss: 1.3093

Energy Consumption:
Total Energy: 0.039658 kWh
CPU Energy: 0.006106 kWh (15.4%)
GPU Energy: 0.028094 kWh (70.8%)
RAM Energy: 0.005459 kWh (13.8%)

Average Power Draw:
CPU Power: 42.50 W
GPU Power: 194.81 W
RAM Power: 38.00 W
Total Power: 275.31 W

Carbon Footprint:
Total CO2 Emissions: 0.018670 kg
Emissions Rate: 0.000036082 kg/s
Duration: 0.14 hours
Training Time (Trainer): 0.14 hours

Location & Infrastructure:
Country: Singapore (SGP)
Region: 
On Cloud: N
PUE (Power Usage Effectiveness): 1.0

System Specifications:
OS: Linux-6.6.105+-x86_64-with-glibc2.35
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (12 cores)
GPU: 1 x NVIDIA A100-SXM4-40GB (Count: 1)
RAM: 83.47 GB
Python: 3.12.12

LoRA adapters saved to results_distilbert_lora_r16_80pct/lor

### STEP 6.1: Results and Analysis

In [35]:
results_df_lora = pd.DataFrame(result_lora)
print("\n" + "="*60)
print("LoRA RESULTS SUMMARY")
print("="*60)
print(results_df_lora.to_string(index=False))

# Save to CSV
results_df_lora.to_csv("/content/drive/MyDrive/bert_lora_results.csv", index=False)


LoRA RESULTS SUMMARY
training_method model_name  dataset_size%  lora_rank  train_samples  valid_samples  trainable_params  total_params  trainable_percentage  f1_score  exact_match  eval_loss  training_time_hours  emissions_rate_kg_per_s  emissions_kg           timestamp  duration_seconds  duration_hours  energy_consumed_kwh  cpu_energy_kwh  gpu_energy_kwh  ram_energy_kwh  cpu_power_w  gpu_power_w  ram_power_w country_name country_iso_code region cloud_provider cloud_region on_cloud                                   os python_version  cpu_count                      cpu_model  gpu_count                 gpu_model  ram_total_size_gb  pue codecarbon_version
           LoRA DistilBERT             80          4         104255          12134             75266      66439684              0.113285  0.492617     0.429784   1.417477             0.141688                 0.000037      0.018648 2025-12-11T04:11:34        510.656511        0.141849             0.039610        0.006026        0.028198

In [36]:
print("\n" + "="*80)
print("LoRA RANK COMPARISON")
print("="*80)
print(results_df_lora[['lora_rank', 'trainable_params', 'trainable_percentage', 'f1_score', 'exact_match', 'emissions_kg', 'training_time_hours']].to_string(index=False))


LoRA RANK COMPARISON
 lora_rank  trainable_params  trainable_percentage  f1_score  exact_match  emissions_kg  training_time_hours
         4             75266              0.113285  0.492617     0.429784      0.018648             0.141688
         8            148994              0.224006  0.513340     0.443135      0.018660             0.143082
        16            296450              0.444714  0.544032     0.473957      0.018670             0.143574
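
The rank comparison above can be set against the full fine-tuning run on the same 80% subset. The sketch below is an illustrative addition; the helper name `compare` is hypothetical, and all figures are copied from the result tables in this notebook rather than loaded from the saved CSV files.

```python
# Full fine-tuning at 80% data (from the Strategy 1 results).
full_ft = {"f1": 0.609413, "emissions_kg": 0.020736, "trainable": 66_364_418}

# LoRA runs at 80% data: rank -> metrics (from the table above).
lora = {
    4:  {"f1": 0.492617, "emissions_kg": 0.018648, "trainable": 75_266},
    8:  {"f1": 0.513340, "emissions_kg": 0.018660, "trainable": 148_994},
    16: {"f1": 0.544032, "emissions_kg": 0.018670, "trainable": 296_450},
}

def compare(run, baseline):
    """Return (F1 gap, emissions saved in %, trainable-parameter ratio)."""
    f1_gap = run["f1"] - baseline["f1"]
    saved_pct = 100 * (1 - run["emissions_kg"] / baseline["emissions_kg"])
    param_ratio = run["trainable"] / baseline["trainable"]
    return f1_gap, saved_pct, param_ratio

comparison = {rank: compare(m, full_ft) for rank, m in lora.items()}
for rank, (f1_gap, saved_pct, param_ratio) in comparison.items():
    print(f"r={rank:>2}: F1 {f1_gap:+.4f}, CO2 saved {saved_pct:.1f}%, "
          f"{100 * param_ratio:.2f}% of the trainable parameters")
```

On these figures, LoRA cuts emissions by roughly 10% at every rank while trailing full fine-tuning by about 0.065 (rank 16) to 0.117 (rank 4) F1, even though it trains under 0.5% of the parameters.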


In [37]:
# Compare efficiency vs performance
print("\n" + "="*80)
print("EFFICIENCY ANALYSIS")
print("="*80)

baseline = results_df_lora[results_df_lora['lora_rank'] == 8].iloc[0]  # Use rank 8 as baseline

for _, row in results_df_lora.iterrows():
    rank = row['lora_rank']
    params_ratio = row['trainable_params'] / baseline['trainable_params']
    f1_diff = row['f1_score'] - baseline['f1_score']
    emissions_diff = row['emissions_kg'] - baseline['emissions_kg']

    print(f"\nLoRA Rank {rank}:")
    print(f"Trainable Params: {row['trainable_params']:,} ({row['trainable_percentage']:.2f}%)")
    print(f"vs Rank 8: {params_ratio:.2f}x parameters")
    print(f"F1 Score: {row['f1_score']:.4f} ({f1_diff:+.4f} vs Rank 8)")
    print(f"Emissions: {row['emissions_kg']:.6f} kg ({emissions_diff:+.6f} vs Rank 8)")
    print(f"Training Time: {row['training_time_hours']:.2f} hours")

    # Efficiency metric: F1 per kg CO2
    efficiency = row['f1_score'] / row['emissions_kg']
    print(f"Efficiency (F1/kg CO2): {efficiency:.2f}")


EFFICIENCY ANALYSIS

LoRA Rank 4:
Trainable Params: 75,266 (0.11%)
vs Rank 8: 0.51x parameters
F1 Score: 0.4926 (-0.0207 vs Rank 8)
Emissions: 0.018648 kg (-0.000012 vs Rank 8)
Training Time: 0.14 hours
Efficiency (F1/kg CO2): 26.42

LoRA Rank 8:
Trainable Params: 148,994 (0.22%)
vs Rank 8: 1.00x parameters
F1 Score: 0.5133 (+0.0000 vs Rank 8)
Emissions: 0.018660 kg (+0.000000 vs Rank 8)
Training Time: 0.14 hours
Efficiency (F1/kg CO2): 27.51

LoRA Rank 16:
Trainable Params: 296,450 (0.44%)
vs Rank 8: 1.99x parameters
F1 Score: 0.5440 (+0.0307 vs Rank 8)
Emissions: 0.018670 kg (+0.000011 vs Rank 8)
Training Time: 0.14 hours
Efficiency (F1/kg CO2): 29.14
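
The absolute emission differences above are very small, so relative changes may be easier to read. This is a minimal sketch using values copied from the rank comparison table; the helper name `pct_change` is illustrative and not part of the notebook.

```python
# Values copied from the LoRA rank comparison table.
runs = {
    4: {"f1": 0.492617, "emissions_kg": 0.018648},
    8: {"f1": 0.513340, "emissions_kg": 0.018660},
    16: {"f1": 0.544032, "emissions_kg": 0.018670},
}


def pct_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


base = runs[4]
for rank in (8, 16):
    f1_pct = pct_change(runs[rank]["f1"], base["f1"])
    co2_pct = pct_change(runs[rank]["emissions_kg"], base["emissions_kg"])
    print(f"Rank {rank} vs rank 4: F1 {f1_pct:+.1f}%, emissions {co2_pct:+.2f}%")
```

With a single run per rank, emission differences of roughly 0.1% could plausibly sit within run-to-run noise, so the F1 change is the more informative column here.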


In [38]:
# PLOT 1: LoRA Energy Consumption by Rank
df_sorted_lora = results_df_lora.sort_values('lora_rank')

fig_lora_energy = go.Figure()

fig_lora_energy.add_trace(go.Bar(
    name='CPU Energy',
    x=df_sorted_lora['lora_rank'],
    y=df_sorted_lora['cpu_energy_kwh'],
    marker_color='#FF6B6B',
    hovertemplate='<b>CPU Energy</b><br>%{y:.6f} kWh<br>Rank: %{x}<extra></extra>'
))

fig_lora_energy.add_trace(go.Bar(
    name='GPU Energy',
    x=df_sorted_lora['lora_rank'],
    y=df_sorted_lora['gpu_energy_kwh'],
    marker_color='#4ECDC4',
    hovertemplate='<b>GPU Energy</b><br>%{y:.6f} kWh<br>Rank: %{x}<extra></extra>'
))

fig_lora_energy.add_trace(go.Bar(
    name='RAM Energy',
    x=df_sorted_lora['lora_rank'],
    y=df_sorted_lora['ram_energy_kwh'],
    marker_color='#95E1D3',
    hovertemplate='<b>RAM Energy</b><br>%{y:.6f} kWh<br>Rank: %{x}<extra></extra>'
))

fig_lora_energy.update_layout(
    title=dict(text="LoRA: Energy Consumption by Rank", font=dict(size=18)),
    xaxis_title='LoRA Rank',
    yaxis_title='Energy Consumption (kWh)',
    barmode='stack',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='x unified'
)

fig_lora_energy.show()
fig_lora_energy.write_html("/content/drive/MyDrive/lora_energy_by_rank.html")

In [39]:
# PLOT 2: LoRA Performance & Emissions by Rank (Dual Y-axis)
df_sorted_lora = results_df_lora.sort_values('lora_rank')

fig_lora_perf = make_subplots(specs=[[{"secondary_y": True}]])

# F1 Score line
fig_lora_perf.add_trace(
    go.Scatter(
        x=df_sorted_lora['lora_rank'],
        y=df_sorted_lora['f1_score'],
        name='F1 Score',
        mode='lines+markers',
        line=dict(color='#4ECDC4', width=3),
        marker=dict(size=12, line=dict(width=2, color='white')),
        hovertemplate='<b>F1 Score</b>: %{y:.4f}<br>Rank: %{x}<extra></extra>'
    ),
    secondary_y=False
)

# Exact Match line
fig_lora_perf.add_trace(
    go.Scatter(
        x=df_sorted_lora['lora_rank'],
        y=df_sorted_lora['exact_match'],
        name='Exact Match',
        mode='lines+markers',
        line=dict(color='#95E1D3', width=3, dash='dash'),
        marker=dict(size=10),
        hovertemplate='<b>Exact Match</b>: %{y:.4f}<br>Rank: %{x}<extra></extra>'
    ),
    secondary_y=False
)

# CO2 Emissions bar
fig_lora_perf.add_trace(
    go.Bar(
        x=df_sorted_lora['lora_rank'],
        y=df_sorted_lora['emissions_kg'],
        name='CO₂ Emissions',
        marker_color='#FF6B6B',
        opacity=0.6,
        hovertemplate='<b>CO₂</b>: %{y:.6f} kg<br>Rank: %{x}<extra></extra>'
    ),
    secondary_y=True
)

fig_lora_perf.update_xaxes(title_text="LoRA Rank")
fig_lora_perf.update_yaxes(title_text="Performance Score", secondary_y=False)
fig_lora_perf.update_yaxes(title_text="CO₂ Emissions (kg)", secondary_y=True)

fig_lora_perf.update_layout(
    title=dict(text="LoRA: Performance vs Carbon Emissions by Rank", font=dict(size=18)),
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig_lora_perf.show()
fig_lora_perf.write_html("/content/drive/MyDrive/lora_performance_emissions.html")

# Training Strategy 3: Few-shot Learning With Frozen Backbone

## STEP 7: Creating And Training Few-shot Model

In [40]:
def create_frozen_model(model_name="distilbert-base-uncased"):
    """Create a model with a frozen backbone; only the QA head stays trainable."""
    # Load base model
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Freeze ALL parameters first
    for param in model.parameters():
        param.requires_grad = False

    # Unfreeze ONLY the QA head (classifier layer)
    # For DistilBERT: qa_outputs layer
    for param in model.qa_outputs.parameters():
        param.requires_grad = True

    # Count parameters
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in model.parameters())

    print(f"\nModel Configuration:")
    print(f"Total Parameters: {total_params:,}")
    print(f"Trainable Parameters: {trainable_params:,}")
    print(f"Frozen Parameters: {total_params - trainable_params:,}")
    print(f"Trainable Percentage: {100 * trainable_params / total_params:.4f}%")

    return model, trainable_params, total_params
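
The trainable count printed by `create_frozen_model` can be checked by hand. In DistilBERT-base, `qa_outputs` is a single linear layer that maps the 768-dimensional hidden state to two logits (answer start and answer end). The sketch below reproduces the arithmetic; the total parameter count is copied from the run output in this notebook.

```python
HIDDEN_SIZE = 768          # DistilBERT-base hidden width
NUM_LABELS = 2             # start logit and end logit
TOTAL_PARAMS = 66_364_418  # total reported in the run output

# A linear layer has one weight per (input, output) pair plus one bias per output.
head_params = HIDDEN_SIZE * NUM_LABELS + NUM_LABELS
trainable_pct = 100 * head_params / TOTAL_PARAMS

print(f"QA head parameters: {head_params:,}")
print(f"Trainable percentage: {trainable_pct:.4f}%")
```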


In [41]:
def prepare_fewshot_dataset(train_data, num_shots, preprocess_fn):
    # Select the first num_shots examples (dataset order, no shuffling)
    train_subset = train_data.select(range(num_shots))

    print(f"Creating few-shot dataset with {num_shots} examples...")
    tokenized_train = train_subset.map(
        preprocess_fn,
        batched=True,
        remove_columns=train_subset.column_names
    )

    # After tokenization with sliding window, we get more samples
    actual_samples = len(tokenized_train)
    print(f"Original examples: {num_shots}")
    print(f"After tokenization (with sliding window): {actual_samples} samples")

    return tokenized_train, num_shots  # Return original num_shots for tracking
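
`prepare_fewshot_dataset` takes the first `num_shots` rows. If the rows are grouped by source article, as is common for SQuAD dumps, those rows may cover only a handful of topics. One possible alternative, sketched here under that assumption, is to draw row indices with a fixed seed. The helper `sample_indices` and the dataset size of 1,000 are hypothetical and used only for illustration.

```python
import random


def sample_indices(dataset_size, num_shots, seed=42):
    """Return `num_shots` distinct row indices drawn with a fixed seed."""
    if num_shots > dataset_size:
        raise ValueError("num_shots cannot exceed dataset_size")
    rng = random.Random(seed)
    # Sorting keeps the selected rows in their original dataset order.
    return sorted(rng.sample(range(dataset_size), num_shots))


indices = sample_indices(dataset_size=1000, num_shots=100)
print(indices[:5])
```

The resulting list could be passed to `train_data.select(indices)`; `train_data.shuffle(seed=42).select(range(num_shots))` is an equivalent route within the `datasets` library.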


In [42]:
def train_fewshot_model(tokenized_train, tokenized_eval, tokenizer, compute_metrics_fn, num_shots, model_name="distilbert-base-uncased"):
    # Create frozen model
    model, trainable_params, total_params = create_frozen_model(model_name)

    # Setup output directory
    output_dir = f"results_distilbert_fewshot_{num_shots}shots"

    # Training arguments - DIFFERENT from full fine-tuning
    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        learning_rate=5e-4,  # Higher LR since we're only training the head
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=10,  # More epochs for few-shot
        fp16=torch.cuda.is_available(),
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        push_to_hub=False,
        logging_steps=50,
        greater_is_better=True
    )

    # Initialize trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        tokenizer=tokenizer,
        data_collator=default_data_collator,
        compute_metrics=compute_metrics_fn
    )

    # Start carbon tracking
    tracker = EmissionsTracker(
        project_name=f"DistilBERT_FewShot_{num_shots}shots",
        output_dir=output_dir,
        save_to_file=True,
        log_level="info"
    )
    tracker.start()

    # Train
    print(f"\nTraining few-shot model ({num_shots} examples)...")
    train_results = trainer.train()

    # Stop tracking and get detailed emissions data
    emissions_kg = tracker.stop()
    emissions_data = tracker.final_emissions_data

    return trainer, train_results, emissions_data, output_dir, model, trainable_params, total_params
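
In `train_fewshot_model`, `tracker.stop()` is only reached if `trainer.train()` finishes without raising. A small wrapper with `try`/`finally` is one way to make sure the measurement is always closed. The sketch below is self-contained: `run_tracked` and `FakeTracker` are hypothetical names, and `FakeTracker` merely stands in for any object that exposes `start()` and `stop()`.

```python
def run_tracked(tracker, train_fn):
    """Run `train_fn` between tracker.start() and tracker.stop().

    stop() is called even if `train_fn` raises, so the tracker is never
    left running; the exception itself still propagates to the caller.
    """
    tracker.start()
    try:
        result = train_fn()
    finally:
        emissions_kg = tracker.stop()
    return result, emissions_kg


class FakeTracker:
    """Stand-in for illustration only; not part of codecarbon."""

    def __init__(self):
        self.stopped = False

    def start(self):
        self.stopped = False

    def stop(self):
        self.stopped = True
        return 0.0


tracker = FakeTracker()
result, emissions = run_tracked(tracker, lambda: "trained")
print(result, emissions, tracker.stopped)
```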


## STEP 8: Evaluating The Few-shot Model On Different Sample Sizes

>We train the frozen-backbone model on training subsets of different sizes drawn from the SQuAD dataset.
>
>Few-shot training set sizes: [100, 500, 1000]

In [43]:
def evaluate_and_save_fewshot(trainer, train_results, emissions_data, output_dir, num_shots, trainable_params, total_params):
    print("Evaluating few-shot model...")
    eval_results = trainer.evaluate()

    trainable_percentage = 100 * trainable_params / total_params

    # Compile results
    result_entry = {
        "training_method": "Few-Shot (Frozen Backbone)",
        "model_name": "DistilBERT",
        "num_shots": num_shots,
        "train_samples": num_shots,
        "valid_samples": len(trainer.eval_dataset),  # avoid relying on a notebook-level global
        "trainable_params": trainable_params,
        "total_params": total_params,
        "trainable_percentage": trainable_percentage,

        # Performance
        "f1_score": eval_results["eval_f1"],
        "exact_match": eval_results["eval_exact_match"],
        "eval_loss": eval_results["eval_loss"],
        "training_time_hours": train_results.metrics["train_runtime"] / 3600,

        # Emissions data
        "emissions_rate_kg_per_s": emissions_data.emissions_rate,
        "emissions_kg": emissions_data.emissions,
        "timestamp": emissions_data.timestamp,
        "duration_seconds": emissions_data.duration,
        "duration_hours": emissions_data.duration / 3600,

        # Energy
        "energy_consumed_kwh": emissions_data.energy_consumed,
        "cpu_energy_kwh": emissions_data.cpu_energy,
        "gpu_energy_kwh": emissions_data.gpu_energy,
        "ram_energy_kwh": emissions_data.ram_energy,

        # Power
        "cpu_power_w": emissions_data.cpu_power,
        "gpu_power_w": emissions_data.gpu_power,
        "ram_power_w": emissions_data.ram_power,

        # Location and system info
        "country_name": emissions_data.country_name,
        "country_iso_code": emissions_data.country_iso_code,
        "region": emissions_data.region,
        "cloud_provider": emissions_data.cloud_provider,
        "cloud_region": emissions_data.cloud_region,
        "on_cloud": emissions_data.on_cloud,

        # System specifications
        "os": emissions_data.os,
        "python_version": emissions_data.python_version,
        "cpu_count": emissions_data.cpu_count,
        "cpu_model": emissions_data.cpu_model,
        "gpu_count": emissions_data.gpu_count,
        "gpu_model": emissions_data.gpu_model,
        "ram_total_size_gb": emissions_data.ram_total_size,

        # Additional metrics
        "pue": emissions_data.pue,
        "codecarbon_version": emissions_data.codecarbon_version,
    }

    # Print summary
    print(f"\n{'='*80}")
    print(f"FEW-SHOT LEARNING RESULTS ({num_shots} examples)")
    print(f"{'='*80}")
    print(f"\nModel Configuration:")
    print(f"Training Method: Few-Shot (Frozen Backbone)")
    print(f"Training Examples: {num_shots}")
    print(f"Trainable Parameters: {trainable_params:,} ({trainable_percentage:.4f}%)")
    print(f"Frozen Parameters: {total_params - trainable_params:,}")

    print(f"\nPerformance:")
    print(f"F1 Score: {eval_results['eval_f1']:.4f}")
    print(f"Exact Match: {eval_results['eval_exact_match']:.4f}")
    print(f"Eval Loss: {eval_results['eval_loss']:.4f}")

    print(f"\nEnergy:")
    print(f"Total: {emissions_data.energy_consumed:.6f} kWh")
    if emissions_data.energy_consumed > 0:
        print(f"GPU: {emissions_data.gpu_energy:.6f} kWh ({emissions_data.gpu_energy/emissions_data.energy_consumed*100:.1f}%)")
        print(f"CPU: {emissions_data.cpu_energy:.6f} kWh ({emissions_data.cpu_energy/emissions_data.energy_consumed*100:.1f}%)")

    print(f"\nCarbon:")
    print(f"CO₂ Emissions: {emissions_data.emissions:.6f} kg")
    print(f"Training Time: {train_results.metrics['train_runtime']/3600:.2f} hours")
    print(f"{'='*80}")

    # Save model
    trainer.save_model(f"{output_dir}/final_model")
    print(f"Model saved to {output_dir}/final_model")

    # Clean up
    del trainer.model
    del trainer
    torch.cuda.empty_cache()

    return result_entry
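
The summary above prints the GPU and CPU shares of the total energy but not the RAM share. Assuming the total is the sum of the CPU, GPU, and RAM components, the RAM share is the remainder. The sketch below uses the 100-shot figures reported in the run output; `energy_shares` is an illustrative helper, not part of the notebook.

```python
def energy_shares(total_kwh, gpu_kwh, cpu_kwh):
    """Return the percentage share of GPU, CPU, and (inferred) RAM energy."""
    if total_kwh <= 0:
        raise ValueError("total_kwh must be positive")
    # Assumption: whatever is not GPU or CPU energy is attributed to RAM.
    ram_kwh = total_kwh - gpu_kwh - cpu_kwh
    parts = {"gpu": gpu_kwh, "cpu": cpu_kwh, "ram": ram_kwh}
    return {name: 100 * value / total_kwh for name, value in parts.items()}


shares = energy_shares(total_kwh=0.008910, gpu_kwh=0.006071, cpu_kwh=0.001499)
for name, pct in shares.items():
    print(f"{name.upper()}: {pct:.1f}%")
```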


In [44]:
def run_fewshot_experiment(num_shots, train_data, eval_data, tokenizer, preprocess_fn, compute_metrics_fn, model_name="distilbert-base-uncased"):

    print(f"\n{'='*60}")
    print(f"Few-Shot Learning with {num_shots} examples")
    print(f"{'='*60}")

    # Step 1: Prepare few-shot dataset
    tokenized_train, num_shots = prepare_fewshot_dataset(train_data, num_shots, preprocess_fn)

    # Step 2: Train with frozen backbone
    trainer, train_results, emissions_data, output_dir, model, trainable_params, total_params = train_fewshot_model(
        tokenized_train, eval_data, tokenizer, compute_metrics_fn,
        num_shots, model_name
    )

    # Step 3: Evaluate and save
    result_entry = evaluate_and_save_fewshot(
        trainer, train_results, emissions_data, output_dir,
        num_shots, trainable_params, total_params
    )

    return result_entry
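
The experiment cells that follow differ only in `num_shots`, so they could also be driven from a loop. This is a minimal sketch of that idea: `run_all` and `fake_run` are hypothetical, and `fake_run` stands in for `run_fewshot_experiment` with its other arguments already bound (for example via `functools.partial`).

```python
import time


def run_all(shot_counts, run_fn):
    """Call `run_fn(num_shots=n)` for each n and collect (result, seconds)."""
    results = []
    for n in shot_counts:
        start = time.perf_counter()
        entry = run_fn(num_shots=n)
        elapsed = time.perf_counter() - start
        results.append((entry, elapsed))
    return results


def fake_run(num_shots):
    """Stand-in for illustration; returns a result entry without training."""
    return {"num_shots": num_shots}


collected = run_all([100, 500, 1000], fake_run)
print([entry["num_shots"] for entry, _ in collected])
```

Keeping one cell per experiment, as this notebook does, has the advantage that `%%time` reports each run separately and a failed run can be repeated on its own.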

In [45]:
result_fewshot = []

In [46]:
%%time
print("\n" + "="*80)
print("EXPERIMENT 1: 100-shot Learning")
print("="*80)

result_100 = run_fewshot_experiment(
    num_shots=100,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="distilbert-base-uncased"
)
result_fewshot.append(result_100)


EXPERIMENT 1: 100-shot Learning

Few-Shot Learning with 100 examples
Creating few-shot dataset with 100 examples...


Map:   0%|          | 0/100 [00:00<?, ? examples/s]

Original examples: 100
After tokenization (with sliding window): 100 samples


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 04:29:41] [setup] RAM Tracking...
[codecarbon INFO @ 04:29:41] [setup] CPU Tracking...



Model Configuration:
Total Parameters: 66,364,418
Trainable Parameters: 1,538
Frozen Parameters: 66,362,880
Trainable Percentage: 0.0023%


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 04:29:42] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 04:29:42] [setup] GPU Tracking...
[codecarbon INFO @ 04:29:42] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 04:29:42] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 04:29:42] >>> Tracker's metadata:
[codecarbon INFO @ 04:29:42]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 04:29:42]   Python version: 3.12.12
[codecarbon INFO @ 04:29:42]   CodeCarbon version: 3.2.0
[codecarbon INFO @ 04:29:42]   Available RAM : 83.474 GB
[codecarbon INFO @ 04:29:42]   CPU count: 12 thread(s) in 1 physical CPU(s)


Training few-shot model (100 examples)...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,No log,5.960748,0.000165,0.006065
2,No log,5.925188,0.000165,0.006769
3,No log,5.897899,0.000165,0.006562
4,No log,5.877831,0.000247,0.00762
5,No log,5.861616,0.000165,0.007852
6,No log,5.847296,0.00033,0.008439
7,No log,5.837134,0.00033,0.008846
8,5.605100,5.830018,0.000412,0.009092
9,5.605100,5.82541,0.000412,0.009039
10,5.605100,5.823724,0.000412,0.009082


[codecarbon INFO @ 04:29:59] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:29:59] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:29:59] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:29:59] Energy consumed for all GPUs : 0.000674 kWh. Total GPU Power : 161.63906746160603 W
[codecarbon INFO @ 04:29:59] 0.001010 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:29:59] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:29:59] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:29:59] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:29:59] Energy consumed for all GPUs : 0.000693 kWh. Total GPU Power : 166.156729761393 W
[codecarbon INFO @ 04:30:00] 0.001028 kWh of electricity and 0.000000 L of water were used since the beginning.

Evaluating few-shot model...



FEW-SHOT LEARNING RESULTS (100 examples)

Model Configuration:
Training Method: Few-Shot (Frozen Backbone)
Training Examples: 100
Trainable Parameters: 1,538 (0.0023%)
Frozen Parameters: 66,362,880

Performance:
F1 Score: 0.0091
Exact Match: 0.0004
Eval Loss: 5.8300

Energy:
Total: 0.008910 kWh
GPU: 0.006071 kWh (68.1%)
CPU: 0.001499 kWh (16.8%)

Carbon:
CO₂ Emissions: 0.004195 kg
Training Time: 0.04 hours
Model saved to results_distilbert_fewshot_100shots/final_model
CPU times: user 2min 15s, sys: 3.88 s, total: 2min 19s
Wall time: 2min 22s
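
Dividing the reported emissions by the reported energy recovers the carbon-intensity factor that was effectively applied to this run. The sketch below does this for the 100-shot run and, for comparison, for the rank 4 LoRA run from the LoRA results summary; `implied_intensity` is an illustrative helper.

```python
def implied_intensity(emissions_kg, energy_kwh):
    """Carbon intensity in kg CO2 per kWh implied by a run's reported totals."""
    if energy_kwh <= 0:
        raise ValueError("energy_kwh must be positive")
    return emissions_kg / energy_kwh


fewshot_100 = implied_intensity(emissions_kg=0.004195, energy_kwh=0.008910)
lora_rank4 = implied_intensity(emissions_kg=0.018648, energy_kwh=0.039610)

print(f"Few-shot (100): {fewshot_100:.3f} kg CO2/kWh")
print(f"LoRA (rank 4):  {lora_rank4:.3f} kg CO2/kWh")
```

Both runs imply roughly the same factor, so differences in emissions between strategies in this notebook reflect differences in energy use rather than a change in the assumed grid mix.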


In [47]:
%%time
print("\n" + "="*80)
print("EXPERIMENT 2: 500-shot Learning")
print("="*80)

result_500 = run_fewshot_experiment(
    num_shots=500,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="distilbert-base-uncased"
)
result_fewshot.append(result_500)


EXPERIMENT 2: 500-shot Learning

Few-Shot Learning with 500 examples
Creating few-shot dataset with 500 examples...


Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Original examples: 500
After tokenization (with sliding window): 527 samples


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 04:32:04] [setup] RAM Tracking...
[codecarbon INFO @ 04:32:04] [setup] CPU Tracking...



Model Configuration:
Total Parameters: 66,364,418
Trainable Parameters: 1,538
Frozen Parameters: 66,362,880
Trainable Percentage: 0.0023%


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 04:32:05] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 04:32:05] [setup] GPU Tracking...
[codecarbon INFO @ 04:32:05] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 04:32:05] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 04:32:05] >>> Tracker's metadata:
[codecarbon INFO @ 04:32:05]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 04:32:05]   Python version: 3.12.12
[codecarbon INFO @ 04:32:05]   CodeCarbon version: 3.2.0
[codecarbon INFO @ 04:32:05]   Available RAM : 83.474 GB
[codecarbon INFO @ 04:32:05]   CPU count: 12 thread(s) in 1 physical CPU(s)


Training few-shot model (500 examples)...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,No log,5.600938,0.000247,0.012893
2,5.614300,5.36577,0.000824,0.013981
3,5.614300,5.196386,0.000989,0.014842
4,5.119000,5.050659,0.001731,0.015044
5,4.839400,4.921439,0.002225,0.015549
6,4.839400,4.84842,0.00272,0.015826
7,4.660800,4.767348,0.005357,0.017459
8,4.575200,4.723065,0.00684,0.018732
9,4.575200,4.696511,0.007747,0.019596
10,4.522800,4.687153,0.008241,0.019856


[codecarbon INFO @ 04:32:22] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:32:22] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:32:22] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:32:22] Energy consumed for all GPUs : 0.000684 kWh. Total GPU Power : 163.99161468134258 W
[codecarbon INFO @ 04:32:22] 0.001019 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:32:23] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:32:23] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:32:23] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:32:23] Energy consumed for all GPUs : 0.000702 kWh. Total GPU Power : 168.43396504470567 W
[codecarbon INFO @ 04:32:23] 0.001038 kWh of electricity and 0.000000 L of water were used since the beginning.

Evaluating few-shot model...



FEW-SHOT LEARNING RESULTS (500 examples)

Model Configuration:
Training Method: Few-Shot (Frozen Backbone)
Training Examples: 500
Trainable Parameters: 1,538 (0.0023%)
Frozen Parameters: 66,362,880

Performance:
F1 Score: 0.0199
Exact Match: 0.0082
Eval Loss: 4.6872

Energy:
Total: 0.009242 kWh
GPU: 0.006320 kWh (68.4%)
CPU: 0.001543 kWh (16.7%)

Carbon:
CO₂ Emissions: 0.004351 kg
Training Time: 0.04 hours
Model saved to results_distilbert_fewshot_500shots/final_model
CPU times: user 2min 20s, sys: 3.99 s, total: 2min 24s
Wall time: 2min 26s


In [48]:
%%time
print("\n" + "="*80)
print("EXPERIMENT 3: 1000-shot Learning")
print("="*80)

result_1000 = run_fewshot_experiment(
    num_shots=1000,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="distilbert-base-uncased"
)
result_fewshot.append(result_1000)


EXPERIMENT 3: 1000-shot Learning

Few-Shot Learning with 1000 examples
Creating few-shot dataset with 1000 examples...


Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Original examples: 1000
After tokenization (with sliding window): 1027 samples


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 04:34:31] [setup] RAM Tracking...
[codecarbon INFO @ 04:34:31] [setup] CPU Tracking...



Model Configuration:
Total Parameters: 66,364,418
Trainable Parameters: 1,538
Frozen Parameters: 66,362,880
Trainable Percentage: 0.0023%


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 04:34:32] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 04:34:32] [setup] GPU Tracking...
[codecarbon INFO @ 04:34:32] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 04:34:32] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 04:34:32] >>> Tracker's metadata:
[codecarbon INFO @ 04:34:32]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 04:34:32]   Python version: 3.12.12
[codecarbon INFO @ 04:34:32]   CodeCarbon version: 3.2.0
[codecarbon INFO @ 04:34:32]   Available RAM : 83.474 GB
[codecarbon INFO @ 04:34:32]   CPU count: 12 thread(s) in 1 physical CPU(s)


Training few-shot model (1000 examples)...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,5.6405,5.487395,0.001071,0.014791
2,5.1223,5.161344,0.002225,0.016665
3,4.808,4.953194,0.003626,0.018421
4,4.4793,4.815657,0.004533,0.019417
5,4.4013,4.72191,0.005522,0.020926
6,4.2976,4.657375,0.005934,0.021714
7,4.226,4.601258,0.006099,0.021734
8,4.1701,4.566637,0.006346,0.022027
9,4.1827,4.545423,0.006428,0.022245
10,4.1202,4.540926,0.006428,0.022166


[codecarbon INFO @ 04:34:49] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:34:49] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:34:49] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:34:49] Energy consumed for all GPUs : 0.000684 kWh. Total GPU Power : 163.97668728444586 W
[codecarbon INFO @ 04:34:49] 0.001019 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:34:49] Energy consumed for RAM : 0.000158 kWh. RAM Power : 38.0 W
[codecarbon INFO @ 04:34:49] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:34:49] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:34:49] Energy consumed for all GPUs : 0.000695 kWh. Total GPU Power : 166.62317409103653 W
[codecarbon INFO @ 04:34:49] 0.001030 kWh of electricity and 0.000000 L of water were used since the beginning.

Evaluating few-shot model...



FEW-SHOT LEARNING RESULTS (1000 examples)

Model Configuration:
Training Method: Few-Shot (Frozen Backbone)
Training Examples: 1000
Trainable Parameters: 1,538 (0.0023%)
Frozen Parameters: 66,362,880

Performance:
F1 Score: 0.0222
Exact Match: 0.0064
Eval Loss: 4.5454

Energy:
Total: 0.009578 kWh
GPU: 0.006511 kWh (68.0%)
CPU: 0.001620 kWh (16.9%)

Carbon:
CO₂ Emissions: 0.004509 kg
Training Time: 0.04 hours
Model saved to results_distilbert_fewshot_1000shots/final_model
CPU times: user 2min 27s, sys: 3.79 s, total: 2min 31s
Wall time: 2min 33s
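
The "No log" entries in the training tables follow from `logging_steps=50`: a training loss is only recorded once 50 optimizer steps have accumulated. The sketch below estimates the first epoch that shows a training loss, assuming a single device, no gradient accumulation, and that the last partial batch is kept. The sample counts are the post-tokenization sizes printed by each run.

```python
import math


def first_logged_epoch(num_samples, batch_size=16, logging_steps=50):
    """Epoch in which the first training-loss log is emitted."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return math.ceil(logging_steps / steps_per_epoch)


for num_samples in (100, 527, 1027):
    steps = math.ceil(num_samples / 16)
    epoch = first_logged_epoch(num_samples)
    print(f"{num_samples} samples: {steps} steps/epoch, first log in epoch {epoch}")
```

This matches the tables above: the 100-shot run first reports a training loss in epoch 8, the 500-shot run in epoch 2, and the 1000-shot run in epoch 1.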


### STEP 8.1: Results and Analysis

In [49]:
results_df_fewshot = pd.DataFrame(result_fewshot)
print("\n" + "="*60)
print("FEW-SHOT LEARNING RESULTS SUMMARY")
print("="*60)
print(results_df_fewshot[['num_shots', 'trainable_percentage', 'f1_score', 'exact_match', 'emissions_kg', 'training_time_hours']].to_string(index=False))

# Save to CSV
results_df_fewshot.to_csv("/content/drive/MyDrive/distilbert_fewshot_results.csv", index=False)



FEW-SHOT LEARNING RESULTS SUMMARY
 num_shots  trainable_percentage  f1_score  exact_match  emissions_kg  training_time_hours
       100              0.002318  0.009092     0.000412      0.004195             0.035131
       500              0.002318  0.019856     0.008241      0.004351             0.036160
      1000              0.002318  0.022245     0.006428      0.004509             0.037972
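
To put these numbers next to the LoRA results, the sketch below compares the best few-shot run with the rank 16 LoRA run using F1 per kilogram of CO₂, the same ratio used in the efficiency analyses in this notebook. Values are copied from the two summary tables; `f1_per_kg` is an illustrative helper.

```python
# (F1 score, emissions in kg) copied from the summary tables in this notebook.
runs = {
    "LoRA rank 16": (0.544032, 0.018670),
    "Few-shot 1000": (0.022245, 0.004509),
}


def f1_per_kg(f1, emissions_kg):
    """F1 score obtained per kilogram of CO2 emitted."""
    return f1 / emissions_kg


for name, (f1, kg) in runs.items():
    print(f"{name}: {f1_per_kg(f1, kg):.2f} F1 per kg CO2")

emissions_ratio = runs["LoRA rank 16"][1] / runs["Few-shot 1000"][1]
f1_ratio = runs["LoRA rank 16"][0] / runs["Few-shot 1000"][0]
print(f"LoRA emits {emissions_ratio:.1f}x more but scores {f1_ratio:.1f}x higher F1")
```

On these figures the frozen-backbone runs emit less in absolute terms, yet they deliver far less F1 per unit of emissions than LoRA.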


In [50]:
# FEW-SHOT EFFICIENCY ANALYSIS

print("\n" + "="*80)
print("FEW-SHOT EFFICIENCY ANALYSIS")
print("="*80)

# Use 500-shot as baseline (middle ground)
baseline = results_df_fewshot[results_df_fewshot['num_shots'] == 500].iloc[0]

for _, row in results_df_fewshot.iterrows():
    shots = row['num_shots']
    samples_ratio = row['num_shots'] / baseline['num_shots']
    f1_diff = row['f1_score'] - baseline['f1_score']
    emissions_diff = row['emissions_kg'] - baseline['emissions_kg']
    time_diff = row['training_time_hours'] - baseline['training_time_hours']

    print(f"\n{shots}-Shot Learning:")
    print(f"Training Examples: {row['num_shots']:,}")
    print(f"Trainable Params: {row['trainable_params']:,} ({row['trainable_percentage']:.4f}%)")
    print(f"vs 500-shot: {samples_ratio:.2f}x training data")
    print(f"F1 Score: {row['f1_score']:.4f} ({f1_diff:+.4f} vs 500-shot)")
    print(f"Emissions: {row['emissions_kg']:.6f} kg ({emissions_diff:+.6f} vs 500-shot)")
    print(f"Training Time: {row['training_time_hours']:.2f} hours ({time_diff:+.2f} vs 500-shot)")

    # Efficiency metrics
    efficiency_co2 = row['f1_score'] / row['emissions_kg'] if row['emissions_kg'] > 0 else 0
    efficiency_time = row['f1_score'] / row['training_time_hours'] if row['training_time_hours'] > 0 else 0
    efficiency_samples = row['f1_score'] / row['num_shots'] if row['num_shots'] > 0 else 0

    print(f"Efficiency (F1/kg CO₂): {efficiency_co2:.2f}")
    print(f"Efficiency (F1/hour): {efficiency_time:.4f}")
    print(f"Efficiency (F1/sample): {efficiency_samples:.6f}")




FEW-SHOT EFFICIENCY ANALYSIS

100-Shot Learning:
Training Examples: 100
Trainable Params: 1,538 (0.0023%)
vs 500-shot: 0.20x training data
F1 Score: 0.0091 (-0.0108 vs 500-shot)
Emissions: 0.004195 kg (-0.000156 vs 500-shot)
Training Time: 0.04 hours (-0.00 vs 500-shot)
Efficiency (F1/kg CO₂): 2.17
Efficiency (F1/hour): 0.2588
Efficiency (F1/sample): 0.000091

500-Shot Learning:
Training Examples: 500
Trainable Params: 1,538 (0.0023%)
vs 500-shot: 1.00x training data
F1 Score: 0.0199 (+0.0000 vs 500-shot)
Emissions: 0.004351 kg (+0.000000 vs 500-shot)
Training Time: 0.04 hours (+0.00 vs 500-shot)
Efficiency (F1/kg CO₂): 4.56
Efficiency (F1/hour): 0.5491
Efficiency (F1/sample): 0.000040

1000-Shot Learning:
Training Examples: 1,000
Trainable Params: 1,538 (0.0023%)
vs 500-shot: 2.00x training data
F1 Score: 0.0222 (+0.0024 vs 500-shot)
Emissions: 0.004509 kg (+0.000158 vs 500-shot)
Training Time: 0.04 hours (+0.00 vs 500-shot)
Efficiency (F1/kg CO₂): 4.93
Efficiency (F1/hour): 0.5858
Efficiency (F1/sample): 0.000022
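
The training-time differences above round to `-0.00` and `+0.00` because they are printed in hours with two decimals. Expressing them in seconds makes them readable. This is a small sketch using the `training_time_hours` values from the few-shot summary table; `hours_diff_as_seconds` is an illustrative helper.

```python
# training_time_hours copied from the few-shot results summary table.
times_hours = {100: 0.035131, 500: 0.036160, 1000: 0.037972}


def hours_diff_as_seconds(hours, baseline_hours):
    """Difference between two durations given in hours, returned in seconds."""
    return (hours - baseline_hours) * 3600.0


baseline = times_hours[500]
for shots, hours in times_hours.items():
    diff = hours_diff_as_seconds(hours, baseline)
    print(f"{shots}-shot: {hours * 60:.2f} min ({diff:+.1f} s vs 500-shot)")
```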

In [51]:
# PLOT 1: Few-Shot Energy Consumption by Shots
df_sorted_fewshot = results_df_fewshot.sort_values('num_shots')

fig_fewshot_energy = go.Figure()

fig_fewshot_energy.add_trace(go.Bar(
    name='CPU Energy',
    x=df_sorted_fewshot['num_shots'],
    y=df_sorted_fewshot['cpu_energy_kwh'],
    marker_color='#FF6B6B',
    hovertemplate='<b>CPU Energy</b><br>%{y:.6f} kWh<br>Shots: %{x}<extra></extra>'
))

fig_fewshot_energy.add_trace(go.Bar(
    name='GPU Energy',
    x=df_sorted_fewshot['num_shots'],
    y=df_sorted_fewshot['gpu_energy_kwh'],
    marker_color='#4ECDC4',
    hovertemplate='<b>GPU Energy</b><br>%{y:.6f} kWh<br>Shots: %{x}<extra></extra>'
))

fig_fewshot_energy.add_trace(go.Bar(
    name='RAM Energy',
    x=df_sorted_fewshot['num_shots'],
    y=df_sorted_fewshot['ram_energy_kwh'],
    marker_color='#95E1D3',
    hovertemplate='<b>RAM Energy</b><br>%{y:.6f} kWh<br>Shots: %{x}<extra></extra>'
))

fig_fewshot_energy.update_layout(
    title=dict(text="Few-Shot: Energy Consumption by Number of Examples", font=dict(size=18)),
    xaxis_title='Number of Training Examples',
    yaxis_title='Energy Consumption (kWh)',
    barmode='stack',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='x unified'
)

fig_fewshot_energy.show()
fig_fewshot_energy.write_html("/content/drive/MyDrive/fewshot_energy_by_shots.html")

In [52]:
# PLOT 2: Few-Shot Performance & Emissions by Shots (Dual Y-axis)
df_sorted_fewshot = results_df_fewshot.sort_values('num_shots')

fig_fewshot_perf = make_subplots(specs=[[{"secondary_y": True}]])

# F1 Score line
fig_fewshot_perf.add_trace(
    go.Scatter(
        x=df_sorted_fewshot['num_shots'],
        y=df_sorted_fewshot['f1_score'],
        name='F1 Score',
        mode='lines+markers',
        line=dict(color='#4ECDC4', width=3),
        marker=dict(size=12, line=dict(width=2, color='white')),
        hovertemplate='<b>F1 Score</b>: %{y:.4f}<br>Shots: %{x}<extra></extra>'
    ),
    secondary_y=False
)

# Exact Match line
fig_fewshot_perf.add_trace(
    go.Scatter(
        x=df_sorted_fewshot['num_shots'],
        y=df_sorted_fewshot['exact_match'],
        name='Exact Match',
        mode='lines+markers',
        line=dict(color='#95E1D3', width=3, dash='dash'),
        marker=dict(size=10),
        hovertemplate='<b>Exact Match</b>: %{y:.4f}<br>Shots: %{x}<extra></extra>'
    ),
    secondary_y=False
)

# CO2 Emissions bar
fig_fewshot_perf.add_trace(
    go.Bar(
        x=df_sorted_fewshot['num_shots'],
        y=df_sorted_fewshot['emissions_kg'],
        name='CO₂ Emissions',
        marker_color='#FF6B6B',
        opacity=0.6,
        hovertemplate='<b>CO₂</b>: %{y:.6f} kg<br>Shots: %{x}<extra></extra>'
    ),
    secondary_y=True
)

fig_fewshot_perf.update_xaxes(title_text="Number of Training Examples")
fig_fewshot_perf.update_yaxes(title_text="Performance Score", secondary_y=False)
fig_fewshot_perf.update_yaxes(title_text="CO₂ Emissions (kg)", secondary_y=True)

fig_fewshot_perf.update_layout(
    title=dict(text="Few-Shot: Performance vs Carbon Emissions", font=dict(size=18)),
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig_fewshot_perf.show()
fig_fewshot_perf.write_html("/content/drive/MyDrive/fewshot_performance_emissions.html")
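The dual-axis plot shows F1 and emissions side by side, but it can be hard to read off how much F1 each extra unit of CO₂ buys. The sketch below estimates that marginal gain between consecutive shot counts. The three rows are copied from the few-shot entries of the combined results table in this notebook; the `delta_*` and `f1_gain_per_kg` column names are assumptions introduced here for illustration.

```python
import pandas as pd

# Few-shot rows copied from the combined results table in this notebook
fewshot = pd.DataFrame({
    "num_shots": [100, 500, 1000],
    "f1_score": [0.009092, 0.019856, 0.022245],
    "emissions_kg": [0.004195, 0.004351, 0.004509],
}).sort_values("num_shots")

# Change relative to the previous shot count (the first row has no predecessor, so NaN)
fewshot["delta_f1"] = fewshot["f1_score"].diff()
fewshot["delta_emissions_kg"] = fewshot["emissions_kg"].diff()

# Marginal F1 gained per additional kg of CO2
fewshot["f1_gain_per_kg"] = fewshot["delta_f1"] / fewshot["delta_emissions_kg"]

print(fewshot.to_string(index=False))
```

On these numbers the marginal gain falls sharply between the 100 to 500 step and the 500 to 1000 step, which is consistent with the flattening F1 curve in the plot.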

# Comparing and Testing All the Models

In [68]:
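from peft import PeftModel  # required by the LoRA branch below


def test_model(model_path, examples, tokenizer_name="distilbert-base-uncased"):
    """Run a QA pipeline over `examples`, print each prediction, and return the results.

    The training method (LoRA, few-shot, or full fine-tuning) is inferred from `model_path`.
    """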

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

    # Auto-detect model type: any path containing "lora" is treated as a LoRA adapter directory
    is_lora = "lora" in model_path.lower()

    if is_lora:
        method = "LoRA"
        # Load base model + LoRA adapters
        base_model = AutoModelForQuestionAnswering.from_pretrained(tokenizer_name)
        model = PeftModel.from_pretrained(base_model, model_path)

    else:
        # Detect if few-shot or full fine-tuning
        method = "Few-Shot" if "fewshot" in model_path.lower() else "Full Fine-tuning"
        model = AutoModelForQuestionAnswering.from_pretrained(model_path)


    # Create pipeline
    qa_pipeline = pipeline(
        "question-answering",
        model=model,
        tokenizer=tokenizer,
        device=0 if torch.cuda.is_available() else -1
    )

    # Test all examples
    results = []
    for i, ex in enumerate(examples, 1):
        # Get prediction
        prediction = qa_pipeline(question=ex['question'], context=ex['context'])

        # Store result
        result = {
            'example_num': i,
            'method': method,
            'question': ex['question'],
            'context': ex['context'][:100] + "..." if len(ex['context']) > 100 else ex['context'],
            'predicted_answer': prediction['answer'],
            'expected_answer': ex.get('expected_answer', None),
            'confidence': prediction['score'],
            'start_position': prediction['start'],
            'end_position': prediction['end']
        }
        results.append(result)

        # Print formatted output
        print(f"\nExample {i}")
        print(f"Question: {ex['question']}")
        print(f"Context: {ex['context'][:150]}{'...' if len(ex['context']) > 150 else ''}")
        print(f"\nPredicted: '{prediction['answer']}'")
        print(f"   Confidence: {prediction['score']:.2%}")

        # Check match with expected answer
        if ex.get('expected_answer'):
            expected = ex['expected_answer'].lower().strip()
            predicted = prediction['answer'].lower().strip()
            # Flexible matching: either one contains the other.
            # An empty prediction is never a match ('' is a substring of every string).
            match = bool(predicted) and ((predicted in expected) or (expected in predicted))
            print(f"   Expected: '{ex['expected_answer']}'")
            print(f"   Match: {'YES' if match else 'NO'}")

        print("-" * 80)

    return results
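
The containment check in `test_model` is lenient: a prediction that merely includes the expected string counts as a match. As a hedged alternative, the sketch below scores answers with normalized exact match and token-level F1, modeled on the usual SQuAD-style evaluation. It is not the metric code used elsewhere in this notebook, and the function names `normalize_answer`, `exact_match`, and `token_f1` are assumptions chosen for illustration.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, expected):
    """True when both answers are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(expected)


def token_f1(prediction, expected):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(expected).split()
    if not pred_tokens or not gold_tokens:
        # Two empty answers agree; one empty answer cannot overlap with anything
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris", "paris"))
print(round(token_f1("Google Colab provides free access to GPUs", "GPUs and TPUs"), 2))
```

Token F1 gives partial credit where the containment check gives none: a prediction that overlaps the expected answer on one token out of seven still scores above zero.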

In [69]:
test_examples = [
    {
        'question': "What is the capital of France?",
        'context': "Paris is the capital and most populous city of France. It has been one of Europe's major centers of finance, diplomacy, commerce, fashion, and arts.",
        'expected_answer': "Paris"
    },
    {
        'question': "What does Google Colab provide access to?",
        'context': "Google Colab provides free access to GPUs and TPUs, which makes it popular for deep learning.",
        'expected_answer': "GPUs and TPUs"
    },
    {
        'question': "When was Python created?",
        'context': "Python was created by Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability.",
        'expected_answer': "1991"
    },
    {
        'question': "Who invented the telephone?",
        'context': "The telephone was invented by Alexander Graham Bell in 1876. He made the first successful telephone call on March 10, 1876.",
        'expected_answer': "Alexander Graham Bell"
    },
]


In [70]:
print(os.listdir('/content/'))

['.config', 'results_distilbert_lora_r4_80pct', 'drive', 'results_distilbert_80pct', 'wandb', 'results_distilbert_fewshot_100shots', 'results_distilbert_fewshot_1000shots', 'results_distilbert_lora_r8_80pct', 'results_distilbert_50pct', 'results_distilbert_lora_r16_80pct', 'results_distilbert_fewshot_500shots', 'results_distilbert_25pct', 'sample_data']


In [71]:
# Test Full Fine-tuning
print("=" * 40)
print("TESTING FULL FINE-TUNING MODEL")
print("=" * 40)
results_full_ft = test_model(
    model_path="results_distilbert_80pct/final_model",
    examples=test_examples
)

TESTING FULL FINE-TUNING MODEL


Device set to use cuda:0



Example 1
Question: What is the capital of France?
Context: Paris is the capital and most populous city of France. It has been one of Europe's major centers of finance, diplomacy, commerce, fashion, and arts.

Predicted: 'Paris'
   Confidence: 99.18%
   Expected: 'Paris'
   Match: YES
--------------------------------------------------------------------------------

Example 2
Question: What does Google Colab provide access to?
Context: Google Colab provides free access to GPUs and TPUs, which makes it popular for deep learning.

Predicted: 'GPUs and TPUs'
   Confidence: 73.24%
   Expected: 'GPUs and TPUs'
   Match: YES
--------------------------------------------------------------------------------

Example 3
Question: When was Python created?
Context: Python was created by Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability.

Predicted: '1991'
   Confidence: 87.71%
   Expected: '1991'
   Match: YES
--------------------------------------------

In [72]:
# Test LoRA
print("\n" + "=" * 40)
print("TESTING LORA MODEL")
print("=" * 40)
results_lora = test_model(
    model_path="results_distilbert_lora_r16_80pct/lora_adapters",
    examples=test_examples
)



TESTING LORA MODEL


Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0



Example 1
Question: What is the capital of France?
Context: Paris is the capital and most populous city of France. It has been one of Europe's major centers of finance, diplomacy, commerce, fashion, and arts.

Predicted: 'Paris'
   Confidence: 75.19%
   Expected: 'Paris'
   Match: YES
--------------------------------------------------------------------------------

Example 2
Question: What does Google Colab provide access to?
Context: Google Colab provides free access to GPUs and TPUs, which makes it popular for deep learning.

Predicted: 'GPUs and TPUs'
   Confidence: 38.04%
   Expected: 'GPUs and TPUs'
   Match: YES
--------------------------------------------------------------------------------

Example 3
Question: When was Python created?
Context: Python was created by Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability.

Predicted: '1991'
   Confidence: 78.01%
   Expected: '1991'
   Match: YES
--------------------------------------------

In [73]:
# Test Few-Shot
print("\n" + "=" * 40)
print("TESTING FEW-SHOT MODEL")
print("=" * 40)
test_results_fewshot = test_model(
    model_path="results_distilbert_fewshot_1000shots/final_model",
    examples=test_examples
)



TESTING FEW-SHOT MODEL


Device set to use cuda:0



Example 1
Question: What is the capital of France?
Context: Paris is the capital and most populous city of France. It has been one of Europe's major centers of finance, diplomacy, commerce, fashion, and arts.

Predicted: 'Paris'
   Confidence: 4.59%
   Expected: 'Paris'
   Match: YES
--------------------------------------------------------------------------------

Example 2
Question: What does Google Colab provide access to?
Context: Google Colab provides free access to GPUs and TPUs, which makes it popular for deep learning.

Predicted: 'Google Colab provides free access to GPUs'
   Confidence: 3.72%
   Expected: 'GPUs and TPUs'
   Match: NO
--------------------------------------------------------------------------------

Example 3
Question: When was Python created?
Context: Python was created by Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability.

Predicted: 'van Rossum'
   Confidence: 3.33%
   Expected: '1991'
   Match: NO
---------------

In [74]:
all_results = {
    "Full FT (80%)": results_full_ft,
    "LoRA (r=16, 80%)": results_lora,
    "Few-Shot (1000)": test_results_fewshot
}


## Comparison Function

In [75]:
def compare_models(results_dict):
    print("\n" + "="*80)
    print("MODEL COMPARISON")
    print("="*80)

    comparison_data = []

    for model_name, results in results_dict.items():
        for result in results:
            comparison_data.append({
                'Model': model_name,
                'Question': result['question'][:50] + "...",
                'Predicted': result['predicted_answer'],
                # test_model stores None when no expected answer is given, so fall back explicitly
                'Expected': result.get('expected_answer') or 'N/A',
                'Confidence': result['confidence']
            })

    comparison_df = pd.DataFrame(comparison_data)

    # Group by question to see how different models answer
    for question in comparison_df['Question'].unique():
        print(f"\nQUESTION: {question}")
        question_results = comparison_df[comparison_df['Question'] == question]
        for _, row in question_results.iterrows():
            exp = str(row['Expected']).lower()
            pred = str(row['Predicted']).lower()
            match_indicator = "✓" if (pred in exp or exp in pred) and exp != 'n/a' else "✗"
            print(f"  {match_indicator} {row['Model']:20s}: {row['Predicted']:40s} ({row['Confidence']:.1%})")
        expected_val = question_results.iloc[0]['Expected']
        if expected_val != 'N/A':
            print(f"Expected ANSWER: {expected_val}")

    return comparison_df

In [76]:
comparison_df = compare_models(all_results)


MODEL COMPARISON

QUESTION: What is the capital of France?...
  ✓ Full FT (80%)       : Paris                                    (99.2%)
  ✓ LoRA (r=16, 80%)    : Paris                                    (75.2%)
  ✓ Few-Shot (1000)     : Paris                                    (4.6%)
Expected ANSWER: Paris

QUESTION: What does Google Colab provide access to?...
  ✓ Full FT (80%)       : GPUs and TPUs                            (73.2%)
  ✓ LoRA (r=16, 80%)    : GPUs and TPUs                            (38.0%)
  ✗ Few-Shot (1000)     : Google Colab provides free access to GPUs (3.7%)
Expected ANSWER: GPUs and TPUs

QUESTION: When was Python created?...
  ✓ Full FT (80%)       : 1991                                     (87.7%)
  ✓ LoRA (r=16, 80%)    : 1991                                     (78.0%)
  ✗ Few-Shot (1000)     : van Rossum                               (3.3%)
Expected ANSWER: 1991

QUESTION: Who invented the telephone?...
  ✓ Full FT (80%)       : Alexander Graham Bell    

In [77]:
# Save comparison results
comparison_df.to_csv("/content/drive/MyDrive/model_comparison.csv", index=False)
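
The per-question printout can also be condensed into one row per model. The sketch below is an illustration under stated assumptions: the rows are copied by hand from the three fully printed questions in the outputs above (the fourth is cut off in the output and is left out), and the `Match`, `match_rate`, and `mean_confidence` names are introduced here rather than taken from the notebook.

```python
import pandas as pd

# (model, predicted, expected, confidence) copied from the printed outputs above
rows = [
    ("Full FT (80%)", "Paris", "Paris", 0.9918),
    ("Full FT (80%)", "GPUs and TPUs", "GPUs and TPUs", 0.7324),
    ("Full FT (80%)", "1991", "1991", 0.8771),
    ("LoRA (r=16, 80%)", "Paris", "Paris", 0.7519),
    ("LoRA (r=16, 80%)", "GPUs and TPUs", "GPUs and TPUs", 0.3804),
    ("LoRA (r=16, 80%)", "1991", "1991", 0.7801),
    ("Few-Shot (1000)", "Paris", "Paris", 0.0459),
    ("Few-Shot (1000)", "Google Colab provides free access to GPUs", "GPUs and TPUs", 0.0372),
    ("Few-Shot (1000)", "van Rossum", "1991", 0.0333),
]
df = pd.DataFrame(rows, columns=["Model", "Predicted", "Expected", "Confidence"])


def contains_match(row):
    """Same containment rule as the notebook: one answer contains the other."""
    pred = row["Predicted"].lower().strip()
    exp = row["Expected"].lower().strip()
    return bool(pred) and (pred in exp or exp in pred)


df["Match"] = df.apply(contains_match, axis=1)

# One row per model: share of matched answers and average pipeline confidence
summary = df.groupby("Model", sort=False).agg(
    match_rate=("Match", "mean"),
    mean_confidence=("Confidence", "mean"),
)
print(summary.round(3))
```

With only three hand-written questions per model, these rates are a sanity check rather than an evaluation; the F1 scores on the SQuAD validation data remain the meaningful comparison.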

In [78]:
print("\n" + "="*80)
print("COMPARISON: FULL FT vs LoRA vs FEW-SHOT")
print("="*80)

# Load previous results
full_ft_results = pd.read_csv("/content/drive/MyDrive/distilbert_dataset_size_results.csv")
lora_results = pd.read_csv("/content/drive/MyDrive/distilbert_lora_results.csv")
results_fewshot = pd.read_csv("/content/drive/MyDrive/distilbert_fewshot_results.csv")

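# Add method identifiers if not present (labels match those already stored in the CSVs)
if 'training_method' not in full_ft_results.columns:
    full_ft_results['training_method'] = 'Full Fine-Tuning'
if 'training_method' not in lora_results.columns:
    lora_results['training_method'] = 'LoRA'
if 'training_method' not in results_fewshot.columns:
    results_fewshot['training_method'] = 'Few-Shot (Frozen Backbone)'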

# Combine all results
all_methods = pd.concat([full_ft_results, lora_results, results_fewshot], ignore_index=True)

print("\nTraining Efficiency Comparison:")
print(all_methods[['training_method', 'train_samples','f1_score', 'emissions_kg', 'training_time_hours']].to_string(index=False))

# Save combined results
all_methods.to_csv("/content/drive/MyDrive/all_training_methods_comparison.csv", index=False)



COMPARISON: FULL FT vs LoRA vs FEW-SHOT

Training Efficiency Comparison:
           training_method  train_samples  f1_score  emissions_kg  training_time_hours
          Full Fine-Tuning          32579  0.448923      0.007816             0.061552
          Full Fine-Tuning          65159  0.570179      0.013233             0.091409
          Full Fine-Tuning         104255  0.609413      0.020736             0.142435
                      LoRA         104255  0.493034      0.009820             0.135649
                      LoRA         104255  0.512776      0.009813             0.135268
                      LoRA         104255  0.545845      0.009804             0.135895
Few-Shot (Frozen Backbone)            100  0.009092      0.004195             0.035131
Few-Shot (Frozen Backbone)            500  0.019856      0.004351             0.036160
Few-Shot (Frozen Backbone)           1000  0.022245      0.004509             0.037972
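
To make the trade-off in this table easier to read, each run can be expressed relative to the best full fine-tuning run. The sketch below rebuilds the table from the printed values and adds two percentage columns. The column names `f1_vs_baseline_pct` and `emissions_vs_baseline_pct` are assumptions introduced here, and the table does not say which LoRA row corresponds to which rank, so the rows are left unlabeled in that respect.

```python
import pandas as pd

# Values copied from the "Training Efficiency Comparison" output above
results = pd.DataFrame({
    "training_method": (
        ["Full Fine-Tuning"] * 3 + ["LoRA"] * 3 + ["Few-Shot (Frozen Backbone)"] * 3
    ),
    "train_samples": [32579, 65159, 104255, 104255, 104255, 104255, 100, 500, 1000],
    "f1_score": [0.448923, 0.570179, 0.609413, 0.493034, 0.512776,
                 0.545845, 0.009092, 0.019856, 0.022245],
    "emissions_kg": [0.007816, 0.013233, 0.020736, 0.009820, 0.009813,
                     0.009804, 0.004195, 0.004351, 0.004509],
})

# Baseline: the full fine-tuning run with the highest F1
full_ft = results[results["training_method"] == "Full Fine-Tuning"]
baseline = full_ft.loc[full_ft["f1_score"].idxmax()]

# Express every run as a percentage of the baseline's F1 and emissions
results["f1_vs_baseline_pct"] = 100 * results["f1_score"] / baseline["f1_score"]
results["emissions_vs_baseline_pct"] = 100 * results["emissions_kg"] / baseline["emissions_kg"]

print(results.round({"f1_vs_baseline_pct": 1, "emissions_vs_baseline_pct": 1}).to_string(index=False))
```

On these numbers, the highest-scoring LoRA run retains about 90% of the baseline F1 at roughly 47% of its emissions, while the few-shot runs emit about a fifth of the baseline's CO₂ but reach under 4% of its F1.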
