# Quantifying the Environmental Cost of AI: Carbon Emissions in Language Model Fine-Tuning for Question Answering

> ### **Project Goal**: As language models take on a larger role in natural language processing, their environmental impact has become an important consideration. Much of the research in this area focuses on improving model accuracy, while the energy use and carbon footprint of training are often overlooked or poorly documented. This project explores that imbalance by studying how gains in model performance relate to the environmental cost of fine-tuning.


# Training Strategy 1: Full Fine-Tuning (Model: BERT)

In [None]:
!pip install transformers
!pip install datasets
!pip install accelerate
!pip install codecarbon
!pip install evaluate

Collecting codecarbon
  Downloading codecarbon-3.1.1-py3-none-any.whl.metadata (12 kB)
Collecting fief-client[cli] (from codecarbon)
  Downloading fief_client-0.20.0-py3-none-any.whl.metadata (2.1 kB)
Collecting psutil>=6.0.0 (from codecarbon)
  Downloading psutil-7.1.3-cp36-abi3-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl.metadata (23 kB)
Collecting rapidfuzz (from codecarbon)
  Downloading rapidfuzz-3.14.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (12 kB)
Collecting questionary (from codecarbon)
  Downloading questionary-2.1.1-py3-none-any.whl.metadata (5.4 kB)
Collecting httpx<0.28.0,>=0.21.3 (from fief-client[cli]->codecarbon)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jwcrypto<2.0.0,>=1.4 (from fief-client[cli]->codecarbon)
  Downloading jwcrypto-1.5.6-py3-none-any.whl.metadata (3.1 kB)
Collecting yaspin (from fief-client[cli]->codecarbon)
  Downloading yaspin-3.3.0-py3-none-any.whl.metadata (15 kB)


Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.1/84.1 kB 3.1 MB/s eta 0:00:00
Installing collected packages: evaluate
Successfully installed evaluate-0.4.6


In [None]:
# Importing Necessary Libraries
import os
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForQuestionAnswering,
    TrainingArguments,
    Trainer,
    default_data_collator
)
import torch
import numpy as np
from datasets import Dataset
import evaluate
from codecarbon import EmissionsTracker
from google.colab import drive
import pandas as pd
from collections import defaultdict
import json

drive.mount('/content/drive')

Mounted at /content/drive


## STEP 1: Loading The Stanford Question Answering Dataset (SQuAD v2)

In [None]:
squad = load_dataset("squad_v2")
df_train = pd.DataFrame(squad['train'])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md: 0.00B [00:00, ?B/s]

squad_v2/train-00000-of-00001.parquet:   0%|          | 0.00/16.4M [00:00<?, ?B/s]

squad_v2/validation-00000-of-00001.parqu(…):   0%|          | 0.00/1.35M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/130319 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/11873 [00:00<?, ? examples/s]

In [None]:
print("SQuAD Format: ",squad)
print(f"\nFull training set size: {len(squad['train'])}")
print(f"\nValidation set size: {len(squad['validation'])}")

SQuAD Format:  DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 130319
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 11873
    })
})

Full training set size: 130319

Validation set size: 11873


In [None]:
df_train.head(10)

| | id | title | context | question | answers |
|---|---|---|---|---|---|
| 0 | 56be85543aeaaa14008c9063 | Beyoncé | Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b... | When did Beyonce start becoming popular? | {'text': ['in the late 1990s'], 'answer_start'... |
| 1 | 56be85543aeaaa14008c9065 | Beyoncé | Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b... | What areas did Beyonce compete in when she was... | {'text': ['singing and dancing'], 'answer_star... |
| 2 | 56be85543aeaaa14008c9066 | Beyoncé | Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b... | When did Beyonce leave Destiny's Child and bec... | {'text': ['2003'], 'answer_start': [526]} |
| 3 | 56bf6b0f3aeaaa14008c9601 | Beyoncé | Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b... | In what city and state did Beyonce grow up? | {'text': ['Houston, Texas'], 'answer_start': [... |
| 4 | 56bf6b0f3aeaaa14008c9602 | Beyoncé | Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b... | In which decade did Beyonce become famous? | {'text': ['late 1990s'], 'answer_start': [276]} |
| 5 | 56bf6b0f3aeaaa14008c9603 | Beyoncé | Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b... | In what R&B group was she the lead singer? | {'text': ['Destiny's Child'], 'answer_start': ... |
| 6 | 56bf6b0f3aeaaa14008c9604 | Beyoncé | Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b... | What album made her a worldwide known artist? | {'text': ['Dangerously in Love'], 'answer_star... |
| 7 | 56bf6b0f3aeaaa14008c9605 | Beyoncé | Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b... | Who managed the Destiny's Child group? | {'text': ['Mathew Knowles'], 'answer_start': [... |
| 8 | 56d43c5f2ccc5a1400d830a9 | Beyoncé | Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b... | When did Beyoncé rise to fame? | {'text': ['late 1990s'], 'answer_start': [276]} |
| 9 | 56d43c5f2ccc5a1400d830aa | Beyoncé | Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b... | What role did Beyoncé have in Destiny's Child? | {'text': ['lead singer'], 'answer_start': [290]} |


## STEP 2: Tokenization Function For The Model

In [None]:
# AutoTokenizer automatically selects the correct tokenizer for the given model checkpoint

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

In [None]:
def preprocess_function(examples):
    questions = [q.strip() for q in examples["question"]]
    contexts = [c.strip() for c in examples["context"]]

    # Tokenize
    tokenized = tokenizer(
        questions,
        contexts,
        max_length=384,
        stride=128,
        padding="max_length",
        truncation="only_second",
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
    )

    # Mapping back to original samples
    sample_mapping = tokenized.pop("overflow_to_sample_mapping")
    offset_mapping = tokenized["offset_mapping"]

    start_positions = []
    end_positions = []

    for i, offsets in enumerate(offset_mapping):
        sample_idx = sample_mapping[i]
        answers = examples["answers"][sample_idx]

        # SQuAD v2: no answer case
        if len(answers["answer_start"]) == 0:
            start_positions.append(0)
            end_positions.append(0)
            continue

        start_char = answers["answer_start"][0]
        end_char = start_char + len(answers["text"][0])

        seq_ids = tokenized.sequence_ids(i)

        # Find context section
        context_start = seq_ids.index(1) if 1 in seq_ids else 0
        context_end = len(seq_ids) - 1 - seq_ids[::-1].index(1) if 1 in seq_ids else len(seq_ids) - 1

        # If answer not inside context → mark no answer
        if not (offsets[context_start][0] <= start_char and offsets[context_end][1] >= end_char):
            start_positions.append(0)
            end_positions.append(0)
            continue

        # Find start token
        token_start = context_start
        while token_start <= context_end and offsets[token_start][0] <= start_char:
            token_start += 1
        start_positions.append(token_start - 1)

        # Find end token
        token_end = context_end
        while token_end >= context_start and offsets[token_end][1] >= end_char:
            token_end -= 1
        end_positions.append(token_end + 1)

    tokenized["start_positions"] = start_positions
    tokenized["end_positions"] = end_positions

    return tokenized
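
The two `while` loops in `preprocess_function` convert a character-level answer span into token indices by walking the offset mapping. The sketch below isolates that step on a hand-written toy window so it can be traced by hand. The helper name `char_span_to_token_span` and the toy offsets are hypothetical and only illustrate the logic; real offsets come from the tokenizer.

```python
def char_span_to_token_span(offsets, context_start, context_end, start_char, end_char):
    """Map a character span to (start_token, end_token).

    Returns (0, 0) when the answer is not fully inside the window,
    mirroring the "no answer" convention used in preprocess_function.
    """
    if not (offsets[context_start][0] <= start_char
            and offsets[context_end][1] >= end_char):
        return 0, 0

    # Walk forward past every token that starts at or before the answer start.
    token_start = context_start
    while token_start <= context_end and offsets[token_start][0] <= start_char:
        token_start += 1

    # Walk backward past every token that ends at or after the answer end.
    token_end = context_end
    while token_end >= context_start and offsets[token_end][1] >= end_char:
        token_end -= 1

    return token_start - 1, token_end + 1


# Toy example (hypothetical offsets, not produced by a real tokenizer).
context = "fame in the late 1990s today"
answer = "late 1990s"
start_char = context.index(answer)
end_char = start_char + len(answer)

offsets = [
    (0, 0),    # 0: [CLS]
    (0, 4),    # 1: question token (offsets refer to the question string)
    (0, 0),    # 2: [SEP]
    (0, 4),    # 3: fame
    (5, 7),    # 4: in
    (8, 11),   # 5: the
    (12, 16),  # 6: late
    (17, 22),  # 7: 1990s
    (23, 28),  # 8: today
    (0, 0),    # 9: [SEP]
]

inside = char_span_to_token_span(offsets, 3, 8, start_char, end_char)
# A shorter window that ends at "the" does not contain the answer.
outside = char_span_to_token_span(offsets, 3, 5, start_char, end_char)
print(inside, outside)
```

In the full window the span resolves to tokens 6 and 7 ("late" and "1990s"); in the truncated window it falls back to position 0, which is how overflowed windows without the answer are labeled.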

In [None]:
# Helper that selects a fraction of the training data and tokenizes it.

def prepare_dataset(train_data, size_fraction, preprocess_fn):
    """Create and preprocess a subset of the training data."""
    num_samples = int(len(train_data) * size_fraction)
    train_subset = train_data.select(range(num_samples))

    print(f"🔄 Preprocessing {num_samples} training samples...")
    tokenized_train = train_subset.map(
        preprocess_fn,
        batched=True,
        remove_columns=train_subset.column_names
    )

    return tokenized_train, num_samples
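
As a quick check of the subset arithmetic in `prepare_dataset`, the sketch below reproduces the `int(len(train_data) * size_fraction)` truncation for the three fractions used later, taking the train split size of 130319 printed earlier. The helper name `subset_size` is hypothetical.

```python
TRAIN_SIZE = 130319  # size of the SQuAD v2 train split printed earlier


def subset_size(total, fraction):
    """Number of examples kept; int() truncates toward zero, as in prepare_dataset."""
    return int(total * fraction)


sizes = {fraction: subset_size(TRAIN_SIZE, fraction) for fraction in (0.25, 0.5, 0.8)}
for fraction, n in sizes.items():
    print(f"{fraction:.0%} of the train split -> {n} examples")
```

Because `select(range(num_samples))` takes the first rows of the split, each subset is a prefix of the dataset rather than a random sample, which may matter when examples are grouped by article.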

In [None]:
# Preprocess validation set (full)
print("\n🔄 Preprocessing validation set...")
tokenized_validation = squad["validation"].map(
    preprocess_function,
    batched=True,
    remove_columns=squad["validation"].column_names
)


🔄 Preprocessing validation set...


Map:   0%|          | 0/11873 [00:00<?, ? examples/s]

## STEP 3: Functions For Training The BERT Model

In [None]:
#Model Architecture:
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
print(f"\n{'='*80}")
print("\n🛠 BERT Model Architecture:")
print(f"{'='*80}")
print("\nTransformer layers:", model.config.num_hidden_layers)
print("Hidden size:", model.config.hidden_size)
print("Intermediate feed-forward size:", model.config.intermediate_size)
print("Attention heads:", model.config.num_attention_heads)
print("Max positional embeddings:", model.config.max_position_embeddings)
print("Vocabulary size:", model.config.vocab_size)


model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.




🛠 BERT Model Architecture:

Transformer layers: 12
Hidden size: 768
Intermediate feed-forward size: 3072
Attention heads: 12
Max positional embeddings: 512
Vocabulary size: 30522
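
Since full fine-tuning updates every weight, it can help to know roughly how many parameters the printed configuration implies. The sketch below is a back-of-the-envelope estimate, assuming the standard BERT encoder layout (word, position, and segment embeddings, four attention projections and a two-layer feed-forward block per layer, each with LayerNorm), two segment types, no pooler, and a two-output QA head. The function name `bert_qa_param_count` is hypothetical, and the result is an estimate rather than a value read from the model.

```python
def bert_qa_param_count(vocab_size, hidden, intermediate, layers,
                        max_positions, type_vocab_size=2):
    """Estimate trainable parameters for a BERT encoder with a span-prediction head."""
    layer_norm = 2 * hidden  # scale and bias

    # Word, position, and segment embeddings plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden + layer_norm

    # Query, key, value, and output projections (weights + biases) plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Up-projection, down-projection, and LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )

    # Linear layer producing start and end logits.
    qa_head = hidden * 2 + 2

    return embeddings + layers * (attention + feed_forward) + qa_head


estimate = bert_qa_param_count(
    vocab_size=30522, hidden=768, intermediate=3072, layers=12, max_positions=512
)
print(f"Estimated parameters: {estimate:,}")
```

The estimate can be compared against the loaded model with `sum(p.numel() for p in model.parameters())`.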


In [None]:
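# Custom metrics: exact match and F1 computed on predicted vs. labeled token positions
# (a span-overlap proxy, not the official text-based SQuAD v2 metric)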
def compute_metrics(pred):
    predictions, labels = pred
    start_preds = np.argmax(predictions[0], axis=1)
    end_preds = np.argmax(predictions[1], axis=1)

    start_true = labels[0]
    end_true = labels[1]

    # Calculate exact match
    exact_matches = ((start_preds == start_true) & (end_preds == end_true)).sum()
    exact_match = exact_matches / len(start_true)

    # Calculate F1 score (token overlap)
    f1_scores = []
    for start_p, end_p, start_t, end_t in zip(start_preds, end_preds, start_true, end_true):
        pred_tokens = set(range(start_p, end_p + 1))
        true_tokens = set(range(start_t, end_t + 1))

        if len(pred_tokens) == 0 and len(true_tokens) == 0:
            f1_scores.append(1.0)
        elif len(pred_tokens) == 0 or len(true_tokens) == 0:
            f1_scores.append(0.0)
        else:
            overlap = len(pred_tokens & true_tokens)
            precision = overlap / len(pred_tokens)
            recall = overlap / len(true_tokens)
            f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
            f1_scores.append(f1)

    avg_f1 = np.mean(f1_scores)

    return {
        "exact_match": exact_match,
        "f1": avg_f1
    }
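
To make the position-based F1 in `compute_metrics` concrete, here is a minimal standalone version of the per-example computation with a few hand-checkable cases. The helper name `span_f1` is hypothetical; the logic follows the loop above.

```python
def span_f1(start_p, end_p, start_t, end_t):
    """F1 between two inclusive token-index spans, based on index overlap."""
    pred_tokens = set(range(start_p, end_p + 1))  # empty if end_p < start_p
    true_tokens = set(range(start_t, end_t + 1))

    if not pred_tokens and not true_tokens:
        return 1.0
    if not pred_tokens or not true_tokens:
        return 0.0

    overlap = len(pred_tokens & true_tokens)
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


partial = span_f1(3, 6, 5, 8)    # predicted 3..6, true 5..8: two shared positions
no_answer = span_f1(0, 0, 0, 0)  # both spans point at position 0 (the no-answer label)
inverted = span_f1(7, 5, 5, 8)   # end before start gives an empty prediction
print(partial, no_answer, inverted)
```

In the first case the spans share two of four positions on each side, so precision and recall are both 0.5 and F1 is 0.5.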

In [None]:
def train_model(tokenized_train, tokenized_eval, tokenizer, compute_metrics_fn,
                size_fraction, model_name="bert-base-uncased"):

    # Load fresh model
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Setup output directory
    output_dir = f"results_bert_{int(size_fraction*100)}pct"

    # Training arguments
    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=2,
        weight_decay=0.01,
        fp16=torch.cuda.is_available(),
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        push_to_hub=False,
        logging_steps=100,
        greater_is_better=True
    )

    # Initialize trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        tokenizer=tokenizer,
        data_collator=default_data_collator,
        compute_metrics=compute_metrics_fn
    )

    # Start carbon tracking
    tracker = EmissionsTracker(
        project_name=f"BERT_{int(size_fraction*100)}pct",
        output_dir=output_dir
    )
    tracker.start()

    # Train
    print("🏋️ Training model...")
    train_results = trainer.train()

    # Stop carbon tracking
    tracker.stop()

    emissions_data = tracker.final_emissions_data

    return trainer, train_results, emissions_data, output_dir


## STEP 4: Functions For Evaluating And Saving The Results

In [None]:
def evaluate_and_save(trainer, train_results, emissions_data, output_dir,
                      size_fraction, num_samples):
    """Evaluate the model, print a results summary, and save the final model."""

    # Evaluate
    print("📊 Evaluating model...")
    eval_results = trainer.evaluate()

    # Compile results
    result_entry = {
        "training_method": "Full Fine-Tuning",
        "model_name": "BERT",
        "train_samples": num_samples,
        # Count tokenized validation features from the trainer instead of a notebook global
        "valid_samples": len(trainer.eval_dataset),

        # Performance metrics
        "f1_score": eval_results["eval_f1"],
        "exact_match": eval_results["eval_exact_match"],
        "eval_loss": eval_results["eval_loss"],
        "training_time_hours": train_results.metrics["train_runtime"] / 3600,

        # Emissions data (direct access to EmissionsData attributes)
        "emissions_rate_kg_per_s": emissions_data.emissions_rate,
        "emissions_kg": emissions_data.emissions,
        "timestamp": emissions_data.timestamp,
        "duration_seconds": emissions_data.duration,
        "duration_hours": emissions_data.duration / 3600,

        # Energy consumption
        "energy_consumed_kwh": emissions_data.energy_consumed,
        "cpu_energy_kwh": emissions_data.cpu_energy,
        "gpu_energy_kwh": emissions_data.gpu_energy,
        "ram_energy_kwh": emissions_data.ram_energy,

        # Power draw
        "cpu_power_w": emissions_data.cpu_power,
        "gpu_power_w": emissions_data.gpu_power,
        "ram_power_w": emissions_data.ram_power,

        # Location and system info
        "country_name": emissions_data.country_name,
        "country_iso_code": emissions_data.country_iso_code,
        "region": emissions_data.region,
        "cloud_provider": emissions_data.cloud_provider,
        "cloud_region": emissions_data.cloud_region,
        "on_cloud": emissions_data.on_cloud,

        # System specifications
        "os": emissions_data.os,
        "python_version": emissions_data.python_version,
        "cpu_count": emissions_data.cpu_count,
        "cpu_model": emissions_data.cpu_model,
        "gpu_count": emissions_data.gpu_count,
        "gpu_model": emissions_data.gpu_model,
        "ram_total_size_gb": emissions_data.ram_total_size,

        # Additional metrics
        "pue": emissions_data.pue,  # Power Usage Effectiveness
        "codecarbon_version": emissions_data.codecarbon_version,

    }


    # Print summary
    print(f"\n{'='*80}")
    print(f"\n📈 FINE-TUNING RESULTS SUMMARY FOR {size_fraction*100}% DATASET:")
    print(f"{'='*80}")
    print(f"  Training Method: Full Fine-Tuning")
    print(f"  Model: BERT")

    print(f"\n🎯 Performance Metrics:")
    print(f"  F1 Score: {eval_results['eval_f1']:.4f}")
    print(f"  Exact Match: {eval_results['eval_exact_match']:.4f}")
    print(f"  Eval Loss: {eval_results['eval_loss']:.4f}")

    print(f"\n⚡ Energy Consumption:")
    print(f"  Total Energy: {emissions_data.energy_consumed:.6f} kWh")
    print(f"  CPU Energy: {emissions_data.cpu_energy:.6f} kWh ({emissions_data.cpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"  GPU Energy: {emissions_data.gpu_energy:.6f} kWh ({emissions_data.gpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"  RAM Energy: {emissions_data.ram_energy:.6f} kWh ({emissions_data.ram_energy/emissions_data.energy_consumed*100:.1f}%)")

    print(f"\n🔌 Average Power Draw:")
    print(f"  CPU Power: {emissions_data.cpu_power:.2f} W")
    print(f"  GPU Power: {emissions_data.gpu_power:.2f} W")
    print(f"  RAM Power: {emissions_data.ram_power:.2f} W")
    print(f"  Total Power: {emissions_data.cpu_power + emissions_data.gpu_power + emissions_data.ram_power:.2f} W")

    print(f"\n🌱 Carbon Footprint:")
    print(f"  Total CO2 Emissions: {emissions_data.emissions:.6f} kg")
    print(f"  Emissions Rate: {emissions_data.emissions_rate:.9f} kg/s")
    print(f"  Duration: {emissions_data.duration/3600:.2f} hours")
    print(f"  Training Time (Trainer): {train_results.metrics['train_runtime']/3600:.2f} hours")

    print(f"\n📍 Location & Infrastructure:")
    print(f"  Country: {emissions_data.country_name} ({emissions_data.country_iso_code})")
    print(f"  Region: {emissions_data.region}")
    print(f"  On Cloud: {emissions_data.on_cloud}")
    print(f"  PUE (Power Usage Effectiveness): {emissions_data.pue}")

    print(f"\n💻 System Specifications:")
    print(f"  OS: {emissions_data.os}")
    print(f"  CPU: {emissions_data.cpu_model} ({emissions_data.cpu_count} cores)")
    if emissions_data.gpu_count and emissions_data.gpu_model:
        print(f"  GPU: {emissions_data.gpu_model} (Count: {emissions_data.gpu_count})")
    else:
        print(f"  GPU: None detected")
    print(f"  RAM: {emissions_data.ram_total_size:.2f} GB")
    print(f"  Python: {emissions_data.python_version}")

    print(f"\n{'='*80}")

    # Save model
    trainer.save_model(f"{output_dir}/final_model")

    # Clear GPU memory
    del trainer.model
    del trainer
    torch.cuda.empty_cache()

    return result_entry

### STEP 4.1: Training And Evaluating The Model On Different Dataset Sizes

> We train the model on three subsets of the SQuAD v2 training split.
>
> Training data fractions: [25%, 50%, 80%]

In [None]:
def run_experiment(size_fraction, train_data, eval_data, tokenizer,
                   preprocess_fn, compute_metrics_fn, model_name="bert-base-uncased"):

    """Run a complete training experiment for a given dataset size."""

    print(f"\n{'='*60}")
    print(f"🚀 Training with {size_fraction*100}% of training data")
    print(f"{'='*60}")

    # Step 1: Prepare dataset
    tokenized_train, num_samples = prepare_dataset(train_data, size_fraction, preprocess_fn)

    # Step 2: Train model
    trainer, train_results, emissions_data, output_dir = train_model(
        tokenized_train, eval_data, tokenizer, compute_metrics_fn,
        size_fraction, model_name
    )

    # Step 3: Evaluate and save
    result_entry = evaluate_and_save(
        trainer, train_results, emissions_data, output_dir,
        size_fraction, num_samples
    )

    return result_entry

In [None]:
# Store results
results_summary = []

In [None]:
# Train on 25% of the training data
result1 = run_experiment(
        size_fraction=0.25,
        train_data=squad["train"],
        eval_data=tokenized_validation,
        tokenizer=tokenizer,
        preprocess_fn=preprocess_function,
        compute_metrics_fn=compute_metrics,
        model_name="bert-base-uncased"
    )

results_summary.append(result1)


🚀 Training with 25.0% of training data
🔄 Preprocessing 32579 training samples...


Map:   0%|          | 0/32579 [00:00<?, ? examples/s]

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(
[codecarbon INFO @ 16:57:52] [setup] RAM Tracking...
[codecarbon INFO @ 16:57:52] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 16:57:53] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 16:57:53] [setup] GPU Tracking...
[codecarbon INFO @ 16:57:53] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 16:57:53] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
       

🏋️ Training model...


wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:
[codecarbon INFO @ 16:58:10] Energy consumed for RAM : 0.000083 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 16:58:10] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 16:58:10] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 16:58:10] Energy consumed for all GPUs : 0.000117 kWh. Total GPU Power : 28.06860850966012 W
[codecarbon INFO @ 16:58:10] 0.000377 kWh of electricity and 0.000000 L of water were used since the beginning.


wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: sanjanasawant524 (sanjanasawant524-rutgers-university) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


| Epoch | Training Loss | Validation Loss | Exact Match | F1 |
|---|---|---|---|---|
| 1 | 1.2382 | 1.473462 | 0.430691 | 0.510264 |
| 2 | 0.8957 | 1.601798 | 0.442723 | 0.528042 |


[codecarbon INFO @ 16:58:25] Energy consumed for RAM : 0.000167 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 16:58:25] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 16:58:25] Energy consumed for All CPU : 0.000354 kWh
[codecarbon INFO @ 16:58:25] Energy consumed for all GPUs : 0.000327 kWh. Total GPU Power : 50.34602143740257 W
[codecarbon INFO @ 16:58:25] 0.000848 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 16:58:31] Energy consumed for RAM : 0.000083 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 16:58:31] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 16:58:31] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 16:58:31] Energy consumed for all GPUs : 0.000275 kWh. Total GPU Power : 65.88194323824254 W
[codecarbon INFO @ 16:58:31] 0.000535 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 16:58

📊 Evaluating model...




📈 FINE-TUNING RESULTS SUMMARY FOR 25.0% DATASET:
  Training Method: Full Fine-Tuning
  Model: BERT

🎯 Performance Metrics:
  F1 Score: 0.5280
  Exact Match: 0.4427
  Eval Loss: 1.6018

⚡ Energy Consumption:
  Total Energy: 0.051179 kWh
  CPU Energy: 0.016557 kWh (32.4%)
  GPU Energy: 0.026831 kWh (52.4%)
  RAM Energy: 0.007791 kWh (15.2%)

🔌 Average Power Draw:
  CPU Power: 42.50 W
  GPU Power: 63.69 W
  RAM Power: 20.00 W
  Total Power: 126.19 W

🌱 Carbon Footprint:
  Total CO2 Emissions: 0.032877 kg
  Emissions Rate: 0.000023428 kg/s
  Duration: 0.39 hours
  Training Time (Trainer): 0.39 hours

📍 Location & Infrastructure:
  Country: Taiwan (TWN)
  Region: taipei city
  On Cloud: N
  PUE (Power Usage Effectiveness): 1.0

💻 System Specifications:
  OS: Linux-6.6.105+-x86_64-with-glibc2.35
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (8 cores)
  GPU: 1 x Tesla T4 (Count: 1)
  RAM: 50.99 GB
  Python: 3.12.12
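
The reported figures for the 25% run can be cross-checked with simple arithmetic: energy is power multiplied by time, and the emissions rate is total emissions divided by duration. The sketch below uses the values printed above together with the `duration_seconds` value from the final results table. The implied carbon intensity is derived here from the reported totals and is not a value logged by the run.

```python
# Values copied from the 25% run summary and the final results table.
duration_s = 1403.303696
cpu_power_w = 42.50
ram_power_w = 20.00
gpu_power_w = 63.690722
total_energy_kwh = 0.051179
emissions_kg = 0.032877


def energy_kwh(power_w, seconds):
    """Energy in kWh for a constant power draw: W * s / (3600 s/h * 1000 W/kW)."""
    return power_w * seconds / 3_600_000


cpu_estimate = energy_kwh(cpu_power_w, duration_s)  # reported: 0.016557 kWh
ram_estimate = energy_kwh(ram_power_w, duration_s)  # reported: 0.007791 kWh
gpu_estimate = energy_kwh(gpu_power_w, duration_s)  # reported: 0.026831 kWh

emissions_rate = emissions_kg / duration_s            # kg CO2 per second
implied_intensity = emissions_kg / total_energy_kwh   # kg CO2 per kWh (derived)

print(f"CPU {cpu_estimate:.6f} kWh, RAM {ram_estimate:.6f} kWh, GPU {gpu_estimate:.6f} kWh")
print(f"Rate {emissions_rate:.9f} kg/s, implied intensity {implied_intensity:.4f} kg/kWh")
```

The CPU and RAM estimates land close to the reported energies, which is consistent with both being tracked at constant power. The GPU estimate falls noticeably short of the reported GPU energy, which suggests the printed GPU power may be a point reading rather than a run average.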



In [None]:
# Train on 50% of the training data
result2 = run_experiment(
        size_fraction=0.5,
        train_data=squad["train"],
        eval_data=tokenized_validation,
        tokenizer=tokenizer,
        preprocess_fn=preprocess_function,
        compute_metrics_fn=compute_metrics,
        model_name="bert-base-uncased"
    )
results_summary.append(result2)


🚀 Training with 50.0% of training data
🔄 Preprocessing 65159 training samples...


Map:   0%|          | 0/65159 [00:00<?, ? examples/s]

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(
[codecarbon INFO @ 17:23:20] [setup] RAM Tracking...
[codecarbon INFO @ 17:23:20] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 17:23:21] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 17:23:21] [setup] GPU Tracking...
[codecarbon INFO @ 17:23:21] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 17:23:21] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
       

🏋️ Training model...


| Epoch | Training Loss | Validation Loss | Exact Match | F1 |
|---|---|---|---|---|
| 1 | 1.168 | 1.314587 | 0.506593 | 0.594073 |
| 2 | 0.8378 | 1.279595 | 0.543514 | 0.630827 |


[codecarbon INFO @ 17:23:38] Energy consumed for RAM : 0.000083 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 17:23:38] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 17:23:38] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 17:23:38] Energy consumed for all GPUs : 0.000282 kWh. Total GPU Power : 67.62983665809448 W
[codecarbon INFO @ 17:23:38] 0.000542 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 17:23:38] Energy consumed for RAM : 0.000083 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 17:23:38] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 17:23:38] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 17:23:38] Energy consumed for all GPUs : 0.000288 kWh. Total GPU Power : 69.11975039996953 W
[codecarbon INFO @ 17:23:38] 0.000549 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 17:23

📊 Evaluating model...




📈 FINE-TUNING RESULTS SUMMARY FOR 50.0% DATASET:
  Training Method: Full Fine-Tuning
  Model: BERT

🎯 Performance Metrics:
  F1 Score: 0.6308
  Exact Match: 0.5435
  Eval Loss: 1.2796

⚡ Energy Consumption:
  Total Energy: 0.097278 kWh
  CPU Energy: 0.031300 kWh (32.2%)
  GPU Energy: 0.051251 kWh (52.7%)
  RAM Energy: 0.014728 kWh (15.1%)

🔌 Average Power Draw:
  CPU Power: 42.50 W
  GPU Power: 65.72 W
  RAM Power: 20.00 W
  Total Power: 128.22 W

🌱 Carbon Footprint:
  Total CO2 Emissions: 0.062489 kg
  Emissions Rate: 0.000023556 kg/s
  Duration: 0.74 hours
  Training Time (Trainer): 0.74 hours

📍 Location & Infrastructure:
  Country: Taiwan (TWN)
  Region: taipei city
  On Cloud: N
  PUE (Power Usage Effectiveness): 1.0

💻 System Specifications:
  OS: Linux-6.6.105+-x86_64-with-glibc2.35
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (8 cores)
  GPU: 1 x Tesla T4 (Count: 1)
  RAM: 50.99 GB
  Python: 3.12.12



In [None]:
# Train on 80% of the training data
result3 = run_experiment(
        size_fraction=0.8,
        train_data=squad["train"],
        eval_data=tokenized_validation,
        tokenizer=tokenizer,
        preprocess_fn=preprocess_function,
        compute_metrics_fn=compute_metrics,
        model_name="bert-base-uncased"
    )
results_summary.append(result3)


🚀 Training with 80.0% of training data
🔄 Preprocessing 104255 training samples...


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(
[codecarbon INFO @ 19:02:14] [setup] RAM Tracking...
[codecarbon INFO @ 19:02:14] [setup] CPU Tracking...
[codecarbon INFO @ 19:02:15] Energy consumed for RAM : 0.017406 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 19:02:15] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 19:02:15] Energy consumed for All CPU : 0.036992 kWh
[codecarbon INFO @ 19:02:15] Energy consumed for all GPUs : 0.060551 kWh. Total GPU Power : 61.26110639174676 W
[codecarbon INFO @ 19:02:15] 0.114949 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 19:02:15] Energy consumed for RAM : 0.017406 kWh. RAM Power : 20.0 W
[codecarb

🏋️ Training model...


| Epoch | Training Loss | Validation Loss | Exact Match | F1 |
|---|---|---|---|---|
| 1 | 1.0977 | 1.048006 | 0.596753 | 0.674162 |
| 2 | 0.814 | 1.135933 | 0.596918 | 0.682119 |


Streaming output truncated to the last 5000 lines.
[codecarbon INFO @ 19:10:47] Energy consumed for RAM : 0.002832 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 19:10:47] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 19:10:47] Energy consumed for All CPU : 0.006018 kWh
[codecarbon INFO @ 19:10:47] Energy consumed for all GPUs : 0.009847 kWh. Total GPU Power : 69.56335750241558 W
[codecarbon INFO @ 19:10:47] 0.018697 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 19:10:48] Energy consumed for RAM : 0.002832 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 19:10:48] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 19:10:48] Energy consumed for All CPU : 0.006018 kWh
[codecarbon INFO @ 19:10:48] Energy consumed for all GPUs : 0.009856 kWh. Total GPU Power : 69.95478440412653 W
[codecarbon INFO @ 19:10:48] 0.018706 kWh of electricity and 0.000000 L

📊 Evaluating model...


[codecarbon INFO @ 20:11:45] Energy consumed for RAM : 0.040559 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 20:11:45] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 20:11:45] Energy consumed for All CPU : 0.086197 kWh
[codecarbon INFO @ 20:11:45] Energy consumed for all GPUs : 0.141023 kWh. Total GPU Power : 65.03756782657939 W
[codecarbon INFO @ 20:11:45] 0.267779 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 20:11:46] Energy consumed for RAM : 0.040559 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 20:11:46] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 20:11:46] Energy consumed for All CPU : 0.086197 kWh
[codecarbon INFO @ 20:11:46] Energy consumed for all GPUs : 0.141037 kWh. Total GPU Power : 65.42828663291415 W
[codecarbon INFO @ 20:11:46] 0.267793 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 20:12



📈 FINE-TUNING RESULTS SUMMARY FOR 80.0% DATASET:
  Training Method: Full Fine-Tuning
  Model: BERT

🎯 Performance Metrics:
  F1 Score: 0.6821
  Exact Match: 0.5969
  Eval Loss: 1.1359

⚡ Energy Consumption:
  Total Energy: 0.152376 kWh
  CPU Energy: 0.049048 kWh (32.2%)
  GPU Energy: 0.080249 kWh (52.7%)
  RAM Energy: 0.023079 kWh (15.1%)

🔌 Average Power Draw:
  CPU Power: 42.50 W
  GPU Power: 40.71 W
  RAM Power: 20.00 W
  Total Power: 103.21 W

🌱 Carbon Footprint:
  Total CO2 Emissions: 0.097883 kg
  Emissions Rate: 0.000023546 kg/s
  Duration: 1.15 hours
  Training Time (Trainer): 1.15 hours

📍 Location & Infrastructure:
  Country: Taiwan (TWN)
  Region: taipei city
  On Cloud: N
  PUE (Power Usage Effectiveness): 1.0

💻 System Specifications:
  OS: Linux-6.6.105+-x86_64-with-glibc2.35
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (8 cores)
  GPU: 1 x Tesla T4 (Count: 1)
  RAM: 50.99 GB
  Python: 3.12.12
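
The figures in the summary above can be cross-checked with simple arithmetic. The sketch below copies the values from the 80% run and derives three quantities: the run duration (emissions divided by emissions rate), the implied carbon intensity (emissions divided by energy), and the power each component would need to average to reach its reported energy. The variable names are illustrative and not part of the notebook's code.

```python
# Figures copied from the 80% full fine-tuning summary above.
energy_kwh = {"cpu": 0.049048, "gpu": 0.080249, "ram": 0.023079}
reported_power_w = {"cpu": 42.50, "gpu": 40.71, "ram": 20.00}
total_energy_kwh = 0.152376
emissions_kg = 0.097883
emissions_rate_kg_per_s = 0.000023546

# Duration implied by total emissions and the emissions rate.
duration_h = emissions_kg / emissions_rate_kg_per_s / 3600

# Carbon intensity implied by the run (kg CO2 per kWh).
implied_intensity = emissions_kg / total_energy_kwh

# Power each component would have to average to reach its reported energy.
implied_power_w = {name: kwh / duration_h * 1000 for name, kwh in energy_kwh.items()}

print(f"Duration: {duration_h:.2f} h")
print(f"Implied carbon intensity: {implied_intensity:.3f} kg CO2/kWh")
for name in energy_kwh:
    share = energy_kwh[name] / total_energy_kwh * 100
    print(f"{name.upper()}: {share:.1f}% of energy, "
          f"implied {implied_power_w[name]:.1f} W vs reported {reported_power_w[name]:.1f} W")
```

With these figures, CPU and RAM land close to their reported 42.5 W and 20 W, while the GPU works out to roughly 69.5 W, well above the 40.71 W in the summary. One plausible reading (an assumption, not verified against CodeCarbon's internals) is that the reported GPU power is a late sample rather than a run-wide average, so the energy fields are the safer basis for comparing runs.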



In [None]:
# Create summary DataFrame
results_df = pd.DataFrame(results_summary)
results_df['dataset_size%'] = (results_df['train_samples'] / len(squad['train']) * 100).round(0)

print("\n" + "="*60)
print("📊 FINAL RESULTS SUMMARY")
print("="*60)
print(results_df.to_string(index=False))


📊 FINAL RESULTS SUMMARY
 training_method model_name  train_samples  valid_samples  f1_score  exact_match  eval_loss  training_time_hours  emissions_rate_kg_per_s  emissions_kg           timestamp  duration_seconds  duration_hours  energy_consumed_kwh  cpu_energy_kwh  gpu_energy_kwh  ram_energy_kwh  cpu_power_w  gpu_power_w  ram_power_w country_name country_iso_code      region cloud_provider cloud_region on_cloud                                   os python_version  cpu_count                      cpu_model  gpu_count    gpu_model  ram_total_size_gb  pue codecarbon_version  dataset_size%
Full Fine-Tuning       BERT          32579          12134  0.528042     0.442723   1.601798             0.389657                 0.000023      0.032877 2025-11-30T17:21:18       1403.303696        0.389807             0.051179        0.016557        0.026831        0.007791         42.5    63.690722         20.0       Taiwan              TWN taipei city                                    N Linux-6.6.105

In [None]:
results_df.to_csv("/content/drive/MyDrive/bert_dataset_size_results.csv", index=False)
print("\n✅ Results saved to Google Drive!")


✅ Results saved to Google Drive!


## MODEL EVALUATION WITH EXAMPLES

In [None]:
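def test_model_manual(model, tokenizer, examples):
    """Run extractive QA on hand-written examples and print each prediction.

    Each example is a dict with 'question', 'context' and an optional
    'expected_answer'. Returns a list with one result dict per example.
    """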
    model.eval()
    device = model.device
    results = []

    print("\n" + "="*80)
    print("🧪 MODEL EVALUATION (MANUAL MODE)")
    print("="*80)

    for i, example in enumerate(examples, 1):
        question = example['question']
        context = example['context']
        expected = example.get('expected_answer', None)

        # Tokenize
        inputs = tokenizer(question, context, return_tensors="pt",
                          max_length=384, truncation=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Get predictions
        with torch.no_grad():
            outputs = model(**inputs)

        # Extract answer
        start_idx = outputs.start_logits.argmax().item()
        end_idx = outputs.end_logits.argmax().item()

        # Heuristic confidence: mean of the start and end softmax probabilities
        start_score = torch.softmax(outputs.start_logits, dim=1)[0][start_idx].item()
        end_score = torch.softmax(outputs.end_logits, dim=1)[0][end_idx].item()
        confidence = (start_score + end_score) / 2

        # Decode answer
        if start_idx <= end_idx:
            answer_tokens = inputs["input_ids"][0][start_idx:end_idx+1]
            predicted_answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)
        else:
            predicted_answer = "[NO ANSWER]"

        # Store result
        result = {
            'question': question,
            'context': context[:100] + "..." if len(context) > 100 else context,
            'predicted_answer': predicted_answer,
            'expected_answer': expected,
            'confidence': confidence,
            'start_position': start_idx,
            'end_position': end_idx
        }
        results.append(result)

        # Print formatted output
        print(f"\n📝 Example {i}")
        print(f"Question: {question}")
        print(f"Context: {context[:150]}{'...' if len(context) > 150 else ''}")
        print(f"\n✅ Predicted Answer: '{predicted_answer}'")
        print(f"   Confidence: {confidence:.2%}")

        if expected:
            match = predicted_answer.lower().strip() == expected.lower().strip()
            print(f"   Expected Answer: '{expected}'")
            print(f"   Exact Match: {'✓ YES' if match else '✗ NO'}")

        print("-" * 80)

    return results
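
The function above takes the argmax of the start and end logits independently, which can place the end position before the start (the `[NO ANSWER]` branch). A common alternative is to score every valid span jointly and keep the best one. The sketch below illustrates that idea on plain Python lists. `best_span` and `max_answer_len` are hypothetical names, the logits are made up, and this is not the method used in this notebook.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, confidence) for the highest-scoring span with start <= end."""
    best_score, best_start, best_end = None, 0, 0
    for s, s_logit in enumerate(start_logits):
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if best_score is None or score > best_score:
                best_score, best_start, best_end = score, s, e
    # Joint confidence: product of the two softmax probabilities.
    confidence = softmax(start_logits)[best_start] * softmax(end_logits)[best_end]
    return best_start, best_end, confidence

# Made-up logits where independent argmax picks start=3 and end=2 (an invalid span).
start_logits = [0.1, 2.0, 0.3, 5.0]
end_logits = [0.2, 0.1, 4.0, 0.5]
start, end, confidence = best_span(start_logits, end_logits)
print(f"best valid span: ({start}, {end}), confidence {confidence:.3f}")
```

The joint search trades a small amount of extra computation for never returning an inverted span. The product of probabilities is also a stricter confidence score than the mean used above.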

In [None]:
test_examples = [
    {
        'question': "What does Google Colab provide access to?",
        'context': "Google Colab provides free access to GPUs and TPUs, which makes it popular for deep learning.",
        'expected_answer': "GPUs and TPUs"
    },
    {
        'question': "What is the capital of France?",
        'context': "Paris is the capital and most populous city of France. It has been one of Europe's major centers of finance, diplomacy, commerce, fashion, and arts.",
        'expected_answer': "Paris"
    },
    {
        'question': "When was Python created?",
        'context': "Python was created by Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability.",
        'expected_answer': "1991"
    },
    {
        'question': "What is photosynthesis?",
        'context': "Photosynthesis is the process by which plants use sunlight, water and carbon dioxide to create oxygen and energy in the form of sugar.",
        'expected_answer': "process by which plants use sunlight, water and carbon dioxide to create oxygen and energy"
    },
    {
        'question': "Who invented the telephone?",
        'context': "The telephone was invented by Alexander Graham Bell in 1876. He made the first successful telephone call on March 10, 1876.",
        'expected_answer': "Alexander Graham Bell"
    }
]
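
The manual check in `test_model_manual` compares lowercased, stripped strings, so a prediction that adds an article or trailing words counts as a miss. The sketch below is modeled on the normalization used by the SQuAD v1.1 evaluation script (lowercase, strip punctuation and English articles, collapse whitespace) and adds token-level F1 for partial credit. The helper names and the sample prediction are hypothetical, and these are not the metric functions used by the training code.

```python
import re
import string
from collections import Counter

def normalize_answer(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    return normalize_answer(prediction) == normalize_answer(reference)

def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

reference = ("process by which plants use sunlight, water and carbon dioxide "
             "to create oxygen and energy")
# Hypothetical prediction that runs longer than the reference answer.
prediction = ("the process by which plants use sunlight, water and carbon dioxide "
              "to create oxygen and energy in the form of sugar")

print("Exact match:", exact_match(prediction, reference))
print(f"Token F1: {token_f1(prediction, reference):.3f}")
```

For long descriptive answers such as the photosynthesis example, token F1 is usually more informative than exact match, because answer boundaries are ambiguous.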


In [None]:
print(os.listdir('/content/'))

['.config', 'wandb', 'results_bert_25pct', 'results_bert_80pct', 'drive', 'results_bert_50pct', 'sample_data']


In [None]:
model_path = "results_bert_80pct/final_model"
model = AutoModelForQuestionAnswering.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
results = test_model_manual(model, tokenizer, test_examples)


🧪 MODEL EVALUATION (MANUAL MODE)

📝 Example 1
Question: What does Google Colab provide access to?
Context: Google Colab provides free access to GPUs and TPUs, which makes it popular for deep learning.

✅ Predicted Answer: 'gpus and tpus'
   Confidence: 93.45%
   Expected Answer: 'GPUs and TPUs'
   Exact Match: ✓ YES
--------------------------------------------------------------------------------

📝 Example 2
Question: What is the capital of France?
Context: Paris is the capital and most populous city of France. It has been one of Europe's major centers of finance, diplomacy, commerce, fashion, and arts.

✅ Predicted Answer: 'paris'
   Confidence: 98.99%
   Expected Answer: 'Paris'
   Exact Match: ✓ YES
--------------------------------------------------------------------------------

📝 Example 3
Question: When was Python created?
Context: Python was created by Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability.

✅ Predicted Answer: '1991'
   

## Plots Comparing Trends Across Dataset Sizes

In [None]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import numpy as np

In [None]:
# Load the dataset
full_ft_results = pd.read_csv("/content/drive/MyDrive/bert_dataset_size_results.csv")

print("📊 Data loaded successfully!")
print(f"Total experiments: {len(full_ft_results)}")
print("\nExperiments:")
print(full_ft_results[['train_samples', 'dataset_size%', 'f1_score', 'emissions_kg']])


📊 Data loaded successfully!
Total experiments: 3

Experiments:
   train_samples  dataset_size%  f1_score  emissions_kg
0          32579           25.0  0.528042      0.032877
1          65159           50.0  0.630827      0.062489
2         104255           80.0  0.682119      0.097883
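
A quick way to read the table above is to normalize emissions by the amount of training data. The sketch below copies the three rows and computes grams of CO₂ per 1,000 training samples. The helper name is illustrative, and the calculation assumes the three runs differ only in dataset size.

```python
# Rows copied from the experiment table above.
experiments = [
    {"dataset_pct": 25, "train_samples": 32579, "emissions_kg": 0.032877},
    {"dataset_pct": 50, "train_samples": 65159, "emissions_kg": 0.062489},
    {"dataset_pct": 80, "train_samples": 104255, "emissions_kg": 0.097883},
]

def grams_per_thousand_samples(row):
    """Emissions in grams of CO2 per 1,000 training samples."""
    return (row["emissions_kg"] * 1000) / (row["train_samples"] / 1000)

per_thousand = [grams_per_thousand_samples(row) for row in experiments]
for row, grams in zip(experiments, per_thousand):
    print(f"{row['dataset_pct']}% dataset: {grams:.3f} g CO2 per 1,000 samples")
```

All three runs come out close to 1 g per 1,000 samples, with a slight decrease as the dataset grows. That suggests emissions scale roughly linearly with training-set size in this setup.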


In [None]:
# PLOT 1: Energy Consumption vs Dataset Size (Stacked Bar)
df_sorted = full_ft_results.sort_values('train_samples')

fig = go.Figure()

fig.add_trace(go.Bar(
    name='CPU Energy',
    x=df_sorted['dataset_size%'],
    y=df_sorted['cpu_energy_kwh'],
    marker_color='#FF6B6B',
    hovertemplate='<b>CPU Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.add_trace(go.Bar(
    name='GPU Energy',
    x=df_sorted['dataset_size%'],
    y=df_sorted['gpu_energy_kwh'],
    marker_color='#4ECDC4',
    hovertemplate='<b>GPU Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.add_trace(go.Bar(
    name='RAM Energy',
    x=df_sorted['dataset_size%'],
    y=df_sorted['ram_energy_kwh'],
    marker_color='#95E1D3',
    hovertemplate='<b>RAM Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.update_layout(
    title=dict(text="Energy Consumption Scaling with Dataset Size", font=dict(size=18)),
    xaxis_title='Dataset Size (%)',
    yaxis_title='Energy Consumption (kWh)',
    barmode='stack',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='x unified'
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_energy_scaling.html")
print("\n✅ Plot 1 saved: full_ft_energy_scaling.html")


✅ Plot 1 saved: full_ft_energy_scaling.html


In [None]:
# PLOT 2: Performance & Emissions Growth (Dual Y-axis)
df_sorted = full_ft_results.sort_values('train_samples')

fig = make_subplots(specs=[[{"secondary_y": True}]])

# F1 Score line
fig.add_trace(
    go.Scatter(
        x=df_sorted['dataset_size%'],
        y=df_sorted['f1_score'],
        name='F1 Score',
        mode='lines+markers',
        line=dict(color='#4ECDC4', width=3),
        marker=dict(size=12, line=dict(width=2, color='white')),
        hovertemplate='<b>F1 Score</b>: %{y:.4f}<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=False
)

# Exact Match line
fig.add_trace(
    go.Scatter(
        x=df_sorted['dataset_size%'],
        y=df_sorted['exact_match'],
        name='Exact Match',
        mode='lines+markers',
        line=dict(color='#95E1D3', width=3, dash='dash'),
        marker=dict(size=10),
        hovertemplate='<b>Exact Match</b>: %{y:.4f}<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=False
)

# CO2 Emissions bar
fig.add_trace(
    go.Bar(
        x=df_sorted['dataset_size%'],
        y=df_sorted['emissions_kg'],
        name='CO₂ Emissions',
        marker_color='#FF6B6B',
        opacity=0.6,
        hovertemplate='<b>CO₂</b>: %{y:.6f} kg<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=True
)

fig.update_xaxes(title_text="Dataset Size (%)")
fig.update_yaxes(title_text="Performance Score", secondary_y=False)
fig.update_yaxes(title_text="CO₂ Emissions (kg)", secondary_y=True)

fig.update_layout(
    title=dict(text="Performance vs Carbon Emissions by Dataset Size", font=dict(size=18)),
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_performance_emissions.html")
print("\n✅ Plot 2 saved: full_ft_performance_emissions.html")


✅ Plot 2 saved: full_ft_performance_emissions.html


In [None]:
# PLOT 3: Efficiency Analysis (Diminishing Returns)
df_sorted = full_ft_results.sort_values('train_samples').copy()
df_sorted['f1_per_kg_co2'] = df_sorted['f1_score'] / df_sorted['emissions_kg']
df_sorted['em_per_kg_co2'] = df_sorted['exact_match'] / df_sorted['emissions_kg']

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_sorted['dataset_size%'],
    y=df_sorted['f1_per_kg_co2'],
    name='F1 / kg CO₂',
    mode='lines+markers',
    line=dict(color='#4ECDC4', width=3),
    marker=dict(size=12),
    fill='tozeroy',
    fillcolor='rgba(78, 205, 196, 0.2)',
    hovertemplate='<b>F1 Efficiency</b>: %{y:.2f}<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.add_trace(go.Scatter(
    x=df_sorted['dataset_size%'],
    y=df_sorted['em_per_kg_co2'],
    name='EM / kg CO₂',
    mode='lines+markers',
    line=dict(color='#95E1D3', width=3, dash='dash'),
    marker=dict(size=10),
    hovertemplate='<b>EM Efficiency</b>: %{y:.2f}<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.update_layout(
    title=dict(text="Carbon Efficiency: Performance per kg CO₂", font=dict(size=18)),
    xaxis_title='Dataset Size (%)',
    yaxis_title='Efficiency (Score per kg CO₂)',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

# Add annotation for optimal point
optimal_idx = df_sorted['f1_per_kg_co2'].idxmax()
optimal_row = df_sorted.loc[optimal_idx]

fig.add_annotation(
    x=optimal_row['dataset_size%'],
    y=optimal_row['f1_per_kg_co2'],
    text=f"Most Efficient:<br>{optimal_row['dataset_size%']:.0f}%",
    showarrow=True,
    arrowhead=2,
    arrowcolor="#FF6B6B",
    font=dict(size=12, color="#FF6B6B")
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_efficiency.html")
print("\n✅ Plot 3 saved: full_ft_efficiency.html")


✅ Plot 3 saved: full_ft_efficiency.html


In [None]:
# PLOT 4: Training Time vs Energy Consumption
df_sorted = full_ft_results.sort_values('train_samples')

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_sorted['training_time_hours'],
    y=df_sorted['energy_consumed_kwh'],
    mode='markers+lines+text',  # 'text' is needed for the labels and textposition below to render
    marker=dict(
        size=df_sorted['train_samples'] / 1000,  # Size by dataset
        color=df_sorted['f1_score'],
        colorscale='Viridis',
        showscale=True,
        colorbar=dict(title="F1 Score"),
        line=dict(width=2, color='white')
    ),
    line=dict(color='#4ECDC4', width=2, dash='dot'),
    text=df_sorted['dataset_size%'].astype(int).astype(str) + '%',  # "25%" rather than "25.0%"
    textposition='top center',
    hovertemplate='<b>Dataset: %{text}</b><br>' +
                  'Time: %{x:.2f} hours<br>' +
                  'Energy: %{y:.6f} kWh<br>' +
                  '<extra></extra>'
))

fig.update_layout(
    title=dict(text="Training Time vs Energy Consumption", font=dict(size=18)),
    xaxis_title='Training Time (hours)',
    yaxis_title='Total Energy Consumption (kWh)',
    template='plotly_white',
    height=500,
    font=dict(size=13)
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_time_energy.html")
print("\n✅ Plot 4 saved: full_ft_time_energy.html")


✅ Plot 4 saved: full_ft_time_energy.html


In [None]:
# PLOT 5: Component-wise Power Draw
df_sorted = full_ft_results.sort_values('train_samples')

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_sorted['dataset_size%'],
    y=df_sorted['cpu_power_w'],
    name='CPU Power',
    mode='lines+markers',
    line=dict(color='#FF6B6B', width=3),
    marker=dict(size=10),
    stackgroup='one',
    hovertemplate='<b>CPU</b>: %{y:.2f} W<extra></extra>'
))

fig.add_trace(go.Scatter(
    x=df_sorted['dataset_size%'],
    y=df_sorted['gpu_power_w'],
    name='GPU Power',
    mode='lines+markers',
    line=dict(color='#4ECDC4', width=3),
    marker=dict(size=10),
    stackgroup='one',
    hovertemplate='<b>GPU</b>: %{y:.2f} W<extra></extra>'
))

fig.add_trace(go.Scatter(
    x=df_sorted['dataset_size%'],
    y=df_sorted['ram_power_w'],
    name='RAM Power',
    mode='lines+markers',
    line=dict(color='#95E1D3', width=3),
    marker=dict(size=10),
    stackgroup='one',
    hovertemplate='<b>RAM</b>: %{y:.2f} W<extra></extra>'
))

fig.update_layout(
    title=dict(text="Average Power Draw by Component", font=dict(size=18)),
    xaxis_title='Dataset Size (%)',
    yaxis_title='Power Draw (Watts)',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_power_breakdown.html")
print("\n✅ Plot 5 saved: full_ft_power_breakdown.html")


✅ Plot 5 saved: full_ft_power_breakdown.html


In [None]:
# PLOT 6: Comprehensive Dashboard (All Metrics)
df_sorted = full_ft_results.sort_values('train_samples')

fig = make_subplots(
    rows=2, cols=3,
    subplot_titles=(
        'Energy Consumption',
        'Performance Metrics',
        'Carbon Emissions',
        'Training Time',
        'Power Draw',
        'Cost-Benefit Analysis'
    ),
    specs=[
        [{"type": "bar"}, {"type": "scatter"}, {"type": "bar"}],
        [{"type": "scatter"}, {"type": "bar"}, {"type": "scatter"}]
    ]
)

# 1. Energy consumption (stacked)
fig.add_trace(
    go.Bar(name='CPU', x=df_sorted['dataset_size%'],
           y=df_sorted['cpu_energy_kwh'], marker_color='#FF6B6B'),
    row=1, col=1
)
fig.add_trace(
    go.Bar(name='GPU', x=df_sorted['dataset_size%'],
           y=df_sorted['gpu_energy_kwh'], marker_color='#4ECDC4'),
    row=1, col=1
)

# 2. Performance metrics
fig.add_trace(
    go.Scatter(x=df_sorted['dataset_size%'], y=df_sorted['f1_score'],
               mode='lines+markers', name='F1', line=dict(color='#4ECDC4', width=3)),
    row=1, col=2
)
fig.add_trace(
    go.Scatter(x=df_sorted['dataset_size%'], y=df_sorted['exact_match'],
               mode='lines+markers', name='EM', line=dict(color='#95E1D3', width=3, dash='dash')),
    row=1, col=2
)

# 3. Carbon emissions
fig.add_trace(
    go.Bar(x=df_sorted['dataset_size%'], y=df_sorted['emissions_kg'],
           marker_color='#FF6B6B', showlegend=False),
    row=1, col=3
)

# 4. Training time
fig.add_trace(
    go.Scatter(x=df_sorted['dataset_size%'], y=df_sorted['training_time_hours'],
               mode='lines+markers', marker=dict(size=12, color='#FFA07A'),
               line=dict(color='#FFA07A', width=3), showlegend=False),
    row=2, col=1
)

# 5. Power draw (stacked)
fig.add_trace(
    go.Bar(x=df_sorted['dataset_size%'], y=df_sorted['cpu_power_w'],
           marker_color='#FF6B6B', showlegend=False),
    row=2, col=2
)
fig.add_trace(
    go.Bar(x=df_sorted['dataset_size%'], y=df_sorted['gpu_power_w'],
           marker_color='#4ECDC4', showlegend=False),
    row=2, col=2
)

# 6. Efficiency
efficiency = df_sorted['f1_score'] / df_sorted['emissions_kg']
fig.add_trace(
    go.Scatter(x=df_sorted['dataset_size%'], y=efficiency,
               mode='lines+markers', marker=dict(size=12, color='#9370DB'),
               line=dict(color='#9370DB', width=3), showlegend=False),
    row=2, col=3
)

# Update axes labels
fig.update_xaxes(title_text="Dataset %", row=1, col=1)
fig.update_xaxes(title_text="Dataset %", row=1, col=2)
fig.update_xaxes(title_text="Dataset %", row=1, col=3)
fig.update_xaxes(title_text="Dataset %", row=2, col=1)
fig.update_xaxes(title_text="Dataset %", row=2, col=2)
fig.update_xaxes(title_text="Dataset %", row=2, col=3)

fig.update_yaxes(title_text="Energy (kWh)", row=1, col=1)
fig.update_yaxes(title_text="Score", row=1, col=2)
fig.update_yaxes(title_text="CO₂ (kg)", row=1, col=3)
fig.update_yaxes(title_text="Hours", row=2, col=1)
fig.update_yaxes(title_text="Power (W)", row=2, col=2)
fig.update_yaxes(title_text="F1/kg CO₂", row=2, col=3)

fig.update_layout(
    height=800,
    title_text="<b>Full Fine-tuning: Comprehensive Analysis Dashboard</b>",
    showlegend=True,
    template='plotly_white',
    barmode='stack',
    font=dict(size=11)
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_dashboard.html")
print("\n✅ Plot 6 saved: full_ft_dashboard.html")


✅ Plot 6 saved: full_ft_dashboard.html


In [None]:
# SUMMARY STATISTICS & KEY INSIGHTS
print("\n" + "="*80)
print("📊 FULL FINE-TUNING SUMMARY STATISTICS")
print("="*80)

df_sorted = full_ft_results.sort_values('train_samples')

print(f"\n📈 Performance Growth:")
f1_growth = ((df_sorted['f1_score'].iloc[-1] - df_sorted['f1_score'].iloc[0]) /
             df_sorted['f1_score'].iloc[0] * 100)
print(f"  F1 Score improvement (smallest to largest): +{f1_growth:.2f}%")
print(f"  Best F1 Score: {df_sorted['f1_score'].max():.4f} at {df_sorted.loc[df_sorted['f1_score'].idxmax(), 'dataset_size%']:.0f}%")

print(f"\n🌱 Carbon Impact:")
emissions_growth = ((df_sorted['emissions_kg'].iloc[-1] - df_sorted['emissions_kg'].iloc[0]) /
                    df_sorted['emissions_kg'].iloc[0] * 100)
print(f"  Emissions growth (smallest to largest): +{emissions_growth:.2f}%")
print(f"  Total CO₂: {df_sorted['emissions_kg'].sum():.6f} kg")

print(f"\n⚡ Energy Analysis:")
print(f"  Total Energy Consumed: {df_sorted['energy_consumed_kwh'].sum():.6f} kWh")
gpu_ratio = (df_sorted['gpu_energy_kwh'].sum() / df_sorted['energy_consumed_kwh'].sum()) * 100
cpu_ratio = (df_sorted['cpu_energy_kwh'].sum() / df_sorted['energy_consumed_kwh'].sum()) * 100
print(f"  GPU Energy: {gpu_ratio:.1f}% of total")
print(f"  CPU Energy: {cpu_ratio:.1f}% of total")

print(f"\n💡 Efficiency Insights:")
df_sorted['efficiency'] = df_sorted['f1_score'] / df_sorted['emissions_kg']
best_eff_idx = df_sorted['efficiency'].idxmax()
print(f"  Most efficient dataset size: {df_sorted.loc[best_eff_idx, 'dataset_size%']:.0f}%")
print(f"  Efficiency at this size: {df_sorted.loc[best_eff_idx, 'efficiency']:.2f} F1/kg CO₂")


📊 FULL FINE-TUNING SUMMARY STATISTICS

📈 Performance Growth:
  F1 Score improvement (smallest to largest): +29.18%
  Best F1 Score: 0.6821 at 80%

🌱 Carbon Impact:
  Emissions growth (smallest to largest): +197.73%
  Total CO₂: 0.193249 kg

⚡ Energy Analysis:
  Total Energy Consumed: 0.300834 kWh
  GPU Energy: 52.6% of total
  CPU Energy: 32.2% of total

💡 Efficiency Insights:
  Most efficient dataset size: 25%
  Efficiency at this size: 16.06 F1/kg CO₂
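
The F1-per-kg ratio above tends to favor the smallest run, because F1 is bounded while emissions keep growing with dataset size. A complementary view is the marginal gain: how much extra F1 each additional kilogram of CO₂ buys when moving from one dataset size to the next. The sketch below computes it from the three reported runs. `marginal_f1_per_kg` is a hypothetical helper and not part of the notebook.

```python
# (dataset %, F1, emissions in kg) copied from the results table above.
runs = [
    (25, 0.528042, 0.032877),
    (50, 0.630827, 0.062489),
    (80, 0.682119, 0.097883),
]

def marginal_f1_per_kg(runs):
    """Return (from_pct, to_pct, extra F1 per extra kg CO2) for consecutive runs."""
    steps = []
    for (pct_a, f1_a, kg_a), (pct_b, f1_b, kg_b) in zip(runs, runs[1:]):
        steps.append((pct_a, pct_b, (f1_b - f1_a) / (kg_b - kg_a)))
    return steps

steps = marginal_f1_per_kg(runs)
for pct_a, pct_b, gain in steps:
    print(f"{pct_a}% -> {pct_b}%: {gain:.2f} F1 per additional kg CO2")
```

The second step returns well under half the F1 per kilogram of the first, which is the diminishing-returns pattern Plot 3 points to. Each size was trained as a separate run, so the marginal figure compares two choices and does not describe a continuation of training.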


# Training Strategy 2: LoRA (Low-Rank Adaptation) Fine-Tuning (Model BERT)

In [None]:
!pip install peft


In [None]:
# ============================================================================
# Training Strategy 2: LoRA (Low-Rank Adaptation) Fine-Tuning (Model BERT)
# ============================================================================

# Import PEFT for LoRA
from peft import LoraConfig, get_peft_model, TaskType, PeftModel

print("="*80)
print(" LORA FINE-TUNING SETUP")
print("="*80)


 LORA FINE-TUNING SETUP


In [None]:
# STEP 5: Creating And Training LoRA Model
# ============================================================================

def create_lora_model(model_name="bert-base-uncased", r=8, lora_alpha=16, lora_dropout=0.1):
    """
    Create BERT model with LoRA adapters.

    Args:
        model_name: Base model name
        r: Rank of update matrices
        lora_alpha: Scaling factor
        lora_dropout: Dropout probability
    """
    # Load base model
    base_model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Configure LoRA
    lora_config = LoraConfig(
        task_type=TaskType.QUESTION_ANS,  # Task type for QA
        r=r,  # Rank of update matrices
        lora_alpha=lora_alpha,  # Scaling factor
        lora_dropout=lora_dropout,  # Dropout probability
        target_modules=["query", "value"],  # Which layers to apply LoRA to (BERT attention)
        bias="none",  # Don't train biases
        inference_mode=False,  # Training mode
    )

    # Apply LoRA to model
    lora_model = get_peft_model(base_model, lora_config)

    # Print trainable parameters
    lora_model.print_trainable_parameters()

    return lora_model

def train_lora_model(tokenized_train, tokenized_eval, tokenizer, compute_metrics_fn,
                     size_fraction, lora_rank=8):
    """Train BERT model with LoRA fine-tuning."""

    # Create LoRA model
    print(f"\n🔧 Creating LoRA model (rank={lora_rank})...")
    lora_model = create_lora_model(
        model_name="bert-base-uncased",
        r=lora_rank,
        lora_alpha=lora_rank * 2,  # Common practice: alpha = 2*r
        lora_dropout=0.1
    )

    # Setup output directory
    output_dir = f"results_bert_lora_r{lora_rank}_{int(size_fraction*100)}pct"

    # Training arguments (can use higher learning rate for LoRA)
    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        learning_rate=3e-4,  # Higher LR for LoRA
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=3,  # More epochs for LoRA
        weight_decay=0.01,
        fp16=torch.cuda.is_available(),
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        push_to_hub=False,
        logging_steps=100,
        greater_is_better=True
    )

    # Initialize trainer
    trainer = Trainer(
        model=lora_model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        tokenizer=tokenizer,
        data_collator=default_data_collator,
        compute_metrics=compute_metrics_fn
    )

    # Start carbon tracking
    tracker = EmissionsTracker(
        project_name=f"BERT_LoRA_r{lora_rank}_{int(size_fraction*100)}pct",
        output_dir=output_dir,
        save_to_file=True,
        log_level="info"
    )
    tracker.start()

    # Train
    print("🚀 Training LoRA model...")
    train_results = trainer.train()

    # Stop tracking and get detailed emissions data
    emissions_kg = tracker.stop()
    emissions_data = tracker.final_emissions_data

    return trainer, train_results, emissions_data, output_dir, lora_model
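
To see why LoRA trains so few parameters, it helps to count them by hand. For each adapted weight matrix of shape `d_out x d_in`, LoRA adds two low-rank factors with `r * (d_in + d_out)` parameters in total. The sketch below applies this to the configuration above, using the BERT-base dimensions (hidden size 768, 12 layers) and the two target modules `query` and `value`. The function names are illustrative, and the count covers the adapter matrices only.

```python
HIDDEN_SIZE = 768        # BERT-base hidden size
NUM_LAYERS = 12          # BERT-base encoder layers
TARGETS_PER_LAYER = 2    # "query" and "value" projections

def lora_params_per_matrix(d_in, d_out, rank):
    """Parameters in the two low-rank factors: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

def total_lora_params(rank):
    """Adapter parameters across all targeted attention projections."""
    per_matrix = lora_params_per_matrix(HIDDEN_SIZE, HIDDEN_SIZE, rank)
    return per_matrix * TARGETS_PER_LAYER * NUM_LAYERS

rank = 8
lora_alpha = rank * 2
scaling = lora_alpha / rank  # factor applied to the low-rank update

print(f"rank={rank}: {total_lora_params(rank):,} adapter parameters, scaling={scaling}")
print(f"rank=16: {total_lora_params(16):,} adapter parameters")
```

The adapter count grows linearly with the rank, and with `lora_alpha = 2 * r` the scaling factor stays at 2 for every rank. `print_trainable_parameters()` may report a slightly larger number if PEFT also keeps the QA output head trainable for this task type. That is an assumption to check against the printed value.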


In [None]:
# STEP 6: Evaluating The LoRA Model On Different Rank Sizes
# ============================================================================

def evaluate_and_save_lora(trainer, train_results, emissions_data, output_dir,
                           size_fraction, num_samples, lora_model):
    """Evaluate LoRA model and save results with detailed emissions."""

    print("📊 Evaluating LoRA model...")
    eval_results = trainer.evaluate()

    # Count trainable parameters
    trainable_params = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in lora_model.parameters())
    trainable_percentage = 100 * trainable_params / total_params

    # Extract emissions data from EmissionsData object
    result_entry = {
        "training_method": "LoRA",
        "model_name": "BERT",
        "lora_rank": lora_model.peft_config['default'].r,
        'dataset_size%': int(size_fraction*100),
        "train_samples": num_samples,
        "valid_samples": len(tokenized_validation),
        "trainable_params": trainable_params,
        "total_params": total_params,
        "trainable_percentage": trainable_percentage,

        # Performance metrics
        "f1_score": eval_results["eval_f1"],
        "exact_match": eval_results["eval_exact_match"],
        "eval_loss": eval_results["eval_loss"],
        "training_time_hours": train_results.metrics["train_runtime"] / 3600,

        # Emissions data
        "emissions_rate_kg_per_s": emissions_data.emissions_rate,
        "emissions_kg": emissions_data.emissions,
        "timestamp": emissions_data.timestamp,
        "duration_seconds": emissions_data.duration,
        "duration_hours": emissions_data.duration / 3600,

        # Energy consumption
        "energy_consumed_kwh": emissions_data.energy_consumed,
        "cpu_energy_kwh": emissions_data.cpu_energy,
        "gpu_energy_kwh": emissions_data.gpu_energy,
        "ram_energy_kwh": emissions_data.ram_energy,

        # Power draw
        "cpu_power_w": emissions_data.cpu_power,
        "gpu_power_w": emissions_data.gpu_power,
        "ram_power_w": emissions_data.ram_power,

        # Location and system info
        "country_name": emissions_data.country_name,
        "country_iso_code": emissions_data.country_iso_code,
        "region": emissions_data.region,
        "cloud_provider": emissions_data.cloud_provider,
        "cloud_region": emissions_data.cloud_region,
        "on_cloud": emissions_data.on_cloud,

        # System specifications
        "os": emissions_data.os,
        "python_version": emissions_data.python_version,
        "cpu_count": emissions_data.cpu_count,
        "cpu_model": emissions_data.cpu_model,
        "gpu_count": emissions_data.gpu_count,
        "gpu_model": emissions_data.gpu_model,
        "ram_total_size_gb": emissions_data.ram_total_size,

        # Additional metrics
        "pue": emissions_data.pue,
        "codecarbon_version": emissions_data.codecarbon_version,
    }

    # Print detailed summary
    print(f"\n{'='*80}")
    print(f"  LoRA RESULTS SUMMARY (Rank {result_entry['lora_rank']}, {size_fraction*100:.0f}% Dataset)")
    print(f"{'='*80}")
    print(f"\n📦 Model Configuration:")
    print(f"   Training Method: LoRA")
    print(f"   LoRA Rank: {result_entry['lora_rank']}")
    print(f"   Trainable Parameters: {trainable_params:,} ({trainable_percentage:.2f}%)")
    print(f"   Total Parameters: {total_params:,}")
    print(f"   Dataset Size: {size_fraction*100:.0f}%")

    print(f"\n📈 Performance Metrics:")
    print(f"   F1 Score: {eval_results['eval_f1']:.4f}")
    print(f"   Exact Match: {eval_results['eval_exact_match']:.4f}")
    print(f"   Eval Loss: {eval_results['eval_loss']:.4f}")

    print(f"\n⚡ Energy Consumption:")
    print(f"   Total Energy: {emissions_data.energy_consumed:.6f} kWh")
    print(f"   CPU Energy: {emissions_data.cpu_energy:.6f} kWh ({emissions_data.cpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"   GPU Energy: {emissions_data.gpu_energy:.6f} kWh ({emissions_data.gpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"   RAM Energy: {emissions_data.ram_energy:.6f} kWh ({emissions_data.ram_energy/emissions_data.energy_consumed*100:.1f}%)")

    print(f"\n🔌 Average Power Draw:")
    print(f"   CPU Power: {emissions_data.cpu_power:.2f} W")
    print(f"   GPU Power: {emissions_data.gpu_power:.2f} W")
    print(f"   RAM Power: {emissions_data.ram_power:.2f} W")
    print(f"   Total Power: {emissions_data.cpu_power + emissions_data.gpu_power + emissions_data.ram_power:.2f} W")

    print(f"\n🌍 Carbon Footprint:")
    print(f"   Total CO2 Emissions: {emissions_data.emissions:.6f} kg")
    print(f"   Emissions Rate: {emissions_data.emissions_rate:.9f} kg/s")
    print(f"   Duration: {emissions_data.duration/3600:.2f} hours")
    print(f"   Training Time (Trainer): {train_results.metrics['train_runtime']/3600:.2f} hours")

    print(f"\n📍 Location & Infrastructure:")
    print(f"   Country: {emissions_data.country_name} ({emissions_data.country_iso_code})")
    print(f"   Region: {emissions_data.region}")
    print(f"   On Cloud: {emissions_data.on_cloud}")
    print(f"   PUE (Power Usage Effectiveness): {emissions_data.pue}")

    print(f"\n💻 System Specifications:")
    print(f"   OS: {emissions_data.os}")
    print(f"   CPU: {emissions_data.cpu_model} ({emissions_data.cpu_count} cores)")
    if emissions_data.gpu_count and emissions_data.gpu_model:
        print(f"   GPU: {emissions_data.gpu_model} (Count: {emissions_data.gpu_count})")
    else:
        print(f"   GPU: None detected")
    print(f"   RAM: {emissions_data.ram_total_size:.2f} GB")
    print(f"   Python: {emissions_data.python_version}")
    print(f"\n{'='*80}")

    # Save LoRA adapters
    lora_model.save_pretrained(f"{output_dir}/lora_adapters")
    tokenizer.save_pretrained(f"{output_dir}/lora_adapters")
    print(f"✅ LoRA adapters saved to {output_dir}/lora_adapters")

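    # Clean up: drop the trainer's references, collect garbage, then release cached GPU memory
    import gc
    del trainer.model
    del trainer
    gc.collect()
    torch.cuda.empty_cache()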

    return result_entry

def run_lora_experiment(size_fraction, train_data, eval_data, tokenizer, preprocess_fn,
                        compute_metrics_fn, lora_rank):
    """Run complete LoRA experiment for given dataset size and rank."""

    print(f"\n{'='*60}")
    print(f"  LoRA Training with {size_fraction*100}% of training data (Rank {lora_rank})")
    print(f"{'='*60}")

    # Step 1: Prepare dataset
    tokenized_train, num_samples = prepare_dataset(train_data, size_fraction, preprocess_fn)

    # Step 2: Train LoRA model
    trainer, train_results, emissions_data, output_dir, lora_model = train_lora_model(
        tokenized_train, eval_data, tokenizer, compute_metrics_fn,
        size_fraction, lora_rank
    )

    # Step 3: Evaluate and save
    result_entry = evaluate_and_save_lora(
        trainer, train_results, emissions_data, output_dir,
        size_fraction, num_samples, lora_model
    )

    return result_entry

In [None]:
# Train the LoRA model (rank 8) on three subsets of the SQuAD training split.
# Training data fractions: 25%, 50%, 80%
# ============================================================================

result_lora = []

# Experiment 1: 25% data
print("\n" + "="*80)
print("  EXPERIMENT 1: LoRA FINE-TUNING WITH 25% TRAINING DATASET (Rank 8)")
print("="*80)

result_lora_25 = run_lora_experiment(
    size_fraction=0.25,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    lora_rank=8
)
result_lora.append(result_lora_25)

# Experiment 2: 50% data
print("\n" + "="*80)
print("  EXPERIMENT 2: LoRA FINE-TUNING WITH 50% TRAINING DATASET (Rank 8)")
print("="*80)

result_lora_50 = run_lora_experiment(
    size_fraction=0.5,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    lora_rank=8
)
result_lora.append(result_lora_50)

# Experiment 3: 80% data
print("\n" + "="*80)
print("  EXPERIMENT 3: LoRA FINE-TUNING WITH 80% TRAINING DATASET (Rank 8)")
print("="*80)

result_lora_80 = run_lora_experiment(
    size_fraction=0.8,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    lora_rank=8
)
result_lora.append(result_lora_80)


  EXPERIMENT 1: LoRA FINE-TUNING WITH 25% TRAINING DATASET (Rank 8)

  LoRA Training with 25.0% of training data (Rank 8)
🔄 Preprocessing 32579 training samples...


Map:   0%|          | 0/32579 [00:00<?, ? examples/s]



🔧 Creating LoRA model (rank=8)...


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 20:39:17] [setup] RAM Tracking...
[codecarbon INFO @ 20:39:17] [setup] CPU Tracking...


trainable params: 296,450 || all params: 109,189,636 || trainable%: 0.2715


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 20:39:18] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:39:18] [setup] GPU Tracking...
[codecarbon INFO @ 20:39:18] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 20:39:18] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 20:39:18] >>> Tracker's metadata:
[codecarbon INFO @ 20:39:18]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 20:39:18]   Python version: 3.12.12
[codecarbon INFO @ 20:39:18]   CodeCarbon version: 3.1.1
[codecarbon INFO @ 20:39:18]   Available RAM : 50.990 GB
[codecarbon INFO @ 20:39:18]   CPU count: 8 thread(s) in 1 physical CPU(s)
[codecarbon INFO @ 20:39:18

🚀 Training LoRA model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.613,1.731114,0.330559,0.400186
2,1.4197,1.799821,0.32685,0.408318
3,1.3756,1.851662,0.337894,0.418946



📊 Evaluating LoRA model...




  LoRA RESULTS SUMMARY (Rank 8, 25.0% Dataset)

📦 Model Configuration:
   Training Method: LoRA
   LoRA Rank: 8
   Trainable Parameters: 296,450 (0.27%)
   Total Parameters: 109,189,636
   Dataset Size: 25.0%

📈 Performance Metrics:
   F1 Score: 0.4189
   Exact Match: 0.3379
   Eval Loss: 1.8517

⚡ Energy Consumption:
   Total Energy: 0.064653 kWh
   CPU Energy: 0.020842 kWh (32.2%)
   GPU Energy: 0.034004 kWh (52.6%)
   RAM Energy: 0.009807 kWh (15.2%)

🔌 Average Power Draw:
   CPU Power: 42.50 W
   GPU Power: 67.87 W
   RAM Power: 20.00 W
   Total Power: 130.37 W

🌍 Carbon Footprint:
   Total CO2 Emissions: 0.041532 kg
   Emissions Rate: 0.000023511 kg/s
   Duration: 0.49 hours
   Training Time (Trainer): 0.49 hours

📍 Location & Infrastructure:
   Country: Taiwan (TWN)
   Region: taipei city
   On Cloud: N
   PUE (Power Usage Effectiveness): 1.0

💻 System Specifications:
   OS: Linux-6.6.105+-x86_64-with-glibc2.35
   CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (8 cores)
   GPU: 1 x Tesla T

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 21:10:03] [setup] RAM Tracking...
[codecarbon INFO @ 21:10:03] [setup] CPU Tracking...


trainable params: 296,450 || all params: 109,189,636 || trainable%: 0.2715


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 21:10:04] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 21:10:04] [setup] GPU Tracking...
[codecarbon INFO @ 21:10:04] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 21:10:04] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 21:10:04] >>> Tracker's metadata:
[codecarbon INFO @ 21:10:04]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 21:10:04]   Python version: 3.12.12
[codecarbon INFO @ 21:10:04]   CodeCarbon version: 3.1.1
[codecarbon INFO @ 21:10:04]   Available RAM : 50.990 GB
[codecarbon INFO @ 21:10:04]   CPU count: 8 thread(s) in 1 physical CPU(s)
[codecarbon INFO @ 21:10:04

🚀 Training LoRA model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.5304,1.509539,0.404566,0.473776
2,1.4008,1.450439,0.443629,0.523668
3,1.2925,1.40834,0.458711,0.538517



📊 Evaluating LoRA model...




  LoRA RESULTS SUMMARY (Rank 8, 50.0% Dataset)

📦 Model Configuration:
   Training Method: LoRA
   LoRA Rank: 8
   Trainable Parameters: 296,450 (0.27%)
   Total Parameters: 109,189,636
   Dataset Size: 50.0%

📈 Performance Metrics:
   F1 Score: 0.5385
   Exact Match: 0.4587
   Eval Loss: 1.4083

⚡ Energy Consumption:
   Total Energy: 0.121143 kWh
   CPU Energy: 0.039036 kWh (32.2%)
   GPU Energy: 0.063740 kWh (52.6%)
   RAM Energy: 0.018368 kWh (15.2%)

🔌 Average Power Draw:
   CPU Power: 42.50 W
   GPU Power: 67.77 W
   RAM Power: 20.00 W
   Total Power: 130.27 W

🌍 Carbon Footprint:
   Total CO2 Emissions: 0.077820 kg
   Emissions Rate: 0.000023522 kg/s
   Duration: 0.92 hours
   Training Time (Trainer): 0.92 hours

📍 Location & Infrastructure:
   Country: Taiwan (TWN)
   Region: taipei city
   On Cloud: N
   PUE (Power Usage Effectiveness): 1.0

💻 System Specifications:
   OS: Linux-6.6.105+-x86_64-with-glibc2.35
   CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (8 cores)
   GPU: 1 x Tesla T

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 22:06:30] [setup] RAM Tracking...
[codecarbon INFO @ 22:06:30] [setup] CPU Tracking.

trainable params: 296,450 || all params: 109,189,636 || trainable%: 0.2715


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 22:06:32] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 22:06:32] [setup] GPU Tracking...
[codecarbon INFO @ 22:06:32] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 22:06:32] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Met

🚀 Training LoRA model...



Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.4857,1.369405,0.458134,0.532284
2,1.2845,1.231845,0.517884,0.593179
3,1.2206,1.255903,0.512609,0.590422


[codecarbon INFO @ 23:22:00] 0.023498 g.CO2eq/s mean an estimation of 741.0304894993408 kg.CO2eq/year

📊 Evaluating LoRA model...




  LoRA RESULTS SUMMARY (Rank 8, 80.0% Dataset)

📦 Model Configuration:
   Training Method: LoRA
   LoRA Rank: 8
   Trainable Parameters: 296,450 (0.27%)
   Total Parameters: 109,189,636
   Dataset Size: 80.0%

📈 Performance Metrics:
   F1 Score: 0.5932
   Exact Match: 0.5179
   Eval Loss: 1.2318

⚡ Energy Consumption:
   Total Energy: 0.189944 kWh
   CPU Energy: 0.061218 kWh (32.2%)
   GPU Energy: 0.099921 kWh (52.6%)
   RAM Energy: 0.028805 kWh (15.2%)

🔌 Average Power Draw:
   CPU Power: 42.50 W
   GPU Power: 68.46 W
   RAM Power: 20.00 W
   Total Power: 130.96 W

🌍 Carbon Footprint:
   Total CO2 Emissions: 0.122016 kg
   Emissions Rate: 0.000023517 kg/s
   Duration: 1.44 hours
   Training Time (Trainer): 1.44 hours

📍 Location & Infrastructure:
   Country: Taiwan (TWN)
   Region: taipei city
   On Cloud: N
   PUE (Power Usage Effectiveness): 1.0

💻 System Specifications:
   OS: Linux-6.6.105+-x86_64-with-glibc2.35
   CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (8 cores)
   GPU: 1 x Tesla T
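
As a quick sanity check on the three summaries above, the sketch below recomputes two derived quantities from the printed figures only: the carbon intensity implied by each run (emissions divided by energy) and the gap between the logged total energy and the sum of its CPU, GPU, and RAM parts. The dictionary `runs` and the helper names are illustrative, not part of the notebook, and the intensity is inferred from the logs rather than a value the notebook reports directly.

```python
# Figures copied from the three "LoRA RESULTS SUMMARY" blocks above (rank 8).
runs = {
    25: {"energy_kwh": 0.064653, "cpu_kwh": 0.020842, "gpu_kwh": 0.034004,
         "ram_kwh": 0.009807, "emissions_kg": 0.041532},
    50: {"energy_kwh": 0.121143, "cpu_kwh": 0.039036, "gpu_kwh": 0.063740,
         "ram_kwh": 0.018368, "emissions_kg": 0.077820},
    80: {"energy_kwh": 0.189944, "cpu_kwh": 0.061218, "gpu_kwh": 0.099921,
         "ram_kwh": 0.028805, "emissions_kg": 0.122016},
}


def implied_carbon_intensity(run):
    """kg CO2eq emitted per kWh, derived from the logged totals."""
    return run["emissions_kg"] / run["energy_kwh"]


def component_gap(run):
    """Absolute difference between CPU + GPU + RAM energy and the logged total."""
    parts = run["cpu_kwh"] + run["gpu_kwh"] + run["ram_kwh"]
    return abs(parts - run["energy_kwh"])


for pct, run in runs.items():
    print(f"{pct}% data: {implied_carbon_intensity(run):.4f} kg CO2eq/kWh, "
          f"component gap {component_gap(run):.6f} kWh")
```

If the implied intensity is the same across runs, as these figures suggest, then differences in emissions between dataset sizes come entirely from differences in energy use, which here tracks training duration.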

In [None]:
# STEP 6.1: Results and Analysis
# ============================================================================

# Create summary DataFrame
results_df_lora = pd.DataFrame(result_lora)

print("\n" + "="*60)
print("  LoRA RESULTS SUMMARY")
print("="*60)
print(results_df_lora.to_string(index=False))

# Save to CSV
results_df_lora.to_csv("/content/drive/MyDrive/bert_lora_results.csv", index=False)
print("\n✅ LoRA results saved!")

# Print comparison
print("\n" + "="*80)
print("  LoRA DATASET SIZE COMPARISON")
print("="*80)
print(results_df_lora[['dataset_size%', 'trainable_params', 'trainable_percentage',
                       'f1_score', 'exact_match', 'emissions_kg', 'training_time_hours']].to_string(index=False))

# Efficiency Analysis
print("\n" + "="*80)
print("  EFFICIENCY ANALYSIS")
print("="*80)

baseline = results_df_lora[results_df_lora['dataset_size%'] == 50].iloc[0]  # Use 50% as baseline

for _, row in results_df_lora.iterrows():
    dataset_pct = row['dataset_size%']
    samples_ratio = row['train_samples'] / baseline['train_samples']
    f1_diff = row['f1_score'] - baseline['f1_score']
    emissions_diff = row['emissions_kg'] - baseline['emissions_kg']

    print(f"\n{dataset_pct}% Dataset:")
    print(f"  Training Samples: {row['train_samples']:,}")
    print(f"  Trainable Params: {row['trainable_params']:,} ({row['trainable_percentage']:.2f}%)")
    print(f"  vs 50%: {samples_ratio:.2f}x training data")
    print(f"  F1 Score: {row['f1_score']:.4f} ({f1_diff:+.4f} vs 50%)")
    print(f"  Emissions: {row['emissions_kg']:.6f} kg ({emissions_diff:+.6f} vs 50%)")
    print(f"  Training Time: {row['training_time_hours']:.2f} hours")

    # Efficiency metric: F1 per kg CO2
    efficiency = row['f1_score'] / row['emissions_kg'] if row['emissions_kg'] > 0 else 0
    print(f"  Efficiency (F1/kg CO₂): {efficiency:.2f}")



  LoRA RESULTS SUMMARY
training_method model_name  lora_rank  dataset_size%  train_samples  valid_samples  trainable_params  total_params  trainable_percentage  f1_score  exact_match  eval_loss  training_time_hours  emissions_rate_kg_per_s  emissions_kg           timestamp  duration_seconds  duration_hours  energy_consumed_kwh  cpu_energy_kwh  gpu_energy_kwh  ram_energy_kwh  cpu_power_w  gpu_power_w  ram_power_w country_name country_iso_code      region cloud_provider cloud_region on_cloud                                   os python_version  cpu_count                      cpu_model  gpu_count    gpu_model  ram_total_size_gb  pue codecarbon_version
           LoRA       BERT          8             25          32579          12134            296450     109189636                0.2715  0.418946     0.337894   1.851662             0.490496                 0.000024      0.041532 2025-11-30T21:08:47       1766.452797        0.490681             0.064653        0.020842        0.034004      
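
The efficiency analysis above compares every run to the 50% baseline. A complementary view, sketched here under the assumption that the rounded F1 and emissions values from the result summaries are accurate enough, is the marginal gain between consecutive dataset sizes: how much F1 each additional kilogram of CO2 buys. The names `points` and `marginal_gains` are hypothetical helpers and do not appear in the notebook.

```python
# (dataset %, F1 score, emissions in kg) taken from the LoRA result summaries.
points = [
    (25, 0.4189, 0.041532),
    (50, 0.5385, 0.077820),
    (80, 0.5932, 0.122016),
]


def marginal_gains(rows):
    """Return F1 gained per extra kg CO2 between consecutive dataset sizes."""
    gains = []
    for (p0, f0, e0), (p1, f1, e1) in zip(rows, rows[1:]):
        delta_f1 = f1 - f0
        delta_kg = e1 - e0
        gains.append({
            "from_pct": p0,
            "to_pct": p1,
            "delta_f1": delta_f1,
            "delta_kg": delta_kg,
            "f1_per_kg": delta_f1 / delta_kg,
        })
    return gains


gains = marginal_gains(points)
for g in gains:
    print(f"{g['from_pct']}% -> {g['to_pct']}%: "
          f"+{g['delta_f1']:.4f} F1 for +{g['delta_kg']:.6f} kg "
          f"({g['f1_per_kg']:.2f} F1 per kg)")
```

On these numbers the step from 50% to 80% yields noticeably less F1 per extra kilogram than the step from 25% to 50%, which is the diminishing-returns pattern the dual-axis plot is meant to show.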

In [None]:

# ============================================================================
# LoRA Visualizations
# ============================================================================

# PLOT 1: LoRA Energy Consumption by Dataset Size
print("\n📊 Creating LoRA Energy Plot...")
df_sorted_lora = results_df_lora.sort_values('train_samples')

fig_lora_energy = go.Figure()

fig_lora_energy.add_trace(go.Bar(
    name='CPU Energy',
    x=df_sorted_lora['dataset_size%'],
    y=df_sorted_lora['cpu_energy_kwh'],
    marker_color='#FF6B6B',
    hovertemplate='<b>CPU Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig_lora_energy.add_trace(go.Bar(
    name='GPU Energy',
    x=df_sorted_lora['dataset_size%'],
    y=df_sorted_lora['gpu_energy_kwh'],
    marker_color='#4ECDC4',
    hovertemplate='<b>GPU Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig_lora_energy.add_trace(go.Bar(
    name='RAM Energy',
    x=df_sorted_lora['dataset_size%'],
    y=df_sorted_lora['ram_energy_kwh'],
    marker_color='#95E1D3',
    hovertemplate='<b>RAM Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig_lora_energy.update_layout(
    title=dict(text="LoRA: Energy Consumption by Dataset Size", font=dict(size=18)),
    xaxis_title='Dataset Size (%)',
    yaxis_title='Energy Consumption (kWh)',
    barmode='stack',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='x unified'
)

fig_lora_energy.show()
fig_lora_energy.write_html("/content/drive/MyDrive/lora_energy_by_dataset.html")
print("✅ LoRA Energy Plot saved: lora_energy_by_dataset.html")

# PLOT 2: LoRA Performance & Emissions (Dual Y-axis)
print("\n📊 Creating LoRA Performance vs Emissions Plot...")
df_sorted_lora = results_df_lora.sort_values('train_samples')

fig_lora_perf = make_subplots(specs=[[{"secondary_y": True}]])

# F1 Score line
fig_lora_perf.add_trace(
    go.Scatter(
        x=df_sorted_lora['dataset_size%'],
        y=df_sorted_lora['f1_score'],
        name='F1 Score',
        mode='lines+markers',
        line=dict(color='#4ECDC4', width=3),
        marker=dict(size=12, line=dict(width=2, color='white')),
        hovertemplate='<b>F1 Score</b>: %{y:.4f}<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=False
)

# Exact Match line
fig_lora_perf.add_trace(
    go.Scatter(
        x=df_sorted_lora['dataset_size%'],
        y=df_sorted_lora['exact_match'],
        name='Exact Match',
        mode='lines+markers',
        line=dict(color='#95E1D3', width=3, dash='dash'),
        marker=dict(size=10),
        hovertemplate='<b>Exact Match</b>: %{y:.4f}<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=False
)

# CO2 Emissions bar
fig_lora_perf.add_trace(
    go.Bar(
        x=df_sorted_lora['dataset_size%'],
        y=df_sorted_lora['emissions_kg'],
        name='CO₂ Emissions',
        marker_color='#FF6B6B',
        opacity=0.6,
        hovertemplate='<b>CO₂</b>: %{y:.6f} kg<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=True
)

fig_lora_perf.update_xaxes(title_text="Dataset Size (%)")
fig_lora_perf.update_yaxes(title_text="Performance Score", secondary_y=False)
fig_lora_perf.update_yaxes(title_text="CO₂ Emissions (kg)", secondary_y=True)

fig_lora_perf.update_layout(
    title=dict(text="LoRA: Performance vs Carbon Emissions by Dataset Size", font=dict(size=18)),
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig_lora_perf.show()
fig_lora_perf.write_html("/content/drive/MyDrive/lora_performance_emissions.html")
print("✅ LoRA Performance Plot saved: lora_performance_emissions.html")


📊 Creating LoRA Energy Plot...


✅ LoRA Energy Plot saved: lora_energy_by_dataset.html

📊 Creating LoRA Performance vs Emissions Plot...


✅ LoRA Performance Plot saved: lora_performance_emissions.html
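
The trainable-parameter count logged in every LoRA run (296,450, or 0.2715% of 109,189,636) can be reconstructed by hand. The sketch below assumes rank-8 adapters on the query and value projections of all 12 encoder layers plus a trainable QA head; the target modules are an assumption, since the `LoraConfig` is not shown in this part of the notebook, but the arithmetic matches the logged count under that assumption.

```python
# Architecture constants for bert-base-uncased and assumed adapter settings.
HIDDEN_SIZE = 768        # hidden size of bert-base-uncased
NUM_LAYERS = 12          # number of encoder layers
LORA_RANK = 8            # rank used in the runs above
TARGETS_PER_LAYER = 2    # assumption: adapters on the query and value projections
QA_OUTPUTS = 2           # start and end logits


def lora_params_per_module(d_in, d_out, rank):
    """Parameters in one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


adapter_params = (
    NUM_LAYERS
    * TARGETS_PER_LAYER
    * lora_params_per_module(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
)
qa_head_params = HIDDEN_SIZE * QA_OUTPUTS + QA_OUTPUTS  # weight plus bias
trainable = adapter_params + qa_head_params

reported_total = 109_189_636  # "all params" from the training log
trainable_pct = 100 * trainable / reported_total

print(f"Adapter parameters: {adapter_params:,}")
print(f"QA head parameters: {qa_head_params:,}")
print(f"Trainable total:    {trainable:,} ({trainable_pct:.4f}%)")
```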


# Training Strategy 3: Few-shot Learning With Frozen Backbone (Model BERT)

In [None]:
from peft import LoraConfig, get_peft_model, TaskType, PeftModel
print("="*80)
print("  FEW-SHOT LEARNING WITH FROZEN BACKBONE")
print("="*80)

  FEW-SHOT LEARNING WITH FROZEN BACKBONE


# STEP 1: Custom Model with Frozen Backbone

In [None]:
# STEP 1: Creating And Training Few-shot Model
# ============================================================================

def create_frozen_model(model_name="bert-base-uncased"):
    """Create model with frozen backbone (only QA head is trainable)."""

    # Load base model
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Freeze ALL parameters first
    for param in model.parameters():
        param.requires_grad = False

    # Unfreeze ONLY the QA head (classifier layer)
    # For BERT: qa_outputs layer
    for param in model.qa_outputs.parameters():
        param.requires_grad = True

    # Count parameters
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in model.parameters())

    print(f"\n📦 Model Configuration:")
    print(f"   Total Parameters: {total_params:,}")
    print(f"   Trainable Parameters: {trainable_params:,}")
    print(f"   Frozen Parameters: {total_params - trainable_params:,}")
    print(f"   Trainable Percentage: {100 * trainable_params / total_params:.4f}%")

    return model, trainable_params, total_params

def prepare_fewshot_dataset(train_data, num_shots, preprocess_fn):
    """Prepare few-shot dataset with specified number of examples."""

    # Take the first num_shots examples in dataset order (deterministic, not a random sample)
    train_subset = train_data.select(range(num_shots))

    print(f"🎯 Creating few-shot dataset with {num_shots} examples...")

    tokenized_train = train_subset.map(
        preprocess_fn,
        batched=True,
        remove_columns=train_subset.column_names
    )

    # After tokenization with sliding window, we get more samples
    actual_samples = len(tokenized_train)
    print(f"   Original examples: {num_shots}")
    print(f"   After tokenization (with sliding window): {actual_samples} samples")

    return tokenized_train, num_shots  # Return original num_shots for tracking

def train_fewshot_model(tokenized_train, tokenized_eval, tokenizer, compute_metrics_fn,
                        num_shots, model_name):
    """Train BERT model with frozen backbone (few-shot learning)."""

    # Create frozen model
    model, trainable_params, total_params = create_frozen_model(model_name)

    # Setup output directory
    output_dir = f"results_bert_fewshot_{num_shots}shots"

    # Training arguments - DIFFERENT from full fine-tuning
    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        learning_rate=5e-4,  # Higher LR since we're only training the head
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=10,  # More epochs for few-shot
        weight_decay=0.01,
        fp16=torch.cuda.is_available(),
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        push_to_hub=False,
        logging_steps=50,
        greater_is_better=True,
        warmup_ratio=0.1
    )

    # Initialize trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        processing_class=tokenizer,  # `tokenizer=` is deprecated in Trainer.__init__
        data_collator=default_data_collator,
        compute_metrics=compute_metrics_fn
    )

    # Start carbon tracking
    tracker = EmissionsTracker(
        project_name=f"BERT_FewShot_{num_shots}shots",
        output_dir=output_dir,
        save_to_file=True,
        log_level="info"
    )
    tracker.start()

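    # Train; the tracker is stopped in `finally` so a failed run does not leave it running
    print(f"\n🚀 Training few-shot model ({num_shots} examples)...")
    try:
        train_results = trainer.train()
    finally:
        tracker.stop()  # returns total emissions in kg; details are read below

    # Detailed emissions record (energy, power, hardware, location)
    emissions_data = tracker.final_emissions_data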

    return trainer, train_results, emissions_data, output_dir, model, trainable_params, total_params
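
The sample counts printed by `prepare_fewshot_dataset` can exceed `num_shots` because a context longer than the tokenizer's maximum length is split into overlapping windows, and each window becomes its own training sample. The sketch below illustrates the counting only. `max_length=384` and `stride=128` are assumed values (the settings inside `preprocess_function` are not shown in this section), and the hypothetical `count_windows` helper ignores question and special tokens.

```python
import math


def count_windows(n_context_tokens: int, max_length: int = 384, stride: int = 128) -> int:
    """Number of overlapping windows needed to cover a tokenized context.

    Assumption: `stride` is the overlap between consecutive windows, so each
    new window advances by (max_length - stride) tokens.
    """
    if stride >= max_length:
        raise ValueError("stride must be smaller than max_length")
    if n_context_tokens <= max_length:
        return 1  # short contexts fit in a single window
    step = max_length - stride
    return 1 + math.ceil((n_context_tokens - max_length) / step)


# Hypothetical context lengths (in tokens) for five examples.
context_lengths = [120, 300, 384, 500, 700]
windows = [count_windows(n) for n in context_lengths]
print(windows, "->", sum(windows), "samples from", len(context_lengths), "examples")
```

Under these assumptions only the two longest contexts are split, which is the same mechanism that turns 500 examples into slightly more tokenized samples.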

# STEP 2: Few-shot Evaluation and Experiment Pipeline

In [None]:
# STEP 2: Evaluating The Few-shot Model On Different Shot Sizes
# ============================================================================

def evaluate_and_save_fewshot(trainer, train_results, emissions_data, output_dir,
                               num_shots, trainable_params, total_params):
    """Evaluate few-shot model and save results."""

    print("📊 Evaluating few-shot model...")
    eval_results = trainer.evaluate()

    trainable_percentage = 100 * trainable_params / total_params

    # Compile results
    result_entry = {
        "training_method": "Few-Shot (Frozen Backbone)",
        "model_name": "BERT",
        "num_shots": num_shots,
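        "train_samples": num_shots,  # original examples, before sliding-window expansion
        "valid_samples": len(trainer.eval_dataset),  # avoids relying on the global tokenized_validation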
        "trainable_params": trainable_params,
        "total_params": total_params,
        "trainable_percentage": trainable_percentage,

        # Performance
        "f1_score": eval_results["eval_f1"],
        "exact_match": eval_results["eval_exact_match"],
        "eval_loss": eval_results["eval_loss"],
        "training_time_hours": train_results.metrics["train_runtime"] / 3600,

        # Emissions
        "timestamp": emissions_data.timestamp,
        "duration_seconds": emissions_data.duration,
        "duration_hours": emissions_data.duration / 3600,
        "emissions_kg": emissions_data.emissions,
        "emissions_rate_kg_per_s": emissions_data.emissions_rate,

        # Energy
        "energy_consumed_kwh": emissions_data.energy_consumed,
        "cpu_energy_kwh": emissions_data.cpu_energy,
        "gpu_energy_kwh": emissions_data.gpu_energy,
        "ram_energy_kwh": emissions_data.ram_energy,

        # Power
        "cpu_power_w": emissions_data.cpu_power,
        "gpu_power_w": emissions_data.gpu_power,
        "ram_power_w": emissions_data.ram_power,

        # Location
        "country_name": emissions_data.country_name,
        "country_iso_code": emissions_data.country_iso_code,
        "region": emissions_data.region,
        "cloud_provider": emissions_data.cloud_provider,
        "cloud_region": emissions_data.cloud_region,
        "on_cloud": emissions_data.on_cloud,

        # System
        "os": emissions_data.os,
        "python_version": emissions_data.python_version,
        "cpu_model": emissions_data.cpu_model,
        "cpu_count": emissions_data.cpu_count,
        "gpu_model": emissions_data.gpu_model,
        "gpu_count": emissions_data.gpu_count,
        "ram_total_size_gb": emissions_data.ram_total_size,

        # Additional
        "pue": emissions_data.pue,
        "codecarbon_version": emissions_data.codecarbon_version,
    }

    # Print summary
    print(f"\n{'='*80}")
    print(f"  FEW-SHOT LEARNING RESULTS ({num_shots} examples)")
    print(f"{'='*80}")
    print(f"\n📦 Model Configuration:")
    print(f"   Training Method: Few-Shot (Frozen Backbone)")
    print(f"   Training Examples: {num_shots}")
    print(f"   Trainable Parameters: {trainable_params:,} ({trainable_percentage:.4f}%)")
    print(f"   Frozen Parameters: {total_params - trainable_params:,}")

    print(f"\n📈 Performance:")
    print(f"   F1 Score: {eval_results['eval_f1']:.4f}")
    print(f"   Exact Match: {eval_results['eval_exact_match']:.4f}")
    print(f"   Eval Loss: {eval_results['eval_loss']:.4f}")

    print(f"\n⚡ Energy:")
    print(f"   Total: {emissions_data.energy_consumed:.6f} kWh")
    if emissions_data.energy_consumed > 0:
        print(f"   GPU: {emissions_data.gpu_energy:.6f} kWh ({emissions_data.gpu_energy/emissions_data.energy_consumed*100:.1f}%)")
        print(f"   CPU: {emissions_data.cpu_energy:.6f} kWh ({emissions_data.cpu_energy/emissions_data.energy_consumed*100:.1f}%)")

    print(f"\n🌍 Carbon:")
    print(f"   CO₂ Emissions: {emissions_data.emissions:.6f} kg")
    print(f"   Training Time: {train_results.metrics['train_runtime']/3600:.2f} hours")
    print(f"{'='*80}")

    # Save model
    trainer.save_model(f"{output_dir}/final_model")
    print(f"✅ Model saved to {output_dir}/final_model")

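    # Clean up local references and release cached GPU memory.
    # Note: the caller still holds its own references to the trainer and model,
    # so the memory is only fully reclaimed once those go out of scope.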
    del trainer.model
    del trainer
    torch.cuda.empty_cache()

    return result_entry

def run_fewshot_experiment(num_shots, train_data, eval_data, tokenizer, preprocess_fn,
                           compute_metrics_fn, model_name):
    """Run complete few-shot learning experiment."""

    print(f"\n{'='*60}")
    print(f"  Few-Shot Learning with {num_shots} examples")
    print(f"{'='*60}")

    # Step 1: Prepare few-shot dataset
    tokenized_train, num_shots = prepare_fewshot_dataset(train_data, num_shots, preprocess_fn)

    # Step 2: Train with frozen backbone
    trainer, train_results, emissions_data, output_dir, model, trainable_params, total_params = train_fewshot_model(
        tokenized_train, eval_data, tokenizer, compute_metrics_fn,
        num_shots, model_name
    )

    # Step 3: Evaluate and save
    result_entry = evaluate_and_save_fewshot(
        trainer, train_results, emissions_data, output_dir,
        num_shots, trainable_params, total_params
    )

    return result_entry
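
The `trainable_percentage` recorded by `evaluate_and_save_fewshot` can be sanity-checked by hand. With the backbone frozen, only the span-prediction head (`qa_outputs`, the layer the model-loading warning reports as newly initialized) is trained. A rough sketch of that arithmetic, assuming the head is a single linear layer mapping BERT-base's 768-dimensional hidden state to two logits (start and end), with the total parameter count taken from the run logs:

```python
hidden_size = 768   # BERT-base hidden dimension
num_logits = 2      # start-position and end-position logits

# A linear layer has hidden_size * num_logits weights plus num_logits biases.
trainable_params = hidden_size * num_logits + num_logits

total_params = 108_893_186  # total reported in the training logs
trainable_percentage = 100 * trainable_params / total_params

print(f"Trainable parameters: {trainable_params:,}")
print(f"Frozen parameters:    {total_params - trainable_params:,}")
print(f"Trainable percentage: {trainable_percentage:.4f}%")
```

This matches the 1,538 trainable parameters (0.0014%) shown in the model configuration output.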

# STEP 3: Few-shot Training Runs (100, 500, and 1000 Shots)

In [None]:
result_fewshot = []

# Experiment 1: 100-shot
print("\n" + "="*80)
print("  EXPERIMENT 1: 100-shot Learning")
print("="*80)

result_100 = run_fewshot_experiment(
    num_shots=100,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="bert-base-uncased"
)
result_fewshot.append(result_100)

# Experiment 2: 500-shot
print("\n" + "="*80)
print("  EXPERIMENT 2: 500-shot Learning")
print("="*80)

result_500 = run_fewshot_experiment(
    num_shots=500,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="bert-base-uncased"
)
result_fewshot.append(result_500)

# Experiment 3: 1000-shot
print("\n" + "="*80)
print("  EXPERIMENT 3: 1000-shot Learning")
print("="*80)

result_1000 = run_fewshot_experiment(
    num_shots=1000,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="bert-base-uncased"
)
result_fewshot.append(result_1000)



  EXPERIMENT 1: 100-shot Learning

  Few-Shot Learning with 100 examples
🎯 Creating few-shot dataset with 100 examples...



   Original examples: 100
   After tokenization (with sliding window): 100 samples


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 23:34:42] [setup] RAM Tracking...
[codecarbon INFO @ 23:34:42] [setup] CPU Tracking...



📦 Model Configuration:
   Total Parameters: 108,893,186
   Trainable Parameters: 1,538
   Frozen Parameters: 108,891,648
   Trainable Percentage: 0.0014%


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 23:34:43] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 23:34:43] [setup] GPU Tracking...
[codecarbon INFO @ 23:34:43] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 23:34:43] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 23:34:43] >>> Tracker's metadata:
[codecarbon INFO @ 23:34:43]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 23:34:43]   Python version: 3.12.12
[codecarbon INFO @ 23:34:43]   CodeCarbon version: 3.1.1
[codecarbon INFO @ 23:34:43]   Available RAM : 50.990 GB
[codecarbon INFO @ 23:34:43]   CPU count: 8 thread(s) in 1 physical CPU(s)


🚀 Training few-shot model (100 examples)...


Epoch  Training Loss  Validation Loss  Exact Match        F1
    1         No log         5.918365          0.0  0.014177
    2         No log         5.860294          0.0  0.012638
    3         No log         5.816506     0.000165  0.012125
    4         No log         5.783706     0.000494  0.012723
    5         No log         5.759678     0.000659  0.013084
    6         No log          5.74044     0.000659  0.013178
    7         No log         5.724514     0.000824  0.013116
    8       5.617400         5.714353     0.000824  0.012996
    9       5.617400         5.708546     0.000907  0.012971
   10       5.617400         5.706142     0.000907  0.012836


[codecarbon INFO @ 23:34:45] Energy consumed for RAM : 0.108186 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 23:34:45] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 23:34:45] Energy consumed for All CPU : 0.229921 kWh
[codecarbon INFO @ 23:34:45] Energy consumed for all GPUs : 0.359694 kWh. Total GPU Power : 33.136946517544246 W
[codecarbon INFO @ 23:34:45] 0.697801 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 23:34:46] Energy consumed for RAM : 0.108184 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 23:34:46] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 23:34:46] Energy consumed for All CPU : 0.229916 kWh
[codecarbon INFO @ 23:34:46] Energy consumed for all GPUs : 0.359704 kWh. Total GPU Power : 34.937687489014415 W
[codecarbon INFO @ 23:34:46] 0.697804 kWh of electricity and 0.000000 L of water were used since the beginning.

📊 Evaluating few-shot model...


[codecarbon INFO @ 23:46:31] Energy consumed for RAM : 0.112102 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 23:46:31] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 23:46:31] Energy consumed for All CPU : 0.238241 kWh
[codecarbon INFO @ 23:46:31] Energy consumed for all GPUs : 0.373241 kWh. Total GPU Power : 67.80261928889695 W
[codecarbon INFO @ 23:46:31] 0.723585 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 23:46:31] Energy consumed for RAM : 0.112099 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 23:46:31] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 23:46:31] Energy consumed for All CPU : 0.238235 kWh
[codecarbon INFO @ 23:46:31] Energy consumed for all GPUs : 0.373249 kWh. Total GPU Power : 67.8465780164066 W
[codecarbon INFO @ 23:46:31] 0.723583 kWh of electricity and 0.000000 L of water were used since the beginning.


  FEW-SHOT LEARNING RESULTS (100 examples)

📦 Model Configuration:
   Training Method: Few-Shot (Frozen Backbone)
   Training Examples: 100
   Trainable Parameters: 1,538 (0.0014%)
   Frozen Parameters: 108,891,648

📈 Performance:
   F1 Score: 0.0142
   Exact Match: 0.0000
   Eval Loss: 5.9184

⚡ Energy:
   Total: 0.025440 kWh
   GPU: 0.013364 kWh (52.5%)
   CPU: 0.008212 kWh (32.3%)

🌍 Carbon:
   CO₂ Emissions: 0.016342 kg
   Training Time: 0.19 hours
✅ Model saved to results_bert_fewshot_100shots/final_model

  EXPERIMENT 2: 500-shot Learning

  Few-Shot Learning with 500 examples
🎯 Creating few-shot dataset with 500 examples...



   Original examples: 500
   After tokenization (with sliding window): 527 samples


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 23:47:30] [setup] RAM Tracking...
[codecarbon INFO @ 23:47:30] [setup] CPU Tracking...



📦 Model Configuration:
   Total Parameters: 108,893,186
   Trainable Parameters: 1,538
   Frozen Parameters: 108,891,648
   Trainable Percentage: 0.0014%


[codecarbon INFO @ 23:47:31] Energy consumed for RAM : 0.112435 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 23:47:31] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 23:47:31] Energy consumed for All CPU : 0.238949 kWh
[codecarbon INFO @ 23:47:31] Energy consumed for all GPUs : 0.374386 kWh. Total GPU Power : 66.32610863394012 W
[codecarbon INFO @ 23:47:31] 0.725770 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 23:47:31] Energy consumed for RAM : 0.112432 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 23:47:31] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 23:47:31] Energy consumed for All CPU : 0.238943 kWh
[codecarbon INFO @ 23:47:31] Energy consumed for all GPUs : 0.374386 kWh. Total GPU Power : 64.93370236849292 W
[codecarbon INFO @ 23:47:31] 0.725761 kWh of electricity and 0.000000 L of water were used since the beginning.
 Linux OS detected: Plea


🚀 Training few-shot model (500 examples)...


Epoch  Training Loss  Validation Loss  Exact Match        F1
    1         No log         5.757192          0.0   0.01236
    2       5.768600         5.364319     0.001483  0.015336
    3       5.768600          5.06591     0.002884  0.016531
    4       5.000800         4.794398     0.005522  0.018416
    5       4.595800          4.57942     0.016895  0.029189
    6       4.595800         4.466683     0.025878  0.038067
    7       4.363200         4.348217     0.045492  0.056791
    8       4.251000         4.287825     0.057277  0.068709
    9       4.251000         4.254209     0.064529  0.076114
   10       4.173400         4.241534     0.068073   0.07943


[codecarbon INFO @ 23:47:46] Energy consumed for RAM : 0.112518 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 23:47:46] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 23:47:46] Energy consumed for All CPU : 0.239126 kWh
[codecarbon INFO @ 23:47:46] Energy consumed for all GPUs : 0.374638 kWh. Total GPU Power : 60.61104675814475 W
[codecarbon INFO @ 23:47:46] 0.726282 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 23:47:46] Energy consumed for RAM : 0.112515 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 23:47:46] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 23:47:46] Energy consumed for All CPU : 0.239120 kWh
[codecarbon INFO @ 23:47:46] Energy consumed for all GPUs : 0.374646 kWh. Total GPU Power : 62.47756851615978 W
[codecarbon INFO @ 23:47:46] 0.726281 kWh of electricity and 0.000000 L of water were used since the beginning.

📊 Evaluating few-shot model...


[codecarbon INFO @ 23:59:46] Energy consumed for RAM : 0.116516 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 23:59:46] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 23:59:46] Energy consumed for All CPU : 0.247622 kWh
[codecarbon INFO @ 23:59:46] Energy consumed for all GPUs : 0.388468 kWh. Total GPU Power : 68.5100612307848 W
[codecarbon INFO @ 23:59:46] 0.752606 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 23:59:46] Energy consumed for RAM : 0.116513 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 23:59:46] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 23:59:46] Energy consumed for All CPU : 0.247617 kWh
[codecarbon INFO @ 23:59:46] Energy consumed for all GPUs : 0.388476 kWh. Total GPU Power : 68.52599874317525 W
[codecarbon INFO @ 23:59:46] 0.752605 kWh of electricity and 0.000000 L of water were used since the beginning.


  FEW-SHOT LEARNING RESULTS (500 examples)

📦 Model Configuration:
   Training Method: Few-Shot (Frozen Backbone)
   Training Examples: 500
   Trainable Parameters: 1,538 (0.0014%)
   Frozen Parameters: 108,891,648

📈 Performance:
   F1 Score: 0.0794
   Exact Match: 0.0681
   Eval Loss: 4.2415

⚡ Energy:
   Total: 0.026376 kWh
   GPU: 0.013853 kWh (52.5%)
   CPU: 0.008516 kWh (32.3%)

🌍 Carbon:
   CO₂ Emissions: 0.016943 kg
   Training Time: 0.20 hours
✅ Model saved to results_bert_fewshot_500shots/final_model

  EXPERIMENT 3: 1000-shot Learning

  Few-Shot Learning with 1000 examples
🎯 Creating few-shot dataset with 1000 examples...



   Original examples: 1000
   After tokenization (with sliding window): 1027 samples


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 00:00:45] [setup] RAM Tracking...
[codecarbon INFO @ 00:00:45] [setup] CPU Tracking...



📦 Model Configuration:
   Total Parameters: 108,893,186
   Trainable Parameters: 1,538
   Frozen Parameters: 108,891,648
   Trainable Percentage: 0.0014%


[codecarbon INFO @ 00:00:46] Energy consumed for RAM : 0.116849 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 00:00:46] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 00:00:46] Energy consumed for All CPU : 0.248330 kWh
[codecarbon INFO @ 00:00:46] Energy consumed for all GPUs : 0.389605 kWh. Total GPU Power : 65.00325750200675 W
[codecarbon INFO @ 00:00:46] 0.754784 kWh of electricity and 0.000000 L of water were used since the beginning.
 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 00:00:46] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 00:00:46] [setup] GPU Tracking...
[codecarbon INFO @ 00:00:46] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 00:00:46] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Me


🚀 Training few-shot model (1000 examples)...


Epoch  Training Loss  Validation Loss  Exact Match        F1
    1         5.8973         5.586273     0.000577  0.013589
    2         5.2018         5.083384     0.004533   0.01866
    3         4.6505         4.728249     0.006099  0.021722
    4         4.1642         4.508087     0.007664    0.0243
    5         4.0834         4.352459     0.009478  0.025784
    6          3.983         4.254833     0.010796   0.02814
    7         3.8632         4.190761     0.012444  0.030061
    8         3.8426         4.153254     0.013433  0.031232
    9         3.8404         4.137147     0.013516  0.031563
   10          3.801         4.137458     0.013681  0.031751


[codecarbon INFO @ 00:01:01] Energy consumed for RAM : 0.116932 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 00:01:01] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 00:01:01] Energy consumed for All CPU : 0.248507 kWh
[codecarbon INFO @ 00:01:01] Energy consumed for all GPUs : 0.389862 kWh. Total GPU Power : 61.655355414840884 W
[codecarbon INFO @ 00:01:01] 0.755301 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 00:01:01] Energy consumed for RAM : 0.116929 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 00:01:01] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 00:01:01] Energy consumed for All CPU : 0.248501 kWh
[codecarbon INFO @ 00:01:01] Energy consumed for all GPUs : 0.389870 kWh. Total GPU Power : 63.38332519949937 W
[codecarbon INFO @ 00:01:01] 0.755301 kWh of electricity and 0.000000 L of water were used since the beginning.

📊 Evaluating few-shot model...


[codecarbon INFO @ 00:13:31] Energy consumed for RAM : 0.121096 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 00:13:31] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 00:13:31] Energy consumed for All CPU : 0.257356 kWh
[codecarbon INFO @ 00:13:31] Energy consumed for all GPUs : 0.404261 kWh. Total GPU Power : 68.48894514723001 W
[codecarbon INFO @ 00:13:31] 0.782713 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 00:13:32] Energy consumed for RAM : 0.121093 kWh. RAM Power : 20.0 W
[codecarbon INFO @ 00:13:32] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 00:13:32] Energy consumed for All CPU : 0.257351 kWh
[codecarbon INFO @ 00:13:32] Energy consumed for all GPUs : 0.404269 kWh. Total GPU Power : 68.03766543073634 W
[codecarbon INFO @ 00:13:32] 0.782714 kWh of electricity and 0.000000 L of water were used since the beginning.


  FEW-SHOT LEARNING RESULTS (1000 examples)

📦 Model Configuration:
   Training Method: Few-Shot (Frozen Backbone)
   Training Examples: 1000
   Trainable Parameters: 1,538 (0.0014%)
   Frozen Parameters: 108,891,648

📈 Performance:
   F1 Score: 0.0318
   Exact Match: 0.0137
   Eval Loss: 4.1375

⚡ Energy:
   Total: 0.027601 kWh
   GPU: 0.014495 kWh (52.5%)
   CPU: 0.008912 kWh (32.3%)

🌍 Carbon:
   CO₂ Emissions: 0.017730 kg
   Training Time: 0.21 hours
✅ Model saved to results_bert_fewshot_1000shots/final_model
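
The printed summaries show only the GPU and CPU shares of energy, which add up to roughly 85%. The sketch below derives the remaining share and the carbon intensity implied by each run, using only the numbers printed above. It rests on two assumptions: that the unlisted remainder is RAM energy (the tracker logs RAM alongside CPU and GPU), and that emissions are total energy multiplied by a single grid carbon-intensity factor.

```python
# (total_kwh, gpu_kwh, cpu_kwh, emissions_kg) copied from the printed summaries
runs = {
    100: (0.025440, 0.013364, 0.008212, 0.016342),
    500: (0.026376, 0.013853, 0.008516, 0.016943),
    1000: (0.027601, 0.014495, 0.008912, 0.017730),
}

derived = {}
for shots, (total, gpu, cpu, emissions) in runs.items():
    ram = total - gpu - cpu          # assumption: the remainder is RAM energy
    ram_share = 100 * ram / total
    intensity = emissions / total    # implied kg CO2 per kWh
    derived[shots] = (ram_share, intensity)
    print(f"{shots:>4} shots: RAM share {ram_share:.1f}%, "
          f"implied intensity {intensity:.3f} kg CO2/kWh")
```

On these figures the implied intensity is essentially the same for all three runs (about 0.64 kg CO₂ per kWh), so the small differences in emissions track the differences in energy use rather than a change in the grid factor.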


# STEP 4: Few-shot Results Summary

In [None]:
# Few-shot Results Summary
# ============================================================================

# Create summary DataFrame
results_df_fewshot = pd.DataFrame(result_fewshot)

print("\n" + "="*60)
print("  FEW-SHOT LEARNING RESULTS SUMMARY")
print("="*60)
print(results_df_fewshot[['num_shots', 'trainable_percentage', 'f1_score',
                          'exact_match', 'emissions_kg', 'training_time_hours']].to_string(index=False))

# Save to CSV
results_df_fewshot.to_csv("/content/drive/MyDrive/bert_fewshot_results.csv", index=False)
print("\n✅ Few-shot results saved!")


  FEW-SHOT LEARNING RESULTS SUMMARY
 num_shots  trainable_percentage  f1_score  exact_match  emissions_kg  training_time_hours
       100              0.001412  0.014177     0.000000      0.016342             0.193151
       500              0.001412  0.079430     0.068073      0.016943             0.200310
      1000              0.001412  0.031751     0.013681      0.017730             0.209650

✅ Few-shot results saved!
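
The `f1_score` column reports the best checkpoint, not necessarily the final epoch, because the training arguments set `load_best_model_at_end=True` with `metric_for_best_model="f1"`. For the 100-shot run this means the reported F1 (0.014177) is the epoch-1 value. A small sketch of that selection rule, using the per-epoch F1 values from the training logs; the `best_epoch` helper is hypothetical and only mimics a "highest metric wins" rule, it is not the Trainer's implementation:

```python
# Per-epoch validation F1 copied from the training logs above.
f1_by_epoch = {
    100: [0.014177, 0.012638, 0.012125, 0.012723, 0.013084,
          0.013178, 0.013116, 0.012996, 0.012971, 0.012836],
    500: [0.01236, 0.015336, 0.016531, 0.018416, 0.029189,
          0.038067, 0.056791, 0.068709, 0.076114, 0.07943],
    1000: [0.013589, 0.01866, 0.021722, 0.0243, 0.025784,
           0.02814, 0.030061, 0.031232, 0.031563, 0.031751],
}


def best_epoch(scores):
    """Return (1-based epoch, score) of the highest score; the earliest epoch wins ties."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1, scores[best_index]


for shots, scores in f1_by_epoch.items():
    epoch, score = best_epoch(scores)
    print(f"{shots:>4} shots: best F1 {score:.6f} at epoch {epoch}")
```

For the 100-shot run the best F1 occurs before the head has learned anything useful, so that row is better read as a near-zero baseline than as a trained result.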


# STEP 5: Few-shot Efficiency Analysis

In [None]:
# FEW-SHOT EFFICIENCY ANALYSIS
# ============================================================================

print("\n" + "="*80)
print("  FEW-SHOT EFFICIENCY ANALYSIS")
print("="*80)

# Use 500-shot as baseline (middle ground)
baseline = results_df_fewshot[results_df_fewshot['num_shots'] == 500].iloc[0]

for _, row in results_df_fewshot.iterrows():
    shots = row['num_shots']
    samples_ratio = row['num_shots'] / baseline['num_shots']
    f1_diff = row['f1_score'] - baseline['f1_score']
    emissions_diff = row['emissions_kg'] - baseline['emissions_kg']
    time_diff = row['training_time_hours'] - baseline['training_time_hours']

    print(f"\n{shots}-Shot Learning:")
    print(f"  Training Examples: {row['num_shots']:,}")
    print(f"  Trainable Params: {row['trainable_params']:,} ({row['trainable_percentage']:.4f}%)")
    print(f"  vs 500-shot: {samples_ratio:.2f}x training data")
    print(f"  F1 Score: {row['f1_score']:.4f} ({f1_diff:+.4f} vs 500-shot)")
    print(f"  Emissions: {row['emissions_kg']:.6f} kg ({emissions_diff:+.6f} vs 500-shot)")
    print(f"  Training Time: {row['training_time_hours']:.2f} hours ({time_diff:+.2f} vs 500-shot)")

    # Efficiency metrics
    efficiency_co2 = row['f1_score'] / row['emissions_kg'] if row['emissions_kg'] > 0 else 0
    efficiency_time = row['f1_score'] / row['training_time_hours'] if row['training_time_hours'] > 0 else 0
    efficiency_samples = row['f1_score'] / row['num_shots'] if row['num_shots'] > 0 else 0

    print(f"  Efficiency (F1/kg CO₂): {efficiency_co2:.2f}")
    print(f"  Efficiency (F1/hour): {efficiency_time:.4f}")
    print(f"  Efficiency (F1/sample): {efficiency_samples:.6f}")

# ============================================================================



  FEW-SHOT EFFICIENCY ANALYSIS

100-Shot Learning:
  Training Examples: 100
  Trainable Params: 1,538 (0.0014%)
  vs 500-shot: 0.20x training data
  F1 Score: 0.0142 (-0.0653 vs 500-shot)
  Emissions: 0.016342 kg (-0.000601 vs 500-shot)
  Training Time: 0.19 hours (-0.01 vs 500-shot)
  Efficiency (F1/kg CO₂): 0.87
  Efficiency (F1/hour): 0.0734
  Efficiency (F1/sample): 0.000142

500-Shot Learning:
  Training Examples: 500
  Trainable Params: 1,538 (0.0014%)
  vs 500-shot: 1.00x training data
  F1 Score: 0.0794 (+0.0000 vs 500-shot)
  Emissions: 0.016943 kg (+0.000000 vs 500-shot)
  Training Time: 0.20 hours (+0.00 vs 500-shot)
  Efficiency (F1/kg CO₂): 4.69
  Efficiency (F1/hour): 0.3965
  Efficiency (F1/sample): 0.000159

1000-Shot Learning:
  Training Examples: 1,000
  Trainable Params: 1,538 (0.0014%)
  vs 500-shot: 2.00x training data
  F1 Score: 0.0318 (-0.0477 vs 500-shot)
  Emissions: 0.017730 kg (+0.000787 vs 500-shot)
  Training Time: 0.21 hours (+0.01 vs 500-shot)
  Efficie
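
The F1-per-kg figure is a plain ratio, so it can be reproduced directly from the results summary table. The sketch below does that and also adds a marginal view (change in F1 per additional kg of CO₂ between consecutive shot sizes). The marginal view is not part of the original analysis and is included only as an illustration.

```python
# (num_shots, f1_score, emissions_kg) from the few-shot results summary
rows = [
    (100, 0.014177, 0.016342),
    (500, 0.079430, 0.016943),
    (1000, 0.031751, 0.017730),
]

# Average efficiency: F1 obtained per kg of CO2 emitted.
efficiency = {shots: f1 / kg for shots, f1, kg in rows}
for shots, value in efficiency.items():
    print(f"{shots:>4}-shot: {value:.2f} F1 per kg CO2")

# Marginal efficiency between consecutive runs (illustrative addition).
marginal = {}
for (s0, f0, k0), (s1, f1, k1) in zip(rows, rows[1:]):
    marginal[(s0, s1)] = (f1 - f0) / (k1 - k0)
    print(f"{s0} -> {s1} shots: {marginal[(s0, s1)]:+.1f} F1 per extra kg CO2")
```

The marginal figure is positive from 100 to 500 shots and negative from 500 to 1000, reflecting the drop in F1 at 1000 shots despite slightly higher emissions.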

# STEP 6: Few-shot Visualizations

In [None]:
# Few-shot Visualizations
# ============================================================================

# PLOT 1: Few-Shot Energy Consumption by Shots
print("\n📊 Creating Few-Shot Energy Plot...")
df_sorted_fewshot = results_df_fewshot.sort_values('num_shots')

fig_fewshot_energy = go.Figure()

fig_fewshot_energy.add_trace(go.Bar(
    name='CPU Energy',
    x=df_sorted_fewshot['num_shots'],
    y=df_sorted_fewshot['cpu_energy_kwh'],
    marker_color='#FF6B6B',
    hovertemplate='<b>CPU Energy</b><br>%{y:.6f} kWh<br>Shots: %{x}<extra></extra>'
))

fig_fewshot_energy.add_trace(go.Bar(
    name='GPU Energy',
    x=df_sorted_fewshot['num_shots'],
    y=df_sorted_fewshot['gpu_energy_kwh'],
    marker_color='#4ECDC4',
    hovertemplate='<b>GPU Energy</b><br>%{y:.6f} kWh<br>Shots: %{x}<extra></extra>'
))

fig_fewshot_energy.add_trace(go.Bar(
    name='RAM Energy',
    x=df_sorted_fewshot['num_shots'],
    y=df_sorted_fewshot['ram_energy_kwh'],
    marker_color='#95E1D3',
    hovertemplate='<b>RAM Energy</b><br>%{y:.6f} kWh<br>Shots: %{x}<extra></extra>'
))

fig_fewshot_energy.update_layout(
    title=dict(text="Few-Shot: Energy Consumption by Number of Examples", font=dict(size=18)),
    xaxis_title='Number of Training Examples',
    yaxis_title='Energy Consumption (kWh)',
    barmode='stack',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='x unified'
)

fig_fewshot_energy.show()
fig_fewshot_energy.write_html("/content/drive/MyDrive/fewshot_energy_by_shots.html")
print("✅ Few-Shot Energy Plot saved: fewshot_energy_by_shots.html")

# PLOT 2: Few-Shot Performance & Emissions by Shots (Dual Y-axis)
print("\n📊 Creating Few-Shot Performance vs Emissions Plot...")
df_sorted_fewshot = results_df_fewshot.sort_values('num_shots')

fig_fewshot_perf = make_subplots(specs=[[{"secondary_y": True}]])

# F1 Score line
fig_fewshot_perf.add_trace(
    go.Scatter(
        x=df_sorted_fewshot['num_shots'],
        y=df_sorted_fewshot['f1_score'],
        name='F1 Score',
        mode='lines+markers',
        line=dict(color='#4ECDC4', width=3),
        marker=dict(size=12, line=dict(width=2, color='white')),
        hovertemplate='<b>F1 Score</b>: %{y:.4f}<br>Shots: %{x}<extra></extra>'
    ),
    secondary_y=False
)

# Exact Match line
fig_fewshot_perf.add_trace(
    go.Scatter(
        x=df_sorted_fewshot['num_shots'],
        y=df_sorted_fewshot['exact_match'],
        name='Exact Match',
        mode='lines+markers',
        line=dict(color='#95E1D3', width=3, dash='dash'),
        marker=dict(size=10),
        hovertemplate='<b>Exact Match</b>: %{y:.4f}<br>Shots: %{x}<extra></extra>'
    ),
    secondary_y=False
)

# CO2 Emissions bar
fig_fewshot_perf.add_trace(
    go.Bar(
        x=df_sorted_fewshot['num_shots'],
        y=df_sorted_fewshot['emissions_kg'],
        name='CO₂ Emissions',
        marker_color='#FF6B6B',
        opacity=0.6,
        hovertemplate='<b>CO₂</b>: %{y:.6f} kg<br>Shots: %{x}<extra></extra>'
    ),
    secondary_y=True
)

fig_fewshot_perf.update_xaxes(title_text="Number of Training Examples")
fig_fewshot_perf.update_yaxes(title_text="Performance Score", secondary_y=False)
fig_fewshot_perf.update_yaxes(title_text="CO₂ Emissions (kg)", secondary_y=True)

fig_fewshot_perf.update_layout(
    title=dict(text="Few-Shot: Performance vs Carbon Emissions", font=dict(size=18)),
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig_fewshot_perf.show()
fig_fewshot_perf.write_html("/content/drive/MyDrive/fewshot_performance_emissions.html")
print("✅ Few-Shot Performance Plot saved: fewshot_performance_emissions.html")




📊 Creating Few-Shot Energy Plot...


✅ Few-Shot Energy Plot saved: fewshot_energy_by_shots.html

📊 Creating Few-Shot Performance vs Emissions Plot...


✅ Few-Shot Performance Plot saved: fewshot_performance_emissions.html
