# Quantifying the Environmental Cost of AI: Carbon Emissions in Language Model Fine-Tuning for Question Answering

> ### **Project Goal** : As language models continue to play a larger role in natural language processing, their environmental impact has become an important issue to consider. While much of the research in this area focuses on improving model accuracy, the energy use and carbon footprint involved in training these systems are often overlooked or poorly documented. This project aims to explore that imbalance by studying how improvements in model performance relate to the environmental costs of fine-tuning.


# Training Strategy 1: Full Fine-Tuning (Model: BERT-base-uncased)

In [1]:
!pip install transformers datasets accelerate evaluate codecarbon


In [1]:
# Importing Necessary Libraries
import os
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForQuestionAnswering,
    TrainingArguments,
    Trainer,
    default_data_collator
)
import torch
import numpy as np
from datasets import Dataset
import evaluate
from codecarbon import EmissionsTracker
from google.colab import drive
import pandas as pd
from collections import defaultdict
import json

drive.mount('/content/drive')

Mounted at /content/drive


## STEP 1: Loading The Stanford Question Answering Dataset (SQuAD v2.0)

In [2]:
squad = load_dataset("squad_v2")
df_train = pd.DataFrame(squad['train'])


In [3]:
print("SQuAD Format: ",squad)
print(f"\nFull training set size: {len(squad['train'])}")
print(f"\nValidation set size: {len(squad['validation'])}")

SQuAD Format:  DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 130319
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 11873
    })
})

Full training set size: 130319

Validation set size: 11873


In [4]:
df_train.head(10)

,id,title,context,question,answers
0,56be85543aeaaa14008c9063,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,When did Beyonce start becoming popular?,"{'text': ['in the late 1990s'], 'answer_start'..."
1,56be85543aeaaa14008c9065,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,What areas did Beyonce compete in when she was...,"{'text': ['singing and dancing'], 'answer_star..."
2,56be85543aeaaa14008c9066,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,When did Beyonce leave Destiny's Child and bec...,"{'text': ['2003'], 'answer_start': [526]}"
3,56bf6b0f3aeaaa14008c9601,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,In what city and state did Beyonce grow up?,"{'text': ['Houston, Texas'], 'answer_start': [..."
4,56bf6b0f3aeaaa14008c9602,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,In which decade did Beyonce become famous?,"{'text': ['late 1990s'], 'answer_start': [276]}"
5,56bf6b0f3aeaaa14008c9603,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,In what R&B group was she the lead singer?,"{'text': ['Destiny's Child'], 'answer_start': ..."
6,56bf6b0f3aeaaa14008c9604,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,What album made her a worldwide known artist?,"{'text': ['Dangerously in Love'], 'answer_star..."
7,56bf6b0f3aeaaa14008c9605,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,Who managed the Destiny's Child group?,"{'text': ['Mathew Knowles'], 'answer_start': [..."
8,56d43c5f2ccc5a1400d830a9,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,When did Beyoncé rise to fame?,"{'text': ['late 1990s'], 'answer_start': [276]}"
9,56d43c5f2ccc5a1400d830aa,Beyoncé,Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...,What role did Beyoncé have in Destiny's Child?,"{'text': ['lead singer'], 'answer_start': [290]}"


## STEP 2: Tokenization Functions For The Model

In [5]:
# AutoTokenizer loads the tokenizer that matches the given checkpoint

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")


In [6]:
def preprocess_function(examples):
    questions = [q.strip() for q in examples["question"]]
    contexts = [c.strip() for c in examples["context"]]

    # Tokenize
    tokenized = tokenizer(
        questions,
        contexts,
        max_length=384,
        stride=128,
        padding="max_length",
        truncation="only_second",
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
    )

    # Mapping back to original samples
    sample_mapping = tokenized.pop("overflow_to_sample_mapping")
    offset_mapping = tokenized["offset_mapping"]

    start_positions = []
    end_positions = []

    for i, offsets in enumerate(offset_mapping):
        sample_idx = sample_mapping[i]
        answers = examples["answers"][sample_idx]

        # SQuAD v2: no answer case
        if len(answers["answer_start"]) == 0:
            start_positions.append(0)
            end_positions.append(0)
            continue

        start_char = answers["answer_start"][0]
        end_char = start_char + len(answers["text"][0])

        seq_ids = tokenized.sequence_ids(i)

        # Find context section
        context_start = seq_ids.index(1) if 1 in seq_ids else 0
        context_end = len(seq_ids) - 1 - seq_ids[::-1].index(1) if 1 in seq_ids else len(seq_ids) - 1

        # If the answer is not fully inside this window's context, label it as no-answer (index 0, the [CLS] token)
        if not (offsets[context_start][0] <= start_char and offsets[context_end][1] >= end_char):
            start_positions.append(0)
            end_positions.append(0)
            continue

        # Find start token
        token_start = context_start
        while token_start <= context_end and offsets[token_start][0] <= start_char:
            token_start += 1
        start_positions.append(token_start - 1)

        # Find end token
        token_end = context_end
        while token_end >= context_start and offsets[token_end][1] >= end_char:
            token_end -= 1
        end_positions.append(token_end + 1)

    tokenized["start_positions"] = start_positions
    tokenized["end_positions"] = end_positions

    return tokenized
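To see what the start/end search in `preprocess_function` does, here is the same logic pulled into a standalone function and run on a hand-written window. The helper name `char_span_to_token_span`, the toy question/context, and their offsets are hypothetical; in the notebook the offsets come from the tokenizer's `return_offsets_mapping=True` output and the sequence ids from `tokenized.sequence_ids(i)`.

```python
from typing import List, Optional, Tuple


def char_span_to_token_span(
    offsets: List[Tuple[int, int]],
    sequence_ids: List[Optional[int]],
    start_char: int,
    end_char: int,
) -> Tuple[int, int]:
    """Map a character span [start_char, end_char) of the context to token indices.

    Mirrors the loop in preprocess_function: returns (0, 0), the [CLS]
    position, when the answer is not fully inside this window's context.
    """
    # First and last token indices that belong to the context (sequence id 1).
    context_start = sequence_ids.index(1)
    context_end = len(sequence_ids) - 1 - sequence_ids[::-1].index(1)

    # Answer not fully contained in this window -> label as no-answer.
    if not (offsets[context_start][0] <= start_char and offsets[context_end][1] >= end_char):
        return 0, 0

    # Move right until a token starts after start_char; the previous token contains it.
    idx = context_start
    while idx <= context_end and offsets[idx][0] <= start_char:
        idx += 1
    token_start = idx - 1

    # Move left until a token ends before end_char; the next token contains it.
    idx = context_end
    while idx >= context_start and offsets[idx][1] >= end_char:
        idx -= 1
    token_end = idx + 1
    return token_start, token_end


# Hypothetical window: [CLS] when [SEP] in the late 1990s [SEP]
# Context string "in the late 1990s"; context offsets are character positions in it.
offsets = [(0, 0), (0, 4), (0, 0), (0, 2), (3, 6), (7, 11), (12, 17), (0, 0)]
sequence_ids = [None, 0, None, 1, 1, 1, 1, None]

# Answer "late 1990s" starts at character 7 and ends at character 17.
print(char_span_to_token_span(offsets, sequence_ids, 7, 17))
```

The answer "late 1990s" maps to tokens 5 ("late") through 6 ("1990s"); a span that falls outside the window maps to `(0, 0)`, which is how SQuAD v2.0 no-answer labels are also encoded.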

In [7]:
# Select a fraction of the training data and tokenize it.

def prepare_dataset(train_data, size_fraction, preprocess_fn):
    """Create and preprocess a subset of the training data (the first num_samples examples)."""
    num_samples = int(len(train_data) * size_fraction)
    train_subset = train_data.select(range(num_samples))

    print(f"🔄 Preprocessing {num_samples} training samples...")
    tokenized_train = train_subset.map(
        preprocess_fn,
        batched=True,
        remove_columns=train_subset.column_names
    )

    return tokenized_train, num_samples
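`prepare_dataset` keeps the first `num_samples` rows. The `head()` output above shows consecutive Beyoncé questions, i.e. the training split is grouped by article, so each fraction covers a contiguous block of articles rather than a random mix of topics. If a random subset is preferred, one option in Hugging Face Datasets is `train_data.shuffle(seed=42).select(range(num_samples))`. The stdlib sketch below (hypothetical helper `nested_subset_indices`) shows the same idea and returns indices that could be passed to `train_data.select(...)`; because every fraction reuses one seeded permutation, each smaller subset is contained in the larger ones. The results reported below were produced with the original contiguous selection.

```python
import random
from typing import List


def nested_subset_indices(n_total: int, size_fraction: float, seed: int = 42) -> List[int]:
    """Return sorted, reproducible random indices covering size_fraction of a dataset.

    Hypothetical helper: the same seed gives the same permutation, so the
    subset for a smaller fraction is a prefix of (and contained in) a larger one.
    """
    num_samples = int(n_total * size_fraction)  # same truncation as prepare_dataset
    permutation = list(range(n_total))
    random.Random(seed).shuffle(permutation)  # local RNG, global random state untouched
    return sorted(permutation[:num_samples])


idx_25 = nested_subset_indices(130319, 0.25)
idx_50 = nested_subset_indices(130319, 0.50)
print(len(idx_25), len(idx_50), set(idx_25) <= set(idx_50))
```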

In [8]:
# Preprocess validation set (full)
print("\nüîÑ Preprocessing validation set...")
tokenized_validation = squad["validation"].map(
    preprocess_function,
    batched=True,
    remove_columns=squad["validation"].column_names
)


🔄 Preprocessing validation set...



## STEP 3: Model Architecture And Training Functions (BERT)

In [9]:
#Model Architecture:
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
print(f"\n{'='*80}")
print("\nüõ† BERT Model Architecture:")
print(f"{'='*80}")
print("\nTransformer layers:", model.config.num_hidden_layers)
print("Hidden size:", model.config.hidden_size)
print("Intermediate feed-forward size:", model.config.intermediate_size)
print("Attention heads:", model.config.num_attention_heads)
print("Max positional embeddings:", model.config.max_position_embeddings)
print("Vocabulary size:", model.config.vocab_size)
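The configuration values printed above are enough to estimate the model size. A back-of-the-envelope sketch, assuming the standard BERT layout (biases on every linear layer, LayerNorm weight and bias, two token-type embeddings, no pooler, and a 2-output span head as in Hugging Face's `BertForQuestionAnswering`). The function name is hypothetical; in the notebook, `sum(p.numel() for p in model.parameters())` should give the same figure if those assumptions hold.

```python
def bert_qa_param_count(vocab_size: int = 30522, hidden: int = 768, layers: int = 12,
                        intermediate: int = 3072, max_positions: int = 512,
                        type_vocab: int = 2) -> int:
    """Estimate the parameters of a BERT encoder plus a start/end span head."""
    layer_norm = 2 * hidden  # LayerNorm weight + bias
    # Token, position and token-type embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Self-attention: Q, K, V and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Feed-forward: hidden -> intermediate -> hidden (weights + biases), then LayerNorm.
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden) + layer_norm
    encoder = layers * (attention + feed_forward)
    qa_head = hidden * 2 + 2  # one linear layer producing start and end logits
    return embeddings + encoder + qa_head


total = bert_qa_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Every one of these roughly 109M parameters is updated during full fine-tuning, which is the main driver of the GPU energy reported later.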



Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.




🛠 BERT Model Architecture:

Transformer layers: 12
Hidden size: 768
Intermediate feed-forward size: 3072
Attention heads: 12
Max positional embeddings: 512
Vocabulary size: 30522


In [10]:
# Custom metrics: exact match and F1 on predicted vs. gold token positions per tokenized window
# (not the official SQuAD text-based scores)
def compute_metrics(pred):
    predictions, labels = pred
    start_preds = np.argmax(predictions[0], axis=1)
    end_preds = np.argmax(predictions[1], axis=1)

    start_true = labels[0]
    end_true = labels[1]

    # Calculate exact match
    exact_matches = ((start_preds == start_true) & (end_preds == end_true)).sum()
    exact_match = exact_matches / len(start_true)

    # Calculate F1 as the overlap between predicted and gold token-index ranges
    f1_scores = []
    for start_p, end_p, start_t, end_t in zip(start_preds, end_preds, start_true, end_true):
        pred_tokens = set(range(start_p, end_p + 1))
        true_tokens = set(range(start_t, end_t + 1))

        if len(pred_tokens) == 0 and len(true_tokens) == 0:
            f1_scores.append(1.0)
        elif len(pred_tokens) == 0 or len(true_tokens) == 0:
            f1_scores.append(0.0)
        else:
            overlap = len(pred_tokens & true_tokens)
            precision = overlap / len(pred_tokens)
            recall = overlap / len(true_tokens)
            f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
            f1_scores.append(f1)

    avg_f1 = np.mean(f1_scores)

    return {
        "exact_match": exact_match,
        "f1": avg_f1
    }
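Because `compute_metrics` compares token positions inside each tokenized window, its scores are not directly comparable to published SQuAD v2.0 EM/F1, which compare normalized answer *text* per question and take the best score over all gold answers (an unanswerable question has the single gold answer `""`). The notebook already imports `evaluate`, whose `squad_v2` metric implements the official scoring; using it would require mapping start/end logits back to text through the offset mappings. As a sketch of what that text-level comparison does, here is a stdlib version of the normalization and scoring (function names are hypothetical):

```python
import re
import string
from collections import Counter
from typing import Callable, List

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_text(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_text(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # No-answer handling: if either side is empty, score 1 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def max_over_golds(metric: Callable[[str, str], float], prediction: str, golds: List[str]) -> float:
    """Best score over all gold answers; an empty gold list means 'unanswerable'."""
    return max(metric(prediction, g) for g in (golds or [""]))


print(exact_match_text("The late 1990s", "late 1990s"))
print(round(f1_text("in the late 1990s", "late 1990s"), 4))
```

Unlike the position-based F1 above, `"in the late 1990s"` versus the gold `"late 1990s"` earns partial credit from shared words after normalization, regardless of how the tokenizer split the context.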

In [11]:
def train_model(tokenized_train, tokenized_eval, tokenizer, compute_metrics_fn,
                size_fraction, model_name="bert-base-uncased"):

    # Load fresh model
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Setup output directory
    output_dir = f"results_bert_{int(size_fraction*100)}pct"

    # Training arguments
    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=2,
        weight_decay=0.01,
        fp16=torch.cuda.is_available(),
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        push_to_hub=False,
        logging_steps=100,
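        greater_is_better=True,
        report_to="none",  # disable W&B and other loggers so no interactive login prompt runs during training
    )

    # Initialize trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        processing_class=tokenizer,  # replaces the deprecated `tokenizer=` argument in recent transformers releases
        data_collator=default_data_collator,
        compute_metrics=compute_metrics_fn
    )

    # Start carbon tracking
    tracker = EmissionsTracker(
        project_name=f"BERT_{int(size_fraction*100)}pct",
        output_dir=output_dir
    )
    tracker.start()

    # Train; stop the tracker even if training fails or is interrupted,
    # otherwise it keeps logging in the background into later runs
    try:
        print("🏋️ Training model...")
        train_results = trainer.train()
    finally:
        tracker.stop()

    emissions_data = tracker.final_emissions_data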

    return trainer, train_results, emissions_data, output_dir


## STEP 4: Evaluation And Saving Functions

In [12]:
def evaluate_and_save(trainer, train_results, emissions_data, output_dir,
                      size_fraction, num_samples):
    #Evaluate model, print results, and save artifacts.

    # Evaluate
    print("üìä Evaluating model...")
    eval_results = trainer.evaluate()

    # Compile results
    result_entry = {
        "training_method": "Full Fine-Tuning",
        "model_name": "BERT",
        "train_samples": num_samples,
        "valid_samples": len(trainer.eval_dataset),  # tokenized validation features (windows), not questions

        # Performance metrics
        "f1_score": eval_results["eval_f1"],
        "exact_match": eval_results["eval_exact_match"],
        "eval_loss": eval_results["eval_loss"],
        "training_time_hours": train_results.metrics["train_runtime"] / 3600,

        # Emissions data (direct access to EmissionsData attributes)
        "emissions_rate_kg_per_s": emissions_data.emissions_rate,
        "emissions_kg": emissions_data.emissions,
        "timestamp": emissions_data.timestamp,
        "duration_seconds": emissions_data.duration,
        "duration_hours": emissions_data.duration / 3600,

        # Energy consumption
        "energy_consumed_kwh": emissions_data.energy_consumed,
        "cpu_energy_kwh": emissions_data.cpu_energy,
        "gpu_energy_kwh": emissions_data.gpu_energy,
        "ram_energy_kwh": emissions_data.ram_energy,

        # Power draw
        "cpu_power_w": emissions_data.cpu_power,
        "gpu_power_w": emissions_data.gpu_power,
        "ram_power_w": emissions_data.ram_power,

        # Location and system info
        "country_name": emissions_data.country_name,
        "country_iso_code": emissions_data.country_iso_code,
        "region": emissions_data.region,
        "cloud_provider": emissions_data.cloud_provider,
        "cloud_region": emissions_data.cloud_region,
        "on_cloud": emissions_data.on_cloud,

        # System specifications
        "os": emissions_data.os,
        "python_version": emissions_data.python_version,
        "cpu_count": emissions_data.cpu_count,
        "cpu_model": emissions_data.cpu_model,
        "gpu_count": emissions_data.gpu_count,
        "gpu_model": emissions_data.gpu_model,
        "ram_total_size_gb": emissions_data.ram_total_size,

        # Additional metrics
        "pue": emissions_data.pue,  # Power Usage Effectiveness
        "codecarbon_version": emissions_data.codecarbon_version,

    }


    # Print summary
    print(f"\n{'='*80}")
    print(f"\nüìà FINE-TUNING RESULTS SUMMARY FOR {size_fraction*100}% DATASET:")
    print(f"{'='*80}")
    print(f"  Training Method: Full Fine-Tuning")
    print(f"  Model: BERT")

    print(f"\nüéØ Performance Metrics:")
    print(f"  F1 Score: {eval_results['eval_f1']:.4f}")
    print(f"  Exact Match: {eval_results['eval_exact_match']:.4f}")
    print(f"  Eval Loss: {eval_results['eval_loss']:.4f}")

    print(f"\n⚡ Energy Consumption:")
    print(f"  Total Energy: {emissions_data.energy_consumed:.6f} kWh")
    print(f"  CPU Energy: {emissions_data.cpu_energy:.6f} kWh ({emissions_data.cpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"  GPU Energy: {emissions_data.gpu_energy:.6f} kWh ({emissions_data.gpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"  RAM Energy: {emissions_data.ram_energy:.6f} kWh ({emissions_data.ram_energy/emissions_data.energy_consumed*100:.1f}%)")

    print(f"\nüîå Average Power Draw:")
    print(f"  CPU Power: {emissions_data.cpu_power:.2f} W")
    print(f"  GPU Power: {emissions_data.gpu_power:.2f} W")
    print(f"  RAM Power: {emissions_data.ram_power:.2f} W")
    print(f"  Total Power: {emissions_data.cpu_power + emissions_data.gpu_power + emissions_data.ram_power:.2f} W")

    print(f"\nüå± Carbon Footprint:")
    print(f"  Total CO2 Emissions: {emissions_data.emissions:.6f} kg")
    print(f"  Emissions Rate: {emissions_data.emissions_rate:.9f} kg/s")
    print(f"  Duration: {emissions_data.duration/3600:.2f} hours")
    print(f"  Training Time (Trainer): {train_results.metrics['train_runtime']/3600:.2f} hours")

    print(f"\nüìç Location & Infrastructure:")
    print(f"  Country: {emissions_data.country_name} ({emissions_data.country_iso_code})")
    print(f"  Region: {emissions_data.region}")
    print(f"  On Cloud: {emissions_data.on_cloud}")
    print(f"  PUE (Power Usage Effectiveness): {emissions_data.pue}")

    print(f"\nüíª System Specifications:")
    print(f"  OS: {emissions_data.os}")
    print(f"  CPU: {emissions_data.cpu_model} ({emissions_data.cpu_count} cores)")
    if emissions_data.gpu_count and emissions_data.gpu_model:
        print(f"  GPU: {emissions_data.gpu_model} (Count: {emissions_data.gpu_count})")
    else:
        print(f"  GPU: None detected")
    print(f"  RAM: {emissions_data.ram_total_size:.2f} GB")
    print(f"  Python: {emissions_data.python_version}")

    print(f"\n{'='*80}")

    # Save model
    trainer.save_model(f"{output_dir}/final_model")

    # Clear GPU memory
    del trainer.model
    del trainer
    torch.cuda.empty_cache()

    return result_entry
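`evaluate_and_save` saves the model weights, but the `result_entry` dictionary only lives in memory (the `json` module is imported at the top yet never used). A minimal sketch of persisting each run's metrics next to its model, assuming a hypothetical helper `save_result_entry` and a `results.json` file name; the fallback encoder unwraps NumPy scalars via `.item()` and turns anything else `json` cannot encode into a string instead of raising `TypeError`.

```python
import json
import os
from typing import Any, Dict


def _json_default(value: Any) -> Any:
    """Fallback for json.dump: unwrap NumPy-style scalars via .item(), else use str()."""
    return value.item() if hasattr(value, "item") else str(value)


def save_result_entry(entry: Dict[str, Any], output_dir: str,
                      filename: str = "results.json") -> str:
    """Write one experiment's result dictionary to <output_dir>/<filename>."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entry, f, indent=2, default=_json_default)
    return path
```

Calling `save_result_entry(result_entry, output_dir)` just before `return result_entry` would leave one JSON file per dataset fraction, so the comparison table can be rebuilt after a Colab session ends.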

### STEP 4.1: Training And Evaluating The Model On Different Dataset Sizes

> We train the model on three fractions of the SQuAD v2.0 training split; the full validation set is used for every run.
>
> Training Data Variation: [25%, 50%, 80%]
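The fractions translate into question counts through the same `int()` truncation used in `prepare_dataset`. A quick check against the 130,319-question training split reported above (the constant names are only for illustration):

```python
SQUAD_V2_TRAIN_SIZE = 130319  # full training split size printed in STEP 1
FRACTIONS = (0.25, 0.5, 0.8)

# int() truncates toward zero, exactly as in prepare_dataset.
sample_counts = {fraction: int(SQUAD_V2_TRAIN_SIZE * fraction) for fraction in FRACTIONS}

for fraction, n in sample_counts.items():
    print(f"{fraction:.0%} -> {n} questions")
```

These are question counts, matching the "Preprocessing ... training samples" lines in the logs. The number of training features is somewhat larger, because `return_overflowing_tokens=True` with `stride=128` splits long contexts into several 384-token windows.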

In [13]:
def run_experiment(size_fraction, train_data, eval_data, tokenizer,
                   preprocess_fn, compute_metrics_fn, model_name="bert-base-uncased"):

    # Run a complete training experiment for a given dataset size.

    print(f"\n{'='*60}")
    print(f"🚀 Training with {size_fraction*100}% of training data")
    print(f"{'='*60}")

    # Step 1: Prepare dataset
    tokenized_train, num_samples = prepare_dataset(train_data, size_fraction, preprocess_fn)

    # Step 2: Train model
    trainer, train_results, emissions_data, output_dir = train_model(
        tokenized_train, eval_data, tokenizer, compute_metrics_fn,
        size_fraction, model_name
    )

    # Step 3: Evaluate and save
    result_entry = evaluate_and_save(
        trainer, train_results, emissions_data, output_dir,
        size_fraction, num_samples
    )

    return result_entry

In [14]:
# Store results
results_summary = []

In [16]:
# Train on 25% of the training data
result1 = run_experiment(
    size_fraction=0.25,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="bert-base-uncased"
)

results_summary.append(result1)


🚀 Training with 25.0% of training data
🔄 Preprocessing 32579 training samples...


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[codecarbon INFO @ 03:24:53] [setup] RAM Tracking...
[codecarbon INFO @ 03:24:53] [setup] CPU Tracking...
[codecarbon INFO @ 03:24:53] Energy consumed for RAM : 0.003332 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:24:53] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:24:53] Energy consumed for All CPU : 0.002833 kWh
[codecarbon INFO @ 03:24:53] Energy consumed for all GPUs : 0.003974 kWh. Total GPU Power : 59.48959249769069 W
[codecarbon INFO @ 03:24:53] 0.010139 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:24:53] 0.019886 g.CO2eq/s mean an estimation of 627.1188506875019 kg.CO2eq/

🏋️ Training model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.2275,1.594132,0.422284,0.499222
2,0.8929,1.706412,0.44058,0.523057


[codecarbon INFO @ 03:25:08] Energy consumed for RAM : 0.003541 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:25:08] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:25:08] Energy consumed for All CPU : 0.003010 kWh
[codecarbon INFO @ 03:25:08] Energy consumed for all GPUs : 0.004450 kWh. Total GPU Power : 114.05013518681825 W
[codecarbon INFO @ 03:25:08] 0.011000 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:25:11] Energy consumed for RAM : 0.000208 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:25:11] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:25:11] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 03:25:11] Energy consumed for all GPUs : 0.000621 kWh. Total GPU Power : 148.98878552483558 W
[codecarbon INFO @ 03:25:11] 0.001007 kWh of electricity and 0.000000 L of water were used since the beginning.

📊 Evaluating model...


[codecarbon INFO @ 03:30:23] Energy consumed for RAM : 0.007913 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:30:23] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:30:23] Energy consumed for All CPU : 0.006727 kWh
[codecarbon INFO @ 03:30:23] Energy consumed for all GPUs : 0.026408 kWh. Total GPU Power : 195.31797090526973 W
[codecarbon INFO @ 03:30:23] 0.041048 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:30:38] Energy consumed for RAM : 0.008122 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:30:38] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:30:38] Energy consumed for All CPU : 0.006904 kWh
[codecarbon INFO @ 03:30:38] Energy consumed for all GPUs : 0.027281 kWh. Total GPU Power : 209.6747912251801 W
[codecarbon INFO @ 03:30:38] 0.042307 kWh of electricity and 0.000000 L of water were used since the beginning.




📈 FINE-TUNING RESULTS SUMMARY FOR 25.0% DATASET:
  Training Method: Full Fine-Tuning
  Model: BERT

🎯 Performance Metrics:
  F1 Score: 0.5231
  Exact Match: 0.4406
  Eval Loss: 1.7064

⚡ Energy Consumption:
  Total Energy: 0.030597 kWh
  CPU Energy: 0.003833 kWh (12.5%)
  GPU Energy: 0.022256 kWh (72.7%)
  RAM Energy: 0.004508 kWh (14.7%)

🔌 Average Power Draw:
  CPU Power: 42.50 W
  GPU Power: 245.82 W
  RAM Power: 50.00 W
  Total Power: 338.32 W

🌱 Carbon Footprint:
  Total CO2 Emissions: 0.014405 kg
  Emissions Rate: 0.000044345 kg/s
  Duration: 0.09 hours
  Training Time (Trainer): 0.09 hours

📍 Location & Infrastructure:
  Country: Singapore (SGP)
  Region: 
  On Cloud: N
  PUE (Power Usage Effectiveness): 1.0

💻 System Specifications:
  OS: Linux-6.6.105+-x86_64-with-glibc2.35
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (12 cores)
  GPU: 1 x NVIDIA A100-SXM4-80GB (Count: 1)
  RAM: 167.05 GB
  Python: 3.12.12
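CodeCarbon's totals rest on two pieces of arithmetic: power integrated over time gives energy, and energy multiplied by the grid's carbon intensity gives CO2-equivalent emissions. The sketch below (hypothetical helper names) recomputes the 25% run from its printed summary, assuming the printed power values are averages over the run and that the reported emissions rate is total emissions divided by tracked duration.

```python
def energy_kwh(avg_power_w: float, duration_s: float) -> float:
    """Energy (kWh) drawn at an average power (W) over a duration (s)."""
    return avg_power_w * duration_s / 3_600_000  # 1 kWh = 3.6 million joules


def emissions_kg(energy: float, intensity_kg_per_kwh: float) -> float:
    """CO2-equivalent emissions (kg) = energy (kWh) x grid carbon intensity (kg/kWh)."""
    return energy * intensity_kg_per_kwh


# Figures copied from the 25% summary above.
reported_energy_kwh = 0.030597
reported_co2_kg = 0.014405
reported_rate_kg_per_s = 0.000044345
reported_total_power_w = 338.32

# Duration implied by emissions / emissions rate (the printed 0.09 h is rounded).
duration_s = reported_co2_kg / reported_rate_kg_per_s
# Grid carbon intensity implied by emissions / energy.
intensity = reported_co2_kg / reported_energy_kwh

rebuilt_energy = energy_kwh(reported_total_power_w, duration_s)
print(f"implied duration:   {duration_s:.1f} s")
print(f"power x time:       {rebuilt_energy:.6f} kWh (reported {reported_energy_kwh})")
print(f"implied intensity:  {intensity:.4f} kg CO2eq/kWh")
print(f"energy x intensity: {emissions_kg(rebuilt_energy, intensity):.6f} kg (reported {reported_co2_kg})")
```

The rebuilt energy lands within about 1% of the reported value, which suggests the summary is internally consistent; the remaining gap is expected because power is sampled periodically rather than integrated continuously.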



In [17]:
# Train on 50% of the training data
result2 = run_experiment(
    size_fraction=0.5,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="bert-base-uncased"
)
results_summary.append(result2)


🚀 Training with 50.0% of training data
🔄 Preprocessing 65159 training samples...



[codecarbon INFO @ 03:30:53] Energy consumed for RAM : 0.008330 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:30:53] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:30:53] Energy consumed for All CPU : 0.007081 kWh
[codecarbon INFO @ 03:30:53] Energy consumed for all GPUs : 0.027727 kWh. Total GPU Power : 106.76510166965157 W
[codecarbon INFO @ 03:30:53] 0.043138 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:30:53] 0.041140 g.CO2eq/s mean an estimation of 1,297.3957056187803 kg.CO2eq/year
[codecarbon INFO @ 03:31:08] Energy consumed for RAM : 0.008537 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:31:08] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:31:08] Energy consumed for All CPU : 0.007258 kWh
[codecarbon INFO @ 03:31:08] Energy consumed for all GPUs : 0.027985 kWh. Total GPU Power : 61.98113146802924 W

🏋️ Training model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.1511,1.310618,0.512115,0.595766
2,0.8639,1.258228,0.553816,0.639952


[codecarbon INFO @ 03:31:38] Energy consumed for RAM : 0.008946 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:31:38] Delta energy consumed for CPU with constant : 0.000175 kWh, power : 42.5 W
[codecarbon INFO @ 03:31:38] Energy consumed for All CPU : 0.007607 kWh
[codecarbon INFO @ 03:31:38] Energy consumed for all GPUs : 0.028833 kWh. Total GPU Power : 143.97684155382632 W
[codecarbon INFO @ 03:31:38] 0.045387 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:31:45] Energy consumed for RAM : 0.000208 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:31:45] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:31:45] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 03:31:45] Energy consumed for all GPUs : 0.001003 kWh. Total GPU Power : 240.67474399619036 W
[codecarbon INFO @ 03:31:45] 0.001389 kWh of electricity and 0.000000 L of water were used since the beginning.

📊 Evaluating model...


[codecarbon INFO @ 03:41:38] Energy consumed for RAM : 0.017276 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:41:38] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:41:38] Energy consumed for All CPU : 0.014688 kWh
[codecarbon INFO @ 03:41:38] Energy consumed for all GPUs : 0.071444 kWh. Total GPU Power : 194.08123237061278 W
[codecarbon INFO @ 03:41:38] 0.103409 kWh of electricity and 0.000000 L of water were used since the beginning.




📈 FINE-TUNING RESULTS SUMMARY FOR 50.0% DATASET:
  Training Method: Full Fine-Tuning
  Model: BERT

🎯 Performance Metrics:
  F1 Score: 0.6400
  Exact Match: 0.5538
  Eval Loss: 1.2582

⚡ Energy Consumption:
  Total Energy: 0.057790 kWh
  CPU Energy: 0.007042 kWh (12.2%)
  GPU Energy: 0.042463 kWh (73.5%)
  RAM Energy: 0.008284 kWh (14.3%)

🔌 Average Power Draw:
  CPU Power: 42.50 W
  GPU Power: 255.89 W
  RAM Power: 50.00 W
  Total Power: 348.39 W

🌱 Carbon Footprint:
  Total CO2 Emissions: 0.027207 kg
  Emissions Rate: 0.000045585 kg/s
  Duration: 0.17 hours
  Training Time (Trainer): 0.17 hours

📍 Location & Infrastructure:
  Country: Singapore (SGP)
  Region: 
  On Cloud: N
  PUE (Power Usage Effectiveness): 1.0

💻 System Specifications:
  OS: Linux-6.6.105+-x86_64-with-glibc2.35
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (12 cores)
  GPU: 1 x NVIDIA A100-SXM4-80GB (Count: 1)
  RAM: 167.05 GB
  Python: 3.12.12



In [18]:
# Train on 80% of the training data
result3 = run_experiment(
    size_fraction=0.8,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="bert-base-uncased"
)
results_summary.append(result3)


🚀 Training with 80.0% of training data
🔄 Preprocessing 104255 training samples...



[codecarbon INFO @ 03:41:53] Energy consumed for RAM : 0.017485 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:41:53] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:41:53] Energy consumed for All CPU : 0.014866 kWh
[codecarbon INFO @ 03:41:53] Energy consumed for all GPUs : 0.072147 kWh. Total GPU Power : 168.8113172807224 W
[codecarbon INFO @ 03:41:53] 0.104498 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:42:08] Energy consumed for RAM : 0.017693 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:42:08] Delta energy consumed for CPU with constant : 0.000179 kWh, power : 42.5 W
[codecarbon INFO @ 03:42:08] Energy consumed for All CPU : 0.015044 kWh
[codecarbon INFO @ 03:42:08] Energy consumed for all GPUs : 0.072411 kWh. Total GPU Power : 62.811261057040355 W
[codecarbon INFO @ 03:42:08] 0.105149 kWh of electricity and 0.000000 L of water were used since the beginning.

🏋️ Training model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.0986,1.051526,0.590819,0.6715
2,0.814,1.15806,0.595434,0.682083


[codecarbon INFO @ 03:43:08] Energy consumed for RAM : 0.018524 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:43:08] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:43:08] Energy consumed for All CPU : 0.015751 kWh
[codecarbon INFO @ 03:43:08] Energy consumed for all GPUs : 0.073481 kWh. Total GPU Power : 74.30110548955314 W
[codecarbon INFO @ 03:43:08] 0.107755 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:43:21] Energy consumed for RAM : 0.000208 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:43:21] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:43:21] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 03:43:21] Energy consumed for all GPUs : 0.000994 kWh. Total GPU Power : 238.42190431055732 W
[codecarbon INFO @ 03:43:21] 0.001379 kWh of electricity and 0.000000 L of water were used since the beginning.

📊 Evaluating model...


[codecarbon INFO @ 03:58:38] Energy consumed for RAM : 0.031435 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:58:38] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:58:38] Energy consumed for All CPU : 0.026726 kWh
[codecarbon INFO @ 03:58:38] Energy consumed for all GPUs : 0.140104 kWh. Total GPU Power : 196.9415405190251 W
[codecarbon INFO @ 03:58:38] 0.198265 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 03:58:53] Energy consumed for RAM : 0.031643 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:58:53] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:58:53] Energy consumed for All CPU : 0.026903 kWh
[codecarbon INFO @ 03:58:53] Energy consumed for all GPUs : 0.140989 kWh. Total GPU Power : 212.2113961269066 W
[codecarbon INFO @ 03:58:53] 0.199535 kWh of electricity and 0.000000 L of water were used since the beginning.



📈 FINE-TUNING RESULTS SUMMARY FOR 80.0% DATASET:
  Training Method: Full Fine-Tuning
  Model: BERT

🎯 Performance Metrics:
  F1 Score: 0.6821
  Exact Match: 0.5954
  Eval Loss: 1.1581

⚡ Energy Consumption:
  Total Energy: 0.090507 kWh
  CPU Energy: 0.010978 kWh (12.1%)
  GPU Energy: 0.066616 kWh (73.6%)
  RAM Energy: 0.012913 kWh (14.3%)

🔌 Average Power Draw:
  CPU Power: 42.50 W
  GPU Power: 255.34 W
  RAM Power: 50.00 W
  Total Power: 347.84 W

🌱 Carbon Footprint:
  Total CO2 Emissions: 0.042609 kg
  Emissions Rate: 0.000045802 kg/s
  Duration: 0.26 hours
  Training Time (Trainer): 0.26 hours

📍 Location & Infrastructure:
  Country: Singapore (SGP)
  Region: 
  On Cloud: N
  PUE (Power Usage Effectiveness): 1.0

💻 System Specifications:
  OS: Linux-6.6.105+-x86_64-with-glibc2.35
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (12 cores)
  GPU: 1 x NVIDIA A100-SXM4-80GB (Count: 1)
  RAM: 167.05 GB
  Python: 3.12.12
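
The codecarbon figures above can be cross-checked with simple arithmetic, since energy is average power multiplied by time. The sketch below assumes a 15-second sampling interval, which is consistent with the repeated 0.000177 kWh CPU deltas at a constant 42.5 W (and, to our understanding, with codecarbon's default `measure_power_secs`). The carbon intensity it prints is derived from the reported totals rather than read from codecarbon; `energy_kwh` is a hypothetical helper.

```python
# Values copied from the codecarbon log and the 80% summary above.

def energy_kwh(power_w: float, seconds: float) -> float:
    """Energy (kWh) = average power (W) x time (s) / 3.6e6 J per kWh."""
    return power_w * seconds / 3.6e6

# 1) Log deltas: CPU at a constant 42.5 W over an assumed 15 s sampling interval,
#    RAM at 50 W over the 60 s between the 03:42:08 and 03:43:08 entries.
cpu_delta = energy_kwh(42.5, 15)
ram_delta = energy_kwh(50.0, 60)
logged_ram_delta = 0.018524 - 0.017693

# 2) Summary: the component energies should add up to the reported total.
cpu_kwh, gpu_kwh, ram_kwh = 0.010978, 0.066616, 0.012913
total_kwh = 0.090507
component_sum = cpu_kwh + gpu_kwh + ram_kwh
gpu_share = 100 * gpu_kwh / total_kwh

# 3) Derived quantities (not printed by codecarbon directly).
emissions_kg = 0.042609
emissions_rate_kg_s = 0.000045802
intensity = emissions_kg / total_kwh                  # kg CO2 per kWh
implied_hours = emissions_kg / emissions_rate_kg_s / 3600

print(f"CPU delta per sample: {cpu_delta:.6f} kWh (log: 0.000177 kWh)")
print(f"RAM over 60 s: {ram_delta:.6f} kWh (log: {logged_ram_delta:.6f} kWh)")
print(f"Component sum: {component_sum:.6f} kWh, GPU share: {gpu_share:.1f}%")
print(f"Implied carbon intensity: {intensity:.3f} kg CO2/kWh")
print(f"Implied duration: {implied_hours:.2f} h")
```

A ratio of roughly 0.47 kg CO₂ per kWh reflects the grid factor codecarbon applied for Singapore in this run; because PUE is fixed at 1.0, no data-center overhead is included.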



In [19]:
# Create summary DataFrame
results_df = pd.DataFrame(results_summary)
results_df['dataset_size%'] = (results_df['train_samples'] / len(squad['train']) * 100).round(0)

print("\n" + "="*60)
print("üìä FINAL RESULTS SUMMARY")
print("="*60)
print(results_df.to_string(index=False))


📊 FINAL RESULTS SUMMARY
 training_method model_name  train_samples  valid_samples  f1_score  exact_match  eval_loss  training_time_hours  emissions_rate_kg_per_s  emissions_kg           timestamp  duration_seconds  duration_hours  energy_consumed_kwh  cpu_energy_kwh  gpu_energy_kwh  ram_energy_kwh  cpu_power_w  gpu_power_w  ram_power_w country_name country_iso_code region cloud_provider cloud_region on_cloud                                   os python_version  cpu_count                      cpu_model  gpu_count                 gpu_model  ram_total_size_gb  pue codecarbon_version  dataset_size%
Full Fine-Tuning       BERT          32579          12134  0.523057     0.440580   1.706412             0.090040                 0.000044      0.014405 2025-12-01T03:30:20        324.827223        0.090230             0.030597        0.003833        0.022256        0.004508         42.5   245.822358         50.0    Singapore              SGP                                           N Linux-6

In [20]:
results_df.to_csv("/content/drive/MyDrive/bert_dataset_size_results.csv", index=False)
print("\n✅ Results saved to Google Drive!")


✅ Results saved to Google Drive!


## MODEL EVALUATION WITH EXAMPLES

In [21]:
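def test_model_manual(model, tokenizer, examples):
    """Run extractive QA on hand-written examples and print each prediction.

    Assumes `torch` is already imported and `model` is an
    AutoModelForQuestionAnswering paired with a matching tokenizer.
    """
    model.eval()
    device = model.device
    results = []

    print("\n" + "="*80)
    print("🧪 MODEL EVALUATION (MANUAL MODE)")
    print("="*80)

    for i, example in enumerate(examples, 1):
        question = example['question']
        context = example['context']
        expected = example.get('expected_answer', None)

        # Tokenize the (question, context) pair; if too long, truncate only the context
        inputs = tokenizer(question, context, return_tensors="pt",
                          max_length=384, truncation="only_second")
        inputs = {k: v.to(device) for k, v in inputs.items()}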

        # Get predictions
        with torch.no_grad():
            outputs = model(**inputs)

        # Extract answer
        start_idx = outputs.start_logits.argmax().item()
        end_idx = outputs.end_logits.argmax().item()

        # Get confidence scores
        start_score = torch.softmax(outputs.start_logits, dim=1)[0][start_idx].item()
        end_score = torch.softmax(outputs.end_logits, dim=1)[0][end_idx].item()
        confidence = (start_score + end_score) / 2

        # Decode answer
        if start_idx <= end_idx:
            answer_tokens = inputs["input_ids"][0][start_idx:end_idx+1]
            predicted_answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)
        else:
            predicted_answer = "[NO ANSWER]"

        # Store result
        result = {
            'question': question,
            'context': context[:100] + "..." if len(context) > 100 else context,
            'predicted_answer': predicted_answer,
            'expected_answer': expected,
            'confidence': confidence,
            'start_position': start_idx,
            'end_position': end_idx
        }
        results.append(result)

        # Print formatted output
        print(f"\n📝 Example {i}")
        print(f"Question: {question}")
        print(f"Context: {context[:150]}{'...' if len(context) > 150 else ''}")
        print(f"\n✅ Predicted Answer: '{predicted_answer}'")
        print(f"   Confidence: {confidence:.2%}")

        if expected:
            match = predicted_answer.lower().strip() == expected.lower().strip()
            print(f"   Expected Answer: '{expected}'")
            print(f"   Exact Match: {'✓ YES' if match else '✗ NO'}")

        print("-" * 80)

    return results
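
`test_model_manual` takes the start and end argmax independently, so it can return an inverted span (reported as `[NO ANSWER]`) or a span that falls inside the question. A common alternative is to search only over valid spans inside the context. The NumPy sketch below shows that search on plain logit arrays; `best_span` is a hypothetical helper, not a transformers API, and building `context_mask` from the tokenizer output (for example from `sequence_ids()` on a fast tokenizer's encoding, where context tokens are marked 1) is an assumption about how it would be wired in.

```python
import numpy as np

def best_span(start_logits, end_logits, context_mask, max_answer_len=30):
    """Return (start, end, score) for the highest-scoring valid answer span.

    A span is valid when start <= end, it is at most max_answer_len tokens
    long, and both ends sit on context tokens (context_mask is True).
    The span score is start_logit + end_logit.
    """
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    context_idx = np.flatnonzero(np.asarray(context_mask, dtype=bool))

    best_start, best_end, best_score = -1, -1, -np.inf
    # Brute force over all context positions: fine for a few hundred tokens.
    for s in context_idx:
        for e in context_idx:
            if e < s or e - s + 1 > max_answer_len:
                continue
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_start, best_end, best_score = int(s), int(e), float(score)
    return best_start, best_end, best_score

# Toy logits: index 0 is [CLS], 1-2 are question tokens, 3-6 are context tokens.
start = [0.1, 0.2, 0.3, 1.0, 5.0, 0.5, 0.2]
end = [0.1, 0.2, 9.0, 0.4, 0.3, 4.0, 0.1]
mask = [False, False, False, True, True, True, True]

# Independent argmax gives start=4, end=2 (an invalid span);
# the constrained search picks the best valid span instead.
print(best_span(start, end, mask))  # (4, 5, 9.0)
```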

In [22]:
test_examples = [
    {
        'question': "What does Google Colab provide access to?",
        'context': "Google Colab provides free access to GPUs and TPUs, which makes it popular for deep learning.",
        'expected_answer': "GPUs and TPUs"
    },
    {
        'question': "What is the capital of France?",
        'context': "Paris is the capital and most populous city of France. It has been one of Europe's major centers of finance, diplomacy, commerce, fashion, and arts.",
        'expected_answer': "Paris"
    },
    {
        'question': "When was Python created?",
        'context': "Python was created by Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability.",
        'expected_answer': "1991"
    },
    {
        'question': "What is photosynthesis?",
        'context': "Photosynthesis is the process by which plants use sunlight, water and carbon dioxide to create oxygen and energy in the form of sugar.",
        'expected_answer': "process by which plants use sunlight, water and carbon dioxide to create oxygen and energy"
    },
    {
        'question': "Who invented the telephone?",
        'context': "The telephone was invented by Alexander Graham Bell in 1876. He made the first successful telephone call on March 10, 1876.",
        'expected_answer': "Alexander Graham Bell"
    }
]
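
The "Exact Match" line printed by `test_model_manual` only lowercases and strips whitespace, so a prediction that differs by a trailing period or a leading "the" counts as a miss, and the long photosynthesis answer above will rarely match exactly. SQuAD-style scoring first normalizes answers (lowercase, strip punctuation, drop the articles a/an/the, collapse whitespace) and also reports token-overlap F1. The sketch below re-implements that normalization in plain Python as we understand the SQuAD v1.1 evaluation script; it is an illustration, not necessarily the code that produced the training-time metrics.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)

def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation, drop articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, ground_truth: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(ground_truth)

def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

gold = "process by which plants use sunlight, water and carbon dioxide to create oxygen and energy"
pred = "the process by which plants use sunlight"
print(exact_match("GPUs and TPUs.", "gpus and tpus"))  # True
print(round(token_f1(pred, gold), 4))                   # 0.5714
```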


In [23]:
print(os.listdir('/content/'))

['.config', 'wandb', 'results_bert_25pct', 'results_bert_80pct', 'drive', 'results_bert_50pct', 'sample_data']


In [24]:
model_path = "results_bert_80pct/final_model"
model = AutoModelForQuestionAnswering.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
results = test_model_manual(model, tokenizer, test_examples)


🧪 MODEL EVALUATION (MANUAL MODE)

📝 Example 1
Question: What does Google Colab provide access to?
Context: Google Colab provides free access to GPUs and TPUs, which makes it popular for deep learning.

✅ Predicted Answer: 'gpus and tpus'
   Confidence: 96.23%
   Expected Answer: 'GPUs and TPUs'
   Exact Match: ✓ YES
--------------------------------------------------------------------------------

📝 Example 2
Question: What is the capital of France?
Context: Paris is the capital and most populous city of France. It has been one of Europe's major centers of finance, diplomacy, commerce, fashion, and arts.

✅ Predicted Answer: 'paris'
   Confidence: 98.61%
   Expected Answer: 'Paris'
   Exact Match: ✓ YES
--------------------------------------------------------------------------------

📝 Example 3
Question: When was Python created?
Context: Python was created by Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability.


## Plots Comparing Trends Across Dataset Sizes

In [25]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import numpy as np

In [26]:
# Load the dataset
full_ft_results = pd.read_csv("/content/drive/MyDrive/bert_dataset_size_results.csv")

print("📊 Data loaded successfully!")
print(f"Total experiments: {len(full_ft_results)}")
print("\nExperiments:")
print(full_ft_results[['train_samples', 'dataset_size%', 'f1_score', 'emissions_kg']])


📊 Data loaded successfully!
Total experiments: 3

Experiments:
   train_samples  dataset_size%  f1_score  emissions_kg
0          32579           25.0  0.523057      0.014405
1          65159           50.0  0.639952      0.027207
2         104255           80.0  0.682083      0.042609
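
The plots that follow show cost growing with dataset size. One way to quantify how fast is to fit emissions ≈ c · samples^k on a log-log scale, where k close to 1 means roughly linear scaling. The sketch below does this with `numpy.polyfit` on the three rows printed above; with only three points it is a rough descriptive fit, not a scaling law.

```python
import numpy as np

# Values from the experiments table above (full fine-tuning)
train_samples = np.array([32579, 65159, 104255])
emissions_kg = np.array([0.014405, 0.027207, 0.042609])

# Per-sample emissions in milligrams: roughly constant if cost scales linearly with data
per_sample_mg = emissions_kg / train_samples * 1e6  # 1 kg = 1e6 mg
print(np.round(per_sample_mg, 3))

# Least-squares line in log space: log(emissions) = k * log(samples) + log(c)
k, log_c = np.polyfit(np.log(train_samples), np.log(emissions_kg), 1)
print(f"Scaling exponent k = {k:.3f}")
```

An exponent slightly below 1 would suggest fixed overheads (model loading, evaluation passes) being amortized over more samples, but three runs cannot separate that from run-to-run noise.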


In [27]:
# PLOT 1: Energy Consumption vs Dataset Size (Stacked Bar)
df_sorted = full_ft_results.sort_values('train_samples')

fig = go.Figure()

fig.add_trace(go.Bar(
    name='CPU Energy',
    x=df_sorted['dataset_size%'],
    y=df_sorted['cpu_energy_kwh'],
    marker_color='#FF6B6B',
    hovertemplate='<b>CPU Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.add_trace(go.Bar(
    name='GPU Energy',
    x=df_sorted['dataset_size%'],
    y=df_sorted['gpu_energy_kwh'],
    marker_color='#4ECDC4',
    hovertemplate='<b>GPU Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.add_trace(go.Bar(
    name='RAM Energy',
    x=df_sorted['dataset_size%'],
    y=df_sorted['ram_energy_kwh'],
    marker_color='#95E1D3',
    hovertemplate='<b>RAM Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.update_layout(
    title=dict(text="Energy Consumption Scaling with Dataset Size", font=dict(size=18)),
    xaxis_title='Dataset Size (%)',
    yaxis_title='Energy Consumption (kWh)',
    barmode='stack',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='x unified'
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_energy_scaling.html")
print("\n✅ Plot 1 saved: full_ft_energy_scaling.html")


✅ Plot 1 saved: full_ft_energy_scaling.html


In [28]:
# PLOT 2: Performance & Emissions Growth (Dual Y-axis)
df_sorted = full_ft_results.sort_values('train_samples')

fig = make_subplots(specs=[[{"secondary_y": True}]])

# F1 Score line
fig.add_trace(
    go.Scatter(
        x=df_sorted['dataset_size%'],
        y=df_sorted['f1_score'],
        name='F1 Score',
        mode='lines+markers',
        line=dict(color='#4ECDC4', width=3),
        marker=dict(size=12, line=dict(width=2, color='white')),
        hovertemplate='<b>F1 Score</b>: %{y:.4f}<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=False
)

# Exact Match line
fig.add_trace(
    go.Scatter(
        x=df_sorted['dataset_size%'],
        y=df_sorted['exact_match'],
        name='Exact Match',
        mode='lines+markers',
        line=dict(color='#95E1D3', width=3, dash='dash'),
        marker=dict(size=10),
        hovertemplate='<b>Exact Match</b>: %{y:.4f}<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=False
)

# CO2 Emissions bar
fig.add_trace(
    go.Bar(
        x=df_sorted['dataset_size%'],
        y=df_sorted['emissions_kg'],
        name='CO₂ Emissions',
        marker_color='#FF6B6B',
        opacity=0.6,
        hovertemplate='<b>CO₂</b>: %{y:.6f} kg<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=True
)

fig.update_xaxes(title_text="Dataset Size (%)")
fig.update_yaxes(title_text="Performance Score", secondary_y=False)
fig.update_yaxes(title_text="CO₂ Emissions (kg)", secondary_y=True)

fig.update_layout(
    title=dict(text="Performance vs Carbon Emissions by Dataset Size", font=dict(size=18)),
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_performance_emissions.html")
print("\n✅ Plot 2 saved: full_ft_performance_emissions.html")


✅ Plot 2 saved: full_ft_performance_emissions.html


In [29]:
# PLOT 3: Efficiency Analysis (Diminishing Returns)
df_sorted = full_ft_results.sort_values('train_samples').copy()
df_sorted['f1_per_kg_co2'] = df_sorted['f1_score'] / df_sorted['emissions_kg']
df_sorted['em_per_kg_co2'] = df_sorted['exact_match'] / df_sorted['emissions_kg']

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_sorted['dataset_size%'],
    y=df_sorted['f1_per_kg_co2'],
    name='F1 / kg CO₂',
    mode='lines+markers',
    line=dict(color='#4ECDC4', width=3),
    marker=dict(size=12),
    fill='tozeroy',
    fillcolor='rgba(78, 205, 196, 0.2)',
    hovertemplate='<b>F1 Efficiency</b>: %{y:.2f}<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.add_trace(go.Scatter(
    x=df_sorted['dataset_size%'],
    y=df_sorted['em_per_kg_co2'],
    name='EM / kg CO₂',
    mode='lines+markers',
    line=dict(color='#95E1D3', width=3, dash='dash'),
    marker=dict(size=10),
    hovertemplate='<b>EM Efficiency</b>: %{y:.2f}<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig.update_layout(
    title=dict(text="Carbon Efficiency: Performance per kg CO₂", font=dict(size=18)),
    xaxis_title='Dataset Size (%)',
    yaxis_title='Efficiency (Score per kg CO₂)',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

# Add annotation for optimal point
optimal_idx = df_sorted['f1_per_kg_co2'].idxmax()
optimal_row = df_sorted.loc[optimal_idx]

fig.add_annotation(
    x=optimal_row['dataset_size%'],
    y=optimal_row['f1_per_kg_co2'],
    text=f"Most Efficient:<br>{optimal_row['dataset_size%']:.0f}%",
    showarrow=True,
    arrowhead=2,
    arrowcolor="#FF6B6B",
    font=dict(size=12, color="#FF6B6B")
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_efficiency.html")
print("\n✅ Plot 3 saved: full_ft_efficiency.html")


✅ Plot 3 saved: full_ft_efficiency.html


In [30]:
# PLOT 4: Training Time vs Energy Consumption
df_sorted = full_ft_results.sort_values('train_samples')

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_sorted['training_time_hours'],
    y=df_sorted['energy_consumed_kwh'],
    mode='lines+markers+text',  # 'text' is required for the textposition labels to appear
    marker=dict(
        size=df_sorted['train_samples'] / 1000,  # Size by dataset
        color=df_sorted['f1_score'],
        colorscale='Viridis',
        showscale=True,
        colorbar=dict(title="F1 Score"),
        line=dict(width=2, color='white')
    ),
    line=dict(color='#4ECDC4', width=2, dash='dot'),
    text=df_sorted['dataset_size%'].astype(str) + '%',
    textposition='top center',
    hovertemplate='<b>Dataset: %{text}</b><br>' +
                  'Time: %{x:.2f} hours<br>' +
                  'Energy: %{y:.6f} kWh<br>' +
                  '<extra></extra>'
))

fig.update_layout(
    title=dict(text="Training Time vs Energy Consumption", font=dict(size=18)),
    xaxis_title='Training Time (hours)',
    yaxis_title='Total Energy Consumption (kWh)',
    template='plotly_white',
    height=500,
    font=dict(size=13)
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_time_energy.html")
print("\n✅ Plot 4 saved: full_ft_time_energy.html")


✅ Plot 4 saved: full_ft_time_energy.html


In [31]:
# PLOT 5: Component-wise Power Draw
df_sorted = full_ft_results.sort_values('train_samples')

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_sorted['dataset_size%'],
    y=df_sorted['cpu_power_w'],
    name='CPU Power',
    mode='lines+markers',
    line=dict(color='#FF6B6B', width=3),
    marker=dict(size=10),
    stackgroup='one',
    hovertemplate='<b>CPU</b>: %{y:.2f} W<extra></extra>'
))

fig.add_trace(go.Scatter(
    x=df_sorted['dataset_size%'],
    y=df_sorted['gpu_power_w'],
    name='GPU Power',
    mode='lines+markers',
    line=dict(color='#4ECDC4', width=3),
    marker=dict(size=10),
    stackgroup='one',
    hovertemplate='<b>GPU</b>: %{y:.2f} W<extra></extra>'
))

fig.add_trace(go.Scatter(
    x=df_sorted['dataset_size%'],
    y=df_sorted['ram_power_w'],
    name='RAM Power',
    mode='lines+markers',
    line=dict(color='#95E1D3', width=3),
    marker=dict(size=10),
    stackgroup='one',
    hovertemplate='<b>RAM</b>: %{y:.2f} W<extra></extra>'
))

fig.update_layout(
    title=dict(text="Average Power Draw by Component", font=dict(size=18)),
    xaxis_title='Dataset Size (%)',
    yaxis_title='Power Draw (Watts)',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_power_breakdown.html")
print("\n✅ Plot 5 saved: full_ft_power_breakdown.html")


✅ Plot 5 saved: full_ft_power_breakdown.html


In [32]:
# PLOT 6: Comprehensive Dashboard (All Metrics)
df_sorted = full_ft_results.sort_values('train_samples')

fig = make_subplots(
    rows=2, cols=3,
    subplot_titles=(
        'Energy Consumption',
        'Performance Metrics',
        'Carbon Emissions',
        'Training Time',
        'Power Draw',
        'Cost-Benefit Analysis'
    ),
    specs=[
        [{"type": "bar"}, {"type": "scatter"}, {"type": "bar"}],
        [{"type": "scatter"}, {"type": "bar"}, {"type": "scatter"}]
    ]
)

# 1. Energy consumption (stacked)
fig.add_trace(
    go.Bar(name='CPU', x=df_sorted['dataset_size%'],
           y=df_sorted['cpu_energy_kwh'], marker_color='#FF6B6B'),
    row=1, col=1
)
fig.add_trace(
    go.Bar(name='GPU', x=df_sorted['dataset_size%'],
           y=df_sorted['gpu_energy_kwh'], marker_color='#4ECDC4'),
    row=1, col=1
)

# 2. Performance metrics
fig.add_trace(
    go.Scatter(x=df_sorted['dataset_size%'], y=df_sorted['f1_score'],
               mode='lines+markers', name='F1', line=dict(color='#4ECDC4', width=3)),
    row=1, col=2
)
fig.add_trace(
    go.Scatter(x=df_sorted['dataset_size%'], y=df_sorted['exact_match'],
               mode='lines+markers', name='EM', line=dict(color='#95E1D3', width=3, dash='dash')),
    row=1, col=2
)

# 3. Carbon emissions
fig.add_trace(
    go.Bar(x=df_sorted['dataset_size%'], y=df_sorted['emissions_kg'],
           marker_color='#FF6B6B', showlegend=False),
    row=1, col=3
)

# 4. Training time
fig.add_trace(
    go.Scatter(x=df_sorted['dataset_size%'], y=df_sorted['training_time_hours'],
               mode='lines+markers', marker=dict(size=12, color='#FFA07A'),
               line=dict(color='#FFA07A', width=3), showlegend=False),
    row=2, col=1
)

# 5. Power draw (stacked)
fig.add_trace(
    go.Bar(x=df_sorted['dataset_size%'], y=df_sorted['cpu_power_w'],
           marker_color='#FF6B6B', showlegend=False),
    row=2, col=2
)
fig.add_trace(
    go.Bar(x=df_sorted['dataset_size%'], y=df_sorted['gpu_power_w'],
           marker_color='#4ECDC4', showlegend=False),
    row=2, col=2
)

# 6. Efficiency
efficiency = df_sorted['f1_score'] / df_sorted['emissions_kg']
fig.add_trace(
    go.Scatter(x=df_sorted['dataset_size%'], y=efficiency,
               mode='lines+markers', marker=dict(size=12, color='#9370DB'),
               line=dict(color='#9370DB', width=3), showlegend=False),
    row=2, col=3
)

# Update axes labels
fig.update_xaxes(title_text="Dataset %", row=1, col=1)
fig.update_xaxes(title_text="Dataset %", row=1, col=2)
fig.update_xaxes(title_text="Dataset %", row=1, col=3)
fig.update_xaxes(title_text="Dataset %", row=2, col=1)
fig.update_xaxes(title_text="Dataset %", row=2, col=2)
fig.update_xaxes(title_text="Dataset %", row=2, col=3)

fig.update_yaxes(title_text="Energy (kWh)", row=1, col=1)
fig.update_yaxes(title_text="Score", row=1, col=2)
fig.update_yaxes(title_text="CO₂ (kg)", row=1, col=3)
fig.update_yaxes(title_text="Hours", row=2, col=1)
fig.update_yaxes(title_text="Power (W)", row=2, col=2)
fig.update_yaxes(title_text="F1/kg CO₂", row=2, col=3)

fig.update_layout(
    height=800,
    title_text="<b>Full Fine-tuning: Comprehensive Analysis Dashboard</b>",
    showlegend=True,
    template='plotly_white',
    barmode='stack',
    font=dict(size=11)
)

fig.show()
fig.write_html("/content/drive/MyDrive/full_ft_dashboard.html")
print("\n✅ Plot 6 saved: full_ft_dashboard.html")


✅ Plot 6 saved: full_ft_dashboard.html


In [33]:
# SUMMARY STATISTICS & KEY INSIGHTS
print("\n" + "="*80)
print("📊 FULL FINE-TUNING SUMMARY STATISTICS")
print("="*80)

df_sorted = full_ft_results.sort_values('train_samples')

print(f"\n📈 Performance Growth:")
f1_growth = ((df_sorted['f1_score'].iloc[-1] - df_sorted['f1_score'].iloc[0]) /
             df_sorted['f1_score'].iloc[0] * 100)
print(f"  F1 Score improvement (smallest to largest): {f1_growth:+.2f}%")
print(f"  Best F1 Score: {df_sorted['f1_score'].max():.4f} at {df_sorted.loc[df_sorted['f1_score'].idxmax(), 'dataset_size%']:.0f}%")

print(f"\n🌱 Carbon Impact:")
emissions_growth = ((df_sorted['emissions_kg'].iloc[-1] - df_sorted['emissions_kg'].iloc[0]) /
                    df_sorted['emissions_kg'].iloc[0] * 100)
print(f"  Emissions growth (smallest to largest): {emissions_growth:+.2f}%")
print(f"  Total CO₂: {df_sorted['emissions_kg'].sum():.6f} kg")

print(f"\n⚡ Energy Analysis:")
print(f"  Total Energy Consumed: {df_sorted['energy_consumed_kwh'].sum():.6f} kWh")
gpu_ratio = (df_sorted['gpu_energy_kwh'].sum() / df_sorted['energy_consumed_kwh'].sum()) * 100
cpu_ratio = (df_sorted['cpu_energy_kwh'].sum() / df_sorted['energy_consumed_kwh'].sum()) * 100
print(f"  GPU Energy: {gpu_ratio:.1f}% of total")
print(f"  CPU Energy: {cpu_ratio:.1f}% of total")

print(f"\n💡 Efficiency Insights:")
df_sorted['efficiency'] = df_sorted['f1_score'] / df_sorted['emissions_kg']
best_eff_idx = df_sorted['efficiency'].idxmax()
print(f"  Most efficient dataset size: {df_sorted.loc[best_eff_idx, 'dataset_size%']:.0f}%")
print(f"  Efficiency at this size: {df_sorted.loc[best_eff_idx, 'efficiency']:.2f} F1/kg CO₂")


📊 FULL FINE-TUNING SUMMARY STATISTICS

📈 Performance Growth:
  F1 Score improvement (smallest to largest): +30.40%
  Best F1 Score: 0.6821 at 80%

🌱 Carbon Impact:
  Emissions growth (smallest to largest): +195.80%
  Total CO₂: 0.084220 kg

⚡ Energy Analysis:
  Total Energy Consumed: 0.178894 kWh
  GPU Energy: 73.4% of total
  CPU Energy: 12.2% of total

💡 Efficiency Insights:
  Most efficient dataset size: 25%
  Efficiency at this size: 36.31 F1/kg CO₂
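
The "most efficient" size reported above is based on average efficiency (F1 divided by total CO₂), which here favors the smallest run. A complementary view of the diminishing returns is marginal efficiency: the extra F1 gained per extra kilogram of CO₂ when moving up to the next dataset size. The pandas sketch below recomputes both from the three rows printed earlier; the `marginal_f1_per_kg` column name is our own.

```python
import pandas as pd

# Full fine-tuning results copied from the experiments table above
df = pd.DataFrame({
    "dataset_size%": [25, 50, 80],
    "f1_score": [0.523057, 0.639952, 0.682083],
    "emissions_kg": [0.014405, 0.027207, 0.042609],
})

# Average efficiency: total F1 per total kg CO2 (the quantity in Plot 3)
df["f1_per_kg"] = df["f1_score"] / df["emissions_kg"]
# Marginal efficiency: F1 gained per extra kg CO2 relative to the previous size
df["marginal_f1_per_kg"] = df["f1_score"].diff() / df["emissions_kg"].diff()

print(df.round(3).to_string(index=False))
```

Going from 25% to 50% buys about 9.1 F1 points (on the 0-1 scale) per kg CO₂, while going from 50% to 80% buys about 2.7, which makes the diminishing returns explicit rather than implied by the shrinking average.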


# Training Strategy 2: LoRA (Low-Rank Adaptation) Fine-Tuning (Model BERT)

In [34]:
!pip install peft

[codecarbon INFO @ 03:59:08] Energy consumed for RAM : 0.031851 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 03:59:08] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 03:59:08] Energy consumed for All CPU : 0.027080 kWh
[codecarbon INFO @ 03:59:08] Energy consumed for all GPUs : 0.141431 kWh. Total GPU Power : 106.10792248443957 W
[codecarbon INFO @ 03:59:08] 0.200362 kWh of electricity and 0.000000 L of water were used since the beginning.




In [35]:


# Import PEFT for LoRA
from peft import LoraConfig, get_peft_model, TaskType, PeftModel


## STEP 1: Creating and Training the LoRA Model

In [36]:
def create_lora_model(model_name="bert-base-uncased", r=8, lora_alpha=16, lora_dropout=0.1):
    """
    Create BERT model with LoRA adapters.

    Args:
        model_name: Base model name
        r: Rank of update matrices
        lora_alpha: Scaling factor
        lora_dropout: Dropout probability
    """
    # Load base model
    base_model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Configure LoRA
    lora_config = LoraConfig(
        task_type=TaskType.QUESTION_ANS,  # Task type for QA
        r=r,  # Rank of update matrices
        lora_alpha=lora_alpha,  # Scaling factor
        lora_dropout=lora_dropout,  # Dropout probability
        target_modules=["query", "value"],  # Which layers to apply LoRA to (BERT attention)
        bias="none",  # Don't train biases
        inference_mode=False,  # Training mode
    )

    # Apply LoRA to model
    lora_model = get_peft_model(base_model, lora_config)

    # Print trainable parameters
    lora_model.print_trainable_parameters()

    return lora_model

def train_lora_model(tokenized_train, tokenized_eval, tokenizer, compute_metrics_fn,
                     size_fraction, lora_rank=8):
    """Train BERT model with LoRA fine-tuning."""

    # Create LoRA model
    print(f"\n🔧 Creating LoRA model (rank={lora_rank})...")
    lora_model = create_lora_model(
        model_name="bert-base-uncased",
        r=lora_rank,
        lora_alpha=lora_rank * 2,  # Common practice: alpha = 2*r
        lora_dropout=0.1
    )

    # Setup output directory
    output_dir = f"results_bert_lora_r{lora_rank}_{int(size_fraction*100)}pct"

    # Training arguments (can use higher learning rate for LoRA)
    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        learning_rate=3e-4,  # Higher LR for LoRA
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=3,  # More epochs for LoRA
        weight_decay=0.01,
        fp16=torch.cuda.is_available(),
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        push_to_hub=False,
        logging_steps=100,
        greater_is_better=True
    )

    # Initialize trainer
    trainer = Trainer(
        model=lora_model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        tokenizer=tokenizer,
        data_collator=default_data_collator,
        compute_metrics=compute_metrics_fn
    )

    # Start carbon tracking
    tracker = EmissionsTracker(
        project_name=f"BERT_LoRA_r{lora_rank}_{int(size_fraction*100)}pct",
        output_dir=output_dir,
        save_to_file=True,
        log_level="info"
    )
    tracker.start()

    # Train; stop the tracker even if training raises, so it does not keep logging
    print("🚀 Training LoRA model...")
    try:
        train_results = trainer.train()
    finally:
        tracker.stop()

    # Detailed emissions data recorded by tracker.stop()
    emissions_data = tracker.final_emissions_data

    return trainer, train_results, emissions_data, output_dir, lora_model
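
`create_lora_model` freezes BERT and adds low-rank adapters to the `query` and `value` projections. The NumPy sketch below illustrates the idea on a single layer, assuming the standard LoRA formulation h = Wx + (α/r)·B(Ax) with B initialized to zero, which to our understanding matches PEFT's default `lora_alpha / r` scaling and initialization (dropout is omitted). It also counts adapter parameters for bert-base (12 layers, hidden size 768). `LoRALinear` and `lora_param_count` are illustrative helpers, not PEFT APIs.

```python
import numpy as np

def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Extra trainable parameters LoRA adds to one d_out x d_in linear layer."""
    return r * d_in + d_out * r  # A is r x d_in, B is d_out x r

class LoRALinear:
    """Minimal NumPy illustration of a LoRA-adapted linear layer (no dropout)."""

    def __init__(self, W: np.ndarray, r: int, alpha: float, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                      # frozen base weight
        self.A = rng.normal(0.0, 0.02, size=(r, d_in))  # trainable, random init (Gaussian stand-in)
        self.B = np.zeros((d_out, r))                   # trainable, zero init
        self.scaling = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # h = W x + (alpha / r) * B (A x)
        return self.W @ x + self.scaling * (self.B @ (self.A @ x))

# bert-base: hidden size 768, 12 layers, LoRA on query and value (2 modules per layer)
hidden, layers, modules = 768, 12, 2
for r in (8, 16):
    n = layers * modules * lora_param_count(hidden, hidden, r)
    print(f"r={r}: {n:,} LoRA parameters")  # r=8: 294,912; r=16: 589,824

# Because B starts at zero, the adapted layer initially matches the base layer
W = np.random.default_rng(1).normal(size=(4, 6))
layer = LoRALinear(W, r=2, alpha=4)
x = np.ones(6)
print(np.allclose(layer(x), W @ x))  # True
```

These counts cover only the A and B matrices. `print_trainable_parameters()` should report somewhat more, since, as we understand `TaskType.QUESTION_ANS`, PEFT also keeps the `qa_outputs` head trainable.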


## STEP 2: Evaluating the LoRA Model Across Dataset Sizes and Ranks

In [37]:
def evaluate_and_save_lora(trainer, train_results, emissions_data, output_dir,
                           size_fraction, num_samples, lora_model):
    """Evaluate LoRA model and save results with detailed emissions."""

    print("📊 Evaluating LoRA model...")
    eval_results = trainer.evaluate()

    # Count trainable parameters
    trainable_params = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in lora_model.parameters())
    trainable_percentage = 100 * trainable_params / total_params

    # Extract emissions data from EmissionsData object
    result_entry = {
        "training_method": "LoRA",
        "model_name": "BERT",
        "lora_rank": lora_model.peft_config['default'].r,
        'dataset_size%': int(size_fraction*100),
        "train_samples": num_samples,
        "valid_samples": len(trainer.eval_dataset),  # eval set the Trainer actually used
        "trainable_params": trainable_params,
        "total_params": total_params,
        "trainable_percentage": trainable_percentage,

        # Performance metrics
        "f1_score": eval_results["eval_f1"],
        "exact_match": eval_results["eval_exact_match"],
        "eval_loss": eval_results["eval_loss"],
        "training_time_hours": train_results.metrics["train_runtime"] / 3600,

        # Emissions data
        "emissions_rate_kg_per_s": emissions_data.emissions_rate,
        "emissions_kg": emissions_data.emissions,
        "timestamp": emissions_data.timestamp,
        "duration_seconds": emissions_data.duration,
        "duration_hours": emissions_data.duration / 3600,

        # Energy consumption
        "energy_consumed_kwh": emissions_data.energy_consumed,
        "cpu_energy_kwh": emissions_data.cpu_energy,
        "gpu_energy_kwh": emissions_data.gpu_energy,
        "ram_energy_kwh": emissions_data.ram_energy,

        # Power draw
        "cpu_power_w": emissions_data.cpu_power,
        "gpu_power_w": emissions_data.gpu_power,
        "ram_power_w": emissions_data.ram_power,

        # Location and system info
        "country_name": emissions_data.country_name,
        "country_iso_code": emissions_data.country_iso_code,
        "region": emissions_data.region,
        "cloud_provider": emissions_data.cloud_provider,
        "cloud_region": emissions_data.cloud_region,
        "on_cloud": emissions_data.on_cloud,

        # System specifications
        "os": emissions_data.os,
        "python_version": emissions_data.python_version,
        "cpu_count": emissions_data.cpu_count,
        "cpu_model": emissions_data.cpu_model,
        "gpu_count": emissions_data.gpu_count,
        "gpu_model": emissions_data.gpu_model,
        "ram_total_size_gb": emissions_data.ram_total_size,

        # Additional metrics
        "pue": emissions_data.pue,
        "codecarbon_version": emissions_data.codecarbon_version,
    }

    # Print detailed summary
    print(f"\n{'='*80}")
    print(f"  LoRA RESULTS SUMMARY (Rank {result_entry['lora_rank']}, {size_fraction*100}% Dataset)")
    print(f"{'='*80}")
    print(f"\n📦 Model Configuration:")
    print(f"   Training Method: LoRA")
    print(f"   LoRA Rank: {result_entry['lora_rank']}")
    print(f"   Trainable Parameters: {trainable_params:,} ({trainable_percentage:.2f}%)")
    print(f"   Total Parameters: {total_params:,}")
    print(f"   Dataset Size: {size_fraction*100}%")

    print(f"\n📈 Performance Metrics:")
    print(f"   F1 Score: {eval_results['eval_f1']:.4f}")
    print(f"   Exact Match: {eval_results['eval_exact_match']:.4f}")
    print(f"   Eval Loss: {eval_results['eval_loss']:.4f}")

    print(f"\n⚡ Energy Consumption:")
    print(f"   Total Energy: {emissions_data.energy_consumed:.6f} kWh")
    print(f"   CPU Energy: {emissions_data.cpu_energy:.6f} kWh ({emissions_data.cpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"   GPU Energy: {emissions_data.gpu_energy:.6f} kWh ({emissions_data.gpu_energy/emissions_data.energy_consumed*100:.1f}%)")
    print(f"   RAM Energy: {emissions_data.ram_energy:.6f} kWh ({emissions_data.ram_energy/emissions_data.energy_consumed*100:.1f}%)")

    print(f"\n🔌 Average Power Draw:")
    print(f"   CPU Power: {emissions_data.cpu_power:.2f} W")
    print(f"   GPU Power: {emissions_data.gpu_power:.2f} W")
    print(f"   RAM Power: {emissions_data.ram_power:.2f} W")
    print(f"   Total Power: {emissions_data.cpu_power + emissions_data.gpu_power + emissions_data.ram_power:.2f} W")

    print(f"\n🌍 Carbon Footprint:")
    print(f"   Total CO2 Emissions: {emissions_data.emissions:.6f} kg")
    print(f"   Emissions Rate: {emissions_data.emissions_rate:.9f} kg/s")
    print(f"   Duration: {emissions_data.duration/3600:.2f} hours")
    print(f"   Training Time (Trainer): {train_results.metrics['train_runtime']/3600:.2f} hours")

    print(f"\n📍 Location & Infrastructure:")
    print(f"   Country: {emissions_data.country_name} ({emissions_data.country_iso_code})")
    print(f"   Region: {emissions_data.region}")
    print(f"   On Cloud: {emissions_data.on_cloud}")
    print(f"   PUE (Power Usage Effectiveness): {emissions_data.pue}")

    print(f"\nüíª System Specifications:")
    print(f"   OS: {emissions_data.os}")
    print(f"   CPU: {emissions_data.cpu_model} ({emissions_data.cpu_count} cores)")
    if emissions_data.gpu_count and emissions_data.gpu_model:
        print(f"   GPU: {emissions_data.gpu_model} (Count: {emissions_data.gpu_count})")
    else:
        print(f"   GPU: None detected")
    print(f"   RAM: {emissions_data.ram_total_size:.2f} GB")
    print(f"   Python: {emissions_data.python_version}")
    print(f"\n{'='*80}")

    # Save LoRA adapters
    lora_model.save_pretrained(f"{output_dir}/lora_adapters")
    tokenizer.save_pretrained(f"{output_dir}/lora_adapters")
    print(f"✅ LoRA adapters saved to {output_dir}/lora_adapters")

    # Clean up
    del trainer.model
    del trainer
    torch.cuda.empty_cache()

    return result_entry

def run_lora_experiment(size_fraction, train_data, eval_data, tokenizer, preprocess_fn,
                        compute_metrics_fn, lora_rank):
    """Run complete LoRA experiment for given dataset size and rank."""

    print(f"\n{'='*60}")
    print(f"  LoRA Training with {size_fraction*100}% of training data (Rank {lora_rank})")
    print(f"{'='*60}")

    # Step 1: Prepare dataset
    tokenized_train, num_samples = prepare_dataset(train_data, size_fraction, preprocess_fn)

    # Step 2: Train LoRA model
    trainer, train_results, emissions_data, output_dir, lora_model = train_lora_model(
        tokenized_train, eval_data, tokenizer, compute_metrics_fn,
        size_fraction, lora_rank
    )

    # Step 3: Evaluate and save
    result_entry = evaluate_and_save_lora(
        trainer, train_results, emissions_data, output_dir,
        size_fraction, num_samples, lora_model
    )

    return result_entry
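The LoRA adapter configuration is created in an earlier cell, so it is not visible here. As a cross-check on the counts the rank-16 run reports (591,362 trainable out of 109,484,548), the sketch below reproduces them under an assumption inferred from the numbers, not taken from the source: rank-16 adapters on the `query` and `value` projections of all 12 BERT-base layers (hidden size 768), plus a trainable copy of the `qa_outputs` head kept alongside the frozen original. The base-model size, 108,893,186, is the total printed by the frozen-backbone runs later in this notebook.

```python
# Cross-check of the LoRA parameter counts reported for the rank-16 run.
# ASSUMPTION: adapters on query/value in all 12 layers, plus a trainable
# copy of the QA head (the original head stays in the model, frozen).
HIDDEN = 768               # BERT-base hidden size
NUM_LAYERS = 12            # BERT-base encoder layers
RANK = 16                  # LoRA rank used in this experiment
ADAPTED_PER_LAYER = 2      # assumed: query and value projections
BASE_PARAMS = 108_893_186  # bert-base-uncased + QA head, as reported by the frozen runs


def lora_params_per_projection(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to a frozen d_out x d_in weight."""
    return r * d_in + d_out * r


adapter_params = NUM_LAYERS * ADAPTED_PER_LAYER * lora_params_per_projection(HIDDEN, HIDDEN, RANK)
qa_head_params = HIDDEN * 2 + 2  # Linear(768 -> 2) for start/end logits: weights + biases

trainable = adapter_params + qa_head_params
total = BASE_PARAMS + adapter_params + qa_head_params  # original head + trainable copy
pct = 100 * trainable / total

print(f"adapters={adapter_params:,} trainable={trainable:,} total={total:,} ({pct:.4f}%)")
```

If a later configuration change (for example a different list of target modules) makes the reported counts diverge from this arithmetic, that is a quick signal the adapters landed on different layers than intended.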

In [38]:
# Train BERT with LoRA (rank 16) on 80% of the SQuAD training set
# ============================================================================

result_lora = []

# Experiment: 80% data with Rank 16
print("\n" + "="*80)
print("  EXPERIMENT: LoRA FINE-TUNING WITH 80% TRAINING DATASET (Rank 16)")
print("="*80)

result_lora_80_r16 = run_lora_experiment(
    size_fraction=0.8,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    lora_rank=16
)
result_lora.append(result_lora_80_r16)


  EXPERIMENT: LoRA FINE-TUNING WITH 80% TRAINING DATASET (Rank 16)

  LoRA Training with 80.0% of training data (Rank 16)
🔄 Preprocessing 104255 training samples...



[codecarbon INFO @ 04:12:08] Energy consumed for RAM : 0.042680 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:12:08] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:12:08] Energy consumed for All CPU : 0.036286 kWh
[codecarbon INFO @ 04:12:08] Energy consumed for all GPUs : 0.154503 kWh. Total GPU Power : 60.244903124806726 W
[codecarbon INFO @ 04:12:08] 0.233469 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:12:23] Energy consumed for RAM : 0.042886 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:12:23] Delta energy consumed for CPU with constant : 0.000176 kWh, power : 42.5 W
[codecarbon INFO @ 04:12:23] Energy consumed for All CPU : 0.036462 kWh
[codecarbon INFO @ 04:12:23] Energy consumed for all GPUs : 0.154754 kWh. Total GPU Power : 60.49178737855819 W
[codecarbon INFO @ 04:12:23] 0.234102 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:1


🔧 Creating LoRA model (rank=16)...


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 04:13:16] [setup] RAM Tracking...
[codecarbon INFO @ 04:13:16] [setup] CPU Tracking...


trainable params: 591,362 || all params: 109,484,548 || trainable%: 0.5401


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 04:13:17] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 04:13:17] [setup] GPU Tracking...
[codecarbon INFO @ 04:13:17] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 04:13:17] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 04:13:17] >>> Tracker's metadata:
[codecarbon INFO @ 04:13:17]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 04:13:17]   Python version: 3.12.12
[codecarbon INFO @ 04:13:17]   CodeCarbon version: 3.2.0
[codecarbon INFO @ 04:13:17]   Available RAM : 167.052 GB
[codecarbon INFO @ 04:13:17]   CPU count: 12 thread(s) in 1 physical CPU(s)
[codecarbon INFO @ 04:13:

🚀 Training LoRA model...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,1.3558,1.250248,0.514752,0.585803
2,1.2046,1.151266,0.562387,0.636961
3,1.1084,1.187044,0.555546,0.636183


[codecarbon INFO @ 04:13:23] Energy consumed for RAM : 0.043720 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:13:23] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:13:23] Energy consumed for All CPU : 0.037171 kWh
[codecarbon INFO @ 04:13:23] Energy consumed for all GPUs : 0.155927 kWh. Total GPU Power : 99.64118872585085 W
[codecarbon INFO @ 04:13:23] 0.236818 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:13:34] Energy consumed for RAM : 0.000208 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:13:34] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:13:34] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:13:34] Energy consumed for all GPUs : 0.000848 kWh. Total GPU Power : 203.41798737851485 W
[codecarbon INFO @ 04:13:34] 0.001234 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:1

📊 Evaluating LoRA model...


[codecarbon INFO @ 04:36:54] Energy consumed for RAM : 0.063293 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:36:54] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:36:54] Energy consumed for All CPU : 0.053810 kWh
[codecarbon INFO @ 04:36:54] Energy consumed for all GPUs : 0.241485 kWh. Total GPU Power : 185.02704033679183 W
[codecarbon INFO @ 04:36:54] 0.358588 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:36:54] 0.039723 g.CO2eq/s mean an estimation of 1,252.7078279817554 kg.CO2eq/year
[codecarbon INFO @ 04:37:09] Energy consumed for RAM : 0.063501 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:37:09] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:37:09] Energy consumed for All CPU : 0.053987 kWh
[codecarbon INFO @ 04:37:09] Energy consumed for all GPUs : 0.242281 kWh. Total GPU Power : 191.1798024951619 W
[codecarbon INFO @ 04:37:09] 0.


  LoRA RESULTS SUMMARY (Rank 16, 80.0% Dataset)

📦 Model Configuration:
   Training Method: LoRA
   LoRA Rank: 16
   Trainable Parameters: 591,362 (0.54%)
   Total Parameters: 109,484,548
   Dataset Size: 80.0%

📈 Performance Metrics:
   F1 Score: 0.6370
   Exact Match: 0.5624
   Eval Loss: 1.1513

⚡ Energy Consumption:
   Total Energy: 0.121954 kWh
   CPU Energy: 0.016668 kWh (13.7%)
   GPU Energy: 0.085680 kWh (70.3%)
   RAM Energy: 0.019607 kWh (16.1%)

🔌 Average Power Draw:
   CPU Power: 42.50 W
   GPU Power: 218.00 W
   RAM Power: 50.00 W
   Total Power: 310.50 W

🌍 Carbon Footprint:
   Total CO2 Emissions: 0.057414 kg
   Emissions Rate: 0.000040646 kg/s
   Duration: 0.39 hours
   Training Time (Trainer): 0.39 hours

📍 Location & Infrastructure:
   Country: Singapore (SGP)
   Region: 
   On Cloud: N
   PUE (Power Usage Effectiveness): 1.0

💻 System Specifications:
   OS: Linux-6.6.105+-x86_64-with-glibc2.35
   CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (12 cores)
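The figures in this summary can be checked against each other: CodeCarbon integrates power over time to obtain energy, then multiplies energy by a grid carbon intensity to obtain CO₂. The sketch below recomputes average power per component, the implied carbon intensity, and emissions from rate × duration, using only values printed in this run's output (the duration is `duration_seconds` from the results table). The carbon intensity it derives is an inference from these figures, not a value read from CodeCarbon's configuration.

```python
# Recompute derived quantities from the LoRA (rank 16, 80%) summary values.
energy_kwh = {"cpu": 0.016668, "gpu": 0.085680, "ram": 0.019607}
total_energy_kwh = 0.121954
duration_s = 1412.554506        # duration_seconds from the results table
emissions_kg = 0.057414
emissions_rate_kg_s = 0.000040646

duration_h = duration_s / 3600

# Average power per component: energy (kWh) / time (h) gives kW; x1000 gives W
avg_power_w = {k: 1000 * v / duration_h for k, v in energy_kwh.items()}
total_avg_power_w = 1000 * total_energy_kwh / duration_h

# Implied grid carbon intensity in kg CO2 per kWh (PUE was 1.0, so no scaling)
implied_intensity = emissions_kg / total_energy_kwh

# Integrating the emissions rate over the run should recover total emissions
recovered_emissions = emissions_rate_kg_s * duration_s

print({k: round(v, 1) for k, v in avg_power_w.items()})
print(f"total ≈ {total_avg_power_w:.1f} W, intensity ≈ {implied_intensity:.3f} kg/kWh, "
      f"rate × time ≈ {recovered_emissions:.6f} kg")
```

The components agree to within rounding. Note, however, that only the GPU share is measured (via pynvml): the tracker logs show the CPU on "global constant" mode at 42.5 W because RAPL files were unreadable, and RAM on an estimation model at 50 W, so roughly 30% of the reported energy (13.7% CPU plus 16.1% RAM) is estimated rather than measured.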
   

# STEP 3: Results and Analysis

In [42]:
# Create summary DataFrame
results_df_lora = pd.DataFrame(result_lora)

print("\n" + "="*60)
print("  LoRA RESULTS SUMMARY")
print("="*60)
print(results_df_lora.to_string(index=False))

# Save to CSV
results_df_lora.to_csv("/content/drive/MyDrive/bert_lora_results.csv", index=False)
print("\n✅ LoRA results saved!")

# Print comparison
print("\n" + "="*80)
print("  LoRA DATASET SIZE COMPARISON")
print("="*80)
print(results_df_lora[['dataset_size%', 'trainable_params', 'trainable_percentage',
                       'f1_score', 'exact_match', 'emissions_kg', 'training_time_hours']].to_string(index=False))

# Efficiency Analysis
print("\n" + "="*80)
print("  LoRA EFFICIENCY METRICS")
print("="*80)

for _, row in results_df_lora.iterrows():
    dataset_pct = row['dataset_size%']

    print(f"\n{dataset_pct}% Dataset (Rank {row['lora_rank']}):")
    print(f"  Training Samples: {row['train_samples']:,}")
    print(f"  Trainable Params: {row['trainable_params']:,} ({row['trainable_percentage']:.4f}%)")
    print(f"  Total Params: {row['total_params']:,}")
    print(f"  F1 Score: {row['f1_score']:.4f}")
    print(f"  Exact Match: {row['exact_match']:.4f}")
    print(f"  Emissions: {row['emissions_kg']:.6f} kg CO₂")
    print(f"  Energy Consumed: {row['energy_consumed_kwh']:.6f} kWh")
    print(f"  Training Time: {row['training_time_hours']:.2f} hours")

    # Efficiency metrics
    efficiency_co2 = row['f1_score'] / row['emissions_kg'] if row['emissions_kg'] > 0 else 0
    efficiency_kwh = row['f1_score'] / row['energy_consumed_kwh'] if row['energy_consumed_kwh'] > 0 else 0
    efficiency_time = row['f1_score'] / row['training_time_hours'] if row['training_time_hours'] > 0 else 0

    print(f"  Efficiency (F1/kg CO₂): {efficiency_co2:.2f}")
    print(f"  Efficiency (F1/kWh): {efficiency_kwh:.2f}")
    print(f"  Efficiency (F1/hour): {efficiency_time:.2f}")



  LoRA RESULTS SUMMARY
training_method model_name  lora_rank  dataset_size%  train_samples  valid_samples  trainable_params  total_params  trainable_percentage  f1_score  exact_match  eval_loss  training_time_hours  emissions_rate_kg_per_s  emissions_kg           timestamp  duration_seconds  duration_hours  energy_consumed_kwh  cpu_energy_kwh  gpu_energy_kwh  ram_energy_kwh  cpu_power_w  gpu_power_w  ram_power_w country_name country_iso_code region cloud_provider cloud_region on_cloud                                   os python_version  cpu_count                      cpu_model  gpu_count                 gpu_model  ram_total_size_gb  pue codecarbon_version
           LoRA       BERT         16             80         104255          12134            591362     109484548              0.540133  0.636961     0.562387   1.151266             0.392177                 0.000041      0.057414 2025-12-01T04:36:51       1412.554506        0.392376             0.121954        0.016668         0.085
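The efficiency loop above divides F1 by emissions, energy, and training time, returning 0 when a denominator is not positive. Its printed output is cut off in this export, so the sketch below applies the same ratios to the values visible in the rank-16 results row. It is a standalone restatement of that loop's arithmetic, not additional output from the notebook.

```python
def efficiency_metrics(f1: float, emissions_kg: float, energy_kwh: float, hours: float) -> dict:
    """F1 per unit of cost, mirroring the notebook's guard against zero denominators."""
    def ratio(num: float, den: float) -> float:
        return num / den if den > 0 else 0.0

    return {
        "f1_per_kg_co2": ratio(f1, emissions_kg),
        "f1_per_kwh": ratio(f1, energy_kwh),
        "f1_per_hour": ratio(f1, hours),
    }


# Values taken from the LoRA rank-16, 80% results row
metrics = efficiency_metrics(f1=0.636961, emissions_kg=0.057414,
                             energy_kwh=0.121954, hours=0.392177)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For this row the ratios come out at about 11.09 F1 per kg CO₂, 5.22 per kWh, and 1.62 per training hour.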

# LoRA Visualizations

In [43]:
# PLOT 1: LoRA Energy Consumption by Dataset Size
print("\nüìä Creating LoRA Energy Plot...")
df_sorted_lora = results_df_lora.sort_values('train_samples')

fig_lora_energy = go.Figure()

fig_lora_energy.add_trace(go.Bar(
    name='CPU Energy',
    x=df_sorted_lora['dataset_size%'],
    y=df_sorted_lora['cpu_energy_kwh'],
    marker_color='#FF6B6B',
    hovertemplate='<b>CPU Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig_lora_energy.add_trace(go.Bar(
    name='GPU Energy',
    x=df_sorted_lora['dataset_size%'],
    y=df_sorted_lora['gpu_energy_kwh'],
    marker_color='#4ECDC4',
    hovertemplate='<b>GPU Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig_lora_energy.add_trace(go.Bar(
    name='RAM Energy',
    x=df_sorted_lora['dataset_size%'],
    y=df_sorted_lora['ram_energy_kwh'],
    marker_color='#95E1D3',
    hovertemplate='<b>RAM Energy</b><br>%{y:.6f} kWh<br>Dataset: %{x:.0f}%<extra></extra>'
))

fig_lora_energy.update_layout(
    title=dict(text="LoRA: Energy Consumption by Dataset Size", font=dict(size=18)),
    xaxis_title='Dataset Size (%)',
    yaxis_title='Energy Consumption (kWh)',
    barmode='stack',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='x unified'
)

fig_lora_energy.show()
fig_lora_energy.write_html("/content/drive/MyDrive/lora_energy_by_dataset.html")
print("✅ LoRA Energy Plot saved: lora_energy_by_dataset.html")

# PLOT 2: LoRA Performance & Emissions (Dual Y-axis)
print("\nüìä Creating LoRA Performance vs Emissions Plot...")
df_sorted_lora = results_df_lora.sort_values('train_samples')

fig_lora_perf = make_subplots(specs=[[{"secondary_y": True}]])

# F1 Score line
fig_lora_perf.add_trace(
    go.Scatter(
        x=df_sorted_lora['dataset_size%'],
        y=df_sorted_lora['f1_score'],
        name='F1 Score',
        mode='lines+markers',
        line=dict(color='#4ECDC4', width=3),
        marker=dict(size=12, line=dict(width=2, color='white')),
        hovertemplate='<b>F1 Score</b>: %{y:.4f}<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=False
)

# Exact Match line
fig_lora_perf.add_trace(
    go.Scatter(
        x=df_sorted_lora['dataset_size%'],
        y=df_sorted_lora['exact_match'],
        name='Exact Match',
        mode='lines+markers',
        line=dict(color='#95E1D3', width=3, dash='dash'),
        marker=dict(size=10),
        hovertemplate='<b>Exact Match</b>: %{y:.4f}<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=False
)

# CO2 Emissions bar
fig_lora_perf.add_trace(
    go.Bar(
        x=df_sorted_lora['dataset_size%'],
        y=df_sorted_lora['emissions_kg'],
        name='CO₂ Emissions',
        marker_color='#FF6B6B',
        opacity=0.6,
        hovertemplate='<b>CO₂</b>: %{y:.6f} kg<br>Dataset: %{x:.0f}%<extra></extra>'
    ),
    secondary_y=True
)

fig_lora_perf.update_xaxes(title_text="Dataset Size (%)")
fig_lora_perf.update_yaxes(title_text="Performance Score", secondary_y=False)
fig_lora_perf.update_yaxes(title_text="CO₂ Emissions (kg)", secondary_y=True)

fig_lora_perf.update_layout(
    title=dict(text="LoRA: Performance vs Carbon Emissions by Dataset Size", font=dict(size=18)),
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig_lora_perf.show()
fig_lora_perf.write_html("/content/drive/MyDrive/lora_performance_emissions.html")
print("✅ LoRA Performance Plot saved: lora_performance_emissions.html")


📊 Creating LoRA Energy Plot...


✅ LoRA Energy Plot saved: lora_energy_by_dataset.html

📊 Creating LoRA Performance vs Emissions Plot...


✅ LoRA Performance Plot saved: lora_performance_emissions.html


# Training Strategy 3: Few-shot Learning With Frozen Backbone (Model BERT)

In [None]:
from peft import LoraConfig, get_peft_model, TaskType, PeftModel

# STEP 1: Custom Model with Frozen Backbone

In [44]:
def create_frozen_model(model_name="bert-base-uncased"):
    """Create model with frozen backbone (only QA head is trainable)."""

    # Load base model
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Freeze ALL parameters first
    for param in model.parameters():
        param.requires_grad = False

    # Unfreeze ONLY the QA head (classifier layer)
    # For BERT: qa_outputs layer
    for param in model.qa_outputs.parameters():
        param.requires_grad = True

    # Count parameters
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in model.parameters())

    print(f"\nüì¶ Model Configuration:")
    print(f"   Total Parameters: {total_params:,}")
    print(f"   Trainable Parameters: {trainable_params:,}")
    print(f"   Frozen Parameters: {total_params - trainable_params:,}")
    print(f"   Trainable Percentage: {100 * trainable_params / total_params:.4f}%")

    return model, trainable_params, total_params

def prepare_fewshot_dataset(train_data, num_shots, preprocess_fn):
    """Prepare few-shot dataset with specified number of examples."""

    # Select only num_shots examples
    train_subset = train_data.select(range(num_shots))

    print(f"üéØ Creating few-shot dataset with {num_shots} examples...")

    tokenized_train = train_subset.map(
        preprocess_fn,
        batched=True,
        remove_columns=train_subset.column_names
    )

    # The sliding window splits long contexts into several features, so the
    # tokenized set can contain more samples than num_shots
    actual_samples = len(tokenized_train)
    print(f"   Original examples: {num_shots}")
    print(f"   After tokenization (with sliding window): {actual_samples} samples")

    return tokenized_train, num_shots  # Return original num_shots for tracking

def train_fewshot_model(tokenized_train, tokenized_eval, tokenizer, compute_metrics_fn,
                        num_shots, model_name):
    """Train BERT model with frozen backbone (few-shot learning)."""

    # Create frozen model
    model, trainable_params, total_params = create_frozen_model(model_name)

    # Setup output directory
    output_dir = f"results_bert_fewshot_{num_shots}shots"

    # Training arguments - DIFFERENT from full fine-tuning
    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        learning_rate=5e-4,  # Higher LR since we're only training the head
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=10,  # More epochs for few-shot
        weight_decay=0.01,
        fp16=torch.cuda.is_available(),
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        push_to_hub=False,
        logging_steps=50,
        greater_is_better=True,
        warmup_ratio=0.1
    )

    # Initialize trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        processing_class=tokenizer,  # `tokenizer=` is deprecated in Trainer.__init__
        data_collator=default_data_collator,
        compute_metrics=compute_metrics_fn
    )

    # Start carbon tracking
    tracker = EmissionsTracker(
        project_name=f"BERT_FewShot_{num_shots}shots",
        output_dir=output_dir,
        save_to_file=True,
        log_level="info"
    )
    tracker.start()

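    # Train; stop the tracker in `finally` so a failed run cannot leave a
    # tracker running in the background and inflating later measurements
    print(f"\n🚀 Training few-shot model ({num_shots} examples)...")
    try:
        train_results = trainer.train()
    finally:
        tracker.stop()

    # Detailed emissions data recorded when the tracker stopped
    emissions_data = tracker.final_emissions_data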

    return trainer, train_results, emissions_data, output_dir, model, trainable_params, total_params
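The frozen-backbone setup leaves only `qa_outputs` trainable. In BERT-base that head is a single linear layer mapping each 768-dimensional token representation to two logits (answer start and answer end), which accounts for the 1,538 trainable parameters the runs below report. A small arithmetic check, using the total of 108,893,186 printed by `create_frozen_model`:

```python
HIDDEN = 768                # BERT-base hidden size
NUM_LABELS = 2              # start and end logits per token
TOTAL_PARAMS = 108_893_186  # total reported by create_frozen_model

qa_head = HIDDEN * NUM_LABELS + NUM_LABELS  # weight matrix + bias vector
frozen = TOTAL_PARAMS - qa_head
pct = 100 * qa_head / TOTAL_PARAMS

print(f"trainable={qa_head:,} frozen={frozen:,} ({pct:.4f}%)")
```

This reproduces the logged split: 1,538 trainable and 108,891,648 frozen parameters, or 0.0014% of the model.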

# STEP 2: Few-Shot Evaluation and Experiment Functions

In [45]:


def evaluate_and_save_fewshot(trainer, train_results, emissions_data, output_dir,
                               num_shots, trainable_params, total_params):
    """Evaluate few-shot model and save results."""

    print("üìä Evaluating few-shot model...")
    eval_results = trainer.evaluate()

    trainable_percentage = 100 * trainable_params / total_params

    # Compile results
    result_entry = {
        "training_method": "Few-Shot (Frozen Backbone)",
        "model_name": "BERT",
        "num_shots": num_shots,
        "train_samples": num_shots,
        "valid_samples": len(trainer.eval_dataset),  # use the trainer's eval set, not a global
        "trainable_params": trainable_params,
        "total_params": total_params,
        "trainable_percentage": trainable_percentage,

        # Performance
        "f1_score": eval_results["eval_f1"],
        "exact_match": eval_results["eval_exact_match"],
        "eval_loss": eval_results["eval_loss"],
        "training_time_hours": train_results.metrics["train_runtime"] / 3600,

        # Emissions
        "timestamp": emissions_data.timestamp,
        "duration_seconds": emissions_data.duration,
        "duration_hours": emissions_data.duration / 3600,
        "emissions_kg": emissions_data.emissions,
        "emissions_rate_kg_per_s": emissions_data.emissions_rate,

        # Energy
        "energy_consumed_kwh": emissions_data.energy_consumed,
        "cpu_energy_kwh": emissions_data.cpu_energy,
        "gpu_energy_kwh": emissions_data.gpu_energy,
        "ram_energy_kwh": emissions_data.ram_energy,

        # Power
        "cpu_power_w": emissions_data.cpu_power,
        "gpu_power_w": emissions_data.gpu_power,
        "ram_power_w": emissions_data.ram_power,

        # Location
        "country_name": emissions_data.country_name,
        "country_iso_code": emissions_data.country_iso_code,
        "region": emissions_data.region,
        "cloud_provider": emissions_data.cloud_provider,
        "cloud_region": emissions_data.cloud_region,
        "on_cloud": emissions_data.on_cloud,

        # System
        "os": emissions_data.os,
        "python_version": emissions_data.python_version,
        "cpu_model": emissions_data.cpu_model,
        "cpu_count": emissions_data.cpu_count,
        "gpu_model": emissions_data.gpu_model,
        "gpu_count": emissions_data.gpu_count,
        "ram_total_size_gb": emissions_data.ram_total_size,

        # Additional
        "pue": emissions_data.pue,
        "codecarbon_version": emissions_data.codecarbon_version,
    }

    # Print summary
    print(f"\n{'='*80}")
    print(f"  FEW-SHOT LEARNING RESULTS ({num_shots} examples)")
    print(f"{'='*80}")
    print(f"\nüì¶ Model Configuration:")
    print(f"   Training Method: Few-Shot (Frozen Backbone)")
    print(f"   Training Examples: {num_shots}")
    print(f"   Trainable Parameters: {trainable_params:,} ({trainable_percentage:.4f}%)")
    print(f"   Frozen Parameters: {total_params - trainable_params:,}")

    print(f"\nüìà Performance:")
    print(f"   F1 Score: {eval_results['eval_f1']:.4f}")
    print(f"   Exact Match: {eval_results['eval_exact_match']:.4f}")
    print(f"   Eval Loss: {eval_results['eval_loss']:.4f}")

    print(f"\n⚡ Energy:")
    print(f"   Total: {emissions_data.energy_consumed:.6f} kWh")
    if emissions_data.energy_consumed > 0:
        print(f"   GPU: {emissions_data.gpu_energy:.6f} kWh ({emissions_data.gpu_energy/emissions_data.energy_consumed*100:.1f}%)")
        print(f"   CPU: {emissions_data.cpu_energy:.6f} kWh ({emissions_data.cpu_energy/emissions_data.energy_consumed*100:.1f}%)")

    print(f"\nüåç Carbon:")
    print(f"   CO‚ÇÇ Emissions: {emissions_data.emissions:.6f} kg")
    print(f"   Training Time: {train_results.metrics['train_runtime']/3600:.2f} hours")
    print(f"{'='*80}")

    # Save model
    trainer.save_model(f"{output_dir}/final_model")
    print(f"✅ Model saved to {output_dir}/final_model")

    # Clean up
    del trainer.model
    del trainer
    torch.cuda.empty_cache()

    return result_entry

def run_fewshot_experiment(num_shots, train_data, eval_data, tokenizer, preprocess_fn,
                           compute_metrics_fn, model_name):
    """Run complete few-shot learning experiment."""

    print(f"\n{'='*60}")
    print(f"  Few-Shot Learning with {num_shots} examples")
    print(f"{'='*60}")

    # Step 1: Prepare few-shot dataset
    tokenized_train, num_shots = prepare_fewshot_dataset(train_data, num_shots, preprocess_fn)

    # Step 2: Train with frozen backbone
    trainer, train_results, emissions_data, output_dir, model, trainable_params, total_params = train_fewshot_model(
        tokenized_train, eval_data, tokenizer, compute_metrics_fn,
        num_shots, model_name
    )

    # Step 3: Evaluate and save
    result_entry = evaluate_and_save_fewshot(
        trainer, train_results, emissions_data, output_dir,
        num_shots, trainable_params, total_params
    )

    return result_entry
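`run_fewshot_experiment` passes `num_shots` to `prepare_fewshot_dataset`, which keeps the first `num_shots` rows via `train_data.select(range(num_shots))`. SQuAD's training split is grouped by article, so the first 100 rows likely cover only a very small number of articles. A common alternative, sketched below as an option rather than what this notebook ran, is to draw indices at random with a fixed seed and pass them to `select`. The helper name `fewshot_indices` and the seed value are hypothetical.

```python
import random


def fewshot_indices(dataset_size: int, num_shots: int, seed: int = 42) -> list[int]:
    """Hypothetical helper: seeded random indices for a few-shot subset.

    Shuffles all indices once with a fixed seed and keeps a prefix, so a larger
    shot count always contains the smaller ones (100 within 500 within 1000),
    as with the original first-N selection, but without the article-order bias.
    """
    if not 0 < num_shots <= dataset_size:
        raise ValueError("num_shots must be between 1 and dataset_size")
    order = list(range(dataset_size))
    random.Random(seed).shuffle(order)
    return sorted(order[:num_shots])


# Illustrative index space; in the notebook this would be len(train_data)
idx_100 = fewshot_indices(10_000, 100)
idx_500 = fewshot_indices(10_000, 500)
print(len(idx_100), len(idx_500), set(idx_100) <= set(idx_500))
# Possible use: train_data.select(fewshot_indices(len(train_data), num_shots))
```

Keeping the subsets nested means differences between the 100-, 500-, and 1,000-shot runs reflect added data rather than a different draw.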

# STEP 3: Running Few-Shot Experiments (100, 500, and 1,000 Shots)

In [46]:
result_fewshot = []

# Experiment 1: 100-shot
print("\n" + "="*80)
print("  EXPERIMENT 1: 100-shot Learning")
print("="*80)

result_100 = run_fewshot_experiment(
    num_shots=100,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="bert-base-uncased"
)
result_fewshot.append(result_100)

# Experiment 2: 500-shot
print("\n" + "="*80)
print("  EXPERIMENT 2: 500-shot Learning")
print("="*80)

result_500 = run_fewshot_experiment(
    num_shots=500,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="bert-base-uncased"
)
result_fewshot.append(result_500)

# Experiment 3: 1000-shot
print("\n" + "="*80)
print("  EXPERIMENT 3: 1000-shot Learning")
print("="*80)

result_1000 = run_fewshot_experiment(
    num_shots=1000,
    train_data=squad["train"],
    eval_data=tokenized_validation,
    tokenizer=tokenizer,
    preprocess_fn=preprocess_function,
    compute_metrics_fn=compute_metrics,
    model_name="bert-base-uncased"
)
result_fewshot.append(result_1000)
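In the epoch tables that follow, the Training Loss column often reads "No log" or repeats the previous epoch's value. This follows from `logging_steps=50`: the Trainer records a training loss only every 50 optimizer steps, while an epoch here is just ceil(features / 16) steps long. The sketch below predicts which epochs receive a fresh loss value, assuming a single device, no gradient accumulation, and the default of keeping the last partial batch. The function name is hypothetical.

```python
import math


def epochs_with_new_loss(num_features: int, batch_size: int = 16,
                         logging_steps: int = 50, num_epochs: int = 10):
    """Hypothetical helper: which epochs contain a training-loss log step.

    Assumes one device, no gradient accumulation, and that the last partial
    batch is kept, so an epoch is ceil(num_features / batch_size) steps.
    """
    steps_per_epoch = math.ceil(num_features / batch_size)
    total_steps = steps_per_epoch * num_epochs
    logged_epochs = set()
    # A loss is logged whenever the global step is a multiple of logging_steps
    for step in range(logging_steps, total_steps + 1, logging_steps):
        logged_epochs.add(math.ceil(step / steps_per_epoch))
    return steps_per_epoch, sorted(logged_epochs)


# Feature counts after tokenization, as printed for the 100- and 500-shot runs
for features in (100, 527):
    steps, epochs = epochs_with_new_loss(features)
    print(f"{features} features: {steps} steps/epoch, fresh loss in epochs {epochs}")
```

For the 100-shot run (7 steps per epoch, 70 in total) the only log lands at step 50, in epoch 8. For the 500-shot run (527 features, 33 steps per epoch) epochs 3, 6, and 9 contain no multiple of 50 and therefore repeat the previous value, matching the tables. Setting `logging_strategy="epoch"` would instead log once per epoch.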



  EXPERIMENT 1: 100-shot Learning

  Few-Shot Learning with 100 examples
🎯 Creating few-shot dataset with 100 examples...



   Original examples: 100
   After tokenization (with sliding window): 100 samples


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 04:50:45] [setup] RAM Tracking...
[codecarbon INFO @ 04:50:45] [setup] CPU Tracking...



📦 Model Configuration:
   Total Parameters: 108,893,186
   Trainable Parameters: 1,538
   Frozen Parameters: 108,891,648
   Trainable Percentage: 0.0014%


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 04:50:47] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 04:50:47] [setup] GPU Tracking...
[codecarbon INFO @ 04:50:47] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 04:50:47] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 04:50:47] >>> Tracker's metadata:
[codecarbon INFO @ 04:50:47]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 04:50:47]   Python version: 3.12.12
[codecarbon INFO @ 04:50:47]   CodeCarbon version: 3.2.0
[codecarbon INFO @ 04:50:47]   Available RAM : 167.052 GB
[codecarbon INFO @ 04:50:47]   CPU count: 12 thread(s) in 1 physical CPU(s)
[codecarbon INFO @ 04:50:


🚀 Training few-shot model (100 examples)...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,No log,5.924752,0.0,0.014575
2,No log,5.855825,0.0,0.0127
3,No log,5.808675,0.000165,0.012308
4,No log,5.774853,0.000577,0.012367
5,No log,5.749101,0.000742,0.01291
6,No log,5.727791,0.000742,0.012797
7,No log,5.712149,0.000742,0.013081
8,5.625500,5.701196,0.000742,0.013084
9,5.625500,5.695786,0.000742,0.013134
10,5.625500,5.693785,0.000742,0.013062


[codecarbon INFO @ 04:50:54] Energy consumed for RAM : 0.074954 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:50:54] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:50:54] Energy consumed for All CPU : 0.063723 kWh
[codecarbon INFO @ 04:50:54] Energy consumed for all GPUs : 0.256585 kWh. Total GPU Power : 101.25730933709399 W
[codecarbon INFO @ 04:50:54] 0.395263 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:50:54] 0.020644 g.CO2eq/s mean an estimation of 651.0169095307442 kg.CO2eq/year
[codecarbon INFO @ 04:51:03] Energy consumed for RAM : 0.000209 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:51:03] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:51:03] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:51:03] Energy consumed for all GPUs : 0.000806 kWh. Total GPU Power : 193.14858855392205 W
[codecarbon INFO @ 04:51:03] 0.0

📊 Evaluating few-shot model...


[codecarbon INFO @ 04:54:39] Energy consumed for RAM : 0.078077 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:54:39] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:54:39] Energy consumed for All CPU : 0.066379 kWh
[codecarbon INFO @ 04:54:39] Energy consumed for all GPUs : 0.269451 kWh. Total GPU Power : 203.16342907277226 W
[codecarbon INFO @ 04:54:39] 0.413907 kWh of electricity and 0.000000 L of water were used since the beginning.



  FEW-SHOT LEARNING RESULTS (100 examples)

📦 Model Configuration:
   Training Method: Few-Shot (Frozen Backbone)
   Training Examples: 100
   Trainable Parameters: 1,538 (0.0014%)
   Frozen Parameters: 108,891,648

📈 Performance:
   F1 Score: 0.0146
   Exact Match: 0.0000
   Eval Loss: 5.9248

⚡ Energy:
   Total: 0.017912 kWh
   GPU: 0.012341 kWh (68.9%)
   CPU: 0.002560 kWh (14.3%)

🌍 Carbon:
   CO₂ Emissions: 0.008433 kg
   Training Time: 0.06 hours
✅ Model saved to results_bert_fewshot_100shots/final_model

  EXPERIMENT 2: 500-shot Learning

  Few-Shot Learning with 500 examples
🎯 Creating few-shot dataset with 500 examples...



   Original examples: 500
   After tokenization (with sliding window): 527 samples


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 04:54:48] [setup] RAM Tracking...
[codecarbon INFO @ 04:54:48] [setup] CPU Tracking...



📦 Model Configuration:
   Total Parameters: 108,893,186
   Trainable Parameters: 1,538
   Frozen Parameters: 108,891,648
   Trainable Percentage: 0.0014%


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 04:54:49] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 04:54:49] [setup] GPU Tracking...
[codecarbon INFO @ 04:54:49] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 04:54:49] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 04:54:49] >>> Tracker's metadata:
[codecarbon INFO @ 04:54:49]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 04:54:49]   Python version: 3.12.12
[codecarbon INFO @ 04:54:49]   CodeCarbon version: 3.2.0
[codecarbon INFO @ 04:54:49]   Available RAM : 167.052 GB
[codecarbon INFO @ 04:54:49]   CPU count: 12 thread(s) in 1 physical CPU(s)
[codecarbon INFO @ 04:54:


🚀 Training few-shot model (500 examples)...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,No log,5.75598,0.0,0.012284
2,5.770100,5.371894,0.001401,0.014978
3,5.770100,5.057966,0.002967,0.016482
4,4.997200,4.795357,0.005357,0.01791
5,4.610500,4.569274,0.018461,0.029619
6,4.610500,4.460274,0.027361,0.038394
7,4.360200,4.342306,0.04747,0.058119
8,4.243800,4.287438,0.055711,0.066543
9,4.243800,4.254592,0.061398,0.07197
10,4.177900,4.242732,0.062964,0.073544


[codecarbon INFO @ 04:54:54] Energy consumed for RAM : 0.078286 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:54:54] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:54:54] Energy consumed for All CPU : 0.066556 kWh
[codecarbon INFO @ 04:54:54] Energy consumed for all GPUs : 0.270103 kWh. Total GPU Power : 156.53881007613268 W
[codecarbon INFO @ 04:54:54] 0.414944 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:54:54] 0.038316 g.CO2eq/s mean an estimation of 1,208.3394693046523 kg.CO2eq/year
[codecarbon INFO @ 04:55:06] Energy consumed for RAM : 0.000208 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:55:06] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:55:06] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:55:06] Energy consumed for all GPUs : 0.000820 kWh. Total GPU Power : 196.80410673068698 W
[codecarbon INFO @ 04:55:06] 0
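CodeCarbon's periodic log line converts the current emission rate into a yearly figure. Assuming a 365-day year (our assumption; it reproduces the logged value to within the rounding of the printed rate), the conversion is a plain unit change from grams per second to kilograms per year. `annualize` is a hypothetical helper for illustration.

```python
# Reproduce the annualized estimate from the log line
# "0.038316 g.CO2eq/s mean an estimation of 1,208.3394693046523 kg.CO2eq/year".
# Assumption: a 365-day year; the logged rate is itself rounded to 6 decimals.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s


def annualize(rate_g_per_s: float) -> float:
    """Convert an emission rate in g CO2eq/s to kg CO2eq/year."""
    return rate_g_per_s * SECONDS_PER_YEAR / 1000.0


kg_per_year = annualize(0.038316)
print(f"{kg_per_year:,.1f} kg CO2eq/year")
```

This extrapolates an instantaneous rate to continuous year-round operation, so it describes the hardware's draw at that moment rather than the few-minute runs measured in this experiment.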

📊 Evaluating few-shot model...


[codecarbon INFO @ 04:58:39] Energy consumed for RAM : 0.081410 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:58:39] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:58:39] Energy consumed for All CPU : 0.069211 kWh
[codecarbon INFO @ 04:58:39] Energy consumed for all GPUs : 0.283008 kWh. Total GPU Power : 204.82361456618924 W
[codecarbon INFO @ 04:58:39] 0.433629 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:58:54] Energy consumed for RAM : 0.081618 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:58:54] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:58:54] Energy consumed for All CPU : 0.069389 kWh
[codecarbon INFO @ 04:58:54] Energy consumed for all GPUs : 0.283893 kWh. Total GPU Power : 212.27084237372742 W
[codecarbon INFO @ 04:58:54] 0.434899 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:


  FEW-SHOT LEARNING RESULTS (500 examples)

📦 Model Configuration:
   Training Method: Few-Shot (Frozen Backbone)
   Training Examples: 500
   Trainable Parameters: 1,538 (0.0014%)
   Frozen Parameters: 108,891,648

📈 Performance:
   F1 Score: 0.0735
   Exact Match: 0.0630
   Eval Loss: 4.2427

⚡ Energy:
   Total: 0.018724 kWh
   GPU: 0.012918 kWh (69.0%)
   CPU: 0.002668 kWh (14.3%)

🌍 Carbon:
   CO₂ Emissions: 0.008815 kg
   Training Time: 0.06 hours
✅ Model saved to results_bert_fewshot_500shots/final_model
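The energy summary lists GPU and CPU but not RAM. Assuming the reported total is the sum of the CPU, GPU, and RAM components (the three trackers CodeCarbon sets up in the log above), the RAM share is simply the remainder. The sketch uses the rounded 500-shot figures, so its percentages can differ from the printed ones in the last digit; `energy_breakdown` is a hypothetical helper.

```python
# Derive the unreported RAM component from the 500-shot energy summary.
# Assumption: total = CPU + GPU + RAM; inputs are the rounded printed values.
def energy_breakdown(total_kwh: float, gpu_kwh: float, cpu_kwh: float):
    """Return (ram_kwh, shares), where shares are fractions of the total."""
    ram_kwh = total_kwh - gpu_kwh - cpu_kwh
    shares = {
        "gpu": gpu_kwh / total_kwh,
        "cpu": cpu_kwh / total_kwh,
        "ram": ram_kwh / total_kwh,
    }
    return ram_kwh, shares


ram_kwh, shares = energy_breakdown(total_kwh=0.018724, gpu_kwh=0.012918, cpu_kwh=0.002668)
print(f"RAM: {ram_kwh:.6f} kWh")
for component, share in shares.items():
    print(f"{component.upper()}: {share:.1%}")
```

On these numbers RAM accounts for roughly 17% of the run, slightly more than the CPU.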

  EXPERIMENT 3: 1000-shot Learning

  Few-Shot Learning with 1000 examples
🎯 Creating few-shot dataset with 1000 examples...



   Original examples: 1000
   After tokenization (with sliding window): 1027 samples


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.

[codecarbon INFO @ 04:58:59] [setup] RAM Tracking...
[codecarbon INFO @ 04:58:59] [setup] CPU Tracking...



📦 Model Configuration:
   Total Parameters: 108,893,186
   Trainable Parameters: 1,538
   Frozen Parameters: 108,891,648
   Trainable Percentage: 0.0014%


 Linux OS detected: Please ensure RAPL files exist, and are readable, at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 04:59:01] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 04:59:01] [setup] GPU Tracking...
[codecarbon INFO @ 04:59:01] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 04:59:01] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: pynvml
            
[codecarbon INFO @ 04:59:01] >>> Tracker's metadata:
[codecarbon INFO @ 04:59:01]   Platform system: Linux-6.6.105+-x86_64-with-glibc2.35
[codecarbon INFO @ 04:59:01]   Python version: 3.12.12
[codecarbon INFO @ 04:59:01]   CodeCarbon version: 3.2.0
[codecarbon INFO @ 04:59:01]   Available RAM : 167.052 GB
[codecarbon INFO @ 04:59:01]   CPU count: 12 thread(s) in 1 physical CPU(s)
[codecarbon INFO @ 04:59:


🚀 Training few-shot model (1000 examples)...


Epoch,Training Loss,Validation Loss,Exact Match,F1
1,5.9013,5.600582,0.000659,0.013632
2,5.189,5.104867,0.004285,0.017886
3,4.6614,4.710816,0.005934,0.02163
4,4.1831,4.469881,0.007747,0.023964
5,4.0879,4.324586,0.008818,0.025534
6,3.9962,4.237072,0.011043,0.027836
7,3.8591,4.189314,0.01228,0.029932
8,3.8485,4.15816,0.013351,0.031304
9,3.8366,4.13944,0.013763,0.03189
10,3.7789,4.138565,0.013763,0.031691
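The 1000-shot results block reports F1 0.0319, exact match 0.0138, and eval loss 4.1394, which match epoch 9 rather than the final epoch 10. That is consistent with the Trainer restoring the checkpoint with the best validation F1 (for example `load_best_model_at_end=True` with `metric_for_best_model="f1"`), but that configuration is not shown in this section, so treat it as an assumption. The sketch below picks the best row from the epoch log above; `best_epoch` is a hypothetical helper.

```python
import csv
import io

# Epoch log for the 1000-shot run, copied from the training output above.
EPOCH_LOG = """Epoch,Training Loss,Validation Loss,Exact Match,F1
1,5.9013,5.600582,0.000659,0.013632
2,5.189,5.104867,0.004285,0.017886
3,4.6614,4.710816,0.005934,0.02163
4,4.1831,4.469881,0.007747,0.023964
5,4.0879,4.324586,0.008818,0.025534
6,3.9962,4.237072,0.011043,0.027836
7,3.8591,4.189314,0.01228,0.029932
8,3.8485,4.15816,0.013351,0.031304
9,3.8366,4.13944,0.013763,0.03189
10,3.7789,4.138565,0.013763,0.031691
"""


def best_epoch(log_text: str, metric: str = "F1") -> dict:
    """Return the epoch row with the highest value of `metric`.

    Ties keep the earliest epoch because only a strictly greater value
    replaces the current best.
    """
    rows = list(csv.DictReader(io.StringIO(log_text)))
    best = rows[0]
    for row in rows[1:]:
        if float(row[metric]) > float(best[metric]):
            best = row
    return best


best = best_epoch(EPOCH_LOG)
print(
    f"Best epoch by F1: {best['Epoch']} "
    f"(F1={float(best['F1']):.4f}, EM={float(best['Exact Match']):.4f}, "
    f"loss={float(best['Validation Loss']):.4f})"
)
```

Selecting by validation loss instead would pick epoch 10 (4.1386), which does not match the reported 4.1394, so a loss-based criterion is unlikely here.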


[codecarbon INFO @ 04:59:09] Energy consumed for RAM : 0.081826 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:59:09] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:59:09] Energy consumed for All CPU : 0.069566 kWh
[codecarbon INFO @ 04:59:09] Energy consumed for all GPUs : 0.284532 kWh. Total GPU Power : 153.25851142057635 W
[codecarbon INFO @ 04:59:09] 0.435924 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:59:17] Energy consumed for RAM : 0.000208 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 04:59:17] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 04:59:17] Energy consumed for All CPU : 0.000177 kWh
[codecarbon INFO @ 04:59:17] Energy consumed for all GPUs : 0.000828 kWh. Total GPU Power : 198.55349976473596 W
[codecarbon INFO @ 04:59:17] 0.001214 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 04:

📊 Evaluating few-shot model...


[codecarbon INFO @ 05:03:09] Energy consumed for RAM : 0.085160 kWh. RAM Power : 50.0 W
[codecarbon INFO @ 05:03:09] Delta energy consumed for CPU with constant : 0.000177 kWh, power : 42.5 W
[codecarbon INFO @ 05:03:09] Energy consumed for All CPU : 0.072400 kWh
[codecarbon INFO @ 05:03:09] Energy consumed for all GPUs : 0.298286 kWh. Total GPU Power : 203.96594628020665 W
[codecarbon INFO @ 05:03:09] 0.455846 kWh of electricity and 0.000000 L of water were used since the beginning.



  FEW-SHOT LEARNING RESULTS (1000 examples)

📦 Model Configuration:
   Training Method: Few-Shot (Frozen Backbone)
   Training Examples: 1000
   Trainable Parameters: 1,538 (0.0014%)
   Frozen Parameters: 108,891,648

📈 Performance:
   F1 Score: 0.0319
   Exact Match: 0.0138
   Eval Loss: 4.1394

⚡ Energy:
   Total: 0.019472 kWh
   GPU: 0.013425 kWh (68.9%)
   CPU: 0.002779 kWh (14.3%)

🌍 Carbon:
   CO₂ Emissions: 0.009167 kg
   Training Time: 0.07 hours
✅ Model saved to results_bert_fewshot_1000shots/final_model
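Dividing each run's CO₂ figure by its total energy gives the implied grid carbon intensity. Assuming emissions are computed as energy multiplied by a single regional intensity, the 500-shot and 1000-shot runs should yield the same ratio, and they do (about 0.471 kg CO₂eq per kWh). The 100-shot total energy is not shown in this section, so that run is left out; `implied_intensity` is a hypothetical helper.

```python
# Implied grid carbon intensity = emissions / energy, using the printed totals.
# Assumption: emissions_kg = energy_kwh * intensity, with one intensity for all runs.
runs = {
    500: {"energy_kwh": 0.018724, "emissions_kg": 0.008815},
    1000: {"energy_kwh": 0.019472, "emissions_kg": 0.009167},
}


def implied_intensity(energy_kwh: float, emissions_kg: float) -> float:
    """Return kg CO2eq emitted per kWh consumed."""
    return emissions_kg / energy_kwh


intensities = {shots: implied_intensity(**r) for shots, r in runs.items()}
for shots, value in intensities.items():
    print(f"{shots}-shot: {value:.4f} kg CO2eq/kWh")
```

A constant ratio means differences in emissions between runs come entirely from differences in energy, not from a change in the grid factor.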


# STEP 4: Results and Analysis

In [47]:
import pandas as pd

# Create summary DataFrame
results_df_fewshot = pd.DataFrame(result_fewshot)

print("\n" + "="*60)
print("  FEW-SHOT LEARNING RESULTS SUMMARY")
print("="*60)
print(results_df_fewshot[['num_shots', 'trainable_percentage', 'f1_score',
                          'exact_match', 'emissions_kg', 'training_time_hours']].to_string(index=False))

# Save to CSV
results_df_fewshot.to_csv("/content/drive/MyDrive/bert_fewshot_results.csv", index=False)
print("\n✅ Few-shot results saved!")


  FEW-SHOT LEARNING RESULTS SUMMARY
 num_shots  trainable_percentage  f1_score  exact_match  emissions_kg  training_time_hours
       100              0.001412  0.014575     0.000000      0.008433             0.060057
       500              0.001412  0.073544     0.062964      0.008815             0.062605
      1000              0.001412  0.031890     0.013763      0.009167             0.065212

✅ Few-shot results saved!


# FEW-SHOT EFFICIENCY ANALYSIS

In [48]:
print("\n" + "="*80)
print("  FEW-SHOT EFFICIENCY ANALYSIS")
print("="*80)

# Use 500-shot as baseline (middle ground)
baseline = results_df_fewshot[results_df_fewshot['num_shots'] == 500].iloc[0]

for _, row in results_df_fewshot.iterrows():
    shots = row['num_shots']
    samples_ratio = row['num_shots'] / baseline['num_shots']
    f1_diff = row['f1_score'] - baseline['f1_score']
    emissions_diff = row['emissions_kg'] - baseline['emissions_kg']
    time_diff = row['training_time_hours'] - baseline['training_time_hours']

    print(f"\n{shots}-Shot Learning:")
    print(f"  Training Examples: {row['num_shots']:,}")
    print(f"  Trainable Params: {row['trainable_params']:,} ({row['trainable_percentage']:.4f}%)")
    print(f"  vs 500-shot: {samples_ratio:.2f}x training data")
    print(f"  F1 Score: {row['f1_score']:.4f} ({f1_diff:+.4f} vs 500-shot)")
    print(f"  Emissions: {row['emissions_kg']:.6f} kg ({emissions_diff:+.6f} vs 500-shot)")
    print(f"  Training Time: {row['training_time_hours']:.2f} hours ({time_diff:+.2f} vs 500-shot)")

    # Efficiency metrics
    efficiency_co2 = row['f1_score'] / row['emissions_kg'] if row['emissions_kg'] > 0 else 0
    efficiency_time = row['f1_score'] / row['training_time_hours'] if row['training_time_hours'] > 0 else 0
    efficiency_samples = row['f1_score'] / row['num_shots'] if row['num_shots'] > 0 else 0

    print(f"  Efficiency (F1/kg CO₂): {efficiency_co2:.2f}")
    print(f"  Efficiency (F1/hour): {efficiency_time:.4f}")
    print(f"  Efficiency (F1/sample): {efficiency_samples:.6f}")



  FEW-SHOT EFFICIENCY ANALYSIS

100-Shot Learning:
  Training Examples: 100
  Trainable Params: 1,538 (0.0014%)
  vs 500-shot: 0.20x training data
  F1 Score: 0.0146 (-0.0590 vs 500-shot)
  Emissions: 0.008433 kg (-0.000383 vs 500-shot)
  Training Time: 0.06 hours (-0.00 vs 500-shot)
  Efficiency (F1/kg CO₂): 1.73
  Efficiency (F1/hour): 0.2427
  Efficiency (F1/sample): 0.000146

500-Shot Learning:
  Training Examples: 500
  Trainable Params: 1,538 (0.0014%)
  vs 500-shot: 1.00x training data
  F1 Score: 0.0735 (+0.0000 vs 500-shot)
  Emissions: 0.008815 kg (+0.000000 vs 500-shot)
  Training Time: 0.06 hours (+0.00 vs 500-shot)
  Efficiency (F1/kg CO₂): 8.34
  Efficiency (F1/hour): 1.1747
  Efficiency (F1/sample): 0.000147

1000-Shot Learning:
  Training Examples: 1,000
  Trainable Params: 1,538 (0.0014%)
  vs 500-shot: 2.00x training data
  F1 Score: 0.0319 (-0.0417 vs 500-shot)
  Emissions: 0.009167 kg (+0.000352 vs 500-shot)
  Training Time: 0.07 hours (+0.00 vs 500-shot)
  Eff
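The loop above computes the efficiency ratios one row at a time. The same F1-per-kg-CO₂ comparison can be written as a small function with an explicit zero guard and used to rank the runs. This stdlib-only sketch uses the F1 and emissions values from the summary table (the notebook itself works on a pandas DataFrame); `safe_ratio` and `rank_by_efficiency` are hypothetical helpers, not part of the notebook.

```python
# Rank few-shot runs by F1 per kg of CO2, using the summary-table values.
# safe_ratio and rank_by_efficiency are hypothetical helpers for illustration.
def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is not positive."""
    return numerator / denominator if denominator > 0 else 0.0


def rank_by_efficiency(rows):
    """Return (num_shots, F1 per kg CO2) pairs, most efficient first."""
    scored = [(r["num_shots"], safe_ratio(r["f1_score"], r["emissions_kg"])) for r in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


results = [
    {"num_shots": 100, "f1_score": 0.014575, "emissions_kg": 0.008433},
    {"num_shots": 500, "f1_score": 0.073544, "emissions_kg": 0.008815},
    {"num_shots": 1000, "f1_score": 0.031890, "emissions_kg": 0.009167},
]

for shots, score in rank_by_efficiency(results):
    print(f"{shots}-shot: {score:.2f} F1 per kg CO2")
```

Because emissions vary by less than 9% across the three runs, this ranking is driven almost entirely by F1.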

# Few-shot Visualizations

In [49]:
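# Plotting imports used by both figures in this cell
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# PLOT 1: Few-Shot Energy Consumption by Shots
print("\n📊 Creating Few-Shot Energy Plot...")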
df_sorted_fewshot = results_df_fewshot.sort_values('num_shots')

fig_fewshot_energy = go.Figure()

fig_fewshot_energy.add_trace(go.Bar(
    name='CPU Energy',
    x=df_sorted_fewshot['num_shots'],
    y=df_sorted_fewshot['cpu_energy_kwh'],
    marker_color='#FF6B6B',
    hovertemplate='<b>CPU Energy</b><br>%{y:.6f} kWh<br>Shots: %{x}<extra></extra>'
))

fig_fewshot_energy.add_trace(go.Bar(
    name='GPU Energy',
    x=df_sorted_fewshot['num_shots'],
    y=df_sorted_fewshot['gpu_energy_kwh'],
    marker_color='#4ECDC4',
    hovertemplate='<b>GPU Energy</b><br>%{y:.6f} kWh<br>Shots: %{x}<extra></extra>'
))

fig_fewshot_energy.add_trace(go.Bar(
    name='RAM Energy',
    x=df_sorted_fewshot['num_shots'],
    y=df_sorted_fewshot['ram_energy_kwh'],
    marker_color='#95E1D3',
    hovertemplate='<b>RAM Energy</b><br>%{y:.6f} kWh<br>Shots: %{x}<extra></extra>'
))

fig_fewshot_energy.update_layout(
    title=dict(text="Few-Shot: Energy Consumption by Number of Examples", font=dict(size=18)),
    xaxis_title='Number of Training Examples',
    yaxis_title='Energy Consumption (kWh)',
    barmode='stack',
    template='plotly_white',
    height=500,
    font=dict(size=13),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='x unified'
)

fig_fewshot_energy.show()
fig_fewshot_energy.write_html("/content/drive/MyDrive/fewshot_energy_by_shots.html")
print("✅ Few-Shot Energy Plot saved: fewshot_energy_by_shots.html")

# PLOT 2: Few-Shot Performance & Emissions by Shots (Dual Y-axis)
# Reuses df_sorted_fewshot, already sorted by num_shots for Plot 1
print("\n📊 Creating Few-Shot Performance vs Emissions Plot...")

fig_fewshot_perf = make_subplots(specs=[[{"secondary_y": True}]])

# F1 Score line
fig_fewshot_perf.add_trace(
    go.Scatter(
        x=df_sorted_fewshot['num_shots'],
        y=df_sorted_fewshot['f1_score'],
        name='F1 Score',
        mode='lines+markers',
        line=dict(color='#4ECDC4', width=3),
        marker=dict(size=12, line=dict(width=2, color='white')),
        hovertemplate='<b>F1 Score</b>: %{y:.4f}<br>Shots: %{x}<extra></extra>'
    ),
    secondary_y=False
)

# Exact Match line
fig_fewshot_perf.add_trace(
    go.Scatter(
        x=df_sorted_fewshot['num_shots'],
        y=df_sorted_fewshot['exact_match'],
        name='Exact Match',
        mode='lines+markers',
        line=dict(color='#95E1D3', width=3, dash='dash'),
        marker=dict(size=10),
        hovertemplate='<b>Exact Match</b>: %{y:.4f}<br>Shots: %{x}<extra></extra>'
    ),
    secondary_y=False
)

# CO2 Emissions bar
fig_fewshot_perf.add_trace(
    go.Bar(
        x=df_sorted_fewshot['num_shots'],
        y=df_sorted_fewshot['emissions_kg'],
        name='CO₂ Emissions',
        marker_color='#FF6B6B',
        opacity=0.6,
        hovertemplate='<b>CO₂</b>: %{y:.6f} kg<br>Shots: %{x}<extra></extra>'
    ),
    secondary_y=True
)

fig_fewshot_perf.update_xaxes(title_text="Number of Training Examples")
fig_fewshot_perf.update_yaxes(title_text="Performance Score", secondary_y=False)
fig_fewshot_perf.update_yaxes(title_text="CO₂ Emissions (kg)", secondary_y=True)

fig_fewshot_perf.update_layout(
    title=dict(text="Few-Shot: Performance vs Carbon Emissions", font=dict(size=18)),
    template='plotly_white',
    height=500,
    font=dict(size=13),
    hovermode='x unified',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig_fewshot_perf.show()
fig_fewshot_perf.write_html("/content/drive/MyDrive/fewshot_performance_emissions.html")
print("✅ Few-Shot Performance Plot saved: fewshot_performance_emissions.html")




📊 Creating Few-Shot Energy Plot...


✅ Few-Shot Energy Plot saved: fewshot_energy_by_shots.html

📊 Creating Few-Shot Performance vs Emissions Plot...


✅ Few-Shot Performance Plot saved: fewshot_performance_emissions.html
