# Fine-Tuning Indonesian NLI Models with Experiment Tracking

This notebook fine-tunes two models, IndoRoBERTa (`flax-community/indonesian-roberta-base`) and IndoBERT (`indobenchmark/indobert-base-p1`), on the Indonesian NLI (IndoNLI) dataset. It logs training to MLflow and TensorBoard and tracks the resulting metrics files with DVC. It is designed to run on Google Colab.

## 1. Setup and Dependencies

In [None]:
!pip install --upgrade torch torchvision torchaudio mlflow transformers datasets scikit-learn accelerate evaluate tensorboard "dvc[gdrive]" pyngrok Pillow

## 2. Imports and Configuration

In [None]:
import torch
import mlflow
import json
import os
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from datasets import load_dataset
import evaluate
from getpass import getpass
from pyngrok import ngrok

### Configuration Parameters
Set `USE_SMALL_SUBSET` to `True` for a quick test run. Set `NUM_TRAIN_EPOCHS` to control the training duration.

In [None]:
# Set to True for a quick run with a small portion of the data
USE_SMALL_SUBSET = True

# Number of training epochs
NUM_TRAIN_EPOCHS = 1

## 3. Data Loading and Preprocessing

In [None]:
print("Loading IndoNLI dataset...")
dataset = load_dataset("afaji/indonli", trust_remote_code=True)

if USE_SMALL_SUBSET:
    print("Using a small subset of the data for a quick run.")
    dataset["train"] = dataset["train"].select(range(100))
    dataset["validation"] = dataset["validation"].select(range(50))
    dataset["test_lay"] = dataset["test_lay"].select(range(50))

print("Dataset loaded.")
print(dataset)

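def preprocess_function(examples, tokenizer):
    # Tokenize each premise/hypothesis pair as a single two-segment input.
    # With no max_length argument, padding="max_length" pads to the model's maximum input length.
    return tokenizer(examples["premise"], examples["hypothesis"], truncation=True, padding="max_length")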
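
`padding="max_length"` pads every example to one fixed length, whereas padding each batch only to its longest member (dynamic padding) usually processes far fewer tokens. The sketch below is a back-of-the-envelope illustration in plain Python and is not part of the training pipeline. The helper `padded_token_count` and the example lengths are hypothetical, and 512 is assumed as the fixed length.

```python
def padded_token_count(lengths, batch_size, fixed_length=None):
    """Count tokens processed after padding.

    With fixed_length set, every sequence is padded to that length.
    Otherwise each batch is padded to its own longest sequence.
    """
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        target = fixed_length if fixed_length is not None else max(batch)
        total += target * len(batch)
    return total


# Hypothetical token lengths for four premise/hypothesis pairs.
example_lengths = [12, 30, 18, 25]

fixed = padded_token_count(example_lengths, batch_size=2, fixed_length=512)
dynamic = padded_token_count(example_lengths, batch_size=2)
print(f"fixed: {fixed} tokens, dynamic: {dynamic} tokens")
```

In the `transformers` library, dynamic padding is typically obtained by dropping `padding="max_length"` from the tokenizer call and letting a `DataCollatorWithPadding` pad each batch.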

## 4. Model Fine-Tuning

In [None]:
def run_training():
    # --- Fine-tuning IndoRoBERTa ---
    print("\n--- Fine-tuning IndoRoBERTa ---")
    model_name_roberta = "flax-community/indonesian-roberta-base"
    tokenizer_roberta = AutoTokenizer.from_pretrained(model_name_roberta)
    model_roberta = AutoModelForSequenceClassification.from_pretrained(model_name_roberta, num_labels=3)

    tokenized_datasets_roberta = dataset.map(lambda x: preprocess_function(x, tokenizer_roberta), batched=True)

    training_args_roberta = TrainingArguments(
        output_dir="./results_indoroberta",
        eval_strategy="epoch",
        save_strategy="epoch",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=NUM_TRAIN_EPOCHS,
        weight_decay=0.01,
        save_total_limit=1,
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",
        report_to=["mlflow", "tensorboard"],
    )

    metric = evaluate.load("accuracy")

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = torch.argmax(torch.tensor(logits), dim=-1)
        return metric.compute(predictions=predictions, references=labels)

    trainer_roberta = Trainer(
        model=model_roberta,
        args=training_args_roberta,
        train_dataset=tokenized_datasets_roberta["train"],
        eval_dataset=tokenized_datasets_roberta["validation"],
        tokenizer=tokenizer_roberta,
        compute_metrics=compute_metrics,
    )

    # The context manager ends the MLflow run even if training raises.
    with mlflow.start_run(run_name="indoroberta-finetune"):
        trainer_roberta.train()
        eval_results_roberta = trainer_roberta.evaluate()
        mlflow.log_metrics(eval_results_roberta)

    # Save the evaluation metrics to disk so DVC can track them later.
    os.makedirs("metrics", exist_ok=True)
    with open("metrics/roberta_metrics.json", "w") as f:
        json.dump(eval_results_roberta, f, indent=4)

    # --- Fine-tuning IndoBERT ---
    print("\n--- Fine-tuning IndoBERT ---")
    model_name_bert = "indobenchmark/indobert-base-p1"
    tokenizer_bert = AutoTokenizer.from_pretrained(model_name_bert)
    model_bert = AutoModelForSequenceClassification.from_pretrained(model_name_bert, num_labels=3)

    tokenized_datasets_bert = dataset.map(lambda x: preprocess_function(x, tokenizer_bert), batched=True)

    training_args_bert = TrainingArguments(
        output_dir="./results_indobert",
        eval_strategy="epoch",
        save_strategy="epoch",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=NUM_TRAIN_EPOCHS,
        weight_decay=0.01,
        save_total_limit=1,
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",
        report_to=["mlflow", "tensorboard"],
    )

    trainer_bert = Trainer(
        model=model_bert,
        args=training_args_bert,
        train_dataset=tokenized_datasets_bert["train"],
        eval_dataset=tokenized_datasets_bert["validation"],
        tokenizer=tokenizer_bert,
        compute_metrics=compute_metrics,
    )

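    # The context manager ends the MLflow run even if training raises.
    with mlflow.start_run(run_name="indobert-finetune"):
        trainer_bert.train()
        eval_results_bert = trainer_bert.evaluate()
        mlflow.log_metrics(eval_results_bert)

    # Save the evaluation metrics to disk so DVC can track them later.
    os.makedirs("metrics", exist_ok=True)
    with open("metrics/bert_metrics.json", "w") as f:
        json.dump(eval_results_bert, f, indent=4)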

run_training()
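
The `compute_metrics` callback above reduces each row of logits to its highest-scoring class and compares the result with the gold labels. The following is a minimal sketch of that computation in plain Python, assuming three classes per row. The function name `argmax_accuracy` and the toy logits are hypothetical and are not IndoNLI outputs.

```python
def argmax_accuracy(logits, labels):
    """Pick the highest-scoring class per row and compare with the labels."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(pred == gold for pred, gold in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Toy logits for four examples and three classes (made-up values).
toy_logits = [
    [2.0, 0.1, -1.0],
    [0.2, 0.3, 1.5],
    [0.1, 0.9, 0.0],
    [1.2, 0.4, 0.3],
]
toy_labels = [0, 2, 0, 0]

result = argmax_accuracy(toy_logits, toy_labels)
print(result)
```

The `Trainer` prefixes the returned key with `eval_` when reporting validation results, so this value appears as `eval_accuracy` in the dictionary returned by `evaluate()`.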

## 5. Experiment Tracking

### MLflow UI
We use `ngrok` to expose the MLflow UI running inside the Colab instance at a public URL.

In [None]:
# Terminate open tunnels if any
ngrok.kill()

# Get your ngrok authtoken from https://dashboard.ngrok.com/get-started/your-authtoken
NGROK_AUTH_TOKEN = getpass('Enter your ngrok authtoken: ')
ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Start the MLflow UI in the background
get_ipython().system_raw('mlflow ui --port 5000 &')

# Open a tunnel to the MLflow UI and print its public address
tunnel = ngrok.connect(5000)
print(f"MLflow UI is running at: {tunnel.public_url}")

### TensorBoard
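Launch TensorBoard with the notebook magic command. By default the `Trainer` writes its event files to a `runs` subdirectory inside each `output_dir`, so point `--logdir` at the working directory to pick up both models.

In [None]:
%load_ext tensorboard
%tensorboard --logdir .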

### DVC
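Initialize DVC and track the generated metrics files. The Colab working directory is not a Git repository, so DVC is initialized with `--no-scm`.

In [None]:
!dvc init -f --no-scm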
!dvc add metrics/roberta_metrics.json metrics/bert_metrics.json

# To track with Google Drive, you would typically run:
# !dvc remote add -d gdrive gdrive://<your_folder_id>
# !dvc push

# For now, let's just see the status
!dvc status
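
Once both metrics files exist, they can be compared directly. The sketch below is one possible way to do that with the standard library. The helpers `load_metrics` and `best_model` are hypothetical, and the code assumes the files contain the `eval_accuracy` key that the `Trainer` reports for the `accuracy` metric.

```python
import json
from pathlib import Path


def load_metrics(path):
    """Return the metrics dict stored at path, or None if the file is missing."""
    path = Path(path)
    if not path.exists():
        return None
    with path.open() as f:
        return json.load(f)


def best_model(metrics_by_name, key="eval_accuracy"):
    """Return the (name, value) pair with the highest value for key, or None."""
    scored = {
        name: metrics[key]
        for name, metrics in metrics_by_name.items()
        if metrics and key in metrics
    }
    if not scored:
        return None
    name = max(scored, key=scored.get)
    return name, scored[name]


# Paths match the files written by run_training() above.
metrics = {
    "IndoRoBERTa": load_metrics("metrics/roberta_metrics.json"),
    "IndoBERT": load_metrics("metrics/bert_metrics.json"),
}
print(best_model(metrics))
```

With `USE_SMALL_SUBSET = True` the validation split holds only 50 examples, so differences between the two models are unlikely to be meaningful at that size.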