# XARELLO Training on Google Colab

This notebook trains the XARELLO Q-learning adversarial attacker against a BiLSTM victim model.

## Prerequisites
- Upload `dev.tsv` and `BiLSTM-512.pth` when prompted
- Runtime → Change runtime type → GPU (recommended) or CPU

## Step 1: Install Python 3.10 (Required - OpenAttack doesn't support Python 3.12)

Run this cell, then **restart runtime** (Runtime → Restart runtime) before continuing.

In [None]:
# Install micromamba and create a Python 3.10 environment named py310
!wget -qO- https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
!./bin/micromamba create -n py310 python=3.10 -y -c conda-forge
!./bin/micromamba run -n py310 pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
!./bin/micromamba run -n py310 pip install transformers==4.38.1 huggingface-hub==0.21.0
!./bin/micromamba run -n py310 pip install gymnasium fasttext OpenAttack datasets peft psutil matplotlib

# Verify
!./bin/micromamba run -n py310 python --version
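
The verification line above prints the interpreter version for a human to read. As a minimal sketch, the same check can be automated by parsing that output. The helper names `parse_python_version` and `is_expected_version` are hypothetical and are not part of micromamba or XARELLO.

```python
import re


def parse_python_version(version_output: str) -> tuple[int, int]:
    """Extract (major, minor) from text such as 'Python 3.10.13'."""
    match = re.search(r"Python (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"Unrecognised version output: {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def is_expected_version(version_output: str,
                        expected: tuple[int, int] = (3, 10)) -> bool:
    """Return True when the reported interpreter matches (major, minor)."""
    return parse_python_version(version_output) == expected
```

In Colab, the output could be captured with `out = !./bin/micromamba run -n py310 python --version` and then checked with `is_expected_version(out[0])`, assuming the version string is the first line printed.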

## Step 2: Clone repositories

In [None]:
# Clone BODEGA (your fork, playground branch)
!git clone -b playground https://github.com/marti-farre/BODEGA.git

# Clone XARELLO (your fork, testing-stuff branch)
!git clone -b testing-stuff https://github.com/marti-farre/xarello.git

## Step 3: Install dependencies

In [None]:
# Install compatible versions into the notebook kernel's Python
# (the py310 environment from Step 1 already has these packages)
!pip install transformers==4.38.1 huggingface-hub==0.21.0
!pip install gymnasium fasttext OpenAttack datasets peft

# Verify versions
import transformers
import huggingface_hub
print(f"transformers: {transformers.__version__}")
print(f"huggingface_hub: {huggingface_hub.__version__}")

## Step 4: Upload your data files

Run this cell and upload:
1. `dev.tsv` - training data
2. `BiLSTM-512.pth` - your trained victim model

In [None]:
from google.colab import files
import os
import shutil

# Create data directory structure
os.makedirs('/root/data/BODEGA/RD', exist_ok=True)
os.makedirs('/root/data/xarello/models/wide/RD-BiLSTM', exist_ok=True)

print("Please upload dev.tsv and BiLSTM-512.pth")
uploaded = files.upload()

# Move files to the correct location (shutil.move also works across filesystems)
for filename in uploaded.keys():
    if filename == 'dev.tsv':
        shutil.move(filename, '/root/data/BODEGA/RD/dev.tsv')
        print(f"Moved {filename} to /root/data/BODEGA/RD/")
    elif filename.endswith('.pth'):
        # Any uploaded .pth file is stored under the expected victim model name
        shutil.move(filename, '/root/data/BODEGA/RD/BiLSTM-512.pth')
        print(f"Moved {filename} to /root/data/BODEGA/RD/BiLSTM-512.pth")

# Verify
print("\nFiles in /root/data/BODEGA/RD/:")
!ls -la /root/data/BODEGA/RD/
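
A missing file is easier to catch here than during training. The sketch below lists which of the two expected files are absent from the data directory. `missing_files` is a hypothetical helper, and the file names are the ones used in the upload cell above.

```python
from pathlib import Path

# File names expected by this notebook's upload step
REQUIRED_FILES = ("dev.tsv", "BiLSTM-512.pth")


def missing_files(data_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in required if not (data_dir / name).is_file()]
```

Calling `missing_files('/root/data/BODEGA/RD')` returns an empty list once both files are in place.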

## Step 5: Set up environment

In [None]:
import os

# Create data directory structure
os.makedirs('/root/data/BODEGA/RD', exist_ok=True)
os.makedirs('/root/data/xarello/models/wide/RD-BiLSTM', exist_ok=True)

# Set HOME for data paths (XARELLO uses pathlib.Path.home())
os.environ['HOME'] = '/root'

print(f"HOME: {os.environ['HOME']}")
print("Data directories created.")
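
To make the effect of `HOME` concrete, the sketch below shows how a path built from `pathlib.Path.home()` resolves once `HOME` is `/root`. The function `xarello_model_dir` is a hypothetical illustration of the directory layout used in this notebook, not XARELLO's actual code.

```python
import os
from pathlib import Path


def xarello_model_dir(task: str = "RD", victim: str = "BiLSTM") -> Path:
    """Build <home>/data/xarello/models/wide/<task>-<victim> (this notebook's layout)."""
    return Path.home() / "data" / "xarello" / "models" / "wide" / f"{task}-{victim}"


# On POSIX systems, Path.home() follows the HOME environment variable.
os.environ["HOME"] = "/root"
print(xarello_model_dir())
```

On a Linux runtime such as Colab, this prints the same directory that the cell above creates with `os.makedirs`.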

## Step 7: Train XARELLO

Training takes roughly 1-2 hours on a GPU and longer on a CPU.

Training parameters (from your code):
- TRAIN_SIZE: 3200 samples
- EVAL_SIZE: 400 samples
- MAX_EPOCHS: 20


In [None]:
# Run training with Python 3.10
!./bin/micromamba run -n py310 bash -c "cd /content/xarello && PYTHONPATH=/content/BODEGA python main-train-eval.py RD BiLSTM /root/data/xarello/models/wide/RD-BiLSTM"
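
The training command packs the working directory, `PYTHONPATH`, task, victim, and output directory into one shell string. As a sketch, the same command can be assembled from its parts, which makes it easier to change one argument at a time. `build_train_command` is a hypothetical helper, and its default paths are the ones used in this notebook. Only the `RD` and `BiLSTM` values are confirmed by the notebook.

```python
import shlex


def build_train_command(task, victim, out_dir,
                        xarello_dir="/content/xarello",
                        bodega_dir="/content/BODEGA",
                        env_name="py310"):
    """Return the argument list that runs main-train-eval.py in the micromamba env."""
    inner = (
        f"cd {shlex.quote(xarello_dir)} && "
        f"PYTHONPATH={shlex.quote(bodega_dir)} "
        f"python main-train-eval.py "
        f"{shlex.quote(task)} {shlex.quote(victim)} {shlex.quote(out_dir)}"
    )
    # ./bin/micromamba is relative, so this assumes the cwd used in Step 1
    return ["./bin/micromamba", "run", "-n", env_name, "bash", "-c", inner]


cmd = build_train_command("RD", "BiLSTM",
                          "/root/data/xarello/models/wide/RD-BiLSTM")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` as an alternative to the `!` shell line.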

## Step 8: Download trained model

Download the trained Q-model for local use.

In [None]:
from google.colab import files
import os

model_dir = '/root/data/xarello/models/wide/RD-BiLSTM'

print("Files in model directory:")
!ls -la {model_dir}

# Download the main model file
model_path = f'{model_dir}/xarello-qmodel.pth'
if os.path.exists(model_path):
    print(f"\nDownloading {model_path}...")
    files.download(model_path)
else:
    print("Model file not found. Check training output above.")
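
Browser downloads of large files can be interrupted, so it may be worth recording a checksum before downloading and comparing it locally afterwards. The sketch below computes a SHA-256 digest with the standard library. `sha256_of_file` is a hypothetical helper, not part of XARELLO.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `sha256_of_file(model_path)` in Colab and again on the downloaded copy should give identical strings if the transfer was complete.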

## Step 9: Download training plots (optional)

In [None]:
from google.colab import files
import glob

# Download all PDF plots
plot_dir = '/root/data/xarello/models/wide/RD-BiLSTM'
pdfs = glob.glob(f'{plot_dir}/*.pdf')

print(f"Found {len(pdfs)} plot files")
for pdf in pdfs:
    print(f"Downloading {pdf}...")
    files.download(pdf)
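
Downloading the plots one by one starts a separate browser download per file, which some browsers may block. As an alternative sketch, the PDFs can be bundled into a single zip archive first. `bundle_files` is a hypothetical helper, and the archive name in the usage note is an assumption.

```python
import zipfile
from pathlib import Path


def bundle_files(paths, archive_path):
    """Write the given files into one zip archive, stored under their base names."""
    with zipfile.ZipFile(archive_path, "w",
                         compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=Path(path).name)
    return archive_path
```

In the cell above, the loop could then be replaced by `files.download(bundle_files(pdfs, '/content/xarello-plots.zip'))`.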

---

## Done!

You now have:
1. `xarello-qmodel.pth` - trained Q-model
2. Training plots (PDFs)

### To use locally:

1. Place `xarello-qmodel.pth` at:
   ```
   ~/data/xarello/models/wide/RD-BiLSTM/xarello-qmodel.pth
   ```

2. Run evaluation:
   ```bash
   cd /mnt/c/Users/usuari/Documents/Acadèmic/UPF/TFM/xarello
   export PYTHONPATH="/mnt/c/Users/usuari/Documents/Acadèmic/UPF/TFM/BODEGA:$PYTHONPATH"
   python evaluation/attack.py RD true XARELLO BiLSTM \
       ~/data/BODEGA/RD ~/data/BODEGA/RD/BiLSTM-512.pth ~/data/xarello/results
   ```