# TFTmodel – Train on Google Colab

This notebook prepares the environment, installs dependencies, verifies the pipeline, and trains the Temporal Fusion Transformer (TFT) model on a GPU. Run the cells top to bottom. If a cell fails, fix the cause and re-run that cell before continuing.

# 1) Select Colab runtime and verify accelerator

- In Colab: Runtime → Change runtime type → Hardware accelerator: GPU
- Then run the next two cells to verify that the GPU is visible and that PyTorch can use it

If you see "No GPU detected", re-check the runtime setting.

In [None]:
%%bash
nvidia-smi || true

In [None]:
import torch
print('Torch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('CUDA device:', torch.cuda.get_device_name(0))
else:
    print('No GPU detected. In Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU')

In [None]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Set your project directory inside Drive
PROJECT_DIR = '/content/drive/MyDrive/TFTmodelAI'
WORK_DIR = PROJECT_DIR  # Train in-place so artifacts persist to Drive

# Create the directory if it doesn't exist yet (you can upload your project there)
os.makedirs(PROJECT_DIR, exist_ok=True)
print('Project dir:', PROJECT_DIR)
%cd $WORK_DIR
!ls -la

In [None]:
%%bash
set -e
python -V

# Upgrade core tooling
python -m pip install -U pip setuptools wheel

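# Optional: reinstall CUDA-enabled PyTorch. Colab images usually ship a working
# torch build already, so this matters mainly if the verification cell failed.
# The cu121 index assumes CUDA 12.1; adjust it to match your runtime.
pip install --quiet torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 \
  || echo "WARNING: PyTorch reinstall failed; continuing with the preinstalled torch" >&2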

# Project dependencies
if [ -f requirements.txt ]; then
  pip install --quiet -r requirements.txt
fi

# Ensure these are present if not already pinned
pip install --quiet pytorch-lightning pytorch-forecasting optuna pytorch_optimizer rich scikit-learn

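In [None]:
# Option A: If your project is on GitHub (public or private)
# Note: /content is not on Drive, so artifacts written there are lost when the
# runtime is recycled. After cloning, re-run the install cell so that
# requirements.txt is picked up from the cloned repo.
# !git clone https://github.com/emiflair/TFTmodel.git /content/TFTmodel
# %cd /content/TFTmodel

# Option B: Upload the project folder to Drive under MyDrive/TFTmodelAI (recommended)
# Then run the notebook as-is; it operates in place, so outputs persist to Drive.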
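
After the install cell has run, it can help to confirm which packages actually ended up in the environment before moving on. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the `installed_versions` helper are illustrative names, with the distribution names taken from the install cell above.

```python
from importlib import metadata

# Distribution names taken from the install cell; adjust to your requirements.txt.
REQUIRED = ["torch", "pytorch-lightning", "pytorch-forecasting", "optuna", "scikit-learn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:22s} {version or 'MISSING'}")
```

Any package reported as `MISSING` points at a failed or skipped install step, which is cheaper to catch here than partway through training.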

In [None]:
# Verify training setup
# This will print a summary and check data file availability
%run verify_training_setup.py
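
The contents of `verify_training_setup.py` are not shown in this notebook. As a rough illustration of the kind of data-file check it performs, here is a hypothetical sketch: `find_data_file` is an invented helper, the file name comes from the troubleshooting notes, and the search locations are assumptions.

```python
from pathlib import Path

DATA_FILE = "XAUUSD_15M.csv"  # file name taken from the troubleshooting notes


def find_data_file(search_dirs, filename=DATA_FILE):
    """Return the first existing path to `filename` in `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


# Assumed search locations: the current working directory, then the Drive project folder.
path = find_data_file([Path.cwd(), "/content/drive/MyDrive/TFTmodelAI"])
print("Data file:", path if path else f"{DATA_FILE} not found")
```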

In [None]:
%%bash
# Start training (this can take a while; make sure the GPU runtime is enabled).
# Artifacts (checkpoints, scalers, manifests, metrics) are saved under artifacts/.
set -e
python -m src.training.train_tft

In [None]:
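# List produced artifacts and show the latest checkpoint
from pathlib import Path
art = Path('artifacts')
# Sort by modification time; a plain name sort is lexicographic
# (for example, "epoch=10" would come before "epoch=9").
ckpts = sorted((art / 'checkpoints').glob('*.ckpt'), key=lambda p: p.stat().st_mtime)
for p in ckpts[-10:]:
    print(p)
print('\nLatest checkpoint:', ckpts[-1] if ckpts else 'None found')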
print('Scalers:', len(list((art / 'scalers').glob('*.pkl'))))
print('Manifests:', list((art / 'manifests').glob('*.json')))
print('Metrics files:', len(list((art / 'metrics').glob('*'))))

# Troubleshooting notes

- If the verify step fails because the data file is missing: upload `XAUUSD_15M.csv` to the project root (the same folder as this notebook), or to your Drive project folder.
- If pip reports dependency conflicts: restart the runtime after the installs (Runtime → Restart runtime), then re-run from the Drive mount cell, since a restart resets the working directory.
- If there is a CUDA mismatch: the torch install command uses CUDA 12.1 wheels (`cu121`); if your Colab runtime differs, adjust the `--index-url` accordingly.
- To resume training later: re-run the cells up to the training cell, then use the resume cell (`--resume last`) instead of starting fresh; artifacts persist in Drive.
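
For the CUDA mismatch case, the PyTorch wheel index URLs follow the pattern already used in the install cell (`.../whl/cu121` for CUDA 12.1). The sketch below builds such a URL from a version string; `wheel_index_url` is an illustrative helper, and not every CUDA version has a published wheel index, so check the result against the PyTorch install page before using it.

```python
def wheel_index_url(cuda_version):
    """Build a PyTorch wheel index URL for a CUDA version string such as '12.1'."""
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


# Example: pass a CUDA version such as the one shown in the nvidia-smi header.
print(wheel_index_url("12.1"))  # https://download.pytorch.org/whl/cu121
```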

In [None]:
%%bash
# Long training with resume across sessions (recommended).
# This resumes from the 'last' checkpoint if present; otherwise it starts fresh.
set -e
python -m src.training.train_tft --resume last
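
How `train_tft` interprets `--resume last` is not shown in this notebook. The following is a hypothetical sketch of one way such a flag could be resolved, assuming checkpoints live under `artifacts/checkpoints` and that the last checkpoint is saved as `last.ckpt` (the name PyTorch Lightning's `ModelCheckpoint` uses when `save_last=True`). The `resolve_resume` helper is an invented name, not part of the project.

```python
from pathlib import Path


def resolve_resume(resume, ckpt_dir="artifacts/checkpoints"):
    """Return the checkpoint path to resume from, or None to start fresh."""
    if not resume:
        return None
    ckpt_dir = Path(ckpt_dir)
    if resume == "last":
        last = ckpt_dir / "last.ckpt"
        if last.is_file():
            return last
        # Fall back to the most recently modified checkpoint, if any exist.
        ckpts = sorted(ckpt_dir.glob("*.ckpt"), key=lambda p: p.stat().st_mtime)
        return ckpts[-1] if ckpts else None
    # Otherwise treat the value as an explicit checkpoint path.
    path = Path(resume)
    return path if path.is_file() else None


print("Resume from:", resolve_resume("last"))
```

With PyTorch Lightning, a path resolved this way could then be passed as `ckpt_path` to `Trainer.fit`, and a `None` result simply starts a fresh run.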