<a href="https://colab.research.google.com/github/rizkisyaf/zdml/blob/main/notebooks/train_bitcoin_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bitcoin Futures Prediction Model Training

This notebook trains the LSTM model for Bitcoin futures prediction on a Google Colab GPU and pushes the trained model to the Hugging Face Hub.

## Overview
1. Setup Environment
2. Load and Preprocess Data
3. Train Model
4. Push to Hugging Face Hub

Make sure you have:
- Your Hugging Face token ready
- The order book data file (`futures_orderbook_data.csv`)
- Access to your GitHub repository

In [None]:
# Install required packages
!pip install transformers huggingface_hub gradio torch pandas numpy scikit-learn matplotlib seaborn tqdm pywavelets tensorboard pytest jupyter

# Clone your repository (replace with your repo URL)
!git clone https://github.com/your-username/DataResearch.git
%cd DataResearch

# Import required libraries
import torch
import pandas as pd
import numpy as np
from huggingface_hub import login
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Check GPU availability
print(f"GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")

## Data Loading and Preprocessing

Upload your order book data and prepare it for training.
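
As a rough illustration of the kind of features this step produces, here is a minimal sketch of the four key columns printed below (`mid_price`, `weighted_mid_price`, `spread`, `imbalance`), computed from top-of-book quotes. The input column names (`bid_price`, `ask_price`, `bid_qty`, `ask_qty`), the helper `basic_orderbook_features`, and the formulas are assumptions made for illustration; the repository's `preprocess_data` may define these features differently.

```python
import pandas as pd


def basic_orderbook_features(book: pd.DataFrame) -> pd.DataFrame:
    """Compute top-of-book features from hypothetical best bid/ask columns."""
    out = pd.DataFrame(index=book.index)
    total_qty = book["bid_qty"] + book["ask_qty"]

    # Plain midpoint between best bid and best ask
    out["mid_price"] = (book["bid_price"] + book["ask_price"]) / 2
    # Size-weighted midpoint: moves toward the ask when bid size dominates
    out["weighted_mid_price"] = (
        book["bid_price"] * book["ask_qty"] + book["ask_price"] * book["bid_qty"]
    ) / total_qty
    # Quoted spread in price units
    out["spread"] = book["ask_price"] - book["bid_price"]
    # Signed size imbalance in [-1, 1]; positive means more size on the bid
    out["imbalance"] = (book["bid_qty"] - book["ask_qty"]) / total_qty
    return out


# Toy quotes, not real market data
toy = pd.DataFrame({
    "bid_price": [100.0, 101.0],
    "ask_price": [102.0, 101.5],
    "bid_qty": [3.0, 1.0],
    "ask_qty": [1.0, 1.0],
})
features = basic_orderbook_features(toy)
print(features)
```

In the first toy row the bid side holds three times the size of the ask side, so the weighted mid (101.5) sits above the plain mid (101.0) and the imbalance is positive.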

In [None]:
# Upload data file
from google.colab import files
print("Please upload futures_orderbook_data.csv")
uploaded = files.upload()

# Move the uploaded file into data/, creating the directory if it does not exist
!mkdir -p data
!mv futures_orderbook_data.csv data/futures_orderbook_data.csv

# Load and preprocess data
from src.data.preprocessing import load_orderbook_data, preprocess_data, prepare_training_data, save_processed_data

# Load data
df = load_orderbook_data('data/futures_orderbook_data.csv')
print(f"Loaded {len(df)} rows of order book data")

# Preprocess data
features_df = preprocess_data(df, window_size=20)
print(f"Preprocessed data shape: {features_df.shape}")

# Display first few rows of key features
key_features = ['mid_price', 'weighted_mid_price', 'spread', 'imbalance']
print("\nFirst few rows of key features:")
print(features_df[key_features].head())

## Data Preparation

Prepare sequences for LSTM training and split into train/val/test sets.
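
One common way to build such sequences is a sliding window followed by a chronological split, sketched below. The helpers `make_sequences` and `chronological_split` are hypothetical, and the choices here (target taken from column 0, `test_size` and `val_size` as fractions of all windows, no shuffling) are assumptions; `prepare_training_data` in the repository may behave differently.

```python
import numpy as np


def make_sequences(values, sequence_length, prediction_horizon):
    """Slice a (time, features) array into input windows and scalar targets.

    Each target is column 0, `prediction_horizon` steps after the window ends.
    """
    X, y = [], []
    last_start = len(values) - sequence_length - prediction_horizon
    for start in range(last_start + 1):
        end = start + sequence_length
        X.append(values[start:end])
        y.append(values[end + prediction_horizon - 1, 0])
    return np.array(X), np.array(y)


def chronological_split(X, y, test_size, val_size):
    """Split in time order so validation and test data come after training data."""
    n = len(X)
    n_test = int(n * test_size)
    n_val = int(n * val_size)
    n_train = n - n_val - n_test
    return {
        "X_train": X[:n_train], "y_train": y[:n_train],
        "X_val": X[n_train:n_train + n_val], "y_val": y[n_train:n_train + n_val],
        "X_test": X[n_train + n_val:], "y_test": y[n_train + n_val:],
    }


# Toy series: 20 time steps, 2 features
values = np.arange(40, dtype=float).reshape(20, 2)
X, y = make_sequences(values, sequence_length=5, prediction_horizon=1)
split = chronological_split(X, y, test_size=0.2, val_size=0.1)
for name, arr in split.items():
    print(name, arr.shape)
```

Keeping the split in time order avoids training on windows that come after the ones used for evaluation, which matters for price series where neighbouring windows overlap heavily.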

In [None]:
# Prepare training data
data = prepare_training_data(
    features_df,
    sequence_length=100,
    prediction_horizon=1,
    test_size=0.2,
    val_size=0.1
)

# Save processed data
save_processed_data(data, 'processed_data.pkl')

print("Data shapes:")
print(f"X_train: {data['X_train'].shape}")
print(f"y_train: {data['y_train'].shape}")
print(f"X_val: {data['X_val'].shape}")
print(f"y_val: {data['y_val'].shape}")
print(f"X_test: {data['X_test'].shape}")
print(f"y_test: {data['y_test'].shape}")

## Model Training

Train the LSTM model using GPU acceleration.
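
For orientation, the following is a minimal sketch of what GPU-accelerated LSTM training looks like in PyTorch. It is not the model defined in this repository: `TinyLSTMRegressor`, the layer sizes, the random tensors, and the three-step loop are all assumptions chosen to keep the example small. It falls back to the CPU when no GPU is available.

```python
import torch
from torch import nn


class TinyLSTMRegressor(nn.Module):
    """Hypothetical single-layer LSTM that predicts one value per sequence."""

    def __init__(self, n_features, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        output, _ = self.lstm(x)                 # (batch, time, hidden)
        last_step = output[:, -1, :]             # hidden state at the final time step
        return self.head(last_step).squeeze(-1)  # (batch,)


# Use the GPU when Colab provides one, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)
X = torch.randn(64, 100, 4)  # random stand-in: 64 sequences, 100 steps, 4 features
y = torch.randn(64)          # random stand-in targets

model = TinyLSTMRegressor(n_features=4).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    pred = model(X.to(device))
    loss = loss_fn(pred, y.to(device))
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```

Both the model and every batch have to be moved to the same device; forgetting one of the two `.to(device)` calls is a common source of runtime errors on Colab.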

In [None]:
# Login to Hugging Face
from huggingface_hub import login
print("Please enter your Hugging Face token when prompted")
login()

In [None]:
# Train and push model
from train_on_colab import train_and_push

# Start training
results = train_and_push()

print("\nTraining completed!")
print(f"Model saved to: {results['model_path']}")
print("\nTest metrics:")
for k, v in results['test_metrics'].items():
    print(f"{k}: {v:.4f}")