# 🚀 RL Trading Bot - Interactive Tutorial

Welcome to the **BTC/USD Reinforcement Learning Trading Bot** interactive tutorial!

This notebook will walk you through:
1. 📊 **Data Loading & Exploration** - Load and visualize BTC data
2. 🏋️ **Trading Environment** - Understand the custom trading environment
3. 🤖 **Training Quick Demo** - Train a model (quick version)
4. 📈 **Evaluation & Results** - Analyze performance

**Estimated time:** 15-20 minutes

---

## Setup

First, let's import all necessary libraries and configure the environment.

In [None]:
# Standard imports
import sys
import os
import warnings
warnings.filterwarnings('ignore')

# Add project root to path
project_root = os.path.abspath('..')
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# Data & ML libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# RL libraries
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

# Our custom modules
from data_manager import load_data, add_technical_indicators, split_data
from trading_env import BtcUsdTradingEnv

# Visualization settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

print("✅ All imports successful!")
print(f"📁 Project root: {project_root}")

---
## 📊 Part 1: Data Loading & Exploration

Let's load the sample BTC/USDT data and explore it.

In [None]:
# Load sample data
data_path = os.path.join(project_root, 'data', 'sample_btcusdt_1h.csv')
print(f"Loading data from: {data_path}")

df = load_data(data_path)

print(f"\n✅ Loaded {len(df)} samples")
print(f"📅 Date range: {df.index.min()} to {df.index.max()}")
print(f"⏱️  Duration: {(df.index.max() - df.index.min()).days} days")

# Display first few rows
df.head()

### Price Statistics

In [None]:
# Calculate statistics
stats = df['close'].describe()
print("\n💰 Price Statistics (USD):")
print(f"  Min:    ${stats['min']:,.2f}")
print(f"  Max:    ${stats['max']:,.2f}")
print(f"  Mean:   ${stats['mean']:,.2f}")
print(f"  Median: ${stats['50%']:,.2f}")
print(f"  Std:    ${stats['std']:,.2f}")

# Calculate returns
total_return = ((df['close'].iloc[-1] / df['close'].iloc[0]) - 1) * 100
print(f"\n📈 Total Return: {total_return:.2f}%")

### Visualize Price Action

In [None]:
# Plot price
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8))

# Price chart
ax1.plot(df.index, df['close'], linewidth=1.5, color='#2E86AB')
ax1.set_title('BTC/USDT Price Action', fontsize=14, fontweight='bold')
ax1.set_ylabel('Price (USD)', fontsize=12)
ax1.grid(True, alpha=0.3)

# Volume chart
ax2.bar(df.index, df['volume'], width=0.03, color='#A23B72', alpha=0.6)
ax2.set_title('Trading Volume', fontsize=14, fontweight='bold')
ax2.set_ylabel('Volume (BTC)', fontsize=12)
ax2.set_xlabel('Date', fontsize=12)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✅ Price visualization complete")

### Add Technical Indicators

In [None]:
# Add technical indicators
print("Adding technical indicators...")
df_with_indicators = add_technical_indicators(df)

# Anything that is not a raw OHLCV column counts as an indicator
base_cols = ['open', 'high', 'low', 'close', 'volume']
indicator_cols = [col for col in df_with_indicators.columns if col not in base_cols]

print(f"\n✅ Added {len(indicator_cols)} indicators")
print(f"📊 Total features: {len(df_with_indicators.columns)}")
print("\nAvailable indicators:")
for col in indicator_cols[:10]:  # Show first 10
    print(f"  - {col}")
if len(indicator_cols) > 10:
    print(f"  ... and {len(indicator_cols) - 10} more")
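For intuition, here is a minimal sketch of one common way to compute Bollinger Bands, RSI, and MACD with pandas. It is not the code behind `add_technical_indicators`, whose implementation isn't shown in this notebook. The helper name `sketch_indicators`, the window lengths (20-period bands, 14-period RSI, 12/26/9 MACD), and the use of Wilder's smoothing for RSI are all assumptions; only the output column names are taken from this notebook.

```python
import numpy as np
import pandas as pd

def sketch_indicators(close: pd.Series) -> pd.DataFrame:
    """Illustrative Bollinger Bands, RSI, and MACD (assumed parameters)."""
    out = pd.DataFrame(index=close.index)

    # Bollinger Bands: 20-period mean +/- 2 standard deviations
    mid = close.rolling(20).mean()
    std = close.rolling(20).std(ddof=0)
    out["bb_high"] = mid + 2 * std
    out["bb_low"] = mid - 2 * std

    # RSI(14) with Wilder's smoothing (an EMA with alpha = 1/14)
    delta = close.diff()
    avg_gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
    avg_loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False).mean()
    out["rsi_14"] = 100 - 100 / (1 + avg_gain / avg_loss)

    # MACD: EMA(12) - EMA(26), with a 9-period EMA as the signal line
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    out["macd"] = ema_fast - ema_slow
    out["macd_signal"] = out["macd"].ewm(span=9, adjust=False).mean()
    out["macd_diff"] = out["macd"] - out["macd_signal"]
    return out

# Synthetic random-walk prices, only to exercise the function
rng = np.random.default_rng(0)
close = pd.Series(30000 + rng.normal(0, 100, 200).cumsum())
indicators = sketch_indicators(close)
print(indicators.dropna().tail(3))
```

The first rows are `NaN` until each rolling window has enough history, which is why indicator pipelines typically drop or skip a warm-up period.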

### Visualize Technical Indicators

In [None]:
# Plot Bollinger Bands, RSI, and MACD
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(14, 10), sharex=True)

# Price with Bollinger Bands
ax1.plot(df_with_indicators.index, df_with_indicators['close'], label='Close', linewidth=1.5)
ax1.plot(df_with_indicators.index, df_with_indicators['bb_high'], label='BB High', linestyle='--', alpha=0.7)
ax1.plot(df_with_indicators.index, df_with_indicators['bb_low'], label='BB Low', linestyle='--', alpha=0.7)
ax1.fill_between(df_with_indicators.index, df_with_indicators['bb_low'], df_with_indicators['bb_high'], alpha=0.1)
ax1.set_ylabel('Price (USD)')
ax1.set_title('Price with Bollinger Bands', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# RSI
ax2.plot(df_with_indicators.index, df_with_indicators['rsi_14'], color='purple', linewidth=1.5)
ax2.axhline(y=70, color='r', linestyle='--', alpha=0.5, label='Overbought')
ax2.axhline(y=30, color='g', linestyle='--', alpha=0.5, label='Oversold')
ax2.set_ylabel('RSI')
ax2.set_title('RSI (14)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

# MACD
ax3.plot(df_with_indicators.index, df_with_indicators['macd'], label='MACD', linewidth=1.5)
ax3.plot(df_with_indicators.index, df_with_indicators['macd_signal'], label='Signal', linewidth=1.5)
ax3.bar(df_with_indicators.index, df_with_indicators['macd_diff'], label='Histogram', alpha=0.3)
ax3.set_ylabel('MACD')
ax3.set_xlabel('Date')
ax3.set_title('MACD', fontweight='bold')
ax3.legend()
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✅ Technical indicators visualized")

---
## 🏋️ Part 2: Trading Environment

Let's explore the custom Gymnasium trading environment.

In [None]:
# Split data
train_df, val_df, test_df = split_data(df_with_indicators, train_ratio=0.6, val_ratio=0.2)

print(f"📊 Data Split:")
print(f"  Train: {len(train_df)} samples ({len(train_df)/len(df_with_indicators)*100:.1f}%)")
print(f"  Val:   {len(val_df)} samples ({len(val_df)/len(df_with_indicators)*100:.1f}%)")
print(f"  Test:  {len(test_df)} samples ({len(test_df)/len(df_with_indicators)*100:.1f}%)")
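Time-series data is usually split chronologically rather than shuffled, so the validation and test sets contain only bars that come after the training period. Whether `split_data` works this way isn't shown here. The sketch below illustrates the idea under that assumption, using a hypothetical helper `chronological_split`.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, train_ratio: float = 0.6, val_ratio: float = 0.2):
    """Split rows in time order; whatever remains after train and val is the test set."""
    if not 0 < train_ratio + val_ratio < 1:
        raise ValueError("train_ratio + val_ratio must be between 0 and 1")
    n = len(df)
    train_end = int(n * train_ratio)
    val_end = train_end + int(n * val_ratio)
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]

# Toy hourly frame standing in for the indicator DataFrame
idx = pd.date_range("2024-01-01", periods=1000, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"close": range(1000)}, index=idx)
train, val, test = chronological_split(toy)
print(len(train), len(val), len(test))
print(train.index.max() < val.index.min() < test.index.min())
```

Keeping the order intact prevents the agent from being evaluated on prices that precede data it was trained on.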

In [None]:
# Create environment
env = BtcUsdTradingEnv(
    df=train_df,
    window_size=24,
    initial_cash=10000.0,
    transaction_cost=0.001,
    slippage=0.0005,
    max_steps=100  # Short for demo
)

print("\n✅ Environment created!")
print(f"\n📋 Environment Specs:")
print(f"  Action space: {env.action_space}")
print(f"  Observation space: {env.observation_space}")
print(f"  Initial cash: ${env.initial_cash:,.2f}")
print(f"  Transaction cost: {env.transaction_cost*100:.2f}%")
print(f"  Slippage: {env.slippage*100:.3f}%")
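To see what `transaction_cost=0.001` and `slippage=0.0005` mean in dollar terms, here is a minimal sketch of one common fill model: slippage worsens the execution price, and the fee is charged on the cash spent. How `BtcUsdTradingEnv` actually applies these parameters isn't shown in this notebook, so the helper `simulate_buy` and its formula are assumptions.

```python
def simulate_buy(cash: float, price: float,
                 transaction_cost: float = 0.001, slippage: float = 0.0005):
    """Spend all cash on BTC under an assumed fill model.

    Returns (btc_bought, fee_paid, execution_price).
    """
    execution_price = price * (1 + slippage)   # buys fill slightly above the quote
    fee_paid = cash * transaction_cost         # fee taken from the cash spent
    btc_bought = (cash - fee_paid) / execution_price
    return btc_bought, fee_paid, execution_price

btc, fee, fill = simulate_buy(cash=10_000.0, price=50_000.0)
# Value of the new position marked at the quoted price, versus the cash spent
immediate_cost = 10_000.0 - btc * 50_000.0
print(f"fill=${fill:,.2f} fee=${fee:.2f} btc={btc:.6f} cost=${immediate_cost:.2f}")
```

Under this model, every entry costs roughly 0.15% of the position before the price has moved at all, which is why a policy that trades too often tends to lose money.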

### Test Environment with Random Actions

In [None]:
# Run a random agent to test environment
# Gymnasium API: reset() returns (obs, info); step() returns a 5-tuple
obs, info = env.reset()
portfolio_values = []
actions = []

print("Running random agent for up to 50 steps...\n")

for i in range(50):
    # Take random action
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    portfolio_values.append(info['portfolio_value'])
    actions.append(action)

    if terminated or truncated:
        break

print(f"✅ Completed {len(portfolio_values)} steps")
print(f"\n📊 Random Agent Results:")
print(f"  Initial value: ${env.initial_cash:,.2f}")
print(f"  Final value: ${portfolio_values[-1]:,.2f}")
print(f"  Return: {(portfolio_values[-1]/env.initial_cash - 1)*100:.2f}%")

In [None]:
# Plot random agent performance
plt.figure(figsize=(12, 5))
plt.plot(portfolio_values, linewidth=2)
plt.axhline(y=10000, color='r', linestyle='--', alpha=0.5, label='Initial Value')
plt.xlabel('Step')
plt.ylabel('Portfolio Value ($)')
plt.title('Random Agent Portfolio Performance', fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("✅ Random agent test complete!")

---
## 🤖 Part 3: Train RL Agent (Quick Demo)

Now let's train a PPO agent! We'll use a very small number of timesteps for demo purposes.

In [None]:
# Create vectorized environment
train_env = DummyVecEnv([lambda: BtcUsdTradingEnv(
    df=train_df,
    window_size=24,
    initial_cash=10000.0,
    transaction_cost=0.001,
    slippage=0.0005,
    max_steps=200
)])

print("✅ Training environment ready")

In [None]:
# Initialize PPO agent
model = PPO(
    policy='MlpPolicy',
    env=train_env,
    learning_rate=0.0003,
    n_steps=512,
    batch_size=64,
    gamma=0.99,
    verbose=1,
    seed=42
)

print("\n✅ PPO agent initialized")
print("\n⚠️  Note: This is a VERY quick demo training.")
print("For real results, use: python train.py (200k timesteps)")

In [None]:
# Train (very short for demo)
# Note: progress_bar=True needs the optional tqdm and rich packages
print("\n🏋️ Training for 2000 timesteps (demo)...\n")

model.learn(total_timesteps=2000, progress_bar=True)

print("\n✅ Training complete!")
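PPO in Stable-Baselines3 collects `n_steps` transitions per environment before each update, so the timestep budget is consumed in whole rollouts. The short calculation below works through the numbers for this demo. `n_epochs=10` is the library default, which the `PPO(...)` call above leaves unchanged.

```python
import math

total_timesteps = 2000
n_steps = 512      # transitions collected per environment per rollout
n_envs = 1         # DummyVecEnv wraps a single environment here
batch_size = 64
n_epochs = 10      # Stable-Baselines3 PPO default

rollout_size = n_steps * n_envs
n_rollouts = math.ceil(total_timesteps / rollout_size)
actual_timesteps = n_rollouts * rollout_size
minibatches_per_epoch = rollout_size // batch_size
gradient_steps = n_rollouts * n_epochs * minibatches_per_epoch

print(f"rollouts: {n_rollouts}, timesteps actually run: {actual_timesteps}")
print(f"gradient steps in total: {gradient_steps}")
```

The demo therefore runs slightly past its 2000-timestep budget, and the policy receives only a few hundred gradient updates. If episodes run to `max_steps=200`, that is on the order of ten episodes of experience.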

---
## 📈 Part 4: Evaluation & Results

Let's evaluate our trained agent on test data.

In [None]:
# Create test environment
test_env = BtcUsdTradingEnv(
    df=test_df,
    window_size=24,
    initial_cash=10000.0,
    transaction_cost=0.001,
    slippage=0.0005,
    max_steps=None  # Use all test data
)

print("✅ Test environment created")

In [None]:
# Run evaluation
# Gymnasium API: reset() returns (obs, info); step() returns a 5-tuple
obs, info = test_env.reset()
done = False

portfolio_values = []
actions = []
prices = []

print("Running trained agent on test data...\n")

while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = test_env.step(action)
    done = terminated or truncated

    portfolio_values.append(info['portfolio_value'])
    actions.append(int(action))
    prices.append(info['current_price'])

print(f"✅ Evaluation complete: {len(portfolio_values)} steps")

# Calculate metrics
initial_value = test_env.initial_cash
final_value = portfolio_values[-1]
total_return = (final_value / initial_value - 1) * 100

print(f"\n📊 RL Agent Performance:")
print(f"  Initial value: ${initial_value:,.2f}")
print(f"  Final value: ${final_value:,.2f}")
print(f"  Total return: {total_return:.2f}%")
# Counts non-hold action signals (0 = hold); the environment may not execute every one
print(f"  Non-hold actions: {sum(1 for a in actions if a != 0)}")
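Total return alone hides risk. The sketch below adds two common risk measures, maximum drawdown and an annualized Sharpe ratio, computed from a list like `portfolio_values`. The helper names are hypothetical, the risk-free rate is assumed to be zero, and the annualization factor assumes hourly bars trading around the clock (24 × 365 periods per year).

```python
import math
import statistics

def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst

def sharpe_ratio(values, periods_per_year=24 * 365):
    """Annualized Sharpe ratio of per-step returns (risk-free rate assumed 0)."""
    returns = [b / a - 1 for a, b in zip(values, values[1:])]
    if len(returns) < 2:
        return 0.0
    sd = statistics.stdev(returns)
    if sd == 0:
        return 0.0
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)

# Toy equity curve standing in for portfolio_values
equity = [10000.0, 10100.0, 9900.0, 10050.0, 9800.0, 10200.0]
print(f"Max drawdown: {max_drawdown(equity) * 100:.2f}%")
print(f"Sharpe (annualized): {sharpe_ratio(equity):.2f}")
```

In the notebook, the same two functions could be called on `portfolio_values` directly. Over a short test window the Sharpe estimate is very noisy, so treat it as a rough indication only.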

### Visualize Agent Performance

In [None]:
# Plot results
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Portfolio value
ax1.plot(portfolio_values, linewidth=2, color='#2E86AB', label='RL Agent')
ax1.axhline(y=10000, color='r', linestyle='--', alpha=0.5, label='Initial Value')
ax1.set_ylabel('Portfolio Value ($)', fontsize=12)
ax1.set_title('RL Agent Portfolio Performance', fontsize=14, fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Price with actions
ax2.plot(prices, linewidth=1.5, color='black', alpha=0.7, label='BTC Price')

# Mark buy/sell points
buy_points = [i for i, a in enumerate(actions) if a == 1]
sell_points = [i for i, a in enumerate(actions) if a == 2]

if buy_points:
    ax2.scatter(buy_points, [prices[i] for i in buy_points], 
               color='green', marker='^', s=100, label='Buy', zorder=5)
if sell_points:
    ax2.scatter(sell_points, [prices[i] for i in sell_points], 
               color='red', marker='v', s=100, label='Sell', zorder=5)

ax2.set_xlabel('Time Step', fontsize=12)
ax2.set_ylabel('Price (USD)', fontsize=12)
ax2.set_title('Trading Actions', fontsize=14, fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✅ Visualization complete!")

### Compare with Buy & Hold

In [None]:
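# Calculate Buy & Hold performance
# Only the 0.1% transaction cost is applied here; the environment's slippage
# is ignored, which slightly favors Buy & Hold
first_price = prices[0]
btc_amount = (10000 * 0.999) / first_price  # After transaction cost
bh_values = [btc_amount * p for p in prices]  # Mark-to-market, before selling cost
bh_final = bh_values[-1] * 0.999  # After selling cost
bh_return = (bh_final / 10000 - 1) * 100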

print("📊 Comparison:")
print(f"\n  RL Agent:")
print(f"    Return: {total_return:.2f}%")
print(f"    Final: ${final_value:.2f}")
print(f"\n  Buy & Hold:")
print(f"    Return: {bh_return:.2f}%")
print(f"    Final: ${bh_final:.2f}")
print(f"\n  Difference: {total_return - bh_return:.2f}%")

if total_return > bh_return:
    print("\n✅ RL Agent BEATS Buy & Hold! 🎉")
else:
    print("\n⚠️  RL Agent underperforms Buy & Hold")
    print("   (This is expected with only 2000 training timesteps)")
    print("   Try full training: python train.py")
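In formula form, the Buy & Hold benchmark above buys at the first recorded price and sells at the last, paying the fee on each leg. The symbols are notation introduced here for illustration: $V_0 = 10{,}000$ is the starting cash, $c = 0.001$ the fee rate, and $P_0$, $P_T$ the first and last prices.

$$
V_{\mathrm{BH}} = V_0 \,(1 - c)^2 \,\frac{P_T}{P_0},
\qquad
R_{\mathrm{BH}} = \left(\frac{V_{\mathrm{BH}}}{V_0} - 1\right) \times 100\%
$$

Since $(1 - 0.001)^2 \approx 0.998$, the round trip costs about 0.2% of the position.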

### Plot Comparison

In [None]:
# Plot comparison
plt.figure(figsize=(14, 6))

rl_returns = [(v / 10000 - 1) * 100 for v in portfolio_values]
bh_returns = [(v / 10000 - 1) * 100 for v in bh_values]

plt.plot(rl_returns, linewidth=2, label='RL Agent', color='#2E86AB')
plt.plot(bh_returns, linewidth=2, label='Buy & Hold', color='#F18F01')
plt.axhline(y=0, color='black', linestyle='--', alpha=0.3)

plt.xlabel('Time Step', fontsize=12)
plt.ylabel('Return (%)', fontsize=12)
plt.title('RL Agent vs Buy & Hold Strategy', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("✅ Comparison visualization complete!")

---
## 🎓 Summary & Next Steps

### What You Learned:

1. ✅ **Data Pipeline** - Loading, cleaning, and adding technical indicators
2. ✅ **Trading Environment** - Custom Gymnasium environment with realistic costs
3. ✅ **RL Training** - PPO algorithm for trading strategy learning
4. ✅ **Evaluation** - Comparing RL agent with baseline strategies

### ⚠️ Important Notes:

- This notebook uses **minimal training** (2000 timesteps) for demonstration
- Real training requires **200,000+ timesteps** (2-4 hours)
- Results shown here are **NOT production quality**

### 🚀 Next Steps:

1. **Full Training:**
   ```bash
   python train.py
   ```

2. **Comprehensive Evaluation:**
   ```bash
   python evaluate.py --model ckpts/best_model/best_model.zip
   ```

3. **Experiment with Configs:**
   - Modify `configs/training.yaml` for different hyperparameters
   - Try `configs/env.yaml` for different trading costs
   - Adjust `configs/features.yaml` for different indicators

4. **Real Data:**
   ```bash
   python fetch_data.py --days 365
   ```

5. **Quick Start:**
   ```bash
   python quickstart.py --mode full
   ```

---

### 📚 Additional Resources:

- **README.md** - Complete documentation
- **DATA_FETCHING.md** - Binance data fetching guide
- **TRAINING_GUIDE.md** - Detailed training instructions

---

**Happy Trading! 📈**