# HIMARI OPUS 2 - RL Position Sizing Training

Train a PPO agent to optimize position sizing using live market data.

**Features:**
- Live price feed from Binance API
- Paper trading environment
- PPO reinforcement learning
- Model checkpointing
- Performance visualization

**Runtime:** Google Colab (CPU or GPU)

## 1. Setup Environment

In [None]:
# Install dependencies
!pip install torch numpy requests matplotlib tqdm

print("✅ Dependencies installed")

In [None]:
# Clone HIMARI Layer 3 from GitHub
!git clone https://github.com/nimallansa937/HIMARI-LAYER-3-POSITIONING-.git
%cd HIMARI-LAYER-3-POSITIONING-

print("✅ HIMARI Layer 3 cloned from GitHub")
print("📁 Repository: https://github.com/nimallansa937/HIMARI-LAYER-3-POSITIONING-")

In [None]:
# Import modules
import sys
sys.path.insert(0, 'src')

import torch
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

from rl.trainer import RLTrainer, TrainingConfig
from rl.trading_env import EnvConfig
from rl.ppo_agent import PPOConfig

print("✅ Imports successful")
print(f"🔧 PyTorch version: {torch.__version__}")
print(f"🔧 Device: {'cuda' if torch.cuda.is_available() else 'cpu'}")

## 2. Configure Training

In [None]:
# Training configuration
training_config = TrainingConfig(
    num_episodes=1000,           # Number of training episodes
    max_steps_per_episode=500,   # Max steps per episode
    batch_size=64,               # Batch size for PPO updates
    ppo_epochs=10,               # PPO optimization epochs
    save_interval=50,            # Save checkpoint every N episodes
    log_interval=10,             # Log progress every N episodes
    checkpoint_dir='checkpoints', # Checkpoint directory
    use_live_prices=True         # Use live Binance prices
)

# Environment configuration
env_config = EnvConfig(
    initial_capital=100000.0,    # Starting capital
    max_position_pct=0.5,        # Max 50% per position
    commission_rate=0.001,       # 0.1% commission
    slippage_bps=5,              # 0.05% slippage
    reward_window=10,            # Sharpe calculation window
    max_steps=500,               # Max steps per episode
    symbol='BTC-USD'             # Trading symbol
)

# PPO agent configuration
ppo_config = PPOConfig(
    state_dim=16,                # State dimension
    action_dim=1,                # Action dimension (position multiplier)
    hidden_dim=128,              # Hidden layer size
    learning_rate=3e-4,          # Learning rate
    gamma=0.99,                  # Discount factor
    lambda_gae=0.95,             # GAE parameter
    clip_epsilon=0.2,            # PPO clip parameter
)

# Device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

print("⚙️ Configuration:")
print(f"  Episodes: {training_config.num_episodes}")
print(f"  Initial Capital: ${env_config.initial_capital:,.0f}")
print(f"  Symbol: {env_config.symbol}")
print(f"  Live Prices: {training_config.use_live_prices}")
print(f"  Device: {device}")
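
The environment settings above can be read as a simple cost model. The sketch below illustrates one plausible interpretation; it is not the code in `rl/trading_env.py`. The helper `execute_fill` and its argument names are hypothetical, and it assumes that slippage moves the fill price against the trader and that commission is charged on the traded notional.

```python
def execute_fill(price, target_fraction, capital, side,
                 max_position_pct=0.5, commission_rate=0.001, slippage_bps=5):
    """Return (fill_price, notional, commission) for one paper-trading order.

    Hypothetical helper: side is +1 for a buy and -1 for a sell.
    """
    # Cap the requested fraction of capital at the configured maximum.
    fraction = min(max(target_fraction, 0.0), max_position_pct)
    notional = capital * fraction
    # One basis point is 0.01%, so 5 bps shifts the price by 0.05%
    # in the direction that hurts the trader (up for buys, down for sells).
    fill_price = price * (1.0 + side * slippage_bps / 10_000)
    # Assumption: commission is proportional to the traded notional.
    commission = notional * commission_rate
    return fill_price, notional, commission


# A request for 80% of capital is capped at max_position_pct (50%).
fill_price, notional, commission = execute_fill(
    price=50_000.0, target_fraction=0.8, capital=100_000.0, side=+1
)
print(f"fill price: {fill_price:.2f}, notional: {notional:.2f}, commission: {commission:.2f}")
```

With these defaults a round trip costs roughly 0.3% of the traded notional (two commissions plus slippage on both sides), which is the hurdle the agent's sizing decisions have to clear.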

## 3. Initialize Trainer

In [None]:
# Create trainer
trainer = RLTrainer(
    training_config=training_config,
    env_config=env_config,
    ppo_config=ppo_config,
    device=device
)

print("✅ Trainer initialized")
print(f"  Environment: {trainer.env.__class__.__name__}")
print(f"  Agent: {trainer.agent.__class__.__name__}")

## 4. Train Agent

In [None]:
# Run training
print("🚀 Starting training...")
print("=" * 80)

training_stats = trainer.train()

print("=" * 80)
print("✅ Training complete!")
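
`trainer.train()` hides the PPO update itself. As a rough illustration of where `gamma`, `lambda_gae`, and `clip_epsilon` from section 2 enter, here is a minimal NumPy sketch of Generalized Advantage Estimation and the clipped surrogate loss. The function names are hypothetical and the repository's `ppo_agent` module may be organized differently.

```python
import numpy as np


def compute_gae(rewards, values, last_value, gamma=0.99, lambda_gae=0.95):
    """Generalized Advantage Estimation over one episode.

    last_value is the critic's estimate for the state after the final step
    (use 0.0 if the episode terminated there).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.append(np.asarray(values, dtype=np.float64), last_value)
    advantages = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum after it.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lambda_gae * running
        advantages[t] = running
    return advantages


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_epsilon=0.2):
    """PPO clipped objective, negated so that lower is better."""
    ratio = np.exp(np.asarray(new_log_probs) - np.asarray(old_log_probs))
    advantages = np.asarray(advantages, dtype=np.float64)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages
    # Taking the minimum removes the incentive to move the policy far
    # from the one that collected the data.
    return -np.mean(np.minimum(unclipped, clipped))


adv = compute_gae(rewards=[0.1, -0.2, 0.3], values=[0.0, 0.1, 0.2], last_value=0.0)
loss = clipped_surrogate_loss([-1.0, -1.1, -0.9], [-1.0, -1.0, -1.0], adv)
print(adv, loss)
```

A `lambda_gae` closer to 1 gives lower-bias, higher-variance advantage estimates; a smaller `clip_epsilon` makes each policy update more conservative.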

## 5. Visualize Results

In [None]:
# Plot training curves
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Episode rewards
axes[0, 0].plot(training_stats['episode_rewards'])
axes[0, 0].set_title('Episode Rewards')
axes[0, 0].set_xlabel('Episode')
axes[0, 0].set_ylabel('Reward')
axes[0, 0].grid(True)

# Sharpe ratio
axes[0, 1].plot(training_stats['episode_sharpes'])
axes[0, 1].set_title('Episode Sharpe Ratio')
axes[0, 1].set_xlabel('Episode')
axes[0, 1].set_ylabel('Sharpe Ratio')
axes[0, 1].grid(True)

# P&L percentage
axes[1, 0].plot(np.array(training_stats['episode_pnls']) * 100)
axes[1, 0].set_title('Episode P&L %')
axes[1, 0].set_xlabel('Episode')
axes[1, 0].set_ylabel('P&L %')
axes[1, 0].grid(True)

# Episode lengths
axes[1, 1].plot(training_stats['episode_lengths'])
axes[1, 1].set_title('Episode Lengths')
axes[1, 1].set_xlabel('Episode')
axes[1, 1].set_ylabel('Steps')
axes[1, 1].grid(True)

plt.tight_layout()
plt.savefig('training_curves.png', dpi=150, bbox_inches='tight')
plt.show()

print("📊 Training curves saved to training_curves.png")
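
Per-episode curves from RL training tend to be noisy, so a trend can be hard to see in the raw plots. One common remedy is to overlay a moving average. The helper below is a sketch that is not part of the repository; `moving_average` and the window of 20 episodes are illustrative choices.

```python
import numpy as np


def moving_average(values, window=20):
    """Simple trailing moving average; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=np.float64)
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        # Not enough data for a single full window.
        return np.array([])
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


# Synthetic stand-in for training_stats['episode_rewards'].
rng = np.random.default_rng(0)
rewards = np.linspace(-1.0, 1.0, 200) + rng.normal(scale=0.5, size=200)
smoothed = moving_average(rewards, window=20)
print(len(rewards), len(smoothed))
```

To overlay it on the first panel, plot `smoothed` against `np.arange(window - 1, len(rewards))` so each averaged point lines up with the last episode in its window.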

## 6. Evaluate Trained Agent

In [None]:
# Evaluate agent
print("🧪 Evaluating trained agent...")
eval_results = trainer.evaluate(num_episodes=20)

print("\n📈 Evaluation Results:")
print(f"  Average Reward:    {eval_results['avg_reward']:.3f} ± {eval_results['std_reward']:.3f}")
print(f"  Average Sharpe:    {eval_results['avg_sharpe']:.3f} ± {eval_results['std_sharpe']:.3f}")
print(f"  Average P&L:       {eval_results['avg_pnl']:.2%}")
print(f"  Average Win Rate:  {eval_results['avg_win_rate']:.1%}")
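
For readers unsure how to interpret the Sharpe figures, the sketch below shows one common definition: mean step return divided by its sample standard deviation, with a zero risk-free rate and no annualization. This is an assumption about the metric; the environment may compute it differently (for example over the rolling `reward_window`). The function name and the per-episode returns are made up for illustration.

```python
import numpy as np


def sharpe_ratio(returns, eps=1e-8):
    """Non-annualized Sharpe ratio of a return series (risk-free rate = 0)."""
    returns = np.asarray(returns, dtype=np.float64)
    if returns.size < 2:
        return 0.0  # a single return has no measurable volatility
    std = returns.std(ddof=1)
    if std < eps:
        return 0.0  # avoid dividing by (near) zero for flat series
    return float(returns.mean() / std)


# Aggregate across evaluation episodes the same way the report does:
# mean and standard deviation of the per-episode values.
episode_returns = [
    [0.010, -0.004, 0.006, 0.002],
    [-0.003, 0.001, -0.002, 0.004],
    [0.005, 0.005, -0.001, 0.003],
]
sharpes = np.array([sharpe_ratio(r) for r in episode_returns])
print(f"Average Sharpe: {sharpes.mean():.3f} ± {sharpes.std():.3f}")
```

Because the ratio here is per step rather than annualized, values are only comparable between runs that use the same step interval.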

## 7. Save Final Model

In [None]:
# Save final model
final_model_path = 'checkpoints/ppo_final.pt'
trainer.agent.save(final_model_path)

print(f"💾 Final model saved to: {final_model_path}")
print("\n📦 To download:")
print("  1. Right-click on file in Colab file browser")
print("  2. Select 'Download'")
print("  3. Place in HIMARI/LAYER 3 POSITIONING LAYER/models/")

## 8. Test Inference

In [None]:
# Test inference with random state
print("🔬 Testing inference...")

test_state = np.random.randn(16).astype(np.float32)
action, log_prob = trainer.agent.get_action(test_state, deterministic=True)

print(f"  Test state shape: {test_state.shape}")
print(f"  Output action:    {action:.3f} (position multiplier)")
print("  Valid range:      [0.0, 2.0]")
print(f"  Status:           {'✅ PASS' if 0 <= action <= 2 else '❌ FAIL'}")
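
The range check above assumes the agent always emits a multiplier in [0.0, 2.0]. How the repository enforces that bound is not shown here; it could clip, or it could squash the raw network output. The sketch below illustrates the squashing variant with a scaled sigmoid. The function `to_position_multiplier` is hypothetical and is not taken from `ppo_agent`.

```python
import math


def to_position_multiplier(raw_output, low=0.0, high=2.0):
    """Map an unbounded scalar to [low, high] with a scaled sigmoid."""
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if raw_output >= 0:
        sigmoid = 1.0 / (1.0 + math.exp(-raw_output))
    else:
        e = math.exp(raw_output)
        sigmoid = e / (1.0 + e)
    return low + (high - low) * sigmoid


for raw in (-5.0, 0.0, 5.0):
    print(f"raw={raw:+.1f} -> multiplier={to_position_multiplier(raw):.3f}")
```

Under this mapping a raw output of zero corresponds to a multiplier of 1.0, that is, leaving the base position size unchanged. Hard clipping with `min(max(x, 0.0), 2.0)` is the simpler alternative but gives zero gradient outside the range.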

## Summary

**Training Complete! 🎉**

Next steps:
1. Download the trained model (`ppo_final.pt`)
2. Place it in your local HIMARI directory
3. Use `Layer3Phase1RL` with `rl_model_path='models/ppo_final.pt'`
4. Deploy to paper trading

**Key Files:**
- `checkpoints/ppo_final.pt` - Trained model weights
- `checkpoints/stats_episode_*.json` - Training statistics
- `training_curves.png` - Performance visualization
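
To inspect the saved statistics later, something like the following can locate the most recent stats file. It is a sketch that assumes the `*` in `stats_episode_*.json` is an integer episode number; the JSON contents are whatever the trainer wrote, so no keys are assumed. The helper name `load_latest_stats` is made up for this example.

```python
import glob
import json
import os
import re


def load_latest_stats(checkpoint_dir="checkpoints"):
    """Return (episode, stats) for the highest-numbered stats file, or (None, None)."""
    pattern = os.path.join(checkpoint_dir, "stats_episode_*.json")
    best_episode, best_path = -1, None
    for path in glob.glob(pattern):
        # Compare episode numbers numerically so that 100 sorts after 50.
        match = re.search(r"stats_episode_(\d+)\.json$", os.path.basename(path))
        if match and int(match.group(1)) > best_episode:
            best_episode, best_path = int(match.group(1)), path
    if best_path is None:
        return None, None
    with open(best_path) as f:
        return best_episode, json.load(f)


episode, stats = load_latest_stats("checkpoints")
if stats is None:
    print("No stats files found")
else:
    print(f"Loaded stats from episode {episode}")
```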

## 9. Auto-Save to Google Drive (Recommended)

In [None]:
# Auto-save important files to Google Drive
print("💾 Saving files to Google Drive...")

from google.colab import drive
import shutil
import os

# Mount Google Drive
drive.mount('/content/drive', force_remount=True)

# Create save directory
save_dir = '/content/drive/MyDrive/HIMARI_RL_Models/'
os.makedirs(save_dir, exist_ok=True)

# Save final model
shutil.copy('checkpoints/ppo_final.pt', f'{save_dir}ppo_final.pt')
print(f"✅ Model saved: {save_dir}ppo_final.pt")

# Save training curves
shutil.copy('training_curves.png', f'{save_dir}training_curves.png')
print(f"✅ Curves saved: {save_dir}training_curves.png")

# Save all checkpoints
checkpoint_save = f'{save_dir}checkpoints/'
os.makedirs(checkpoint_save, exist_ok=True)
for file in os.listdir('checkpoints/'):
    if file.endswith('.pt') or file.endswith('.json'):
        shutil.copy(f'checkpoints/{file}', f'{checkpoint_save}{file}')

print(f"✅ Checkpoints saved: {checkpoint_save}")
print("\n🎉 All files backed up to Google Drive!")
print("✅ Safe to close Colab now - your work is saved!")