# Active Inference Portfolio Trading Agent

This notebook implements an **Active Inference** agent for portfolio management over Bitcoin, Solana, Ethereum, SPY, and QQQ, using the same data pipeline as our DQN agent. It incorporates a **hyperbolic discounting** scheme (inspired by Paul Glimcher) in its expected free energy calculation. This is designed to showcase a principled neuro-inspired decision-making framework in a financial context.

## Table of Contents

1. Setup & Data Loading  
2. Trading Environment  
3. Active Inference Agent Architecture  
4. Hyperbolic Discounting Function  
5. Belief Updating & Policy Selection  
6. Simulation & Performance Evaluation  
7. Visualization of Beliefs & Allocations  
8. Conclusions & Next Steps  

In [8]:
# 1. Setup & Data Loading
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import datetime
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import torch.nn.functional as F
import seaborn as sns
sns.set_style('darkgrid')

# Reuse technical indicators function from rl_trading_agent
def calculate_technical_indicators(prices, volumes):
    indicators = pd.DataFrame(index=prices.index)
    windows = [7, 30, 90, 200, 365]
    for t in prices.columns:
        p = prices[t]; v = volumes[t]; r = p.pct_change().dropna()
        for w in windows:
            if len(p) >= w:
                ma = p.rolling(w).mean()
                vol_ma = v.rolling(w).mean()
                vol_rolling = r.rolling(w).std() * np.sqrt(252)
                indicators[f'{t}_price_ma_{w}'] = p / ma
                indicators[f'{t}_sharpe_{w}'] = (r.rolling(w).mean() * 252) / (r.rolling(w).std() * np.sqrt(252))
        delta = p.diff()
        gain = delta.clip(lower=0).rolling(14).mean()
        loss = (-delta).clip(lower=0).rolling(14).mean()
        rs = gain / loss
        indicators[f'{t}_rsi'] = 100 - (100 / (1 + rs))
    return indicators.fillna(0)

# Download price & volume data
# tickers = ['BTC-USD','SOL-USD','ETH-USD','SPY','QQQ']
tickers = ['QQQ', 'SPY', 'IWM', 'DIA', 'GLD', 'TLT']
# start, end = '2017-01-01', datetime.today().strftime('%Y-%m-%d')
start, end = '2002-01-01', datetime.today().strftime('%Y-%m-%d')

data = yf.download(tickers, start=start, end=end, progress=False)
price_data = data['Close'].ffill().dropna()
volume_data = data['Volume'].ffill().dropna()

# Compute indicators
tech_indicators = calculate_technical_indicators(price_data, volume_data)
print(f"Data from {price_data.index[0]} to {price_data.index[-1]}")
print(f"Features: {tech_indicators.shape[1]} technical indicators")

  data = yf.download(tickers, start=start, end=end, progress=False)


Data from 2004-11-18 00:00:00 to 2025-07-08 00:00:00
Features: 66 technical indicators


In [9]:
# 2. Trading Environment
class TradingEnvironment:
    def __init__(self, price_data, tech_ind, annual_investment=10000, tc=0.001):
        self.p = price_data
        self.tech = tech_ind
        self.daily_cash = annual_investment / 365
        self.tc = tc
        self.tickers = price_data.columns.tolist()
        self.n = len(self.tickers)
        self.reset()

    def reset(self):
        self.step_idx = 0
        self.cash = self.daily_cash
        self.hold = np.zeros(self.n)
        self.pv = 0
        self.history = []
        return self._state()

    def _state(self):
        if self.step_idx >= len(self.p):
            return np.zeros(self._get_state_size())
        
        # Current prices (normalized)
        prices = self.p.iloc[self.step_idx].values
        norm = prices / prices.max()
        
        # Technical indicators
        tech = self.tech.iloc[self.step_idx].values
        
        # Portfolio weights
        total = self.cash + (self.hold * prices).sum()
        weights = (self.hold * prices) / total if total > 0 else np.zeros(self.n)
        cash_ratio = self.cash / total if total > 0 else 1.0
        
        return np.concatenate([norm, tech, weights, [cash_ratio]]).astype(np.float32)
    
    def _get_state_size(self):
        return self.n + len(self.tech.columns) + self.n + 1
    
    def get_portfolio_composition(self):
        if self.step_idx >= len(self.p):
            return {'cash': {'amount': self.cash, 'percentage': 100}}
        
        prices = self.p.iloc[self.step_idx].values
        total = self.cash + (self.hold * prices).sum()
        
        comp = {'cash': {'amount': self.cash, 'percentage': (self.cash/total*100) if total>0 else 0}}
        
        for i, ticker in enumerate(self.tickers):
            if self.hold[i] > 0:
                value = self.hold[i] * prices[i]
                comp[ticker] = {
                    'shares': self.hold[i],
                    'value': value,
                    'percentage': (value/total*100) if total>0 else 0
                }
        return comp

    def step(self, acts):
        if self.step_idx >= len(self.p) - 1:
            return self._state(), 0, True, {}
        
        # Add daily investment
        self.cash += self.daily_cash
        
        # Current prices
        prices = self.p.iloc[self.step_idx].values
        
        # Execute actions (acts is a list of actions for each asset)
        for i, act in enumerate(acts):
            if act == 1:  # Buy
                max_shares = self.cash / (prices[i] * (1 + self.tc))
                shares_to_buy = min(max_shares, self.cash * 0.2 / prices[i])  # Max 20% per asset
                cost = shares_to_buy * prices[i] * (1 + self.tc)
                if cost <= self.cash and shares_to_buy > 0:
                    self.hold[i] += shares_to_buy
                    self.cash -= cost
            elif act == 2:  # Sell
                if self.hold[i] > 0:
                    shares_to_sell = self.hold[i] * 0.2  # Sell 20%
                    proceeds = shares_to_sell * prices[i] * (1 - self.tc)
                    self.hold[i] -= shares_to_sell
                    self.cash += proceeds
        
        # Calculate portfolio value
        new_pv = self.cash + (self.hold * prices).sum()
        reward = (new_pv - self.pv) / self.pv if self.pv > 0 else 0
        self.pv = new_pv
        self.history.append(new_pv)
        
        self.step_idx += 1
        done = self.step_idx >= len(self.p) - 1
        
        return self._state(), reward, done, {}

env = TradingEnvironment(price_data, tech_indicators)
s0 = env.reset()
print(f"State dim: {s0.shape}, Assets: {env.tickers}")
print(f"Sample state shape: {len(s0)}")
print(f"Portfolio composition: {env.get_portfolio_composition()}")

State dim: (79,), Assets: ['DIA', 'GLD', 'IWM', 'QQQ', 'SPY', 'TLT']
Sample state shape: 79
Portfolio composition: {'cash': {'amount': 27.397260273972602, 'percentage': np.float64(100.0)}}


## 3. Active Inference Agent Architecture

We implement a simple **generative model** (p(o|s), p(s'|s,a)) and an **approximate posterior** (q(s|o)) using small neural networks. Decision making minimizes the **Expected Free Energy** under a **hyperbolic discount**.

In [10]:
# 4. Hyperbolic Discounting
def hyperbolic_discount(t, k=0.01):
    return 1.0/(1.0 + k*t)

print([hyperbolic_discount(t,0.05) for t in [1,5,10,30]])

[0.9523809523809523, 0.8, 0.6666666666666666, 0.4]


In [11]:
# 5. ActiveInferenceAgent
class ActiveInferenceAgent:
    def __init__(self, state_dim, n_assets, hidden=128, lr=1e-3, risk_lambda=0.0):
        self.n_assets = n_assets
        self.n_actions = 3  # hold, buy, sell
        self.risk_lambda = risk_lambda
        
        # Inference network q(s|o)
        self.inf_net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden//2)
        )
        
        # Observation model p(o|s)
        self.obs_net = nn.Sequential(
            nn.Linear(hidden//2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
            nn.Sigmoid()
        )
        
        # Transition model p(s'|s,a)
        self.trans_net = nn.Sequential(
            nn.Linear(hidden//2 + self.n_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden//2)
        )
        
        # Prior beliefs (learnable)
        self.prior_mean = nn.Parameter(torch.zeros(hidden//2))
        self.prior_logvar = nn.Parameter(torch.zeros(hidden//2))
        
        self.optimizer = optim.Adam(
            list(self.inf_net.parameters()) + 
            list(self.obs_net.parameters()) + 
            list(self.trans_net.parameters()) +
            [self.prior_mean, self.prior_logvar], lr=lr
        )

    def infer(self, state):
        """Infer latent state q(s|o)"""
        return self.inf_net(torch.FloatTensor(state))

    def predict_obs(self, latents):
        """Predict observations p(o|s)"""
        return self.obs_net(latents)

    def transition(self, latents, action_onehot):
        """Predict next latent state p(s'|s,a)"""
        inp = torch.cat([latents, action_onehot], dim=-1)
        return self.trans_net(inp)

    def expected_free_energy(self, latents, candidate_action, t_step):
        """Calculate expected free energy for a candidate action"""
        # One-hot encode action
        a_onehot = torch.zeros(self.n_actions)
        a_onehot[candidate_action] = 1.0
        
        # Predict next latent state
        s_next = self.transition(latents, a_onehot)
        
        # Predict observation from next state
        o_pred = self.predict_obs(s_next)
        
        # Reconstruction error (extrinsic value) - compare predicted obs with current obs
        current_obs = self.predict_obs(latents)
        recon_error = torch.mean((o_pred - current_obs.detach())**2)
        
        # Epistemic value (information gain)
        epistemic = -torch.mean(s_next * torch.log(torch.abs(s_next) + 1e-8))
        
        # Risk penalty (variance of next state)
        risk_penalty = self.risk_lambda * torch.var(s_next)
        
        # Apply hyperbolic discounting
        discount = hyperbolic_discount(t_step)
        
        # Expected free energy
        efe = discount * (recon_error + epistemic + risk_penalty)
        
        return efe

    def act(self, state, t_step):
        """Choose action that minimizes expected free energy"""
        latents = self.infer(state)
        
        # Calculate expected free energy for each action
        free_energies = []
        for action in range(self.n_actions):
            fe = self.expected_free_energy(latents, action, t_step)
            free_energies.append(fe)
        
        # Choose action with minimum expected free energy
        best_action = torch.argmin(torch.stack(free_energies)).item()
        
        # Return action for each asset (could be sophisticated allocation)
        # For simplicity, apply same action to all assets with some randomness
        actions = []
        for i in range(self.n_assets):
            if np.random.random() < 0.7:  # 70% chance to use best action
                actions.append(best_action)
            else:
                actions.append(np.random.randint(0, self.n_actions))  # Exploration
        
        return actions
    
    def update_beliefs(self, state, action, next_state, reward):
        """Update beliefs using variational inference"""
        latents = self.infer(state)
        next_latents = self.infer(next_state)
        
        # Reconstruction loss
        obs_pred = self.predict_obs(latents)
        recon_loss = F.mse_loss(obs_pred, torch.FloatTensor(state))
        
        # Transition loss
        action_onehot = torch.zeros(self.n_actions)
        if len(action) > 0:
            action_onehot[action[0]] = 1.0  # Use first asset's action
        
        trans_pred = self.transition(latents, action_onehot)
        trans_loss = F.mse_loss(trans_pred, next_latents.detach())
        
        # KL divergence with prior
        kl_loss = torch.mean(
            0.5 * (latents**2 - 2*torch.log(torch.abs(latents)+1e-8) - 1)
        )
        
        # Total loss
        total_loss = recon_loss + trans_loss + 0.1 * kl_loss
        
        # Update
        self.optimizer.zero_grad()
        total_loss.backward()
        self.optimizer.step()
        
        return total_loss.item()

# Create agents with different risk sensitivities
agent_neutral = ActiveInferenceAgent(state_dim=len(s0), n_assets=len(tickers), risk_lambda=0.0)
agent_averse = ActiveInferenceAgent(state_dim=len(s0), n_assets=len(tickers), risk_lambda=1.0)

print("Active Inference Agents initialized.")
print(f"State dim: {len(s0)}, Assets: {len(tickers)}")
print("Neural network architecture:")
print("- Inference net: state -> latents")
print("- Observation model: latents -> observations") 
print("- Transition model: (latents, action) -> next_latents")

Active Inference Agents initialized.
State dim: 79, Assets: 6
Neural network architecture:
- Inference net: state -> latents
- Observation model: latents -> observations
- Transition model: (latents, action) -> next_latents


## 7. Visualization of Beliefs & Allocations

- **Latent Beliefs** over time  
- **Portfolio Value** & **Holdings** evolution  

In [None]:
# Comprehensive Strategy Comparison: Equal Allocation vs 3 Risk Profiles
print("="*80)
print("COMPREHENSIVE PORTFOLIO STRATEGY COMPARISON")
print("="*80)
print(f"Data Period: {price_data.index[0].strftime('%Y-%m-%d')} to {price_data.index[-1].strftime('%Y-%m-%d')}")
print(f"Assets: {', '.join(tickers)}")
print(f"Annual Investment: $10,000 (${10000/365:.2f} per day)")
print(f"Total Steps: 10,000 (with updates every 1,000 steps)")
print("="*80)

# Create agents with different risk sensitivities
agent_neutral = ActiveInferenceAgent(state_dim=len(s0), n_assets=len(tickers), risk_lambda=0.0)
agent_averse = ActiveInferenceAgent(state_dim=len(s0), n_assets=len(tickers), risk_lambda=1.0)
agent_seeking = ActiveInferenceAgent(state_dim=len(s0), n_assets=len(tickers), risk_lambda=-0.5)

def simulate_equal_allocation_strategy(env, max_steps=10000, update_interval=1000):
    """Equal allocation baseline: split daily investment equally across all assets"""
    state = env.reset()
    pv_history = []
    daily_investment = env.daily_cash
    n_assets = len(env.tickers)
    total_invested = 0
    
    print(f"\n=== EQUAL ALLOCATION STRATEGY ===")
    print(f"Daily allocation per asset: ${daily_investment/n_assets:.2f}")
    
    for t in range(min(len(price_data) - 1, max_steps)):
        # Add daily investment
        env.cash += daily_investment
        total_invested += daily_investment
        
        # Current prices
        if env.step_idx < len(env.p):
            prices = env.p.iloc[env.step_idx].values
            
            # Split available cash equally across all assets
            cash_per_asset = env.cash / n_assets
            
            for i in range(n_assets):
                if prices[i] > 0:
                    shares_to_buy = cash_per_asset / (prices[i] * (1 + env.tc))
                    cost = shares_to_buy * prices[i] * (1 + env.tc)
                    if cost <= env.cash:
                        env.hold[i] += shares_to_buy
                        env.cash -= cost
        
        # Calculate portfolio value
        if env.step_idx < len(env.p):
            prices = env.p.iloc[env.step_idx].values
            env.pv = env.cash + (env.hold * prices).sum()
        
        pv_history.append(env.pv)
        
        # Status update
        if t % update_interval == 0:
            comp = env.get_portfolio_composition()
            print(f"  Step {t}: PV=${env.pv:.2f}, Total Invested=${total_invested:.2f}")
            # Show top 3 allocations
            top_assets = sorted(comp.items(), key=lambda x: x[1]['percentage'], reverse=True)[:3]
            for asset, info in top_assets:
                if asset != 'cash' or info['percentage'] > 5:
                    print(f"    {asset}: {info['percentage']:.1f}%")
        
        env.step_idx += 1
        if env.step_idx >= len(env.p) - 1:
            break
    
    return {
        'portfolio_values': pv_history,
        'final_pv': env.pv,
        'final_composition': env.get_portfolio_composition(),
        'total_invested': total_invested
    }

def simulate_active_inference_agent(agent, env, name, max_steps=10000, update_interval=1000):
    """Simulate Active Inference agent with specified risk profile"""
    state = env.reset()
    pv_history = []
    fe_history = []
    loss_history = []
    
    print(f"\n=== {name.upper()} AGENT ===")
    print(f"Risk Lambda: {agent.risk_lambda}")
    
    for t in range(min(len(price_data) - 1, max_steps)):
        # Get agent action
        actions = agent.act(state, t)
        
        # Calculate free energy for monitoring
        latents = agent.infer(state)
        fe = agent.expected_free_energy(latents, actions[0], t).item()
        
        # Step environment
        next_state, reward, done, _ = env.step(actions)
        
        # Update agent beliefs
        loss = agent.update_beliefs(state, actions, next_state, reward)
        
        # Record metrics
        pv_history.append(env.pv)
        fe_history.append(fe)
        loss_history.append(loss)
        
        # Status update
        if t % update_interval == 0:
            comp = env.get_portfolio_composition()
            print(f"  Step {t}: PV=${env.pv:.2f}, FE={fe:.4f}, Loss={loss:.4f}")
            # Show top 3 allocations
            top_assets = sorted(comp.items(), key=lambda x: x[1]['percentage'], reverse=True)[:3]
            for asset, info in top_assets:
                if asset != 'cash' or info['percentage'] > 5:
                    print(f"    {asset}: {info['percentage']:.1f}%")
        
        state = next_state
        if done:
            break
    
    return {
        'portfolio_values': pv_history,
        'free_energies': fe_history,
        'losses': loss_history,
        'final_pv': env.pv,
        'final_composition': env.get_portfolio_composition()
    }

# Run all simulations
print("\n" + "="*80)
print("RUNNING SIMULATIONS...")
print("="*80)

# Equal allocation baseline
env.reset()
results_equal = simulate_equal_allocation_strategy(env, max_steps=10000, update_interval=1000)

# Risk-averse agent
env.reset()
results_averse = simulate_active_inference_agent(agent_averse, env, "Risk-Averse", max_steps=10000, update_interval=1000)

# Risk-neutral agent
env.reset()
results_neutral = simulate_active_inference_agent(agent_neutral, env, "Risk-Neutral", max_steps=10000, update_interval=1000)

# Risk-seeking agent
env.reset()
results_seeking = simulate_active_inference_agent(agent_seeking, env, "Risk-Seeking", max_steps=10000, update_interval=1000)

# Final Results Summary
print("\n" + "="*80)
print("FINAL RESULTS SUMMARY")
print("="*80)
print(f"Data Period: {price_data.index[0].strftime('%Y-%m-%d')} to {price_data.index[-1].strftime('%Y-%m-%d')}")
print(f"Assets: {', '.join(tickers)}")
print(f"Simulation Steps: 10,000")
print("-" * 80)

strategies = [
    ("Equal Allocation", results_equal),
    ("Risk-Averse", results_averse),
    ("Risk-Neutral", results_neutral),
    ("Risk-Seeking", results_seeking)
]

# Print performance summary
print(f"{'Strategy':<15} {'Final PV':<12} {'Return %':<10} {'Total Invested':<15}")
print("-" * 60)

for strategy_name, results in strategies:
    final_pv = results['final_pv']
    
    if 'total_invested' in results:
        total_invested = results['total_invested']
        return_pct = (final_pv - total_invested) / total_invested * 100
        print(f"{strategy_name:<15} ${final_pv:<11.2f} {return_pct:<9.1f} ${total_invested:<14.2f}")
    else:
        # For AI agents, estimate based on daily investment
        estimated_invested = 10000 / 365 * len(results['portfolio_values'])
        return_pct = (final_pv - estimated_invested) / estimated_invested * 100
        print(f"{strategy_name:<15} ${final_pv:<11.2f} {return_pct:<9.1f} ${estimated_invested:<14.2f}")

print("\n" + "="*80)
print("DETAILED PORTFOLIO COMPOSITIONS")
print("="*80)

for strategy_name, results in strategies:
    print(f"\n{strategy_name.upper()} STRATEGY:")
    print(f"Final Portfolio Value: ${results['final_pv']:.2f}")
    
    comp = results['final_composition']
    print("Asset Allocation:")
    
    # Sort by percentage
    sorted_assets = sorted(comp.items(), key=lambda x: x[1]['percentage'], reverse=True)
    
    for asset, info in sorted_assets:
        if info['percentage'] > 0.1:  # Only show assets with >0.1% allocation
            if asset == 'cash':
                print(f"  {asset.upper()}: ${info['amount']:.2f} ({info['percentage']:.1f}%)")
            else:
                print(f"  {asset}: {info['shares']:.6f} shares, ${info['value']:.2f} ({info['percentage']:.1f}%)")

print("\n" + "="*80)
print("SIMULATION COMPLETED")
print("="*80)

COMPREHENSIVE PORTFOLIO STRATEGY COMPARISON
Data Period: 2004-11-18 to 2025-07-08
Assets: QQQ, SPY, IWM, DIA, GLD, TLT
Annual Investment: $10,000 ($27.40 per day)
Total Steps: 10,000 (with updates every 1,000 steps)

RUNNING SIMULATIONS...

=== EQUAL ALLOCATION STRATEGY ===
Daily allocation per asset: $4.57
  Step 0: PV=$54.74, Total Invested=$27.40
    DIA: 16.7%
    GLD: 16.7%
    IWM: 16.7%
  Step 1000: PV=$24477.77, Total Invested=$27424.66
    GLD: 23.4%
    TLT: 17.6%
    DIA: 15.8%
  Step 2000: PV=$81571.79, Total Invested=$54821.92
    GLD: 24.0%
    QQQ: 17.6%
    DIA: 15.2%
  Step 3000: PV=$144698.81, Total Invested=$82219.18
    QQQ: 23.1%
    SPY: 17.5%
    IWM: 16.9%
  Step 4000: PV=$293894.38, Total Invested=$109616.44
    QQQ: 31.3%
    SPY: 17.2%
    DIA: 16.3%
  Step 5000: PV=$480659.26, Total Invested=$137013.70
    QQQ: 34.5%
    SPY: 19.7%
    DIA: 17.0%

=== RISK-AVERSE AGENT ===
Risk Lambda: 1.0
  Step 0: PV=$54.75, FE=-0.0370, Loss=0.5286
    cash: 26.3%
    DIA:

## 8. Conclusions & Next Steps

### Key Achievements

This notebook demonstrates a comprehensive **Active Inference** framework for portfolio management, incorporating several cutting-edge concepts:

1. **Hyperbolic Discounting**: Following Paul Glimcher's work on temporal decision-making, we implement hyperbolic discounting in the expected free energy calculation, creating more realistic time preferences.

2. **Risk Sensitivity**: The framework includes a risk penalty term (λ * variance) in the expected free energy, allowing for different risk profiles between agents.

3. **Belief Updating**: The agent continuously updates its beliefs about market dynamics through variational inference, learning both observation and transition models.

4. **Expected Free Energy Minimization**: Actions are chosen to minimize expected free energy, balancing:
   - **Extrinsic value** (reconstruction accuracy)
   - **Epistemic value** (information gain)
   - **Risk penalty** (uncertainty aversion)

### Theoretical Foundations

This implementation bridges **neuroscience** (Active Inference), **behavioral economics** (hyperbolic discounting), and **finance** (portfolio optimization), creating a biologically-plausible decision-making framework that Karl Friston would appreciate.

The agent operates under the **Free Energy Principle**, where actions are selected to minimize surprise and maintain homeostasis - a principle that extends from cellular biology to financial decision-making.

### Future Enhancements

1. **Hierarchical Active Inference**: Implement multi-scale temporal dynamics
2. **Precision-Weighted Beliefs**: Add attention mechanisms for feature weighting  
3. **Social Active Inference**: Include market sentiment and herding behaviors
4. **Continuous Learning**: Online adaptation to regime changes
5. **Multi-Objective Optimization**: Balance multiple financial objectives simultaneously

### Applications

This framework could be extended to:
- **Algorithmic Trading**: Real-time market making and execution
- **Risk Management**: Dynamic hedging and portfolio insurance
- **Behavioral Finance**: Understanding investor psychology and biases
- **Central Banking**: Monetary policy under uncertainty

The Active Inference approach provides a principled, neurobiologically-inspired alternative to traditional RL methods in finance, offering better interpretability and theoretical grounding.

## 7. Visualization of Beliefs & Allocations

- **Latent Beliefs** over time  
- **Portfolio Value** & **Holdings** evolution  