# Portfolio Optimization Tutorial
## Mixed-Integer Optimization with ML-Driven Heuristics

**Author:** Mohin Hasin  
**Email:** mohinhasin999@gmail.com  
**GitHub:** [@mohin-io](https://github.com/mohin-io)

---

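This notebook demonstrates a complete portfolio optimization workflow on synthetic data. Expected returns are taken directly from historical means; no separate forecasting model is fitted here.
1. Data generation
2. Exploratory data analysis
3. Optimization (four strategies)
4. Performance comparison
5. Portfolio weights visualization
6. Key insights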

## Setup

Import required libraries and configure visualization settings.

In [None]:
import sys
sys.path.append('..')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('Set2')
%matplotlib inline

print("✓ Setup complete!")

## 1. Generate Synthetic Market Data

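Create synthetic asset returns from a three-factor model with the following properties:
- Positive expected returns (a small daily drift per asset)
- Correlated returns (shared factor structure)
- Constant volatility (shocks are i.i.d. Gaussian, so there is no volatility clustering)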
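
A linear factor model with independent shocks of fixed scale pins down the return covariance in closed form: with loadings `B`, factor volatility `sigma_f` and idiosyncratic volatility `sigma_eps`, the daily covariance is `sigma_f**2 * B @ B.T + sigma_eps**2 * I`. The sketch below assumes the same three-factor setup and parameter scales as the generation cell; the names `rng`, `B`, `implied_cov` and the long sample length are illustrative choices, not part of the notebook. It checks that the sample covariance approaches the implied one.

```python
import numpy as np

rng = np.random.default_rng(0)  # separate generator; leaves the notebook's global seed untouched
n_assets, n_factors, n_days = 10, 3, 200_000  # long sample so the estimate is tight

sigma_f, sigma_eps = 0.01, 0.005                       # factor and idiosyncratic daily volatilities
B = rng.standard_normal((n_assets, n_factors)) * 0.3   # factor loadings

# Covariance implied by r_t = B f_t + eps_t with independent Gaussian shocks
implied_cov = sigma_f**2 * (B @ B.T) + sigma_eps**2 * np.eye(n_assets)

# Simulate returns and compare with the sample covariance
f = rng.standard_normal((n_days, n_factors)) * sigma_f
eps = rng.standard_normal((n_days, n_assets)) * sigma_eps
r = f @ B.T + eps
sample_cov = np.cov(r, rowvar=False)

max_abs_err = np.abs(sample_cov - implied_cov).max()
print(f"Largest absolute deviation: {max_abs_err:.2e}")
```

The drift term is omitted because a constant shift does not affect the covariance.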

In [None]:
# Parameters
n_assets = 10
n_days = 1000
tickers = [f'ASSET_{i+1}' for i in range(n_assets)]

# Generate factor-based returns
n_factors = 3
factor_loadings = np.random.randn(n_assets, n_factors) * 0.3
factor_returns = np.random.randn(n_days, n_factors) * 0.01
idiosyncratic_returns = np.random.randn(n_days, n_assets) * 0.005

# Combined returns with positive drift
drift = np.random.uniform(0.0001, 0.0005, n_assets)
returns_matrix = factor_returns @ factor_loadings.T + idiosyncratic_returns + drift

# Create DataFrame on business days, consistent with the 252-day annualization used later
dates = pd.date_range('2020-01-01', periods=n_days, freq='B')
returns = pd.DataFrame(returns_matrix, index=dates, columns=tickers)

# Generate prices
prices = (1 + returns).cumprod() * 100

print(f"Generated {n_assets} assets with {n_days} daily observations")
print(f"Date range: {dates[0].date()} to {dates[-1].date()}")

# Display sample
returns.head()

## 2. Exploratory Data Analysis

In [None]:
# Price evolution
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Plot prices
prices.plot(ax=axes[0], linewidth=1.5, alpha=0.7)
axes[0].set_title('Asset Prices Over Time', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Price')
axes[0].legend(loc='upper left', ncol=2, fontsize=8)
axes[0].grid(True, alpha=0.3)

# Plot cumulative returns
cumulative_returns = (1 + returns).cumprod() - 1
cumulative_returns.plot(ax=axes[1], linewidth=1.5, alpha=0.7)
axes[1].set_title('Cumulative Returns', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Cumulative Return')
axes[1].legend(loc='upper left', ncol=2, fontsize=8)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Compute annualized statistics (252 trading days per year)
annual_returns = returns.mean() * 252
annual_volatility = returns.std() * np.sqrt(252)
# Sharpe ratio with the risk-free rate assumed to be zero
sharpe_ratios = annual_returns / annual_volatility

# Summary statistics
summary_stats = pd.DataFrame({
    'Annual Return': annual_returns,
    'Annual Volatility': annual_volatility,
    'Sharpe Ratio': sharpe_ratios
})

print("Asset Statistics:")
print(summary_stats.round(4))
print(f"\nAverage Sharpe Ratio: {sharpe_ratios.mean():.3f}")

In [None]:
# Correlation analysis
correlation_matrix = returns.corr()

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', 
            center=0, square=True, linewidths=0.5, ax=ax,
            cbar_kws={'label': 'Correlation'})
ax.set_title('Asset Return Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

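# Average pairwise correlation (upper triangle, excluding the diagonal)
upper_triangle = correlation_matrix.values[np.triu_indices_from(correlation_matrix.values, k=1)]
print(f"Average correlation: {upper_triangle.mean():.3f}")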

## 3. Portfolio Optimization

We'll compare 4 different portfolio construction strategies.
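
All four strategies are scored with the same quantities, computed by the evaluation function in the next cell. Written out, with $w$ the weight vector (summing to one), $\mu$ the annualized mean returns and $\Sigma$ the annualized covariance matrix, and assuming a zero risk-free rate as the code does:

$$
\mu_p = w^\top \mu, \qquad \sigma_p = \sqrt{w^\top \Sigma\, w}, \qquad \mathrm{SR} = \frac{\mu_p}{\sigma_p}
$$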

In [None]:
# Covariance matrix (annualized)
cov_matrix = returns.cov() * 252

# Portfolio evaluation function
def evaluate_portfolio(weights, expected_returns, cov_matrix):
    """Return annualized return, volatility, Sharpe ratio (zero risk-free rate)
    and the number of assets held with weight above 1e-4."""
    port_return = (weights * expected_returns).sum()
    port_vol = np.sqrt(weights.values @ cov_matrix.values @ weights.values)
    sharpe = port_return / port_vol if port_vol > 0 else 0
    n_held = (weights > 1e-4).sum()  # named n_held to avoid shadowing the global n_assets

    return {
        'return': port_return,
        'volatility': port_vol,
        'sharpe': sharpe,
        'n_assets': n_held
    }

### Strategy 1: Equal Weight

In [None]:
equal_weights = pd.Series(1.0 / n_assets, index=tickers)
equal_metrics = evaluate_portfolio(equal_weights, annual_returns, cov_matrix)

print("Equal Weight Portfolio:")
print(f"  Expected Return: {equal_metrics['return']:.2%}")
print(f"  Volatility: {equal_metrics['volatility']:.2%}")
print(f"  Sharpe Ratio: {equal_metrics['sharpe']:.3f}")

### Strategy 2: Maximum Sharpe Ratio

In [None]:
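# Random search over long-only, fully invested weights (Dirichlet draws).
# This approximates the maximum Sharpe portfolio; it is not an exact solver.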
best_sharpe = -np.inf
best_weights = None

for _ in range(10000):
    w = np.random.dirichlet(np.ones(n_assets))
    port_return = (w * annual_returns.values).sum()
    port_vol = np.sqrt(w @ cov_matrix.values @ w)
    sharpe = port_return / port_vol if port_vol > 0 else 0
    
    if sharpe > best_sharpe:
        best_sharpe = sharpe
        best_weights = w

max_sharpe_weights = pd.Series(best_weights, index=tickers)
max_sharpe_metrics = evaluate_portfolio(max_sharpe_weights, annual_returns, cov_matrix)

print("Maximum Sharpe Portfolio:")
print(f"  Expected Return: {max_sharpe_metrics['return']:.2%}")
print(f"  Volatility: {max_sharpe_metrics['volatility']:.2%}")
print(f"  Sharpe Ratio: {max_sharpe_metrics['sharpe']:.3f}")
print(f"\nTop 5 holdings:")
print(max_sharpe_weights.nlargest(5).round(4))
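
One way to judge how close the random search gets is to compare it with the unconstrained optimum. If short positions are allowed, the Sharpe ratio is maximized by weights proportional to `solve(cov, mu)`, and the maximum equals `sqrt(mu' cov^{-1} mu)`; no long-only portfolio can exceed that value. The sketch below uses hypothetical `mu` and `cov` arrays standing in for `annual_returns.values` and `cov_matrix.values`; it is a reference bound under that relaxed constraint, not a replacement for the long-only search.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10

# Hypothetical inputs standing in for annual_returns.values and cov_matrix.values
mu = rng.uniform(0.03, 0.12, n)               # annualized expected returns
A = rng.standard_normal((n, n)) * 0.1
cov = A @ A.T + 0.01 * np.eye(n)              # symmetric positive-definite covariance

# Tangency direction cov^{-1} mu (solve a linear system rather than inverting)
z = np.linalg.solve(cov, mu)
sharpe_bound = np.sqrt(mu @ z)                # maximum Sharpe over all weight vectors

# Rescaling to sum to one only preserves the direction when the components sum to a positive number
tangency_w = z / z.sum() if z.sum() > 0 else None

# Random long-only search, mirroring the cell above
best_random = -np.inf
for _ in range(10_000):
    w = rng.dirichlet(np.ones(n))
    best_random = max(best_random, (w @ mu) / np.sqrt(w @ cov @ w))

print(f"Unconstrained bound: {sharpe_bound:.3f}")
print(f"Best random search:  {best_random:.3f}")
```

A large gap can come from either the long-only constraint or from the search missing the optimum, so the bound is a sanity check rather than a target.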

### Strategy 3: Minimum Variance

In [None]:
# Random search for minimum variance over long-only, fully invested weights
# (approximate: the best of 10,000 Dirichlet draws)
min_var = np.inf
min_var_weights = None

for _ in range(10000):
    w = np.random.dirichlet(np.ones(n_assets))
    port_var = w @ cov_matrix.values @ w
    
    if port_var < min_var:
        min_var = port_var
        min_var_weights = w

min_variance_weights = pd.Series(min_var_weights, index=tickers)
min_var_metrics = evaluate_portfolio(min_variance_weights, annual_returns, cov_matrix)

print("Minimum Variance Portfolio:")
print(f"  Expected Return: {min_var_metrics['return']:.2%}")
print(f"  Volatility: {min_var_metrics['volatility']:.2%}")
print(f"  Sharpe Ratio: {min_var_metrics['sharpe']:.3f}")
print(f"\nTop 5 holdings:")
print(min_variance_weights.nlargest(5).round(4))
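
Because portfolio variance is a convex quadratic in the weights, the long-only minimum variance portfolio can be computed much more precisely than by random sampling. The sketch below uses projected gradient descent with a standard sort-based projection onto the simplex. The helper names `project_to_simplex` and `min_variance_long_only`, the iteration count, and the synthetic `cov` matrix are all assumptions for illustration; in the notebook `cov` would be `cov_matrix.values`.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                       # sorted in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]   # last index where the condition holds
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def min_variance_long_only(cov, n_iter=2000):
    """Projected gradient descent on w' cov w over the simplex."""
    n = cov.shape[0]
    w = np.full(n, 1.0 / n)                            # start from equal weights
    step = 1.0 / (2.0 * np.linalg.eigvalsh(cov)[-1])   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * cov @ w
        w = project_to_simplex(w - step * grad)
    return w

rng = np.random.default_rng(4)
n = 10
A = rng.standard_normal((n, n)) * 0.1
cov = A @ A.T + 0.01 * np.eye(n)   # hypothetical stand-in for cov_matrix.values

w_pgd = min_variance_long_only(cov)
var_pgd = w_pgd @ cov @ w_pgd

# Random search baseline, as in the cell above
var_random = min(w @ cov @ w for w in rng.dirichlet(np.ones(n), size=10_000))

print(f"Projected gradient variance: {var_pgd:.6f}")
print(f"Best random-search variance: {var_random:.6f}")
```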

### Strategy 4: Concentrated (Cardinality Constraint)

In [None]:
# Select top 5 assets by Sharpe ratio
max_assets = 5
top_assets = sharpe_ratios.nlargest(max_assets).index

# Optimize within selected assets
best_sharpe_concentrated = -np.inf
best_concentrated = None
top_indices = [tickers.index(t) for t in top_assets]  # fixed selection, so compute once

for _ in range(10000):
    w = np.zeros(n_assets)
    w[top_indices] = np.random.dirichlet(np.ones(max_assets))
    
    port_return = (w * annual_returns.values).sum()
    port_vol = np.sqrt(w @ cov_matrix.values @ w)
    sharpe = port_return / port_vol if port_vol > 0 else 0
    
    if sharpe > best_sharpe_concentrated:
        best_sharpe_concentrated = sharpe
        best_concentrated = w

concentrated_weights = pd.Series(best_concentrated, index=tickers)
concentrated_metrics = evaluate_portfolio(concentrated_weights, annual_returns, cov_matrix)

print(f"Concentrated Portfolio ({max_assets} assets):")
print(f"  Expected Return: {concentrated_metrics['return']:.2%}")
print(f"  Volatility: {concentrated_metrics['volatility']:.2%}")
print(f"  Sharpe Ratio: {concentrated_metrics['sharpe']:.3f}")
print(f"\nHoldings:")
print(concentrated_weights[concentrated_weights > 1e-4].round(4))
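
Ranking assets by standalone Sharpe ratio ignores correlations, so the chosen five need not be the best five to hold together. With 10 assets there are only 252 subsets of size 5, which is small enough to enumerate. The sketch below is a simplified illustration: it equal-weights each subset rather than optimizing weights within it, and `mu` and `cov` are hypothetical stand-ins for the notebook's annualized inputs. A mixed-integer solver would choose the subset and the weights jointly.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(2)
n, k = 10, 5

# Hypothetical inputs standing in for annual_returns.values and cov_matrix.values
mu = rng.uniform(0.03, 0.12, n)
A = rng.standard_normal((n, n)) * 0.1
cov = A @ A.T + 0.01 * np.eye(n)

def equal_weight_sharpe(subset):
    """Sharpe ratio (zero risk-free rate) of an equal-weighted portfolio of the subset."""
    idx = list(subset)
    w = np.full(len(idx), 1.0 / len(idx))
    return (w @ mu[idx]) / np.sqrt(w @ cov[np.ix_(idx, idx)] @ w)

# Heuristic from the cell above: take the k assets with the highest standalone Sharpe
standalone = mu / np.sqrt(np.diag(cov))
top_k = tuple(sorted(int(i) for i in np.argsort(standalone)[-k:]))

# Exhaustive search over every subset of size k
subsets = list(combinations(range(n), k))
best_subset = max(subsets, key=equal_weight_sharpe)

print(f"Subsets evaluated: {len(subsets)}")
print(f"Top-{k} by standalone Sharpe: {top_k}, SR={equal_weight_sharpe(top_k):.3f}")
print(f"Best subset by enumeration:  {best_subset}, SR={equal_weight_sharpe(best_subset):.3f}")
```

Enumeration stops being practical quickly as the universe grows, which is where mixed-integer formulations and heuristics come in.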

## 4. Performance Comparison

In [None]:
# Compile results
results = {
    'Equal Weight': equal_metrics,
    'Max Sharpe': max_sharpe_metrics,
    'Min Variance': min_var_metrics,
    'Concentrated': concentrated_metrics
}

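# Rename by key rather than by position so the labels cannot drift out of order
comparison_df = pd.DataFrame(results).T.rename(columns={
    'return': 'Return', 'volatility': 'Volatility',
    'sharpe': 'Sharpe Ratio', 'n_assets': 'N Assets'
})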

print("\n" + "="*70)
print("PORTFOLIO COMPARISON")
print("="*70)
print(comparison_df.round(4))
print("="*70)

In [None]:
# Visualize comparison
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Returns
comparison_df['Return'].plot(kind='bar', ax=axes[0], color='green', alpha=0.7, edgecolor='black')
axes[0].set_title('Expected Annual Return', fontweight='bold', fontsize=12)
axes[0].set_ylabel('Return')
axes[0].set_xticklabels(axes[0].get_xticklabels(), rotation=45, ha='right')
axes[0].grid(True, alpha=0.3, axis='y')

# Volatility
comparison_df['Volatility'].plot(kind='bar', ax=axes[1], color='red', alpha=0.7, edgecolor='black')
axes[1].set_title('Annual Volatility', fontweight='bold', fontsize=12)
axes[1].set_ylabel('Volatility')
axes[1].set_xticklabels(axes[1].get_xticklabels(), rotation=45, ha='right')
axes[1].grid(True, alpha=0.3, axis='y')

# Sharpe Ratio
comparison_df['Sharpe Ratio'].plot(kind='bar', ax=axes[2], color='blue', alpha=0.7, edgecolor='black')
axes[2].set_title('Sharpe Ratio', fontweight='bold', fontsize=12)
axes[2].set_ylabel('Sharpe Ratio')
axes[2].set_xticklabels(axes[2].get_xticklabels(), rotation=45, ha='right')
axes[2].grid(True, alpha=0.3, axis='y')

plt.suptitle('Strategy Performance Metrics', fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

In [None]:
# Risk-Return scatter
fig, ax = plt.subplots(figsize=(10, 7))

colors_map = {'Equal Weight': '#e74c3c', 'Max Sharpe': '#3498db', 
              'Min Variance': '#2ecc71', 'Concentrated': '#f39c12'}
markers_map = {'Equal Weight': 'o', 'Max Sharpe': 's', 
               'Min Variance': '^', 'Concentrated': 'd'}

for strategy in comparison_df.index:
    vol = comparison_df.loc[strategy, 'Volatility']
    ret = comparison_df.loc[strategy, 'Return']
    sharpe = comparison_df.loc[strategy, 'Sharpe Ratio']
    
    ax.scatter(vol, ret, s=300, alpha=0.7, label=strategy,
              color=colors_map[strategy], marker=markers_map[strategy],
              edgecolors='black', linewidths=2)
    
    ax.annotate(f'SR={sharpe:.2f}',
               xy=(vol, ret), xytext=(10, 10),
               textcoords='offset points', fontsize=9,
               bbox=dict(boxstyle='round,pad=0.3', facecolor=colors_map[strategy], alpha=0.3))

ax.set_xlabel('Annual Volatility (Risk)', fontsize=12, fontweight='bold')
ax.set_ylabel('Expected Annual Return', fontsize=12, fontweight='bold')
ax.set_title('Risk-Return Profile', fontsize=14, fontweight='bold')
ax.legend(loc='best', frameon=True, fontsize=10, shadow=True)
ax.grid(True, alpha=0.4, linestyle='--')

plt.tight_layout()
plt.show()

## 5. Portfolio Weights Visualization

In [None]:
# Create weights DataFrame
weights_df = pd.DataFrame({
    'Equal Weight': equal_weights,
    'Max Sharpe': max_sharpe_weights,
    'Min Variance': min_variance_weights,
    'Concentrated': concentrated_weights
})

# Stacked bar chart
fig, ax = plt.subplots(figsize=(12, 6))
weights_df.T.plot(kind='bar', stacked=True, ax=ax, 
                   colormap='tab20', edgecolor='black', linewidth=0.5)
ax.set_title('Portfolio Weights Comparison', fontsize=14, fontweight='bold')
ax.set_ylabel('Weight', fontsize=12)
ax.set_xlabel('Strategy', fontsize=12)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha='right')
ax.legend(title='Assets', bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

## 6. Key Insights

### Observations:

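1. **Concentrated Portfolio** can show the highest Sharpe ratio here, but largely because random search covers a 5-asset weight space far more densely than a 10-asset one. An exact optimizer over all 10 assets can never score lower than one restricted to a subset of them.
2. **Maximum Sharpe** searches across all assets, so its result is only as good as the best of 10,000 random draws
3. **Minimum Variance** offers the lowest risk among the sampled portfolios but may sacrifice return
4. **Equal Weight** provides a simple, diversified baseline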

### Real-World Considerations:

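- Transaction costs favor concentrated portfolios when fewer positions mean fewer trades
- Shares trade in discrete lots, so implementable weights are subject to integer constraints
- Cardinality limits reduce monitoring overhead
- Forecasting errors in expected returns and covariances propagate directly into the optimized weights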
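
To make the lot-size point concrete, the sketch below converts target weights into whole lots by rounding down, so the budget is never exceeded. The function name `weights_to_shares`, the prices, the budget and the lot size are hypothetical values chosen for illustration.

```python
import numpy as np

def weights_to_shares(weights, prices, budget, lot_size=1):
    """Floor each target position to whole lots; return share counts and leftover cash."""
    weights = np.asarray(weights, dtype=float)
    prices = np.asarray(prices, dtype=float)
    target_value = weights * budget                        # money allocated to each asset
    lots = np.floor(target_value / (prices * lot_size))    # whole lots that fit in the allocation
    shares = (lots * lot_size).astype(int)
    cash = budget - float(shares @ prices)                 # uninvested remainder
    return shares, cash

weights = np.array([0.5, 0.3, 0.2])
prices = np.array([130.0, 47.0, 260.0])
budget = 10_000

shares, cash = weights_to_shares(weights, prices, budget, lot_size=10)
realized_weights = shares * prices / budget

print("Shares:", shares)
print(f"Cash left: {cash:.2f}")
print("Realized weights:", realized_weights.round(3))
```

With these numbers the third asset's target of 2,000 is less than the 2,600 cost of a single lot, so it drops out entirely and 3,280 stays in cash. Rounding after the fact can distort a portfolio this much, which is the motivation for putting the integer constraint inside the optimization.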

### Next Steps:

1. Add transaction cost modeling
2. Implement rolling window backtesting
3. Test with real market data
4. Compare with ML-driven heuristics (Genetic Algorithm)
5. Analyze sensitivity to input parameters
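
The first two steps can be sketched together. The example below runs a rolling-window backtest on freshly generated synthetic returns: weights are re-estimated on the trailing window only, applied to the following days, and charged a cost proportional to turnover at each rebalance. The inverse-volatility weighting rule, the window length, the rebalance frequency and the cost rate are all assumptions for illustration, and weights are held fixed between rebalances (price drift is ignored) to keep the sketch short.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_days, n_assets = 750, 5
dates = pd.bdate_range("2020-01-01", periods=n_days)
returns = pd.DataFrame(
    rng.normal(0.0004, 0.01, size=(n_days, n_assets)),
    index=dates,
    columns=[f"ASSET_{i+1}" for i in range(n_assets)],
)

window, rebalance_every, cost_rate = 250, 21, 0.001  # hypothetical settings

weights = pd.Series(0.0, index=returns.columns)  # start fully in cash
daily = []
for t in range(window, n_days):
    cost = 0.0
    if (t - window) % rebalance_every == 0:
        # Estimate on the trailing window only, so no future data leaks in
        vol = returns.iloc[t - window:t].std()
        target = (1.0 / vol) / (1.0 / vol).sum()           # inverse-volatility weights
        cost = cost_rate * (target - weights).abs().sum()  # proportional to turnover
        weights = target
    daily.append(float(weights @ returns.iloc[t]) - cost)

backtest = pd.Series(daily, index=returns.index[window:])
ann_return = backtest.mean() * 252
ann_vol = backtest.std() * np.sqrt(252)
print(f"Out-of-sample annual return: {ann_return:.2%}, volatility: {ann_vol:.2%}")
```

Any of the four strategies in this notebook could replace the inverse-volatility rule inside the rebalance step.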

## Conclusion

This tutorial demonstrated:
- Synthetic data generation with realistic properties
- Multiple portfolio optimization strategies
- Performance comparison and visualization
- Trade-offs between return, risk, and diversification

For the complete implementation with real data, GARCH models, and MIO solvers, see the main project repository!