# Notebook 05: Project Exercises

## Topic 5: Apply What You've Learned

This notebook contains three mini-projects. Choose one (or more!) to work on:

1. **Project A**: Parallel Portfolio Optimizer
2. **Project B**: Parallel Parameter Sensitivity Analysis
3. **Project C**: Multi-Strategy Backtester

Each project includes:
- Problem description
- Starter code
- Tasks to complete
- Hints

---

In [None]:
import numpy as np
import pandas as pd
import time
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
import matplotlib.pyplot as plt
from scipy.optimize import minimize

n_cores = os.cpu_count()
print(f"Available CPU cores: {n_cores}")

---

## Project A: Parallel Portfolio Optimizer

### Background

Portfolio optimization finds the asset weights that maximize return for a given level of risk (or minimize risk for a given return). A Monte Carlo approach to this involves:
1. Generating many random portfolios (Monte Carlo)
2. Calculating risk/return for each
3. Finding the efficient frontier
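
As a rough illustration of the three steps above, the following sketch runs them in vectorized form on a hypothetical three-asset universe. The inputs `mu`, `vol`, and `corr` are made up for this example, and the frontier rule used here (keep a portfolio only if its return is at least as high as every lower-volatility portfolio) is one simple approximation, not the only possible definition.

```python
import numpy as np

# Hypothetical three-asset inputs, chosen only for illustration
mu = np.array([0.08, 0.03, 0.05])             # expected annual returns
vol = np.array([0.15, 0.05, 0.10])            # annual volatilities
corr = np.array([[1.0, 0.2, 0.3],
                 [0.2, 1.0, 0.1],
                 [0.3, 0.1, 1.0]])
cov = np.outer(vol, vol) * corr

rng = np.random.default_rng(0)

# Step 1: random long-only weights, each row normalized to sum to 1
raw = rng.random((10_000, len(mu)))
weights = raw / raw.sum(axis=1, keepdims=True)

# Step 2: return and volatility of every portfolio at once
port_ret = weights @ mu
port_vol = np.sqrt(np.einsum('ij,jk,ik->i', weights, cov, weights))

# Step 3: approximate the frontier by sorting on volatility and keeping
# each portfolio whose return matches the running maximum so far
order = np.argsort(port_vol)
sorted_ret = port_ret[order]
on_frontier = sorted_ret >= np.maximum.accumulate(sorted_ret)
frontier_idx = order[on_frontier]

print(f"{len(frontier_idx)} of {len(port_ret)} portfolios lie on the approximate frontier")
```

Because each portfolio is independent of the others, the same work can be split into batches and handed to separate processes, which is what this project asks you to do.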

### Your Task

Parallelize the Monte Carlo portfolio simulation to find the efficient frontier.

In [None]:
# Sample data: 5 assets with expected returns, volatilities, and correlations
np.random.seed(42)

assets = ['Stocks', 'Bonds', 'Gold', 'Real Estate', 'Commodities']
expected_returns = np.array([0.10, 0.04, 0.05, 0.08, 0.06])  # Annual expected returns
volatilities = np.array([0.18, 0.06, 0.15, 0.12, 0.20])  # Annual volatilities

# Correlation matrix
correlations = np.array([
    [1.00, 0.20, 0.05, 0.60, 0.40],
    [0.20, 1.00, 0.30, 0.10, 0.05],
    [0.05, 0.30, 1.00, 0.10, 0.25],
    [0.60, 0.10, 0.10, 1.00, 0.30],
    [0.40, 0.05, 0.25, 0.30, 1.00]
])

# Covariance matrix
cov_matrix = np.outer(volatilities, volatilities) * correlations

print("Assets and Expected Returns:")
for asset, ret, vol in zip(assets, expected_returns, volatilities):
    print(f"  {asset}: Return={ret*100:.1f}%, Vol={vol*100:.1f}%")

In [None]:
def calculate_portfolio_metrics(weights, expected_returns, cov_matrix):
    """
    Calculate portfolio return and volatility.
    
    Parameters:
    -----------
    weights : array of portfolio weights
    expected_returns : array of asset expected returns
    cov_matrix : covariance matrix of returns
    
    Returns:
    --------
    tuple of (portfolio_return, portfolio_volatility, sharpe_ratio)
    """
    portfolio_return = np.dot(weights, expected_returns)
    portfolio_volatility = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    sharpe_ratio = portfolio_return / portfolio_volatility  # Assuming rf=0
    
    return portfolio_return, portfolio_volatility, sharpe_ratio
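
One way to convince yourself that the matrix formula in `calculate_portfolio_metrics` is right is to compare it with the textbook two-asset variance formula. The sketch below does this for a 50/50 Stocks/Bonds portfolio using the figures from the sample data; the variable names are local to this example.

```python
import numpy as np

# 50/50 Stocks/Bonds, using the returns, volatilities and correlation above
w = np.array([0.5, 0.5])
mu = np.array([0.10, 0.04])
sigma = np.array([0.18, 0.06])
rho = 0.20

cov = np.array([[sigma[0]**2,             rho * sigma[0] * sigma[1]],
                [rho * sigma[0] * sigma[1], sigma[1]**2]])

# Matrix form, as used in calculate_portfolio_metrics
ret_matrix = w @ mu
vol_matrix = np.sqrt(w @ cov @ w)

# Two-asset formula written out term by term
var_by_hand = ((w[0] * sigma[0])**2 + (w[1] * sigma[1])**2
               + 2 * w[0] * w[1] * rho * sigma[0] * sigma[1])
vol_by_hand = np.sqrt(var_by_hand)

print(f"return={ret_matrix:.4f}, vol={vol_matrix:.4f}, sharpe={ret_matrix / vol_matrix:.4f}")
# prints: return=0.0700, vol=0.1004, sharpe=0.6972
```

The portfolio volatility (about 10.0%) is lower than the 12% average of the two asset volatilities, which is the diversification effect of a correlation below 1.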

In [None]:
def generate_random_portfolios_batch(args):
    """
    Generate a batch of random portfolios and calculate their metrics.
    
    Parameters:
    -----------
    args : tuple of (n_portfolios, n_assets, expected_returns, cov_matrix, seed)
    
    Returns:
    --------
    list of dicts with portfolio metrics
    """
    n_portfolios, n_assets, expected_returns, cov_matrix, seed = args
    np.random.seed(seed)
    
    results = []
    for _ in range(n_portfolios):
        # Generate random weights that sum to 1
        weights = np.random.random(n_assets)
        weights = weights / weights.sum()
        
        ret, vol, sharpe = calculate_portfolio_metrics(weights, expected_returns, cov_matrix)
        
        results.append({
            'return': ret,
            'volatility': vol,
            'sharpe': sharpe,
            'weights': weights.copy()
        })
    
    return results
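
The batch function above seeds NumPy's global generator with a different integer per batch, which is enough for reproducibility. As an alternative, NumPy's `SeedSequence.spawn` derives independent child streams from one root seed. The sketch below shows the idea; `make_batch_rngs` is a hypothetical helper, not part of the starter code.

```python
import numpy as np

def make_batch_rngs(root_seed, n_batches):
    """Return one independent Generator per batch, all derived from one root seed."""
    children = np.random.SeedSequence(root_seed).spawn(n_batches)
    return [np.random.default_rng(child) for child in children]

rngs = make_batch_rngs(root_seed=42, n_batches=4)
first_draws = [rng.random(3) for rng in rngs]

# Rebuilding the generators from the same root seed reproduces every stream
again = [rng.random(3) for rng in make_batch_rngs(42, 4)]

for i, (a, b) in enumerate(zip(first_draws, again)):
    print(f"batch {i}: reproducible={np.allclose(a, b)}")
```

To use this pattern in a worker, the batch function would draw weights from its own generator (for example `rng.random(n_assets)`) instead of calling `np.random.random`.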

In [None]:
# TODO: Complete this code to run the portfolio simulation in parallel

# Parameters
total_portfolios = 50_000
n_batches = 8
portfolios_per_batch = total_portfolios // n_batches
n_assets = len(assets)

print(f"Generating {total_portfolios:,} random portfolios...\n")

# TODO: Create batch_args list
# batch_args = [
#     (portfolios_per_batch, n_assets, expected_returns, cov_matrix, seed)
#     for seed in range(n_batches)
# ]

# TODO: Implement sequential version and time it
# start = time.time()
# sequential_results = [generate_random_portfolios_batch(args) for args in batch_args]
# seq_time = time.time() - start

# TODO: Implement parallel version and time it
# start = time.time()
# with ProcessPoolExecutor(max_workers=n_cores) as executor:
#     parallel_results = list(executor.map(generate_random_portfolios_batch, batch_args))
# par_time = time.time() - start

# TODO: Flatten results and create DataFrame
# all_portfolios = [p for batch in parallel_results for p in batch]
# portfolios_df = pd.DataFrame(all_portfolios)

# Placeholder until you implement
print("TODO: Implement parallel portfolio generation")
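
When you compare the sequential and parallel runs, it helps to report speedup and parallel efficiency rather than raw times alone. The following is a minimal sketch; `timed` and `speedup_report` are hypothetical helpers, and the timings passed to `speedup_report` are invented numbers for illustration.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def speedup_report(seq_time, par_time, n_workers):
    """Speedup is T_seq / T_par; efficiency is speedup divided by worker count."""
    speedup = seq_time / par_time
    return {'speedup': speedup, 'efficiency': speedup / n_workers}

# Time any callable, here a simple built-in as a stand-in for real work
result, elapsed = timed(sum, range(1_000_000))
print(f"sum took {elapsed:.4f}s")

# Invented timings: 8.0s sequential, 2.5s parallel on 4 workers
report = speedup_report(seq_time=8.0, par_time=2.5, n_workers=4)
print(f"speedup={report['speedup']:.2f}x, efficiency={report['efficiency']:.0%}")
```

An efficiency well below 100% usually points to overhead from starting processes and pickling arguments and results, or to batches that are too small.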

In [None]:
# TODO: Visualize the efficient frontier
# 
# 1. Plot all portfolios as scatter plot (vol vs return)
# 2. Color by Sharpe ratio
# 3. Find and mark the maximum Sharpe ratio portfolio
# 4. Find and mark the minimum volatility portfolio

# plt.figure(figsize=(12, 8))
# plt.scatter(portfolios_df['volatility']*100, portfolios_df['return']*100, 
#             c=portfolios_df['sharpe'], cmap='viridis', alpha=0.5, s=5)
# plt.colorbar(label='Sharpe Ratio')
# plt.xlabel('Volatility (%)')
# plt.ylabel('Expected Return (%)')
# plt.title('Efficient Frontier via Monte Carlo Simulation')
# plt.show()

print("TODO: Implement visualization")
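
For tasks 3 and 4, pandas can locate the two special portfolios directly with `idxmax` and `idxmin`. The sketch below uses a tiny made-up DataFrame (`toy_df`) standing in for `portfolios_df`; the numbers carry no meaning beyond the example.

```python
import pandas as pd

# Toy stand-in for portfolios_df; values are invented for illustration
toy_df = pd.DataFrame({
    'return':     [0.050, 0.070, 0.090, 0.040],
    'volatility': [0.060, 0.080, 0.150, 0.055],
})
toy_df['sharpe'] = toy_df['return'] / toy_df['volatility']

# Row with the highest Sharpe ratio and row with the lowest volatility
max_sharpe = toy_df.loc[toy_df['sharpe'].idxmax()]
min_vol = toy_df.loc[toy_df['volatility'].idxmin()]

print(f"Max Sharpe: row {max_sharpe.name}, sharpe={max_sharpe['sharpe']:.3f}")
print(f"Min vol:    row {min_vol.name}, vol={min_vol['volatility']:.3f}")
```

Each selected row can then be drawn on top of the scatter plot with a second `plt.scatter` call that uses a larger, distinct marker.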

---

## Project B: Parallel Parameter Sensitivity Analysis

### Background

When pricing options or calculating risk metrics, we often want to understand how sensitive our results are to input parameters. This requires recalculating the metric many times with different inputs.

### Your Task

Perform sensitivity analysis on Black-Scholes option pricing:
- How does the option price change with volatility?
- How does it change with interest rate?
- How does it change with time to maturity?

Create a grid of all parameter combinations and calculate the prices in parallel.

In [None]:
from scipy.stats import norm

def black_scholes_call(S, K, T, r, sigma):
    """
    Calculate Black-Scholes price for European call option.
    
    Parameters:
    -----------
    S : float - Current stock price
    K : float - Strike price
    T : float - Time to maturity (years)
    r : float - Risk-free rate
    sigma : float - Volatility
    
    Returns:
    --------
    float - Option price
    """
    if T <= 0:
        return max(S - K, 0)
    
    d1 = (np.log(S/K) + (r + 0.5*sigma**2)*T) / (sigma*np.sqrt(T))
    d2 = d1 - sigma*np.sqrt(T)
    
    return S*norm.cdf(d1) - K*np.exp(-r*T)*norm.cdf(d2)

# Base case parameters
S0 = 100   # Stock price
K = 100    # Strike price
T0 = 1.0   # 1 year
r0 = 0.05  # 5% rate
sigma0 = 0.2  # 20% vol

base_price = black_scholes_call(S0, K, T0, r0, sigma0)
print(f"Base case option price: ${base_price:.4f}")
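
Before building the full grid, a quick local check of the three sensitivities around the base case can be useful. The sketch below estimates them with central finite differences and cross-checks the volatility sensitivity against the closed-form vega. It re-implements the same call formula with the standard library only, so it runs on its own; `bs_call` and `central_diff` are names chosen for this example.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution from the standard library

def bs_call(S, K, T, r, sigma):
    """Black-Scholes European call price (same formula as black_scholes_call)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * _norm.cdf(d1) - K * exp(-r * T) * _norm.cdf(d2)

def central_diff(f, x, h=1e-4):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

S, K, T, r, sigma = 100.0, 100.0, 1.0, 0.05, 0.2  # base case from this notebook

dprice_dsigma = central_diff(lambda s: bs_call(S, K, T, r, s), sigma)
dprice_dr = central_diff(lambda x: bs_call(S, K, T, x, sigma), r)
dprice_dT = central_diff(lambda t: bs_call(S, K, t, r, sigma), T)

# Closed-form vega, S * phi(d1) * sqrt(T), as a cross-check
d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
vega_exact = S * _norm.pdf(d1) * sqrt(T)

print(f"dPrice/dsigma: {dprice_dsigma:.4f} (closed form {vega_exact:.4f})")
print(f"dPrice/dr:     {dprice_dr:.4f}")
print(f"dPrice/dT:     {dprice_dT:.4f}")
```

All three derivatives are positive for this call, so the price surface you compute on the grid should rise along each axis.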

In [None]:
def calculate_price_for_params(args):
    """
    Calculate option price for given parameters.
    
    Parameters:
    -----------
    args : tuple of (S, K, T, r, sigma)
    
    Returns:
    --------
    dict with parameters and resulting price
    """
    S, K, T, r, sigma = args
    
    price = black_scholes_call(S, K, T, r, sigma)
    
    return {
        'S': S,
        'K': K,
        'T': T,
        'r': r,
        'sigma': sigma,
        'price': price
    }

In [None]:
# Parameter ranges for the sensitivity analysis (provided; the TODOs below are yours to complete)

# Volatility range: 10% to 50%
sigma_range = np.linspace(0.10, 0.50, 21)

# Interest rate range: 0% to 10%
r_range = np.linspace(0.00, 0.10, 21)

# Time to maturity: 0.1 to 2 years
T_range = np.linspace(0.1, 2.0, 20)

# Create all combinations
from itertools import product

# TODO: Generate all parameter combinations
# param_grid = list(product([S0], [K], T_range, r_range, sigma_range))
# print(f"Total combinations to calculate: {len(param_grid)}")

# TODO: Calculate prices in parallel
# with ProcessPoolExecutor(max_workers=n_cores) as executor:
#     results = list(executor.map(calculate_price_for_params, param_grid))

# TODO: Create DataFrame and visualize results
# results_df = pd.DataFrame(results)

print("TODO: Implement parameter sensitivity analysis")
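
A single Black-Scholes evaluation is very cheap, so submitting each grid point as its own task may spend more time on inter-process communication than on pricing. One common remedy is to group grid points into chunks. The sketch below shows a hypothetical `chunked` helper on a small made-up grid; the grid values are for illustration only.

```python
from itertools import islice, product

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Small made-up grid (3 x 2 x 4 = 24 combinations), for illustration only
T_vals = [0.5, 1.0, 2.0]
r_vals = [0.01, 0.05]
sigma_vals = [0.1, 0.2, 0.3, 0.4]
grid = list(product([100.0], [100.0], T_vals, r_vals, sigma_vals))

chunks = list(chunked(grid, 10))
print(len(grid), [len(c) for c in chunks])
# prints: 24 [10, 10, 4]
```

A worker function would then take one chunk and loop over its tuples. Alternatively, `executor.map` accepts a `chunksize` argument that performs this grouping for you when used with `ProcessPoolExecutor`.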

In [None]:
# TODO: Create visualization
#
# Suggested plots:
# 1. Option price vs volatility (for different T values)
# 2. Option price vs time to maturity (for different sigma values)
# 3. Heatmap of price vs (sigma, r)

print("TODO: Implement visualization")

---

## Project C: Multi-Strategy Backtester

### Background

Quant firms routinely backtest many strategies and parameter settings at once. Each backtest runs independently of the others, so the workload parallelizes naturally.

### Your Task

Implement and backtest three different trading strategies in parallel:
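1. **Momentum Strategy**: Hold the asset after it has been rising (long when the trailing return is positive, otherwise flat)
2. **Mean Reversion Strategy**: Hold the asset after it has dipped (long when the price is below its moving average, otherwise flat)
3. **Volatility Breakout Strategy**: Go long when the price breaks above its recent high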
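
The backtest functions in this project lag every signal by one day with `.shift(1)`. The sketch below, on a made-up five-day price series, shows why: without the lag, each day's return is earned using a signal that already knows that day's close (look-ahead bias).

```python
import pandas as pd

# Toy closing prices, invented for illustration
close = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0])
daily_ret = close.pct_change()

# Signal computed from today's close: long (1) if today was an up day
raw_signal = (daily_ret > 0).astype(int)

# Look-ahead bias: today's return is matched with today's own signal
biased = (raw_signal * daily_ret).sum()

# Realistic: the position held today was decided at yesterday's close
lagged_signal = raw_signal.shift(1)
realistic = (lagged_signal * daily_ret).sum()

print(f"biased sum of returns:    {biased:+.4f}")
print(f"realistic sum of returns: {realistic:+.4f}")
```

The unlagged version captures only the up days by construction, so its result looks far better than anything that could have been traded.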

In [None]:
# Generate sample price data
np.random.seed(42)
n_days = 2520  # 10 years

# Simulate prices from i.i.d. fat-tailed (Student-t) daily returns with a small positive drift
returns = np.random.standard_t(df=5, size=n_days) * 0.015 + 0.0003
prices = 100 * np.cumprod(1 + returns)

dates = pd.date_range('2014-01-01', periods=n_days, freq='B')  # business days, so 2520 rows span about 10 years
price_df = pd.DataFrame({'close': prices}, index=dates)

print(f"Price data: {len(price_df)} days")
print(f"Start: ${price_df['close'].iloc[0]:.2f}")
print(f"End: ${price_df['close'].iloc[-1]:.2f}")

In [None]:
def backtest_strategy(args):
    """
    Backtest a trading strategy.
    
    Parameters:
    -----------
    args : tuple of (strategy_name, prices, params)
    
    Returns:
    --------
    dict with strategy performance metrics
    """
    strategy_name, prices, params = args
    
    if strategy_name == 'momentum':
        return backtest_momentum(prices, params)
    elif strategy_name == 'mean_reversion':
        return backtest_mean_reversion(prices, params)
    elif strategy_name == 'volatility_breakout':
        return backtest_volatility_breakout(prices, params)
    else:
        raise ValueError(f"Unknown strategy: {strategy_name}")

def backtest_momentum(prices, params):
    """
    Momentum strategy: Go long when return over lookback period is positive.
    
    params: {'lookback': int}
    """
    lookback = params['lookback']
    
    # Calculate momentum signal
    returns = prices.pct_change(lookback)
    signal = (returns > 0).astype(int).shift(1)
    
    # Calculate strategy returns
    daily_returns = prices.pct_change()
    strategy_returns = signal * daily_returns
    strategy_returns = strategy_returns.dropna()
    
    return calculate_performance_metrics('momentum', lookback, strategy_returns)

def backtest_mean_reversion(prices, params):
    """
    Mean reversion: Go long when price is below moving average.
    
    params: {'window': int}
    """
    window = params['window']
    
    # Calculate signal
    ma = prices.rolling(window).mean()
    signal = (prices < ma).astype(int).shift(1)
    
    # Calculate strategy returns
    daily_returns = prices.pct_change()
    strategy_returns = signal * daily_returns
    strategy_returns = strategy_returns.dropna()
    
    return calculate_performance_metrics('mean_reversion', window, strategy_returns)

def backtest_volatility_breakout(prices, params):
    """
    Volatility breakout: Go long when price breaks above recent high.
    
    params: {'lookback': int}
    """
    lookback = params['lookback']
    
    # Calculate signal
    rolling_high = prices.rolling(lookback).max().shift(1)
    signal = (prices > rolling_high).astype(int).shift(1)
    
    # Calculate strategy returns
    daily_returns = prices.pct_change()
    strategy_returns = signal * daily_returns
    strategy_returns = strategy_returns.dropna()
    
    return calculate_performance_metrics('volatility_breakout', lookback, strategy_returns)

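def calculate_performance_metrics(strategy_name, param_value, returns):
    """
    Calculate performance metrics for a strategy.

    Assumes daily returns and 252 trading days per year. Returns None if
    `returns` is empty; otherwise a dict in which total_return, annual_return,
    volatility, and max_drawdown are in percent and sharpe assumes rf=0.
    """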
    if len(returns) == 0:
        return None
    
    total_return = (1 + returns).prod() - 1
    annual_return = (1 + total_return) ** (252 / len(returns)) - 1
    volatility = returns.std() * np.sqrt(252)
    sharpe = annual_return / volatility if volatility > 0 else 0
    
    # Max drawdown
    cumulative = (1 + returns).cumprod()
    rolling_max = cumulative.cummax()
    drawdown = (cumulative - rolling_max) / rolling_max
    max_drawdown = drawdown.min()
    
    return {
        'strategy': strategy_name,
        'param': param_value,
        'total_return': total_return * 100,
        'annual_return': annual_return * 100,
        'volatility': volatility * 100,
        'sharpe': sharpe,
        'max_drawdown': max_drawdown * 100
    }
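
The maximum-drawdown calculation in `calculate_performance_metrics` is easy to trace by hand on a short series. The sketch below applies the same three steps to five made-up daily returns.

```python
import pandas as pd

# Made-up daily returns: +10%, -20%, +5%, -10%, +30%
returns = pd.Series([0.10, -0.20, 0.05, -0.10, 0.30])

cumulative = (1 + returns).cumprod()      # growth of 1 unit of capital
rolling_max = cumulative.cummax()         # highest value reached so far
drawdown = (cumulative - rolling_max) / rolling_max
max_drawdown = drawdown.min()

print(pd.DataFrame({'cumulative': cumulative,
                    'rolling_max': rolling_max,
                    'drawdown': drawdown}).round(4))
print(f"max drawdown: {max_drawdown:.1%}")
# last line prints: max drawdown: -24.4%
```

The peak of 1.10 is reached on the first day and the trough of 0.8316 on the fourth, giving a drawdown of 0.8316 / 1.10 - 1 = -24.4%. Note that the running maximum starts at the first cumulative value rather than at the initial capital of 1, so a loss on the very first day is not counted as a drawdown by this formula.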

In [None]:
# TODO: Create list of all strategy/parameter combinations to test

prices = price_df['close']

# Parameter ranges for each strategy
momentum_params = [{'lookback': lb} for lb in range(5, 61, 5)]
mean_rev_params = [{'window': w} for w in range(10, 101, 10)]
vol_breakout_params = [{'lookback': lb} for lb in range(5, 41, 5)]

# TODO: Create args list for all strategies
# all_args = []
# all_args.extend([('momentum', prices, p) for p in momentum_params])
# all_args.extend([('mean_reversion', prices, p) for p in mean_rev_params])
# all_args.extend([('volatility_breakout', prices, p) for p in vol_breakout_params])

# print(f"Total strategy variants to test: {len(all_args)}")

# TODO: Run backtests in parallel
# with ProcessPoolExecutor(max_workers=n_cores) as executor:
#     results = list(executor.map(backtest_strategy, all_args))

# results = [r for r in results if r is not None]
# results_df = pd.DataFrame(results)

print("TODO: Implement multi-strategy backtest")

In [None]:
# TODO: Analyze and visualize results
#
# Suggested analysis:
# 1. Best parameters for each strategy
# 2. Performance comparison across strategies
# 3. Parameter sensitivity plot for each strategy

print("TODO: Implement analysis and visualization")

---

## Solutions

The solutions to these exercises are available in `solutions/project_solutions.ipynb`.

Try to complete the exercises on your own first!

---

## Best Practices Checklist

Before you finish, make sure your parallel code follows these best practices:

### Correctness
- [ ] Functions work correctly with `max_workers=1`
- [ ] No shared mutable state between workers
- [ ] Random seeds are set explicitly, with a different seed for each batch, for reproducibility
- [ ] Results match the sequential version

### Performance
- [ ] Tasks are substantial enough (as a rough guide, more than ~10 ms each) to outweigh process start-up and pickling overhead
- [ ] Worker count matches the available cores for CPU-bound tasks
- [ ] Many small tasks are grouped into larger chunks
- [ ] Actual speedup is measured against the sequential version

### Robustness
- [ ] Exceptions are handled gracefully
- [ ] Functions are picklable (defined at module top level; no lambdas or nested functions)
- [ ] Executors are managed with context managers (`with` statement)

### Code Quality
- [ ] Clear function documentation
- [ ] Meaningful variable names
- [ ] Progress feedback for long operations
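
Two of the checklist items, graceful exception handling and progress feedback, can be covered by one loop over `as_completed`. The sketch below uses `ThreadPoolExecutor` so that it runs in any environment; the same submit/`as_completed` pattern should carry over to `ProcessPoolExecutor` as long as the task function is picklable. `risky_task` and its inputs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def risky_task(x):
    """Hypothetical task that fails for negative input."""
    if x < 0:
        raise ValueError(f"negative input: {x}")
    return x * x

inputs = [3, -1, 4, -5, 2]
results, failures = {}, {}

with ThreadPoolExecutor(max_workers=2) as executor:
    # Map each future back to its input so failures can be attributed
    future_to_input = {executor.submit(risky_task, x): x for x in inputs}
    for done, future in enumerate(as_completed(future_to_input), start=1):
        x = future_to_input[future]
        try:
            results[x] = future.result()   # re-raises the worker's exception
        except ValueError as exc:
            failures[x] = str(exc)
        print(f"{done}/{len(inputs)} tasks finished")

print("results:", dict(sorted(results.items())))
print("failures:", dict(sorted(failures.items())))
```

Unlike `executor.map`, which raises the first exception when its result is retrieved and prevents you from reading the remaining results, this loop records each failure and keeps going.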