# Lab 7: Digital Asset Data Analysis

Market structure, volatility, and efficiency testing

> **Expected time**
>
> -   Core lab: ≈ 75 minutes
> -   Optional extensions: +30–60 minutes

<figure>
<a
href="https://colab.research.google.com/github/quinfer/financial-data-science/blob/main/labs/notebooks/lab07_crypto.ipynb"><img
src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
<figcaption>Open in Colab</figcaption>
</figure>

## Before You Code: The Big Picture

Cryptocurrencies promise **financial inclusion, decentralization, and
censorship resistance**. But do they deliver? Let's test the claims
empirically using market microstructure analysis.

> **The Crypto Promise vs. Reality**
>
> **The Promise:**
>
> 1.  **Inclusion**: Banking for the 1.7 billion unbanked (World Bank)
> 2.  **Efficiency**: Near-zero transaction costs, instant settlement
> 3.  **Decentralization**: No intermediaries, no gatekeepers
> 4.  **Transparency**: All transactions on public blockchain
>
> **The Reality (Empirical Evidence):**
>
> -   **Volatility**: Bitcoin std dev ~80% annualized (vs. S&P 500 ~15%)
> -   **Correlation**: Bitcoin-S&P correlation increased from ~0 (2015)
>     to ~0.5 (2022), so it is no longer diversifying
> -   **Efficiency**: Autocorrelation tests show predictability
>     (inefficient markets)
> -   **Inclusion**: 95% of crypto holders are speculators, not unbanked
>     users (Makarov & Schoar 2022, JF)
> -   **Costs**: During congestion, Ethereum gas fees reached \$50+ per
>     transaction
>
> **The Academic Debate:**
>
> -   **Academic skeptics** (no financial stake): [Paul
>     Krugman](https://en.wikipedia.org/wiki/Paul_Krugman) (Nobel
>     Prize-winning economist, argues crypto lacks intrinsic value and
>     serves primarily for illegal transactions) and [Nouriel
>     Roubini](https://en.wikipedia.org/wiki/Nouriel_Roubini) (NYU
>     economist who predicted the 2008 crisis, calls crypto "the mother of
>     all scams") view cryptocurrency as a speculative bubble with no
>     fundamental value, poor unit of account properties, and dominated
>     by fraud. Their critique comes from outside the crypto ecosystem
>     with no personal financial interest.
>
> -   **Industry advocates** (significant skin in the game): [Andreas M.
>     Antonopoulos](https://aantonop.com/) (author of *Mastering
>     Bitcoin*, emphasizes censorship resistance and financial
>     sovereignty) and [Vitalik Buterin](https://vitalik.eth.limo/)
>     (Ethereum co-founder, argues for programmable money and
>     decentralized applications beyond payments) counter that crypto is
>     early-stage infrastructure (like the internet in 1995) requiring
>     time for legitimate use cases to mature beyond speculation. Both
>     have deep financial and reputational stakes in crypto's success.
>
> -   **Evidence-based approach** (This lab): Understanding incentives
>     matters. Academic critics risk nothing by being wrong; industry
>     advocates benefit financially from adoption. Rather than choosing
>     sides, we test empirical claims with data: volatility patterns,
>     correlation dynamics, market efficiency, and actual usage
>     statistics.

### What You'll Build Today

By the end of this lab, you will have:

-   ✅ Crypto price data (Bloomberg Terminal export, loaded as CSV from
    GitHub Pages)
-   ✅ Volatility analysis comparing crypto to traditional assets
-   ✅ Return distribution analysis (fat tails, skewness)
-   ✅ Market efficiency tests (autocorrelation, mean reversion)
-   ✅ Critical perspective on crypto's actual use cases

> **Why This Matters**
>
> Crypto is either the future of finance or a trillion-dollar
> speculative bubble. Your job as a data scientist: **test the claims
> empirically**, not ideologically. This lab shows you how.

## Learning Objectives

By the end of this lab, you will be able to:

-   Access cryptocurrency market data using public APIs
-   Calculate and compare volatility across crypto and traditional
    assets
-   Analyze return distributions and identify tail risk
-   Measure correlation patterns (within-crypto and cross-asset)
-   Test market efficiency using autocorrelation and arbitrage analysis
-   Visualize price dynamics and microstructure features
-   Evaluate crypto financial inclusion claims empirically

## Setup and Dependencies

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# For reading data files
try:
    import requests  # For downloading from GitHub
except ImportError:
    print("Installing requests...")
    !pip install -q requests
    import requests

# Note: openpyxl only needed if reading Bloomberg Excel directly
# We're using CSV from GitHub, so not required for students

# For statistical tests
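try:
    import statsmodels.api as sm
    from statsmodels.tsa.stattools import adfuller, acf
except ImportError:
    print("Installing statsmodels...")
    !pip install -q statsmodels
    # Re-import after installing so adfuller and acf are defined below
    import statsmodels.api as sm
    from statsmodels.tsa.stattools import adfuller, acf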

# Visualization settings
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

print("✓ Setup complete - ready for crypto market analysis")

## Exercise 1: Accessing Cryptocurrency Market Data

### Understanding Crypto Data Sources

Unlike traditional finance where Bloomberg terminals and licensed data
vendors dominate, cryptocurrency data comes from public APIs provided by
exchanges and aggregators. This democratizes access (you can get the same
data professionals use) but also creates challenges around data quality,
fragmentation, and standardization.

**Key data sources:**

-   **Aggregators**: CoinGecko, CoinMarketCap (volume-weighted prices
    across exchanges)
-   **Exchanges**: Coinbase Pro, Binance, Kraken (order books, trade
    data, official prices)
-   **Blockchain explorers**: On-chain data (transaction volumes,
    addresses, mining)
-   **Derivatives**: CME, Deribit (futures, options implied volatility)
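
To make "volume-weighted prices across exchanges" concrete, here is a minimal sketch of how an aggregator might combine venue quotes. The venue names, prices, and volumes are made-up illustrative numbers, and `volume_weighted_price` is a hypothetical helper, not the methodology of CoinGecko or any other provider.

```python
import pandas as pd

# Hypothetical snapshot of one asset quoted on three venues (illustrative numbers only)
quotes = pd.DataFrame({
    "exchange": ["VenueA", "VenueB", "VenueC"],
    "price": [30000.0, 30060.0, 29940.0],   # last traded price in USD
    "volume": [500.0, 300.0, 200.0],        # units traded over the sampling window
})

def volume_weighted_price(df):
    """Return the volume-weighted average price across venues."""
    return (df["price"] * df["volume"]).sum() / df["volume"].sum()

vwap = volume_weighted_price(quotes)
simple_mean = quotes["price"].mean()

print(f"Volume-weighted price: ${vwap:,.2f}")
print(f"Simple average price:  ${simple_mean:,.2f}")
```

Because each venue's weight is its reported volume, inflated (wash-traded) volume on one exchange pulls the aggregate price toward that exchange's quote.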

We'll use **real data from Bloomberg Terminal**, downloaded via the
Excel add-in. This provides institutional-quality pricing with proper
corporate actions handling and validated sources.

> **Bloomberg Terminal Data**
>
> This lab uses data downloaded from Bloomberg Terminal (XBTUSD Curncy,
> ETHUSD Curncy, etc.). Bloomberg provides the most reliable crypto
> pricing for institutional use. If you don't have Terminal access,
> alternatives include Yahoo Finance (free but less reliable) or
> CoinGecko Pro (paid API).

### Loading Bloomberg Data from GitHub

In [None]:
def load_bloomberg_crypto(github_url='https://quinfer.github.io/financial-data-science/data/chapter07/crypto_bloomberg.csv'):
    """
    Load cryptocurrency data from Bloomberg Terminal (CSV format).
    
    This function loads data from GitHub Pages by default (works in Colab).
    Falls back to local file if GitHub Pages is unavailable.
    
    Parameters
    ----------
    github_url : str
        URL to Bloomberg crypto CSV on GitHub Pages
        
    Returns
    -------
    pd.DataFrame
        Bitcoin price data with date index
    """
    # Try GitHub Pages first (default for Colab/remote students)
    try:
        print("📥 Loading Bloomberg data from GitHub Pages...")
        df = pd.read_csv(github_url, parse_dates=['date'])
        df = df.set_index('date')
        print(f"✅ Loaded Bloomberg data: {len(df)} rows")
        print("   Source: GitHub Pages (quinfer.github.io/financial-data-science)")
        return df
    except Exception as e:
        # Network or parsing problem: report it and fall through to the local copy
        print(f"⚠️  Could not load from GitHub Pages: {e}")
    
    # Fallback to local file (if running on campus with repo)
    try:
        local_path = 'data/chapter07/crypto_bloomberg.csv'
        df = pd.read_csv(local_path, parse_dates=['date'])
        df = df.set_index('date')
        print(f"✅ Loaded local Bloomberg data: {len(df)} rows")
        return df
    except FileNotFoundError:
        print("⚠️  Local file not found")
        return None

# Load Bitcoin data from Bloomberg Terminal
print("Loading cryptocurrency data from Bloomberg Terminal...")
btc_bloomberg = load_bloomberg_crypto()

if btc_bloomberg is not None:
    # Use real Bloomberg data
    btc_data = btc_bloomberg[['price']].copy()
    # Use NaN (not None) when volume is missing so the column stays numeric for plotting
    btc_data['volume'] = btc_bloomberg['volume'] if 'volume' in btc_bloomberg.columns else np.nan
    
    print(f"✅ Bitcoin (Bloomberg): {len(btc_data)} days of data")
    print(f"  Date range: {btc_data.index.min().date()} to {btc_data.index.max().date()}")
    print(f"  Price range: ${btc_data['price'].min():,.0f} - ${btc_data['price'].max():,.0f}")
    
    # Use last 2 years for analysis (to match typical lab scope)
    cutoff_date = btc_data.index.max() - pd.Timedelta(days=730)
    btc_data = btc_data[btc_data.index >= cutoff_date]
    print(f"  Using last 2 years: {len(btc_data)} days")
    
    # Create synthetic ETH and BNB for comparison (scaled from BTC)
    # Real multi-asset Bloomberg data would require separate Terminal queries
    # Caveat: a constant multiple of BTC has exactly the same returns as BTC, so
    # ETH/BNB volatility matches BTC and all pairwise correlations come out as 1.0
    eth_data = btc_data.copy()
    eth_data['price'] = btc_data['price'] * 0.05  # Roughly ETH/BTC ratio
    bnb_data = btc_data.copy()
    bnb_data['price'] = btc_data['price'] * 0.01  # Roughly BNB/BTC ratio
else:
    # Fallback to synthetic data if Bloomberg not available
    print("⚠️  Bloomberg data not available, using synthetic data...")
    dates = pd.date_range(end=pd.Timestamp.now(), periods=730, freq='D')
    # Geometric random walk keeps simulated prices positive, so log returns stay defined
    btc_data = pd.DataFrame({
        'price': 30000 * np.exp(np.cumsum(np.random.randn(730) * 0.017)),
        'volume': np.random.rand(730) * 1e9
    }, index=dates)
    eth_data = btc_data.copy()
    eth_data['price'] = btc_data['price'] * 0.05
    bnb_data = btc_data.copy()
    bnb_data['price'] = btc_data['price'] * 0.01

if btc_data is not None:
    print(f"✓ Retrieved {len(btc_data)} days of Bitcoin data")
    print(f"  Price range: ${btc_data['price'].min():,.0f} - ${btc_data['price'].max():,.0f}")
    print("\nSample data:")
    print(btc_data.head())

### Visualizing Price Trends
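
The summary statistics printed by the cell below include total return and maximum drawdown. As a small worked example of how those two numbers are formed, the sketch here uses a made-up six-point price path (not lab data) so the arithmetic can be checked by hand.

```python
import pandas as pd

# Illustrative price path (hypothetical numbers chosen for easy arithmetic)
prices = pd.Series([100.0, 120.0, 90.0, 110.0, 80.0, 130.0])

# Drawdown = how far the price sits below its running peak at each point
running_peak = prices.cummax()
drawdown = prices / running_peak - 1

max_drawdown = drawdown.min()                      # worst peak-to-trough loss
total_return = prices.iloc[-1] / prices.iloc[0] - 1

print(f"Total return: {total_return:+.1%}")
print(f"Max drawdown: {max_drawdown:.1%}")
```

Here the worst loss runs from the peak of 120 to the trough of 80, a drawdown of one third, even though the series ends 30% above where it started. Total return alone hides that path risk.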

In [None]:
# Create comprehensive price visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Bitcoin price
axes[0, 0].plot(btc_data.index, btc_data['price'], color='orange', linewidth=2)
axes[0, 0].set_title('Bitcoin Price (USD)', fontsize=13, fontweight='bold')
axes[0, 0].set_ylabel('Price ($)')
axes[0, 0].grid(alpha=0.3)
axes[0, 0].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))

# Ethereum price
axes[0, 1].plot(eth_data.index, eth_data['price'], color='blue', linewidth=2)
axes[0, 1].set_title('Ethereum Price (USD)', fontsize=13, fontweight='bold')
axes[0, 1].set_ylabel('Price ($)')
axes[0, 1].grid(alpha=0.3)

# Trading volumes
axes[1, 0].plot(btc_data.index, btc_data['volume'], color='green', alpha=0.7, linewidth=1.5)
axes[1, 0].set_title('Bitcoin Trading Volume', fontsize=13, fontweight='bold')
axes[1, 0].set_ylabel('Volume ($)')
axes[1, 0].set_xlabel('Date')
axes[1, 0].grid(alpha=0.3)
axes[1, 0].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1e9:.1f}B'))

# Price comparison (normalized to 100)
btc_norm = 100 * btc_data['price'] / btc_data['price'].iloc[0]
eth_norm = 100 * eth_data['price'] / eth_data['price'].iloc[0]
bnb_norm = 100 * bnb_data['price'] / bnb_data['price'].iloc[0]

axes[1, 1].plot(btc_norm.index, btc_norm, label='Bitcoin', color='orange', linewidth=2)
axes[1, 1].plot(eth_norm.index, eth_norm, label='Ethereum', color='blue', linewidth=2)
axes[1, 1].plot(bnb_norm.index, bnb_norm, label='BNB', color='gold', linewidth=2)
axes[1, 1].set_title('Comparative Performance (Base = 100)', fontsize=13, fontweight='bold')
axes[1, 1].set_ylabel('Index Value')
axes[1, 1].set_xlabel('Date')
axes[1, 1].legend()
axes[1, 1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate summary statistics
print("\n" + "="*60)
print("SUMMARY STATISTICS (2-year period)")
print("="*60)
for name, data in [('Bitcoin', btc_data), ('Ethereum', eth_data), ('BNB', bnb_data)]:
    total_return = (data['price'].iloc[-1] / data['price'].iloc[0] - 1) * 100
    max_price = data['price'].max()
    min_price = data['price'].min()
    drawdown = ((data['price'] / data['price'].cummax()) - 1).min() * 100
    
    print(f"\n{name}:")
    print(f"  Total Return: {total_return:+.1f}%")
    print(f"  Price Range: ${min_price:,.0f} - ${max_price:,.0f}")
    print(f"  Max Drawdown: {drawdown:.1f}%")

### Reflection Questions (Exercise 1)

Write 200-250 words addressing:

1.  **Data Quality**: What challenges might arise from using free
    aggregator APIs versus licensed data feeds? How might wash trading
    on some exchanges affect aggregate data quality?

2.  **Price Fragmentation**: CoinGecko aggregates prices across
    exchanges. Why might Bitcoin trade at different prices
    simultaneously on different venues? What arbitrage mechanisms should
    eliminate these spreads?

3.  **Volume Interpretation**: How should we interpret trading volume
    data knowing that significant portion might be wash trading? What
    alternative metrics could measure genuine market activity?
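
As a starting point for the price-fragmentation question, the following sketch checks whether a cross-venue price gap survives trading fees. All numbers are hypothetical, and the 0.1% fee per side is an assumption for illustration. Real arbitrage also faces transfer delays, withdrawal fees, and slippage that this ignores.

```python
# Hypothetical quotes for the same asset on two venues at the same instant
buy_price = 29940.0    # cheaper venue (assumed)
sell_price = 30060.0   # more expensive venue (assumed)
fee_rate = 0.001       # assumed 0.1% taker fee on each leg

# Gross spread in basis points relative to the purchase price
gross_spread_bps = (sell_price - buy_price) / buy_price * 1e4

# Net profit per unit after paying the fee on both the buy and the sell
cost = buy_price * (1 + fee_rate)
proceeds = sell_price * (1 - fee_rate)
net_profit = proceeds - cost

print(f"Gross spread: {gross_spread_bps:.1f} bps")
print(f"Net profit per unit after fees: ${net_profit:.2f}")
```

A gap of roughly 40 basis points looks large, but about half of it is consumed by fees in this example, which is one reason small spreads can persist across venues.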

## Exercise 2: Volatility and Risk Analysis

### Calculating Returns and Volatility
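
The cell below works with log returns and annualizes volatility with a square-root-of-time factor. The short sketch here, on a made-up four-point price series, illustrates two properties it relies on: log returns sum across days, and the annualization factor depends on the day count chosen (365 for markets that trade every day, 252 for equities). The scaling assumes daily returns are roughly uncorrelated.

```python
import numpy as np
import pandas as pd

# Illustrative prices (hypothetical numbers)
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

log_ret = np.log(prices / prices.shift(1)).dropna()
simple_ret = prices.pct_change().dropna()

# Log returns add up across days; simple returns compound
total_from_log = np.exp(log_ret.sum()) - 1
total_from_simple = (1 + simple_ret).prod() - 1

# Square-root-of-time annualization under two day-count conventions
daily_vol = log_ret.std()
annual_vol_365 = daily_vol * np.sqrt(365)
annual_vol_252 = daily_vol * np.sqrt(252)

print(f"Total return via log returns:    {total_from_log:.4f}")
print(f"Total return via simple returns: {total_from_simple:.4f}")
print(f"Annualized vol (365 days): {annual_vol_365:.2%}")
print(f"Annualized vol (252 days): {annual_vol_252:.2%}")
```

Both routes recover the same total return, and the 365-day convention gives a figure about 20% higher than the 252-day one for the same daily volatility. That matters when comparing crypto with equity numbers.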

In [None]:
# Calculate log returns
btc_data['returns'] = np.log(btc_data['price'] / btc_data['price'].shift(1))
eth_data['returns'] = np.log(eth_data['price'] / eth_data['price'].shift(1))
bnb_data['returns'] = np.log(bnb_data['price'] / bnb_data['price'].shift(1))

# Remove NaN values
btc_returns = btc_data['returns'].dropna()
eth_returns = eth_data['returns'].dropna()
bnb_returns = bnb_data['returns'].dropna()

# Calculate volatility metrics
def calculate_volatility_metrics(returns, name):
    """
    Calculate comprehensive volatility statistics for cryptocurrency returns.
    
    Computes key risk metrics used by portfolio managers: realized volatility,
    tail risk measures (VaR), and distribution shape (skewness, kurtosis).
    
    Parameters
    ----------
    returns : pd.Series
        Daily returns (log or simple returns)
    name : str
        Asset name for display in output
        
    Returns
    -------
    dict
        Dictionary with keys:
        - 'daily_vol' : float, daily standard deviation
        - 'annual_vol' : float, annualized standard deviation (daily * sqrt(365))
        - 'rolling_vol' : pd.Series, 30-day rolling volatility
        - 'skew' : float, skewness (negative = left tail)
        - 'kurt' : float, excess kurtosis (> 0 = fat tails)
        - 'var_95' : float, 5th percentile return (1-day VaR at 95%)
        - 'var_99' : float, 1st percentile return (1-day VaR at 99%)
        
    Notes
    -----
    - Annualization assumes 365 trading days (crypto markets trade 24/7)
    - Traditional equity markets use 252 trading days
    - VaR is historical (empirical percentiles), not parametric (Gaussian assumption)
    - Fat tails (excess kurtosis > 0) mean a Gaussian (parametric) VaR would understate extreme losses
    
    Examples
    --------
    >>> btc_returns = btc_data['price'].pct_change()
    >>> metrics = calculate_volatility_metrics(btc_returns, 'Bitcoin')
    >>> metrics['annual_vol']
    0.65  # 65% annualized volatility (vs. S&P 500 ~15%)
    """
    daily_vol = returns.std()
    annual_vol = daily_vol * np.sqrt(365)
    
    # Rolling volatility (30-day window)
    rolling_vol = returns.rolling(30).std() * np.sqrt(365)
    
    # Skewness and kurtosis
    skew = stats.skew(returns.dropna())
    kurt = stats.kurtosis(returns.dropna())
    
    # Value at Risk (95% and 99%)
    var_95 = np.percentile(returns.dropna(), 5)
    var_99 = np.percentile(returns.dropna(), 1)
    
    print(f"\n{name} Volatility Metrics:")
    print(f"  Daily Volatility: {daily_vol*100:.2f}%")
    print(f"  Annualized Volatility: {annual_vol*100:.1f}%")
    print(f"  Skewness: {skew:.3f} {'(negative tail)' if skew < 0 else '(positive tail)'}")
    print(f"  Excess Kurtosis: {kurt:.3f} {'(fat tails)' if kurt > 0 else '(thin tails)'}")
    print(f"  VaR (95%): {var_95*100:.2f}% (1-day)")
    print(f"  VaR (99%): {var_99*100:.2f}% (1-day)")
    
    return {
        'daily_vol': daily_vol,
        'annual_vol': annual_vol,
        'rolling_vol': rolling_vol,
        'skew': skew,
        'kurt': kurt,
        'var_95': var_95,
        'var_99': var_99
    }

print("="*70)
print("VOLATILITY ANALYSIS")
print("="*70)

btc_vol = calculate_volatility_metrics(btc_returns, "Bitcoin")
eth_vol = calculate_volatility_metrics(eth_returns, "Ethereum")
bnb_vol = calculate_volatility_metrics(bnb_returns, "BNB")

# Compare to traditional assets (typical values for reference)
print("\n" + "-"*70)
print("COMPARISON TO TRADITIONAL ASSETS (typical values):")
print("-"*70)
print("S&P 500:      Annual Vol ~15-20%, Skew ~-0.5, Kurtosis ~5-8")
print("Gold:         Annual Vol ~15-18%, Skew ~0.2, Kurtosis ~3-5")
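print("Treasury Bonds: Annual Vol ~5-8%, Skew ~0.0, Kurtosis ~3-4")
print("(Reference kurtosis is raw, normal = 3; subtract 3 to compare with the excess kurtosis above)")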
print("\nCryptocurrency volatility is 3-5x higher than traditional assets!")

### Visualizing Return Distributions
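
When reading kurtosis numbers, it helps to know which convention is in use. `scipy.stats.kurtosis` reports excess kurtosis by default (a normal distribution scores about 0) and raw kurtosis with `fisher=False` (a normal distribution scores about 3). The sketch below checks this on simulated samples. The distributions and sample sizes are illustrative choices, not lab data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=100_000)
laplace_sample = rng.laplace(size=100_000)   # fat-tailed by construction

# Fisher definition (scipy default): excess kurtosis, normal is about 0
excess_normal = stats.kurtosis(normal_sample)
# Pearson definition: raw kurtosis, normal is about 3
raw_normal = stats.kurtosis(normal_sample, fisher=False)
# Laplace has theoretical excess kurtosis of 3, so the estimate should be well above 0
excess_laplace = stats.kurtosis(laplace_sample)

print(f"Normal sample:  excess = {excess_normal:.3f}, raw = {raw_normal:.3f}")
print(f"Laplace sample: excess = {excess_laplace:.3f}")
```

With the default convention, the fat-tail threshold is 0 rather than 3.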

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Bitcoin return distribution
axes[0, 0].hist(btc_returns * 100, bins=50, alpha=0.7, color='orange', edgecolor='black')
axes[0, 0].axvline(btc_returns.mean() * 100, color='red', linestyle='--', linewidth=2, label=f'Mean: {btc_returns.mean()*100:.2f}%')
axes[0, 0].set_title('Bitcoin Daily Returns Distribution', fontsize=13, fontweight='bold')
axes[0, 0].set_xlabel('Daily Return (%)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

# QQ plot for normality test
stats.probplot(btc_returns.dropna(), dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Q-Q Plot: Bitcoin Returns vs Normal Distribution', fontsize=13, fontweight='bold')
axes[0, 1].grid(alpha=0.3)

# Rolling volatility
axes[1, 0].plot(btc_vol['rolling_vol'].index, btc_vol['rolling_vol'] * 100, 
                color='purple', linewidth=2, label='BTC Rolling Vol (30d)')
axes[1, 0].plot(eth_vol['rolling_vol'].index, eth_vol['rolling_vol'] * 100, 
                color='blue', linewidth=2, alpha=0.7, label='ETH Rolling Vol (30d)')
axes[1, 0].set_title('Rolling Volatility (30-day window)', fontsize=13, fontweight='bold')
axes[1, 0].set_ylabel('Annualized Volatility (%)')
axes[1, 0].set_xlabel('Date')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)

# Volatility comparison bar chart
vol_comparison = pd.DataFrame({
    'Bitcoin': [btc_vol['annual_vol'] * 100],
    'Ethereum': [eth_vol['annual_vol'] * 100],
    'BNB': [bnb_vol['annual_vol'] * 100],
    'S&P 500': [17.5],  # Typical value
    'Gold': [16.5]  # Typical value
})

vol_comparison.T.plot(kind='bar', ax=axes[1, 1], legend=False, color=['orange', 'blue', 'gold', 'green', 'brown'])
axes[1, 1].set_title('Annualized Volatility Comparison', fontsize=13, fontweight='bold')
axes[1, 1].set_ylabel('Volatility (%)')
axes[1, 1].set_xlabel('Asset')
axes[1, 1].set_xticklabels(axes[1, 1].get_xticklabels(), rotation=45, ha='right')
axes[1, 1].axhline(y=20, color='red', linestyle='--', alpha=0.5, label='20% threshold')
axes[1, 1].grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Statistical tests for normality
print("\n" + "="*70)
print("NORMALITY TESTS")
print("="*70)

for name, returns in [('Bitcoin', btc_returns), ('Ethereum', eth_returns)]:
    # Jarque-Bera test
    jb_stat, jb_pval = stats.jarque_bera(returns.dropna())
    
    # Shapiro-Wilk test (sample if too large; fixed seed keeps the result reproducible)
    clean_returns = returns.dropna()
    sample_returns = clean_returns.sample(min(5000, len(clean_returns)), random_state=42)
    sw_stat, sw_pval = stats.shapiro(sample_returns)
    
    print(f"\n{name}:")
    print(f"  Jarque-Bera test: statistic={jb_stat:.2f}, p-value={jb_pval:.4f}")
    print(f"    {'Reject normality' if jb_pval < 0.05 else 'Cannot reject normality'} (α=0.05)")
    print(f"  Shapiro-Wilk test: statistic={sw_stat:.4f}, p-value={sw_pval:.4f}")
    print(f"    {'Reject normality' if sw_pval < 0.05 else 'Cannot reject normality'} (α=0.05)")

print("\n💡 Returns exhibit fat tails and deviate significantly from normal distribution!")

### Correlation Analysis
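
One caveat is worth checking before interpreting the correlation output: the ETH and BNB series in this lab are constructed as constant multiples of the BTC price. Log returns are unchanged by a constant scale factor, so those series move in lockstep with BTC. The sketch below demonstrates this on a simulated price path. The path, seed, and 3% daily volatility are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated "BTC" prices (hypothetical geometric random walk)
btc = pd.Series(30000 * np.exp(np.cumsum(rng.normal(0, 0.03, 250))))
eth_scaled = btc * 0.05   # scaled copy, built the same way as the synthetic ETH series

btc_ret = np.log(btc / btc.shift(1)).dropna()
eth_ret = np.log(eth_scaled / eth_scaled.shift(1)).dropna()

corr = btc_ret.corr(eth_ret)
print(f"Max absolute return difference: {(btc_ret - eth_ret).abs().max():.2e}")
print(f"Correlation: {corr:.6f}")
```

With genuinely separate ETH and BNB price histories, the correlations would fall below 1 and the diversification discussion becomes meaningful.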

In [None]:
# Combine returns into single DataFrame
returns_df = pd.DataFrame({
    'BTC': btc_returns,
    'ETH': eth_returns,
    'BNB': bnb_returns
}).dropna()

# Calculate correlation matrix
corr_matrix = returns_df.corr()

# Visualize correlations
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Heatmap
sns.heatmap(corr_matrix, annot=True, fmt='.3f', cmap='RdYlGn', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8}, ax=axes[0])
axes[0].set_title('Cryptocurrency Correlation Matrix', fontsize=13, fontweight='bold')

# Scatter plot: BTC vs ETH
axes[1].scatter(returns_df['BTC'] * 100, returns_df['ETH'] * 100, alpha=0.5, s=20)
axes[1].set_xlabel('Bitcoin Daily Return (%)')
axes[1].set_ylabel('Ethereum Daily Return (%)')
axes[1].set_title(f'BTC-ETH Correlation: {corr_matrix.loc["BTC", "ETH"]:.3f}', 
                  fontsize=13, fontweight='bold')
axes[1].axhline(0, color='black', linewidth=0.5, alpha=0.3)
axes[1].axvline(0, color='black', linewidth=0.5, alpha=0.3)
axes[1].grid(alpha=0.3)

# Add regression line
z = np.polyfit(returns_df['BTC'], returns_df['ETH'], 1)
p = np.poly1d(z)
axes[1].plot(returns_df['BTC'] * 100, p(returns_df['BTC']) * 100, 
             "r--", alpha=0.8, linewidth=2, label=f'Regression line')
axes[1].legend()

plt.tight_layout()
plt.show()

print("\n" + "="*70)
print("CORRELATION ANALYSIS")
print("="*70)
print("\nWithin-Crypto Correlations:")
print(corr_matrix)
print("\n💡 High correlations (0.5-0.8) limit diversification within cryptocurrency portfolios")

# Rolling correlation
rolling_corr_btc_eth = returns_df['BTC'].rolling(90).corr(returns_df['ETH'])

plt.figure(figsize=(12, 5))
plt.plot(rolling_corr_btc_eth.index, rolling_corr_btc_eth, linewidth=2, color='purple')
plt.axhline(y=rolling_corr_btc_eth.mean(), color='red', linestyle='--', 
            label=f'Mean: {rolling_corr_btc_eth.mean():.3f}')
plt.title('Rolling Correlation: Bitcoin vs Ethereum (90-day window)', fontsize=13, fontweight='bold')
plt.ylabel('Correlation')
plt.xlabel('Date')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nRolling BTC-ETH correlation: Mean={rolling_corr_btc_eth.mean():.3f}, "
      f"Std={rolling_corr_btc_eth.std():.3f}")
print("Note: Correlation increases during volatile periods (contagion effect)")

### Reflection Questions (Exercise 2)

Write 250-300 words addressing:

1.  **Volatility Implications**: Bitcoin's 60-80% annualized volatility
    is 3-4x higher than equities. What does this mean for: (a) using
    Bitcoin as currency (purchasing power stability)? (b) portfolio
    allocation (risk contribution)? (c) options pricing and risk
    management?

2.  **Fat Tails and Risk Models**: The Q-Q plot shows Bitcoin returns
    deviate from normality with fat tails. Why do standard risk models
    (VaR assuming normal distribution) underestimate tail risk? What
    practical consequences does this have?

3.  **Correlation Patterns**: Cryptocurrencies show high correlation
    with each other (0.5-0.8) but time-varying correlation with
    equities. What does this mean for diversification benefits within
    crypto portfolios versus across asset classes?
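
To ground the fat-tails question, the sketch below compares a Gaussian (parametric) VaR with a historical (empirical quantile) VaR on simulated fat-tailed returns. The Student-t distribution with 5 degrees of freedom, the 2% scale, and the seed are assumptions chosen for illustration. They are not estimates from the lab data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated fat-tailed daily returns (Student-t, 5 degrees of freedom)
returns = 0.02 * rng.standard_t(df=5, size=100_000)
mu, sigma = returns.mean(), returns.std()

results = {}
for level in (0.01, 0.001):
    results[level] = {
        # Empirical quantile of the simulated returns
        "historical": float(np.quantile(returns, level)),
        # Quantile implied by a normal distribution with the same mean and std
        "gaussian": float(mu + sigma * stats.norm.ppf(level)),
    }

for level, r in results.items():
    print(f"{1 - level:.1%} VaR: historical {r['historical']:.2%}, "
          f"Gaussian {r['gaussian']:.2%}")
```

The historical loss quantile is more negative than the Gaussian one at both levels, and the gap widens further into the tail. A model calibrated only to mean and standard deviation would therefore set aside too little capital for rare, large losses.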

## Exercise 3: Market Efficiency Testing

### Autocorrelation Analysis
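
Before running the full test, it may help to see what the ACF and its confidence band measure. The sketch below computes the lag-1 autocorrelation by hand for two simulated series: white noise (no predictability) and an AR(1) process with coefficient 0.5 (strong predictability). `lag_autocorr` is a hypothetical helper written for this illustration, and the simulated series are not lab data.

```python
import numpy as np

def lag_autocorr(x, lag=1):
    """Sample autocorrelation at a given lag (demeaned, normalized by total variance)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(7)
n = 2000

# White noise: past values carry no information about the next value
white_noise = rng.normal(0, 0.03, n)

# AR(1): each value keeps half of the previous value plus a fresh shock
shocks = rng.normal(0, 0.03, n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]

band = 1.96 / np.sqrt(n)   # approximate 95% band under the no-autocorrelation null
rho_wn = lag_autocorr(white_noise)
rho_ar = lag_autocorr(ar1)

print(f"95% band: +/-{band:.4f}")
print(f"White noise lag-1 autocorrelation: {rho_wn:+.4f}")
print(f"AR(1) lag-1 autocorrelation:       {rho_ar:+.4f}")
```

The AR(1) estimate lands far outside the band, close to its true value of 0.5. Real return series usually show much smaller values, so statistical significance does not by itself imply a tradable edge.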

In [None]:
# Test for autocorrelation (do past returns predict future returns?)
def test_autocorrelation(returns, name, max_lag=20):
    """
    Test for serial correlation in returns (market efficiency diagnostic).
    
    Autocorrelation measures whether past returns predict future returns. If
    significant autocorrelation exists, markets are inefficient (predictable).
    
    Parameters
    ----------
    returns : pd.Series
        Daily returns series
    name : str
        Asset name for display
    max_lag : int, default=20
        Maximum lag to test (20 days = ~1 month)
        
    Returns
    -------
    np.ndarray
        Autocorrelation values for lags 0..max_lag (also prints test results and displays the ACF plot)
        
    Notes
    -----
    **Ljung-Box Test:**
    - Null hypothesis: No autocorrelation up to lag k
    - p-value < 0.05 → Reject H0 → Significant autocorrelation (inefficiency)
    
    **Efficient Market Hypothesis (weak form):**
    - If markets are efficient, past prices shouldn't predict future prices
    - ACF should be ~0 at all lags (within 95% confidence bands)
    - Crypto often shows significant autocorrelation (inefficient)
    
    **Why Crypto Markets Are Inefficient:**
    - Fragmented liquidity across hundreds of exchanges
    - High transaction costs (gas fees, spreads)
    - Retail-dominated (fewer arbitrageurs)
    - 24/7 trading → slower price discovery
    
    Examples
    --------
    >>> btc_returns = btc_data['price'].pct_change()
    >>> test_autocorrelation(btc_returns, 'Bitcoin', max_lag=20)
    Bitcoin Autocorrelation Analysis:
      Lag-1 Autocorrelation: 0.0234
      Ljung-Box p-value (lag 10): 0.0012  # Reject H0 → Inefficient!
    """
    
    # Calculate autocorrelation function
    acf_values = acf(returns.dropna(), nlags=max_lag, fft=False)
    
    # Ljung-Box test for joint significance
    from statsmodels.stats.diagnostic import acorr_ljungbox
    lb_test = acorr_ljungbox(returns.dropna(), lags=[5, 10, 20], return_df=True)
    
    print(f"\n{name} Autocorrelation Analysis:")
    print(f"  Lag-1 Autocorrelation: {acf_values[1]:.4f}")
    print(f"  Lag-5 Autocorrelation: {acf_values[5]:.4f}")
    print("\nLjung-Box Test (joint significance):")
    print(lb_test)
    
    # Plot ACF
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.stem(range(len(acf_values)), acf_values, basefmt=" ")
    ax.axhline(y=0, color='black', linewidth=0.5)
    ax.axhline(y=1.96/np.sqrt(len(returns)), color='red', linestyle='--', label='95% CI')
    ax.axhline(y=-1.96/np.sqrt(len(returns)), color='red', linestyle='--')
    ax.set_title(f'{name} Autocorrelation Function', fontsize=13, fontweight='bold')
    ax.set_xlabel('Lag (days)')
    ax.set_ylabel('Autocorrelation')
    ax.legend()
    ax.grid(alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    return acf_values

print("="*70)
print("MARKET EFFICIENCY: AUTOCORRELATION TESTS")
print("="*70)

btc_acf = test_autocorrelation(btc_returns, "Bitcoin")
eth_acf = test_autocorrelation(eth_returns, "Ethereum")

print("\n💡 Interpretation: Significant autocorrelation suggests predictability (market inefficiency)")
print("   Small correlations may not be economically significant after transaction costs")

### Momentum Strategy Backtest
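
The backtest below ignores trading costs. As a rough sketch of how costs could be layered on, the example here charges a fee whenever the position changes. The returns, signals, and the 0.1% fee per unit of turnover are hypothetical values chosen so the arithmetic is easy to follow. This is one simple cost model among many, not the lab's method.

```python
import pandas as pd

# Hypothetical daily market returns and end-of-day signals (+1 long, -1 short)
returns = pd.Series([0.02, -0.01, 0.03, -0.02, 0.01])
signal = pd.Series([1, 1, -1, -1, 1])
fee = 0.001   # assumed cost of 0.1% per unit of position traded

# Act on yesterday's signal to avoid look-ahead bias; start flat
position = signal.shift(1).fillna(0)

# Turnover = size of each position change (flipping long to short trades 2 units)
turnover = position.diff().abs().fillna(position.abs())

gross = position * returns
net = gross - fee * turnover

cum_gross = (1 + gross).prod()
cum_net = (1 + net).prod()
print(f"Turnover per day: {turnover.tolist()}")
print(f"Cumulative gross: {cum_gross:.5f}")
print(f"Cumulative net:   {cum_net:.5f}")
```

A flip from long to short counts as two units of turnover, so strategies that reverse often pay disproportionately more in fees.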

In [None]:
# Simple momentum strategy: go long when the 10-day MA is above the 50-day MA, short when it is below
def momentum_strategy(prices, short_window=10, long_window=50):
    """
    Backtest simple moving average crossover momentum strategy.
    
    Classic technical analysis strategy: buy when short MA crosses above long MA,
    sell when it crosses below. Tests whether momentum exists in crypto markets.
    
    Parameters
    ----------
    prices : pd.Series
        Daily closing prices
    short_window : int, default=10
        Short moving average window (days)
    long_window : int, default=50
        Long moving average window (days)
        
    Returns
    -------
    pd.DataFrame
        Columns:
        - 'price' : original prices
        - 'MA_short' : short-window moving average
        - 'MA_long' : long-window moving average
        - 'signal' : trading position (+1 = long, -1 = short, 0 = no position)
        - 'returns' : buy-and-hold returns
        - 'strategy_returns' : strategy returns (position × market return)
        - 'cum_returns' : cumulative buy-and-hold
        - 'cum_strategy' : cumulative strategy performance
        
    Notes
    -----
    **Strategy Logic:**
    - Golden Cross: Short MA > Long MA → Buy signal
    - Death Cross: Short MA < Long MA → Sell signal
    
    **Reality Check:**
    - This is a **naive backtest** (ignores transaction costs, slippage, fees)
    - Crypto trading fees ~0.1-0.5% per trade → eats into profits
    - No position sizing, risk management, or stop-losses
    - Past performance ≠ future returns (overfitting risk)
    
    **Academic Evidence:**
    - Momentum works in equities (Jegadeesh & Titman 1993, JF)
    - Crypto momentum: mixed evidence, high volatility dominates
    - Transaction costs often exceed strategy alpha
    
    Examples
    --------
    >>> btc_momentum = momentum_strategy(btc_data['price'])
    >>> strategy_return = (btc_momentum['cum_strategy'].iloc[-1] - 1) * 100
    >>> print(f"Strategy return: {strategy_return:.1f}%")
    Strategy return: -5.2%  # Often underperforms buy-and-hold after costs
    """
    df = pd.DataFrame({'price': prices})
    
    # Calculate moving averages
    df['MA_short'] = df['price'].rolling(short_window).mean()
    df['MA_long'] = df['price'].rolling(long_window).mean()
    
    # Generate signals
    df['signal'] = 0
    df.loc[df['MA_short'] > df['MA_long'], 'signal'] = 1  # Buy signal
    df.loc[df['MA_short'] < df['MA_long'], 'signal'] = -1  # Sell signal
    
    # Calculate returns
    df['returns'] = df['price'].pct_change()
    df['strategy_returns'] = df['signal'].shift(1) * df['returns']
    
    # Cumulative returns
    df['cum_returns'] = (1 + df['returns']).cumprod()
    df['cum_strategy'] = (1 + df['strategy_returns']).cumprod()
    
    return df

# Run momentum strategy
btc_momentum = momentum_strategy(btc_data['price'])

# Calculate performance metrics
total_return = (btc_momentum['cum_returns'].iloc[-1] - 1) * 100
strategy_return = (btc_momentum['cum_strategy'].iloc[-1] - 1) * 100
excess_return = strategy_return - total_return

print("\n" + "="*70)
print("MOMENTUM STRATEGY BACKTEST")
print("="*70)
print(f"\nBuy-and-Hold Return: {total_return:.2f}%")
print(f"Strategy Return: {strategy_return:.2f}%")
print(f"Excess Return: {excess_return:+.2f}%")

# Visualize strategy performance
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Price and moving averages
axes[0].plot(btc_momentum.index, btc_momentum['price'], label='Bitcoin Price', color='orange', linewidth=2)
axes[0].plot(btc_momentum.index, btc_momentum['MA_short'], label=f'{10}D MA', color='blue', linewidth=1.5)
axes[0].plot(btc_momentum.index, btc_momentum['MA_long'], label=f'{50}D MA', color='red', linewidth=1.5)
axes[0].set_title('Bitcoin Price with Moving Averages', fontsize=13, fontweight='bold')
axes[0].set_ylabel('Price ($)')
axes[0].legend()
axes[0].grid(alpha=0.3)

# Cumulative returns comparison
axes[1].plot(btc_momentum.index, btc_momentum['cum_returns'], 
             label='Buy and Hold', color='gray', linewidth=2)
axes[1].plot(btc_momentum.index, btc_momentum['cum_strategy'], 
             label='Momentum Strategy', color='green', linewidth=2)
axes[1].set_title('Cumulative Returns: Strategy vs Buy-and-Hold', fontsize=13, fontweight='bold')
axes[1].set_ylabel('Cumulative Return (Base = 1)')
axes[1].set_xlabel('Date')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n⚠️  Note: This backtest ignores transaction costs, slippage, and taxes")
print("    Real-world implementation would have lower returns")

### Mean Reversion Test
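
To show what the ADF test is doing under the hood, the sketch below runs the basic Dickey-Fuller regression by hand on two simulated series: a random walk and a stationary AR(1). It is a simplified version that omits the lagged difference terms and p-value calculation that `adfuller` adds. `dickey_fuller_tstat` is a hypothetical helper, and the simulated series are not lab data.

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic on gamma in: diff(y)_t = alpha + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant + lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - 2)            # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS coefficient covariance
    return float(beta[1] / np.sqrt(cov[1, 1]))

rng = np.random.default_rng(1)
n = 500
shocks = rng.normal(0, 0.03, n)

random_walk = np.cumsum(shocks)          # unit root: shocks never decay
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]   # stationary: shocks decay by half each step

CRIT_5PCT = -2.86   # asymptotic 5% Dickey-Fuller critical value (constant, no trend)
t_rw = dickey_fuller_tstat(random_walk)
t_ar = dickey_fuller_tstat(ar1)

print(f"Random walk t-stat: {t_rw:.2f}")
print(f"AR(1) t-stat:       {t_ar:.2f} (reject unit root if below {CRIT_5PCT})")
```

The statistic is compared with Dickey-Fuller critical values rather than the usual t-table, because under the unit-root null the regression t-statistic does not follow a standard t distribution.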

In [None]:
# Augmented Dickey-Fuller test for stationarity/mean reversion
def test_mean_reversion(prices, name):
    """
    Test for mean reversion using Augmented Dickey-Fuller (ADF) test.
    
    Tests whether prices follow a random walk (unit root) or revert to a mean
    (stationary). Critical for pairs trading and mean-reversion strategies.
    
    Parameters
    ----------
    prices : pd.Series
        Daily closing prices
    name : str
        Asset name for display
        
    Returns
    -------
    tuple
        ADF test results: (statistic, p-value, lags_used, nobs, critical_values, icbest)
        
    Notes
    -----
    **Augmented Dickey-Fuller Test:**
    - Null hypothesis (H0): Series has unit root (random walk, NOT mean-reverting)
    - Alternative (H1): Series is stationary (mean-reverting)
    - p-value < 0.05 → Reject H0 → Prices are stationary (mean-reverting)
    - p-value >= 0.05 → Cannot reject H0 → Consistent with a random walk (efficient market)
    
    **Implications for Trading:**
    - **Random walk** (efficient): Past prices carry no edge, so neither momentum nor mean-reversion rules should profit systematically
    - **Mean-reverting** (inefficient): Pairs trading, statistical arbitrage possible
    
    **Why This Matters:**
    - Most financial time series have unit roots (Campbell, Lo, MacKinlay 1997)
    - Crypto markets often show mixed evidence (regime-dependent)
    - Low p-values may be spurious (structural breaks, volatility clustering)
    
    **Technical Details:**
    - Test performed on log prices (handles exponential growth)
    - Regression includes constant term ('c') but not trend
    - AIC criterion selects optimal lag length
    
    Examples
    --------
    >>> btc_adf = test_mean_reversion(btc_data['price'], 'Bitcoin')
    Bitcoin - Augmented Dickey-Fuller Test:
      ADF Statistic: -1.234
      P-value: 0.658  # Cannot reject unit root → Random walk
    """
    
    # ADF test on log prices
    log_prices = np.log(prices)
    adf_result = adfuller(log_prices.dropna(), maxlag=20, regression='c', autolag='AIC')
    
    print(f"\n{name} - Augmented Dickey-Fuller Test:")
    print(f"  ADF Statistic: {adf_result[0]:.4f}")
    print(f"  P-value: {adf_result[1]:.4f}")
    print(f"  Critical Values:")
    for key, value in adf_result[4].items():
        print(f"    {key}: {value:.4f}")
    
    if adf_result[1] < 0.05:
        print(f"  ✓ Reject unit root (prices are stationary/mean-reverting)")
    else:
        print(f"  ✗ Cannot reject unit root (prices have unit root/random walk)")
    
    return adf_result

print("\n" + "="*70)
print("MEAN REVERSION TESTS")
print("="*70)

btc_adf = test_mean_reversion(btc_data['price'], "Bitcoin")
eth_adf = test_mean_reversion(eth_data['price'], "Ethereum")

print("\n💡 If prices follow random walk, past prices don't predict future prices")
print("   This supports weak-form market efficiency")
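
To see what the unit-root test separates, it can help to run a stripped-down version on simulated series where the answer is known. The sketch below is a simplified Dickey-Fuller regression (no lagged differences, so it is not the augmented test that `adfuller` runs), and the helper name `dickey_fuller_tstat`, the AR coefficient of 0.9, and the sample size are illustrative assumptions. The value -2.86 is the asymptotic 5% critical value for a regression with a constant.

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic on rho in: diff(y)_t = a + rho * y_{t-1} + e_t (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant + lagged level
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
n = 2000
shocks = rng.normal(size=n)

# Random walk: y_t = y_{t-1} + e_t (unit root)
random_walk = np.cumsum(shocks)

# Stationary AR(1): y_t = 0.9 * y_{t-1} + e_t (mean-reverting)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.9 * ar1[t - 1] + shocks[t]

CRIT_5PCT = -2.86  # asymptotic 5% critical value, regression with constant
t_rw = dickey_fuller_tstat(random_walk)
t_ar = dickey_fuller_tstat(ar1)

print(f"Random walk t-stat: {t_rw:.2f}")
print(f"AR(1) t-stat:       {t_ar:.2f} (reject unit root if below {CRIT_5PCT})")
```

The mean-reverting series should produce a t-statistic far below the critical value, while the random walk usually does not. Log Bitcoin prices tend to behave like the second case.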

### Reflection Questions (Exercise 3)

Write 200-250 words addressing:

1.  **Efficiency Interpretation**: What do your autocorrelation and
    momentum results suggest about Bitcoin market efficiency? Can small
    autocorrelations or strategy profits coexist with efficient markets?

2.  **Transaction Costs Matter**: The momentum strategy showed
    \[profit/loss\] before transaction costs. Cryptocurrency trading
    costs 0.1-0.5% per trade. Would your strategy be profitable after
    accounting for costs? Show rough calculations.

3.  **Limits to Arbitrage**: Even if inefficiencies exist (predictable
    patterns), what practical barriers prevent traders from exploiting
    them and eliminating the patterns?
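
For the rough cost calculation in question 2, one possible starting point is to charge a proportional fee every time the position changes. This is a minimal sketch, not the lab's backtest: the function name `net_strategy_returns`, the toy return and position arrays, and the 0.2% fee are all hypothetical, and `positions[t]` is assumed to be the position already held during period `t`.

```python
import numpy as np

def net_strategy_returns(returns, positions, cost_per_trade):
    """Strategy returns after a proportional cost on every change in position."""
    returns = np.asarray(returns, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Turnover: absolute position change, counting the initial entry from flat
    turnover = np.abs(np.diff(positions, prepend=0.0))
    gross = positions * returns
    return gross - cost_per_trade * turnover

# Toy example (hypothetical numbers): four days, 0.2% cost per trade
daily_returns = [0.02, -0.01, 0.03, 0.01]
daily_positions = [1, 1, 0, 1]          # 1 = long, 0 = out of the market
net = net_strategy_returns(daily_returns, daily_positions, cost_per_trade=0.002)

gross_total = float(np.sum(np.array(daily_positions) * np.array(daily_returns)))
print(f"Gross return (sum): {gross_total:.4f}")
print(f"Net return (sum):   {net.sum():.4f}")
```

In your own calculation, replace the toy arrays with the strategy's actual signal and returns, and try fees across the 0.1-0.5% range.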

------------------------------------------------------------------------

## Exercise 4: GARCH Volatility Modeling & Structural Breaks

**Learning Objectives:**

-   Apply GARCH models to cryptocurrency returns (Week 3, §3.4 theory)
-   Test for volatility clustering and asymmetric effects
-   Evaluate volatility forecasting accuracy (Mincer-Zarnowitz
    regression)
-   Detect structural breaks and regime shifts in volatility

> **Connection to [Ch 03: Volatility
> Modelling](../chapters/03_volatility_modelling.qmd) & [Ch 07:
> Cryptocurrency](../chapters/07_cryptocurrency_digital_currency.qmd#sec-garch)**
>
> This exercise applies **Week 3 GARCH theory** to Bitcoin data. You’ll
> estimate time-varying volatility, test for asymmetric effects
> (leverage), forecast volatility, and test whether high GARCH
> persistence is real or an artifact of regime shifts.
>
> **Key statistical concepts**: Volatility clustering, fat tails,
> leverage effect, out-of-sample validation, structural breaks.

### Part A: Statistical Tests for Volatility Properties

**Before fitting GARCH, test for its key assumptions:**

In [None]:
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import jarque_bera

# Bitcoin returns (from Exercise 2)
returns = btc_data['return'].dropna()

# Test 1: Ljung-Box test on squared returns (volatility clustering)
lb_test = acorr_ljungbox(returns**2, lags=[10], return_df=True)
lb_stat = lb_test['lb_stat'].values[0]
lb_pval = lb_test['lb_pvalue'].values[0]

# Test 2: Jarque-Bera test (normality)
jb_stat, jb_pval = jarque_bera(returns)

print("=" * 70)
print("STATISTICAL TESTS FOR GARCH ASSUMPTIONS")
print("=" * 70)

print("\n1. Ljung-Box Test (Volatility Clustering)")
print(f"   H₀: No autocorrelation in squared returns")
print(f"   Statistic: {lb_stat:.2f} | p-value: {lb_pval:.4f}")
if lb_pval < 0.05:
    print(f"   ✓ REJECT H₀ → Significant volatility clustering (GARCH warranted)")
else:
    print(f"   ✗ Cannot reject H₀ → No volatility clustering")

print("\n2. Jarque-Bera Test (Normality)")
print(f"   H₀: Returns are normally distributed")
print(f"   Statistic: {jb_stat:.2f} | p-value: {jb_pval:.6f}")
if jb_pval < 0.05:
    print(f"   ✓ REJECT H₀ → Non-normal distribution (fat tails present)")
else:
    print(f"   ✗ Cannot reject H₀ → Normal distribution")

print("\n💡 Both tests should reject H₀ for cryptocurrency data")
print("   This justifies using GARCH with Student's t distribution")

**Interpretation**: Bitcoin should show p \< 0.001 for both tests,
indicating strong volatility clustering and fat tails.
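
For reference, these are the standard textbook forms of the two statistics computed above. The notation is an assumption introduced here rather than taken from the lab: $n$ is the sample size, $\hat{\rho}_k$ the lag-$k$ sample autocorrelation of squared returns, $m$ the number of lags (10 in the code), and $S$ and $K$ the sample skewness and kurtosis of returns.

$$
Q(m) = n(n+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^{2}}{n-k} \;\sim\; \chi^2(m) \text{ under } H_0,
\qquad
JB = \frac{n}{6}\left(S^{2} + \frac{(K-3)^{2}}{4}\right) \;\sim\; \chi^2(2) \text{ under } H_0
$$

Because both statistics scale with $n$, even modest autocorrelation or excess kurtosis produces very small p-values in samples of several thousand daily observations.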

### Part B: GARCH(1,1) and GJR-GARCH Estimation

**Fit symmetric GARCH(1,1) and asymmetric GJR-GARCH:**

In [None]:
from arch import arch_model

# Convert returns to percentage for numerical stability
returns_pct = returns * 100

# Model 1: GARCH(1,1) with Student's t distribution (fat tails)
model_garch = arch_model(returns_pct, vol='GARCH', p=1, q=1, dist='t')
garch_fit = model_garch.fit(disp='off')

# Model 2: GJR-GARCH (asymmetric, captures leverage effect)
model_gjr = arch_model(returns_pct, vol='GARCH', p=1, o=1, q=1, dist='t')
gjr_fit = model_gjr.fit(disp='off')

# Extract parameters
print("\n" + "=" * 70)
print("GARCH MODEL ESTIMATION RESULTS")
print("=" * 70)

print("\n1. GARCH(1,1) with Student's t:")
print(f"   ω (baseline):     {garch_fit.params['omega']:>8.4f}")
print(f"   α (news impact):  {garch_fit.params['alpha[1]']:>8.4f}")
print(f"   β (persistence):  {garch_fit.params['beta[1]']:>8.4f}")
print(f"   α + β:            {garch_fit.params['alpha[1]'] + garch_fit.params['beta[1]']:>8.4f}")
print(f"   df (tail):        {garch_fit.params['nu']:>8.2f} (normal = ∞)")
print(f"   AIC:              {garch_fit.aic:>8.2f}")

print("\n2. GJR-GARCH (asymmetric):")
print(f"   ω (baseline):     {gjr_fit.params['omega']:>8.4f}")
print(f"   α (positive):     {gjr_fit.params['alpha[1]']:>8.4f}")
print(f"   γ (asymmetry):    {gjr_fit.params['gamma[1]']:>8.4f}")
print(f"   β (persistence):  {gjr_fit.params['beta[1]']:>8.4f}")
print(f"   α + γ (negative): {gjr_fit.params['alpha[1]'] + gjr_fit.params['gamma[1]']:>8.4f}")
print(f"   df (tail):        {gjr_fit.params['nu']:>8.2f}")
print(f"   AIC:              {gjr_fit.aic:>8.2f}")

# Model comparison
if gjr_fit.aic < garch_fit.aic:
    improvement = garch_fit.aic - gjr_fit.aic
    print(f"\n✓ GJR-GARCH preferred (AIC lower by {improvement:.2f})")
    print(f"   Negative shocks increase volatility {(gjr_fit.params['alpha[1]'] + gjr_fit.params['gamma[1]']) / gjr_fit.params['alpha[1]']:.2f}× more")
else:
    print(f"\n  GARCH(1,1) preferred (symmetric effects)")

# Visualize conditional volatility
fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Panel 1: Returns with ±1σ GARCH bands
conditional_vol_garch = garch_fit.conditional_volatility
axes[0].plot(returns_pct.index, returns_pct, linewidth=0.5, alpha=0.7, label='Returns', color='blue')
axes[0].fill_between(returns_pct.index, -conditional_vol_garch, conditional_vol_garch,
                      alpha=0.2, color='red', label='±1σ (GARCH volatility)')
axes[0].set_ylabel('Return (%)', fontsize=11)
axes[0].set_title('Bitcoin Returns with GARCH(1,1) Conditional Volatility', fontsize=13)
axes[0].legend(fontsize=10)
axes[0].grid(alpha=0.3)

# Panel 2: GARCH vs GJR volatility over time
conditional_vol_gjr = gjr_fit.conditional_volatility
axes[1].plot(conditional_vol_garch.index, conditional_vol_garch, linewidth=1.5, 
             color='blue', label='GARCH(1,1)', alpha=0.7)
axes[1].plot(conditional_vol_gjr.index, conditional_vol_gjr, linewidth=1.5,
             color='red', label='GJR-GARCH', alpha=0.7)
axes[1].set_xlabel('Date', fontsize=11)
axes[1].set_ylabel('Conditional Volatility (%)', fontsize=11)
axes[1].set_title('Time-Varying Volatility: GARCH vs GJR-GARCH', fontsize=13)
axes[1].legend(fontsize=10)
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 GARCH captures volatility spikes (2018 crash, 2020 COVID, 2021 bull run)")
print("   A positive γ in GJR-GARCH indicates asymmetry: bad news increases volatility more")

> **Interpreting GARCH Parameters**
>
> **Persistence (α + β)**:
>
> -   **~0.95-0.99**: High persistence; volatility shocks decay slowly
>     (typical for crypto)
> -   **Half-life** = ln(0.5) / ln(α + β). If α + β = 0.98, half-life ≈
>     34 days
>
> **Asymmetry (γ in GJR)**:
>
> -   **γ \> 0**: Negative shocks (bad news) increase volatility more
>     than positive shocks
> -   **Leverage effect**: A -5% drop increases volatility more than a
>     +5% rally (well documented for equities; check whether your
>     Bitcoin estimate agrees)
>
> **Degrees of freedom (df)**:
>
> -   **df \< 10**: Very fat tails (extreme events common)
> -   **df → ∞**: Normal distribution (no fat tails)
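
The half-life formula in the callout is easy to tabulate. The short sketch below uses a hypothetical helper, `volatility_half_life`, to show how quickly the implied half-life grows as persistence approaches 1. It assumes daily data, so the result is in days.

```python
import math

def volatility_half_life(persistence):
    """Periods needed for a volatility shock to decay by half, given alpha + beta."""
    if not 0 < persistence < 1:
        raise ValueError("persistence must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(persistence)

for persistence in (0.90, 0.95, 0.98, 0.99):
    days = volatility_half_life(persistence)
    print(f"alpha + beta = {persistence:.2f} -> half-life of about {days:.1f} days")
```

Moving from 0.98 to 0.99 roughly doubles the half-life, which is why small differences in estimated persistence matter for risk horizons.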

### Part C: News Impact Curves (Asymmetry Visualization)

**Visualize how shocks of different sizes/signs affect volatility:**

In [None]:
# Generate news impact curves
shocks = np.linspace(-10, 10, 200)  # -10% to +10% returns

# GARCH(1,1) impact (symmetric)
alpha_garch = garch_fit.params['alpha[1]']
impact_garch = alpha_garch * shocks**2

# GJR-GARCH impact (asymmetric)
alpha_gjr = gjr_fit.params['alpha[1]']
gamma_gjr = gjr_fit.params['gamma[1]']
impact_gjr = alpha_gjr * shocks**2 + gamma_gjr * (shocks < 0) * shocks**2

# Plot
plt.figure(figsize=(12, 7))
plt.plot(shocks, impact_garch, linewidth=2.5, linestyle='--', color='blue', label='GARCH (symmetric)', alpha=0.8)
plt.plot(shocks, impact_gjr, linewidth=2.5, color='red', label='GJR-GARCH (asymmetric)')

# Highlight key points
plt.axvline(0, color='black', linestyle=':', linewidth=1.5, alpha=0.5)
plt.axvline(-5, color='red', linestyle=':', linewidth=1, alpha=0.5, label='Example: -5% shock')
plt.axvline(+5, color='green', linestyle=':', linewidth=1, alpha=0.5, label='Example: +5% shock')

# Annotate asymmetry
neg5_impact = alpha_gjr * 25 + gamma_gjr * 25
pos5_impact = alpha_gjr * 25
plt.scatter([-5, 5], [neg5_impact, pos5_impact], s=150, c=['red', 'green'], 
            edgecolors='black', zorder=5, alpha=0.8)
plt.text(-5, neg5_impact + 0.5, f'Impact: {neg5_impact:.2f}', ha='center', fontsize=10, fontweight='bold')
plt.text(5, pos5_impact + 0.5, f'Impact: {pos5_impact:.2f}', ha='center', fontsize=10, fontweight='bold')

plt.xlabel('News Shock (% return)', fontsize=12)
plt.ylabel('Impact on Conditional Variance', fontsize=12)
plt.title('News Impact Curves: How Shocks Affect Volatility', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate asymmetry ratio
asymmetry_ratio = neg5_impact / pos5_impact
print(f"\nAsymmetry Ratio:")
print(f"  -5% shock impact / +5% shock impact = {asymmetry_ratio:.2f}")
print(f"  → Bad news increases volatility {asymmetry_ratio:.1f}× more than good news")

**Interpretation**: For Bitcoin, negative shocks typically increase
volatility ~1.5-2× more than positive shocks.
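
The curves plotted above follow from the GJR-GARCH(1,1) variance equation. Written in the standard form, with $\varepsilon_{t-1}$ denoting the previous period's shock (notation assumed here, matching the `alpha`, `gamma`, and `beta` parameters in the code):

$$
\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \gamma\,\varepsilon_{t-1}^2\,\mathbb{1}[\varepsilon_{t-1} < 0] + \beta\,\sigma_{t-1}^2
$$

Holding $\sigma_{t-1}^2$ fixed and dropping the constant, the news impact of a shock is $(\alpha + \gamma\,\mathbb{1}[\varepsilon < 0])\,\varepsilon^2$, so the ratio for equal-sized negative and positive shocks is

$$
\frac{(\alpha + \gamma)\,\varepsilon^2}{\alpha\,\varepsilon^2} = \frac{\alpha + \gamma}{\alpha}.
$$

The shock size cancels, so the asymmetry ratio computed at ±5% would be the same at any other magnitude.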

### Part D: Volatility Forecasting & Out-of-Sample Validation

**Test if GARCH forecasts future volatility accurately (Mincer-Zarnowitz
regression):**

In [None]:
from scipy.stats import linregress

# Rolling-window forecast
forecast_horizon = 22  # days (1 month ahead)
train_size = 252 * 2   # 2 years training window

forecasts_garch = []
realized_vols = []

print("\n" + "=" * 70)
print("ROLLING-WINDOW VOLATILITY FORECASTING")
print("=" * 70)
print(f"Training window: {train_size} days | Forecast horizon: {forecast_horizon} days")

for start in range(train_size, len(returns_pct) - forecast_horizon, forecast_horizon):
    # Train GARCH on historical data
    train_data = returns_pct.iloc[start - train_size:start]
    model_train = arch_model(train_data, vol='GARCH', p=1, q=1, dist='t')
    fit_train = model_train.fit(disp='off')
    
    # Forecast next month's volatility
    forecast = fit_train.forecast(horizon=forecast_horizon)
    forecast_vol = np.sqrt(forecast.variance.values[-1, :].mean())  # Average over horizon
    
    # Realized volatility (actual)
    test_data = returns_pct.iloc[start:start + forecast_horizon]
    realized_vol = test_data.std()
    
    forecasts_garch.append(forecast_vol)
    realized_vols.append(realized_vol)

forecasts_garch = np.array(forecasts_garch)
realized_vols = np.array(realized_vols)

# Mincer-Zarnowitz regression: Realized = α + β × Forecast + ε
slope, intercept, r_value, p_value, std_err = linregress(forecasts_garch, realized_vols)

# Calculate RMSE
rmse = np.sqrt(((realized_vols - forecasts_garch)**2).mean())

print(f"\nNumber of forecasts: {len(forecasts_garch)}")
print(f"\nMincer-Zarnowitz Regression Results:")
print(f"  Intercept (α):  {intercept:>8.3f} (ideal = 0)")
print(f"  Slope (β):      {slope:>8.3f} (ideal = 1)")
print(f"  R²:             {r_value**2:>8.3f}")
print(f"  RMSE:           {rmse:>8.2f}%")

if abs(intercept) < 1 and abs(slope - 1) < 0.2:
    print(f"\n✓ Forecast is approximately unbiased (α≈0, β≈1)")
else:
    print(f"\n⚠️  Forecast shows bias (α≠0 or β≠1)")

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Panel 1: Mincer-Zarnowitz scatter
ax1.scatter(forecasts_garch, realized_vols, alpha=0.6, s=80, edgecolor='black', linewidth=0.5)
ax1.plot([forecasts_garch.min(), forecasts_garch.max()],
         [forecasts_garch.min(), forecasts_garch.max()],
         'r--', linewidth=2.5, label='Perfect forecast (45° line)')
ax1.plot(forecasts_garch, intercept + slope * forecasts_garch,
         'b-', linewidth=2.5, label=f'Fitted: y={intercept:.2f}+{slope:.2f}x (R²={r_value**2:.2f})')
ax1.set_xlabel('GARCH Forecast Volatility (%)', fontsize=12)
ax1.set_ylabel('Realized Volatility (%)', fontsize=12)
ax1.set_title('Mincer-Zarnowitz: Forecast Accuracy', fontsize=13)
ax1.legend(fontsize=10)
ax1.grid(alpha=0.3)

# Panel 2: Forecast errors over time
errors = realized_vols - forecasts_garch
ax2.plot(errors, linewidth=1.5, color='red', alpha=0.7)
ax2.axhline(0, color='black', linestyle='--', linewidth=1.5)
ax2.fill_between(range(len(errors)), 0, errors, alpha=0.3, color='red')
ax2.set_xlabel('Forecast Period', fontsize=12)
ax2.set_ylabel('Forecast Error (Realized - Forecast, %)', fontsize=12)
ax2.set_title('Forecast Errors Over Time', fontsize=13)
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 GARCH forecasts Bitcoin volatility reasonably (R²~0.5-0.7)")
print("   But underestimates during extreme events (crashes, manias)")

> **Connection to [Week 0, §0.6: Out-of-Sample
> Validation](../chapters/00_foundations.qmd#sec-model-selection)**
>
> **Mincer-Zarnowitz regression** tests forecast unbiasedness:
>
> -   **α = 0**: No systematic over- or under-prediction
> -   **β = 1**: Forecast correctly captures volatility scale
> -   **High R²**: Forecast explains realized volatility well
>
> This is **honest evaluation**: train on the past, test on the future
> (no look-ahead bias).
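
The lab code checks unbiasedness with fixed tolerances on the intercept and slope. A more formal alternative is a joint Wald test of α = 0 and β = 1. The sketch below runs it on synthetic forecasts, not the lab's data. The function name `mincer_zarnowitz_wald` and the simulated numbers are illustrative, and it assumes homoskedastic, uncorrelated regression errors. The statistic is compared with 5.99, the 5% critical value of a chi-square distribution with 2 degrees of freedom.

```python
import numpy as np

def mincer_zarnowitz_wald(forecast, realized):
    """OLS of realized on forecast, plus a Wald statistic for H0: alpha = 0, beta = 1."""
    forecast = np.asarray(forecast, dtype=float)
    realized = np.asarray(realized, dtype=float)
    X = np.column_stack([np.ones_like(forecast), forecast])
    coef, *_ = np.linalg.lstsq(X, realized, rcond=None)
    resid = realized - X @ coef
    sigma2 = resid @ resid / (len(realized) - 2)      # residual variance
    diff = coef - np.array([0.0, 1.0])                # distance from (alpha, beta) = (0, 1)
    wald = diff @ (X.T @ X) @ diff / sigma2           # diff' * inv(cov) * diff
    return coef, wald

rng = np.random.default_rng(7)
forecast = rng.uniform(2.0, 6.0, size=200)            # synthetic volatility forecasts (%)
noise = rng.normal(0.0, 0.5, size=200)

realized_unbiased = forecast + noise                  # alpha = 0, beta = 1 by construction
realized_biased = 1.0 + 0.6 * forecast + noise        # forecast overstates the volatility scale

CHI2_2DF_5PCT = 5.99
coef_unbiased, wald_unbiased = mincer_zarnowitz_wald(forecast, realized_unbiased)
coef_biased, wald_biased = mincer_zarnowitz_wald(forecast, realized_biased)

print(f"Unbiased case: alpha={coef_unbiased[0]:.2f}, beta={coef_unbiased[1]:.2f}, Wald={wald_unbiased:.2f}")
print(f"Biased case:   alpha={coef_biased[0]:.2f}, beta={coef_biased[1]:.2f}, Wald={wald_biased:.2f}")
print(f"Reject joint unbiasedness when Wald > {CHI2_2DF_5PCT}")
```

With only a few dozen rolling forecasts, as in the lab, the test has limited power, so failing to reject is weak evidence of unbiasedness.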

### Part E: Structural Breaks Detection

**Test if high GARCH persistence (α+β ≈ 0.98) is real or an artifact of
regime shifts:**

In [None]:
# Step 1: Visual regime identification (rolling volatility)
rolling_vol = returns_pct.rolling(window=30).std() * np.sqrt(252)

plt.figure(figsize=(14, 7))
plt.plot(rolling_vol.index, rolling_vol, linewidth=1.5, color='blue', label='30-day rolling volatility')

# Regime thresholds
calm_threshold = 30
turbulent_threshold = 60
plt.axhline(calm_threshold, color='green', linestyle='--', linewidth=2, alpha=0.7, label=f'Calm threshold ({calm_threshold}%)')
plt.axhline(turbulent_threshold, color='red', linestyle='--', linewidth=2, alpha=0.7, label=f'Turbulent threshold ({turbulent_threshold}%)')
plt.fill_between(rolling_vol.index, 0, calm_threshold, alpha=0.1, color='green', label='Calm regime')
plt.fill_between(rolling_vol.index, turbulent_threshold, rolling_vol.max(), alpha=0.1, color='red', label='Turbulent regime')

plt.xlabel('Date', fontsize=12)
plt.ylabel('Annualized Volatility (%)', fontsize=12)
plt.title('Bitcoin Rolling Volatility: Regime Identification', fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate regime statistics
calm_pct = (rolling_vol < calm_threshold).sum() / len(rolling_vol.dropna()) * 100
turbulent_pct = (rolling_vol > turbulent_threshold).sum() / len(rolling_vol.dropna()) * 100

print("\n" + "=" * 70)
print("REGIME IDENTIFICATION")
print("=" * 70)
print(f"\nRegime Statistics:")
print(f"  Calm regime (<{calm_threshold}% vol):      {calm_pct:>6.1f}% of days")
print(f"  Normal regime ({calm_threshold}-{turbulent_threshold}% vol):  {100 - calm_pct - turbulent_pct:>6.1f}% of days")
print(f"  Turbulent regime (>{turbulent_threshold}% vol): {turbulent_pct:>6.1f}% of days")

# Step 2: Sub-period GARCH comparison
mid_point = len(returns_pct) // 2
returns_first = returns_pct.iloc[:mid_point]
returns_second = returns_pct.iloc[mid_point:]

# Fit GARCH to each sub-period
model_first = arch_model(returns_first, vol='GARCH', p=1, q=1, dist='t')
garch_first = model_first.fit(disp='off')
persistence_first = garch_first.params['alpha[1]'] + garch_first.params['beta[1]']

model_second = arch_model(returns_second, vol='GARCH', p=1, q=1, dist='t')
garch_second = model_second.fit(disp='off')
persistence_second = garch_second.params['alpha[1]'] + garch_second.params['beta[1]']

# Full-sample persistence (from earlier)
persistence_full = garch_fit.params['alpha[1]'] + garch_fit.params['beta[1]']

print("\n" + "=" * 70)
print("STRUCTURAL BREAKS TEST: SUB-PERIOD GARCH PERSISTENCE")
print("=" * 70)
print(f"\nFull sample persistence:   α+β = {persistence_full:.4f}")
print(f"First half persistence:    α+β = {persistence_first:.4f}")
print(f"Second half persistence:   α+β = {persistence_second:.4f}")
print(f"Absolute difference:       Δ   = {abs(persistence_first - persistence_second):.4f}")

if abs(persistence_first - persistence_second) > 0.05:
    print(f"\n⚠️  LARGE difference suggests REGIME SHIFTS, not true persistence!")
    print(f"    Full-sample GARCH overestimates persistence by confusing regime changes with gradual decay.")
else:
    print(f"\n✓  Similar persistence suggests GARCH model is stable across periods.")

# Model comparison (AIC)
print(f"\nModel Comparison (AIC, lower = better):")
print(f"  Full-sample GARCH:         {garch_fit.aic:.2f}")
print(f"  Sub-periods total:         {garch_first.aic + garch_second.aic:.2f}")

if (garch_first.aic + garch_second.aic) < garch_fit.aic:
    improvement = garch_fit.aic - (garch_first.aic + garch_second.aic)
    print(f"\n✓ Sub-period models fit BETTER (AIC improvement: {improvement:.2f})")
    print(f"  → Evidence of structural breaks / regime shifts")
else:
    print(f"\n  Full-sample model fits as well (no strong evidence of breaks)")

print("\n💡 Key finding: If both sub-period persistences fall below the full-sample value,")
print("   full-sample GARCH is likely OVERESTIMATING true persistence!")

> **Implication: GARCH Persistence Partly Spurious**
>
> If sub-period persistence is significantly lower than full-sample
> (e.g., 0.93 vs 0.98), this suggests:
>
> 1.  **Regime shifts exist**: Bitcoin alternates between calm and
>     turbulent volatility states
> 2.  **Full-sample GARCH confuses regimes with persistence**: Mistakes
>     regime changes for gradual decay
> 3.  **Better models needed**: Markov-switching GARCH, regime-dependent
>     models, threshold models
>
> **Practical impact**: Risk models using single-regime GARCH
> **overestimate** how long volatility shocks persist → wrong hedging
> ratios, wrong VaR forecasts.
>
> See [Ch 03, §3.5: Structural
> Breaks](../chapters/03_volatility_modelling.qmd#sec-structural-breaks)
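
The mechanism behind spurious persistence can be illustrated without fitting any GARCH model. In the simulation sketched below, returns are independent within each regime, so there is no volatility clustering inside either half, yet pooling a calm and a turbulent regime makes squared returns look autocorrelated even at long lags. The regime volatilities (1% and 3%), the sample sizes, and the helper `acf_at_lag` are illustrative assumptions.

```python
import numpy as np

def acf_at_lag(x, lag):
    """Sample autocorrelation of x at a single positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[lag:] * x[:-lag]) / np.sum(x * x))

rng = np.random.default_rng(0)
n_half = 5000

# Two regimes with constant volatility and i.i.d. returns inside each regime
calm = rng.normal(0.0, 1.0, size=n_half)        # 1% daily volatility
turbulent = rng.normal(0.0, 3.0, size=n_half)   # 3% daily volatility
pooled = np.concatenate([calm, turbulent])      # single break at the midpoint

lag = 50
acf_calm = acf_at_lag(calm**2, lag)
acf_turbulent = acf_at_lag(turbulent**2, lag)
acf_pooled = acf_at_lag(pooled**2, lag)

print(f"Lag-{lag} autocorrelation of squared returns")
print(f"  Calm regime only:      {acf_calm:.3f}")
print(f"  Turbulent regime only: {acf_turbulent:.3f}")
print(f"  Pooled (with break):   {acf_pooled:.3f}")
```

A single-regime GARCH fitted to the pooled series would need high persistence to reproduce that slowly decaying autocorrelation, even though no shock persists within either regime.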

### Reflection Questions (Exercise 4)

Write 250-300 words addressing:

1.  **GARCH vs GJR**: Did asymmetric GJR-GARCH fit better than symmetric
    GARCH? What does the asymmetry parameter (γ) tell you about
    Bitcoin’s volatility response to good vs bad news?

2.  **Forecast accuracy**: How well did GARCH forecast future volatility
    (R² from Mincer-Zarnowitz)? When did forecasts fail most badly (look
    at error plot)?

3.  **Structural breaks**: Did sub-period persistence differ from
    full-sample? If yes, what does this imply about using full-sample
    GARCH for risk management?

4.  **Practical implications**: If you were designing a crypto risk
    model, would you use single-regime GARCH or a regime-switching
    model? Justify your choice.

------------------------------------------------------------------------

## Summary and Integration

### What We’ve Learned

Through these exercises, you’ve:

1.  **Accessed real cryptocurrency market data** using public APIs,
    experiencing data quality challenges and fragmentation

2.  **Quantified extreme volatility** (60-80% annualized) that makes
    cryptocurrency unsuitable as currency and challenging as investment

3.  **Documented fat tail distributions** that violate normal
    distribution assumptions and cause standard risk models to
    underestimate tail risk

4.  **Measured high correlations** within crypto (0.5-0.8) limiting
    diversification benefits

5.  **Tested market efficiency**, finding mixed evidence: some weak
    predictability but likely not exploitable after costs

6.  **Evaluated inclusion claims** implicitly through data analysis: if
    crypto were banking the unbanked, we’d see different adoption and
    usage patterns

### Connections to Course Themes

-   **Week 2 (APIs)**: Cryptocurrency data is openly accessible via
    APIs, democratizing financial data but creating standardization
    challenges

-   **Week 3 (Platforms)**: Exchanges are platforms matching
    buyers/sellers; fragmentation creates arbitrage opportunities but
    liquidity challenges

-   **Week 6 (Financial Inclusion)**: Mobile money (M-Pesa) showed
    rigorous welfare evidence; cryptocurrency shows speculative usage
    among wealthy

-   **Week 8 (Blockchain)**: Next week explores blockchain technology
    and fraud detection more deeply

### Critical Evaluation Framework

When evaluating cryptocurrency or any FinTech innovation:

1.  **Examine actual data** (adoption, usage, outcomes) versus marketing
    claims
2.  **Measure risks quantitatively** (volatility, correlations, tail
    risk)
3.  **Compare to alternatives** (mobile money, traditional finance)
4.  **Demand welfare evidence** (does it help intended beneficiaries?)
5.  **Account for barriers** (technical, knowledge, economic)

### Assessment Preparation

If your assessment involves a short research report or reflective
analysis, this lab gives you two strong pathways:

-   Empirical analysis of crypto returns (momentum, volatility,
    correlations, tail risk)
-   Evidence-based evaluation of “crypto for inclusion” claims using
    data, mechanisms, and limitations

### Further Exploration

If interested in extending your analysis:

-   **Cross-asset correlations**: Download S&P 500 or gold data; analyze
    Bitcoin-equity correlation dynamics
-   **Volatility forecasting**: Extend the Exercise 4 GARCH forecasts to
    other horizons or to regime-switching models
-   **Arbitrage opportunities**: Compare prices across multiple
    exchanges in real-time
-   **DeFi analysis**: Examine yield farming APYs, liquidity pool
    dynamics, or stablecoin deviations from peg
-   **On-chain metrics**: Analyze blockchain data (active addresses,
    transaction volumes) as predictors

------------------------------------------------------------------------

**Excellent work! You’ve completed rigorous empirical analysis of
cryptocurrency markets, connecting data to theory and claims to
evidence.**