# Lab 7: Cryptocurrency Data Analysis

Market structure, volatility, and efficiency testing

> **Expected Time**
>
> -   FIN510: Exercises 1-2 ≈ 75 min
> -   FIN720: All exercises ≈ 110 min
> -   Directed learning extensions ≈ 60 min

<figure>
<a
href="https://colab.research.google.com/github/quinfer/fin510-colab-notebooks/blob/main/labs/lab07_crypto.ipynb"><img
src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
<figcaption>Open in Colab</figcaption>
</figure>

## Before You Code: The Big Picture

Cryptocurrencies promise **financial inclusion, decentralization, and
censorship resistance**. But do they deliver? Let’s test the claims
empirically using market microstructure analysis.

> **The Crypto Promise vs. Reality**
>
> **The Promise:**
>
> 1.  **Inclusion**: Banking for the 1.7 billion unbanked (World Bank)
> 2.  **Efficiency**: Near-zero transaction costs, instant settlement
> 3.  **Decentralization**: No intermediaries, no gatekeepers
> 4.  **Transparency**: All transactions on public blockchain
>
> **The Reality (Empirical Evidence):**
>
> -   **Volatility**: Bitcoin std dev ~80% annualized (vs. S&P 500 ~15%)
> -   **Correlation**: Bitcoin-S&P correlation increased from ~0 (2015)
>     to ~0.5 (2022), so it is no longer diversifying
> -   **Efficiency**: Autocorrelation tests show predictability
>     (inefficient markets)
> -   **Inclusion**: 95% of crypto holders are speculators, not unbanked
>     users (Makarov & Schoar 2022, JF)
> -   **Costs**: During congestion, Ethereum gas fees reached \$50+ per
>     transaction
>
> **The Academic Debate:**
>
> -   **Skeptics** (Krugman, Roubini): Speculative bubble with no
>     fundamental value
> -   **Advocates** (Antonopoulos, Buterin): Early technology, wait for
>     adoption
> -   **Evidence-based** (This lab): Test claims with data, not ideology

### What You’ll Build Today

By the end of this lab, you will have:

-   ✅ Historical daily crypto data from a public API (CoinGecko)
-   ✅ Volatility analysis comparing crypto to traditional assets
-   ✅ Return distribution analysis (fat tails, skewness)
-   ✅ Market efficiency tests (autocorrelation, mean reversion)
-   ✅ Critical perspective on crypto’s actual use cases

**Time estimate:** 75 minutes (FIN510) \| 110 minutes (FIN720 with all
exercises)

> **Why This Matters**
>
> Crypto is either the future of finance or a trillion-dollar
> speculative bubble. Your job as a data scientist: **test the claims
> empirically**, not ideologically. This lab shows you how.

## Learning Objectives

By the end of this lab, you will be able to:

-   Access cryptocurrency market data using public APIs
-   Calculate and compare volatility across crypto and traditional
    assets
-   Analyze return distributions and identify tail risk
-   Measure correlation patterns (within-crypto and cross-asset)
-   Test market efficiency using autocorrelation, momentum, and
    mean-reversion tests
-   Visualize price dynamics and microstructure features
-   Evaluate crypto financial inclusion claims empirically

## Setup and Dependencies

In [1]:
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

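# For API requests
try:
    import requests
except ImportError:
    print("Installing requests...")
    !pip install -q requests
    import requests  # retry the import now that the package is installed

# For statistical tests
try:
    import statsmodels.api as sm
    from statsmodels.tsa.stattools import adfuller, acf
except ImportError:
    print("Installing statsmodels...")
    !pip install -q statsmodels
    import statsmodels.api as sm  # retry the imports after installation
    from statsmodels.tsa.stattools import adfuller, acf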

# Visualization settings
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

print("✓ Setup complete - ready for crypto market analysis")

## Exercise 1: Accessing Cryptocurrency Market Data

### Understanding Crypto Data Sources

Unlike traditional finance where Bloomberg terminals and licensed data
vendors dominate, cryptocurrency data comes from public APIs provided by
exchanges and aggregators. This democratizes access—you can get the same
data professionals use—but also creates challenges around data quality,
fragmentation, and standardization.

**Key data sources:**

-   **Aggregators**: CoinGecko, CoinMarketCap (volume-weighted prices
    across exchanges)
-   **Exchanges**: Coinbase Pro, Binance, Kraken (order books, trade
    data, official prices)
-   **Blockchain explorers**: On-chain data (transaction volumes,
    addresses, mining)
-   **Derivatives**: CME, Deribit (futures, options implied volatility)

We’ll use CoinGecko’s free API, which doesn’t require authentication for
basic usage and provides comprehensive historical data.

### Retrieving Historical Price Data

In [2]:
def get_crypto_data(coin_id, vs_currency='usd', days=365):
    """
    Fetch historical cryptocurrency data from CoinGecko API.
    
    Parameters
    ----------
    coin_id : str
        CoinGecko coin identifier (e.g., 'bitcoin', 'ethereum')
    vs_currency : str
        Target currency (default 'usd')
    days : int
        Number of days of historical data
        
    Returns
    -------
    pd.DataFrame
        DataFrame with price, market_cap, and volume data
    """
    url = f"https://api.coingecko.com/api/v3/coins/{coin_id}/market_chart"
    params = {
        'vs_currency': vs_currency,
        'days': days,
        'interval': 'daily'
    }
    
    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()
        
        # Extract prices, market caps, volumes
        df = pd.DataFrame({
            'price': [x[1] for x in data['prices']],
            'market_cap': [x[1] for x in data['market_caps']],
            'volume': [x[1] for x in data['total_volumes']]
        })
        
        # Convert timestamps to datetime
        df['date'] = pd.to_datetime([x[0] for x in data['prices']], unit='ms')
        df = df.set_index('date')
        
        return df
    
    except requests.exceptions.RequestException as e:
        print(f"Error fetching data for {coin_id}: {e}")
        return None

# Fetch data for major cryptocurrencies
print("Fetching cryptocurrency data...")
btc_data = get_crypto_data('bitcoin', days=730)
eth_data = get_crypto_data('ethereum', days=730)
bnb_data = get_crypto_data('binancecoin', days=730)

if btc_data is not None:
    print(f"✓ Retrieved {len(btc_data)} days of Bitcoin data")
    print(f"  Price range: ${btc_data['price'].min():,.0f} - ${btc_data['price'].max():,.0f}")
    print("\nSample data:")
    print(btc_data.head())
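
As `get_crypto_data` implies, the `market_chart` response holds three parallel lists (`prices`, `market_caps`, `total_volumes`), each made of `[timestamp_ms, value]` pairs. The sketch below reshapes a hand-written payload of that shape without touching the network. The helper name `parse_market_chart` and the sample numbers are hypothetical, chosen only to illustrate the structure.

```python
import pandas as pd

def parse_market_chart(payload):
    """Turn a market_chart-style payload into a date-indexed DataFrame."""
    frames = []
    for key, column in [("prices", "price"),
                        ("market_caps", "market_cap"),
                        ("total_volumes", "volume")]:
        series = pd.DataFrame(payload[key], columns=["timestamp_ms", column])
        series["date"] = pd.to_datetime(series["timestamp_ms"], unit="ms")
        frames.append(series.set_index("date")[column])
    # Aligning on the timestamp index tolerates lists of unequal length
    return pd.concat(frames, axis=1).sort_index()

# Hypothetical payload: two daily observations (timestamps are in milliseconds)
sample_payload = {
    "prices": [[1704067200000, 42000.0], [1704153600000, 43000.0]],
    "market_caps": [[1704067200000, 8.2e11], [1704153600000, 8.4e11]],
    "total_volumes": [[1704067200000, 1.5e10], [1704153600000, 1.8e10]],
}
sample_df = parse_market_chart(sample_payload)
print(sample_df)
```

Joining the three series on their timestamps, rather than zipping lists by position, is a defensive choice in case one list is shorter than the others.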

### Visualizing Price Trends

In [3]:
# Create comprehensive price visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Bitcoin price
axes[0, 0].plot(btc_data.index, btc_data['price'], color='orange', linewidth=2)
axes[0, 0].set_title('Bitcoin Price (USD)', fontsize=13, fontweight='bold')
axes[0, 0].set_ylabel('Price ($)')
axes[0, 0].grid(alpha=0.3)
axes[0, 0].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))

# Ethereum price
axes[0, 1].plot(eth_data.index, eth_data['price'], color='blue', linewidth=2)
axes[0, 1].set_title('Ethereum Price (USD)', fontsize=13, fontweight='bold')
axes[0, 1].set_ylabel('Price ($)')
axes[0, 1].grid(alpha=0.3)

# Trading volumes
axes[1, 0].plot(btc_data.index, btc_data['volume'], color='green', alpha=0.7, linewidth=1.5)
axes[1, 0].set_title('Bitcoin Trading Volume', fontsize=13, fontweight='bold')
axes[1, 0].set_ylabel('Volume ($)')
axes[1, 0].set_xlabel('Date')
axes[1, 0].grid(alpha=0.3)
axes[1, 0].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1e9:.1f}B'))

# Price comparison (normalized to 100)
btc_norm = 100 * btc_data['price'] / btc_data['price'].iloc[0]
eth_norm = 100 * eth_data['price'] / eth_data['price'].iloc[0]
bnb_norm = 100 * bnb_data['price'] / bnb_data['price'].iloc[0]

axes[1, 1].plot(btc_norm.index, btc_norm, label='Bitcoin', color='orange', linewidth=2)
axes[1, 1].plot(eth_norm.index, eth_norm, label='Ethereum', color='blue', linewidth=2)
axes[1, 1].plot(bnb_norm.index, bnb_norm, label='BNB', color='gold', linewidth=2)
axes[1, 1].set_title('Comparative Performance (Base = 100)', fontsize=13, fontweight='bold')
axes[1, 1].set_ylabel('Index Value')
axes[1, 1].set_xlabel('Date')
axes[1, 1].legend()
axes[1, 1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate summary statistics
print("\n" + "="*60)
print("SUMMARY STATISTICS (2-year period)")
print("="*60)
for name, data in [('Bitcoin', btc_data), ('Ethereum', eth_data), ('BNB', bnb_data)]:
    total_return = (data['price'].iloc[-1] / data['price'].iloc[0] - 1) * 100
    max_price = data['price'].max()
    min_price = data['price'].min()
    drawdown = ((data['price'] / data['price'].cummax()) - 1).min() * 100
    
    print(f"\n{name}:")
    print(f"  Total Return: {total_return:+.1f}%")
    print(f"  Price Range: ${min_price:,.0f} - ${max_price:,.0f}")
    print(f"  Max Drawdown: {drawdown:.1f}%")
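
The max drawdown line above compares each day's price with the highest price seen so far and keeps the worst shortfall. A toy example with made-up prices shows the calculation step by step (the price path is hypothetical):

```python
import pandas as pd

# Hypothetical price path: rises to 120, falls to 90, recovers, then falls to 80
toy_prices = pd.Series([100.0, 120.0, 90.0, 110.0, 80.0])

running_peak = toy_prices.cummax()          # highest price seen so far
drawdown = toy_prices / running_peak - 1    # 0 at a new peak, negative below it
max_drawdown = drawdown.min()               # most negative value = worst peak-to-trough loss

print(pd.DataFrame({"price": toy_prices, "peak": running_peak, "drawdown": drawdown}))
print(f"Max drawdown: {max_drawdown:.1%}")
```

The worst loss here is measured from the peak of 120 to the final price of 80, not from the starting price of 100.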

### Reflection Questions (Exercise 1)

Write 200-250 words addressing:

1.  **Data Quality**: What challenges might arise from using free
    aggregator APIs versus licensed data feeds? How might wash trading
    on some exchanges affect aggregate data quality?

2.  **Price Fragmentation**: CoinGecko aggregates prices across
    exchanges. Why might Bitcoin trade at different prices
    simultaneously on different venues? What arbitrage mechanisms should
    eliminate these spreads?

3.  **Volume Interpretation**: How should we interpret trading volume
    data, knowing that a significant portion might be wash trading? What
    alternative metrics could measure genuine market activity?

## Exercise 2: Volatility and Risk Analysis

### Calculating Returns and Volatility

In [4]:
# Calculate log returns
btc_data['returns'] = np.log(btc_data['price'] / btc_data['price'].shift(1))
eth_data['returns'] = np.log(eth_data['price'] / eth_data['price'].shift(1))
bnb_data['returns'] = np.log(bnb_data['price'] / bnb_data['price'].shift(1))

# Remove NaN values
btc_returns = btc_data['returns'].dropna()
eth_returns = eth_data['returns'].dropna()
bnb_returns = bnb_data['returns'].dropna()

# Calculate volatility metrics
def calculate_volatility_metrics(returns, name):
    """
    Calculate comprehensive volatility statistics for cryptocurrency returns.
    
    Computes key risk metrics used by portfolio managers: realized volatility,
    tail risk measures (VaR), and distribution shape (skewness, kurtosis).
    
    Parameters
    ----------
    returns : pd.Series
        Daily returns (log or simple returns)
    name : str
        Asset name for display in output
        
    Returns
    -------
    dict
        Dictionary with keys:
        - 'daily_vol' : float, daily standard deviation
        - 'annual_vol' : float, annualized standard deviation (daily * sqrt(365))
        - 'rolling_vol' : pd.Series, 30-day rolling volatility
        - 'skew' : float, skewness (negative = left tail)
        - 'kurt' : float, excess kurtosis (> 0 = fat tails)
        - 'var_95' : float, 5th percentile return (1-day VaR at 95%)
        - 'var_99' : float, 1st percentile return (1-day VaR at 99%)
        
    Notes
    -----
    - Annualization uses 365 days per year (crypto markets trade 24/7)
    - Traditional equity markets use 252 trading days
    - VaR is historical (empirical percentiles), not parametric (Gaussian assumption)
    - scipy's stats.kurtosis returns excess kurtosis (normal = 0), so fat tails
      show up as kurt > 0; a Gaussian VaR would then understate extreme losses
    
    Examples
    --------
    >>> btc_returns = btc_data['price'].pct_change()
    >>> metrics = calculate_volatility_metrics(btc_returns, 'Bitcoin')
    >>> metrics['annual_vol']
    0.65  # 65% annualized volatility (vs. S&P 500 ~15%)
    """
    daily_vol = returns.std()
    annual_vol = daily_vol * np.sqrt(365)
    
    # Rolling volatility (30-day window)
    rolling_vol = returns.rolling(30).std() * np.sqrt(365)
    
    # Skewness and kurtosis
    skew = stats.skew(returns.dropna())
    kurt = stats.kurtosis(returns.dropna())
    
    # Value at Risk (95% and 99%)
    var_95 = np.percentile(returns.dropna(), 5)
    var_99 = np.percentile(returns.dropna(), 1)
    
    print(f"\n{name} Volatility Metrics:")
    print(f"  Daily Volatility: {daily_vol*100:.2f}%")
    print(f"  Annualized Volatility: {annual_vol*100:.1f}%")
    print(f"  Skewness: {skew:.3f} {'(negative tail)' if skew < 0 else '(positive tail)'}")
    print(f"  Excess Kurtosis: {kurt:.3f} {'(fat tails)' if kurt > 0 else '(thin tails)'}")
    print(f"  VaR (95%): {var_95*100:.2f}% (1-day)")
    print(f"  VaR (99%): {var_99*100:.2f}% (1-day)")
    
    return {
        'daily_vol': daily_vol,
        'annual_vol': annual_vol,
        'rolling_vol': rolling_vol,
        'skew': skew,
        'kurt': kurt,
        'var_95': var_95,
        'var_99': var_99
    }

print("="*70)
print("VOLATILITY ANALYSIS")
print("="*70)

btc_vol = calculate_volatility_metrics(btc_returns, "Bitcoin")
eth_vol = calculate_volatility_metrics(eth_returns, "Ethereum")
bnb_vol = calculate_volatility_metrics(bnb_returns, "BNB")

# Compare to traditional assets (typical values for reference)
print("\n" + "-"*70)
print("COMPARISON TO TRADITIONAL ASSETS (typical values):")
print("-"*70)
print("S&P 500:      Annual Vol ~15-20%, Skew ~-0.5, Kurtosis ~5-8")
print("Gold:         Annual Vol ~15-18%, Skew ~0.2, Kurtosis ~3-5")
print("Treasury Bonds: Annual Vol ~5-8%, Skew ~0.0, Kurtosis ~3-4")
print("\nCryptocurrency volatility is 3-5x higher than traditional assets!")
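
The docstring above distinguishes historical VaR (empirical percentiles) from parametric VaR (Gaussian assumption). The sketch below compares the two on simulated fat-tailed returns. The Student-t distribution with 5 degrees of freedom, the 3% daily volatility, and the helper names are assumptions made for illustration, not estimates from the lab data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated fat-tailed "returns": Student-t (5 d.o.f.), rescaled to 3% daily volatility
raw = rng.standard_t(df=5, size=100_000)
sim_returns = 0.03 * raw / raw.std()

def historical_var(returns, level):
    """Empirical quantile: the return breached on (1 - level) of days."""
    return np.percentile(returns, 100 * (1 - level))

def gaussian_var(returns, level):
    """Same quantile from a normal distribution with the sample mean and std."""
    return returns.mean() + returns.std() * stats.norm.ppf(1 - level)

for level in (0.95, 0.99, 0.999):
    hist = historical_var(sim_returns, level)
    gauss = gaussian_var(sim_returns, level)
    print(f"{level:.1%} VaR: historical={hist:.4f}, gaussian={gauss:.4f}")

print(f"Excess kurtosis of simulated returns: {stats.kurtosis(sim_returns):.2f}")
```

In simulations like this the two estimates tend to be close at the 95% level, where the Gaussian figure can even be the more conservative one. The gap opens further into the tail, which is where fat tails matter for risk limits.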

### Visualizing Return Distributions

In [5]:
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Bitcoin return distribution
axes[0, 0].hist(btc_returns * 100, bins=50, alpha=0.7, color='orange', edgecolor='black')
axes[0, 0].axvline(btc_returns.mean() * 100, color='red', linestyle='--', linewidth=2, label=f'Mean: {btc_returns.mean()*100:.2f}%')
axes[0, 0].set_title('Bitcoin Daily Returns Distribution', fontsize=13, fontweight='bold')
axes[0, 0].set_xlabel('Daily Return (%)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

# QQ plot for normality test
stats.probplot(btc_returns.dropna(), dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Q-Q Plot: Bitcoin Returns vs Normal Distribution', fontsize=13, fontweight='bold')
axes[0, 1].grid(alpha=0.3)

# Rolling volatility
axes[1, 0].plot(btc_vol['rolling_vol'].index, btc_vol['rolling_vol'] * 100, 
                color='purple', linewidth=2, label='BTC Rolling Vol (30d)')
axes[1, 0].plot(eth_vol['rolling_vol'].index, eth_vol['rolling_vol'] * 100, 
                color='blue', linewidth=2, alpha=0.7, label='ETH Rolling Vol (30d)')
axes[1, 0].set_title('Rolling Volatility (30-day window)', fontsize=13, fontweight='bold')
axes[1, 0].set_ylabel('Annualized Volatility (%)')
axes[1, 0].set_xlabel('Date')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)

# Volatility comparison bar chart
vol_comparison = pd.DataFrame({
    'Bitcoin': [btc_vol['annual_vol'] * 100],
    'Ethereum': [eth_vol['annual_vol'] * 100],
    'BNB': [bnb_vol['annual_vol'] * 100],
    'S&P 500': [17.5],  # Typical value
    'Gold': [16.5]  # Typical value
})

vol_comparison.T.plot(kind='bar', ax=axes[1, 1], legend=False, color=['orange', 'blue', 'gold', 'green', 'brown'])
axes[1, 1].set_title('Annualized Volatility Comparison', fontsize=13, fontweight='bold')
axes[1, 1].set_ylabel('Volatility (%)')
axes[1, 1].set_xlabel('Asset')
axes[1, 1].set_xticklabels(axes[1, 1].get_xticklabels(), rotation=45, ha='right')
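threshold_line = axes[1, 1].axhline(y=20, color='red', linestyle='--', alpha=0.5, label='20% threshold')
axes[1, 1].legend(handles=[threshold_line])  # show the threshold label without adding a bar entry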
axes[1, 1].grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Statistical tests for normality
print("\n" + "="*70)
print("NORMALITY TESTS")
print("="*70)

for name, returns in [('Bitcoin', btc_returns), ('Ethereum', eth_returns)]:
    # Jarque-Bera test
    jb_stat, jb_pval = stats.jarque_bera(returns.dropna())
    
    # Shapiro-Wilk test (subsample if the series is very long)
    clean_returns = returns.dropna()
    sample_returns = clean_returns.sample(min(5000, len(clean_returns)), random_state=42)
    sw_stat, sw_pval = stats.shapiro(sample_returns)
    
    print(f"\n{name}:")
    print(f"  Jarque-Bera test: statistic={jb_stat:.2f}, p-value={jb_pval:.4f}")
    print(f"    {'Reject normality' if jb_pval < 0.05 else 'Cannot reject normality'} (α=0.05)")
    print(f"  Shapiro-Wilk test: statistic={sw_stat:.4f}, p-value={sw_pval:.4f}")
    print(f"    {'Reject normality' if sw_pval < 0.05 else 'Cannot reject normality'} (α=0.05)")

print("\n💡 Returns exhibit fat tails and deviate significantly from normal distribution!")
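
The Jarque-Bera statistic used above combines skewness S and excess kurtosis K as JB = n/6 · (S² + K²/4), which is compared against a chi-squared distribution with 2 degrees of freedom. The sketch below rebuilds it by hand on simulated fat-tailed data and checks the result against `stats.jarque_bera`. The simulated sample is an assumption, used here in place of the lab's returns.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.standard_t(df=5, size=2_000)   # simulated fat-tailed returns (assumption)

n = len(sample)
S = stats.skew(sample)       # sample skewness (normal = 0)
K = stats.kurtosis(sample)   # excess kurtosis (normal = 0), scipy's default
jb_manual = n / 6 * (S**2 + K**2 / 4)
p_manual = stats.chi2.sf(jb_manual, df=2)

jb_scipy, p_scipy = stats.jarque_bera(sample)

print(f"Manual: JB={jb_manual:.2f}, p={p_manual:.4g}")
print(f"SciPy:  JB={jb_scipy:.2f}, p={p_scipy:.4g}")
```

Because the formula uses excess kurtosis, the relevant benchmark for "fat tails" in scipy's output is 0, not 3.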

### Correlation Analysis

In [6]:
# Combine returns into single DataFrame
returns_df = pd.DataFrame({
    'BTC': btc_returns,
    'ETH': eth_returns,
    'BNB': bnb_returns
}).dropna()

# Calculate correlation matrix
corr_matrix = returns_df.corr()

# Visualize correlations
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Heatmap
sns.heatmap(corr_matrix, annot=True, fmt='.3f', cmap='RdYlGn', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8}, ax=axes[0])
axes[0].set_title('Cryptocurrency Correlation Matrix', fontsize=13, fontweight='bold')

# Scatter plot: BTC vs ETH
axes[1].scatter(returns_df['BTC'] * 100, returns_df['ETH'] * 100, alpha=0.5, s=20)
axes[1].set_xlabel('Bitcoin Daily Return (%)')
axes[1].set_ylabel('Ethereum Daily Return (%)')
axes[1].set_title(f'BTC-ETH Correlation: {corr_matrix.loc["BTC", "ETH"]:.3f}', 
                  fontsize=13, fontweight='bold')
axes[1].axhline(0, color='black', linewidth=0.5, alpha=0.3)
axes[1].axvline(0, color='black', linewidth=0.5, alpha=0.3)
axes[1].grid(alpha=0.3)

# Add regression line
z = np.polyfit(returns_df['BTC'], returns_df['ETH'], 1)
p = np.poly1d(z)
axes[1].plot(returns_df['BTC'] * 100, p(returns_df['BTC']) * 100, 
             "r--", alpha=0.8, linewidth=2, label='Regression line')
axes[1].legend()

plt.tight_layout()
plt.show()

print("\n" + "="*70)
print("CORRELATION ANALYSIS")
print("="*70)
print("\nWithin-Crypto Correlations:")
print(corr_matrix)
print("\n💡 High correlations (0.5-0.8) limit diversification within cryptocurrency portfolios")

# Rolling correlation
rolling_corr_btc_eth = returns_df['BTC'].rolling(90).corr(returns_df['ETH'])

plt.figure(figsize=(12, 5))
plt.plot(rolling_corr_btc_eth.index, rolling_corr_btc_eth, linewidth=2, color='purple')
plt.axhline(y=rolling_corr_btc_eth.mean(), color='red', linestyle='--', 
            label=f'Mean: {rolling_corr_btc_eth.mean():.3f}')
plt.title('Rolling Correlation: Bitcoin vs Ethereum (90-day window)', fontsize=13, fontweight='bold')
plt.ylabel('Correlation')
plt.xlabel('Date')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nRolling BTC-ETH correlation: Mean={rolling_corr_btc_eth.mean():.3f}, "
      f"Std={rolling_corr_btc_eth.std():.3f}")
print("Note: Correlation increases during volatile periods (contagion effect)")
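
The note about correlation rising in volatile periods can be checked rather than assumed. One rough approach is to split days by whether rolling volatility is above or below its median and compare the correlation in each group. The sketch below does this on synthetic data deliberately built so that the volatile half is also the more correlated half. The function name, the 30-day window, and the synthetic series are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

def correlation_by_vol_regime(returns_df, col_a, col_b, window=30):
    """Correlation of col_a and col_b on high- vs low-volatility days.

    Days are labelled by whether col_a's rolling volatility is above or
    below its own median over the sample.
    """
    rolling_vol = returns_df[col_a].rolling(window).std()
    valid = rolling_vol.notna()
    high_vol = rolling_vol > rolling_vol.median()
    high = returns_df[valid & high_vol]
    low = returns_df[valid & ~high_vol]
    return {
        "high_vol_corr": high[col_a].corr(high[col_b]),
        "low_vol_corr": low[col_a].corr(low[col_b]),
    }

# Synthetic data: a calm, weakly correlated half followed by a volatile, tightly linked half
rng = np.random.default_rng(7)
calm_a = rng.normal(0, 0.01, 500)
calm_b = 0.2 * calm_a + rng.normal(0, 0.01, 500)
wild_a = rng.normal(0, 0.04, 500)
wild_b = wild_a + rng.normal(0, 0.02, 500)
synthetic = pd.DataFrame({"A": np.concatenate([calm_a, wild_a]),
                          "B": np.concatenate([calm_b, wild_b])})

regime_corr = correlation_by_vol_regime(synthetic, "A", "B")
print(regime_corr)
```

On the lab data the call would be `correlation_by_vol_regime(returns_df, 'BTC', 'ETH')`. A gap between the two numbers is suggestive only: correlations measured on high-volatility days are biased upward even when the underlying relationship is unchanged.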

### Reflection Questions (Exercise 2)

Write 250-300 words addressing:

1.  **Volatility Implications**: Bitcoin’s 60-80% annualized volatility
    is 3-4x higher than equities. What does this mean for: (a) using
    Bitcoin as currency (purchasing power stability)? (b) portfolio
    allocation (risk contribution)? (c) options pricing and risk
    management?

2.  **Fat Tails and Risk Models**: The Q-Q plot shows Bitcoin returns
    deviate from normality with fat tails. Why do standard risk models
    (VaR assuming normal distribution) underestimate tail risk? What
    practical consequences does this have?

3.  **Correlation Patterns**: Cryptocurrencies show high correlation
    with each other (0.5-0.8) but time-varying correlation with
    equities. What does this mean for diversification benefits within
    crypto portfolios versus across asset classes?

## Exercise 3: Market Efficiency Testing

### Autocorrelation Analysis

In [7]:
# Test for autocorrelation (do past returns predict future returns?)
def test_autocorrelation(returns, name, max_lag=20):
    """
    Test for serial correlation in returns (market efficiency diagnostic).
    
    Autocorrelation measures whether past returns predict future returns. Significant
    autocorrelation is evidence against weak-form efficiency, although small
    correlations may not be exploitable once trading costs are included.
    
    Parameters
    ----------
    returns : pd.Series
        Daily returns series
    name : str
        Asset name for display
    max_lag : int, default=20
        Maximum lag to test (20 days = ~1 month)
        
    Returns
    -------
    np.ndarray
        Autocorrelation values for lags 0..max_lag (the function also prints
        test results and displays the ACF plot)
        
    Notes
    -----
    **Ljung-Box Test:**
    - Null hypothesis: No autocorrelation up to lag k
    - p-value < 0.05 → Reject H0 → Significant autocorrelation (inefficiency)
    
    **Efficient Market Hypothesis (weak form):**
    - If markets are efficient, past prices shouldn't predict future prices
    - ACF should be ~0 at all lags (within 95% confidence bands)
    - Crypto returns often show statistically significant autocorrelation,
      which points to some predictability
    
    **Possible Sources of Inefficiency in Crypto Markets:**
    - Fragmented liquidity across hundreds of exchanges
    - High transaction costs (gas fees, spreads)
    - Retail-dominated (fewer arbitrageurs)
    - 24/7 trading → slower price discovery
    
    Examples
    --------
    >>> btc_returns = btc_data['price'].pct_change()
    >>> test_autocorrelation(btc_returns, 'Bitcoin', max_lag=20)
    Bitcoin Autocorrelation Analysis:
      Lag-1 Autocorrelation: 0.0234
      Ljung-Box p-value (lag 10): 0.0012  # illustrative: p < 0.05 rejects "no autocorrelation"
    """
    
    # Calculate autocorrelation function
    acf_values = acf(returns.dropna(), nlags=max_lag, fft=False)
    
    # Ljung-Box test for joint significance
    from statsmodels.stats.diagnostic import acorr_ljungbox
    lb_test = acorr_ljungbox(returns.dropna(), lags=[5, 10, 20], return_df=True)
    
    print(f"\n{name} Autocorrelation Analysis:")
    print(f"  Lag-1 Autocorrelation: {acf_values[1]:.4f}")
    print(f"  Lag-5 Autocorrelation: {acf_values[5]:.4f}")
    print("\nLjung-Box Test (joint significance):")
    print(lb_test)
    
    # Plot ACF
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.stem(range(len(acf_values)), acf_values, basefmt=" ")
    ax.axhline(y=0, color='black', linewidth=0.5)
    ci_bound = 1.96 / np.sqrt(returns.notna().sum())  # approximate 95% band under white noise
    ax.axhline(y=ci_bound, color='red', linestyle='--', label='95% CI')
    ax.axhline(y=-ci_bound, color='red', linestyle='--')
    ax.set_title(f'{name} Autocorrelation Function', fontsize=13, fontweight='bold')
    ax.set_xlabel('Lag (days)')
    ax.set_ylabel('Autocorrelation')
    ax.legend()
    ax.grid(alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    return acf_values

print("="*70)
print("MARKET EFFICIENCY: AUTOCORRELATION TESTS")
print("="*70)

btc_acf = test_autocorrelation(btc_returns, "Bitcoin")
eth_acf = test_autocorrelation(eth_returns, "Ethereum")

print("\n💡 Interpretation: Significant autocorrelation suggests predictability (market inefficiency)")
print("   Small correlations may not be economically significant after transaction costs")
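
To see what the Ljung-Box test reacts to, it helps to compute the statistic by hand on data with known properties. The sketch below implements Q = n(n+2) Σ ρₖ²/(n−k) with NumPy and applies it to simulated white noise and to a simulated AR(1) series. The coefficient 0.3, the sample size, and the function name `ljung_box` are assumptions chosen for illustration; for real work `acorr_ljungbox` from statsmodels remains the tool used in this lab.

```python
import numpy as np
from scipy import stats

def ljung_box(x, max_lag):
    """Ljung-Box Q statistic and p-value for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.sum(x**2)
    lags = np.arange(1, max_lag + 1)
    # Sample autocorrelation at each lag k
    rho = np.array([np.sum(x[k:] * x[:-k]) / denom for k in lags])
    q_stat = n * (n + 2) * np.sum(rho**2 / (n - lags))
    p_value = stats.chi2.sf(q_stat, df=max_lag)
    return q_stat, p_value

rng = np.random.default_rng(42)
n_obs = 2_000
white_noise = rng.normal(0, 0.03, n_obs)

# AR(1) returns with coefficient 0.3: each day's return partly repeats yesterday's
shocks = rng.normal(0, 0.03, n_obs)
ar1 = np.zeros(n_obs)
for t in range(1, n_obs):
    ar1[t] = 0.3 * ar1[t - 1] + shocks[t]

q_wn, p_wn = ljung_box(white_noise, max_lag=10)
q_ar, p_ar = ljung_box(ar1, max_lag=10)
print(f"White noise: Q={q_wn:.1f}, p={p_wn:.3f}")
print(f"AR(1), 0.3:  Q={q_ar:.1f}, p={p_ar:.3g}")
```

A lag-1 autocorrelation of 0.3 is far larger than anything typically seen in daily returns, so real series will give much less clear-cut results than the AR(1) case here.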

### Momentum Strategy Backtest

In [8]:
# Simple momentum strategy: go long when the 10-day MA is above the 50-day MA, short when it is below
def momentum_strategy(prices, short_window=10, long_window=50):
    """
    Backtest simple moving average crossover momentum strategy.
    
    Classic technical analysis strategy: buy when short MA crosses above long MA,
    sell when it crosses below. Tests whether momentum exists in crypto markets.
    
    Parameters
    ----------
    prices : pd.Series
        Daily closing prices
    short_window : int, default=10
        Short moving average window (days)
    long_window : int, default=50
        Long moving average window (days)
        
    Returns
    -------
    pd.DataFrame
        Columns:
        - 'price' : original prices
        - 'MA_short' : short-window moving average
        - 'MA_long' : long-window moving average
        - 'signal' : trading position (+1 = long, -1 = short, 0 = no position)
        - 'returns' : buy-and-hold returns
        - 'strategy_returns' : strategy returns (position × market return)
        - 'cum_returns' : cumulative buy-and-hold
        - 'cum_strategy' : cumulative strategy performance
        
    Notes
    -----
    **Strategy Logic:**
    - Golden Cross: Short MA > Long MA → Buy signal
    - Death Cross: Short MA < Long MA → Sell signal
    
    **Reality Check:**
    - This is a **naive backtest** (ignores transaction costs, slippage, fees)
    - Crypto trading fees ~0.1-0.5% per trade → eats into profits
    - No position sizing, risk management, or stop-losses
    - Past performance ≠ future returns (overfitting risk)
    
    **Academic Evidence:**
    - Momentum works in equities (Jegadeesh & Titman 1993, JF)
    - Crypto momentum: mixed evidence, high volatility dominates
    - Transaction costs often exceed strategy alpha
    
    Examples
    --------
    >>> btc_momentum = momentum_strategy(btc_data['price'])
    >>> strategy_return = (btc_momentum['cum_strategy'].iloc[-1] - 1) * 100
    >>> print(f"Strategy return: {strategy_return:.1f}%")
    Strategy return: -5.2%  # illustrative value; depends on the sample window (costs not included)
    """
    df = pd.DataFrame({'price': prices})
    
    # Calculate moving averages
    df['MA_short'] = df['price'].rolling(short_window).mean()
    df['MA_long'] = df['price'].rolling(long_window).mean()
    
    # Generate signals
    df['signal'] = 0
    df.loc[df['MA_short'] > df['MA_long'], 'signal'] = 1  # Buy signal
    df.loc[df['MA_short'] < df['MA_long'], 'signal'] = -1  # Sell signal
    
    # Calculate returns
    df['returns'] = df['price'].pct_change()
    df['strategy_returns'] = df['signal'].shift(1) * df['returns']
    
    # Cumulative returns
    df['cum_returns'] = (1 + df['returns']).cumprod()
    df['cum_strategy'] = (1 + df['strategy_returns']).cumprod()
    
    return df

# Run momentum strategy
btc_momentum = momentum_strategy(btc_data['price'])

# Calculate performance metrics
total_return = (btc_momentum['cum_returns'].iloc[-1] - 1) * 100
strategy_return = (btc_momentum['cum_strategy'].iloc[-1] - 1) * 100
excess_return = strategy_return - total_return

print("\n" + "="*70)
print("MOMENTUM STRATEGY BACKTEST")
print("="*70)
print(f"\nBuy-and-Hold Return: {total_return:.2f}%")
print(f"Strategy Return: {strategy_return:.2f}%")
print(f"Excess Return: {excess_return:+.2f}%")

# Visualize strategy performance
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Price and moving averages
axes[0].plot(btc_momentum.index, btc_momentum['price'], label='Bitcoin Price', color='orange', linewidth=2)
axes[0].plot(btc_momentum.index, btc_momentum['MA_short'], label='10D MA', color='blue', linewidth=1.5)
axes[0].plot(btc_momentum.index, btc_momentum['MA_long'], label='50D MA', color='red', linewidth=1.5)
axes[0].set_title('Bitcoin Price with Moving Averages', fontsize=13, fontweight='bold')
axes[0].set_ylabel('Price ($)')
axes[0].legend()
axes[0].grid(alpha=0.3)

# Cumulative returns comparison
axes[1].plot(btc_momentum.index, btc_momentum['cum_returns'], 
             label='Buy and Hold', color='gray', linewidth=2)
axes[1].plot(btc_momentum.index, btc_momentum['cum_strategy'], 
             label='Momentum Strategy', color='green', linewidth=2)
axes[1].set_title('Cumulative Returns: Strategy vs Buy-and-Hold', fontsize=13, fontweight='bold')
axes[1].set_ylabel('Cumulative Return (Base = 1)')
axes[1].set_xlabel('Date')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n⚠️  Note: This backtest ignores transaction costs, slippage, and taxes")
print("    Real-world implementation would have lower returns")
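
The backtest above is gross of costs. A rough way to see how much fees matter is to charge a proportional cost every time the position changes. The sketch below does this for a toy signal. The function name, the additive treatment of costs, and the 0.1% fee are simplifying assumptions (the lab quotes 0.1-0.5% per trade), and slippage is still ignored.

```python
import pandas as pd

def net_strategy_returns(signal, returns, cost_per_unit=0.001):
    """Strategy returns after a proportional cost on every change in position.

    cost_per_unit is charged per unit of position traded, so flipping from
    +1 (long) to -1 (short) costs twice as much as entering from flat.
    """
    position = signal.shift(1).fillna(0)        # yesterday's signal is today's position
    gross = position * returns.fillna(0)
    turnover = position.diff().abs().fillna(0)  # units traded to reach today's position
    costs = cost_per_unit * turnover
    return pd.DataFrame({"gross": gross, "costs": costs, "net": gross - costs})

# Toy example (hypothetical numbers): flat -> long -> short
toy_signal = pd.Series([0, 1, 1, -1, -1])
toy_returns = pd.Series([None, 0.01, 0.02, -0.01, 0.03], dtype=float)
toy_result = net_strategy_returns(toy_signal, toy_returns, cost_per_unit=0.001)
print(toy_result)
print(f"Total cost drag: {toy_result['costs'].sum():.4f}")
```

Applied to the lab's output, this would look like `net_strategy_returns(btc_momentum['signal'], btc_momentum['returns'], cost_per_unit=0.002)`, with `(1 + result['net']).cumprod()` giving the cost-adjusted equity curve.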

### Mean Reversion Test

In [9]:
# Augmented Dickey-Fuller test for stationarity/mean reversion
def test_mean_reversion(prices, name):
    """
    Test for mean reversion using Augmented Dickey-Fuller (ADF) test.
    
    Tests whether prices follow a random walk (unit root) or revert to a mean
    (stationary). Critical for pairs trading and mean-reversion strategies.
    
    Parameters
    ----------
    prices : pd.Series
        Daily closing prices
    name : str
        Asset name for display
        
    Returns
    -------
    tuple
        ADF test results: (statistic, p-value, lags_used, nobs, critical_values, icbest)
        
    Notes
    -----
    **Augmented Dickey-Fuller Test:**
    - Null hypothesis (H0): Series has unit root (random walk, NOT mean-reverting)
    - Alternative (H1): Series is stationary (mean-reverting)
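    - p-value < 0.05 → Reject H0 → Evidence that log prices are stationary (mean-reverting)
    - p-value > 0.05 → Cannot reject H0 → Consistent with a random walk, though
      failing to reject is not proof of one (the test has low power in short samples)
    
    **Implications for Trading:**
    - **Unit root** (consistent with a random walk): mean-reversion strategies on
      price levels should not work; whether momentum works depends on return
      autocorrelation, which this test does not measure
    - **Mean-reverting** (stationary): deviations from the mean tend to reverse,
      so statistical arbitrage on the level may be possible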
    
    **Why This Matters:**
    - Most financial time series have unit roots (Campbell, Lo, MacKinlay 1997)
    - Crypto markets often show mixed evidence (regime-dependent)
    - Low p-values may be spurious (structural breaks, volatility clustering)
    
    **Technical Details:**
    - Test performed on log prices (handles exponential growth)
    - Regression includes constant term ('c') but not trend
    - AIC criterion selects optimal lag length
    
    Examples
    --------
    >>> btc_adf = test_mean_reversion(btc_data['price'], 'Bitcoin')
    Bitcoin - Augmented Dickey-Fuller Test:
      ADF Statistic: -1.234
      P-value: 0.658  # Cannot reject unit root → Random walk
    """
    
    # ADF test on log prices
    log_prices = np.log(prices)
    adf_result = adfuller(log_prices.dropna(), maxlag=20, regression='c', autolag='AIC')
    
    print(f"\n{name} - Augmented Dickey-Fuller Test:")
    print(f"  ADF Statistic: {adf_result[0]:.4f}")
    print(f"  P-value: {adf_result[1]:.4f}")
    print(f"  Critical Values:")
    for key, value in adf_result[4].items():
        print(f"    {key}: {value:.4f}")
    
    if adf_result[1] < 0.05:
        print(f"  ✓ Reject unit root (prices are stationary/mean-reverting)")
    else:
        print(f"  ✗ Cannot reject unit root (prices have unit root/random walk)")
    
    return adf_result

print("\n" + "="*70)
print("MEAN REVERSION TESTS")
print("="*70)

btc_adf = test_mean_reversion(btc_data['price'], "Bitcoin")
eth_adf = test_mean_reversion(eth_data['price'], "Ethereum")

print("\n💡 If prices follow random walk, past prices don't predict future prices")
print("   This supports weak-form market efficiency")
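
Under the hood, the Dickey-Fuller idea is a regression of the daily change on the previous level: Δyₜ = α + γ·yₜ₋₁ + εₜ. A unit root means γ = 0, and a strongly negative t-statistic on γ points to mean reversion. The sketch below runs that regression with NumPy on a simulated random walk and a simulated mean-reverting series. It is a simplified version with no lag augmentation, so it is not a substitute for `adfuller`; the simulated series and the function name are assumptions for illustration.

```python
import numpy as np

def dickey_fuller_tstat(series):
    """t-statistic on gamma in: diff(y)_t = alpha + gamma * y_{t-1} + e_t."""
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(3)
shocks = rng.normal(0, 0.03, 1_000)

random_walk = np.cumsum(shocks)        # unit root: shocks never decay
mean_reverting = np.zeros(1_000)       # AR(1) with coefficient 0.5: shocks decay quickly
for t in range(1, 1_000):
    mean_reverting[t] = 0.5 * mean_reverting[t - 1] + shocks[t]

t_rw = dickey_fuller_tstat(random_walk)
t_mr = dickey_fuller_tstat(mean_reverting)
print(f"Random walk t-stat:    {t_rw:.2f}")
print(f"Mean-reverting t-stat: {t_mr:.2f}")
print("Approximate 5% critical value (constant, no trend): -2.86")
```

The t-statistic is compared with Dickey-Fuller critical values rather than the usual normal ones, because under the null the statistic does not follow a standard t distribution.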

### Reflection Questions (Exercise 3)

Write 200-250 words addressing:

1.  **Efficiency Interpretation**: What do your autocorrelation and
    momentum results suggest about Bitcoin market efficiency? Can small
    autocorrelations or strategy profits coexist with efficient markets?

2.  **Transaction Costs Matter**: The momentum strategy showed
    \[profit/loss\] before transaction costs. Cryptocurrency trading
    costs 0.1-0.5% per trade. Would your strategy be profitable after
    accounting for costs? Show rough calculations.

3.  **Limits to Arbitrage**: Even if inefficiencies exist (predictable
    patterns), what practical barriers prevent traders from exploiting
    them and eliminating the patterns?

## Summary and Integration

### What We’ve Learned

Through these exercises, you’ve:

1.  **Accessed real cryptocurrency market data** using public APIs,
    experiencing data quality challenges and fragmentation

2.  **Quantified extreme volatility** (60-80% annualized) that makes
    cryptocurrency unsuitable as a currency and challenging as an
    investment

3.  **Documented fat tail distributions** that violate normal
    distribution assumptions and cause standard risk models to
    underestimate tail risk

4.  **Measured high correlations** within crypto (0.5-0.8) limiting
    diversification benefits

5.  **Tested market efficiency** finding mixed evidence—some weak
    predictability but likely not exploitable after costs

6.  **Evaluated inclusion claims** implicitly through data analysis—if
    crypto were banking the unbanked, we’d see different adoption and
    usage patterns

### Connections to Course Themes

-   **Week 2 (APIs)**: Cryptocurrency data is openly accessible via
    APIs, democratizing financial data but creating standardization
    challenges

-   **Week 3 (Platforms)**: Exchanges are platforms matching
    buyers/sellers; fragmentation creates arbitrage opportunities but
    liquidity challenges

-   **Week 6 (Financial Inclusion)**: Mobile money (M-Pesa) showed
    rigorous welfare evidence; cryptocurrency shows speculative usage
    among wealthy

-   **Week 8 (Blockchain)**: Next week explores blockchain technology
    and fraud detection more deeply

### Critical Evaluation Framework

When evaluating cryptocurrency or any FinTech innovation:

1.  **Examine actual data** (adoption, usage, outcomes) versus marketing
    claims
2.  **Measure risks quantitatively** (volatility, correlations, tail
    risk)
3.  **Compare to alternatives** (mobile money, traditional finance)
4.  **Demand welfare evidence** (does it help intended beneficiaries?)
5.  **Account for barriers** (technical, knowledge, economic)

### Assessment Preparation

**FIN510 Coursework 2**: You can analyze cryptocurrency factors
(momentum, volatility, size) using methods from this lab. The data is
freely available; the techniques transfer from equity factors.

**FIN720**: Critical evaluation of crypto financial inclusion claims
(using Week 6-7 frameworks) makes an excellent reflective analysis
topic. Use data analysis to support your arguments.

### Further Exploration

If interested in extending your analysis:

-   **Cross-asset correlations**: Download S&P 500 or gold data; analyze
    Bitcoin-equity correlation dynamics
-   **Volatility forecasting**: Implement GARCH models to forecast
    future volatility
-   **Arbitrage opportunities**: Compare prices across multiple
    exchanges in real-time
-   **DeFi analysis**: Examine yield farming APYs, liquidity pool
    dynamics, or stablecoin deviations from peg
-   **On-chain metrics**: Analyze blockchain data (active addresses,
    transaction volumes) as predictors

------------------------------------------------------------------------

**Excellent work! You’ve completed rigorous empirical analysis of
cryptocurrency markets, connecting data to theory and claims to
evidence.**