To do:
- import market data, the equal-weight portfolio from Part 2, and betas from Part 3
- calculate the daily hedge ratio at each time t
- make a df of all these variables plus the hedged returns, with index = time
- do I need to redo the math for each day's returns if I already have the hedged-return formula?
- does the short SPY position mean the %, $, or # of shares of the portfolio to put in SPY?


## Part 4️⃣: Hedging Strategy Implementation

### ✅ Objective:
Isolate **alpha** by removing market exposure from the portfolio using dynamic hedging based on beta.

---

### 🔧 Tasks:

1. **Import required data**:
   - Market data (e.g., SPY prices)
   - Equal-weighted portfolio value from **Part 2**
   - Time-varying portfolio betas from **Part 3**

2. **Calculate daily hedge ratio** (`h_t`) for each day `t`:
   - Use the formula:  
     $h_t = \beta_t \times \frac{\text{Portfolio Value at time } t}{\text{SPY Price at time } t}$

   - This gives the **number of SPY shares to short** at each point in time.

3. **Construct a DataFrame** with:
   - `Date` (as index)
   - Portfolio value (`P_t`)
   - SPY price (`SPY_t`)
   - Beta (`β_t`)
   - Hedge ratio (`h_t`)
   - Short SPY position
   - Portfolio returns (`r_p,t`)
   - Market returns (`r_mkt,t`)
   - Hedged portfolio value
   - **Hedged portfolio return** (`r_hp,t`) calculated as:  
     $r_{hp,t} = r_{p,t} - \beta_t \cdot r_{mkt,t}$

4. **Clarifications / Notes**:
   - You **do not need to recompute portfolio returns** if you already have them; just plug them into the hedged return formula.
   - The hedge ratio `h_t` is a **number of shares** (not % or $). Multiplying it by the SPY price gives the dollar size of the short SPY position.
   - Portfolio value **should include cash** (if tracking both assets + cash, especially after shorting).
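
As a rough illustration of tasks 2 and 3, the sketch below builds the table on three days of made-up numbers. The values, the `toy` DataFrame, and its column names are assumptions for the example, not the project data.

```python
import pandas as pd

# Hypothetical three-day inputs (made-up numbers, not the project data)
dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])
toy = pd.DataFrame(
    {
        "P_t": [100_000.0, 101_000.0, 100_500.0],  # portfolio value in $
        "SPY_t": [400.0, 402.0, 401.0],            # SPY price in $
        "beta_t": [0.80, 0.80, 0.90],              # rolling beta
    },
    index=pd.Index(dates, name="Date"),
)

# Hedge ratio: number of SPY shares to short on each day
toy["h_t"] = toy["beta_t"] * toy["P_t"] / toy["SPY_t"]

# Dollar size of the SPY leg (the negative sign marks a short position)
toy["short_spy_dollars"] = -toy["h_t"] * toy["SPY_t"]

# Simple daily returns; the first row is NaN because there is no prior day
toy["r_p"] = toy["P_t"].pct_change()
toy["r_mkt"] = toy["SPY_t"].pct_change()

# Hedged return: r_hp,t = r_p,t - beta_t * r_mkt,t
toy["r_hp"] = toy["r_p"] - toy["beta_t"] * toy["r_mkt"]

print(toy.round(4))
```

On the first day the hedge ratio is 0.8 × 100,000 / 400 = 200 shares, which is an \$80,000 short position.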

---

### 🔍 Reminder: What is this doing?

- Hedging removes the portion of returns that can be explained by overall market movement.
- The **goal** is to isolate the "alpha" or **idiosyncratic return** of your strategy, i.e., how well your stock selection performs *independently* of the market.
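
To see the idea on simulated data, the sketch below generates returns from an assumed one-factor model, estimates beta in-sample, and compares the correlation with the market before and after hedging. All parameters (`beta_true`, the means and volatilities, the seed) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated daily returns under an assumed one-factor model
beta_true = 1.2
r_mkt = rng.normal(0.0004, 0.01, n)   # market returns
eps = rng.normal(0.0002, 0.005, n)    # stock-selection (idiosyncratic) part
r_p = beta_true * r_mkt + eps         # portfolio returns

# OLS slope of r_p on r_mkt: cov(r_p, r_mkt) / var(r_mkt)
beta_hat = np.cov(r_p, r_mkt)[0, 1] / np.var(r_mkt, ddof=1)

# Hedged return: subtract the part explained by the market
r_hp = r_p - beta_hat * r_mkt

corr_before = np.corrcoef(r_p, r_mkt)[0, 1]
corr_after = np.corrcoef(r_hp, r_mkt)[0, 1]
print(f"beta_hat          = {beta_hat:.3f}")
print(f"corr(r_p, r_mkt)  = {corr_before:.3f}")
print(f"corr(r_hp, r_mkt) = {corr_after:.3f}")
```

Because `beta_hat` is estimated on the same sample, the hedged series is uncorrelated with the market by construction. With a rolling beta on real data the correlation will only be close to zero.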

---

### 💡 Tip:

If you're confused by what the hedge ratio represents:
- Think of it as: *"How many SPY shares should I short to offset the market exposure of my portfolio on that day?"*
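
As a worked example with made-up numbers (not taken from the data): suppose $\beta_t = 1.2$, the portfolio is worth \$50,000, and SPY trades at \$500. Then

$h_t = 1.2 \times \frac{50{,}000}{500} = 120 \text{ shares}$

Shorting 120 shares at \$500 is a \$60,000 position, i.e. 120% of the portfolio value, which is $\beta_t$ expressed as a fraction of the portfolio.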



## 1. Import Required Data

### 1.0 Import libraries

In [17]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
from sklearn.linear_model import LinearRegression
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.6f}'.format)

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("📚 Libraries imported successfully!")
print("📊 Market Exposure Estimation System Ready!")

# Configuration parameters
EXPOSURE_CONFIG = {
    'benchmark_ticker': 'SPY',           # Market benchmark (S&P 500 ETF)
    'rolling_window': 60,                # Rolling regression window (60 days)
    'min_periods': 30,                   # Minimum periods for regression
    'confidence_level': 0.95,            # Confidence level for statistical tests
    'start_date': '2020-01-01',          # Analysis start date
    'end_date': '2025-07-31'             # Analysis end date
}

print(f"⚙️ Market exposure analysis parameters:")
for param, value in EXPOSURE_CONFIG.items():
    print(f"  {param}: {value}")

print(f"\n🎯 Analysis Framework:")
print(f"  • Benchmark: {EXPOSURE_CONFIG['benchmark_ticker']} (S&P 500 ETF)")
print(f"  • Rolling window: {EXPOSURE_CONFIG['rolling_window']} trading days")
print(f"  • Model: r_p,t = α_t + β_t × r_mkt,t + ε_t")
print(f"  • Beta interpretation: Market sensitivity coefficient")

📚 Libraries imported successfully!
📊 Market Exposure Estimation System Ready!
⚙️ Market exposure analysis parameters:
  benchmark_ticker: SPY
  rolling_window: 60
  min_periods: 30
  confidence_level: 0.95
  start_date: 2020-01-01
  end_date: 2025-07-31

🎯 Analysis Framework:
  • Benchmark: SPY (S&P 500 ETF)
  • Rolling window: 60 trading days
  • Model: r_p,t = α_t + β_t × r_mkt,t + ε_t
  • Beta interpretation: Market sensitivity coefficient


### 1.1 Import daily % returns of market from adj. close prices (aligned_benchmark_returns)

In [58]:
# Download benchmark (SPY) data
print("📊 Downloading market benchmark data...")

try:
    # Download SPY data for the analysis period
    benchmark_ticker = EXPOSURE_CONFIG['benchmark_ticker']
    start_date = EXPOSURE_CONFIG['start_date']
    end_date = EXPOSURE_CONFIG['end_date']
    
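    # Request adjusted prices explicitly: with auto_adjust=True yfinance returns the
    # adjusted series as 'Close' and no 'Adj Close' column (hence KeyError: 'Adj Close')
    benchmark_data = yf.download(benchmark_ticker, start=start_date, end=end_date,
                                 auto_adjust=True, progress=False)
    # squeeze() turns a one-column DataFrame (per-ticker columns) into a Series
    benchmark_prices = benchmark_data['Close'].squeeze()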
    
    # Calculate benchmark returns
    benchmark_returns = benchmark_prices.pct_change().dropna()
    
    print(f"✅ Successfully downloaded {benchmark_ticker} data")
    print(f"📈 Benchmark data shape: {benchmark_prices.shape}")
    print(f"📅 Date range: {benchmark_prices.index.min().strftime('%Y-%m-%d')} to {benchmark_prices.index.max().strftime('%Y-%m-%d')}")
    
except Exception as e:
    print(f"⚠️ Error downloading benchmark data: {e}")
    print("Creating sample benchmark data for demonstration...")
    
    # Generate sample benchmark data
    np.random.seed(42)
    dates = pd.date_range(start=start_date, end=end_date, freq='D')
    dates = dates[dates.weekday < 5]  # Remove weekends
    
    # Generate realistic market returns (lower volatility than individual stocks)
    market_returns = np.random.normal(0.0004, 0.015, len(dates))  # ~10% annual return, 24% volatility
    
    # Compound the simulated returns into a float price path starting at $100
    # (the first day's return is skipped so the first price is exactly 100)
    growth = np.concatenate(([1.0], 1 + market_returns[1:]))
    benchmark_prices = pd.Series(100 * np.cumprod(growth), index=dates)
    
    benchmark_returns = benchmark_prices.pct_change().dropna()
    print(f"📊 Generated sample benchmark data with shape: {benchmark_prices.shape}")

print(f"\n📋 Benchmark Statistics:")
print(f"  • Mean daily return: {benchmark_returns.mean():.6f} ({benchmark_returns.mean()*252*100:.2f}% annual)")
print(f"  • Daily volatility: {benchmark_returns.std():.6f} ({benchmark_returns.std()*np.sqrt(252)*100:.2f}% annual)")
print(f"  • Min return: {benchmark_returns.min():.6f} ({benchmark_returns.min()*100:.2f}%)")
print(f"  • Max return: {benchmark_returns.max():.6f} ({benchmark_returns.max()*100:.2f}%)")

# Load portfolio returns from previous analysis
print(f"\n💼 Loading portfolio returns from previous analysis...")

portfolio_data_loaded = False

# Try to load from Part 2 (Equal-weight portfolio)
try:
    equal_weight_data = pd.read_csv('../Part 2: Initial Portfolio Construction/equal_weight_portfolio_results.csv', 
                                    index_col=0, parse_dates=True)
    portfolio_returns = equal_weight_data['Portfolio_Return']
    portfolio_name = "Equal-Weight Portfolio"
    portfolio_data_loaded = True
    print(f"✅ Loaded equal-weight portfolio returns from Part 2")
    print(f"📊 Portfolio returns shape: {portfolio_returns.shape}")
    
except FileNotFoundError:
    print("⚠️ Equal-weight portfolio data not found in Part 2")

# Try to load from Part 2.5 (Signal-weighted portfolio) as alternative
if not portfolio_data_loaded:
    try:
        signal_weight_data = pd.read_csv('../2.5: Technical Indicators & Signal Design/portfolio_performance_comparison.csv', 
                                         index_col=0, parse_dates=True)
        portfolio_returns = signal_weight_data['Signal_Weighted_Return']
        portfolio_name = "Signal-Weighted Portfolio"
        portfolio_data_loaded = True
        print(f"✅ Loaded signal-weighted portfolio returns from Part 2.5")
        print(f"📊 Portfolio returns shape: {portfolio_returns.shape}")
        
    except FileNotFoundError:
        print("⚠️ Signal-weighted portfolio data not found in Part 2.5")

# Generate sample portfolio data if no data found
if not portfolio_data_loaded:
    print("⚠️ No portfolio data found. Generating sample portfolio returns...")
    
    # Create sample portfolio returns (higher volatility than market)
    np.random.seed(123)
    portfolio_returns_data = np.random.normal(0.0006, 0.025, len(benchmark_returns))  # ~15% annual return, 40% volatility
    portfolio_returns = pd.Series(portfolio_returns_data, index=benchmark_returns.index)
    portfolio_name = "Sample Portfolio"
    print(f"📊 Generated sample portfolio returns with shape: {portfolio_returns.shape}")

print(f"\n📋 Portfolio Statistics:")
print(f"  • Portfolio: {portfolio_name}")
print(f"  • Mean daily return: {portfolio_returns.mean():.6f} ({portfolio_returns.mean()*252*100:.2f}% annual)")
print(f"  • Daily volatility: {portfolio_returns.std():.6f} ({portfolio_returns.std()*np.sqrt(252)*100:.2f}% annual)")
print(f"  • Min return: {portfolio_returns.min():.6f} ({portfolio_returns.min()*100:.2f}%)")
print(f"  • Max return: {portfolio_returns.max():.6f} ({portfolio_returns.max()*100:.2f}%)")

# Align portfolio and benchmark returns for analysis
common_dates = portfolio_returns.index.intersection(benchmark_returns.index)
aligned_portfolio_returns = portfolio_returns.loc[common_dates]
aligned_benchmark_returns = benchmark_returns.loc[common_dates]
aligned_benchmark_returns.name = "Market Adj Close"

profolio_vals = equal_weight_data['Portfolio_Value']
benchmark_prices.name = "SPY_Price"
benchmark_returns.name = "Market_Returns"

📊 Downloading market benchmark data...
⚠️ Error downloading benchmark data: 'Adj Close'
Creating sample benchmark data for demonstration...
📊 Generated sample benchmark data with shape: (1457,)

📋 Benchmark Statistics:
  • Mean daily return: 0.001050 (26.46% annual)
  • Daily volatility: 0.014830 (23.54% annual)
  • Min return: -0.048219 (-4.82%)
  • Max return: 0.058191 (5.82%)

💼 Loading portfolio returns from previous analysis...
✅ Loaded equal-weight portfolio returns from Part 2
📊 Portfolio returns shape: (1254,)

📋 Portfolio Statistics:
  • Portfolio: Equal-Weight Portfolio
  • Mean daily return: 0.000574 (14.47% annual)
  • Daily volatility: 0.010776 (17.11% annual)
  • Min return: -0.066011 (-6.60%)
  • Max return: 0.079485 (7.95%)


### 1.3 Import betas

In [None]:
betas = pd.read_csv('../Part 3: Market Exposure Estimation/portfolio_beta_timeseries.csv',
                    index_col='Date',
                    parse_dates=True)
betas = betas['Beta']

In [43]:
benchmark_prices
profolio_vals
betas

Date
2020-10-27   -0.015852
2020-10-28   -0.033554
2020-10-29   -0.038975
2020-10-30   -0.033614
2020-11-02   -0.048307
                ...   
2025-07-25    0.038051
2025-07-28    0.013906
2025-07-29    0.012007
2025-07-30   -0.008292
2025-07-31   -0.016522
Name: Beta, Length: 1195, dtype: float64

## 2. **Calculate daily hedge ratio** (`h_t`) for each day `t`:

In [62]:
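# Hedge ratio h_t = beta_t * P_t / SPY_t (pandas aligns the three series on their date index)
daily_hedge_ratios = betas * profolio_vals / benchmark_prices
# abs() keeps only the share count; the sign of beta (short vs. long SPY) is dropped here
daily_hedge_ratios = abs(daily_hedge_ratios)
# Drop dates on which any of the three inputs is missing
daily_hedge_ratios = daily_hedge_ratios.dropna()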

In [63]:
daily_hedge_ratios.name = "hedge_ratio"
daily_hedge_ratios

2020-10-27   14.948832
2020-10-28   30.345019
2020-10-29   36.013132
2020-10-30   31.016938
2020-11-02   45.692249
                ...   
2025-07-25   19.481096
2025-07-28    6.815121
2025-07-29    5.826022
2025-07-30    4.120976
2025-07-31    8.033080
Name: hedge_ratio, Length: 1195, dtype: float64
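
The betas loaded above are sometimes negative, and `abs()` throws that sign away. Below is a minimal sketch of a signed alternative on made-up values; the names `signed_h`, `hedge`, and `side` are hypothetical, not from the notebook. A positive ratio means short that many SPY shares, a negative one means buy them.

```python
import pandas as pd

# Hypothetical inputs (made-up values); beta changes sign on the second day
idx = pd.to_datetime(["2024-01-02", "2024-01-03"])
beta = pd.Series([0.05, -0.02], index=idx)
value = pd.Series([100_000.0, 100_000.0], index=idx)
price = pd.Series([400.0, 400.0], index=idx)

# Signed hedge ratio: positive -> short that many shares, negative -> buy them
signed_h = beta * value / price

# Signed dollar position in SPY (negative = short, positive = long)
spy_dollars = -signed_h * price

hedge = pd.DataFrame({
    "signed_h": signed_h,
    "shares": signed_h.abs(),
    "side": signed_h.apply(lambda h: "short" if h > 0 else "long"),
    "spy_dollars": spy_dollars,
})
print(hedge)
```

Keeping the sign makes the SPY leg consistent with the hedged return formula, which uses the signed `β_t`.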

## 3. Construct DF of hedged and unhedged portfolios

In [None]:
# Dollar value of the SPY leg (negative = short)
spy_position = -daily_hedge_ratios * benchmark_prices
spy_position.name = "Short_SPY_Position"

# Portfolio value net of the short SPY notional
hedge_protfolio_value = profolio_vals + spy_position
hedge_protfolio_value.name = "Hedged_Portfolio_Value"

# Hedged return: r_hp,t = r_p,t - beta_t * r_mkt,t
hedge_protfolio_ret = portfolio_returns - betas * benchmark_returns
hedge_protfolio_ret.name = "Hedged_Portfolio_Return"

hedge_and_no_hedge_prot = pd.concat([profolio_vals, benchmark_prices, betas, 
                                       daily_hedge_ratios, spy_position, 
                                       portfolio_returns, benchmark_returns,
                                       hedge_protfolio_value, 
                                       hedge_protfolio_ret], 
                                    axis = 1, join = 'inner')

# hedge_and_no_hedge_prot.to_csv("hedge_and_no_hedge_timeseries")

Odd: `hedge_protfolio_ret` doesn't equal `hedge_protfolio_value.pct_change()` at the moment.
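
A likely reason: `profolio_vals + spy_position` simplifies to `profolio_vals * (1 - |beta|)` because the SPY price cancels. Its `pct_change()` therefore picks up day-to-day changes in beta rather than `r_p - beta * r_mkt`. One way to get a value series that matches the hedged returns is to compound them from a starting value. This is a minimal sketch on made-up returns; `initial_value` and the sample numbers are assumptions.

```python
import pandas as pd

# Hypothetical hedged daily returns (made-up values)
idx = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])
r_hp = pd.Series([0.002, -0.001, 0.003], index=idx, name="Hedged_Portfolio_Return")

initial_value = 100_000.0  # assumed starting capital in $

# Compound the hedged returns into a value path
hedged_value = initial_value * (1 + r_hp).cumprod()
hedged_value.name = "Hedged_Portfolio_Value"

# pct_change() recovers the hedged returns by construction; the first day
# is measured against the starting capital
recovered = hedged_value.pct_change()
recovered.iloc[0] = hedged_value.iloc[0] / initial_value - 1

print(hedged_value.round(2))
print((recovered - r_hp).abs().max())
```

With this construction the value column and the return column agree up to floating-point error.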