Michael Muschitiello // Robust Beta Computation


This notebook calculates both long-term and short-term betas for a “Magnificent 7” portfolio relative to the S&P 500. 
It demonstrates a multi-step workflow:

1. **Long-Term Beta (Monthly)**  
   - Gathers monthly price data (including dividends) for the Magnificent 7 and for the S&P 500.  
   - Combines dividend amounts with monthly closing prices, producing “total” monthly prices.  
   - Computes monthly log returns, then calculates beta in two ways:  
     - **Cov/Var beta** for a quick estimate.  
     - **Weighted Least Squares (WLS)** with exponential decay to emphasized recent data while still retaining historical trends.  

2. **Short-Term Beta (Weekly)**  
   - Gathers weekly price data and dividends from Yahoo Finance starting around mid-2023.  
   - Combines dividend amounts with weekly closing prices, then computes weekly log returns.  
   - Estimates beta using both Cov/Var and WLS with a decay factor that assigns 50% weight to the earliest observation.  

3. **Serial Correlation Testing**  
   - Uses Ljung-Box tests for both daily and weekly returns across specified date ranges (2019–2025 and 12/2023–2025).  
   - Prints results indicating whether any ticker exhibits statistically significant autocorrelation.

4. **Interpretation & Diagnostics**  
   - Prints regression summaries (alpha, beta, R-squared, residual standard deviation).  
   - Decomposes the relative contribution of the Magnificent 7 portion (roughly 28.38% of S&P 500) versus the other ~493 stocks.

## Use Cases
- **Portfolio Construction**: Helps shape a “beta-neutral” or “factor-based” strategy using estimated betas.  
- **Risk & Attribution**: Assesses how much risk Magnificent 7 adds to the S&P 500.  
- **Statistic Checks**: Quickly identifies presence or absence of serial correlation in returns.  

Overall, this notebook provides a comprehensive blueprint for integrating monthly and weekly price data, dividends, and WLS regression to estimate an evolving beta for a subset of the S&P 500 under varying time horizons.


## Long-term Beta calculation:
- Monthly total log returns
- WLS Regression
- Exponential decay factor lambda = 0.9832 to give .35 weight to oldest observation

In [2]:
import numpy as np 
import pandas as pd 
import yfinance as yf 
import matplotlib.pyplot as plt 
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox

Testing for serial correlation 

In [3]:
def ljung_box_test(returns, max_lag=20):
    """
    Runs Ljung-Box test for serial correlation up to 'max_lag'.
    Returns a string indicating whether p-value < 0.05 (i.e., significant serial correlation).
    """
    # You can test multiple lags, but here we just take the p-value from the highest lag
    result = acorr_ljungbox(returns.dropna(), lags=[max_lag], return_df=True)
    p_value = result['lb_pvalue'].iloc[-1]
    if p_value < 0.05:
        return f"YES (p={p_value:.3e})"
    else:
        return f"NO (p={p_value:.3e})"

# 1) Define your tickers and date ranges
tickers = ['AAPL', 'MSFT', 'AMZN', 'GOOGL', 'META', 'NVDA', 'TSLA', 'SPY']
start_long, end_long = '2019-01-01', '2025-12-31'
start_short, end_short = '2023-12-01', '2025-12-31'

# 2) Download daily price data for both time windows
data_long = yf.download(tickers, start=start_long, end=end_long, auto_adjust=True)['Close']
data_short = yf.download(tickers, start=start_short, end=end_short, auto_adjust=True)['Close']

# 3) Compute daily log returns
daily_returns_long = np.log(data_long / data_long.shift(1)).dropna()
daily_returns_short = np.log(data_short / data_short.shift(1)).dropna()

# 4) Compute weekly log returns (resample last business day of each week)
weekly_prices_long = data_long.resample('W-FRI').last().dropna()
weekly_returns_long = np.log(weekly_prices_long / weekly_prices_long.shift(1)).dropna()

weekly_prices_short = data_short.resample('W-FRI').last().dropna()
weekly_returns_short = np.log(weekly_prices_short / weekly_prices_short.shift(1)).dropna()

# 5) Run Ljung-Box test for each ticker in each set
def print_serial_corr_results(name, df_returns, max_lag=10):
    print(f"\n=== {name} ===")
    for col in df_returns.columns:
        test_result = ljung_box_test(df_returns[col], max_lag)
        print(f"{col}: Serial Correlation? {test_result}")

# Daily (2019–2025)
print_serial_corr_results("DAILY (2019–2025)", daily_returns_long)

# Daily (12/2023–2025)
print_serial_corr_results("DAILY (12/2023–2025)", daily_returns_short)

# Weekly (2019–2025)
print_serial_corr_results("WEEKLY (2019–2025)", weekly_returns_long)

# Weekly (12/2023–2025)
print_serial_corr_results("WEEKLY (12/2023–2025)", weekly_returns_short)

[*********************100%***********************]  8 of 8 completed
[*********************100%***********************]  8 of 8 completed


=== DAILY (2019–2025) ===
AAPL: Serial Correlation? YES (p=2.437e-12)
AMZN: Serial Correlation? NO (p=3.190e-01)
GOOGL: Serial Correlation? YES (p=8.902e-08)
META: Serial Correlation? YES (p=4.436e-02)
MSFT: Serial Correlation? YES (p=3.442e-25)
NVDA: Serial Correlation? YES (p=7.889e-06)
SPY: Serial Correlation? YES (p=9.441e-43)
TSLA: Serial Correlation? NO (p=7.759e-02)

=== DAILY (12/2023–2025) ===
AAPL: Serial Correlation? NO (p=6.898e-01)
AMZN: Serial Correlation? NO (p=4.132e-01)
GOOGL: Serial Correlation? NO (p=6.730e-01)
META: Serial Correlation? NO (p=5.958e-01)
MSFT: Serial Correlation? NO (p=4.932e-01)
NVDA: Serial Correlation? NO (p=4.380e-01)
SPY: Serial Correlation? NO (p=7.303e-01)
TSLA: Serial Correlation? NO (p=7.592e-01)

=== WEEKLY (2019–2025) ===
AAPL: Serial Correlation? NO (p=6.599e-01)
AMZN: Serial Correlation? NO (p=1.165e-01)
GOOGL: Serial Correlation? NO (p=2.221e-01)
META: Serial Correlation? NO (p=6.769e-01)
MSFT: Serial Correlation? NO (p=4.123e-01)
NVDA:




In [5]:
start = '2019-12-01'
end   = '2025-02-17'
mag7  = ['AAPL', 'MSFT', 'AMZN', 'GOOGL', 'META', 'NVDA', 'TSLA', '^SPX']

# read in weight data for Magnificent 7 (as of 1/31/25)
weight_data = pd.read_excel("S&P 500 1-31-25.xlsx", sheet_name="20250131_SP500_CLS")
weight_data = weight_data[['BLOOMBERG TICKER', 'INDEX MARKET CAP']].iloc[:7].copy()
total_mkcap_mag7 = weight_data['INDEX MARKET CAP'].sum()
weight_data['Weight in Mag7 Portfolio'] = weight_data['INDEX MARKET CAP'] / total_mkcap_mag7
weight_data['BLOOMBERG TICKER'] = weight_data['BLOOMBERG TICKER'].str.replace(' UQ', '', regex=False)

# download daily data for dividends
price_data      = {}
dividend_events = []

for ticker in mag7:
    hist = yf.Ticker(ticker).history(start=start, end=end)
    price_data[ticker] = hist
    divs = hist['Dividends'][hist['Dividends'] > 0]
    
    for date, dividend in divs.items():
        try:
            date_idx = hist.index.get_loc(date)
            # skip if no prior day
            if date_idx == 0:
                continue
            prev_date = hist.index[date_idx - 1]
            S         = hist.loc[prev_date, 'Close']
            dividend_events.append({
                'Ticker': ticker,
                'DividendDate': pd.to_datetime(date),
                'Dividend': dividend,
                'PriceBefore': S
            })
        except Exception as e:
            print(f"Error processing {ticker} on {date}: {e}")

df_div = pd.DataFrame(dividend_events).sort_values('DividendDate')
# convert index to date only, ignoring time
df_div['DividendDate'] = pd.to_datetime(df_div['DividendDate']).dt.date
df_div.set_index('DividendDate', inplace=True)

# convert dividends to monthly period, pivot, shift
df_div.index = pd.to_datetime(df_div.index)
df_div['Month'] = df_div.index.to_period('M')

dividends_pivot = (
    df_div
    .groupby(['Month','Ticker'])['Dividend']
    .sum()
    .unstack('Ticker')
    .fillna(0)
)

# shift dividends so that dividends in Feb are added to the March price
dividends_pivot.index = dividends_pivot.index + 1

# download monthly closing prices for the same tickers
monthly_prices = pd.DataFrame()
for ticker in mag7:
    data = yf.download(ticker, start=start, end=end, interval='1mo')['Close']
    data.name = ticker
    monthly_prices = pd.concat([monthly_prices, data], axis=1)

# move index to column and set a monthly PeriodIndex
monthly_prices.reset_index(inplace=True)
monthly_prices['Month'] = monthly_prices['Date'].dt.to_period('M')
monthly_prices.set_index('Month', inplace=True)

# align columns with pivoted dividends and add them
common_div, common_prices = dividends_pivot.align(
    monthly_prices.drop(columns=['Date'], errors='ignore'),
    axis=1, join='outer', fill_value=0
)
adjusted_prices = common_prices.add(common_div, fill_value=0)

# drop final row if it extends beyond the dividend data
adjusted_prices = adjusted_prices.iloc[:-1]

# compute monthly log returns
log_returns = np.log(adjusted_prices / adjusted_prices.shift(1)).dropna()

# build the cap-weighted portfolio returns for Magnificent 7
mag7_only     = ['AAPL','MSFT','NVDA','AMZN','META','GOOGL','TSLA']
mag7_port_wt  = weight_data['Weight in Mag7 Portfolio'].values  # ordering must match mag7_only
mag7_log_rets = log_returns[mag7_only].values
weighted_mag7_rets = mag7_log_rets @ mag7_port_wt

# pull SPX log returns
spx_monthly_logrets = log_returns['^SPX']

# basic Beta Calculation via Cov/Var
beta_mag7 = np.cov(weighted_mag7_rets, spx_monthly_logrets)[0, 1] / np.var(spx_monthly_logrets)
print(f"Simple Beta (Cov/Var) for Magnificent 7: {beta_mag7:.3f}")

# weighted Least Squares (exponential weighting)
X = sm.add_constant(spx_monthly_logrets)
y = weighted_mag7_rets

T = len(y)
lmbda = 0.9832  # decay
weights = np.array([lmbda**(T-1-i) for i in range(T)], dtype=float)

model   = sm.WLS(y, X, weights=weights)
results = model.fit()

print(results.summary())

alpha_est = results.params[0]
beta_est  = results.params[1]
print("\n5Y Monthly Returns Weighted Least Squares Results:")
print(f"Estimated alpha: {alpha_est:.6f}")
print(f"Estimated beta:  {beta_est:.6f}")
print(f"R-squared:       {results.rsquared:.6f}")
print(f"Residual Std:    {results.resid.std():.6f}")

# beta of the other ~493 stocks & contribution
mag7_sp_weight = 0.2837966
beta_493 = (1.0 - mag7_sp_weight*beta_est) / (1.0 - mag7_sp_weight)
print(f"\nBeta of the ~493 other stocks: {beta_493:.6f}")
print(f"Percent contribution of Magnificent 7: {mag7_sp_weight * beta_est:.6f}")


[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed

Simple Beta (Cov/Var) for Magnificent 7: 1.274
                            WLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.641
Model:                            WLS   Adj. R-squared:                  0.635
Method:                 Least Squares   F-statistic:                     107.1
Date:                Fri, 28 Feb 2025   Prob (F-statistic):           5.71e-15
Time:                        12:17:09   Log-Likelihood:                 100.92
No. Observations:                  62   AIC:                            -197.8
Df Residuals:                      60   BIC:                            -193.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const


  alpha_est = results.params[0]
  beta_est  = results.params[1]


## Short Term Beta(18m)
- Weekly total log returns
- WLS regression
- exponential decay rate lambda = .9966571 giving .5 weight to the oldest obs

In [6]:
# tickers, date range, weight data
start = '2023-8-17'
end   = '2025-02-17'
mag7  = ['AAPL', 'MSFT', 'AMZN', 'GOOGL', 'META', 'NVDA', 'TSLA', '^SPX']

# download weekly prices & identify dividend events
price_data      = {}
dividend_events = []

for ticker in mag7:
    stock = yf.Ticker(ticker)
    hist  = stock.history(start=start, end=end)  # daily data for dividends
    price_data[ticker] = hist
    divs  = hist['Dividends'][hist['Dividends'] > 0]

    for dt, dividend in divs.items():
        idx = hist.index.get_loc(dt)
        if idx > 0:
            prev_date   = hist.index[idx - 1]
            price_close = hist.loc[prev_date, 'Close']
            dividend_events.append({
                'Ticker': ticker,
                'DividendDate': pd.to_datetime(dt),
                'Dividend':     dividend,
                'PriceBefore':  price_close
            })

df_div = pd.DataFrame(dividend_events).sort_values('DividendDate')
df_div['DividendDate'] = pd.to_datetime(df_div['DividendDate']).dt.date
df_div.set_index('DividendDate', inplace=True)

# 3. Convert dividends to weekly frequency & sum by ticker
df_div.index = pd.to_datetime(df_div.index)
df_div['Week'] = df_div.index.to_period('W-SUN')
dividends_weekly = df_div.groupby(['Week','Ticker'])['Dividend'].sum().unstack('Ticker').fillna(0)

# 4. Download weekly close prices for Mag7 + SPX
weekly_prices = pd.DataFrame()
for ticker in mag7:
    data      = yf.download(ticker, start=start, end=end, interval='1wk', auto_adjust=False)['Close']
    data.name = ticker
    weekly_prices = pd.concat([weekly_prices, data], axis=1)

# Convert index to weekly PeriodIndex
weekly_prices.index = pd.to_datetime(weekly_prices.index)
weekly_prices['Week'] = weekly_prices.index.to_period('W-SUN')
weekly_prices.set_index('Week', inplace=True)

# 5. Add weekly dividends to the weekly closing prices
aligned_div, aligned_prices = dividends_weekly.align(
    weekly_prices.drop(columns=['Date'], errors='ignore'),
    join='outer', axis=1, fill_value=0
)
adjusted_weekly_prices = aligned_prices.add(aligned_div, fill_value=0)
adjusted_weekly_prices.index = adjusted_weekly_prices.index.to_timestamp()

# 6. Calculate weekly log returns (drop the first row of NaNs)
weekly_log_rets = np.log(adjusted_weekly_prices / adjusted_weekly_prices.shift(1)).dropna()

# build the cap-weighted Mag7 portfolio returns
mag7_only        = ['AAPL', 'MSFT', 'NVDA', 'AMZN', 'META', 'GOOGL', 'TSLA']
mag7_port_weights = weight_data['Weight in Mag7 Portfolio'].values
mag7_log_rets = weekly_log_rets[mag7_only].values
weighted_mag7_rets = mag7_log_rets @ mag7_port_weights

# SPX returns
spx_weekly_logrets = weekly_log_rets['^SPX']

# simple Beta Calculation (Cov/Var)
beta_mag7 = np.cov(weighted_mag7_rets, spx_weekly_logrets)[0,1] / np.var(spx_weekly_logrets)
print(f"Simple Cov/Var Beta for Mag7 weekly returns: {beta_mag7:.6f}")


# Weighted Least Squares Beta (Exponential Weighting)
X = sm.add_constant(spx_weekly_logrets)
y = weighted_mag7_rets
T = len(y)
lmbda = .9966571 # giving 50% weight to the oldest obs 

weights = np.array([lmbda**(T-1 - i) for i in range(T)], dtype=float)
model = sm.WLS(y, X, weights=weights)
results = model.fit()

print(results.summary())

alpha_est = results.params.iloc[0]
beta_est  = results.params.iloc[1]
print("\n18M Weekly Returns Weighted Least Squares Results:")
print(f"Estimated alpha: {alpha_est:.6f}")
print(f"Estimated beta:  {beta_est:.6f}")
print(f"R-squared:       {results.rsquared:.6f}")
print(f"Residual Std:    {results.resid.std():.6f}")


# beta of Other ~493 Stocks + Magnificent 7 Contribution
mag7_sp_weight = 0.2837966
beta_493 = (1.0 - mag7_sp_weight * beta_est) / (1.0 - mag7_sp_weight)
contrib   = mag7_sp_weight * beta_est

print(f"\nThe Beta of the ~493 other stocks in the SP500 is: {beta_493:.6f}")
print(f"The percentage of SPX beta contributed by the Magnificent 7 is: {contrib:.6f}")

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed

Simple Cov/Var Beta for Mag7 weekly returns: 1.446791
                            WLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.700
Model:                            WLS   Adj. R-squared:                  0.696
Method:                 Least Squares   F-statistic:                     177.6
Date:                Fri, 28 Feb 2025   Prob (F-statistic):           1.40e-21
Time:                        12:17:21   Log-Likelihood:                 210.09
No. Observations:                  78   AIC:                            -416.2
Df Residuals:                      76   BIC:                            -411.5
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------


