## Indicator-Based Benchmark Regression

We want to explain the excess returns of our portfolio (5 stocks over two quarters) using both **fundamental** and **technical indicators**.

---

### 1. Data Setup

For each stock $j = 1, \dots, N$ and each time $t = 1, \dots, T$, we have:

- $R_{j,t}$ = stock return (from "Return" column)  
- $R_{f,t}$ = risk-free rate (assume 4.25% annual, converted to daily using a 255-day year: $R_{f,t} \approx 0.0425/255$)  
- Indicators:
  - $SMA_{20,j,t}$ = 20-day simple moving average of returns  
  - $EMA_{20,j,t}$ = 20-day exponential moving average of returns  
  - $Vol_{20,j,t}$ = 20-day rolling volatility (standard deviation) of returns  
  - $RSI_{14,j,t}$ = 14-day relative strength index computed on returns  
  - $PE_{j,t}$ = P/E ratio  
  - $EPS_{j,t}$ = earnings per share  
  - $PB_{j,t}$ = price-to-book ratio  
  - $DY_{j,t}$ = dividend yield
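
The four return-based indicators above could be computed along the following lines. This is only a sketch: the source file already ships with these columns, and the exact formulas used to build it are not stated. In particular, the RSI variant here (a simple 14-day average of positive returns against negative returns) is an assumption, and the tickers `AAA` and `BBB` are hypothetical.

```python
import numpy as np
import pandas as pd

def add_return_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Add SMA_20, EMA_20, Volatility_20 and RSI_14, computed per ticker on 'Return'."""
    out = df.sort_values(["Ticker", "Date"]).copy()
    by_ticker = out.groupby("Ticker")["Return"]

    out["SMA_20"] = by_ticker.transform(lambda r: r.rolling(20).mean())
    out["EMA_20"] = by_ticker.transform(lambda r: r.ewm(span=20, adjust=False).mean())
    out["Volatility_20"] = by_ticker.transform(lambda r: r.rolling(20).std())

    def rsi_14(r: pd.Series) -> pd.Series:
        # Assumed definition: ratio of average gains to average losses over 14 days
        gains = r.clip(lower=0).rolling(14).mean()
        losses = (-r.clip(upper=0)).rolling(14).mean()
        return 100 - 100 / (1 + gains / losses)

    out["RSI_14"] = by_ticker.transform(rsi_14)
    return out

# Synthetic returns for two hypothetical tickers over 30 business days
rng = np.random.default_rng(0)
dates = pd.bdate_range("2025-01-02", periods=30)
demo = pd.DataFrame({
    "Date": np.tile(dates, 2),
    "Ticker": np.repeat(["AAA", "BBB"], 30),
    "Return": rng.normal(0, 0.01, 60),
})
demo = add_return_indicators(demo)
```

The first 19 rows of each ticker have no `SMA_20` or `Volatility_20`, which is why the regression later has to drop rows with missing values.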

---

### 2. Regression Model

The benchmark model is:

$$
R_{j,t} - R_{f,t} = \beta_0 + \beta_1 SMA_{20,j,t} + \beta_2 EMA_{20,j,t} + \beta_3 Vol_{20,j,t} + \beta_4 RSI_{14,j,t} + \beta_5 PE_{j,t} + \beta_6 EPS_{j,t} + \beta_7 PB_{j,t} + \beta_8 DY_{j,t} + \epsilon_{j,t}
$$

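- $\beta_0$: intercept; $\beta_i$ ($i = 1, \dots, 8$): sensitivity of excess returns to indicator $i$  
- $\epsilon_{j,t}$: residual, the part of the excess return that the indicators do not explain (treated as alpha in Section 5)  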

---

### 3. Portfolio Benchmark Return

Predicted excess return for stock $j$, where $X_{i,j,t}$ is the $i$-th indicator from Section 1:

$\hat{R}_{j,t} = \hat{\beta}_0 + \sum_{i=1}^{8} \hat{\beta}_i X_{i,j,t}$

Portfolio benchmark return:

$\hat{R}_{p,t} = \sum_{j=1}^N w_j \hat{R}_{j,t}$

where $w_j$ are portfolio weights (equal weight if unspecified).  
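
The weighted sum above can be sketched as follows, assuming a long table with `Date`, `Ticker` and a per-stock value column (named `BenchmarkReturn` here, as in the code cell of this section). The function name `portfolio_return`, the tickers and the numbers are illustrative only.

```python
import pandas as pd

def portfolio_return(pred: pd.DataFrame, weights=None,
                     value_col: str = "BenchmarkReturn") -> pd.Series:
    """Weighted sum of per-stock values for each date; equal weights if none are given."""
    wide = pred.pivot(index="Date", columns="Ticker", values=value_col)
    if weights is None:
        w = pd.Series(1.0 / wide.shape[1], index=wide.columns)
    else:
        w = pd.Series(weights, dtype=float).reindex(wide.columns)
        if w.isna().any():
            raise ValueError("weights must cover every ticker")
        w = w / w.sum()  # normalise so the weights sum to one
    # min_count makes a date NaN if any stock is missing, instead of silently using 0
    return wide.mul(w, axis=1).sum(axis=1, min_count=wide.shape[1])

demo = pd.DataFrame({
    "Date": ["2025-01-02"] * 2 + ["2025-01-03"] * 2,
    "Ticker": ["AAA", "BBB"] * 2,
    "BenchmarkReturn": [0.010, 0.030, -0.020, 0.000],
})
equal = portfolio_return(demo)                                      # w = (0.5, 0.5)
tilted = portfolio_return(demo, weights={"AAA": 0.75, "BBB": 0.25})
```

With equal weights this reduces to the per-date mean across stocks, which is what a `groupby("Date").mean()` computes when every stock is present on every date.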

---

### 4. Interpretation of Coefficients

How to read each sign, if the fit produces it:

- $\beta_1, \beta_2 > 0$: excess returns are higher when the 20-day SMA/EMA of returns is high (momentum effect).  
- $\beta_3 < 0$: higher volatility goes with lower expected returns (risk aversion).  
- $\beta_4 > 0$: a higher RSI, even in overbought territory, goes with higher returns (short-term momentum).  
- $\beta_5, \beta_7 < 0$: high P/E or P/B ratios go with lower expected returns (value premium).  
- $\beta_6 > 0$: higher EPS goes with higher expected returns (profitability factor).  
- $\beta_8 > 0$: a higher dividend yield contributes positively (income premium).  
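
One way to check the signs listed above against a fitted model is a small comparison table. The helper `compare_signs` is hypothetical, and the demo coefficients and p-values are placeholders chosen for illustration; they are not results from the regression in this notebook.

```python
import pandas as pd

# Signs from the list above (+1 positive, -1 negative), keyed by the
# predictor column names used in the regression code.
EXPECTED_SIGN = {
    "SMA_20": 1, "EMA_20": 1, "Volatility_20": -1, "RSI_14": 1,
    "PE_ratio": -1, "EPS": 1, "PB_ratio": -1, "DividendYield": 1,
}

def compare_signs(params: pd.Series, pvalues: pd.Series, level: float = 0.05) -> pd.DataFrame:
    """Tabulate each coefficient, whether its sign matches, and whether it is significant."""
    rows = []
    for name, expected in EXPECTED_SIGN.items():
        coef = float(params[name])
        rows.append({
            "predictor": name,
            "coef": coef,
            "expected": "+" if expected > 0 else "-",
            "matches": coef * expected > 0,
            "significant": float(pvalues[name]) < level,
        })
    return pd.DataFrame(rows).set_index("predictor")

# Placeholder numbers for illustration only; NOT output of the actual fit.
demo_params = pd.Series({
    "SMA_20": 0.5, "EMA_20": -0.2, "Volatility_20": -0.1, "RSI_14": 0.001,
    "PE_ratio": 0.0002, "EPS": 0.003, "PB_ratio": -0.004, "DividendYield": 0.01,
})
demo_pvalues = pd.Series({
    "SMA_20": 0.01, "EMA_20": 0.30, "Volatility_20": 0.04, "RSI_14": 0.60,
    "PE_ratio": 0.20, "EPS": 0.03, "PB_ratio": 0.50, "DividendYield": 0.08,
})
report = compare_signs(demo_params, demo_pvalues)
```

With a statsmodels result fitted on a DataFrame, the same call would be `compare_signs(model.params, model.pvalues)`. A sign that matches but is not significant is weak evidence for the corresponding effect.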

---

### 5. Benchmark & Alpha

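- **Benchmark return** = $\hat{R}_{p,t}$ (the part of the portfolio's excess return explained by the indicators)  
- **Alpha** = $R_{p,t} - \hat{R}_{p,t}$, where $R_{p,t} = \sum_{j=1}^N w_j (R_{j,t} - R_{f,t})$ is the realized portfolio excess return (the unexplained part, attributed to skill/edge)  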

- Excess returns are computed as $R_{j,t} - R_{f,t}$ with $R_{f,t} \approx 0.0425/255$ daily.
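
One consequence of these definitions is worth checking on synthetic data. Because the regression includes an intercept, the in-sample residuals sum to zero, so with equal weights and a balanced panel the average alpha is zero by construction. The sketch below uses made-up data and plain NumPy least squares rather than the notebook's dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, n_stocks, n_indicators = 60, 5, 3

# Synthetic balanced panel, rows ordered day by day (all stocks for day 1, then day 2, ...)
X = rng.normal(size=(n_days * n_stocks, n_indicators))
noise = rng.normal(scale=0.01, size=n_days * n_stocks)
y = 0.001 + X @ np.array([0.002, -0.001, 0.0005]) + noise  # synthetic excess returns

# OLS with an intercept column
X_const = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_const, y, rcond=None)
fitted = X_const @ beta

# Equal-weight portfolio per day: average across the stocks of each day
actual_p = y.reshape(n_days, n_stocks).mean(axis=1)
bench_p = fitted.reshape(n_days, n_stocks).mean(axis=1)
alpha = actual_p - bench_p

mean_alpha = alpha.mean()  # zero up to floating-point error
```

Daily alpha still varies from day to day; only its in-sample average is pinned to zero. A nonzero average alpha is therefore informative only when the benchmark is fitted on one period and evaluated on another, or when the portfolio weights differ from those implied by the fit.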


import pandas as pd
import statsmodels.api as sm
import numpy as np

# --- Load dataset ---
df = pd.read_csv("OSEBX_Q1Q2_with_indicators_on_returns.csv", sep=";")

# --- Parse raw returns (the daily risk-free rate is subtracted when y is built below) ---
df["ExcessReturn"] = pd.to_numeric(df["Return"], errors="coerce")

# --- Numeric predictors ---
predictors = ["SMA_20", "EMA_20", "Volatility_20", "RSI_14",
              "PE_ratio", "EPS", "PB_ratio", "DividendYield"]

# Convert predictors to numeric
for col in predictors:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# --- Dummy variables for ticker fixed effects ---
ticker_dummies = pd.get_dummies(df["Ticker"], prefix="Ticker", drop_first=True)

# --- Combine predictors and ensure numeric ---
X = pd.concat([df[predictors], ticker_dummies], axis=1)
X = sm.add_constant(X)

# Force all X columns to float
X = X.astype(float)
y = df["ExcessReturn"].astype(float) - 0.0425 / 255

# --- Drop rows with NaNs ---
valid_idx = X.notna().all(axis=1) & y.notna()
X = X.loc[valid_idx]
y = y.loc[valid_idx]

# --- Fit OLS regression ---
model = sm.OLS(y, X).fit()
print(model.summary())

# --- Predicted benchmark returns ---
df["BenchmarkReturn"] = model.predict(X)

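# --- Portfolio benchmark (equal weight per day) ---
# Use y (returns net of the risk-free rate, on the rows used in the fit) so that
# actual and benchmark are on the same basis and cover the same observations.
portfolio_benchmark = df.loc[valid_idx].groupby("Date")["BenchmarkReturn"].mean()
portfolio_actual = y.groupby(df.loc[valid_idx, "Date"]).mean()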

# --- Residual (alpha) ---
alpha_series = portfolio_actual - portfolio_benchmark

print("\nAverage Benchmark Return:", portfolio_benchmark.mean())
print("Average Actual Return:", portfolio_actual.mean())
print("Average Alpha:", alpha_series.mean())


                            OLS Regression Results                            
Dep. Variable:           ExcessReturn   R-squared:                       0.537
Model:                            OLS   Adj. R-squared:                  0.527
Method:                 Least Squares   F-statistic:                     49.58
Date:                Sun, 07 Sep 2025   Prob (F-statistic):           8.29e-78
Time:                        14:44:00   Log-Likelihood:                 1597.4
No. Observations:                 525   AIC:                            -3169.
Df Residuals:                     512   BIC:                            -3113.
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const            -0.0952      0.018     -5.435

# Plan: From Benchmarking to Stock Selection

## 1. Current Setup
So far, we have a dataset where for each stock $j$ at time $t$ we observe:

- $R_{j,t}$ = daily return of stock $j$
- $R_{f,t}$ = daily risk-free rate (set to zero if negligible)
- Indicator variables $X_{i,j,t}$ including:
  - $SMA_{20,j,t}$ = 20-day simple moving average (on returns)
  - $EMA_{20,j,t}$ = 20-day exponential moving average (on returns)
  - $Volatility_{20,j,t}$ = rolling 20-day standard deviation of returns
  - $RSI_{14,j,t}$ = relative strength index (on returns)
  - $PE_{j,t}$ = price-to-earnings ratio
  - $EPS_{j,t}$ = earnings per share
  - $PB_{j,t}$ = price-to-book ratio
  - $DY_{j,t}$ = dividend yield

The **benchmark regression model** was written as (with $n = 8$ indicators):

$$
R_{j,t} - R_{f,t} = \beta_0 + \sum_{i=1}^n \beta_i X_{i,j,t} + \epsilon_{j,t}
$$

This allowed us to see how indicators correlate with returns.

---

## 2. Next Step: Reverse the Perspective