# Phase 1: Automated Ticker Selection (Bias Removal)

## Objective
Instead of picking stocks manually by intuition, we **systematically** compute Beta ($\beta$) for every S&P 500 constituent and select the 30 highest-beta and 30 lowest-beta stocks. This reduces selection bias and ensures the analysis spans a wide range of market sensitivities.

## Methodology
1. Load S&P 500 constituent tickers from `data/raw/constituents.csv`
2. Download one year of daily adjusted close prices for every constituent plus SPY (the benchmark)
3. Calculate log returns: $R_t = \ln(P_t / P_{t-1})$
4. Compute each stock's Beta via linear regression on the benchmark's returns: $R_{i,t} = \alpha_i + \beta_i R_{SPY,t} + \epsilon_{i,t}$
5. Sort stocks by Beta and extract:
   - **High Beta**: the 30 highest-beta stocks among those with $\beta > 1.5$ (high market sensitivity)
   - **Low Beta**: the 30 lowest-beta stocks among those with $\beta < 0.6$ (low market sensitivity)
6. Save results to `data/tickers/high_beta.csv` and `data/tickers/low_beta.csv`
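
Steps 3 to 5 can be sketched on synthetic data before any prices are downloaded. The block below is a minimal illustration, not the notebook's actual implementation. The helper names (`log_returns`, `compute_betas`, `select_extremes`), the tickers `AAA`/`BBB`/`CCC`, and the simulated prices are all hypothetical. It fits the regression with `np.polyfit` (degree 1), whose slope is the ordinary least squares estimate of $\beta$.

```python
import numpy as np
import pandas as pd


def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """R_t = ln(P_t / P_{t-1}); the first row is dropped (no predecessor)."""
    return np.log(prices / prices.shift(1)).iloc[1:]


def compute_betas(returns: pd.DataFrame, benchmark: str = "SPY") -> pd.Series:
    """OLS slope of each stock's returns regressed on the benchmark's returns."""
    betas = {}
    for ticker in returns.columns.drop(benchmark):
        # Keep only the dates where both series have a return.
        pair = returns[[benchmark, ticker]].dropna()
        slope, _intercept = np.polyfit(
            pair[benchmark].to_numpy(), pair[ticker].to_numpy(), deg=1
        )
        betas[ticker] = slope
    return pd.Series(betas, name="beta").sort_values(ascending=False)


def select_extremes(betas: pd.Series, n: int = 30,
                    high_min: float = 1.5, low_max: float = 0.6):
    """Return (high, low): up to n stocks above high_min and up to n below low_max."""
    high = betas[betas > high_min].sort_values(ascending=False).head(n)
    low = betas[betas < low_max].sort_values(ascending=True).head(n)
    return high, low


# Synthetic example (assumption: made-up tickers with known "true" betas).
rng = np.random.default_rng(0)
n_days = 252
market = rng.normal(0.0004, 0.01, n_days)
simulated = {"SPY": market}
for name, true_beta in {"AAA": 1.8, "BBB": 1.0, "CCC": 0.3}.items():
    simulated[name] = true_beta * market + rng.normal(0.0, 0.002, n_days)

# Turn the simulated log returns into a price path starting near 100.
prices = 100 * np.exp(pd.DataFrame(simulated).cumsum())

betas = compute_betas(log_returns(prices))
high, low = select_extremes(betas, n=1)
print(betas.round(2))
```

With real data, `prices` would be the adjusted-close table from step 2, and each resulting Series could be written out with `Series.to_csv` for step 6. Note that fewer than 30 stocks may pass a threshold, in which case `head(n)` simply returns what is available.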

---

## 1. Import Libraries and Setup

In [None]:
import os

import pandas as pd
import numpy as np

import yfinance as yf

from scipy import stats
from sklearn.linear_model import LinearRegression

import matplotlib.pyplot as plt
import seaborn as sns

from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

## 2. Load S&P 500 Constituents

In [2]:
from google.colab import drive  # Colab-only; skip the mount when running locally

drive.mount('/content/drive')
notebook_dir = '/content/drive/MyDrive/market-sentiment-impact-analysis/notebooks'
# Assumption: the relative path below is meant to resolve from the notebooks folder
os.chdir(notebook_dir)

# Load the S&P 500 constituents
constituents_path = '../data/raw/constituents.csv'
sp500 = pd.read_csv(constituents_path)

print(f'Total S&P 500 constituents loaded: {len(sp500)}')
print(f'\nColumns: {list(sp500.columns)}')
print('\nFirst 5 stocks:')
sp500.head()

ValueError: mount failed
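
Once the constituents file is readable, the ticker column usually needs light cleaning before it is passed to a price download. The sketch below is an illustration under stated assumptions: the column name `Symbol`, the helper `load_tickers`, and the inline sample rows are hypothetical, so check them against the actual `constituents.csv`. It also converts dots to hyphens, since Yahoo Finance lists share classes as `BRK-B` rather than `BRK.B`.

```python
import io

import pandas as pd

# Hypothetical sample standing in for constituents.csv (column names are assumed).
SAMPLE_CSV = """Symbol,Security
AAPL,Apple Inc.
BRK.B,Berkshire Hathaway
 bf.b ,Brown-Forman
AAPL,Apple Inc.
"""


def load_tickers(source, symbol_col: str = "Symbol") -> list:
    """Read a constituents table and return cleaned, de-duplicated tickers."""
    df = pd.read_csv(source)
    if symbol_col not in df.columns:
        raise KeyError(
            f"Expected a '{symbol_col}' column, found: {list(df.columns)}"
        )
    symbols = (
        df[symbol_col]
        .dropna()
        .astype(str)
        .str.strip()
        .str.upper()
        .str.replace(".", "-", regex=False)  # Yahoo-style share-class separator
    )
    # drop_duplicates keeps the first occurrence, so file order is preserved.
    return symbols.drop_duplicates().tolist()


tickers = load_tickers(io.StringIO(SAMPLE_CSV))
print(tickers)
```

In the notebook, `source` would be `constituents_path` instead of the in-memory sample.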