# Momentum, Volatility, and Volume Factors in U.S. Stock Returns

**ISYE 4031 Final Project**  
*Regression & Forecasting, Georgia Tech*

## 📊 Project Overview

This notebook analyzes the relationship between **momentum**, **volatility**, and **volume** factors in U.S. stock returns using S&P 500 data.

### Research Questions:
1. Do momentum indicators significantly predict future stock returns?
2. How does volatility clustering affect return predictability? 
3. Is trading volume a reliable indicator of price direction?

---

In [56]:
import yfinance as yf
import pandas as pd
from pandas_datareader import data as pdr
import datetime as dt
import numpy as np
from bs4 import BeautifulSoup
import requests, re
import ta

## 📈 Data Collection

### Step 1: S&P 500 Stock List
We start by scraping the current S&P 500 stock list from a reliable financial data source.

**Data Source**: [Stock Analysis - S&P 500](https://stockanalysis.com/list/sp-500-stocks/)

**Key Information Collected**:
- Stock symbols (tickers)
- Market capitalization

In [2]:
url = 'https://stockanalysis.com/list/sp-500-stocks/'
resp = requests.get(url)
soup = BeautifulSoup(resp.text, 'html.parser')

# Find the table and extract headers
table = soup.find('table', class_='symbol-table svelte-1ro3niy')
headers = [th.get_text(strip=True) for th in table.find('tr').find_all('th')]

# Extract all row data
stocks_data = []
for row in table.find_all('tr')[1:]:  # Skip header row
    row_data = [cell.get_text(strip=True) for cell in row.find_all('td')]
    stocks_data.append(row_data)

# Create DataFrame and set No. column as index
sp500_df = pd.DataFrame(stocks_data, columns=headers)
sp500_df = sp500_df.set_index('No.')

print("\nFirst 10 rows:")
print(sp500_df.head(10)[['Symbol', 'Market Cap']])


First 10 rows:
    Symbol Market Cap
No.                  
1     NVDA      4.81T
2     AAPL      3.99T
3     MSFT      3.76T
4    GOOGL      3.49T
5     GOOG      3.37T
6     AMZN      2.66T
7     AVGO      1.69T
8     META      1.60T
9     TSLA      1.49T
10   BRK.B      1.08T
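
The `Market Cap` column arrives as text such as `4.81T`, so it cannot be sorted or filtered numerically as scraped. A minimal sketch of a converter follows. The helper name `parse_market_cap` is hypothetical, and the reading of the suffixes `T`, `B`, and `M` as trillions, billions, and millions of dollars is an assumption about the source site, not something the notebook states.

```python
# Hypothetical helper: convert scraped market-cap strings to floats (USD).
# Assumption: T = trillions, B = billions, M = millions.
_SUFFIX_MULTIPLIERS = {"T": 1e12, "B": 1e9, "M": 1e6}

def parse_market_cap(text):
    """Return the market cap in dollars, or None if the text cannot be parsed."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    if not cleaned:
        return None
    suffix = cleaned[-1].upper()
    if suffix in _SUFFIX_MULTIPLIERS:
        number, multiplier = cleaned[:-1], _SUFFIX_MULTIPLIERS[suffix]
    else:
        number, multiplier = cleaned, 1.0
    try:
        return float(number) * multiplier
    except ValueError:
        return None

# Two values copied from the scraped table
caps = {symbol: parse_market_cap(cap)
        for symbol, cap in [("NVDA", "4.81T"), ("BRK.B", "1.08T")]}
print(caps)
```

In the notebook this could be applied as `sp500_df['Market Cap'].map(parse_market_cap)` to obtain a numeric column.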


### Step 2: Stock Selection & Date Range Setup

**Stock Selection Process:**
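- Target design: the first 50 companies from the S&P 500 list; the run shown in this notebook uses the first 5 (`sp500_df.head(5)`)
- Focus on established companies with reliable historical data

**Analysis Time Period:**
- 📅 **Start Date**: January 4, 2021 (the first trading day of 2021)
- 📅 **End Date**: December 27, 2024 (passed as `end` to `yf.download`, which excludes the end date itself)
- ⏱️ **Duration**: about 4 years of market data (208 weekly bars)
- 🎯 **Purpose**: Capture post-pandemic market trends and recovery patterns

> **Note**: Restricting the analysis to the largest stocks keeps the computation manageable and concentrates on the most financially significant companies.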

In [51]:
stocks = sp500_df.head(5)['Symbol'].tolist()
stocks.sort()
startDate = dt.date(2021, 1, 4)
endDate = dt.date(2024, 12, 27)
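
One practical detail if the selection is widened beyond the first few names: the scraped list writes share classes with a dot (`BRK.B` appears in the table above), while Yahoo Finance tickers generally use a hyphen (`BRK-B`). The sketch below normalizes symbols before they are passed to `yf.download`. The helper name `to_yahoo_symbol` is hypothetical, and the dot-to-hyphen rule is an assumption that should be checked against any symbol that fails to download.

```python
def to_yahoo_symbol(symbol):
    """Map a share-class symbol such as 'BRK.B' to the hyphenated form 'BRK-B'."""
    # Assumption: every dot in a scraped symbol marks a share class.
    return symbol.strip().upper().replace(".", "-")

scraped = ["NVDA", "AAPL", "BRK.B"]  # sample symbols from the scraped list
stocks_example = sorted(to_yahoo_symbol(s) for s in scraped)
print(stocks_example)
```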

### Weekly Log Returns for Each Stock

In [None]:

try:
    download = yf.download(
        tickers = stocks,
        start = startDate,
        end = endDate,
        actions = False, threads = True, auto_adjust = True, rounding = True,
        group_by = 'ticker',  # documented values are 'ticker' and 'column'
        interval = '1wk'
    )
    
    # Extract Open and Close data
    open_data = download.xs('Open', level=1, axis=1)
    close_data = download.xs('Close', level=1, axis=1)
    
    # Calculate log returns: ln(Close/Open) * 100 for percentage
    log_returns = (np.log(close_data / open_data) * 100)
    
    # Create MultiIndex DataFrame
    columns = []
    for ticker in stocks:
        columns.extend([(ticker, 'Open'), (ticker, 'Close'), (ticker, 'Log_Return_%')])
    multi_columns = pd.MultiIndex.from_tuples(columns, names=['Ticker', 'Data_Type'])
    weekly_data = pd.DataFrame(index=open_data.index, columns=multi_columns)
    
    # Fill in the data
    for ticker in stocks:
        weekly_data[(ticker, 'Open')] = open_data[ticker].round(2)
        weekly_data[(ticker, 'Close')] = close_data[ticker].round(2)
        weekly_data[(ticker, 'Log_Return_%')] = log_returns[ticker].round(2)
    
    # Add week numbers as a separate col
    weekly_data.insert(0, 'Week', range(1, len(weekly_data) + 1))
    
    print(f"Total weeks: {len(weekly_data)}")
    print(f"Date range: {weekly_data.index[0].date()} to {weekly_data.index[-1].date()}")
    print(f"DataFrame shape: {weekly_data.shape}")
    print(f"Column structure: {weekly_data.columns.names}")
    
    display(weekly_data)
        
except Exception as e:
    print(f"Error: {e}")

[*********************100%***********************]  5 of 5 completed

Total weeks: 208
Date range: 2021-01-04 to 2024-12-23
DataFrame shape: (208, 16)
Column structure: ['Ticker', 'Data_Type']

🔍 First 10 weeks of data:

| Date | Week | AAPL Open | AAPL Close | AAPL Log_Return_% | GOOG Open | GOOG Close | GOOG Log_Return_% | GOOGL Open | GOOGL Close | GOOGL Log_Return_% | MSFT Open | MSFT Close | MSFT Log_Return_% | NVDA Open | NVDA Close | NVDA Log_Return_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-01-04 | 1 | 130.10 | 128.67 | -1.11 | 87.28 | 89.75 | 2.79 | 87.40 | 89.28 | 2.13 | 213.93 | 211.13 | -1.32 | 13.07 | 13.24 | 1.29 |
| 2021-01-11 | 2 | 125.88 | 123.88 | -1.60 | 88.70 | 86.22 | -2.84 | 88.24 | 85.79 | -2.82 | 210.02 | 204.43 | -2.70 | 13.38 | 12.82 | -4.28 |
| 2021-01-18 | 3 | 124.51 | 135.51 | 8.47 | 87.02 | 94.41 | 8.15 | 86.55 | 93.98 | 8.24 | 205.49 | 217.21 | 5.55 | 12.98 | 13.67 | 5.18 |
| 2021-01-25 | 4 | 139.41 | 128.58 | -8.09 | 95.38 | 91.16 | -4.53 | 94.98 | 90.74 | -4.57 | 220.26 | 222.99 | 1.23 | 13.74 | 12.95 | -5.92 |
| 2021-02-01 | 5 | 130.33 | 133.26 | 2.22 | 92.05 | 104.19 | 12.39 | 91.60 | 103.73 | 12.44 | 225.97 | 232.84 | 2.99 | 13.02 | 13.55 | 3.99 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-11-25 | 204 | 230.64 | 236.49 | 2.50 | 167.26 | 169.75 | 1.48 | 165.36 | 168.21 | 1.71 | 416.09 | 421.15 | 1.21 | 141.95 | 138.21 | -2.67 |
| 2024-12-02 | 205 | 236.43 | 241.98 | 2.32 | 169.58 | 175.72 | 3.56 | 168.03 | 173.94 | 3.46 | 419.27 | 441.15 | 5.09 | 138.79 | 142.40 | 2.57 |
| 2024-12-09 | 206 | 240.97 | 247.25 | 2.57 | 174.95 | 190.55 | 8.54 | 173.20 | 188.99 | 8.72 | 440.18 | 444.83 | 1.05 | 138.94 | 134.22 | -3.46 |
| 2024-12-16 | 207 | 247.11 | 253.59 | 2.59 | 193.74 | 192.34 | -0.73 | 192.24 | 190.79 | -0.76 | 444.83 | 434.21 | -2.42 | 134.15 | 134.67 | 0.39 |
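
As a sanity check on the `Log_Return_%` column, the open-to-close log return for one weekly bar is $100 \cdot \ln(\text{Close}/\text{Open})$. The sketch below recomputes two values for the week of 2021-01-04 using only the rounded prices shown in the table; the function name `weekly_log_return_pct` is illustrative and not part of the notebook.

```python
import math

def weekly_log_return_pct(open_price, close_price):
    """Open-to-close log return for one weekly bar, in percent."""
    return math.log(close_price / open_price) * 100

# Week of 2021-01-04, prices copied from the table
aapl = round(weekly_log_return_pct(130.10, 128.67), 2)
nvda = round(weekly_log_return_pct(13.07, 13.24), 2)
print(aapl, nvda)  # -1.11 1.29
```

Both values match the table. Note that this definition measures only the move within each week: the gap between one week's close and the next week's open (for example, AAPL closing at 128.67 and reopening at 125.88) is not captured, so summing the weekly values does not reproduce the full-period log return.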


### Technical Indicators for the First Week

In [None]:
# Download data for technical indicator calculations using ta library
try:
    # Need more data to calculate technical indicators properly
    extended_start = dt.date(2020, 1, 1)  # Start earlier for indicator calculations
    
    tech_download = yf.download(
        tickers = stocks,
        start = extended_start,
        end = dt.date(2021, 1, 15),  # Just past first week
        actions = False, threads = True, auto_adjust = True, rounding = True,
        group_by = 'ticker',  # documented values are 'ticker' and 'column'
        interval = '1d'  # Daily data for better indicator calculations
    )
    
    # Extract price data (Open is needed for the open-to-close weekly return)
    open_prices = tech_download.xs('Open', level=1, axis=1)
    close_prices = tech_download.xs('Close', level=1, axis=1)
    high_prices = tech_download.xs('High', level=1, axis=1)
    low_prices = tech_download.xs('Low', level=1, axis=1)
    volume_data = tech_download.xs('Volume', level=1, axis=1)
    
    # Find the first week of our analysis period (Jan 4-8, 2021)
    first_week_start = pd.Timestamp('2021-01-04')
    first_week_end = pd.Timestamp('2021-01-08')
    
    # Filter to first week
    first_week_mask = (close_prices.index >= first_week_start) & (close_prices.index <= first_week_end)
    first_week_open = open_prices[first_week_mask]
    first_week_close = close_prices[first_week_mask]
    
    # Create DataFrame with stocks as rows, indicators as columns
    first_week_indicators = pd.DataFrame(index=stocks, columns=['Log_Return_%', 'ROC', 'RVOL', 'BBW'])
    
    for ticker in stocks:
        # Get price data up to first week end for calculations
        ticker_data = pd.DataFrame({
            'close': close_prices.loc[:first_week_end, ticker],
            'high': high_prices.loc[:first_week_end, ticker],
            'low': low_prices.loc[:first_week_end, ticker],
            'volume': volume_data.loc[:first_week_end, ticker]
        }).dropna()
        
        if len(ticker_data) < 20:  # Need enough data for indicators
            first_week_indicators.loc[ticker] = [np.nan, np.nan, np.nan, np.nan]
            continue
            
        # Log Return % for first week: ln(last close / first open) * 100,
        # consistent with the weekly ln(Close/Open) definition used earlier
        week_open = first_week_open[ticker].iloc[0] if len(first_week_open) > 0 else np.nan
        week_close = first_week_close[ticker].iloc[-1] if len(first_week_close) > 0 else np.nan
        log_return = np.log(week_close / week_open) * 100 if not pd.isna(week_open) and not pd.isna(week_close) else np.nan
        
        # Calculate indicators using ta library (with correct class names)
        # Rate of Change (ROC) - 10-day; roc() already returns a percentage,
        # so no further scaling by 100 is applied
        roc = ta.momentum.ROCIndicator(close=ticker_data['close'], window=10).roc().iloc[-1]
        
        # Relative Volume (RVOL) - Current week volume vs 20-day SMA volume
        # Use simple rolling mean since ta library doesn't have VolumeSMAIndicator
        volume_sma_20 = ticker_data['volume'].rolling(window=20).mean()
        avg_volume = volume_sma_20.iloc[-1] if not volume_sma_20.empty else np.nan
        current_week_volume = ticker_data['volume'].iloc[-5:].mean()  # Last 5 days average
        rvol = (current_week_volume / avg_volume) if not pd.isna(avg_volume) and avg_volume != 0 else np.nan
        
        # Bollinger Band Width (BBW) using ta library
        bb_indicator = ta.volatility.BollingerBands(close=ticker_data['close'], window=20, window_dev=2)
        upper = bb_indicator.bollinger_hband().iloc[-1]
        lower = bb_indicator.bollinger_lband().iloc[-1]
        middle = bb_indicator.bollinger_mavg().iloc[-1]
        bbw = ((upper - lower) / middle) * 100 if not pd.isna(upper) and not pd.isna(lower) and middle != 0 else np.nan
            
        # Add row to DataFrame
        first_week_indicators.loc[ticker] = [log_return, roc, rvol, bbw]
    
    print(f"📊 First Week Technical Analysis (Using ta library)")
    print(f"Analysis Period: {first_week_start.date()} to {first_week_end.date()}")
    print(f"Stocks Analyzed: {len(stocks)}")
    print(f"\n🔍 Technical Indicators for First Week:")
    print("• Log_Return_%: Weekly log return")
    print("• ROC: 10-day Rate of Change (ta.momentum)")
    print("• RVOL: Relative Volume vs 20-day average (pandas rolling)")
    print("• BBW: Bollinger Band Width (ta.volatility)")
    print()
    print("DataFrame Structure: Stocks as rows, Indicators as columns")
    
    display(first_week_indicators.round(4))
    
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()

[*********************100%***********************]  5 of 5 completed

Error: module 'ta.volume' has no attribute 'VolumeSMAIndicator'



Traceback (most recent call last):
  File "/var/folders/6y/spt598r17_dgcfx4prrmwk4m0000gq/T/ipykernel_4575/683111224.py", line 55, in <module>
    volume_sma = ta.volume.VolumeSMAIndicator(close=ticker_data['close'], volume=ticker_data['volume'], window=20).volume_sma()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'ta.volume' has no attribute 'VolumeSMAIndicator'
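
For reference, the three indicators computed in the cell above reduce to short rolling-window formulas. The following is a minimal pandas-only sketch on synthetic data, not the `ta` implementation: the function names are hypothetical, and the use of a population standard deviation (`ddof=0`) for the Bollinger Bands is an assumption about how `ta` computes them.

```python
import numpy as np
import pandas as pd

def roc_pct(close, window=10):
    """Rate of change in percent: (close_t / close_{t-window} - 1) * 100."""
    return (close / close.shift(window) - 1) * 100

def relative_volume(volume, recent=5, window=20):
    """Mean volume over the last `recent` bars divided by the `window`-bar mean."""
    return volume.rolling(recent).mean() / volume.rolling(window).mean()

def bollinger_band_width_pct(close, window=20, n_dev=2):
    """(upper - lower) / middle * 100, with bands at middle +/- n_dev * std."""
    middle = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=0)  # assumption: population std
    upper = middle + n_dev * std
    lower = middle - n_dev * std
    return (upper - lower) / middle * 100

# Synthetic example: 30 trading days of steadily rising prices, constant volume
close = pd.Series(np.linspace(100.0, 129.0, 30))
volume = pd.Series(np.full(30, 1_000_000.0))

latest = {
    "ROC": roc_pct(close).iloc[-1],
    "RVOL": relative_volume(volume).iloc[-1],
    "BBW": bollinger_band_width_pct(close).iloc[-1],
}
print(latest)
```

One design point worth noting: as in the notebook cell, the 20-day volume baseline here includes the 5 most recent days, which pulls RVOL toward 1. Computing the baseline on the 20 days before the current week (for example with `.shift(recent)`) is one alternative.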
