# Stock Momentum Calculation

This notebook calculates momentum for all stocks in the Rice Business Stock Market Data Portal.

Momentum is defined as:
- `(Price 1 month ago / Price 12 months ago) - 1`

This captures the return from 12 months ago to 1 month ago, which is a common momentum measure that skips the most recent month.
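
As a quick worked example of the formula, the sketch below computes momentum for a single month from a made-up price series. The prices and dates are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the portal.

```python
import pandas as pd

# Hypothetical month-end adjusted closes for one stock, oldest first (13 months).
prices = pd.Series(
    [100.0, 102.0, 105.0, 103.0, 108.0, 110.0, 107.0,
     112.0, 115.0, 118.0, 116.0, 120.0, 125.0],
    index=pd.period_range("2023-01", periods=13, freq="M"),
)

# Momentum as of the last month in the series (2024-01):
# price 1 month ago (2023-12) divided by price 12 months ago (2023-01), minus 1.
price_1m_ago = prices.iloc[-2]
price_12m_ago = prices.iloc[-13]
momentum = price_1m_ago / price_12m_ago - 1

print(f"Momentum: {momentum:.2%}")
```

Here the price one month ago is 120 and the price twelve months ago is 100, so momentum is 20%. The most recent price (125) does not enter the calculation at all.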

## Setup and Connect to Rice Data Portal

In [None]:
from rice_data_client import RiceDataClient
from dotenv import load_dotenv
import pandas as pd
import numpy as np
import os

# Load environment variables from .env file
load_dotenv()

# Get configuration from environment
ACCESS_TOKEN = os.getenv('USER_ACCESS_TOKEN')
BASE_URL = os.getenv('RICE_DATA_URL', 'https://portal.rice-business.org')

# Connect to Rice Data Portal
client = RiceDataClient(
    access_token=ACCESS_TOKEN,
    base_url=BASE_URL
)

print("Connected to Rice Data Portal")

## Query Daily Prices Year by Year

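The SEP table contains daily prices. We'll download data one year at a time to avoid timeouts, then filter in Python to keep only the last trading day of each month for each ticker. For the current, still incomplete month, the row kept is the most recent trading day available rather than a true month-end.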
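
The month-end filter can be tried on synthetic data before running it against the portal. This is a minimal sketch: the tickers `AAA` and `BBB` and their prices are made up, and `pd.bdate_range` only approximates trading days because it ignores market holidays.

```python
import pandas as pd

# Synthetic daily data for two hypothetical tickers (weekdays only).
dates = pd.bdate_range("2024-01-01", "2024-03-31")
daily = pd.DataFrame({
    "ticker": ["AAA"] * len(dates) + ["BBB"] * len(dates),
    "date": list(dates) * 2,
    "closeadj": [float(i) for i in range(2 * len(dates))],
})

# Tag each row with its calendar month, then keep the latest row
# per (ticker, month) after sorting by date.
daily["year_month"] = daily["date"].dt.to_period("M")
month_end = (
    daily.sort_values(["ticker", "date"])
    .drop_duplicates(subset=["ticker", "year_month"], keep="last")
    .drop(columns="year_month")
    .reset_index(drop=True)
)

print(month_end)
```

Each ticker ends up with one row per month, dated on the last weekday present in the data for that month.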

In [None]:
# Set the start year for data download
start_year = 2023

# Get current year
import datetime
current_year = datetime.datetime.now().year

# Initialize empty list to collect end-of-month data only
all_data = []

# Loop through years from start_year to current year
for year in range(start_year, current_year + 1):
    print(f"Downloading data for {year}...")
    
    # SQL query to get daily adjusted closing prices for one year
    sql = f"""
    SELECT 
        ticker,
        date,
        closeadj
    FROM sep
    WHERE date::DATE >= '{year}-01-01' 
      AND date::DATE < '{year + 1}-01-01'
    ORDER BY ticker, date
    """
    
    # Execute query
    df_year = client.query(sql)
    print(f"  Downloaded {len(df_year):,} daily observations")
    
    # Convert date to datetime
    df_year['date'] = pd.to_datetime(df_year['date'])
    
    # Filter to end-of-month dates for this year
    # This reduces memory usage by filtering before accumulating
    df_year['year_month'] = df_year['date'].dt.to_period('M')
    # Keep the latest row per (ticker, month). Sorting and dropping duplicates
    # avoids a slow per-group apply and keeps the grouping columns intact
    df_month_end = (
        df_year.sort_values(['ticker', 'date'])
        .drop_duplicates(subset=['ticker', 'year_month'], keep='last')
        .reset_index(drop=True)
    )
    df_month_end = df_month_end.drop(columns=['year_month'])
    
    print(f"  Filtered to {len(df_month_end):,} end-of-month observations")
    
    # Keep only the end-of-month data
    all_data.append(df_month_end)
    # df_year is rebound on the next iteration, at which point the
    # previous year's daily data can be garbage collected

# Combine all years into one dataframe
print("\nCombining all years...")
df = pd.concat(all_data, ignore_index=True)
print(f"Total: {len(df):,} end-of-month observations for {df['ticker'].nunique():,} tickers")

# Display first few rows
df.head(10)

## Data Preparation

In [None]:
# Ensure date is datetime (already converted in the download loop, so this is a safeguard)
df['date'] = pd.to_datetime(df['date'])

# Sort by ticker and date
df = df.sort_values(['ticker', 'date'])

# Display summary statistics
print("\nDate range:")
print(f"  Start: {df['date'].min()}")
print(f"  End: {df['date'].max()}")
print(f"\nNumber of tickers: {df['ticker'].nunique():,}")
print(f"Total observations: {len(df):,}")

## Calculate Momentum

Momentum = (Price 1 month ago / Price 12 months ago) - 1

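Because the data has one row per ticker per month and is sorted by date, we use `.shift(1)` to get the price 1 month ago and `.shift(12)` to get the price 12 months ago. Note that `shift` counts rows, not calendar months, so this assumes no ticker has missing months in its history.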
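
Since `.shift(12)` moves back 12 rows rather than 12 calendar months, a ticker with a missing month would silently get a longer lag. The sketch below shows one possible way to detect this on a made-up series with one month removed. The column names `month_index`, `months_back`, and `lag_ok` are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical month-end data for one made-up ticker, with 2023-06 missing.
all_months = pd.period_range("2023-01", "2024-03", freq="M")
months = all_months[all_months != pd.Period("2023-06", freq="M")]
df_demo = pd.DataFrame({
    "ticker": "AAA",
    "month": months,
    "closeadj": [100.0 + i for i in range(len(months))],
})

# Row-based lag, as used in the notebook.
df_demo["price_lag12"] = df_demo.groupby("ticker")["closeadj"].shift(12)

# Calendar check: how many months back does the lagged row actually sit?
df_demo["month_index"] = df_demo["month"].dt.year * 12 + df_demo["month"].dt.month
df_demo["months_back"] = (
    df_demo["month_index"] - df_demo.groupby("ticker")["month_index"].shift(12)
)
df_demo["lag_ok"] = df_demo["months_back"] == 12

print(df_demo.loc[df_demo["price_lag12"].notna(), ["month", "months_back", "lag_ok"]])
```

In this toy series every row that has a 12-row lag is actually looking 13 months back, so `lag_ok` is `False` for all of them. On real data, rows where such a check fails could be set to missing before computing momentum.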

In [None]:
# Calculate lagged prices by ticker
df['price_lag1'] = df.groupby('ticker')['closeadj'].shift(1)
df['price_lag12'] = df.groupby('ticker')['closeadj'].shift(12)

# Calculate momentum
df['momentum'] = (df['price_lag1'] / df['price_lag12']) - 1

# Display results
print(f"\nMomentum calculated for {df['momentum'].notna().sum():,} observations")
print(f"Missing momentum values: {df['momentum'].isna().sum():,}")

# Show first few rows with momentum
df[['ticker', 'date', 'closeadj', 'price_lag1', 'price_lag12', 'momentum']].head(20)

## Summary Statistics

In [None]:
# Remove missing momentum values for statistics
df_clean = df.dropna(subset=['momentum'])

print("Momentum Summary Statistics:")
print("="*50)
print(df_clean['momentum'].describe())

# Additional statistics
print("\nPercentiles:")
print(f"  10th: {df_clean['momentum'].quantile(0.10):.4f}")
print(f"  25th: {df_clean['momentum'].quantile(0.25):.4f}")
print(f"  50th: {df_clean['momentum'].quantile(0.50):.4f}")
print(f"  75th: {df_clean['momentum'].quantile(0.75):.4f}")
print(f"  90th: {df_clean['momentum'].quantile(0.90):.4f}")

## Visualize Momentum Distribution

In [None]:
import matplotlib.pyplot as plt

# Create histogram of momentum values
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Full distribution
axes[0].hist(df_clean['momentum'], bins=100, edgecolor='black', alpha=0.7)
axes[0].set_xlabel('Momentum')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Momentum Distribution (All Values)')
axes[0].axvline(0, color='red', linestyle='--', linewidth=2, label='Zero')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Zoomed in distribution (between -1 and 2)
df_zoom = df_clean[(df_clean['momentum'] >= -1) & (df_clean['momentum'] <= 2)]
axes[1].hist(df_zoom['momentum'], bins=100, edgecolor='black', alpha=0.7)
axes[1].set_xlabel('Momentum')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Momentum Distribution (Zoomed: -100% to +200%)')
axes[1].axvline(0, color='red', linestyle='--', linewidth=2, label='Zero')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nPercentage of stock-month observations with positive momentum: {(df_clean['momentum'] > 0).mean() * 100:.1f}%")
print(f"Percentage of stock-month observations with negative momentum: {(df_clean['momentum'] < 0).mean() * 100:.1f}%")

## Example: View Recent Momentum for Specific Stocks

In [None]:
# Example tickers to examine
example_tickers = ['AAPL', 'MSFT', 'GOOGL', 'TSLA', 'JPM']

# Get most recent 12 months of data for these tickers
recent_data = df[df['ticker'].isin(example_tickers)].groupby('ticker').tail(12)

# Display
for ticker in example_tickers:
    # recent_data already holds at most 12 rows per ticker
    ticker_data = recent_data[recent_data['ticker'] == ticker][['date', 'closeadj', 'momentum']]
    if not ticker_data.empty:
        print(f"\n{ticker}:")
        print(ticker_data.to_string(index=False))
        latest_momentum = ticker_data['momentum'].iloc[-1]
        if pd.notna(latest_momentum):
            print(f"Latest momentum: {latest_momentum:.2%}")

## Save Momentum Data to Disk

We'll save the momentum data in Parquet format, which offers several advantages:
- **Smaller file size**: Compressed columnar format, usually noticeably smaller than the equivalent CSV
- **Preserves data types**: Dates and numbers are stored natively, no parsing needed when reading
- **Faster I/O**: Quicker to read and write than CSV
- **Industry standard**: Widely used in data science and finance

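To read the data back later, use: `df = pd.read_parquet('momentum.parquet')`. Both `to_parquet` and `read_parquet` require a Parquet engine such as `pyarrow` or `fastparquet` to be installed.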

In [None]:
# Select relevant columns
output_df = df[['ticker', 'date', 'closeadj', 'momentum']].copy()

# Save to Parquet format
output_filename = 'momentum.parquet'
output_df.to_parquet(output_filename, index=False)

print(f"Momentum data saved to {output_filename}")
print(f"Total rows: {len(output_df):,}")
print(f"Rows with momentum: {output_df['momentum'].notna().sum():,}")

# Display file size (os was imported in the setup cell)
file_size_mb = os.path.getsize(output_filename) / (1024 * 1024)
print(f"File size: {file_size_mb:.2f} MB")

## Summary

This notebook:
1. Connected to Rice Data Portal using `rice_data_client`
2. Downloaded daily adjusted closing prices from the SEP table year by year (to avoid timeouts)
   - Start year is configurable (currently set to 2023)
   - Filtered to end-of-month inside the loop to minimize memory usage
3. Combined end-of-month prices from all years
4. Calculated momentum for each stock as (Price_t-1 / Price_t-12) - 1
5. Analyzed the distribution of momentum values
6. Saved the results to a Parquet file for efficient storage and future use

**Note**: The SEP table contains daily prices, not monthly prices. We filter for end-of-month dates after downloading.

The momentum measure skips the most recent month and looks at returns from 12 months ago to 1 month ago, which is a standard approach in momentum investing to avoid short-term reversals.

**To use the saved data later:**
```python
import pandas as pd
df_momentum = pd.read_parquet('momentum.parquet')
```