# S&P 500 Best and Worst Days Analysis

**Data Source:** [S&P 500 Historical Data - Stooq](https://stooq.com/q/d/?s=%5Espx)

**Note:** Data before 1957 (the year the S&P 500 Index was officially launched) are "rebuilt": values are calculated retroactively from the companies that would have been index constituents at the time.

**Disclaimer:** This analysis is for educational purposes only and should not be considered financial advice.

This notebook performs exploratory data analysis and visualization of S&P 500 historical price data.

In [14]:
import os
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

warnings.filterwarnings("ignore")

# Build the path with os.path.join so the correct separator is used on every OS
path = os.path.abspath(os.path.join("..", "04_S&P500_quant_analysis", "01_data", "S&P500_D_1789-05-01_2025-10-10.csv"))

# Equivalent alternative using pathlib:
# from pathlib import Path
# path = (Path("..") / "04_S&P500_quant_analysis" / "01_data" / "S&P500_D_1789-05-01_2025-10-10.csv").resolve()

print(f"Path exists: {os.path.exists(path)}")

# Read the CSV file
df = pd.read_csv(path)

# Lower-case the column names for easier access
df.columns = [col.lower() for col in df.columns]

# Convert the 'date' column to datetime and use it as the index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Show the first few rows of the dataframe
df.head()

Path exists: True


            open  high   low  close  volume
date                                       
1789-05-01  0.51  0.51  0.51   0.51     0.0
1789-06-01  0.51  0.51  0.51   0.51     0.0
1789-07-01  0.50  0.50  0.50   0.50     0.0
1789-08-01  0.50  0.51  0.50   0.51     0.0
1789-09-01  0.51  0.51  0.50   0.51     0.0
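The loading cell can also let `pd.read_csv` parse the dates and set the index in a single call. The sketch below assumes the Stooq file's header is `Date,Open,High,Low,Close,Volume` (the notebook lower-cases the names afterwards, which suggests capitalized originals) and reads a small in-memory sample built from the first rows shown above instead of the real file. `load_prices` and `SAMPLE_CSV` are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed Stooq layout: a "Date" column followed by
# OHLC prices and volume (values taken from the first rows printed above).
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume
1789-05-01,0.51,0.51,0.51,0.51,0
1789-06-01,0.51,0.51,0.51,0.51,0
1789-07-01,0.5,0.5,0.5,0.5,0
"""


def load_prices(source) -> pd.DataFrame:
    """Read an OHLCV CSV into a DataFrame indexed by a sorted DatetimeIndex."""
    # parse_dates converts the column to datetime; index_col makes it the index
    df = pd.read_csv(source, parse_dates=["Date"], index_col="Date")
    # Lower-case the column names and the index name for easier access
    df.columns = df.columns.str.lower()
    df.index.name = "date"
    return df.sort_index()


prices = load_prices(io.StringIO(SAMPLE_CSV))
print(prices.head())
```

In the notebook, `df = load_prices(path)` would replace the read, rename, and set-index steps, provided the real header matches the assumed one.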


In [15]:
# Keep only rows from 1957 onward, when the S&P 500 Index was officially launched
df = df[df.index >= '1957-01-01']

df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 17311 entries, 1957-01-02 to 2025-10-10
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    17311 non-null  float64
 1   high    17311 non-null  float64
 2   low     17311 non-null  float64
 3   close   17311 non-null  float64
 4   volume  17311 non-null  float64
dtypes: float64(5)
memory usage: 811.5 KB
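The rows printed by `df.head()` are dated one month apart (1789-05-01, 1789-06-01, ...), so at least the earliest part of the reconstructed history does not appear to be daily data. A quick check of the typical spacing between observations can confirm that the retained slice is daily before computing day-over-day returns. This sketch uses synthetic month-start and business-day indexes as stand-ins for the two regimes; `median_gap_days` is a hypothetical helper.

```python
import pandas as pd


def median_gap_days(index: pd.DatetimeIndex) -> float:
    """Median number of calendar days between consecutive timestamps."""
    gaps = index.to_series().diff().dropna()  # Timedelta between neighbours
    return float(gaps.dt.days.median())


# Synthetic indexes standing in for monthly and daily data (assumption)
monthly = pd.date_range("1789-05-01", periods=6, freq="MS")  # month starts
daily = pd.bdate_range("1957-01-02", periods=10)             # business days

print(median_gap_days(monthly))  # 31.0
print(median_gap_days(daily))    # 1.0
```

On the filtered frame, `median_gap_days(df.index)` would be expected to return 1.0 if the post-1957 rows are daily trading sessions, since 1-day gaps outnumber the 3-day weekend gaps.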


In [16]:
df.shape

(17311, 5)

In [17]:
# # plot close prices over time
# plt.figure(figsize=(14, 7))
# plt.plot(df.index, df['close'], label='Close Price')
# plt.title('S&P 500 Close Prices Over Time')
# plt.xlabel('Date')
# plt.ylabel('Close Price')
# plt.legend()
# plt.grid()
# plt.show()

In [18]:
# Calculate simple returns with pct_change()
simple_returns = df['close'].pct_change().fillna(0)

# Calculate log returns: ln(1 + r_t), equivalent to ln(P_t / P_{t-1})
log_returns = np.log1p(simple_returns)

# Cumulative sum of simple returns (ignores compounding, so it is not the cumulative return)
cumsum_simple = simple_returns.cumsum()
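Summing simple returns ignores compounding. A toy two-day example makes the difference concrete: the compounded return is prod(1 + r_t) - 1, which equals exp(sum of ln(1 + r_t)) - 1, while the plain sum of the r_t does not. The price path below is hypothetical (a 10% gain followed by a 10% loss).

```python
import numpy as np
import pandas as pd

# Hypothetical price path: +10% then -10%
close = pd.Series([100.0, 110.0, 99.0])

simple = close.pct_change().fillna(0)   # r_t = P_t / P_{t-1} - 1
log_r = np.log1p(simple)                # l_t = ln(1 + r_t)

compounded = (1 + simple).cumprod() - 1  # prod(1 + r_t) - 1
from_logs = np.expm1(log_r.cumsum())     # exp(sum l_t) - 1, same values
naive = simple.cumsum()                  # sum r_t, ignores compounding

print(round(compounded.iloc[-1], 6))  # -0.01
print(round(naive.iloc[-1], 6))       # 0.0
```

The naive sum reports a flat outcome even though the price ended 1% lower, which is why cumulative performance is usually computed with `(1 + simple_returns).cumprod()` or `np.expm1(log_returns.cumsum())`.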

In [19]:
simple_returns.tail()

date
2025-10-06    0.003647
2025-10-07   -0.003811
2025-10-08    0.005828
2025-10-09   -0.002756
2025-10-10   -0.027112
Name: close, dtype: float64

In [20]:
# top 100 worst days
worst_days = simple_returns.nsmallest(100)

# top 100 best days
best_days = simple_returns.nlargest(100)

# print results
print("Top 100 Worst Days:")
print(worst_days)

print("\nTop 100 Best Days:")
print(best_days)

Top 100 Worst Days:
date
1987-10-19   -0.204669
2020-03-16   -0.119841
2020-03-12   -0.095113
2008-10-15   -0.090350
2008-12-01   -0.089295
                ...   
2018-10-10   -0.032864
2022-06-16   -0.032512
2018-12-04   -0.032365
2010-05-06   -0.032354
2002-09-27   -0.032259
Name: close, Length: 100, dtype: float64

Top 100 Best Days:
date
2008-10-13    0.115800
2008-10-28    0.107890
2025-04-09    0.095154
2020-03-24    0.093828
2020-03-13    0.092871
                ...   
2000-10-13    0.033381
2000-04-25    0.033276
2003-01-02    0.033200
2000-04-17    0.033084
2010-05-27    0.032876
Name: close, Length: 100, dtype: float64
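Both printed lists contain dates from October 2008 and March 2020, which hints that the largest up days and down days tend to occur in the same turbulent periods. Counting the best and worst days per calendar year is one way to check this across all 100 of each. The sketch below runs on a small hypothetical return series; `extremes_by_year` is an illustrative helper, and in the notebook it would be called as `extremes_by_year(simple_returns, 100)`.

```python
import pandas as pd


def extremes_by_year(returns: pd.Series, n: int) -> pd.DataFrame:
    """Count how many of the n worst and n best days fall in each calendar year."""
    worst = returns.nsmallest(n)
    best = returns.nlargest(n)
    counts = pd.DataFrame({
        "worst": worst.groupby(worst.index.year).size(),
        "best": best.groupby(best.index.year).size(),
    })
    # Years that appear in only one of the two lists get a count of 0 in the other
    return counts.fillna(0).astype(int).sort_index()


# Hypothetical daily returns spread over two calendar years
returns = pd.Series(
    [-0.06, 0.05, -0.04, 0.07, 0.001, -0.002, 0.003, 0.03],
    index=pd.to_datetime([
        "2001-03-01", "2001-03-02", "2001-03-05", "2001-03-06",
        "2004-07-01", "2004-07-02", "2004-07-06", "2004-07-07",
    ]),
)

table = extremes_by_year(returns, n=3)
print(table)
```

Years with high counts in both columns would point to clustering of extreme days rather than extremes spread evenly over time.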
