# Bubble analysis

This notebook examines three major market bubbles: Nifty Fifty (1968-1975), Dot-Com (1995-2002), and Housing (1998-2010). It fetches public data, engineers features, visualizes dynamics, and runs simple statistical models.

## Introduction

We explore how valuations, momentum, and macro housing indicators behaved before, during, and after each bubble. The analysis uses public sources (Shiller and FRED) so it can be reproduced from scratch.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from src.clean_transform import prepare_features, compute_bubble_summary, BUBBLE_WINDOWS
from src.visuals import plot_index_with_bubbles, plot_valuation, plot_dotcom_comparison, plot_housing, plot_volatility
from src.models import regression_valuation_vs_return, crash_vs_normal_test, volatility_regimes, fit_arima_baseline, fit_arimax, walk_forward_accuracy
pd.options.display.float_format = "{:.3f}".format

## Data and preparation

The steps below pull Shiller CAPE and S&P 500 levels along with FRED series for NASDAQ, Case-Shiller, mortgage debt, and homeownership. All series are resampled to month end and merged. Features include returns, drawdowns, trailing performance, and bubble flags based on valuation and momentum.

In [None]:
df = prepare_features(force=False)
df.head()

In [None]:
print(df.describe().T[['mean','std','min','max']].head())
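
The return, drawdown, and trailing-performance features described in this section can be illustrated with a minimal sketch. It is not the code behind `prepare_features`; the function name `add_price_features`, the column names, and the 12-month default lookback are assumptions made for illustration, and the price series is synthetic.

```python
import pandas as pd


def add_price_features(prices: pd.Series, trailing_months: int = 12) -> pd.DataFrame:
    """Build return, drawdown, and trailing-performance columns (hypothetical helper)."""
    out = pd.DataFrame({"price": prices})
    # One-month simple return.
    out["ret_1m"] = prices.pct_change()
    # Drawdown: fractional distance below the running maximum (0.0 at a new high).
    out["drawdown"] = prices / prices.cummax() - 1.0
    # Trailing performance over the lookback window.
    out["ret_trailing"] = prices.pct_change(trailing_months)
    return out


# Synthetic month-end prices, for illustration only.
idx = pd.to_datetime(
    ["2000-01-31", "2000-02-29", "2000-03-31", "2000-04-30", "2000-05-31", "2000-06-30"]
)
toy_prices = pd.Series([100.0, 110.0, 121.0, 96.8, 108.9, 130.0], index=idx)
features = add_price_features(toy_prices, trailing_months=3)
print(features)
```

Because drawdown is measured against the running maximum, it is never positive, which is why a threshold such as `-0.2` can serve as a crash filter later in the notebook.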

## Exploratory data analysis

We inspect levels, valuations, and drawdowns with bubble windows shaded. Each figure includes a short interpretation.

In [None]:
fig1 = plot_index_with_bubbles(df)
plt.show()
print("S&P 500 shows sharp run-ups before each bubble peak and deep drawdowns afterward.")

In [None]:
fig2 = plot_valuation(df)
plt.show()
print("CAPE spikes near bubble peaks, with the Dot-Com peak standing out as the highest valuation.")

In [None]:
fig3 = plot_dotcom_comparison(df)
plt.show()
print("NASDAQ outpaced the S&P 500 during the Dot-Com boom and suffered a steeper drawdown.")

In [None]:
fig4 = plot_housing(df)
plt.show()
print("Case-Shiller accelerated in the 2000s with YoY z-scores above the overvaluation threshold before the decline.")

In [None]:
fig5 = plot_volatility(df)
plt.show()
print("Volatility rises during and after peaks, highlighting regime shifts around crashes.")
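
The housing chart refers to year-over-year z-scores crossing an overvaluation threshold. One possible construction is sketched below. The notebook does not show how `plot_housing` computes the score, so the full-sample standardization, the function name `yoy_zscore`, and the threshold of 2.0 are all assumptions, and the index values are synthetic.

```python
import numpy as np
import pandas as pd


def yoy_zscore(level: pd.Series, periods: int = 12) -> pd.Series:
    """Standardize year-over-year growth against its full-sample mean and std (assumed method)."""
    yoy = level.pct_change(periods)
    return (yoy - yoy.mean()) / yoy.std()


# Synthetic index: 48 months of 0.3% monthly growth, then 12 months of 1.5%.
monthly_growth = np.r_[np.full(48, 0.003), np.full(12, 0.015)]
toy_index = pd.Series(100.0 * np.cumprod(1.0 + monthly_growth))

OVERVALUATION_Z = 2.0  # hypothetical threshold, not taken from the notebook
z = yoy_zscore(toy_index)
overheated = z > OVERVALUATION_Z
print(z.tail(3))
print("Months flagged:", int(overheated.sum()))
```

A full-sample z-score uses information from the whole history, including the bust. For a signal meant to be usable in real time, an expanding or rolling mean and standard deviation would avoid that look-ahead.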

## Statistical inference

We relate valuations to forward returns and test whether crash periods had different return means than the full sample.

In [None]:
model = regression_valuation_vs_return(df, horizon=60)
# Print the first 12 lines of the summary as text rather than as a Python list
print('\n'.join(model.summary().as_text().split('\n')[:12]))
print("Higher CAPE is associated with lower forward annualized returns.")
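
To make the regression target concrete, here is a minimal sketch of a forward annualized return over a 60-month horizon, followed by a plain least-squares fit. It is not the implementation of `regression_valuation_vs_return`. The helper names are hypothetical, `np.polyfit` stands in for whatever estimator the module uses, and all numbers are synthetic.

```python
import numpy as np
import pandas as pd


def forward_annualized_return(price: pd.Series, horizon: int = 60) -> pd.Series:
    """Annualized return over the next `horizon` months; NaN for the final `horizon` rows."""
    growth = price.shift(-horizon) / price
    return growth ** (12.0 / horizon) - 1.0


def ols_slope_intercept(x: pd.Series, y: pd.Series):
    """Fit y = intercept + slope * x on rows where both values are present."""
    pair = pd.concat([x, y], axis=1).dropna()
    slope, intercept = np.polyfit(pair.iloc[:, 0].to_numpy(), pair.iloc[:, 1].to_numpy(), deg=1)
    return float(slope), float(intercept)


# Synthetic price compounding at 1% per month.
toy_price = pd.Series(100.0 * 1.01 ** np.arange(120))
fwd = forward_annualized_return(toy_price, horizon=60)

# Synthetic valuation/return pairs with a built-in negative relationship.
toy_cape = pd.Series([10.0, 20.0, 30.0, 40.0])
toy_ret = 0.12 - 0.002 * toy_cape
slope, intercept = ols_slope_intercept(toy_cape, toy_ret)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

With monthly data and a 60-month horizon, consecutive forward returns share most of their window. The regression residuals are therefore serially correlated, and unadjusted standard errors tend to overstate significance.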

In [None]:
crash_mask = df['sp500_drawdown'] < -0.2
crash_test = crash_vs_normal_test(df, crash_mask)
print(crash_test)
print("Crash window returns are meaningfully lower than the long run average.")
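
A difference-in-means comparison like the one above can be sketched with Welch's t-statistic, which does not assume equal variances in the two groups. This is an illustration, not the code inside `crash_vs_normal_test`: it compares crash months against non-crash months rather than against the full sample, and the helper name and data are hypothetical.

```python
import numpy as np
import pandas as pd


def welch_t_stat(a, b) -> float:
    """Welch's t-statistic for the difference in means of two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Standard error built from each group's own sample variance.
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return float((a.mean() - b.mean()) / se)


# Synthetic monthly returns and drawdowns, for illustration only.
toy = pd.DataFrame(
    {
        "ret": [0.01, 0.02, -0.05, -0.03, -0.04, 0.00, 0.01],
        "drawdown": [0.0, 0.0, -0.22, -0.25, -0.28, -0.10, -0.05],
    }
)
toy_crash_mask = toy["drawdown"] < -0.2
t_stat = welch_t_stat(toy.loc[toy_crash_mask, "ret"], toy.loc[~toy_crash_mask, "ret"])
print(f"Welch t-statistic: {t_stat:.3f}")
```

Monthly returns inside a crash window are not independent draws, so a t-statistic of this kind is better read as descriptive than as a strict significance test.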

## Modeling approach

We evaluate simple time series models: an ARIMA baseline on S&P 500 returns and an ARIMAX that adds CAPE and bubble flags as exogenous drivers. Walk forward evaluation focuses on directional accuracy and mean absolute error.

In [None]:
arima_baseline = fit_arima_baseline(df)
print(arima_baseline.summary().tables[0])

In [None]:
arimax_model = fit_arimax(df)
print(arimax_model.summary().tables[0])

In [None]:
metrics = walk_forward_accuracy(df)
print(metrics)
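
The walk-forward scheme and its two metrics can be sketched independently of any particular model. The example below deliberately swaps in a naive expanding-mean forecast in place of ARIMA or ARIMAX, so it shows only the evaluation loop. The function name `walk_forward_metrics`, the `min_train` parameter, and the returns are hypothetical and are not drawn from `walk_forward_accuracy`.

```python
import numpy as np


def walk_forward_metrics(returns, min_train: int = 12) -> dict:
    """One-step-ahead evaluation with an expanding training window (naive mean forecast)."""
    returns = np.asarray(returns, dtype=float)
    preds, actuals = [], []
    for t in range(min_train, len(returns)):
        # Use only observations strictly before t, then score the forecast for t.
        preds.append(returns[:t].mean())
        actuals.append(returns[t])
    preds = np.array(preds)
    actuals = np.array(actuals)
    return {
        # Share of forecasts whose sign matches the realized return.
        "directional_accuracy": float(np.mean(np.sign(preds) == np.sign(actuals))),
        "mae": float(np.mean(np.abs(preds - actuals))),
        "n_forecasts": int(len(preds)),
    }


# Synthetic returns, for illustration only.
toy_returns = [0.01, 0.02, 0.01, -0.01, 0.02, 0.01]
toy_metrics = walk_forward_metrics(toy_returns, min_train=3)
print(toy_metrics)
```

Refitting on data strictly before each forecast date is what keeps the accuracy figures free of look-ahead bias. A naive benchmark like this one also gives a floor against which the ARIMA and ARIMAX results can be judged.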

## Evaluation

We compare volatility regimes around peaks and summarize each bubble's peak date, valuation, run-up, and drawdown depth.

In [None]:
summary_table = compute_bubble_summary(df)
print(summary_table)

In [None]:
for name, (start, end) in BUBBLE_WINDOWS.items():
    peak = df.loc[start:end, 'sp500'].idxmax()
    vols = volatility_regimes(df, peak)
    print(name, peak.date(), vols)
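
One simple way to compare volatility regimes around a peak is to contrast the annualized standard deviation of monthly returns just before and just after the peak date. The sketch below assumes that approach. The function name `vol_around_peak`, the window length, and the returns are hypothetical, and `volatility_regimes` may work differently.

```python
import numpy as np
import pandas as pd


def vol_around_peak(returns: pd.Series, peak, window: int = 12) -> dict:
    """Annualized volatility of monthly returns in the windows before and after `peak`."""
    pos = returns.index.get_loc(peak)
    before = returns.iloc[max(pos - window, 0):pos]
    after = returns.iloc[pos + 1:pos + 1 + window]
    scale = np.sqrt(12.0)  # monthly to annualized
    return {
        "pre_peak_vol": float(before.std() * scale),
        "post_peak_vol": float(after.std() * scale),
    }


# Synthetic returns: calm months, a peak month, then turbulent months.
toy_idx = pd.date_range("2000-01-01", periods=9, freq="MS")
toy_rets = pd.Series(
    [0.010, 0.012, 0.008, 0.010, 0.020, -0.080, 0.050, -0.060, 0.040], index=toy_idx
)
toy_vols = vol_around_peak(toy_rets, peak=toy_idx[4], window=4)
print(toy_vols)
```

The peak month itself is left out of both windows so that neither estimate is distorted by the turning point.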

## Results and business recommendations

- Elevated CAPE values align with weaker forward returns, so valuation-aware risk management would have reduced drawdowns.
- Momentum signals captured run-ups before peaks; combining valuation and momentum provides an early warning dashboard.
- Housing indicators flagged overheating before 2007, suggesting macro data can complement equity signals.
- ARIMAX modestly improves directional accuracy versus the baseline, but errors remain; use it as a secondary risk gauge rather than a primary trading rule.

## Limitations and next steps

- Data coverage for NASDAQ and housing is shorter than S&P history.
- The models are simple and omit structural shifts; richer macro variables and regime-switching models could improve forecasts.
- Bootstrapped confidence intervals and scenario analysis would add robustness.
- Incorporating sector-level data (e.g., XLK) and alternative valuation metrics could refine bubble detection.