# Bitcoin vs. S&P 500 — Data Collection & EDAThis notebook creates a reproducible synthetic market dataset (until live market data is added) and runs first-pass exploratory analysis to understand how Bitcoin and the S&P 500 co-move. It also sketches an initial dip-buying experiment to inform later modeling work.> **Data note:** A synthetic CSV is generated to mirror realistic levels, volatilities, and mild correlation; swap in real price history later to re-run the same workflow.

## Notebook roadmap- **Setup:** Import common analysis libraries and plot styling.- **Generate synthetic dataset:** Create a reproducible BTC/S&P500 price path with controlled correlation and save to `data/raw/`.- **Load curated dataset:** Read the CSV, set the index, and confirm schema integrity.- **Exploratory checks:** Inspect coverage, descriptive stats, and correlations (point-in-time and rolling).- **Prototype dip test:** Compare rebounds when buying Bitcoin versus the index after deep S&P drawdowns.

In [None]:
import numpy as npimport pandas as pdimport matplotlib.pyplot as pltimport seaborn as snsplt.style.use("seaborn-v0_8-whitegrid")pd.set_option("display.float_format", lambda x: f"{x:,.4f}")np.random.seed(42)

## 1. Generate a reproducible synthetic datasetAssumptions used to mimic plausible behavior:- Business-day calendar with ~5 years of data.- Mild positive correlation between BTC and S&P 500 daily returns.- Higher Bitcoin volatility and drift to reflect its risk profile.Running this cell regenerates the CSV so downstream steps can be repeated or extended.

In [None]:
start_date = pd.Timestamp("2020-01-01")end_date = start_date + pd.offsets.BDay(1302)  # ~5 years of business daysbusiness_days = pd.date_range(start_date, end_date, freq="B")mu_sp500 = 0.0006sigma_sp500 = 0.012mu_btc = 0.001sigma_btc = 0.04correlation = 0.15cov_matrix = [    [sigma_sp500 ** 2, correlation * sigma_sp500 * sigma_btc],    [correlation * sigma_sp500 * sigma_btc, sigma_btc ** 2],]returns = np.random.multivariate_normal(    mean=[mu_sp500, mu_btc], cov=cov_matrix, size=len(business_days))sp500_returns = returns[:, 0]btc_returns = returns[:, 1]sp500_prices = 3000 * (1 + sp500_returns).cumprod()btc_prices = 7000 * (1 + btc_returns).cumprod()df = pd.DataFrame(    {        "Date": business_days,        "SP500_Close": sp500_prices,        "BTC_Close": btc_prices,        "SP500_Return": sp500_returns,        "BTC_Return": btc_returns,    })df.to_csv("data/raw/synthetic_btc_sp500.csv", index=False)df.head()

## 2. Load the datasetLoad the generated CSV with parsed dates, set `Date` as the index, and ensure chronological ordering to prepare for time-series operations.

In [None]:
df = pd.read_csv("data/raw/synthetic_btc_sp500.csv", parse_dates=["Date"])df = df.set_index("Date").sort_index()df.head()

## 3. Structural checks and descriptive statisticsQuick validation to confirm coverage and stability before deeper analysis:- Record counts and date span- Missing value scan- Basic distribution snapshots for prices and returns

In [None]:
summary = {    "Records": len(df),    "Date range": f"{df.index.min().date()} to {df.index.max().date()}",    "Missing values": df.isna().any().any(),    "SP500 close mean": df["SP500_Close"].mean(),    "BTC close mean": df["BTC_Close"].mean(),    "SP500 daily return mean": df["SP500_Return"].mean(),    "BTC daily return mean": df["BTC_Return"].mean(),}pd.Series(summary)

### Correlation snapshotCompute the Pearson correlation between daily returns to understand immediate co-movement.**Observed (synthetic) result:** ~0.10, implying only a weak positive relationship between daily BTC and S&P moves.

In [None]:
ret_corr = (    df[["SP500_Return", "BTC_Return"]]    .corr()    .loc["SP500_Return", "BTC_Return"])ret_corr

### Rolling 60-day return correlationShort-term correlations can drift. This chart highlights periods when BTC decouples from or converges with the S&P 500.**Observed range:** roughly -0.27 to +0.35 across the sample, showing alternating diversification and co-movement regimes.

In [None]:
window = 60rolling_corr = (    df[["SP500_Return", "BTC_Return"]]    .rolling(window=window)    .corr()    .loc[(slice(None), "SP500_Return"), ("BTC_Return",)]    .droplevel(1))fig, ax = plt.subplots(figsize=(10, 4))rolling_corr.plot(ax=ax, color="tab:blue")ax.axhline(0, color="black", linewidth=1, linestyle="--", alpha=0.7)ax.set_title(f"{window}-Day Rolling Correlation: BTC vs. S&P 500 Returns")ax.set_ylabel("Correlation")ax.set_xlabel("Date")plt.tight_layout()plt.show()

## 4. Prototype dip-buying comparisonA first-pass experiment: after a **10%+ S&P drawdown from its trailing 60-day high**, buy either Bitcoin or the S&P and hold for 15 calendar days. Compare average rebounds.

In [None]:
lookback = 60threshold = -0.10horizon = 15sp_roll_high = df["SP500_Close"].rolling(lookback).max()sp_drawdown = (df["SP500_Close"] / sp_roll_high) - 1signals = sp_drawdown <= thresholdrecords = []for signal_date in df.index[signals]:    window_end = signal_date + pd.Timedelta(days=horizon)    btc_window = df.loc[signal_date:window_end, "BTC_Close"]    sp_window = df.loc[signal_date:window_end, "SP500_Close"]    if btc_window.empty or sp_window.empty:        continue    records.append(        {            "Signal date": signal_date,            "BTC rebound": btc_window.iloc[-1] / btc_window.iloc[0] - 1,            "S&P rebound": sp_window.iloc[-1] / sp_window.iloc[0] - 1,        }    )results = pd.DataFrame(records)summary_cols = {    "Signals analyzed": len(results),    "Avg BTC rebound": results["BTC rebound"].mean(),    "Avg S&P rebound": results["S&P rebound"].mean(),    "BTC outperform rate": (results["BTC rebound"] > results["S&P rebound"]).mean(),}pd.Series(summary_cols)

### Interpretation and next steps- **Signal frequency:** In the synthetic sample we see ~345 qualifying drawdowns, ensuring enough events for modeling once real data is substituted.- **Rebound comparison:** Average 15-day rebounds cluster around **~1.9% for BTC vs. ~0.2% for the S&P**, with BTC outperforming on ~55% of signals—supporting the “higher-beta” intuition but not a guaranteed edge.- **Action items:**  - Swap in historical BTC and S&P prices (e.g., Yahoo Finance, Quandl) and re-run to validate whether the weak ~0.10 correlation and rebound skew persist.  - Add risk-aware performance metrics (max drawdown, volatility scaling) before asserting superiority of the dip strategy.  - Version the feature set for downstream ML (e.g., drawdown depth, rolling vol, macro proxies) and split into train/validation periods.