# Synthetic Data Collection & Exploratory Analysis

This notebook creates a **synthetic** Bitcoin and S&P 500 price history, performs initial exploratory analysis, and prototypes a simple dip-buying comparison between holding Bitcoin vs. the S&P 500 after equity market drawdowns. Synthetic data is used because outbound network access is disabled in this environment; the workflow mirrors what we will apply once live price data is available.

## Objectives

- Generate a reproducible synthetic dataset with controlled correlation between Bitcoin and the S&P 500.
- Store the dataset under `data/raw/` for downstream notebooks and reports.
- Run quick EDA to profile returns, ranges, and correlations.
- Prototype a dip-buying heuristic: when the S&P 500 is in a deep drawdown, compare holding Bitcoin vs. the index over the next few weeks.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

pd.set_option("display.float_format", lambda v: f"{v:,.6f}")
sns.set_context("talk")

## 1. Generate a reproducible synthetic dataset

Assumptions:

- Business-day calendar for ~5 years (2020-01-01 onward).
- Mild positive correlation (ρ = 0.35) between daily returns.
- S&P 500 daily drift/vol: μ = 0.03%, σ = 1.2%.
- Bitcoin daily drift/vol: μ = 0.08%, σ = 3.5%.

A Cholesky factor of the covariance matrix ensures the simulated returns respect the chosen correlation structure.
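As a quick sanity check on the Cholesky step, the sketch below confirms that the factor reproduces the covariance matrix and that noise generated from it has a sample correlation close to the target. The sample size and seed are illustrative choices made for this check, not values used elsewhere in the notebook.

```python
import numpy as np

# Same volatility and correlation parameters as listed in the assumptions.
sigma_sp, sigma_btc, rho = 0.012, 0.035, 0.35
cov = np.array([
    [sigma_sp ** 2, rho * sigma_sp * sigma_btc],
    [rho * sigma_sp * sigma_btc, sigma_btc ** 2],
])

# Lower-triangular factor L such that L @ L.T equals cov.
chol = np.linalg.cholesky(cov)
assert np.allclose(chol @ chol.T, cov)

# Independent standard normals become correlated after multiplying by L.T.
# Sample size and seed are illustrative assumptions for this check only.
rng = np.random.default_rng(0)
z = rng.standard_normal(size=(200_000, 2))
noise = z @ chol.T

sample_cov = np.cov(noise, rowvar=False)
sample_corr = np.corrcoef(noise, rowvar=False)[0, 1]
print(f"Sample correlation: {sample_corr:.3f} (target {rho})")
```

With a large sample the estimate should sit very close to 0.35; with only about five years of business days, as in the dataset generated next, it will be noisier.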

In [None]:
np.random.seed(42)

# Business-day calendar covering roughly five years
start_date = pd.Timestamp("2020-01-01")
end_date = start_date + pd.Timedelta(days=5 * 365)
dates = pd.bdate_range(start_date, end_date)

# Daily drift and volatility for each asset, plus the target correlation
mu_sp, sigma_sp = 0.0003, 0.012
mu_btc, sigma_btc = 0.0008, 0.035
rho = 0.35

cov = np.array(
    [
        [sigma_sp ** 2, rho * sigma_sp * sigma_btc],
        [rho * sigma_sp * sigma_btc, sigma_btc ** 2],
    ]
)

# Correlate independent normal draws via the Cholesky factor, then add drift
chol = np.linalg.cholesky(cov)
z = np.random.normal(size=(len(dates), 2))
correlated_noise = z @ chol.T
returns = correlated_noise + np.array([mu_sp, mu_btc])

# Compound the simple returns from the starting price levels
sp_prices = [3200.0]
btc_prices = [7200.0]
for sp_r, btc_r in returns:
    sp_prices.append(sp_prices[-1] * (1 + sp_r))
    btc_prices.append(btc_prices[-1] * (1 + btc_r))

# Drop the seed price so each close lines up with its date and return
sp_series = pd.Series(sp_prices[1:], index=dates)
btc_series = pd.Series(btc_prices[1:], index=dates)

simulated = pd.DataFrame(
    {
        "Date": dates,
        "SP500_Close": sp_series.round(2),
        "BTC_Close": btc_series.round(2),
        "SP500_Return": returns[:, 0],
        "BTC_Return": returns[:, 1],
    }
)

csv_path = Path("data/raw/synthetic_btc_sp500.csv")
csv_path.parent.mkdir(parents=True, exist_ok=True)
simulated.to_csv(csv_path, index=False)

print(f"Saved synthetic dataset to {csv_path}")
print(f"Rows: {len(simulated):,} | Date range: {simulated['Date'].min().date()} to {simulated['Date'].max().date()}")

## 2. Load the dataset

Load the generated CSV with parsed dates and set `Date` as the time index for time-series operations.

In [None]:
df = pd.read_csv("data/raw/synthetic_btc_sp500.csv", parse_dates=["Date"])
df = df.sort_values("Date").reset_index(drop=True)
df.set_index("Date", inplace=True)
df.head()

## 3. Structural checks and descriptive statistics

These quick checks verify column ranges, date coverage, and basic return properties before deeper analysis.
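Checks like these can also be written as explicit assertions so that a malformed file fails loudly. The following is a minimal sketch on a small stand-in frame: the `toy` name and its values are made up for illustration, and only the column names follow the saved CSV.

```python
import numpy as np
import pandas as pd

# Small stand-in frame using the same column names as the saved CSV.
# The values are arbitrary and only illustrate the checks.
dates = pd.bdate_range("2020-01-01", periods=6)
returns = np.array([0.01, -0.02, 0.005, 0.0, 0.015, -0.01])
close = 3200.0 * np.cumprod(1 + returns)
toy = pd.DataFrame({"SP500_Close": close, "SP500_Return": returns}, index=dates)

# Date coverage: sorted, unique, and weekdays only.
assert toy.index.is_monotonic_increasing
assert toy.index.is_unique
assert (toy.index.dayofweek < 5).all()

# Value ranges: no missing entries and strictly positive prices.
assert not toy.isna().any().any()
assert (toy["SP500_Close"] > 0).all()

# Consistency: day-over-day price changes should match the stored returns.
implied = toy["SP500_Close"].pct_change().iloc[1:]
assert np.allclose(implied, toy["SP500_Return"].iloc[1:])
```

Because the generated dataset rounds closes to two decimals, the last check would likely need an explicit `atol` when applied to `df` rather than to this unrounded toy frame.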

In [None]:
summary = {
    "Records": len(df),
    "Date range": f"{df.index.min().date()} to {df.index.max().date()}",
    "S&P 500 close range": f"{df['SP500_Close'].min():,.2f} to {df['SP500_Close'].max():,.2f}",
    "Bitcoin close range": f"{df['BTC_Close'].min():,.2f} to {df['BTC_Close'].max():,.2f}",
    "Mean daily return (S&P 500)": df["SP500_Return"].mean(),
    "Mean daily return (Bitcoin)": df["BTC_Return"].mean(),
    "Std daily return (S&P 500)": df["SP500_Return"].std(),
    "Std daily return (Bitcoin)": df["BTC_Return"].std(),
}
pd.Series(summary)

### Correlation snapshot

Compute the Pearson correlation between daily returns to quantify co-movement in the simulated series.

In [None]:
ret_corr = df[["SP500_Return", "BTC_Return"]].corr().loc["SP500_Return", "BTC_Return"]
print(f"Return correlation (BTC vs. S&P 500): {ret_corr:.3f}")

**Interpretation:** A correlation near 0 suggests independent moves, while values closer to 1 imply strong co-movement. The synthetic settings target ρ = 0.35, so the sample estimate should land close to that value, give or take some sampling error. A correlation of this size indicates weak-to-moderate coupling: enough to matter for risk, but leaving room for diversification.
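To put a rough number on that sampling error, one common approach is a confidence interval based on the Fisher z-transform. The sketch below is an assumption-laden illustration: `fisher_ci` is a hypothetical helper, not part of the notebook, and the inputs (the target correlation and about 1,300 observations) are stand-ins for the actual estimate and row count.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transform, which assumes roughly bivariate-normal data.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Illustrative inputs: the target correlation and roughly five years
# of business days. Replace with ret_corr and len(df) in practice.
lo, hi = fisher_ci(r=0.35, n=1300)
print(f"Approximate 95% interval: {lo:.3f} to {hi:.3f}")
```

The normality assumption holds by construction for the simulated returns; it would be more questionable for real market data.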

### Rolling 60-day return correlation

Short-term correlations can drift. This chart highlights how the BTC–S&P 500 relationship evolves over time.
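Note that a 60-observation window is noisy even when the true correlation never changes, as is the case for this synthetic dataset. The sketch below simulates standardized returns with a fixed ρ and computes the pairwise rolling correlation with `Series.rolling(...).corr(other)`. The seed, sample size, and standardized scale are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative simulation: fixed true correlation, standardized returns.
rho, window, n = 0.35, 60, 1300
rng = np.random.default_rng(1)
x = rng.standard_normal(n)
# y is constructed so that corr(x, y) equals rho in expectation.
y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)

idx = pd.bdate_range("2020-01-01", periods=n)
sp = pd.Series(x, index=idx)
btc = pd.Series(y, index=idx)

# Pairwise rolling correlation between two Series.
rolling_corr = sp.rolling(window).corr(btc)

# The spread of these estimates is pure sampling noise,
# because the true correlation is constant by construction.
print(rolling_corr.describe())
```

Any drift seen in the chart for the synthetic data should be read the same way: as estimation noise, not as a regime change.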

In [None]:
window = 60
rolling_corr = (
    df[["SP500_Return", "BTC_Return"]]
    .rolling(window)
    .corr()
    .unstack()
    .loc[:, ("SP500_Return", "BTC_Return")]
)

fig, ax = plt.subplots(figsize=(10, 4))
rolling_corr.plot(ax=ax, color="steelblue")
ax.axhline(0, color="gray", linestyle="--", linewidth=1)
ax.set_title(f"{window}-Day Rolling Correlation (BTC vs. S&P 500 Returns)")
ax.set_ylabel("Correlation")
ax.set_xlabel("Date")
plt.tight_layout()
plt.show()

## 4. Prototype dip-buying comparison

A first-pass experiment to address the research question:

> When the equity market is in a sharp drawdown, does rotating into Bitcoin provide a better short-term rebound than staying in the S&P 500?

Method (toy example using synthetic data):

- Define a **drawdown trigger** when the S&P 500 is 10% below its 60-day rolling high.
- On each trigger date, simulate buying and holding either Bitcoin or the S&P 500 for the next 15 trading days.
- Compare the distribution of forward returns for each choice.

This is not a trading strategy recommendation, just a diagnostic to see whether Bitcoin behaves like a higher-beta proxy after equity sell-offs.
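The trigger logic is easiest to follow on a handful of prices. The sketch below applies the same rule to a hand-picked toy series with a shortened lookback and a one-step holding period. `drawdown_signals` is a hypothetical helper introduced only for this illustration, and the prices are invented.

```python
import pandas as pd

def drawdown_signals(close, lookback, threshold):
    """Return the drawdown series and the labels where it first crosses the threshold."""
    roll_high = close.rolling(lookback, min_periods=1).max()
    drawdown = close / roll_high - 1
    # A signal fires only on the first day at or below the threshold,
    # so consecutive days in the same drawdown are not double-counted.
    crossed = (drawdown <= threshold) & (drawdown.shift(1) > threshold)
    return drawdown, close.index[crossed.to_numpy()]

# Toy prices chosen by hand; lookback shortened to 5 to keep the example small.
close = pd.Series([100, 105, 100, 94, 92, 99, 106, 100, 95, 94], dtype=float)
drawdown, signals = drawdown_signals(close, lookback=5, threshold=-0.10)

horizon = 1  # hold for one step in this toy example
forward = {i: close.iloc[i + horizon] / close.iloc[i] - 1 for i in signals}

print(list(signals))  # positions 3 and 8 in this toy series
print(forward)
```

Position 4 is also more than 10% below the rolling high, but it does not fire because the previous day was already below the threshold.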

In [None]:
lookback = 60
threshold = -0.10
horizon = 15

# Drawdown relative to the rolling high over the lookback window
sp_roll_high = df["SP500_Close"].rolling(lookback, min_periods=1).max()
drawdown = df["SP500_Close"] / sp_roll_high - 1
df["SP500_Drawdown"] = drawdown

# Fire only on the first day the drawdown crosses the threshold
signals = df.index[(drawdown <= threshold) & (drawdown.shift(1) > threshold)]

records = []
for ts in signals:
    forward_idx = df.index.get_loc(ts) + horizon
    # Skip triggers too close to the end of the sample for a full holding period
    if forward_idx >= len(df):
        continue
    sp_future = df.iloc[forward_idx]["SP500_Close"] / df.loc[ts, "SP500_Close"] - 1
    btc_future = df.iloc[forward_idx]["BTC_Close"] / df.loc[ts, "BTC_Close"] - 1
    records.append({
        "signal_date": ts.date(),
        f"sp_return_{horizon}d": sp_future,
        f"btc_return_{horizon}d": btc_future,
    })

results = pd.DataFrame(records)
print(f"Signals found: {len(results)} across {lookback}-day lookback with {threshold*100:.0f}% trigger")
results.describe()

### Interpretation

- **Signal frequency:** Provides a sense of how often deep equity drawdowns occur in the simulated environment.
- **Average forward returns:** Compare the mean/median of the Bitcoin vs. S&P columns above. In this synthetic run, Bitcoin often exhibits larger upside swings but also higher downside tails, consistent with a higher-volatility asset that is only loosely correlated to equities.
- **Next steps:** Repeat this analysis with real market data, vary thresholds/horizons, and include transaction costs and volatility-adjusted position sizing to assess robustness.
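One way the threshold/horizon sweep could be organized is sketched below. Everything here is an assumption for illustration: `forward_returns` is a hypothetical helper, the grid values are arbitrary, and the price paths are placeholder random walks that use the notebook's drift and volatility settings but are uncorrelated for brevity.

```python
import numpy as np
import pandas as pd

def forward_returns(close_a, close_b, lookback, threshold, horizon):
    """Forward returns of both series after each new drawdown trigger on close_a."""
    roll_high = close_a.rolling(lookback, min_periods=1).max()
    dd = close_a / roll_high - 1
    crossed = ((dd <= threshold) & (dd.shift(1) > threshold)).to_numpy()
    pos = np.flatnonzero(crossed)
    pos = pos[pos + horizon < len(close_a)]  # drop triggers without a full window
    a = close_a.to_numpy()
    b = close_b.to_numpy()
    return a[pos + horizon] / a[pos] - 1, b[pos + horizon] / b[pos] - 1

# Placeholder random-walk prices (uncorrelated, illustrative seed).
rng = np.random.default_rng(7)
n = 1300
sp = pd.Series(3200.0 * np.cumprod(1 + rng.normal(0.0003, 0.012, n)))
btc = pd.Series(7200.0 * np.cumprod(1 + rng.normal(0.0008, 0.035, n)))

rows = []
for threshold in (-0.05, -0.10):
    for horizon in (5, 15, 30):
        sp_fwd, btc_fwd = forward_returns(sp, btc, 60, threshold, horizon)
        rows.append({
            "threshold": threshold,
            "horizon": horizon,
            "signals": len(sp_fwd),
            # Medians are undefined when no trigger fired for this setting.
            "sp_median": np.median(sp_fwd) if len(sp_fwd) else np.nan,
            "btc_median": np.median(btc_fwd) if len(btc_fwd) else np.nan,
        })

grid = pd.DataFrame(rows)
print(grid)
```

With only a handful of signals per cell, differences between settings should be treated as noise until they are confirmed on real data.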