This project studies daily stock data and applies five related tasks to the same dataset:
- exploratory data analysis
- statistical inference
- time-series forecasting
- survival analysis
- ranking
The goal is to work with historical stock prices, understand their behavior, test simple statistical claims, and then build shared models that can be applied across multiple stocks.
- 01_notebooks/01_EDA.ipynb: EDA, anomaly checks, correlations, and regime comparison
- 01_notebooks/02_hypotesis_testing.ipynb: confidence intervals and hypothesis tests
- 02_magic/01_data/download_data.py: downloads raw stock data with yfinance
- 02_magic/01_data/load_data.py: loads and formats the raw CSV
- 02_magic/03_models/01_baseline.ipynb: simple baseline forecasting model
- 02_magic/03_models/02_forcasting_model.ipynb: shared forecasting model for the next-day closing price
- 02_magic/03_models/03_survival.ipynb: pooled survival model for the time until a 5 percent next-day gain
- 02_magic/03_models/04_ranking.ipynb: ranks stocks by predicted next-day gain
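The exact survival target used in 03_survival.ipynb is not spelled out here. A minimal sketch, assuming that a "5 percent next-day gain" means a close-to-close return of at least 5% from one trading day to the next, could turn one ticker's closes into (duration, event) pairs, right-censoring days where no such gain occurs before the data ends. The helper name `time_to_gain` is hypothetical.

```python
from typing import List, Optional, Tuple


def time_to_gain(closes: List[float], threshold: float = 0.05) -> List[Tuple[int, int]]:
    """Hypothetical helper: (duration, event) for each start day of one ticker.

    Assumption: a "gain" is a close-to-close return >= threshold.
    duration = trading days until the first later day with such a gain (event=1),
    or until the last observed day if none occurs (event=0, right-censored).
    The final day is dropped because nothing is observed after it.
    """
    n = len(closes)
    # hit[j] is True when the return from day j-1 to day j reaches the threshold
    hit = [False] + [closes[j] / closes[j - 1] - 1.0 >= threshold for j in range(1, n)]
    result: List[Tuple[int, int]] = [(0, 0)] * n
    next_hit: Optional[int] = None  # nearest qualifying day strictly after day i
    for i in range(n - 1, -1, -1):  # scan backwards so each day sees the next hit
        if next_hit is not None:
            result[i] = (next_hit - i, 1)
        else:
            result[i] = (n - 1 - i, 0)
        if hit[i]:
            next_hit = i
    return result[: n - 1]


closes = [100.0, 101.0, 107.0, 100.0, 99.0, 104.0]  # toy values, not real prices
pairs = time_to_gain(closes)
print(pairs)
```

For a pooled model, these pairs would be computed separately for each ticker and then concatenated, so that no duration runs across two different tickers' series.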
Raw and engineered data are stored in:
Main files:
- stock_prices.csv
- df_eda.csv
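The column layout of stock_prices.csv is not documented here. As a stdlib-only sketch of what a loader such as load_data.py might do, assuming a hypothetical long format with `Date`, `Ticker` and `Close` columns (a wide, one-column-per-ticker export would need a different parser), the rows can be grouped by ticker and sorted by date:

```python
import csv
import io
from collections import defaultdict
from datetime import date
from typing import Dict, List, TextIO, Tuple


def load_prices(f: TextIO) -> Dict[str, List[Tuple[date, float]]]:
    """Group a long-format price CSV by ticker, sorted by date.

    Assumes hypothetical columns Date (YYYY-MM-DD), Ticker and Close.
    Rows with an empty Close are skipped rather than filled.
    """
    prices: Dict[str, List[Tuple[date, float]]] = defaultdict(list)
    for row in csv.DictReader(f):
        if not row["Close"]:
            continue  # skip gaps instead of guessing a value
        prices[row["Ticker"]].append((date.fromisoformat(row["Date"]), float(row["Close"])))
    for series in prices.values():
        series.sort()  # tuples sort by date first
    return dict(prices)


# Toy rows (not real prices), deliberately out of order and with one gap.
sample = io.StringIO(
    "Date,Ticker,Close\n"
    "2024-01-03,AAPL,100.0\n"
    "2024-01-02,AAPL,101.5\n"
    "2024-01-02,MSFT,200.0\n"
    "2024-01-03,MSFT,\n"
)
prices = load_prices(sample)
print(prices)
```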
The downloader currently includes 10 tickers:
AAPL, MSFT, AMZN, GOOGL, META, JPM, JNJ, XOM, PG, HD
Recommended order:
- Run download_data.py to refresh the raw data.
- Run the EDA notebook and save the engineered dataset if needed.
- Run the hypothesis testing notebook.
- Run the baseline model notebook.
- Run the forecasting notebook.
- Run the survival notebook.
- Run the ranking notebook.
Notes:
- The modeling notebooks use one shared model across all stocks and then evaluate its performance separately for each ticker.
- The forecasting notebook predicts the next-day closing price directly.
- The ranking notebook ranks stocks by predicted next-day price gain and compares that ranking against a simple price-momentum baseline.
- If the ticker list changes, rerun the notebooks so that the saved outputs and markdown commentary stay aligned with the latest results.
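As a sketch of the comparison described for the ranking notebook, the code below orders tickers by predicted next-day gain (predicted close divided by last close, minus one) and, as a momentum baseline, by trailing return over an assumed 5-day lookback. It then scores each ranking by the average realized next-day gain of its top picks. The function names, the lookback, and the top-n scoring rule are illustrative assumptions, not the notebook's actual method.

```python
from typing import Dict, List, Sequence


def rank_by_predicted_gain(last_close: Dict[str, float], predicted_close: Dict[str, float]) -> List[str]:
    """Order tickers by predicted next-day gain, highest first."""
    gain = {t: predicted_close[t] / last_close[t] - 1.0 for t in last_close}
    return sorted(gain, key=lambda t: gain[t], reverse=True)


def rank_by_momentum(history: Dict[str, Sequence[float]], lookback: int = 5) -> List[str]:
    """Baseline: order tickers by trailing `lookback`-day return, highest first."""
    momentum = {t: c[-1] / c[-1 - lookback] - 1.0 for t, c in history.items()}
    return sorted(momentum, key=lambda t: momentum[t], reverse=True)


def top_n_realized_gain(ranking: List[str], realized_gain: Dict[str, float], n: int = 2) -> float:
    """Average realized next-day gain of the first n tickers in a ranking."""
    top = ranking[:n]
    return sum(realized_gain[t] for t in top) / len(top)


# Toy inputs, not real market data or model output.
history = {
    "AAPL": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
    "MSFT": [200.0, 198.0, 196.0, 194.0, 192.0, 190.0],
    "XOM": [50.0, 50.0, 50.0, 50.0, 50.0, 51.0],
}
last_close = {t: c[-1] for t, c in history.items()}
predicted_close = {"AAPL": 105.5, "MSFT": 193.8, "XOM": 51.0}
realized_gain = {"AAPL": 0.01, "MSFT": -0.02, "XOM": 0.03}

model_rank = rank_by_predicted_gain(last_close, predicted_close)
momentum_rank = rank_by_momentum(history)
print(model_rank, top_n_realized_gain(model_rank, realized_gain))
print(momentum_rank, top_n_realized_gain(momentum_rank, realized_gain))
```

On these toy numbers the momentum baseline's top two picks realize a higher average gain than the model's, which is the kind of outcome a baseline comparison is meant to surface.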