# Task 1: Preprocess and Explore the Data

Project: Time Series Forecasting for Portfolio Management Optimization (GMF Investments)

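This notebook downloads daily price data for TSLA, BND, and SPY from yfinance (2015-07-01 to 2025-07-31), cleans it, performs exploratory data analysis (EDA), computes returns and rolling volatility, runs Augmented Dickey-Fuller (ADF) stationarity tests, and calculates risk metrics (Value at Risk and the Sharpe ratio).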

In [None]:
# Imports and configuration
import os
import pandas as pd
import numpy as np
from src.constants.config import TICKERS, START_DATE, END_DATE, INTERVAL, AUTO_ADJUST, RISK_FREE_RATE
from src.utils.data_loader import fetch_yfinance_data, merge_adjusted_close
from src.utils.preprocessing import fill_missing_dates, handle_missing, compute_returns
from src.utils.eda import basic_stats, rolling_stats, detect_outliers_zscore, adf_test
from src.utils.metrics import sharpe_ratio, value_at_risk_historic, value_at_risk_parametric
from src.utils.plotting import plot_prices, plot_returns, plot_rolling
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 160)
print('Config:', TICKERS, START_DATE, END_DATE)

In [None]:
# Fetch OHLCV data
raw = fetch_yfinance_data(TICKERS, START_DATE, END_DATE, interval=INTERVAL, auto_adjust=AUTO_ADJUST)
list(raw.keys()), [raw[k].shape for k in raw]

In [None]:
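# Merge each ticker's adjusted close into a single DataFrame (one column per ticker)
# Note: yfinance only returns an 'Adj Close' column when auto_adjust=False;
# with auto_adjust=True the adjusted values are in 'Close' instead.
prices = merge_adjusted_close(raw, column='Adj Close')
# Align all tickers on a common date index, then fill the remaining gaps
prices = fill_missing_dates(prices)
# Interpolation uses later observations; prefer a forward fill if look-ahead matters
prices = handle_missing(prices, method='interpolate')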
prices.head(), prices.isna().sum()
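
The implementations of `fill_missing_dates` and `handle_missing` are not shown in this notebook. As a rough, hypothetical sketch of the idea (not the project's actual code), the following reindexes a toy price table to a Monday-Friday calendar and fills the gap. The function name `fill_business_days` and the toy data are illustrative assumptions, and forward fill is swapped in for interpolation because it relies only on past prices.

```python
import pandas as pd


def fill_business_days(prices: pd.DataFrame) -> pd.DataFrame:
    """Reindex to a Monday-Friday calendar and carry the last known price forward."""
    full_index = pd.bdate_range(prices.index.min(), prices.index.max())
    # Forward fill only: interpolation or a backward fill would use future prices.
    return prices.reindex(full_index).ffill()


# Toy data (assumed values): Wednesday 2024-01-03 is missing.
toy = pd.DataFrame(
    {"TSLA": [100.0, 102.0, 101.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
)
filled = fill_business_days(toy)
print(filled)
```

The missing Wednesday takes Tuesday's price, so the filled table has no NaN values and no information from Thursday leaks backward.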

In [None]:
# EDA: plot price series
plot_prices(prices, title='Adjusted Close Prices (TSLA, BND, SPY)')
stats = basic_stats(prices)
stats

In [None]:
# Compute daily returns and visualize
returns = compute_returns(prices, kind='pct')
plot_returns(returns, title='Daily Percentage Returns')
returns.describe(percentiles=[0.01,0.05,0.95,0.99])
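
`compute_returns` is called with `kind='pct'`, which suggests simple percentage returns; its other options are not shown here. As a hedged illustration of the two usual definitions, the sketch below computes simple and log returns on made-up prices with plain pandas and NumPy. The variable names and numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy prices (assumed values): up 10%, then down 10%.
prices_toy = pd.DataFrame({"TSLA": [100.0, 110.0, 99.0]})

# Simple returns: P_t / P_{t-1} - 1
simple_returns = prices_toy.pct_change().dropna()

# Log returns: ln(P_t / P_{t-1}); these add up across time
log_returns = np.log(prices_toy / prices_toy.shift(1)).dropna()

print(simple_returns["TSLA"].tolist())
print(log_returns["TSLA"].sum(), np.log(99.0 / 100.0))
```

Log returns sum to the log of the total price ratio, which is convenient for multi-period aggregation, while simple returns aggregate across assets in a portfolio.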

In [None]:
# Rolling mean and volatility of daily returns (21 trading days, roughly one month)
roll = rolling_stats(returns, window=21)
plot_rolling(roll['mean'], roll['std'], column='TSLA', title='TSLA Rolling Mean/Std (21d)')
roll['mean'].tail(), roll['std'].tail()
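
To make the rolling-volatility step concrete, here is a minimal pandas sketch on simulated returns. It is not the `rolling_stats` helper itself; the 252-trading-day annualization factor and the simulated data are assumptions.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed number of trading days per year

# Simulated daily returns stand in for the real series.
rng = np.random.default_rng(0)
rets = pd.Series(rng.normal(0.0, 0.02, size=500))

# 21-day rolling standard deviation; the first 20 values are NaN.
rolling_std = rets.rolling(window=21).std()

# Scale daily volatility to an annual figure with the square-root-of-time rule.
annualized_vol = rolling_std * np.sqrt(TRADING_DAYS)

print(annualized_vol.tail())
```

The square-root-of-time scaling assumes returns are uncorrelated from day to day, so it is an approximation when volatility clusters.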

In [None]:
# Outlier detection for TSLA daily returns
out_tsla = detect_outliers_zscore(returns['TSLA'], threshold=3.0)
out_tsla[out_tsla['is_outlier']].head()
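
The z-score rule used above flags observations that lie more than `threshold` standard deviations from the mean. The sketch below shows one plausible way to implement it; the function name `zscore_outliers` and the output columns other than `is_outlier` are assumptions, and the real `detect_outliers_zscore` may differ.

```python
import pandas as pd


def zscore_outliers(series: pd.Series, threshold: float = 3.0) -> pd.DataFrame:
    """Flag values whose absolute z-score exceeds the threshold."""
    z = (series - series.mean()) / series.std(ddof=0)
    return pd.DataFrame(
        {"value": series, "zscore": z, "is_outlier": z.abs() > threshold}
    )


# Toy returns (assumed values): small alternating moves plus one 50% jump.
data = [0.01, -0.01] * 50
data[10] = 0.50
flagged = zscore_outliers(pd.Series(data))
outliers = flagged[flagged["is_outlier"]]
print(outliers)
```

Because extreme values inflate the standard deviation they are measured against, a fixed threshold of 3 can miss outliers in heavy-tailed series such as daily equity returns.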

In [None]:
# Stationarity tests (ADF) on prices and returns
# ADF null hypothesis: the series has a unit root (is non-stationary); a small p-value rejects it
adf_prices_tsla = adf_test(prices['TSLA'])
adf_returns_tsla = adf_test(returns['TSLA'])
adf_prices_tsla, adf_returns_tsla
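
To show what the test measures, the sketch below implements a simplified Dickey-Fuller regression with NumPy only. This is not the full ADF test and not the `adf_test` helper: it omits the lagged-difference terms that the augmented version (for example `statsmodels.tsa.stattools.adfuller`) adds, and the function name, simulated data, and seed are assumptions. The statistic is the t-ratio of gamma in `diff(y)_t = alpha + gamma * y_{t-1} + e_t`, compared against the asymptotic 5% critical value of about -2.86 for a regression with a constant and no trend.

```python
import numpy as np


def dickey_fuller_stat(y) -> float:
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS coefficient covariance
    return float(beta[1] / np.sqrt(cov[1, 1]))


CRIT_5PCT = -2.86  # asymptotic 5% critical value (constant, no trend)

rng = np.random.default_rng(42)
shocks = rng.normal(size=2000)
random_walk = np.cumsum(shocks)  # price-like series with a unit root
white_noise = shocks             # return-like stationary series

stat_rw = dickey_fuller_stat(random_walk)
stat_wn = dickey_fuller_stat(white_noise)
print("random walk:", stat_rw, "reject unit root:", stat_rw < CRIT_5PCT)
print("white noise:", stat_wn, "reject unit root:", stat_wn < CRIT_5PCT)
```

A statistic well below the critical value rejects the unit root, which is the typical result for returns; a random walk will usually fail to reject, which is the typical result for price levels.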

In [None]:
# Risk metrics for TSLA: Sharpe ratio (daily returns, annualized) and 95% VaR (historical and parametric)
tsla_sharpe = sharpe_ratio(returns['TSLA'], risk_free_rate=RISK_FREE_RATE, freq='daily')
tsla_var_hist_95 = value_at_risk_historic(returns['TSLA'].dropna(), alpha=0.95)
tsla_var_param_95 = value_at_risk_parametric(returns['TSLA'].dropna(), alpha=0.95)
{'TSLA_Sharpe': tsla_sharpe, 'TSLA_VaR_hist_95': tsla_var_hist_95, 'TSLA_VaR_param_95': tsla_var_param_95}
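
The helpers in `src.utils.metrics` are not shown, so their sign conventions and the exact meaning of `alpha=0.95` are not confirmed here. The following is a minimal sketch of the standard definitions under stated assumptions: 252 trading days per year, an annual risk-free rate converted to a daily rate by simple division, a normal distribution for the parametric VaR, and VaR reported as a positive one-day loss. All function names and the simulated sample are hypothetical.

```python
import numpy as np
from statistics import NormalDist

TRADING_DAYS = 252  # assumed number of trading days per year


def annualized_sharpe(daily_returns, annual_rf=0.02):
    """Mean daily excess return over its std, scaled by sqrt(252)."""
    r = np.asarray(daily_returns, dtype=float)
    excess = r - annual_rf / TRADING_DAYS
    return excess.mean() / excess.std(ddof=1) * np.sqrt(TRADING_DAYS)


def var_historic(daily_returns, confidence=0.95):
    """Empirical loss exceeded on roughly (1 - confidence) of days."""
    return -np.quantile(daily_returns, 1.0 - confidence)


def var_parametric(daily_returns, confidence=0.95):
    """Same loss threshold, assuming normally distributed returns."""
    r = np.asarray(daily_returns, dtype=float)
    z = NormalDist().inv_cdf(1.0 - confidence)  # about -1.645 at 95%
    return -(r.mean() + z * r.std(ddof=1))


# Simulated normal returns (assumed parameters) stand in for TSLA.
rng = np.random.default_rng(1)
sample = rng.normal(0.001, 0.02, size=100_000)

print("Sharpe:", annualized_sharpe(sample))
print("VaR hist 95%:", var_historic(sample))
print("VaR param 95%:", var_parametric(sample))
```

On normally distributed data the two VaR estimates nearly coincide; on real equity returns with fat tails the historical estimate is often larger, which is one reason to report both.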

## Notes and Insights
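- Summarize the price trend of each asset and the shape of its return distribution (mean, dispersion, tails).
- Comment on stationarity (ADF): prices are likely non-stationary and returns are typically stationary, which is why forecasting models are usually fit on differenced prices or returns.
- Discuss volatility clustering (calm and turbulent stretches in the rolling standard deviation) and what it implies for modeling.
- Connect the findings to portfolio decisions (risk management, allocation hints).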