# Task 1 - Brent Oil Prices: Initial EDA

This notebook performs the initial exploratory data analysis (EDA) required for Task 1:
- Load and validate the Brent oil price time series
- Visualize the raw price series (trend, major shocks)
- Compute and visualize log returns (volatility clustering)
- Run basic stationarity tests (ADF) on prices and log returns
- Plot a 30-day rolling volatility proxy (annualized)

Outputs (figures) are saved to `reports/figures/`.
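
The log returns mentioned above are computed in this notebook by `compute_log_returns` from `src.data`, whose implementation is not shown here. As a minimal sketch, assuming the helper follows the usual definition r_t = log(P_t) - log(P_{t-1}) and drops the first (undefined) observation, an equivalent computation might look like this. The name `log_returns_sketch` and the toy prices are hypothetical.

```python
import numpy as np
import pandas as pd


def log_returns_sketch(prices: pd.Series) -> pd.Series:
    """Hypothetical stand-in for compute_log_returns (an assumption, not the project's code)."""
    # r_t = log(P_t) - log(P_{t-1}); the first element has no predecessor, so drop it.
    return np.log(prices.astype(float)).diff().dropna()


# Toy prices, for illustration only (not Brent data).
toy_prices = pd.Series([100.0, 110.0, 99.0])
toy_returns = log_returns_sketch(toy_prices)

# Log returns are additive: their sum equals the log of the total price ratio.
total = toy_returns.sum()
print(len(toy_returns), np.isclose(total, np.log(99.0 / 100.0)))
```

Additivity is one reason log returns are convenient here: a multi-day return is simply the sum of the daily ones.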

In [None]:
import sys
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import adfuller

# Project root (works whether launched from repo root or notebooks/)
ROOT = Path.cwd().resolve()
if not (ROOT / 'data').exists():
    ROOT = ROOT.parent
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

from src.data import load_brent_prices, compute_log_returns

sns.set_style('whitegrid')
FIG_DIR = ROOT / 'reports' / 'figures'
FIG_DIR.mkdir(parents=True, exist_ok=True)

In [None]:
df = load_brent_prices()
df.head()

In [None]:
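# Shape, date range, basic validity checks (sorted dates, count of missing values), and price summary
df.shape, df['Date'].min(), df['Date'].max(), df['Date'].is_monotonic_increasing, int(df.isna().sum().sum()), df['Price'].describe()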

In [None]:
plt.figure(figsize=(12, 4))
plt.plot(df['Date'], df['Price'], linewidth=1)
plt.title('Brent Oil Price (USD/barrel) - Daily')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.tight_layout()
out = FIG_DIR / 'price_series.png'
plt.savefig(out, dpi=160)
plt.show()
out

In [None]:
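# Log return r_t = log(P_t) - log(P_{t-1}); the first price has no return, so drop any leading NaN
log_returns = compute_log_returns(df['Price']).dropna()
# Align dates to the returns (equivalent to df['Date'].iloc[1:] when only the first return is missing)
returns_df = pd.DataFrame({'Date': df['Date'].iloc[-len(log_returns):].values, 'log_return': log_returns.values})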
returns_df.head()

In [None]:
plt.figure(figsize=(12, 4))
plt.plot(returns_df['Date'], returns_df['log_return'], linewidth=0.8)
plt.title(r'Log Returns: $\log(P_t) - \log(P_{t-1})$')
plt.xlabel('Date')
plt.ylabel('Log return')
plt.tight_layout()
out = FIG_DIR / 'log_returns.png'
plt.savefig(out, dpi=160)
plt.show()
out

In [None]:
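def adf_test(series, name: str) -> dict:
    """Augmented Dickey-Fuller test. H0: the series has a unit root (is non-stationary)."""
    series = pd.Series(series).dropna().astype(float)
    # adfuller returns (statistic, p-value, lags used, nobs used, critical values, best IC)
    stat, pvalue, usedlag, _, crit, _ = adfuller(series, autolag='AIC')
    return {
        'series': name,
        'adf_stat': stat,
        'p_value': pvalue,  # small p-value -> reject the unit-root null
        'lags_used': usedlag,
        'n': len(series),
        'crit_1%': crit.get('1%'),
        'crit_5%': crit.get('5%'),
        'crit_10%': crit.get('10%'),
    }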

pd.DataFrame([
    adf_test(df['Price'], 'Price'),
    adf_test(returns_df['log_return'], 'Log returns'),
]).set_index('series')

In [None]:
# Volatility proxy: rolling std of daily log returns, annualized with sqrt(252) (assumes ~252 trading days per year)
window = 30
returns_df['roll_vol_30d'] = returns_df['log_return'].rolling(window).std() * np.sqrt(252)

plt.figure(figsize=(12, 4))
plt.plot(returns_df['Date'], returns_df['roll_vol_30d'], linewidth=1)
plt.title('Rolling Volatility (30-day std of log returns, annualized)')
plt.xlabel('Date')
plt.ylabel('Annualized volatility')
plt.tight_layout()
out = FIG_DIR / 'rolling_volatility_30d.png'
plt.savefig(out, dpi=160)
plt.show()
out

## Initial findings (to carry into the report)

- The raw price series exhibits long-run trend shifts and shocks, so modeling the *price level* with a constant-mean Gaussian likelihood is usually inappropriate.
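- Log returns are much closer to stationary than price levels: the ADF test typically rejects the unit-root null for returns but not for prices. Returns still show volatility clustering, so their variance is not constant over time.
- Volatility is time-varying (see the 30-day rolling estimate), which suggests that change point models and/or regime-switching volatility models can be informative.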
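
One simple way to check the volatility-clustering claim is to compare the lag-1 autocorrelation of returns with that of squared returns: clustering shows up as clearly positive autocorrelation in the squares even when the returns themselves are roughly uncorrelated. The sketch below uses synthetic two-regime returns (calm, then turbulent) rather than the Brent data. The regime volatilities, sample sizes, and seed are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic returns (an assumption for illustration, not Brent data):
# a calm regime followed by a turbulent one, each i.i.d. Gaussian.
calm = rng.normal(loc=0.0, scale=0.01, size=5000)
turbulent = rng.normal(loc=0.0, scale=0.03, size=5000)
synthetic_returns = pd.Series(np.concatenate([calm, turbulent]))

# Lag-1 autocorrelation of returns vs. squared returns.
acf_returns = synthetic_returns.autocorr(lag=1)
acf_squared = (synthetic_returns ** 2).autocorr(lag=1)

print(f"lag-1 autocorr, returns:         {acf_returns:+.3f}")
print(f"lag-1 autocorr, squared returns: {acf_squared:+.3f}")
```

The same two calls could be applied to `returns_df['log_return']` from this notebook. A persistent gap between the two autocorrelations would be consistent with the regime-style volatility changes noted above, though it is a descriptive check rather than a formal test.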