# Task 1: Foundation and Data Setup - Time Series Properties

## Objective
Analyze the Brent oil price data to understand its key properties: trend, stationarity, and volatility. These properties inform the modeling choices for change point detection.

## 1. Load Data


In [5]:
import sys
import os

# Add the project root (the parent directory) to the path so that `src` is importable
sys.path.append(os.path.abspath('..'))

from src.data_loader import load_data, calculate_log_returns
from src.analysis import plot_time_series, check_stationarity, plot_rolling_volatility
import matplotlib.pyplot as plt

# Load data
file_path = '../data/BrentOilPrices.csv'
df = load_data(file_path)

# Display first few rows
print(df.head())
print(df.info())


Error loading data: time data "Apr 22, 2020" doesn't match format "%d-%b-%y", at position 8360. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
Empty DataFrame
Columns: []
Index: []
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 0 entries
Empty DataFrame
None
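
The error message shows that the loader parses dates with the format `%d-%b-%y`, but at position 8360 the file contains a date written as `Apr 22, 2020`, so the load fails and `df` is empty. The source of `load_data` is not shown here, so the following is only a minimal sketch of one way to accept both layouts. It assumes the second layout is `%b %d, %Y` (inferred from the single failing value); the name `parse_mixed_date` and the sample string `20-May-87` are hypothetical.

```python
from datetime import datetime

# Formats to try, in order. The first comes from the error message; the second
# is an assumption inferred from the failing value "Apr 22, 2020".
DATE_FORMATS = ("%d-%b-%y", "%b %d, %Y")

def parse_mixed_date(text):
    """Parse a date string written in any of DATE_FORMATS."""
    cleaned = text.strip().strip('"')
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {text!r}")

# Sample values in both layouts (the first one is hypothetical)
print(parse_mixed_date("20-May-87"))
print(parse_mixed_date("Apr 22, 2020"))
```

With pandas 2.x, passing `format='mixed'` to `pd.to_datetime`, as the error message itself suggests, may be a simpler alternative inside `load_data`.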


## 2. Trend Analysis
We visually inspect the price series to identify long-term trends and potential structural breaks.


In [None]:
plot_time_series(df, 'Price', 'Brent Oil Prices (1987 - 2022)', 'Price (USD/Barrel)')


## 3. Stationarity Testing
We use the Augmented Dickey-Fuller (ADF) test to check for stationarity.

- Null hypothesis (H0): the time series has a unit root (is non-stationary).
- Alternative hypothesis (H1): the time series has no unit root (is stationary).


In [None]:
print("ADF Test for Raw Prices:")
check_stationarity(df['Price'])
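
`check_stationarity` lives in `src.analysis` and is not shown here. As a rough sketch of what such a helper might do, the following wraps `adfuller` from statsmodels and applies the hypotheses above. The names `interpret_adf` and `run_adf` and the default 5% significance level are assumptions, not the project's actual implementation.

```python
def interpret_adf(p_value, alpha=0.05):
    """Return a verdict for the ADF test given its p-value."""
    if p_value < alpha:
        # Reject H0 (unit root): evidence that the series is stationary
        return "stationary"
    # Fail to reject H0: treat the series as non-stationary
    return "non-stationary"

def run_adf(series, alpha=0.05):
    """Run the ADF test on a NaN-free, array-like series and print a summary."""
    # Imported here so interpret_adf can be used without statsmodels installed
    from statsmodels.tsa.stattools import adfuller

    stat, p_value, used_lag, n_obs, crit_values, _ = adfuller(series, autolag="AIC")
    print(f"ADF statistic: {stat:.4f}")
    print(f"p-value:       {p_value:.4f}")
    for level, value in crit_values.items():
        print(f"critical value ({level}): {value:.4f}")
    print(f"verdict at alpha={alpha}: {interpret_adf(p_value, alpha)}")
    return stat, p_value
```

Once the data loads, a call such as `run_adf(df['Price'].dropna())` would report on the raw prices.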


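If the ADF test fails to reject H0 for the raw prices (p-value > 0.05), we treat the series as non-stationary and transform it to log returns, which are typically much closer to stationary:
$$ r_t = \ln(P_t) - \ln(P_{t-1}) $$
where $P_t$ is the price at time $t$.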


In [None]:
# Calculate Log Returns
df = calculate_log_returns(df)

print("ADF Test for Log Returns:")
# Drop the NaN that differencing leaves in the first row, if it is still present
check_stationarity(df['Log_Returns'].dropna())

plot_time_series(df, 'Log_Returns', 'Brent Oil Price Log Returns', 'Log Return', color='green')
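
`calculate_log_returns` is defined in `src.data_loader` and is not shown here. A minimal sketch of the log return formula in plain Python, using a hypothetical function name and made-up prices, might look like this:

```python
import math

def log_returns(prices):
    """Return r_t = ln(P_t) - ln(P_{t-1}) for each consecutive pair of prices."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(prices, prices[1:])]

# Made-up prices, for illustration only
prices = [100.0, 110.0, 99.0]
returns = log_returns(prices)
print(returns)

# Log returns are additive: their sum equals the log of the total price ratio
assert math.isclose(sum(returns), math.log(prices[-1] / prices[0]))
```

The result has one fewer element than the input, which is why a pandas version such as `np.log(df['Price']).diff()` leaves a NaN in the first row.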


## 4. Volatility Analysis
We examine volatility clustering using the rolling standard deviation of the log returns.


In [None]:
# Rolling Volatility (30-day window)
plot_rolling_volatility(df['Log_Returns'], window=30)
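
`plot_rolling_volatility` is not shown here. As a sketch of the underlying computation, assuming it uses the sample standard deviation over a trailing window, the rolling volatility could be computed as follows; `rolling_std` and the sample returns are hypothetical.

```python
from statistics import stdev

def rolling_std(values, window):
    """Sample standard deviation over each trailing window of `window` values.

    Positions without a full window are returned as None.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # not enough history yet
        else:
            result.append(stdev(values[i + 1 - window : i + 1]))
    return result

# Made-up returns: a calm stretch followed by a turbulent one
returns = [0.001, -0.002, 0.001, 0.000, 0.030, -0.040, 0.050]
print(rolling_std(returns, window=3))
```

In this toy series the larger values all fall in the turbulent stretch at the end, which is the clustering pattern the 30-day plot is meant to reveal. With pandas, `series.rolling(window=30).std()` computes the same quantity, since it also defaults to the sample standard deviation.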
