# Economic Time Series Analysis

A complete tutorial on analyzing macroeconomic time series data, covering stationarity testing, ARIMA forecasting, and Granger causality testing.

## Dataset

Quarterly economic indicators (2020-2025):
- **GDP**: Real GDP (billions)
- **Inflation**: CPI year-over-year %
- **Unemployment**: Unemployment rate %
- **Interest Rate**: Federal funds rate %
- **Stock Index**: Market index (points)

## Methods
- Stationarity testing (ADF, KPSS)
- ARIMA modeling
- Granger causality
- Forecasting
- Correlation analysis

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss, grangercausalitytests
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('Set2')
%matplotlib inline

print("✓ Setup complete")

## 1. Load and Explore Data

In [None]:
# Load data
df = pd.read_csv('sample_economic_data.csv')
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

print(f"Data shape: {df.shape}")
print(f"Period: {df.index[0].date()} to {df.index[-1].date()}")
print(f"\nVariables: {', '.join(df.columns)}")

df.head(10)
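If `sample_economic_data.csv` is not at hand, a synthetic stand-in lets the rest of the notebook run. This is a minimal sketch, not the original dataset. The helper `make_sample_economic_data` is hypothetical. The column names `gdp`, `inflation`, `unemployment`, `interest_rate` and `stock_index` are assumptions: the first four are used elsewhere in this notebook, and `stock_index` is a guess. The quarter-start dates and all magnitudes are also assumptions, chosen to mimic trending macro series.

```python
import numpy as np
import pandas as pd

def make_sample_economic_data(n_quarters=24, seed=0):
    """Build a synthetic quarterly panel shaped like sample_economic_data.csv.

    Hypothetical stand-in: column names and magnitudes are assumptions, not real data.
    """
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2020-01-01", periods=n_quarters, freq="QS", name="date")
    # Random walks (some with drift) are non-stationary in levels, like many macro series
    data = {
        "gdp": 20000 + np.cumsum(100 + rng.normal(0, 150, n_quarters)),
        "inflation": 2 + np.cumsum(rng.normal(0, 0.4, n_quarters)),
        "unemployment": 5 + np.cumsum(rng.normal(0, 0.3, n_quarters)),
        # Clip at zero so the policy rate never goes negative in this toy example
        "interest_rate": np.clip(1 + np.cumsum(rng.normal(0.1, 0.25, n_quarters)), 0, None),
        "stock_index": 3000 + np.cumsum(30 + rng.normal(0, 120, n_quarters)),
    }
    return pd.DataFrame(data, index=dates)
```

Saving it with `make_sample_economic_data().to_csv('sample_economic_data.csv')` writes the `date` index as a column, which is what the loading cell expects. Any results obtained from this synthetic data are illustrative only.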

In [None]:
# Summary statistics
print("Summary Statistics:")
print(df.describe())

## 2. Visualize Time Series

In [None]:
fig, axes = plt.subplots(3, 2, figsize=(15, 12))
axes = axes.flatten()

variables = df.columns
colors = ['blue', 'red', 'green', 'purple', 'orange']

for idx, (var, color) in enumerate(zip(variables, colors)):
    axes[idx].plot(df.index, df[var], marker='o', linewidth=2, color=color)
    axes[idx].set_title(f'{var.replace("_", " ").title()}', fontweight='bold')
    axes[idx].set_xlabel('Quarter')
    axes[idx].set_ylabel(var.replace('_', ' ').title())
    axes[idx].grid(True, alpha=0.3)
    axes[idx].tick_params(axis='x', rotation=45)

axes[5].axis('off')

plt.tight_layout()
plt.show()

## 3. Correlation Analysis

In [None]:
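# Correlation matrix of levels (trending series can correlate spuriously;
# compare with df.diff().corr() before reading too much into these values)
correlation = df.corr()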

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(correlation, annot=True, fmt='.2f', cmap='coolwarm', 
           center=0, square=True, ax=ax, cbar_kws={'label': 'Correlation'})
ax.set_title('Economic Indicators Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nKey Correlations (|r| > 0.5):")
for i in range(len(correlation.columns)):
    for j in range(i+1, len(correlation.columns)):
        corr_val = correlation.iloc[i, j]
        if abs(corr_val) > 0.5:
            print(f"  {correlation.columns[i]} vs {correlation.columns[j]}: {corr_val:.3f}")
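Correlations between trending level series deserve caution. Two unrelated non-stationary series often show a sizable correlation simply because both wander over time. The sketch below illustrates this with two independent simulated random walks (synthetic data, arbitrary seed). It compares the correlation of their levels with the correlation of their first differences, which should sit near zero.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200

# Two independent random walks: by construction there is no relationship between them
walks = pd.DataFrame({
    "a": np.cumsum(rng.normal(size=n)),
    "b": np.cumsum(rng.normal(size=n)),
})

level_corr = walks.corr().loc["a", "b"]                 # often large in magnitude by chance
diff_corr = walks.diff().dropna().corr().loc["a", "b"]  # should be close to zero

print(f"Correlation in levels:      {level_corr:+.3f}")
print(f"Correlation in differences: {diff_corr:+.3f}")
```

To run the same check on this notebook's data, compare `df.corr()` with `df.diff().dropna().corr()`.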

## 4. Stationarity Testing

In [None]:
def test_stationarity(series, name):
    """Test stationarity using ADF and KPSS tests."""
    print(f"\n{'='*60}")
    print(f"Stationarity Tests: {name}")
    print('='*60)
    
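    # Augmented Dickey-Fuller test (H0: the series has a unit root)
    adf_result = adfuller(series.dropna())
    print(f"\nADF Test:")
    print(f"  Test Statistic: {adf_result[0]:.4f}")
    print(f"  P-value: {adf_result[1]:.4f}")
    print(f"  Critical Values: {adf_result[4]}")
    print(f"  Result: {'Stationary' if adf_result[1] < 0.05 else 'Non-stationary'}")
    
    # KPSS test (H0: the series is level-stationary). regression='c' matches ADF's
    # default constant-only setup; 'ct' would test trend-stationarity instead.
    # statsmodels reports KPSS p-values only within [0.01, 0.10] (values are truncated).
    kpss_result = kpss(series.dropna(), regression='c', nlags='auto')
    print(f"\nKPSS Test:")
    print(f"  Test Statistic: {kpss_result[0]:.4f}")
    print(f"  P-value: {kpss_result[1]:.4f}")
    print(f"  Critical Values: {kpss_result[3]}")
    print(f"  Result: {'Stationary' if kpss_result[1] > 0.05 else 'Non-stationary'}")
    
    # Stationary only if ADF rejects a unit root AND KPSS does not reject stationarity
    return adf_result[1] < 0.05 and kpss_result[1] > 0.05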

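# Test each variable in levels
stationarity_results = {}
for col in df.columns:
    is_stationary = test_stationarity(df[col], col)
    stationarity_results[col] = is_stationary

print("\nStationary in levels (ADF and KPSS agree):")
for col, ok in stationarity_results.items():
    print(f"  {col}: {ok}")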
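ADF and KPSS have opposite null hypotheses (ADF: unit root; KPSS: stationarity), so their p-values are best read together. The helper below, `interpret_adf_kpss`, is a hypothetical name. It encodes a common four-case rule of thumb and assumes a 5% level, with both tests run using the same deterministic terms. Keep in mind that statsmodels truncates KPSS p-values to the range 0.01 to 0.10. With at most 24 quarterly observations (2020-2025), both tests also have limited power.

```python
def interpret_adf_kpss(adf_p, kpss_p, alpha=0.05):
    """Hypothetical helper: combine ADF (H0: unit root) and KPSS (H0: stationary) p-values."""
    adf_stationary = adf_p < alpha      # rejecting a unit root points toward stationarity
    kpss_stationary = kpss_p >= alpha   # not rejecting stationarity is consistent with it
    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    if kpss_stationary:
        # ADF cannot reject a unit root, KPSS cannot reject stationarity
        return "conflicting (KPSS only): possibly trend-stationary or low test power"
    # ADF rejects a unit root, but KPSS rejects stationarity
    return "conflicting (ADF only): possibly difference-stationary; try differencing"
```

For a series `s`, a call would look like `interpret_adf_kpss(adfuller(s)[1], kpss(s, regression='c')[1])`.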

## 5. Differencing for Stationarity

In [None]:
# Create differenced series
df_diff = df.diff().dropna()

# Plot original vs differenced
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Original GDP
axes[0].plot(df.index, df['gdp'], marker='o', linewidth=2)
axes[0].set_title('GDP (Original)', fontweight='bold')
axes[0].set_ylabel('GDP (billions)')
axes[0].grid(True, alpha=0.3)

# Differenced GDP
axes[1].plot(df_diff.index, df_diff['gdp'], marker='o', linewidth=2, color='orange')
axes[1].axhline(0, color='red', linestyle='--', alpha=0.5)
axes[1].set_title('GDP (First Difference)', fontweight='bold')
axes[1].set_ylabel('Change in GDP')
axes[1].set_xlabel('Quarter')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Test differenced series
test_stationarity(df_diff['gdp'], 'GDP (Differenced)')

## 6. ARIMA Modeling

In [None]:
# ACF and PACF plots for differenced GDP (PACF lags must stay below half the sample size)
fig, axes = plt.subplots(1, 2, figsize=(14, 4))

plot_acf(df_diff['gdp'].dropna(), lags=8, ax=axes[0])
axes[0].set_title('Autocorrelation Function (ACF)', fontweight='bold')

plot_pacf(df_diff['gdp'].dropna(), lags=8, ax=axes[1])
axes[1].set_title('Partial Autocorrelation Function (PACF)', fontweight='bold')

plt.tight_layout()
plt.show()

In [None]:
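# Fit ARIMA model for GDP
# ARIMA(1,1,1) is a starting point suggested by the ACF/PACF plots.
# With d=1, statsmodels includes no drift by default (trend='n'); trend='t' adds one.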
model = ARIMA(df['gdp'], order=(1, 1, 1))
results = model.fit()

print(results.summary())

# Model diagnostics
fig = results.plot_diagnostics(figsize=(14, 10))
plt.tight_layout()
plt.show()

## 7. Forecasting

In [None]:
# Forecast 4 quarters ahead (point forecasts and intervals from one call)
forecast_steps = 4
forecast_res = results.get_forecast(steps=forecast_steps)
forecast = forecast_res.predicted_mean
forecast_ci = forecast_res.conf_int(alpha=0.05)  # 95% interval: lower/upper columns

# Quarter-start dates after the last observation (assumes the data use quarter-start dates)
forecast_index = pd.date_range(start=df.index[-1] + pd.DateOffset(months=3), 
                               periods=forecast_steps, freq='QS')

# Plot
fig, ax = plt.subplots(figsize=(14, 6))

# Historical data
ax.plot(df.index, df['gdp'], marker='o', linewidth=2, label='Historical')

# Forecast
ax.plot(forecast_index, forecast, marker='o', linewidth=2, 
       color='red', linestyle='--', label='Forecast')

# Confidence interval
ax.fill_between(forecast_index, 
               forecast_ci.iloc[:, 0], 
               forecast_ci.iloc[:, 1], 
               color='red', alpha=0.2, label='95% CI')

ax.set_title('GDP Forecast (ARIMA)', fontsize=14, fontweight='bold')
ax.set_xlabel('Quarter')
ax.set_ylabel('GDP (billions)')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nGDP Forecasts:")
for date, value, lower, upper in zip(forecast_index, forecast, 
                                     forecast_ci.iloc[:, 0], 
                                     forecast_ci.iloc[:, 1]):
    # strftime has no portable quarter directive, so build the label from Timestamp.quarter
    print(f"  {date.year}-Q{date.quarter}: {value:.0f} (95% CI: [{lower:.0f}, {upper:.0f}])")

## 8. Granger Causality Tests

In [None]:
def granger_test(data, var1, var2, max_lag=4):
    """Test whether var1 Granger-causes var2, i.e. whether past var1 improves forecasts of var2.

    The test assumes stationary inputs, so pass differenced data.
    """
    print(f"\n{'='*60}")
    print(f"Granger Causality: Does {var1} Granger-cause {var2}?")
    print('='*60)
    
    # grangercausalitytests asks whether the SECOND column helps predict the FIRST
    test_data = data[[var2, var1]].dropna()
    gc_results = grangercausalitytests(test_data, maxlag=max_lag, verbose=False)
    
    # Extract the SSR-based F-test p-value at each lag
    p_values = [gc_results[lag][0]['ssr_ftest'][1] for lag in range(1, max_lag+1)]
    
    print("\nP-values by lag:")
    for lag, p_val in enumerate(p_values, 1):
        sig = "***" if p_val < 0.01 else ("**" if p_val < 0.05 else ("*" if p_val < 0.1 else "ns"))
        print(f"  Lag {lag}: {p_val:.4f} {sig}")
    
    # Taking the minimum p-value over several lags inflates the false-positive rate,
    # so a single significant lag is suggestive rather than conclusive
    min_p = min(p_values)
    causes = min_p < 0.05
    print(f"\n  Conclusion: {var1} {'may' if causes else 'does not appear to'} Granger-cause {var2} (min p = {min_p:.4f})")
    
    return causes

# Test key relationships on first differences, since the levels are non-stationary
granger_test(df_diff, 'interest_rate', 'inflation')
granger_test(df_diff, 'inflation', 'interest_rate')
granger_test(df_diff, 'unemployment', 'inflation')
granger_test(df_diff, 'gdp', 'unemployment')

## 9. Summary Report

In [None]:
# Generate summary
summary = pd.DataFrame({
    'Variable': df.columns,
    'Mean': df.mean(),
    'Std Dev': df.std(),
    'Min': df.min(),
    'Max': df.max(),
    # Net change from the first to the last quarter (not a fitted trend)
    'Trend': ['Increasing' if df[col].iloc[-1] > df[col].iloc[0] else 'Decreasing' 
             for col in df.columns]
})

print("="*80)
print("ECONOMIC TIME SERIES ANALYSIS SUMMARY")
print("="*80)
print(f"\nPeriod: {df.index[0].year}-Q{df.index[0].quarter} to {df.index[-1].year}-Q{df.index[-1].quarter}")
print(f"Observations: {len(df)}")
print("\nVariable Statistics:")
print(summary.to_string(index=False))
print("="*80)

# Save
summary.to_csv('economic_analysis_summary.csv', index=False)
print("\n✓ Summary saved to economic_analysis_summary.csv")
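The `Trend` column compares only the first and last observations, so a single unusual quarter at either end can flip it. A least-squares slope over all quarters is a more robust summary. This sketch uses a hypothetical helper, `linear_trend_per_period`. It assumes evenly spaced quarters with no missing values, because `np.polyfit` does not skip NaNs.

```python
import numpy as np
import pandas as pd

def linear_trend_per_period(df):
    """Hypothetical helper: OLS slope of each column against a 0..n-1 time index."""
    t = np.arange(len(df))
    slopes = {
        # np.polyfit with deg=1 returns [slope, intercept]
        col: np.polyfit(t, df[col].to_numpy(dtype=float), deg=1)[0]
        for col in df.columns
    }
    return pd.Series(slopes, name="slope_per_quarter")
```

`linear_trend_per_period(df)` gives each variable's average change per quarter, in that variable's own units.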

## Key Findings

### Time Series Characteristics
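- Most economic variables are **non-stationary** in levels
- First differencing achieves stationarity for GDP; rerun `test_stationarity` on the other columns of `df_diff` to confirm it for them
- Strong **autocorrelation** is present in all level series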

### ARIMA Modeling
- ARIMA(1,1,1) provides a good fit for GDP; compare its AIC against neighboring orders before relying on it
- Residual diagnostics are consistent with white noise
- 4-quarter-ahead forecasts with 95% confidence intervals

### Granger Causality
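- Interest rates may Granger-cause inflation (monetary policy transmission), and inflation may Granger-cause interest rates (policy reaction)
- GDP changes may help predict unemployment changes, consistent with Okun's Law
- Bidirectional relationships are common in macro data; Granger causality measures predictive precedence, not structural causation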

### Correlations
- Strong negative: Unemployment vs Stock Index
- Strong positive: GDP vs Stock Index
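- Interest rates co-move with inflation, though correlation alone does not show which responds to which; level correlations between trending series can be spurious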

## Next Steps

1. **VAR modeling**: Multivariate analysis of the joint dynamics
2. **Cointegration**: Test long-run relationships
3. **GARCH models**: Model volatility
4. **Structural breaks**: Test for regime changes
5. **Real data**: Use FRED API for actual economic data

## Resources

- [Statsmodels Documentation](https://www.statsmodels.org/)
- [FRED Economic Data](https://fred.stlouisfed.org/)
- [Forecasting: Principles and Practice (Hyndman & Athanasopoulos)](https://otexts.com/fpp3/)