# Macroeconomic Forecasting Analysis

This notebook demonstrates:
1. Generating sample economic time series data
2. Uploading data to S3
3. Querying forecast results from DynamoDB
4. Visualizing predictions with confidence intervals
5. Comparing multiple forecasting models
6. Analyzing forecast uncertainty
7. Exporting results and estimating AWS costs

**Prerequisites:**
- AWS credentials configured
- S3 bucket created
- Lambda function deployed
- DynamoDB table created

## 1. Setup and Imports

In [None]:
import boto3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from boto3.dynamodb.conditions import Key
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

print("✓ Imports successful")

## 2. Configuration

Update these with your AWS resource names:

In [None]:
# AWS Configuration
BUCKET_NAME = 'economic-data-YOUR-ID'  # Update with your bucket name
TABLE_NAME = 'EconomicForecasts'
AWS_REGION = 'us-east-1'

# Initialize AWS clients
s3_client = boto3.client('s3', region_name=AWS_REGION)
dynamodb = boto3.resource('dynamodb', region_name=AWS_REGION)
table = dynamodb.Table(TABLE_NAME)

# Creating clients does not contact AWS, so these names are not yet verified
print(f"✓ Configured S3 bucket: {BUCKET_NAME}")
print(f"✓ Configured DynamoDB table: {TABLE_NAME}")

## 3. Create Sample Economic Data

Generate sample time series for testing (or skip if using real data):

In [None]:
def create_sample_gdp_data(start_year=2018, end_year=2023):
    """Create sample quarterly GDP data."""
    dates = pd.date_range(
        start=f'{start_year}-01-01',
        end=f'{end_year}-10-01',
        freq='QS'  # quarter-start dates; 'Q' yields quarter ends and drops the final quarter
    )
    
    # Simulate GDP growth with trend and seasonality
    n = len(dates)
    trend = np.linspace(20000, 25000, n)
    seasonal = 200 * np.sin(np.arange(n) * 2 * np.pi / 4)
    noise = np.random.normal(0, 100, n)
    
    values = trend + seasonal + noise
    
    df = pd.DataFrame({
        'date': dates,
        'value': values
    })
    
    return df

# Create sample data
gdp_data = create_sample_gdp_data()

print(f"✓ Created {len(gdp_data)} quarters of GDP data")
print(f"  Date range: {gdp_data['date'].min()} to {gdp_data['date'].max()}")

# Display first few rows
gdp_data.head()
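
Because `np.random.normal` draws from NumPy's global random state, the sample series differs on every run. The sketch below is one way to make it repeatable, assuming a fixed seed is acceptable for testing. The function name `create_reproducible_gdp_data` and its `seed` parameter are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def create_reproducible_gdp_data(start_year=2018, end_year=2023, seed=42):
    """Hypothetical variant of create_sample_gdp_data with a fixed seed."""
    dates = pd.date_range(
        start=f"{start_year}-01-01",
        end=f"{end_year}-10-01",
        freq="QS",  # quarter-start dates
    )
    n = len(dates)
    rng = np.random.default_rng(seed)  # local generator, independent of global state

    # Same components as the sample generator: trend + seasonality + noise
    trend = np.linspace(20000, 25000, n)
    seasonal = 200 * np.sin(np.arange(n) * 2 * np.pi / 4)
    noise = rng.normal(0, 100, n)

    return pd.DataFrame({"date": dates, "value": trend + seasonal + noise})


first = create_reproducible_gdp_data()
second = create_reproducible_gdp_data()
print(first["value"].equals(second["value"]))  # same seed, same series
```

Using a local `Generator` rather than `np.random.seed` keeps the seeding from affecting other cells that rely on the global state.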

## 4. Visualize Historical Data

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(gdp_data['date'], gdp_data['value'], marker='o', linewidth=2, markersize=4)
plt.title('Historical GDP (Quarterly)', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('GDP (Billions USD)', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Summary statistics
print("\nSummary Statistics:")
print(gdp_data['value'].describe())

## 5. Upload Data to S3

Uploading the file triggers the Lambda forecasting function, provided the bucket has an S3 event notification that invokes it:

In [None]:
import io

def upload_to_s3(df, bucket, key):
    """Upload DataFrame as CSV to S3."""
    csv_buffer = io.StringIO()
    df.to_csv(csv_buffer, index=False)
    
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_buffer.getvalue(),
        ContentType='text/csv'
    )
    
    print(f"✓ Uploaded to s3://{bucket}/{key}")

# Upload GDP data
s3_key = 'raw/gdp/usa_gdp_quarterly.csv'
upload_to_s3(gdp_data, BUCKET_NAME, s3_key)

print("\n⏳ Lambda function should be triggered automatically...")
print("   Wait 30-60 seconds for processing to complete")
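
Before uploading, it can help to check locally what the object body will look like to whatever reads it. The sketch below serializes a two-row stand-in DataFrame the same way `upload_to_s3` does and parses it back. The values are made up for illustration, and no AWS call is involved.

```python
import io

import pandas as pd

# Two made-up rows with the same columns as gdp_data
sample = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-04-01"]),
    "value": [20000.0, 20217.4],
})

# Serialize exactly as upload_to_s3 does
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
csv_body = buffer.getvalue()
print(csv_body)

# Parse it back to confirm the header and dtypes survive the round trip
round_trip = pd.read_csv(io.StringIO(csv_body), parse_dates=["date"])
print(round_trip.dtypes)
```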

## 6. Query Forecast Results from DynamoDB

In [None]:
import time

# Wait for Lambda processing
print("Waiting for forecasts to be generated...")
time.sleep(30)

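# Query forecasts, following LastEvaluatedKey so paginated results are not truncated
query_kwargs = {
    'KeyConditionExpression': Key('indicator_country').eq('GDP_USA')
}
items = []
while True:
    response = table.query(**query_kwargs)
    items.extend(response.get('Items', []))
    last_key = response.get('LastEvaluatedKey')
    if not last_key:
        break
    query_kwargs['ExclusiveStartKey'] = last_key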

if not items:
    print("✗ No forecasts found. Check Lambda logs for errors.")
else:
    print(f"✓ Found {len(items)} forecasts")
    
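    # Convert to DataFrame
    forecasts_df = pd.DataFrame(items)
    
    # DynamoDB returns numbers as decimal.Decimal; cast to float for pandas and matplotlib
    numeric_cols = [c for c in forecasts_df.columns
                    if c in ('forecast_date', 'forecast_value') or c.startswith('confidence_')]
    forecasts_df[numeric_cols] = forecasts_df[numeric_cols].astype(float)
    
    forecasts_df['forecast_date'] = pd.to_datetime(forecasts_df['forecast_date'], unit='s')
    forecasts_df = forecasts_df.sort_values('forecast_date')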
    
    # Display sample
    display(forecasts_df[[
        'forecast_date',
        'forecast_value',
        'confidence_95_lower',
        'confidence_95_upper',
        'model_type'
    ]].head(10))

## 7. Visualize Forecasts with Confidence Intervals

In [None]:
if items:
    # Separate by model type
    models = forecasts_df['model_type'].unique()
    
    fig, axes = plt.subplots(len(models), 1, figsize=(14, 6 * len(models)))
    
    if len(models) == 1:
        axes = [axes]
    
    for ax, model in zip(axes, models):
        # Filter by model
        model_df = forecasts_df[forecasts_df['model_type'] == model]
        
        # Plot historical data
        ax.plot(gdp_data['date'], gdp_data['value'], 
                marker='o', linewidth=2, markersize=4, 
                label='Historical', color='blue')
        
        # Plot forecasts
        ax.plot(model_df['forecast_date'], model_df['forecast_value'],
                marker='s', linewidth=2, markersize=6,
                label='Forecast', color='red', linestyle='--')
        
        # Plot 95% confidence interval
        ax.fill_between(
            model_df['forecast_date'],
            model_df['confidence_95_lower'],
            model_df['confidence_95_upper'],
            alpha=0.2, color='red', label='95% CI'
        )
        
        # Plot 80% confidence interval
        if 'confidence_80_lower' in model_df.columns:
            ax.fill_between(
                model_df['forecast_date'],
                model_df['confidence_80_lower'],
                model_df['confidence_80_upper'],
                alpha=0.3, color='red', label='80% CI'
            )
        
        ax.set_title(f'GDP Forecast - {model}', fontsize=14, fontweight='bold')
        ax.set_xlabel('Date', fontsize=12)
        ax.set_ylabel('GDP (Billions USD)', fontsize=12)
        ax.legend(loc='best')
        ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

## 8. Compare Multiple Models

In [None]:
if items and len(models) > 1:
    plt.figure(figsize=(14, 7))
    
    # Plot historical data
    plt.plot(gdp_data['date'], gdp_data['value'],
             marker='o', linewidth=2, markersize=4,
             label='Historical', color='blue')
    
    # Plot forecasts from each model
    colors = ['red', 'green', 'orange', 'purple']
    
    for i, model in enumerate(models):
        model_df = forecasts_df[forecasts_df['model_type'] == model]
        
        plt.plot(model_df['forecast_date'], model_df['forecast_value'],
                 marker='s', linewidth=2, markersize=6,
                 label=f'{model}', color=colors[i % len(colors)],
                 linestyle='--')
        
        # Add confidence interval
        plt.fill_between(
            model_df['forecast_date'],
            model_df['confidence_95_lower'],
            model_df['confidence_95_upper'],
            alpha=0.15, color=colors[i % len(colors)]
        )
    
    plt.title('GDP Forecast - Model Comparison', fontsize=16, fontweight='bold')
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('GDP (Billions USD)', fontsize=12)
    plt.legend(loc='best', fontsize=11)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    # Print comparison statistics
    print("\nForecast Comparison (Summary Statistics by Model):")
    print(forecasts_df.groupby('model_type')['forecast_value'].agg(['mean', 'std', 'min', 'max']))

## 9. Analyze Forecast Uncertainty

In [None]:
if items:
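    # Calculate confidence interval widths (80% bounds are optional, as in section 7)
    forecasts_df['ci_95_width'] = forecasts_df['confidence_95_upper'] - forecasts_df['confidence_95_lower']
    if 'confidence_80_lower' in forecasts_df.columns and 'confidence_80_upper' in forecasts_df.columns:
        forecasts_df['ci_80_width'] = forecasts_df['confidence_80_upper'] - forecasts_df['confidence_80_lower']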
    
    # Plot uncertainty over time
    fig, axes = plt.subplots(2, 1, figsize=(12, 10))
    
    # Plot 1: CI width over time
    for model in models:
        model_df = forecasts_df[forecasts_df['model_type'] == model]
        axes[0].plot(model_df['forecast_date'], model_df['ci_95_width'],
                     marker='o', linewidth=2, label=f'{model} (95% CI)')
    
    axes[0].set_title('Forecast Uncertainty Over Time', fontsize=14, fontweight='bold')
    axes[0].set_xlabel('Forecast Date', fontsize=12)
    axes[0].set_ylabel('Confidence Interval Width', fontsize=12)
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Plot 2: Relative uncertainty
    forecasts_df['relative_uncertainty'] = (
        forecasts_df['ci_95_width'] / forecasts_df['forecast_value'] * 100
    )
    
    for model in models:
        model_df = forecasts_df[forecasts_df['model_type'] == model]
        axes[1].plot(model_df['forecast_date'], model_df['relative_uncertainty'],
                     marker='o', linewidth=2, label=model)
    
    axes[1].set_title('Relative Forecast Uncertainty', fontsize=14, fontweight='bold')
    axes[1].set_xlabel('Forecast Date', fontsize=12)
    axes[1].set_ylabel('Uncertainty (%)', fontsize=12)
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("\nAverage Uncertainty by Model:")
    print(forecasts_df.groupby('model_type')['relative_uncertainty'].agg(['mean', 'std']))
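
The two measures plotted above are the absolute interval width (upper bound minus lower bound) and that width expressed as a percentage of the point forecast. A worked example on made-up numbers, using the same column names, shows the arithmetic:

```python
import pandas as pd

# Made-up forecasts for illustration only
toy = pd.DataFrame({
    "forecast_value": [25000.0, 25200.0],
    "confidence_95_lower": [24500.0, 24400.0],
    "confidence_95_upper": [25500.0, 26000.0],
})

# Absolute width of the 95% interval
toy["ci_95_width"] = toy["confidence_95_upper"] - toy["confidence_95_lower"]

# Width as a percentage of the point forecast
toy["relative_uncertainty"] = toy["ci_95_width"] / toy["forecast_value"] * 100

print(toy[["ci_95_width", "relative_uncertainty"]].round(2))
```

The first row has a width of 1000 on a forecast of 25000, which is 4%. The second row's wider interval of 1600 on 25200 comes to roughly 6.35%.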

## 10. Export Results

In [None]:
if items:
    # Export to CSV
    output_file = 'gdp_forecasts_export.csv'
    forecasts_df.to_csv(output_file, index=False)
    print(f"✓ Forecasts exported to {output_file}")
    
    # Create summary report
    summary = {
        'total_forecasts': len(forecasts_df),
        'models_used': list(models),
        'date_range': f"{forecasts_df['forecast_date'].min()} to {forecasts_df['forecast_date'].max()}",
        'mean_forecast': forecasts_df['forecast_value'].mean(),
        'forecast_range': f"{forecasts_df['forecast_value'].min():.2f} - {forecasts_df['forecast_value'].max():.2f}"
    }
    
    print("\n" + "="*60)
    print("FORECAST SUMMARY")
    print("="*60)
    for key, value in summary.items():
        print(f"{key.replace('_', ' ').title()}: {value}")
    print("="*60)

## 11. Cost Analysis

In [None]:
if items:
    # Estimate costs
    num_forecasts = len(items)
    num_indicators = 1  # Update if processing multiple indicators
    
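    # Cost estimates (approximate us-east-1 list prices; verify against current AWS pricing)
    s3_storage_gb = 0.0001  # Small CSV files
    s3_storage_cost = s3_storage_gb * 0.023 * 7 / 30  # $/GB-month, prorated to 7 days
    
    # Lambda bills per request plus per GB-second of execution time
    lambda_invocations = num_indicators
    lambda_duration_s = 30    # assumed seconds per invocation
    lambda_memory_gb = 0.5    # assumed 512 MB memory setting; update to match your function
    lambda_cost = lambda_invocations * (
        0.0000002 + lambda_duration_s * lambda_memory_gb * 0.0000166667
    )
    
    dynamodb_writes = num_forecasts
    dynamodb_cost = dynamodb_writes * 0.00000125  # per on-demand write request
    
    total_cost = s3_storage_cost + lambda_cost + dynamodb_cost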
    
    print("\n" + "="*60)
    print("ESTIMATED COSTS")
    print("="*60)
    # Six decimals, because these amounts are far below one cent
    print(f"S3 Storage (7 days):        ${s3_storage_cost:.6f}")
    print(f"Lambda Invocations:         ${lambda_cost:.6f}")
    print(f"DynamoDB Writes:            ${dynamodb_cost:.6f}")
    print("-" * 60)
    print(f"Total Estimated Cost:       ${total_cost:.6f}")
    print("="*60)
    print("\nNote: This is a rough estimate. Check AWS Cost Explorer for actual costs.")

## 12. Next Steps

Ideas for extending this analysis:

1. **Add More Indicators**: Upload unemployment, inflation, etc.
2. **Compare Countries**: Forecast GDP for multiple countries
3. **Improve Models**: Tune ARIMA parameters, add seasonality
4. **Validation**: Compare forecasts with actual values
5. **Automation**: Set up scheduled Lambda runs so forecasts refresh automatically
6. **Dashboard**: Create interactive dashboard with Plotly/Dash
7. **Alerts**: Set up SNS notifications for significant changes
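
For the validation idea in item 4, once actual values are published they can be compared with the stored forecasts using standard error metrics. The sketch below computes MAE, RMSE and MAPE. The function name `forecast_errors` and all numbers are made up for illustration and are not output from the pipeline.

```python
import numpy as np
import pandas as pd


def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE (in percent) for two aligned sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - actual
    return {
        "mae": float(np.mean(np.abs(errors))),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "mape": float(np.mean(np.abs(errors / actual)) * 100),
    }


# Hypothetical actuals and forecasts for three quarters
comparison = pd.DataFrame({
    "actual": [25000.0, 25200.0, 25400.0],
    "forecast": [25100.0, 25100.0, 25600.0],
})
metrics = forecast_errors(comparison["actual"], comparison["forecast"])
print(metrics)
```

MAPE divides by the actual values, so it is only meaningful when they are non-zero, as is the case for GDP levels.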

**Cleanup**: Remember to delete AWS resources when done (see cleanup_guide.md)

## 13. Cleanup Preview

When you're done, uncomment the cell below to print the AWS CLI commands that delete these resources, then run them in a terminal:

In [None]:
# IMPORTANT: Only run the printed commands when you're completely done!
# This cell only prints the commands; it does not delete anything itself.

# print("Cleanup commands:")
# print(f"aws dynamodb delete-table --table-name {TABLE_NAME}")
# print(f"aws s3 rm s3://{BUCKET_NAME} --recursive")
# print(f"aws s3 rb s3://{BUCKET_NAME}")
# print("aws lambda delete-function --function-name forecast-economic-indicators")
# print("\nSee cleanup_guide.md for detailed instructions")