# Capital Flow Volatility Analysis - Report Template
## Case Study 1: Iceland vs. Eurozone Comparison

---

**Research Question:** Should Iceland adopt the Euro as its currency?

**Hypothesis:** Iceland's capital flows are more volatile than the Eurozone bloc average.

**Date:** July 2025

---

## 1. Data and Methodology

### Data Sources
- **Balance of Payments Data:** IMF, quarterly frequency (1999-2024)
- **GDP Data:** IMF World Economic Outlook, annual frequency
- **Countries:** Iceland vs. 10 initial Euro adopters (excluding Luxembourg)

### Methodology
1. **Data Normalization:** All BOP flows converted to annualized % of GDP
2. **Statistical Analysis:** Descriptive statistics by group (mean, standard deviation, skewness)
3. **Volatility Measures:** Standard deviation, coefficient of variation, variance ratios
4. **Hypothesis Testing:** F-tests for equality of variances between groups
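
As a rough illustration of step 1, the sketch below converts a quarterly flow into an annualized share of GDP by multiplying by four and dividing by annual GDP. The formula, the helper name `to_annualized_pct_gdp`, and the columns `FLOW_USD` and `GDP_USD` are assumptions made for this example; the cleaned dataset may have been built differently.

```python
import pandas as pd

def to_annualized_pct_gdp(flow, annual_gdp):
    """Quarterly flow -> annualized % of GDP (assumed formula: 4 * flow / GDP * 100)."""
    return 100 * 4 * flow / annual_gdp

# Hypothetical toy data: one country, two quarters, flow and GDP in the same units
toy = pd.DataFrame({
    "YEAR": [2020, 2020],
    "QUARTER": [1, 2],
    "FLOW_USD": [50.0, -25.0],
    "GDP_USD": [2000.0, 2000.0],
})

# The _PGDP suffix matches the indicator naming used in the rest of this template
toy["FLOW_PGDP"] = to_annualized_pct_gdp(toy["FLOW_USD"], toy["GDP_USD"])
print(toy[["YEAR", "QUARTER", "FLOW_PGDP"]])
```

Annualizing keeps quarterly flows on the same scale as the annual GDP denominator, so the resulting percentages can be read like annual ratios.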

### Countries Analyzed
- **Iceland:** Independent monetary policy with floating exchange rate
- **Eurozone Bloc:** Austria, Belgium, Finland, France, Germany, Ireland, Italy, Netherlands, Portugal, Spain
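
The code cells in this template filter on a `GROUP` column with the values `'Iceland'` and `'Eurozone'`. A minimal sketch of deriving that column from `COUNTRY` is shown below. It assumes country names are spelled exactly as in the list above; the helper `assign_group` is hypothetical and not part of the main analysis notebook.

```python
import pandas as pd

# The ten Eurozone countries listed above (Luxembourg excluded)
EUROZONE = {
    "Austria", "Belgium", "Finland", "France", "Germany",
    "Ireland", "Italy", "Netherlands", "Portugal", "Spain",
}

def assign_group(country):
    """Map a country name to its comparison group, or None if it is not in the study."""
    if country == "Iceland":
        return "Iceland"
    if country in EUROZONE:
        return "Eurozone"
    return None

# Hypothetical toy frame; Luxembourg is dropped because it maps to no group
toy = pd.DataFrame({"COUNTRY": ["Iceland", "Germany", "Luxembourg"]})
toy["GROUP"] = toy["COUNTRY"].map(assign_group)
toy = toy.dropna(subset=["GROUP"])
print(toy)
```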

In [None]:
# Import required libraries and load data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Set styling for professional plots
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 2)

print("Report environment initialized")
print("="*40)

In [None]:
# Load processed data (assumes data has been cleaned and processed)
# This template expects final_data and analysis_indicators to be available

# For template purposes, load from saved file if available
try:
    final_data = pd.read_csv('case_study_1_cleaned.csv')
    analysis_indicators = [col for col in final_data.columns if col.endswith('_PGDP')]
    print(f"✓ Data loaded: {final_data.shape[0]:,} observations")
    print(f"✓ Indicators: {len(analysis_indicators)}")
    print(f"✓ Countries: {final_data['COUNTRY'].nunique()}")
    print(f"✓ Time period: {final_data['YEAR'].min()}-{final_data['YEAR'].max()}")
except FileNotFoundError:
    print("⚠️  Please run the main analysis notebook first to generate cleaned data")
    print("   Expected file: case_study_1_cleaned.csv")
    raise  # Stop here: everything below needs final_data and analysis_indicators

# Calculate comprehensive group statistics
def calculate_group_statistics(data, group_col, indicators):
    """Calculate comprehensive statistics by group"""
    results = []
    
    for group in data[group_col].unique():
        group_data = data[data[group_col] == group]
        
        for indicator in indicators:
            values = group_data[indicator].dropna()
            
            if len(values) > 1:
                mean_val = values.mean()
                std_val = values.std()
                cv = (std_val / abs(mean_val)) * 100 if mean_val != 0 else np.inf
                
                results.append({
                    'Group': group,
                    'Indicator': indicator.replace('_PGDP', ''),
                    'N': len(values),
                    'Mean': mean_val,
                    'Std_Dev': std_val,
                    'Skewness': stats.skew(values),
                    'CV_Percent': cv
                })
    
    return pd.DataFrame(results)

# Create comprehensive statistics data for boxplots
def create_boxplot_data(data, indicators):
    """Create dataset for boxplot visualization"""
    stats_data = []
    
    for group in ['Iceland', 'Eurozone']:
        group_data = data[data['GROUP'] == group]
        
        for indicator in indicators:
            values = group_data[indicator].dropna()
            if len(values) > 1:
                mean_val = values.mean()
                std_val = values.std()
                
                stats_data.append({
                    'GROUP': group,
                    'Indicator': indicator.replace('_PGDP', ''),
                    'Statistic': 'Mean',
                    'Value': mean_val
                })
                
                stats_data.append({
                    'GROUP': group,
                    'Indicator': indicator.replace('_PGDP', ''),
                    'Statistic': 'Standard Deviation', 
                    'Value': std_val
                })
    
    return pd.DataFrame(stats_data)

# Calculate statistics
group_stats = calculate_group_statistics(final_data, 'GROUP', analysis_indicators)
boxplot_data = create_boxplot_data(final_data, analysis_indicators)

print("✓ Statistics calculated for all indicators")
print(f"✓ Group statistics shape: {group_stats.shape}")
print(f"✓ Boxplot data shape: {boxplot_data.shape}")

In [None]:
# Create side-by-side boxplots (from section 3.2)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))

# Boxplot for Means
mean_data = boxplot_data[boxplot_data['Statistic'] == 'Mean']
mean_iceland = mean_data[mean_data['GROUP'] == 'Iceland']['Value']
mean_eurozone = mean_data[mean_data['GROUP'] == 'Eurozone']['Value']

bp1 = ax1.boxplot([mean_eurozone, mean_iceland], labels=['Eurozone', 'Iceland'], patch_artist=True)
bp1['boxes'][0].set_facecolor('#1f77b4')
bp1['boxes'][1].set_facecolor('#ff7f0e')

ax1.set_title('Distribution of Means Across All Indicators', fontweight='bold', fontsize=12)
ax1.set_ylabel('Mean (% of GDP, annualized)')
ax1.grid(True, alpha=0.3)
ax1.axhline(y=0, color='red', linestyle='--', alpha=0.5)

# Add summary statistics on the plot
iceland_mean_avg = mean_iceland.mean()
eurozone_mean_avg = mean_eurozone.mean()
ax1.text(0.02, 0.98, f'Eurozone Avg: {eurozone_mean_avg:.2f}%\nIceland Avg: {iceland_mean_avg:.2f}%', 
         transform=ax1.transAxes, verticalalignment='top', fontsize=10,
         bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

# Boxplot for Standard Deviations  
std_data = boxplot_data[boxplot_data['Statistic'] == 'Standard Deviation']
std_iceland = std_data[std_data['GROUP'] == 'Iceland']['Value']
std_eurozone = std_data[std_data['GROUP'] == 'Eurozone']['Value']

bp2 = ax2.boxplot([std_eurozone, std_iceland], labels=['Eurozone', 'Iceland'], patch_artist=True)
bp2['boxes'][0].set_facecolor('#1f77b4')  
bp2['boxes'][1].set_facecolor('#ff7f0e')

ax2.set_title('Distribution of Standard Deviations Across All Indicators', fontweight='bold', fontsize=12)
ax2.set_ylabel('Standard Deviation (% of GDP, annualized)')
ax2.grid(True, alpha=0.3)

# Add summary statistics on the plot
iceland_std_avg = std_iceland.mean()
eurozone_std_avg = std_eurozone.mean()
volatility_ratio = iceland_std_avg / eurozone_std_avg
ax2.text(0.02, 0.98, f'Eurozone Avg: {eurozone_std_avg:.2f}%\nIceland Avg: {iceland_std_avg:.2f}%\nRatio: {volatility_ratio:.2f}x', 
         transform=ax2.transAxes, verticalalignment='top', fontsize=10,
         bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

plt.suptitle('Statistical Comparison: Iceland vs Eurozone\nAll Capital Flow Indicators', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# Print comprehensive statistical summary from boxplots
print("\nCOMPREHENSIVE STATISTICAL SUMMARY FROM BOXPLOTS:")
print("="*60)
print(f"MEANS ACROSS ALL INDICATORS:")
print(f"  Eurozone: {eurozone_mean_avg:7.2f}% (median: {mean_eurozone.median():7.2f}%)")
print(f"  Iceland:  {iceland_mean_avg:7.2f}% (median: {mean_iceland.median():7.2f}%)")

print(f"\nSTANDARD DEVIATIONS ACROSS ALL INDICATORS:")
print(f"  Eurozone: {eurozone_std_avg:7.2f}% (median: {std_eurozone.median():7.2f}%)")
print(f"  Iceland:  {iceland_std_avg:7.2f}% (median: {std_iceland.median():7.2f}%)")

print(f"\nVOLATILITY COMPARISON:")
print(f"  Iceland's average standard deviation is {volatility_ratio:.2f}x the Eurozone average")
print(f"  {sum(std_iceland.values > std_eurozone.values)}/{len(std_iceland)} indicators show higher Iceland volatility")

print("\n✓ Boxplots from section 3.2 complete")

In [None]:
# Create comprehensive summary table matching the main analysis format
# This replicates the 'Comprehensive Summary Statistics by Group - All Indicators' table

print("COMPREHENSIVE SUMMARY STATISTICS BY GROUP - ALL INDICATORS")
print("="*84)
print(f"{'Indicator':<50} {'Group':<10} {'Mean':>8} {'Std Dev':>8} {'Skewness':>8} {'CV%':>8}")
print("="*84)

# Display statistics for all indicators
for indicator in analysis_indicators:
    clean_name = indicator.replace('_PGDP', '')
    indicator_stats = group_stats[group_stats['Indicator'] == clean_name]
    
    # Display both Eurozone and Iceland rows for each indicator
    for _, row in indicator_stats.iterrows():
        # Truncate long indicator names
        display_name = clean_name[:47] + '...' if len(clean_name) > 50 else clean_name
        
        print(f"{display_name:<50} {row['Group']:<10} {row['Mean']:8.2f} {row['Std_Dev']:8.2f} {row['Skewness']:8.2f} {row['CV_Percent']:8.1f}")
    
    # Add separator line between indicators
    print("-"*84)

print(f"\nSUMMARY: Showing statistics for all {len(analysis_indicators)} capital flow indicators")
print("CV% = Coefficient of Variation (Std Dev / |Mean| × 100)")
print("Higher CV% indicates greater volatility relative to mean")

# Create additional summary with CV ratios for export
summary_pivot = group_stats.pivot_table(
    index='Indicator',
    columns='Group',
    values=['Mean', 'Std_Dev', 'Skewness', 'CV_Percent'],
    aggfunc='first'
)

# Create clean summary table with CV ratios
comprehensive_table = pd.DataFrame({
    'Mean_Eurozone': summary_pivot[('Mean', 'Eurozone')],
    'Mean_Iceland': summary_pivot[('Mean', 'Iceland')],
    'StdDev_Eurozone': summary_pivot[('Std_Dev', 'Eurozone')],
    'StdDev_Iceland': summary_pivot[('Std_Dev', 'Iceland')],
    'Skew_Eurozone': summary_pivot[('Skewness', 'Eurozone')],
    'Skew_Iceland': summary_pivot[('Skewness', 'Iceland')],
    'CV_Eurozone': summary_pivot[('CV_Percent', 'Eurozone')],
    'CV_Iceland': summary_pivot[('CV_Percent', 'Iceland')]
})

# Add CV ratio (Iceland/Eurozone)
comprehensive_table['CV_Ratio_Iceland_Eurozone'] = (
    comprehensive_table['CV_Iceland'] / comprehensive_table['CV_Eurozone']
).round(2)

# Export the comprehensive table
comprehensive_table.to_csv('comprehensive_summary_table.csv')
print("\n✓ Comprehensive summary table saved: comprehensive_summary_table.csv")

# Show summary of CV ratios
print("\nCV RATIO SUMMARY (Iceland/Eurozone):")
print("-"*40)
avg_cv_ratio = comprehensive_table['CV_Ratio_Iceland_Eurozone'].mean()
median_cv_ratio = comprehensive_table['CV_Ratio_Iceland_Eurozone'].median()
higher_cv_count = (comprehensive_table['CV_Ratio_Iceland_Eurozone'] > 1).sum()

print(f"Average CV Ratio: {avg_cv_ratio:.2f}")
print(f"Median CV Ratio: {median_cv_ratio:.2f}")
print(f"Indicators where Iceland > Eurozone: {higher_cv_count}/{len(comprehensive_table)} ({higher_cv_count/len(comprehensive_table)*100:.1f}%)")

## 3. Comprehensive Statistical Summary Table

In [None]:
# Create comprehensive summary table with CV ratios
summary_pivot = group_stats.pivot_table(
    index='Indicator',
    columns='Group',
    values=['Mean', 'Std_Dev', 'Skewness', 'CV_Percent'],
    aggfunc='first'
)

# Create clean summary table
summary_table = pd.DataFrame({
    'Mean_Eurozone': summary_pivot[('Mean', 'Eurozone')],
    'Mean_Iceland': summary_pivot[('Mean', 'Iceland')],
    'StdDev_Eurozone': summary_pivot[('Std_Dev', 'Eurozone')],
    'StdDev_Iceland': summary_pivot[('Std_Dev', 'Iceland')],
    'Skew_Eurozone': summary_pivot[('Skewness', 'Eurozone')],
    'Skew_Iceland': summary_pivot[('Skewness', 'Iceland')],
    'CV_Eurozone': summary_pivot[('CV_Percent', 'Eurozone')],
    'CV_Iceland': summary_pivot[('CV_Percent', 'Iceland')]
})

# Add CV ratio
summary_table['CV_Ratio_Iceland_Eurozone'] = (
    summary_table['CV_Iceland'] / summary_table['CV_Eurozone']
).round(2)

# Round for display
display_table = summary_table.round(2)

print("COMPREHENSIVE STATISTICAL SUMMARY TABLE")
print("="*80)
print("\nMean and Standard Deviation (% of GDP):")
print("-"*50)
mean_std_cols = ['Mean_Eurozone', 'Mean_Iceland', 'StdDev_Eurozone', 'StdDev_Iceland']
print(display_table[mean_std_cols].to_string())

print("\n\nSkewness and Coefficient of Variation:")
print("-"*50)
skew_cv_cols = ['Skew_Eurozone', 'Skew_Iceland', 'CV_Eurozone', 'CV_Iceland', 'CV_Ratio_Iceland_Eurozone']
print(display_table[skew_cv_cols].to_string())

# Export table
summary_table.to_csv('comprehensive_summary_table.csv')
print("\n✓ Summary table saved: comprehensive_summary_table.csv")

## 4. Hypothesis Testing Results

In [None]:
# Perform F-tests for volatility differences
def perform_volatility_tests(data, indicators):
    """Perform F-tests comparing Iceland vs Eurozone volatility"""
    test_results = []
    
    for indicator in indicators:
        iceland_data = data[data['GROUP'] == 'Iceland'][indicator].dropna()
        eurozone_data = data[data['GROUP'] == 'Eurozone'][indicator].dropna()
        
        if len(iceland_data) > 1 and len(eurozone_data) > 1:
            iceland_var = iceland_data.var()
            eurozone_var = eurozone_data.var()
            
            f_stat = iceland_var / eurozone_var if eurozone_var != 0 else np.inf
            df1, df2 = len(iceland_data) - 1, len(eurozone_data) - 1
            
            # Two-tailed p-value; sf is the survival function (1 - cdf), more precise in the upper tail
            p_value = 2 * min(stats.f.cdf(f_stat, df1, df2), stats.f.sf(f_stat, df1, df2))
            
            test_results.append({
                'Indicator': indicator.replace('_PGDP', ''),
                'F_Statistic': f_stat,
                'P_Value': p_value,
                'Iceland_Higher_Volatility': iceland_var > eurozone_var,
                'Significant_5pct': p_value < 0.05,
                'Significant_1pct': p_value < 0.01
            })
    
    return pd.DataFrame(test_results)

# Perform tests
test_results = perform_volatility_tests(final_data, analysis_indicators)

print("HYPOTHESIS TESTING RESULTS")
print("="*70)
print("F-Tests for Equal Variances (Iceland vs. Eurozone)")
print("\nH₀: Equal volatility | H₁: Different volatility | α = 0.05")
print("-"*70)

# Display results sorted by F-statistic
results_sorted = test_results.sort_values('F_Statistic', ascending=False)
print(f"{'Indicator':<40} {'F-stat':>8} {'P-value':>8} {'Sig.':>5} {'Ice>Euro':>8}")
print("-"*70)

for _, row in results_sorted.iterrows():
    indicator_short = row['Indicator'][:37] + '...' if len(row['Indicator']) > 40 else row['Indicator']
    sig_marker = '***' if row['P_Value'] < 0.001 else '**' if row['P_Value'] < 0.01 else '*' if row['P_Value'] < 0.05 else ''
    higher_vol = 'Yes' if row['Iceland_Higher_Volatility'] else 'No'
    print(f"{indicator_short:<40} {row['F_Statistic']:8.2f} {row['P_Value']:8.3f} {sig_marker:>5} {higher_vol:>8}")

print("\nSignificance levels: *** p<0.001, ** p<0.01, * p<0.05")

# Summary statistics
total_indicators = len(test_results)
iceland_higher_count = sum(test_results['Iceland_Higher_Volatility'])
sig_5pct_count = sum(test_results['Significant_5pct'])
sig_1pct_count = sum(test_results['Significant_1pct'])

print("\nTEST SUMMARY:")
print("-"*20)
print(f"Total indicators tested: {total_indicators}")
print(f"Iceland higher volatility: {iceland_higher_count}/{total_indicators} ({iceland_higher_count/total_indicators*100:.1f}%)")
print(f"Significant at 5% level: {sig_5pct_count}/{total_indicators} ({sig_5pct_count/total_indicators*100:.1f}%)")
print(f"Significant at 1% level: {sig_1pct_count}/{total_indicators} ({sig_1pct_count/total_indicators*100:.1f}%)")

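# Count an indicator as supporting the hypothesis only if Iceland's variance is higher AND the test is significant at 5%
supporting_count = int((test_results['Iceland_Higher_Volatility'] & test_results['Significant_5pct']).sum())
print(f"Iceland higher and significant at 5% level: {supporting_count}/{total_indicators}")
conclusion = "Strong evidence supports" if supporting_count/total_indicators > 0.6 else "Mixed evidence for"
print(f"\n**CONCLUSION:** {conclusion} the hypothesis that Iceland has higher capital flow volatility.")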

# Export results
test_results.to_csv('hypothesis_test_results.csv', index=False)
print("\n✓ Test results saved: hypothesis_test_results.csv")
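
For reference, the test computed in `perform_volatility_tests` can be written as below. The notation is introduced here for illustration: $s_I^2$ and $s_E^2$ are the sample variances of the Iceland and pooled Eurozone observations (pandas `.var()` uses the $n-1$ denominator by default), and $n_I$, $n_E$ are the numbers of non-missing observations.

```latex
\[
F_{\text{obs}} = \frac{s_I^2}{s_E^2},
\qquad
\frac{s_I^2}{s_E^2} \sim F(n_I - 1,\; n_E - 1)
\quad \text{under } H_0\colon \sigma_I^2 = \sigma_E^2,
\]
\[
p = 2 \min\bigl\{ \Pr(X \le F_{\text{obs}}),\; \Pr(X \ge F_{\text{obs}}) \bigr\},
\qquad X \sim F(n_I - 1,\; n_E - 1).
\]
```

The reference distribution holds when both samples are independent draws from normal distributions. The F-test is sensitive to departures from normality, so the skewness columns in the summary table are worth reading alongside the p-values.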

## 5. Time Series Visualization

In [None]:
# Create date column for time series
final_data['Date'] = pd.to_datetime(
    final_data['YEAR'].astype(str) + '-' + 
    ((final_data['QUARTER'] - 1) * 3 + 1).astype(str) + '-01'
)

# Select the 6 indicators with the largest Iceland/Eurozone variance ratio (F-statistic)
top_indicators = test_results.nlargest(6, 'F_Statistic')['Indicator'].tolist()
selected_indicators = [ind + '_PGDP' for ind in top_indicators]

# Create time series plots
fig, axes = plt.subplots(3, 2, figsize=(20, 15))
axes = axes.flatten()

for i, indicator in enumerate(selected_indicators):
    clean_name = indicator.replace('_PGDP', '')
    
    # Plot Iceland
    iceland_data = final_data[final_data['GROUP'] == 'Iceland'].sort_values('Date')  # sort so the line is drawn in time order
    axes[i].plot(iceland_data['Date'], iceland_data[indicator], 
                color='#ff7f0e', linewidth=2.5, label='Iceland', marker='o', markersize=2)
    
    # Plot Eurozone average
    eurozone_avg = final_data[final_data['GROUP'] == 'Eurozone'].groupby('Date')[indicator].mean()
    axes[i].plot(eurozone_avg.index, eurozone_avg.values, 
                color='#1f77b4', linewidth=2.5, label='Eurozone Average', linestyle='-')
    
    # Formatting
    f_stat = test_results[test_results['Indicator'] == clean_name]['F_Statistic'].iloc[0]
    title = (clean_name[:45] + '...' if len(clean_name) > 45 else clean_name) + f'\n(F-stat: {f_stat:.2f})'
    axes[i].set_title(title, fontweight='bold', fontsize=10)
    axes[i].set_ylabel('% of GDP (annualized)')
    axes[i].grid(True, alpha=0.3)
    axes[i].legend(loc='upper right', fontsize=8)
    axes[i].axhline(y=0, color='black', linestyle='-', alpha=0.3, linewidth=0.8)

plt.suptitle('Time Series Analysis: Indicators with the Largest Iceland/Eurozone Variance Ratios\nIceland vs. Eurozone Average (1999-2024)', 
             fontsize=16, fontweight='bold', y=0.98)
plt.tight_layout()
plt.show()

print("\n✓ Time series analysis complete")
print(f"✓ Displayed the {len(selected_indicators)} indicators with the largest variance ratios")

## 6. Key Findings Summary

### Statistical Evidence:
- **XX% of capital flow indicators** show higher volatility in Iceland compared to the Eurozone
- **XX% of indicators** show statistically significant volatility differences (p<0.05)
- **Iceland's coefficient of variation** averages X.XX times that of the Eurozone bloc

### Policy Implications:
- Evidence [supports/does not support] the hypothesis that Iceland has higher capital flow volatility
- [Insert specific policy recommendations based on results]

---

*This report template can be adapted for other case studies by modifying the data loading section and updating country/region names.*
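
One possible way to make that adaptation explicit is to collect the case-specific values in a single place. The sketch below is only an illustration: `CaseStudyConfig` and `select_indicators` are hypothetical names, and the example column list is made up; only the file name, group labels, and `_PGDP` suffix come from this template.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CaseStudyConfig:
    """Hypothetical container for the values that change between case studies."""
    data_file: str
    focus_group: str
    comparison_group: str
    indicator_suffix: str = "_PGDP"

def select_indicators(columns, suffix):
    """Return the columns whose names end with the indicator suffix."""
    return [col for col in columns if col.endswith(suffix)]

# Values used in this notebook
case_1 = CaseStudyConfig("case_study_1_cleaned.csv", "Iceland", "Eurozone")

# Made-up column names standing in for the columns of a loaded DataFrame
example_columns = ["COUNTRY", "GROUP", "YEAR", "QUARTER", "Example Flow_PGDP"]
indicators = select_indicators(example_columns, case_1.indicator_suffix)
print(indicators)
```

With this layout, another case study would only need a new `CaseStudyConfig` instance, while the statistics and plotting cells read the group labels from it instead of hard-coding them.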

In [None]:
# Final summary statistics for report
print("FINAL REPORT SUMMARY")
print("="*40)
print(f"Dataset: {final_data.shape[0]:,} observations")
print(f"Indicators analyzed: {len(analysis_indicators)}")
print(f"Time period: {final_data['YEAR'].min()}-{final_data['YEAR'].max()}")
print(f"Countries: {final_data['COUNTRY'].nunique()}")

# Key statistics
higher_vol_pct = (iceland_higher_count/total_indicators*100)
sig_pct = (sig_5pct_count/total_indicators*100)
avg_cv_ratio = summary_table['CV_Ratio_Iceland_Eurozone'].mean()

print(f"\nKEY FINDINGS:")
print(f"• {higher_vol_pct:.1f}% of indicators show Iceland higher volatility")
print(f"• {sig_pct:.1f}% of tests are statistically significant")
print(f"• Average CV ratio (Iceland/Eurozone): {avg_cv_ratio:.2f}")
print(f"• Overall volatility ratio: {volatility_ratio:.2f}x")

print("\n✓ Report template complete")