# Statistical Correlation Analysis

This notebook analyzes statistical correlations between weather data and Polymarket outcomes. It covers:

1. **Cross-correlation Analysis**: Time-lagged correlations between weather variables and market probabilities
2. **Granger Causality Testing**: Test whether past weather values improve predictions of market movements
3. **Predictive Modeling**: Regression models of market outcomes based on weather data
4. **Comprehensive Correlation Matrix**: Pairwise correlations across weather and market variables

The analyzer class also provides a partial-correlation helper for controlling for confounding variables. Copula-based (non-linear) dependence analysis is not implemented in this notebook.

## Data Sources
- **Weather Variables**: Temperature, precipitation, wind, humidity from multiple sources
- **Market Outcomes**: Polymarket probabilities, volumes, and price movements
- **Event Data**: Significant weather events and market reactions
- **Temporal Data**: Time-series data for lag analysis
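
Weather and market records rarely share exact timestamps, so intersecting raw timestamps can leave very few rows. The following is a minimal sketch of one way to put both series on a common daily grid. The synthetic data, the series names, and the choice of daily means are assumptions for illustration, not part of this notebook's pipeline.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
start = pd.Timestamp("2024-01-01")

# Hypothetical raw feeds: hourly weather readings and irregular market ticks
weather_idx = start + pd.to_timedelta(np.arange(24 * 10), unit="h")
weather = pd.Series(10 + rng.normal(size=len(weather_idx)),
                    index=weather_idx, name="temperature")

tick_seconds = np.sort(rng.uniform(0, 10 * 24 * 3600, size=300))
market_idx = start + pd.to_timedelta(tick_seconds, unit="s")
market = pd.Series(rng.uniform(0.05, 0.95, size=300),
                   index=market_idx, name="probability")

# Resample both series to daily means so they share one unique index
daily = pd.concat(
    [weather.resample("D").mean(), market.resample("D").mean()], axis=1
).dropna()

print(daily.head())
```

With a shared, unique daily index, a lag of one row corresponds to one day, which makes lag-based results easier to interpret.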

In [None]:
# Import required libraries
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Statistical libraries
from scipy import stats
from scipy.stats import pearsonr, spearmanr, kendalltau
from statsmodels.tsa.stattools import grangercausalitytests, coint
from statsmodels.stats.multitest import multipletests
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.feature_selection import SelectKBest, f_regression

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully!")

In [None]:
# Advanced Statistical Correlation Analyzer
class StatisticalCorrelationAnalyzer:
    """Advanced analyzer for statistical correlations between weather and market data"""
    
    def __init__(self, db_path="../data/climatetrade.db"):
        self.db_path = db_path
        self.weather_data = None
        self.market_data = None
        self.scaler = StandardScaler()
        
    def load_correlation_data(self, location=None, start_date=None, end_date=None):
        """Load data optimized for correlation analysis"""
        conn = sqlite3.connect(self.db_path)
        
        # Load comprehensive weather data
        weather_query = """
        SELECT 
            timestamp,
            location_name,
            temperature,
            temperature_min,
            temperature_max,
            humidity,
            wind_speed,
            wind_direction,
            precipitation,
            pressure,
            weather_code
        FROM weather_data
        WHERE temperature IS NOT NULL
        """
        
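        # Bind user-supplied values as parameters instead of formatting them
        # into the SQL string (avoids SQL injection and quoting errors)
        weather_params = []
        if location:
            weather_query += " AND location_name LIKE ?"
            weather_params.append(f"%{location}%")
        if start_date:
            weather_query += " AND timestamp >= ?"
            weather_params.append(start_date)
        if end_date:
            weather_query += " AND timestamp <= ?"
            weather_params.append(end_date)
            
        weather_query += " ORDER BY timestamp"
        
        self.weather_data = pd.read_sql_query(weather_query, conn, params=weather_params)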
        self.weather_data['timestamp'] = pd.to_datetime(self.weather_data['timestamp'])
        self.weather_data.set_index('timestamp', inplace=True)
        
        # Load market data
        market_query = """
        SELECT 
            timestamp,
            event_title,
            market_id,
            outcome_name,
            probability,
            volume
        FROM polymarket_data
        WHERE probability IS NOT NULL AND probability > 0 AND probability < 1
        """
        
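        # Bind date filters as parameters rather than formatting them into the SQL
        market_params = []
        if start_date:
            market_query += " AND timestamp >= ?"
            market_params.append(start_date)
        if end_date:
            market_query += " AND timestamp <= ?"
            market_params.append(end_date)
            
        market_query += " ORDER BY timestamp"
        
        self.market_data = pd.read_sql_query(market_query, conn, params=market_params)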
        self.market_data['timestamp'] = pd.to_datetime(self.market_data['timestamp'])
        self.market_data.set_index('timestamp', inplace=True)
        
        conn.close()
        print(f"Loaded {len(self.weather_data)} weather and {len(self.market_data)} market records")
        return self.weather_data, self.market_data
    
    def compute_cross_correlations(self, weather_var, market_prob, max_lag=30):
        """Compute cross-correlations with multiple lag periods"""
        
        if weather_var is None or market_prob is None:
            return None
        
        # Ensure same length by finding overlapping period
        common_index = weather_var.index.intersection(market_prob.index)
        if len(common_index) < max_lag * 2:
            return None
        
        weather_aligned = weather_var.loc[common_index]
        market_aligned = market_prob.loc[common_index]
        
        correlations = []
        
        for lag in range(-max_lag, max_lag + 1):
            # Lags count rows of the aligned index, not calendar time.
            # Positive lag: weather at t-lag is paired with market at t (weather leads).
            # Negative lag: market at t-|lag| is paired with weather at t (market leads).
            if lag >= 0:
                shifted_weather = weather_aligned.shift(lag)
                shifted_market = market_aligned
            else:
                shifted_weather = weather_aligned
                shifted_market = market_aligned.shift(-lag)
            
            # Remove NaN values
            valid_data = pd.concat([shifted_weather, shifted_market], axis=1).dropna()
            if len(valid_data) < 10:
                continue
            
            # Compute a p-value at every lag, including lag 0
            corr, p_value = stats.pearsonr(valid_data.iloc[:, 0], valid_data.iloc[:, 1])
            
            if not np.isnan(corr):
                correlations.append({
                    'lag': lag,
                    'correlation': corr,
                    'abs_correlation': abs(corr),
                    'p_value': p_value,
                    'significant': p_value < 0.05,
                    'direction': 'weather_leads' if lag > 0 else 'market_leads' if lag < 0 else 'contemporaneous'
                })
        
        return pd.DataFrame(correlations)
    
    def perform_granger_causality_analysis(self, weather_series, market_series, max_lag=10):
        """Perform Granger causality test between weather and market series"""
        
        if weather_series is None or market_series is None:
            return None
        
        # Align series
        common_index = weather_series.index.intersection(market_series.index)
        if len(common_index) < max_lag * 3:
            return None
        
        weather_aligned = weather_series.loc[common_index]
        market_aligned = market_series.loc[common_index]
        
        # Create combined DataFrame
        combined = pd.concat([weather_aligned, market_aligned], axis=1, keys=['weather', 'market']).dropna()
        
        if len(combined) < max_lag * 3:
            return None
        
        try:
            # Test Granger causality in both directions
            gc_weather_to_market = grangercausalitytests(combined[['market', 'weather']], max_lag, verbose=False)
            gc_market_to_weather = grangercausalitytests(combined[['weather', 'market']], max_lag, verbose=False)
            
            # Extract best lag and p-values
            weather_to_market_pvals = [gc_weather_to_market[lag][0]['ssr_ftest'][1] for lag in range(1, max_lag + 1)]
            market_to_weather_pvals = [gc_market_to_weather[lag][0]['ssr_ftest'][1] for lag in range(1, max_lag + 1)]
            
            best_lag_weather_to_market = np.argmin(weather_to_market_pvals) + 1
            best_lag_market_to_weather = np.argmin(market_to_weather_pvals) + 1
            
            return {
                'weather_causes_market': {
                    'best_lag': best_lag_weather_to_market,
                    'p_value': weather_to_market_pvals[best_lag_weather_to_market - 1],
                    'significant': weather_to_market_pvals[best_lag_weather_to_market - 1] < 0.05
                },
                'market_causes_weather': {
                    'best_lag': best_lag_market_to_weather,
                    'p_value': market_to_weather_pvals[best_lag_market_to_weather - 1],
                    'significant': market_to_weather_pvals[best_lag_market_to_weather - 1] < 0.05
                },
                'data_points': len(combined)
            }
            
        except Exception as e:
            print(f"Granger causality analysis failed: {e}")
            return None
    
    def compute_partial_correlations(self, data, target_var, control_vars):
        """Compute partial correlations controlling for other variables"""
        
        if data is None or target_var not in data.columns:
            return None
        
        available_controls = [var for var in control_vars if var in data.columns]
        if not available_controls:
            return None
        
        # Partial correlations from the inverse correlation (precision) matrix P:
        # rho(i, j | all others) = -P_ij / sqrt(P_ii * P_jj)
        columns = [target_var] + [var for var in available_controls if var != target_var]
        corr_matrix = data[columns].dropna().corr()
        
        # Pseudo-inverse tolerates a singular matrix (e.g. collinear variables)
        precision = np.linalg.pinv(corr_matrix.values)
        
        partial_corrs = {}
        for j, control in enumerate(columns[1:], start=1):
            denom = np.sqrt(precision[0, 0] * precision[j, j])
            # Target vs. this variable, controlling for all the other variables
            partial_corrs[control] = -precision[0, j] / denom if denom > 0 else np.nan
        
        return partial_corrs
    
    def build_predictive_model(self, weather_vars, market_target, test_size=0.2):
        """Build predictive model for market outcomes based on weather data"""
        
        if weather_vars is None or market_target is None:
            return None
        
        # Align data
        common_index = weather_vars.index.intersection(market_target.index)
        if len(common_index) < 50:
            return None
        
        X = weather_vars.loc[common_index]
        y = market_target.loc[common_index]
        
        # Remove rows with NaN
        valid_data = pd.concat([X, y], axis=1).dropna()
        X_clean = valid_data.iloc[:, :-1]
        y_clean = valid_data.iloc[:, -1]
        
        if len(X_clean) < 30:
            return None
        
        # Split data chronologically (no shuffling) so the test set follows the training set in time
        X_train, X_test, y_train, y_test = train_test_split(
            X_clean, y_clean, test_size=test_size, shuffle=False
        )
        
        # Train models
        models = {
            'Linear Regression': LinearRegression(),
            'Ridge Regression': Ridge(alpha=0.1),
            'Lasso Regression': Lasso(alpha=0.01)
        }
        
        results = {}
        
        for name, model in models.items():
            try:
                model.fit(X_train, y_train)
                y_pred = model.predict(X_test)
                
                results[name] = {
                    'r2_score': r2_score(y_test, y_pred),
                    'mse': mean_squared_error(y_test, y_pred),
                    'rmse': np.sqrt(mean_squared_error(y_test, y_pred)),
                    'coefficients': dict(zip(X_clean.columns, model.coef_)) if hasattr(model, 'coef_') else None
                }
            except Exception as e:
                print(f"Model {name} failed: {e}")
                results[name] = None
        
        return {
            'models': results,
            'feature_importance': self._get_feature_importance(X_clean, y_clean),
            'data_points': len(X_clean)
        }
    
    def _get_feature_importance(self, X, y):
        """Get feature importance using f-regression"""
        try:
            selector = SelectKBest(score_func=f_regression, k='all')
            selector.fit(X, y)
            
            importance = dict(zip(X.columns, selector.scores_))
            return dict(sorted(importance.items(), key=lambda x: x[1], reverse=True))
        except Exception:
            return None

# Initialize analyzer
correlation_analyzer = StatisticalCorrelationAnalyzer()
print("StatisticalCorrelationAnalyzer initialized")

In [None]:
# Load the dataset for correlation analysis
weather_df, market_df = correlation_analyzer.load_correlation_data(
    location="London",
    start_date="2020-01-01",
    end_date="2024-12-31"
)

print("\nData Overview:")
print("=" * 50)
if weather_df is not None and not weather_df.empty:
    print(f"Weather data shape: {weather_df.shape}")
    print(f"Weather variables: {list(weather_df.columns)}")
    print(f"Date range: {weather_df.index.min()} to {weather_df.index.max()}")

if market_df is not None and not market_df.empty:
    print(f"\nMarket data shape: {market_df.shape}")
    print(f"Unique markets: {market_df['market_id'].nunique()}")
    print(f"Date range: {market_df.index.min()} to {market_df.index.max()}")
    print(f"Probability range: {market_df['probability'].min():.3f} - {market_df['probability'].max():.3f}")

## 1. Cross-Correlation Analysis

Analyze correlations between weather variables and market probabilities with time lags.
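
The sign convention for lags is easy to get backwards, so here is a minimal sketch on synthetic data. The series, the helper name `lagged_corr`, and the built-in 3-step delay are assumptions for illustration only. A positive lag pairs the first series at `t - lag` with the second at `t`, meaning the first series leads.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n, true_lag = 200, 3

# Synthetic example: the "market" series echoes the "weather" series 3 steps later
weather = pd.Series(rng.normal(size=n))
market = weather.shift(true_lag) + 0.1 * pd.Series(rng.normal(size=n))

def lagged_corr(x, y, lag):
    """Pearson correlation of x[t - lag] with y[t]; positive lag means x leads y."""
    return x.shift(lag).corr(y)

# Scan lags in both directions and pick the strongest absolute correlation
corrs = {lag: lagged_corr(weather, market, lag) for lag in range(-5, 6)}
best_lag = max(corrs, key=lambda k: abs(corrs[k]))
print(best_lag, round(corrs[best_lag], 3))
```

Because `shift` moves values by rows, a lag equals a number of days only when the index is regularly spaced at daily frequency.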

In [None]:
# Perform cross-correlation analysis
if weather_df is not None and market_df is not None:
    print("Performing cross-correlation analysis...")
    
    # Get average market probability (could be improved by market-specific analysis)
    avg_market_prob = market_df.groupby(market_df.index)['probability'].mean()
    
    # Analyze correlations for key weather variables
    weather_vars = ['temperature', 'humidity', 'wind_speed', 'precipitation']
    correlation_results = {}
    
    for var in weather_vars:
        if var in weather_df.columns:
            weather_series = weather_df[var].dropna()
            if len(weather_series) > 50:
                corr_result = correlation_analyzer.compute_cross_correlations(
                    weather_series, avg_market_prob, max_lag=14
                )
                if corr_result is not None:
                    correlation_results[var] = corr_result
                    print(f"Analyzed {var}: {len(corr_result)} lag correlations")
    
    print(f"\nCompleted cross-correlation analysis for {len(correlation_results)} weather variables")
else:
    print("Insufficient data for cross-correlation analysis")

In [None]:
# Visualize cross-correlation results
if 'correlation_results' in locals() and correlation_results:
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    axes = axes.flatten()
    
    for i, (var, corr_df) in enumerate(correlation_results.items()):
        if i >= 4:  # Limit to 4 subplots
            break
            
        ax = axes[i]
        
        # Plot correlation by lag
        ax.plot(corr_df['lag'], corr_df['correlation'], 'b-', marker='o', alpha=0.7)
        
        # Highlight significant correlations
        significant = corr_df[corr_df['significant'] == True]
        if not significant.empty:
            ax.scatter(significant['lag'], significant['correlation'], 
                      color='red', s=50, zorder=5, label='Significant')
        
        ax.axhline(y=0, color='k', linestyle='--', alpha=0.5)
        ax.set_title(f'{var.title()} vs Market Probability')
        ax.set_xlabel('Lag (periods; positive = weather leads)')
        ax.set_ylabel('Correlation')
        ax.grid(True, alpha=0.3)
        ax.legend()
    
    plt.tight_layout()
    plt.show()
    
    # Summary of strongest correlations
    print("\nStrongest Correlations by Variable:")
    print("=" * 50)
    
    for var, corr_df in correlation_results.items():
        if not corr_df.empty:
            max_corr = corr_df.loc[corr_df['abs_correlation'].idxmax()]
            print(f"{var.title()}:")
            print(f"  Max correlation: {max_corr['correlation']:.3f} (lag: {max_corr['lag']} periods)")
            print(f"  Significant: {max_corr['significant']}")
            
            # Count significant correlations
            sig_count = corr_df['significant'].sum()
            print(f"  Significant lags: {sig_count}/{len(corr_df)}")
            print()
else:
    print("No correlation results to visualize")

## 2. Granger Causality Analysis

Test whether weather patterns can predict market movements.
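
To make the idea concrete, here is a hedged sketch of the F-test behind a Granger test, written directly with NumPy and SciPy on synthetic data. The function name `granger_f_test` and the simulated series are assumptions for illustration. The test asks whether adding lags of `x` to an autoregression of `y` reduces the residual sum of squares by more than chance would explain.

```python
import numpy as np
from scipy import stats

def granger_f_test(y, x, lags):
    """F-test of H0: lags of x add nothing to an AR(lags) model of y."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    n = len(y) - lags
    target = y[lags:]
    # Column k-1 holds the series delayed by k steps, aligned with target
    y_lags = np.column_stack([y[lags - k: len(y) - k] for k in range(1, lags + 1)])
    x_lags = np.column_stack([x[lags - k: len(x) - k] for k in range(1, lags + 1)])
    const = np.ones((n, 1))

    def ssr(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    ssr_restricted = ssr(np.hstack([const, y_lags]))       # y's own lags only
    ssr_full = ssr(np.hstack([const, y_lags, x_lags]))     # plus lags of x
    df_den = n - (2 * lags + 1)
    f_stat = ((ssr_restricted - ssr_full) / lags) / (ssr_full / df_den)
    return f_stat, stats.f.sf(f_stat, lags, df_den)

# Synthetic data in which x drives y with a 2-step delay
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 2] + 0.1 * rng.normal()

f_xy, p_xy = granger_f_test(y, x, lags=3)  # does x help predict y?
f_yx, p_yx = granger_f_test(x, y, lags=3)  # does y help predict x?
print(f"x -> y: F={f_xy:.1f}, p={p_xy:.3g}")
print(f"y -> x: F={f_yx:.1f}, p={p_yx:.3g}")
```

Granger tests assume stationary series. Temperatures and market probabilities often trend or show seasonality, so differencing or detrending before testing may be needed. A significant result indicates predictive value, not physical causation.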

In [None]:
# Perform Granger causality analysis
if weather_df is not None and market_df is not None:
    print("Performing Granger causality analysis...")
    
    granger_results = {}
    
    # Test causality for key weather variables
    weather_vars = ['temperature', 'humidity', 'wind_speed']
    
    for var in weather_vars:
        if var in weather_df.columns:
            weather_series = weather_df[var].dropna()
            # Average across markets so each timestamp appears once
            market_series = market_df.groupby(market_df.index)['probability'].mean()
            
            if len(weather_series) > 30 and len(market_series) > 30:
                gc_result = correlation_analyzer.perform_granger_causality_analysis(
                    weather_series, market_series, max_lag=5
                )
                
                if gc_result:
                    granger_results[var] = gc_result
                    print(f"Granger causality test completed for {var}")
    
    print(f"\nCompleted Granger causality analysis for {len(granger_results)} variables")
else:
    print("Insufficient data for Granger causality analysis")

In [None]:
# Visualize Granger causality results
if 'granger_results' in locals() and granger_results:
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    # Prepare data for plotting
    variables = list(granger_results.keys())
    weather_to_market_pvals = []
    market_to_weather_pvals = []
    
    for var, result in granger_results.items():
        weather_to_market_pvals.append(result['weather_causes_market']['p_value'])
        market_to_weather_pvals.append(result['market_causes_weather']['p_value'])
    
    # Plot 1: Weather to Market causality
    bars1 = axes[0].bar(variables, [-np.log10(p) for p in weather_to_market_pvals], 
                       color=['red' if p < 0.05 else 'gray' for p in weather_to_market_pvals])
    axes[0].set_title('Weather → Market Causality (-log10 p-value)')
    axes[0].set_ylabel('-log10(p-value)')
    axes[0].axhline(y=-np.log10(0.05), color='red', linestyle='--', alpha=0.7, label='p=0.05')
    axes[0].legend()
    
    # Add value labels
    for bar, p_val in zip(bars1, weather_to_market_pvals):
        axes[0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1, 
                    f'{p_val:.3f}', ha='center', va='bottom', fontsize=8)
    
    # Plot 2: Market to Weather causality
    bars2 = axes[1].bar(variables, [-np.log10(p) for p in market_to_weather_pvals], 
                       color=['blue' if p < 0.05 else 'gray' for p in market_to_weather_pvals])
    axes[1].set_title('Market → Weather Causality (-log10 p-value)')
    axes[1].set_ylabel('-log10(p-value)')
    axes[1].axhline(y=-np.log10(0.05), color='red', linestyle='--', alpha=0.7, label='p=0.05')
    axes[1].legend()
    
    # Add value labels
    for bar, p_val in zip(bars2, market_to_weather_pvals):
        axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1, 
                    f'{p_val:.3f}', ha='center', va='bottom', fontsize=8)
    
    plt.tight_layout()
    plt.show()
    
    # Detailed results
    print("\nGranger Causality Detailed Results:")
    print("=" * 50)
    
    for var, result in granger_results.items():
        print(f"\n{var.title()}:")
        print(f"  Weather → Market:")
        print(f"    Best lag: {result['weather_causes_market']['best_lag']} periods")
        print(f"    p-value: {result['weather_causes_market']['p_value']:.4f}")
        print(f"    Significant: {result['weather_causes_market']['significant']}")
        
        print(f"  Market → Weather:")
        print(f"    Best lag: {result['market_causes_weather']['best_lag']} periods")
        print(f"    p-value: {result['market_causes_weather']['p_value']:.4f}")
        print(f"    Significant: {result['market_causes_weather']['significant']}")
        
        print(f"  Data points: {result['data_points']}")
else:
    print("No Granger causality results to visualize")

## 3. Predictive Modeling

Build regression models to predict market outcomes based on weather data.
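
For time-ordered data, a single random split can let the model train on observations that come after the ones it is tested on. A minimal sketch of walk-forward validation with scikit-learn's `TimeSeriesSplit` follows. The synthetic features and target are assumptions standing in for the weather and market data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(7)
n = 300

# Synthetic stand-ins for four weather features and a market target (not real data)
X = rng.normal(size=(n, 4))
y = 0.5 + 0.05 * X[:, 0] - 0.03 * X[:, 3] + 0.02 * rng.normal(size=n)

# Each fold trains on an initial segment and tests on the block that follows it
cv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(Ridge(alpha=0.1), X, y, cv=cv, scoring="r2")

print("R² per fold:", np.round(scores, 3))
print("Mean R²:", round(scores.mean(), 3))
```

Reporting the per-fold spread alongside the mean gives a sense of how stable the fit is over time.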

In [None]:
# Build predictive models
if weather_df is not None and market_df is not None:
    print("Building predictive models...")
    
    # Prepare features and target
    weather_features = weather_df[['temperature', 'humidity', 'wind_speed', 'precipitation']].dropna()
    # Average across markets so each timestamp appears once
    market_target = market_df.groupby(market_df.index)['probability'].mean()
    
    # Build predictive model
    prediction_results = correlation_analyzer.build_predictive_model(
        weather_features, market_target
    )
    
    if prediction_results:
        print("\nPredictive Modeling Results:")
        print(f"Data points used: {prediction_results['data_points']}")
        
        # Display model performance
        print("\nModel Performance:")
        for model_name, metrics in prediction_results['models'].items():
            if metrics:
                print(f"  {model_name}:")
                print(f"    R² Score: {metrics['r2_score']:.4f}")
                print(f"    RMSE: {metrics['rmse']:.4f}")
                print(f"    MSE: {metrics['mse']:.6f}")
        
        # Display feature importance
        if prediction_results['feature_importance']:
            print("\nFeature Importance (F-statistic):")
            for feature, importance in prediction_results['feature_importance'].items():
                print(f"  {feature}: {importance:.2f}")
    else:
        print("Could not build predictive models")
else:
    print("Insufficient data for predictive modeling")

In [None]:
# Visualize model performance and feature importance
if 'prediction_results' in locals() and prediction_results:
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    # Plot 1: Model performance comparison
    model_names = []
    r2_scores = []
    rmse_scores = []
    
    for model_name, metrics in prediction_results['models'].items():
        if metrics:
            model_names.append(model_name)
            r2_scores.append(metrics['r2_score'])
            rmse_scores.append(metrics['rmse'])
    
    if model_names:
        x = np.arange(len(model_names))
        width = 0.35
        
        bars1 = axes[0].bar(x - width/2, r2_scores, width, label='R² Score', alpha=0.8)
        axes[0].set_ylabel('R² Score')
        axes[0].set_title('Model Performance Comparison')
        axes[0].set_xticks(x)
        axes[0].set_xticklabels(model_names, rotation=45)
        
        # Add value labels
        for bar, score in zip(bars1, r2_scores):
            axes[0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
                        f'{score:.3f}', ha='center', va='bottom', fontsize=8)
        
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
    
    # Plot 2: Feature importance
    if prediction_results['feature_importance']:
        features = list(prediction_results['feature_importance'].keys())
        importance_scores = list(prediction_results['feature_importance'].values())
        
        bars2 = axes[1].barh(features, importance_scores, alpha=0.8)
        axes[1].set_xlabel('F-statistic')
        axes[1].set_title('Feature Importance')
        axes[1].grid(True, alpha=0.3)
        
        # Add value labels
        for bar, score in zip(bars2, importance_scores):
            axes[1].text(bar.get_width() + 0.1, bar.get_y() + bar.get_height()/2, 
                        f'{score:.1f}', ha='left', va='center', fontsize=8)
    
    plt.tight_layout()
    plt.show()
    
    # Best model coefficients
    best_model = None
    best_r2 = -np.inf
    
    for model_name, metrics in prediction_results['models'].items():
        if metrics and metrics['r2_score'] > best_r2:
            best_r2 = metrics['r2_score']
            best_model = model_name
    
    if best_model and prediction_results['models'][best_model]['coefficients']:
        print(f"\nBest Model: {best_model} (R² = {best_r2:.4f})")
        print("Coefficients:")
        for feature, coef in prediction_results['models'][best_model]['coefficients'].items():
            print(f"  {feature}: {coef:.4f}")
else:
    print("No prediction results to visualize")

## 4. Comprehensive Correlation Matrix

Create a comprehensive correlation matrix of all weather and market variables.

In [None]:
# Create comprehensive correlation matrix
if weather_df is not None and market_df is not None:
    print("Creating comprehensive correlation matrix...")
    
    # Average rows that share a timestamp (several markets or stations can)
    # so both frames have a unique index, then align by timestamp
    weather_unique = weather_df[['temperature', 'humidity', 'wind_speed', 'precipitation']].groupby(level=0).mean()
    market_unique = market_df[['probability', 'volume']].groupby(level=0).mean()
    common_index = weather_unique.index.intersection(market_unique.index)
    
    if len(common_index) > 10:
        # Sample data to avoid memory issues
        sample_size = min(10000, len(common_index))
        sample_index = np.random.choice(common_index, size=sample_size, replace=False)
        
        # Combine datasets
        combined_data = pd.concat([
            weather_unique.loc[sample_index],
            market_unique.loc[sample_index]
        ], axis=1).dropna()
        
        if not combined_data.empty and len(combined_data) > 5:
            # Calculate correlation matrix
            corr_matrix = combined_data.corr(method='pearson')
            
            # Visualize correlation matrix
            plt.figure(figsize=(10, 8))
            
            # Create mask for upper triangle
            mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
            
            # Create heatmap
            sns.heatmap(corr_matrix, mask=mask, annot=True, cmap='coolwarm', 
                       center=0, square=True, linewidths=0.5, cbar_kws={"shrink": 0.8})
            
            plt.title('Weather-Market Correlation Matrix', fontsize=16, pad=20)
            plt.tight_layout()
            plt.show()
            
            # Print strongest correlations
            print("\nStrongest Correlations (|r| > 0.3):")
            print("=" * 40)
            
            # Get upper triangle correlations
            corr_pairs = []
            for i in range(len(corr_matrix.columns)):
                for j in range(i+1, len(corr_matrix.columns)):
                    corr_val = corr_matrix.iloc[i, j]
                    if abs(corr_val) > 0.3:
                        corr_pairs.append({
                            'var1': corr_matrix.columns[i],
                            'var2': corr_matrix.columns[j],
                            'correlation': corr_val
                        })
            
            # Sort by absolute correlation
            corr_pairs.sort(key=lambda x: abs(x['correlation']), reverse=True)
            
            for pair in corr_pairs[:10]:  # Top 10
                print(f"{pair['var1']} ↔ {pair['var2']}: {pair['correlation']:.3f}")
            
            print(f"\nSample size: {len(combined_data)} observations")
        else:
            print("Insufficient data for correlation matrix")
    else:
        print("No overlapping data for correlation analysis")
else:
    print("No data available for correlation matrix")

## 5. Summary and Key Findings

This notebook has analyzed statistical correlations between weather data and Polymarket outcomes. Here's what it covers:

### Key Features Implemented:
1. **Cross-Correlation Analysis**: Time-lagged correlations between weather variables and market probabilities
2. **Granger Causality Testing**: Statistical test to determine if weather patterns predict market movements
3. **Predictive Modeling**: Regression models to forecast market outcomes based on weather data
4. **Comprehensive Correlation Matrix**: Full correlation analysis of all weather-market variable pairs
5. **Statistical Significance Testing**: p-values for the lagged correlations and the Granger tests

### Key Findings:
Concrete values depend on the data loaded at run time. The analysis reports:
- **Temporal Relationships**: The lag at which each weather variable correlates most strongly with market probability
- **Causal Direction**: Granger tests in both directions. Markets cannot influence weather, so a significant market → weather result points to a spurious fit (for example from shared trends or seasonality)
- **Predictive Power**: R², MSE, and RMSE of linear, ridge, and lasso models on held-out data
- **Variable Importance**: F-statistics ranking the weather variables by their linear association with market probability

### Applications:
- **Trading Strategies**: Develop weather-based trading signals
- **Risk Management**: Assess weather-related market risks
- **Portfolio Optimization**: Weather-hedged investment strategies
- **Research**: Study climate change impacts on financial markets

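### Statistical Rigor:
- **Significance Testing**: Pearson p-values at each lag and Granger F-test p-values, judged against a 0.05 threshold
- **Multiple Testing Correction**: Not yet applied (`multipletests` is imported but unused). Scanning many lags and reporting the smallest p-value inflates false positives, so uncorrected results are exploratory
- **Model Validation**: A single held-out test split with R², MSE, and RMSE. Cross-validation is not yet used
- **Robustness Checks**: Two complementary methods (Pearson correlation and Granger causality)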
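
Multiple-testing correction can be sketched as follows, using the Benjamini-Hochberg procedure implemented directly in NumPy. The function name and the p-values are hypothetical and chosen for illustration. `statsmodels.stats.multitest.multipletests` with `method='fdr_bh'` offers the same procedure as a library call.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking p-values rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # The i-th smallest p-value is compared against alpha * i / m
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank that meets its threshold
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values, e.g. one per lag in a cross-correlation scan
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60]
uncorrected = [p < 0.05 for p in p_values]
corrected = benjamini_hochberg(p_values).tolist()

print("Uncorrected:", uncorrected)
print("BH-corrected:", corrected)
```

In this example fewer p-values remain significant after correction than the raw 0.05 threshold would suggest, which is the typical effect when many lags or variables are scanned.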

### Next Steps:
1. **Real-time Analysis**: Implement live correlation monitoring
2. **Machine Learning**: Advanced ML models for prediction
3. **Event Studies**: Analysis of specific weather events' market impact
4. **Geospatial Analysis**: Regional weather-market correlations
5. **High-Frequency Data**: Intraday correlation analysis

This statistical framework provides a solid foundation for understanding and exploiting weather-market correlations in the Polymarket ecosystem.