# Weather Pattern Analysis: Climate Science Fundamentals

Learn climate science basics by analyzing 10 years of weather station data.

## Dataset

Monthly weather observations (2015-2024) from a single station:
- **Temperature**: Air temperature (°C)
- **Precipitation**: Monthly rainfall (mm)
- **Humidity**: Relative humidity (%)
- **Pressure**: Atmospheric pressure (hPa)
- **Wind Speed**: Average wind speed (km/h)

## Methods
- Time series visualization
- Seasonal decomposition
- Trend analysis
- Anomaly detection
- Climate normals (normally 30-year averages; approximated here with the 10-year record)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('Set2')
%matplotlib inline

print("✓ Setup complete")

## 1. Load and Explore Data

In [None]:
# Load weather data
df = pd.read_csv('sample_weather_data.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

print(f"Dataset shape: {df.shape}")
print(f"Time period: {df.index.min().strftime('%Y-%m')} to {df.index.max().strftime('%Y-%m')}")
print(f"Variables: {', '.join(df.columns)}")

df.head(12)

In [None]:
# Summary statistics
print("Weather Statistics (2015-2024):")
print(df.describe().round(2))
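`seasonal_decompose` and the annual sums used later assume a complete, regularly spaced monthly record. The sketch below shows a quick completeness check on a small synthetic frame. The helper `missing_months` and the demo data are hypothetical and not part of the original notebook. On the real data you would call `missing_months(df.index)` and `df.isna().sum()`.

```python
import numpy as np
import pandas as pd


def missing_months(index: pd.DatetimeIndex) -> list:
    """Return 'YYYY-MM' labels for calendar months absent from a monthly DatetimeIndex."""
    observed = index.to_period('M')
    expected = pd.period_range(observed.min(), observed.max(), freq='M')
    return [str(p) for p in expected.difference(observed)]


# Synthetic demo: 24 month-start dates with March 2016 deliberately dropped
dates = pd.date_range('2015-01-01', periods=24, freq='MS')
dates = dates[dates != pd.Timestamp('2016-03-01')]
demo = pd.DataFrame(
    {'temperature_celsius': np.random.default_rng(0).normal(15, 5, len(dates))},
    index=dates,
)

print("Missing months:", missing_months(demo.index))  # → Missing months: ['2016-03']
print("Missing values per column:")
print(demo.isna().sum())
```

Comparing calendar periods rather than exact timestamps means the check works whether the CSV stores month-start, mid-month, or month-end dates.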

## 2. Time Series Visualization

In [None]:
# Plot all variables
fig, axes = plt.subplots(5, 1, figsize=(14, 16))

# Temperature
axes[0].plot(df.index, df['temperature_celsius'], linewidth=2, color='crimson')
axes[0].set_ylabel('Temperature (°C)', fontsize=11)
axes[0].set_title('Temperature Time Series', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Precipitation
axes[1].bar(df.index, df['precipitation_mm'], width=20, color='steelblue', alpha=0.7)
axes[1].set_ylabel('Precipitation (mm)', fontsize=11)
axes[1].set_title('Precipitation Time Series', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3)

# Humidity
axes[2].plot(df.index, df['humidity_percent'], linewidth=2, color='green')
axes[2].set_ylabel('Humidity (%)', fontsize=11)
axes[2].set_title('Relative Humidity Time Series', fontsize=12, fontweight='bold')
axes[2].grid(True, alpha=0.3)

# Pressure
axes[3].plot(df.index, df['pressure_hpa'], linewidth=2, color='purple')
axes[3].set_ylabel('Pressure (hPa)', fontsize=11)
axes[3].set_title('Atmospheric Pressure Time Series', fontsize=12, fontweight='bold')
axes[3].grid(True, alpha=0.3)

# Wind Speed
axes[4].plot(df.index, df['wind_speed_kmh'], linewidth=2, color='orange')
axes[4].set_ylabel('Wind Speed (km/h)', fontsize=11)
axes[4].set_xlabel('Date', fontsize=11)
axes[4].set_title('Wind Speed Time Series', fontsize=12, fontweight='bold')
axes[4].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Note: look for the seasonal cycle in temperature and precipitation.")
print("Whether temperature is trending over the decade is tested in Section 5.")

## 3. Seasonal Patterns

In [None]:
# Add month and year columns
df['month'] = df.index.month
df['year'] = df.index.year

# Calculate monthly climatology (average by month)
monthly_climatology = df.groupby('month')[['temperature_celsius', 'precipitation_mm']].mean()

# Plot seasonal cycle
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Temperature seasonality
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
axes[0].plot(range(1, 13), monthly_climatology['temperature_celsius'], 
             marker='o', markersize=8, linewidth=2, color='crimson')
axes[0].set_xticks(range(1, 13))
axes[0].set_xticklabels(months, rotation=45)
axes[0].set_ylabel('Temperature (°C)', fontsize=11)
axes[0].set_title('Average Temperature by Month', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Precipitation seasonality
axes[1].bar(range(1, 13), monthly_climatology['precipitation_mm'], 
            color='steelblue', alpha=0.7, edgecolor='black')
axes[1].set_xticks(range(1, 13))
axes[1].set_xticklabels(months, rotation=45)
axes[1].set_ylabel('Precipitation (mm)', fontsize=11)
axes[1].set_title('Average Precipitation by Month', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\nMonthly Climatology:")
print(monthly_climatology.round(2))
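The climatology table can be condensed into the annual range (warmest minus coldest monthly mean) and the months in which these occur. The sketch below uses a synthetic series with a known cycle peaking in July. That cycle is an assumption made for illustration. With the notebook's data, the same `idxmax`/`idxmin`/`max - min` lines apply directly to `monthly_climatology['temperature_celsius']`.

```python
import numpy as np
import pandas as pd

# Synthetic 10-year monthly temperature: 12 °C mean, ±10 °C cycle peaking in July, plus noise
rng = np.random.default_rng(42)
dates = pd.date_range('2015-01-01', '2024-12-01', freq='MS')
months = dates.month.to_numpy()
cycle = 10 * np.cos(2 * np.pi * (months - 7) / 12)
temp = pd.Series(12 + cycle + rng.normal(0, 1, len(dates)), index=dates)

# Monthly climatology: the mean of all Januaries, all Februaries, and so on
clim = temp.groupby(temp.index.month).mean()

warmest, coldest = clim.idxmax(), clim.idxmin()
annual_range = clim.max() - clim.min()
print(f"Warmest month: {warmest}, coldest month: {coldest}")
print(f"Annual range of monthly means: {annual_range:.1f} °C")
```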

## 4. Seasonal Decomposition

In [None]:
# Perform seasonal decomposition on temperature
decomposition = seasonal_decompose(df['temperature_celsius'], model='additive', period=12)

# Plot decomposition
fig, axes = plt.subplots(4, 1, figsize=(14, 12))

# Original
axes[0].plot(df.index, decomposition.observed, linewidth=2, color='black')
axes[0].set_ylabel('Observed', fontsize=11)
axes[0].set_title('Temperature Decomposition', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Trend
axes[1].plot(df.index, decomposition.trend, linewidth=2, color='crimson')
axes[1].set_ylabel('Trend', fontsize=11)
axes[1].grid(True, alpha=0.3)

# Seasonal
axes[2].plot(df.index, decomposition.seasonal, linewidth=2, color='green')
axes[2].set_ylabel('Seasonal', fontsize=11)
axes[2].grid(True, alpha=0.3)

# Residual
axes[3].plot(df.index, decomposition.resid, linewidth=2, color='purple')
axes[3].set_ylabel('Residual', fontsize=11)
axes[3].set_xlabel('Date', fontsize=11)
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Decomposition separates:")
print("  - Trend: Long-term changes (centered 12-month moving average; undefined for the first and last 6 months)")
print("  - Seasonal: Annual cycle (summer-winter)")
print("  - Residual: Random variations and anomalies")
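To make the decomposition less of a black box, here is a minimal NumPy sketch of classical additive decomposition. It computes three components:
- the trend, as a centered 2×12 moving average;
- the seasonal component, as the mean detrended value at each position in the cycle;
- the residual, as whatever is left over.

To our understanding this matches what `seasonal_decompose(..., model='additive', period=12)` does with its default filter. Treat that correspondence as an assumption. The function name `additive_decompose` is ours. The sketch also shows why the trend panel above is blank for six months at each end.

```python
import numpy as np


def additive_decompose(x, period=12):
    """Classical additive decomposition of a gap-free series with an even period.

    Returns (trend, seasonal, residual) as NumPy arrays; the trend is NaN for the
    first and last period // 2 points, where the centered window does not fit.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    half = period // 2
    # Centered "2 x period" moving average: full weight inside, half weight at both ends
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode='valid')
    # Seasonal: mean detrended value at each position in the cycle, shifted to sum to zero
    detrended = x - trend
    cycle = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle -= cycle.mean()
    seasonal = np.tile(cycle, n // period + 1)[:n]
    return trend, seasonal, detrended - seasonal


# Synthetic check: linear trend plus a pure 12-month cycle, no noise
t = np.arange(120)
true_seasonal = 5 * np.sin(2 * np.pi * t / 12)
x = 10 + 0.02 * t + true_seasonal
trend, seasonal, resid = additive_decompose(x)
print("Max seasonal error:", np.max(np.abs(seasonal - true_seasonal)))
print("Undefined trend values:", int(np.isnan(trend).sum()))  # 6 at each end
```

The half weights at the window ends let an even-length (12-month) average be centered on a single month. Over any full year, the weighted sum of a pure 12-month cycle is zero, so the cycle drops out of the trend.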

## 5. Trend Analysis

In [None]:
# Calculate annual averages
annual_temp = df.groupby('year')['temperature_celsius'].mean()

# Fit linear trend
X = annual_temp.index.values.reshape(-1, 1)
y = annual_temp.values

model = LinearRegression()
model.fit(X, y)
trend_line = model.predict(X)

# Calculate warming rate
warming_rate_per_year = model.coef_[0]
warming_rate_per_decade = warming_rate_per_year * 10

# Statistical significance
slope, intercept, r_value, p_value, std_err = stats.linregress(X.flatten(), y)

# Plot
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(annual_temp.index, annual_temp.values, marker='o', markersize=10, 
        linewidth=2, label='Annual Average', color='crimson')
ax.plot(annual_temp.index, trend_line, '--', linewidth=2, 
        label=f'Trend: {warming_rate_per_decade:+.2f}°C/decade', color='darkred')
ax.set_xlabel('Year', fontsize=12)
ax.set_ylabel('Temperature (°C)', fontsize=12)
ax.set_title('Annual Temperature Trend', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nTrend Analysis Results:")
print(f"  Trend: {warming_rate_per_decade:+.3f}°C per decade")
print(f"  R²: {r_value**2:.3f}")
print(f"  p-value: {p_value:.4f}")
if p_value < 0.05:
    print("  ✓ Trend is statistically significant (p < 0.05)")
else:
    print("  ✗ Trend not statistically significant")

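print(f"\n  Change, first to last annual mean: {(annual_temp.iloc[-1] - annual_temp.iloc[0]):+.2f}°C")
print(f"  Change implied by the fitted trend: {warming_rate_per_year * (annual_temp.index[-1] - annual_temp.index[0]):+.2f}°C")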
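A p-value alone does not say how uncertain the slope is. A confidence interval does, and it can be built from the standard error that `stats.linregress` already returns, using a t distribution with n − 2 degrees of freedom. The helper name and the synthetic annual means below are illustrative assumptions, not the notebook's data.

```python
import numpy as np
from scipy import stats


def slope_confidence_interval(x, y, level=0.95):
    """Least-squares slope of y on x with a two-sided t-based confidence interval."""
    res = stats.linregress(x, y)
    t_crit = stats.t.ppf(0.5 + level / 2, len(x) - 2)  # n - 2 degrees of freedom
    return res.slope, res.slope - t_crit * res.stderr, res.slope + t_crit * res.stderr


# Synthetic annual means for 2015-2024 (illustrative only, not the notebook's data)
years = np.arange(2015, 2025)
rng = np.random.default_rng(1)
annual = 14.0 + 0.03 * (years - 2015) + rng.normal(0, 0.2, years.size)

slope, lower, upper = slope_confidence_interval(years, annual)
print(f"Trend: {10 * slope:+.2f} °C/decade "
      f"(95% CI {10 * lower:+.2f} to {10 * upper:+.2f})")
```

In the notebook, `slope_confidence_interval(annual_temp.index.values, annual_temp.values)` would give the interval for the fitted temperature trend. If the interval spans zero, the ten annual values are consistent with no trend at that confidence level.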

## 6. Anomaly Detection

In [None]:
# Calculate temperature anomalies (deviation from monthly climatology)
df['temp_anomaly'] = (df['temperature_celsius']
                      - df['month'].map(monthly_climatology['temperature_celsius']))

# Identify extreme months (>2 standard deviations)
anomaly_std = df['temp_anomaly'].std()
anomaly_threshold = 2 * anomaly_std

hot_anomalies = df[df['temp_anomaly'] > anomaly_threshold]
cold_anomalies = df[df['temp_anomaly'] < -anomaly_threshold]

# Plot anomalies
fig, ax = plt.subplots(figsize=(14, 6))

# Plot all anomalies
colors = ['red' if x > 0 else 'blue' for x in df['temp_anomaly']]
ax.bar(df.index, df['temp_anomaly'], width=20, color=colors, alpha=0.6)

# Mark threshold lines
ax.axhline(anomaly_threshold, color='red', linestyle='--', linewidth=2, 
          label=f'Extreme hot threshold (+{anomaly_threshold:.1f}°C)')
ax.axhline(-anomaly_threshold, color='blue', linestyle='--', linewidth=2, 
          label=f'Extreme cold threshold (-{anomaly_threshold:.1f}°C)')
ax.axhline(0, color='black', linestyle='-', linewidth=1, alpha=0.5)

ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Temperature Anomaly (°C)', fontsize=12)
ax.set_title('Temperature Anomalies from Monthly Climatology', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

print("\nExtreme Temperature Events:")
print(f"\nExtreme HOT months (>{anomaly_threshold:.1f}°C above normal): {len(hot_anomalies)}")
if len(hot_anomalies) > 0:
    for idx, row in hot_anomalies.iterrows():
        print(f"  {idx.strftime('%Y-%m')}: +{row['temp_anomaly']:.2f}°C (absolute: {row['temperature_celsius']:.1f}°C)")

print(f"\nExtreme COLD months (>{anomaly_threshold:.1f}°C below normal): {len(cold_anomalies)}")
if len(cold_anomalies) > 0:
    for idx, row in cold_anomalies.iterrows():
        print(f"  {idx.strftime('%Y-%m')}: {row['temp_anomaly']:.2f}°C (absolute: {row['temperature_celsius']:.1f}°C)")
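The threshold above pools all months. As a result, a month whose temperature is naturally more variable (often winter at mid-latitude sites) tends to dominate the extremes. One common alternative is to divide each anomaly by that calendar month's standard deviation and flag |z| > 2. This is offered as an option, not the notebook's method. The helper name and synthetic data below are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def standardized_anomalies(series: pd.Series) -> pd.Series:
    """Per-calendar-month z-score: (value - month mean) / month standard deviation."""
    by_month = series.groupby(series.index.month)
    return (series - by_month.transform('mean')) / by_month.transform('std')


# Synthetic series in which January is four times as variable as other months
rng = np.random.default_rng(7)
dates = pd.date_range('2015-01-01', '2024-12-01', freq='MS')
spread = np.where(dates.month.to_numpy() == 1, 4.0, 1.0)
temp = pd.Series(10 + spread * rng.normal(0, 1, len(dates)), index=dates)

z = standardized_anomalies(temp)
print(f"Months with |z| > 2: {int((z.abs() > 2).sum())}")
```

With the notebook's data this would be `standardized_anomalies(df['temperature_celsius'])`. Note that with only ten values per calendar month, each month's standard deviation is itself quite uncertain.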

## 7. Precipitation Analysis

In [None]:
# Annual precipitation totals
annual_precip = df.groupby('year')['precipitation_mm'].sum()

# Calculate trend
X_precip = annual_precip.index.values.reshape(-1, 1)
y_precip = annual_precip.values

model_precip = LinearRegression()
model_precip.fit(X_precip, y_precip)
precip_trend = model_precip.predict(X_precip)

precip_change_per_decade = model_precip.coef_[0] * 10

# Plot
fig, ax = plt.subplots(figsize=(12, 6))
ax.bar(annual_precip.index, annual_precip.values, color='steelblue', 
       alpha=0.7, edgecolor='black', label='Annual Total')
ax.plot(annual_precip.index, precip_trend, '--', linewidth=2, color='darkblue',
       label=f'Trend: {precip_change_per_decade:+.1f} mm/decade')
ax.axhline(annual_precip.mean(), color='red', linestyle=':', linewidth=2, 
          label=f'Mean: {annual_precip.mean():.1f} mm/year')
ax.set_xlabel('Year', fontsize=12)
ax.set_ylabel('Annual Precipitation (mm)', fontsize=12)
ax.set_title('Annual Precipitation Totals', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

print("\nPrecipitation Statistics:")
print(f"  Mean annual: {annual_precip.mean():.1f} mm")
print(f"  Wettest year: {annual_precip.idxmax()} ({annual_precip.max():.1f} mm)")
print(f"  Driest year: {annual_precip.idxmin()} ({annual_precip.min():.1f} mm)")
print(f"  Trend: {precip_change_per_decade:+.1f} mm per decade")
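Year-to-year precipitation variability is often summarized in two ways: each annual total as a percentage of the period mean, and the coefficient of variation (standard deviation divided by mean). Below is a minimal sketch with made-up totals. The `annual_precip` series from the cell above could be passed in directly.

```python
import pandas as pd


def percent_of_mean(annual_totals: pd.Series) -> pd.Series:
    """Each year's total as a percentage of the mean over all years."""
    return 100 * annual_totals / annual_totals.mean()


def coefficient_of_variation(annual_totals: pd.Series) -> float:
    """Sample standard deviation divided by the mean (dimensionless)."""
    return annual_totals.std() / annual_totals.mean()


# Made-up annual totals in mm, for illustration only
totals = pd.Series([800.0, 1000.0, 1200.0], index=[2015, 2016, 2017])
print(percent_of_mean(totals).round(1))
print(f"CV = {coefficient_of_variation(totals):.2f}")  # → CV = 0.20
```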

## 8. Correlation Analysis

In [None]:
# Calculate correlation matrix
correlation = df[['temperature_celsius', 'precipitation_mm', 'humidity_percent', 
                  'pressure_hpa', 'wind_speed_kmh']].corr()

# Plot heatmap
fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(correlation, annot=True, fmt='.2f', cmap='RdBu_r', center=0,
           square=True, ax=ax, vmin=-1, vmax=1,
           cbar_kws={'label': 'Correlation Coefficient'})
ax.set_title('Weather Variable Correlations', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nKey Relationships:")
print(f"  Temperature - Humidity: r = {correlation.loc['temperature_celsius', 'humidity_percent']:.2f}")
print(f"  Temperature - Precipitation: r = {correlation.loc['temperature_celsius', 'precipitation_mm']:.2f}")
print(f"  Temperature - Wind Speed: r = {correlation.loc['temperature_celsius', 'wind_speed_kmh']:.2f}")
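print("\nInterpretation (applies where the signs above are negative):")
print("  Temp-humidity: warmer air has a higher saturation vapor pressure, so the same")
print("    amount of water vapor gives a lower relative humidity")
print("  Temp-wind: at many mid-latitude stations, summers tend to be calmer and winters windier")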
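Temperature, humidity and wind all follow the annual cycle, so correlations of raw monthly values can be dominated by that shared cycle. One check is to remove each variable's monthly climatology first and then correlate the anomalies. In the synthetic data below, the two variables are constructed (as an assumption) to share only a seasonal cycle. The raw correlation comes out strongly negative, while the anomaly correlation is near zero. The helper name `deseasonalize` is ours. On the notebook's data it would be applied to the same five columns used for the heatmap.

```python
import numpy as np
import pandas as pd


def deseasonalize(frame: pd.DataFrame) -> pd.DataFrame:
    """Subtract each calendar month's mean from every column (monthly anomalies)."""
    return frame - frame.groupby(frame.index.month).transform('mean')


# Synthetic pair that shares ONLY a seasonal cycle; their month-to-month noise is independent
rng = np.random.default_rng(3)
dates = pd.date_range('2015-01-01', '2024-12-01', freq='MS')
cycle = np.cos(2 * np.pi * (dates.month.to_numpy() - 7) / 12)
demo = pd.DataFrame({
    'temperature_celsius': 12 + 10 * cycle + rng.normal(0, 1, len(dates)),
    'humidity_percent': 70 - 15 * cycle + rng.normal(0, 3, len(dates)),
}, index=dates)

raw_r = demo.corr().loc['temperature_celsius', 'humidity_percent']
anom_r = deseasonalize(demo).corr().loc['temperature_celsius', 'humidity_percent']
print(f"Raw r = {raw_r:.2f}, anomaly r = {anom_r:.2f}")
```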

## 9. Extreme Events

In [None]:
# Record high and low monthly values for each variable, with the month of occurrence
variables = {
    'Temperature': ('temperature_celsius', '{:.1f}°C'),
    'Precipitation': ('precipitation_mm', '{:.1f} mm'),
    'Humidity': ('humidity_percent', '{:.0f}%'),
    'Pressure': ('pressure_hpa', '{:.1f} hPa'),
    'Wind Speed': ('wind_speed_kmh', '{:.1f} km/h'),
}

def format_record(series, fmt, use_max):
    """Format the max (or min) of a series followed by the month it occurred, as 'YYYY-MM'."""
    value = series.max() if use_max else series.min()
    when = series.idxmax() if use_max else series.idxmin()
    return f"{fmt.format(value)} ({when.strftime('%Y-%m')})"

extreme_summary = pd.DataFrame({
    'Variable': list(variables),
    'Maximum': [format_record(df[col], fmt, True) for col, fmt in variables.values()],
    'Minimum': [format_record(df[col], fmt, False) for col, fmt in variables.values()],
})

print("Extreme Events (2015-2024):")
print("="*80)
print(extreme_summary.to_string(index=False))
print("="*80)

## 10. Summary Report

In [None]:
# Generate comprehensive summary
summary = {
    'Period': f"{df.index.min().strftime('%Y-%m')} to {df.index.max().strftime('%Y-%m')}",
    'Total Months': len(df),
    'Mean Temperature': f"{df['temperature_celsius'].mean():.2f}°C",
    'Temperature Range': f"{df['temperature_celsius'].min():.1f}°C to {df['temperature_celsius'].max():.1f}°C",
    'Temperature Trend': f"{warming_rate_per_decade:+.3f}°C/decade",
    'Change (first to last year)': f"{(annual_temp.iloc[-1] - annual_temp.iloc[0]):+.2f}°C",
    'Mean Annual Precip': f"{annual_precip.mean():.1f} mm",
    'Precip Trend': f"{precip_change_per_decade:+.1f} mm/decade",
    'Extreme Hot Months': len(hot_anomalies),
    'Extreme Cold Months': len(cold_anomalies)
}

print("="*80)
print("WEATHER PATTERN ANALYSIS SUMMARY")
print("="*80)
for key, value in summary.items():
    print(f"{key:.<30} {value}")
print("="*80)

print("\n✓ Analysis complete!")
print("\nKey Findings:")
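print("  1. Clear seasonal temperature cycle (cold winters, warm summers)")
trend_direction = 'Warming' if warming_rate_per_decade > 0 else 'Cooling'
significance = 'significant' if p_value < 0.05 else 'not significant'
print(f"  2. {trend_direction} trend of {warming_rate_per_decade:+.2f}°C/decade "
      f"(p = {p_value:.3f}, {significance} at the 0.05 level)")
print("  3. Annual precipitation varies from year to year")
print(f"  4. Temperature-humidity correlation: r = {correlation.loc['temperature_celsius', 'humidity_percent']:.2f}")
print(f"  5. Extreme temperature months: {len(hot_anomalies)} hot, {len(cold_anomalies)} cold")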

## Key Concepts Learned

### Time Series Analysis
- **Seasonal Patterns**: Regular annual cycles in weather
- **Trends**: Long-term changes (warming/cooling)
- **Anomalies**: Deviations from normal conditions

### Climate Normals
- Typically 30-year averages (WMO standard)
- Provide baseline for comparison
- Help identify unusual conditions
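With a longer record, normals are computed over a fixed reference period rather than over all available years, and anomalies are measured against them. The current WMO standard period is 1991-2020. Below is a minimal sketch, assuming a monthly pandas Series with a DatetimeIndex. The function names and synthetic data are illustrative, not from the notebook.

```python
import numpy as np
import pandas as pd


def monthly_normals(series: pd.Series, start_year: int, end_year: int) -> pd.Series:
    """Mean of each calendar month over an inclusive reference period of years."""
    years = series.index.year
    ref = series[(years >= start_year) & (years <= end_year)]
    return ref.groupby(ref.index.month).mean()


def anomalies_vs_normals(series: pd.Series, normals: pd.Series) -> pd.Series:
    """Subtract the matching calendar-month normal from every value."""
    return series - normals.reindex(series.index.month).to_numpy()


# Synthetic series: value equals the month number, plus 1.0 from 2021 onward
dates = pd.date_range('1991-01-01', '2024-12-01', freq='MS')
values = dates.month.to_numpy() + (dates.year.to_numpy() >= 2021) * 1.0
series = pd.Series(values, index=dates)

normals = monthly_normals(series, 1991, 2020)   # 30-year reference period
anoms = anomalies_vs_normals(series, normals)
print(anoms.groupby(anoms.index.year).mean().tail(6))
```

Fixing the reference period keeps anomalies comparable across studies. Averaging over "all available years" would shift the baseline every time new data arrive.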

### Decomposition
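- **Additive model: Observed = Trend + Seasonal + Residual** (a multiplicative model uses Trend × Seasonal × Residual, suited to cycles that grow with the level)
- Separates the long-term signal from the annual cycle
- Leaves unusual months visible in the residual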

### Statistical Significance
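- **p-value**: the probability of a slope at least as large (in magnitude) as the one observed if the true trend were zero; it does not by itself prove a trend is real
- **R²**: the fraction of variance in the annual means explained by the linear trend
- **Confidence intervals**: the range of slopes consistent with the data at a chosen level (e.g. 95%)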
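The formulas below apply to the straight-line trend in Section 5 (notation ours). Here $x_i$ are the years, $y_i$ the annual means, $\hat{y}_i$ the fitted values, $b$ the fitted slope and $n$ the number of years:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
\mathrm{SE}(b) = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2 / (n - 2)}{\sum_i (x_i - \bar{x})^2}},
\qquad
t = \frac{b}{\mathrm{SE}(b)}
$$

The two-sided p-value compares $t$ with a Student $t$ distribution with $n - 2$ degrees of freedom, which is the test `scipy.stats.linregress` reports. A 95% confidence interval for the slope is $b \pm t_{0.975,\,n-2}\,\mathrm{SE}(b)$. With $n = 10$ annual means, $t_{0.975,\,8} \approx 2.31$, so the interval is roughly $b \pm 2.3\,\mathrm{SE}(b)$.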

## Next Steps

### Extend the Analysis
- Add more stations (spatial analysis)
- Use daily data (higher resolution)
- Calculate derived variables (heat index, wind chill)
- Perform forecasting (ARIMA models)
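As an example of a derived variable, the wind chill index adopted by Environment Canada and the US National Weather Service in 2001 takes temperature in °C and wind speed in km/h, the same units as this dataset. Wind chill describes instantaneous conditions. Applying it to monthly averages gives little physical meaning, so it fits better with the daily or hourly data suggested above. The function name is ours. Returning NaN outside the formula's stated range (above 10 °C, or wind at or below 4.8 km/h) is a design choice of this sketch.

```python
import numpy as np


def wind_chill(temp_c, wind_kmh):
    """Wind chill index (°C) from air temperature (°C) and 10 m wind speed (km/h).

    Uses the 2001 North American formula; returns NaN where it is not defined
    (temperature above 10 °C or wind at or below 4.8 km/h).
    """
    t = np.asarray(temp_c, dtype=float)
    v = np.asarray(wind_kmh, dtype=float)
    v16 = v ** 0.16
    wci = 13.12 + 0.6215 * t - 11.37 * v16 + 0.3965 * t * v16
    return np.where((t <= 10) & (v > 4.8), wci, np.nan)


print(np.round(wind_chill([-10, -20, 15], [20, 40, 20]), 1))
```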

### Real Climate Data
- **[NOAA Climate Data Online](https://www.ncdc.noaa.gov/cdo-web/)**: Historical weather
- **[ERA5 Reanalysis](https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5)**: Global gridded data
- **[GHCN](https://www.ncdc.noaa.gov/ghcn/)**: Global Historical Climatology Network

### Advanced Methods
- Extreme value analysis (return periods)
- Climate change attribution
- Downscaling techniques
- Machine learning for pattern recognition
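Extreme value analysis usually starts from annual maxima, such as the wettest month of each year. An empirical first step, before fitting a distribution such as the GEV, is to assign each maximum a return period with a plotting position. The sketch below uses the Weibull formula, where the m-th largest of n maxima gets T = (n + 1) / m years. The inputs are made-up numbers, and with only ten years of data such estimates would be very rough.

```python
import numpy as np


def empirical_return_periods(annual_maxima):
    """Return (maxima sorted largest first, return period in years) via the Weibull plotting position."""
    x = np.sort(np.asarray(annual_maxima, dtype=float))[::-1]  # largest first
    ranks = np.arange(1, x.size + 1)                           # m = 1, 2, ..., n
    return x, (x.size + 1) / ranks


values, periods = empirical_return_periods([10.0, 30.0, 20.0])
for v, t in zip(values, periods):
    print(f"{v:5.1f} mm  ~ once every {t:.2f} years")
```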

## Resources

- **[IPCC Reports](https://www.ipcc.ch/)**: Climate science assessments
- **[NOAA Climate.gov](https://www.climate.gov/)**: Climate data and education
- **[Copernicus Climate Change Service](https://climate.copernicus.eu/)**: European climate data
- **Textbook**: *Statistical Methods in the Atmospheric Sciences* by Wilks