# Data Analysis Project: Climate Change and Economic Indicators

This notebook analyzes the relationship between CO2 emissions, temperature changes, and economic indicators using multiple datasets.

## Datasets Used:
1. CO2 per capita consumption (1751-2019)
2. Energy use per capita (World Bank)
3. GDP per capita growth (World Bank)
4. Climate data (CMIP6 models)
5. Natural disasters data (EM-DAT)

## 1. Data Import

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

In [None]:
co2_data = pd.read_csv('data/co2_pcap_cons.csv')
print("CO2 Data Shape:", co2_data.shape)
print("\nFirst few rows:")
print(co2_data.head())

In [None]:
energy_data = pd.read_excel('data/API_EG.USE.PCAP.KG.OE_DS2_en_excel_v2_20374.xls', skiprows=3)
print("Energy Data Shape:", energy_data.shape)
print("\nFirst few rows:")
print(energy_data.head())

In [None]:
gdp_data = pd.read_excel('data/API_NY.GDP.PCAP.KD.ZG_DS2_en_excel_v2_122434.xls', skiprows=3)
print("GDP Data Shape:", gdp_data.shape)
print("\nFirst few rows:")
print(gdp_data.head())

In [None]:
climate_data = pd.read_excel('data/cmip6-x0.25_timeseries_cdd65,hdd65,tas_timeseries_annual_1950-2014,2015-2100_median_historical_ensemble_all_mean.xlsx')
print("Climate Data Shape:", climate_data.shape)
print("\nFirst few rows:")
print(climate_data.head())

In [None]:
disaster_data = pd.read_excel('data/public_emdat_custom_request_2025-08-17_cf186ed7-74bb-4cc4-ae27-dcd164c54e48.xlsx')
print("Disaster Data Shape:", disaster_data.shape)
print("\nFirst few rows:")
print(disaster_data.head())

In [None]:
us_energy_data = pd.read_excel('data/us_energy.xls', skiprows=3)
print("US Energy Data Shape:", us_energy_data.shape)
print("\nFirst few rows:")
print(us_energy_data.head())

The line graph shows that the leading per-capita emitters have followed very different emission profiles over time. The most extreme pattern is the volatility of oil-producing nations, some of which peak above 50 metric tons per capita. These peaks likely reflect the economics of oil-dominated economies, where a small population supports a very large volume of oil production and per-capita figures rise accordingly. The chart suggests that emission patterns track economic development paths and resource endowments, not just industrial processes.

## 2. Data Wrangling

In [None]:
def clean_co2_data(df):
    df_clean = df.copy()
    
    if 'country' not in df_clean.columns:
        if df_clean.index.name == 'country':
            df_clean = df_clean.reset_index()
        elif 'Country' in df_clean.columns:
            df_clean = df_clean.rename(columns={'Country': 'country'})
        else:
            df_clean.columns = ['country'] + list(df_clean.columns[1:])
    
    year_columns = [col for col in df_clean.columns if col != 'country' and str(col).isdigit()]
    
    df_clean = df_clean.melt(id_vars=['country'], value_vars=year_columns, 
                           var_name='Year', value_name='CO2_per_capita')
    
    df_clean['Year'] = pd.to_numeric(df_clean['Year'])
    df_clean['CO2_per_capita'] = pd.to_numeric(df_clean['CO2_per_capita'], errors='coerce')
    
    df_clean = df_clean.rename(columns={'country': 'Country'})
    
    return df_clean.dropna()

co2_clean = clean_co2_data(co2_data)
print("Cleaned CO2 Data Shape:", co2_clean.shape)
print("\nSample data:")
print(co2_clean.head())
print("\nCountries in dataset:", co2_clean['Country'].nunique())
print("Year range:", co2_clean['Year'].min(), "to", co2_clean['Year'].max())

In [None]:
def clean_worldbank_data(df, value_name):
    df_clean = df.copy()
    
    cols_to_drop = ['Country Code', 'Indicator Name', 'Indicator Code', 'Unnamed: 67']
    for col in cols_to_drop:
        if col in df_clean.columns:
            df_clean = df_clean.drop(col, axis=1)
    
    df_clean = df_clean.melt(id_vars=['Country Name'], var_name='Year', value_name=value_name)
    
    df_clean['Year'] = pd.to_numeric(df_clean['Year'], errors='coerce')
    df_clean[value_name] = pd.to_numeric(df_clean[value_name], errors='coerce')
    df_clean = df_clean.rename(columns={'Country Name': 'Country'})
    
    return df_clean.dropna()

energy_clean = clean_worldbank_data(energy_data, 'Energy_use_per_capita')
gdp_clean = clean_worldbank_data(gdp_data, 'GDP_growth')

print("Cleaned Energy Data Shape:", energy_clean.shape)
print("Cleaned GDP Data Shape:", gdp_clean.shape)

In [None]:
def clean_climate_data(df):
    df_clean = df.copy()
    
    year_cols = [col for col in df_clean.columns if '-' in str(col) and len(str(col)) > 4]
    
    if not year_cols:
        year_cols = [col for col in df_clean.columns if isinstance(col, (int, float)) or str(col).isdigit()]
    
    if year_cols:
        years = []
        temps = []
        
        for col in year_cols:
            try:
                if '-' in str(col):
                    year = int(str(col).split('-')[0])
                else:
                    year = int(col)
                
                temp_val = df_clean[col].iloc[0]
                if pd.notna(temp_val):
                    years.append(year)
                    temps.append(float(temp_val))
            except (ValueError, IndexError):
                continue
        
        result_df = pd.DataFrame({
            'Year': years,
            'Temperature': temps
        })
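    else:
        # No recognizable year columns: return an empty, typed frame so the
        # return statement below does not raise a NameError
        print("Warning: could not find year columns in the climate data")
        result_df = pd.DataFrame({
            'Year': pd.Series(dtype='int64'),
            'Temperature': pd.Series(dtype='float64')
        })
    
    return result_df.sort_values('Year')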

climate_clean = clean_climate_data(climate_data)
print("Cleaned Climate Data Shape:", climate_clean.shape)
print("\nClimate data sample:")
print(climate_clean.head())

In [None]:
def clean_disaster_data(df):
    df_clean = df.copy()
    
    column_mapping = {
        'year': 'Year',
        'Year': 'Year',
        'country': 'Country',
        'Country': 'Country',
        'Country Name': 'Country'
    }
    
    for old_name, new_name in column_mapping.items():
        if old_name in df_clean.columns:
            df_clean = df_clean.rename(columns={old_name: new_name})
    
    if 'Year' in df_clean.columns:
        df_clean['Year'] = pd.to_numeric(df_clean['Year'], errors='coerce')
        df_clean = df_clean.dropna(subset=['Year'])
    
    return df_clean

disaster_clean = clean_disaster_data(disaster_data)
print("Cleaned Disaster Data Shape:", disaster_clean.shape)

In [None]:
def clean_us_energy_data(df):
    # The US file has the same World Bank layout as the energy and GDP files,
    # so reuse the shared cleaner instead of duplicating its steps
    return clean_worldbank_data(df, 'US_Energy_use_per_capita')

us_energy_clean = clean_us_energy_data(us_energy_data)
us_energy_filtered = us_energy_clean[us_energy_clean['Country'].str.contains('United States', case=False, na=False)]
if len(us_energy_filtered) == 0:
    us_energy_filtered = us_energy_clean[us_energy_clean['Country'].str.contains('USA', case=False, na=False)]

print(f"US Energy data points: {len(us_energy_filtered)}")

In [None]:
print("Merging datasets...")

merged_data = co2_clean.copy()
print(f"Starting with CO2 data: {merged_data.shape}")

merged_data = merged_data.merge(energy_clean, on=['Country', 'Year'], how='left')
print(f"After merging energy data: {merged_data.shape}")

merged_data = merged_data.merge(gdp_clean, on=['Country', 'Year'], how='left')
print(f"After merging GDP data: {merged_data.shape}")

merged_data = merged_data.merge(climate_clean, on='Year', how='left')
print(f"After merging climate data: {merged_data.shape}")

print("\nMerged data sample:")
print(merged_data.head())

print("\nData completeness:")
print(merged_data.isnull().sum())

In [None]:
top_emitters = merged_data.groupby('Country')['CO2_per_capita'].mean().sort_values(ascending=False).head(10)
print("Top 10 CO2 emitting countries (average per capita):")
for i, (country, emissions) in enumerate(top_emitters.items(), 1):
    print(f"{i:2d}. {country}: {emissions:.2f} metric tons per capita")

norway_names = ['Norway', 'NORWAY']
norway_country = None
for name in norway_names:
    if name in merged_data['Country'].values:
        norway_country = name
        break

if norway_country:
    print(f"\nNorway data found under name: '{norway_country}'")
    norway_data = merged_data[merged_data['Country'] == norway_country].copy()
    print(f"Norway data points: {len(norway_data)}")
    print(f"Norway year range: {norway_data['Year'].min()} to {norway_data['Year'].max()}")
else:
    print("\nWarning: Norway data not found. Available countries sample:")
    print(merged_data['Country'].unique()[:20])

The plot with country labels attached to the line ends shows how leadership in per-capita emissions has shifted, with different countries peaking at different points in history. This timing reflects the sequential nature of global industrialization, in which nations pass through development stages at different times. The end-of-line annotations mark each country's most recent emission level and show that some previously high-emitting nations have begun to decline while others continue to grow. Emission trajectories are therefore not fixed and can reverse, plausibly as a result of policy changes and economic transformation.
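The claim that countries peak at different times and that some have since declined can be checked directly from a long-format table. The following is a minimal sketch, not part of the original analysis: the `toy` frame and the `peak_and_latest` helper are hypothetical, and the values are illustrative only. It assumes the same `Country`, `Year`, and `CO2_per_capita` columns as `merged_data` and no missing CO2 values.

```python
import pandas as pd

# Hypothetical toy data in the same long format as merged_data (illustrative values only)
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Year": [1990, 2000, 2010, 1990, 2000, 2010],
    "CO2_per_capita": [10.0, 14.0, 9.0, 3.0, 5.0, 8.0],
})

def peak_and_latest(df):
    """Return each country's peak year, peak value, and most recent value."""
    df = df.sort_values(["Country", "Year"])
    # idxmax gives the row label of the maximum within each country
    peak_rows = df.loc[df.groupby("Country")["CO2_per_capita"].idxmax()]
    peaks = peak_rows.set_index("Country")[["Year", "CO2_per_capita"]]
    peaks.columns = ["peak_year", "peak_value"]
    # After sorting by Year, last() picks the most recent observation
    peaks["latest_value"] = df.groupby("Country")["CO2_per_capita"].last()
    # A country is past its peak when its latest value is below its maximum
    peaks["past_peak"] = peaks["latest_value"] < peaks["peak_value"]
    return peaks

summary = peak_and_latest(toy)
print(summary)
```

Applied to the real data, the same helper could be called on `merged_data[merged_data['Country'].isin(top_emitters.index)]` to list which of the top emitters are already past their peak.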

## 3. Data Visualization

### 1) First line plot under the "adding color" subsection

In [None]:
plt.figure(figsize=(12, 8))

colors = plt.cm.Set3(np.linspace(0, 1, len(top_emitters)))

for i, country in enumerate(top_emitters.index[:5]):
    country_data = merged_data[merged_data['Country'] == country].sort_values('Year')
    if len(country_data) > 0:
        plt.plot(country_data['Year'], country_data['CO2_per_capita'], 
                label=country, linewidth=2.5, alpha=0.8, color=colors[i])

plt.title('CO2 Emissions per Capita Over Time - Top 5 Countries', fontsize=16, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('CO2 per Capita (metric tons)', fontsize=12)
plt.legend(fontsize=10, loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### 2) Top 10 emitting country line plot with names attached to the lines at the end

In [None]:
plt.figure(figsize=(14, 10))
colors = plt.cm.tab10(np.linspace(0, 1, 10))

for i, country in enumerate(top_emitters.index):
    country_data = merged_data[merged_data['Country'] == country].sort_values('Year')
    if len(country_data) > 0:
        plt.plot(country_data['Year'], country_data['CO2_per_capita'], 
                color=colors[i], linewidth=2, alpha=0.8, label=country)
        
        # Label the line at its most recent data point
        last_point = country_data.iloc[-1]
        plt.annotate(country, 
                    xy=(last_point['Year'], last_point['CO2_per_capita']), 
                    xytext=(5, 5), textcoords='offset points', 
                    fontsize=9, fontweight='bold',
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7))

plt.title('CO2 Emissions per Capita Over Time - Top 10 Countries', fontsize=16, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('CO2 per Capita (metric tons)', fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=9)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### 3) Tile plot of the top 10 countries

In [None]:
top_10_data = merged_data[merged_data['Country'].isin(top_emitters.index)]
pivot_data = top_10_data.pivot_table(values='CO2_per_capita', 
                                     index='Country', columns='Year', 
                                     aggfunc='mean')

plt.figure(figsize=(16, 10))
sns.heatmap(pivot_data, cmap='YlOrRd', cbar_kws={'label': 'CO2 per Capita (metric tons)'},
            linewidths=0.1, alpha=0.8)
plt.title('CO2 Emissions per Capita Heatmap - Top 10 Countries', fontsize=16, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('Country', fontsize=12)
plt.xticks(rotation=45)
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

### 4) 3x2 facet figure showing all of the world and then just the chosen country

In [None]:
chosen_country = norway_country if norway_country else 'Norway'

fig, axes = plt.subplots(3, 2, figsize=(16, 20))
fig.suptitle(f'Global vs {chosen_country} Climate Analysis', fontsize=20, fontweight='bold')

global_co2 = merged_data.groupby('Year')['CO2_per_capita'].mean()
if norway_country:
    country_co2 = merged_data[merged_data['Country'] == norway_country].groupby('Year')['CO2_per_capita'].mean()
else:
    proxy_country = top_emitters.index[0]
    country_co2 = merged_data[merged_data['Country'] == proxy_country].groupby('Year')['CO2_per_capita'].mean()
    chosen_country = proxy_country

axes[0,0].plot(global_co2.index, global_co2.values, 'b-', linewidth=3, label='Global Average')
axes[0,0].fill_between(global_co2.index, global_co2.values, alpha=0.3, color='blue')
axes[0,0].set_title('Global CO2 Emissions per Capita', fontsize=14, fontweight='bold')
axes[0,0].set_ylabel('CO2 per Capita (metric tons)', fontsize=11)
axes[0,0].legend()
axes[0,0].grid(True, alpha=0.3)

axes[0,1].plot(country_co2.index, country_co2.values, 'r-', linewidth=3, label=chosen_country)
axes[0,1].fill_between(country_co2.index, country_co2.values, alpha=0.3, color='red')
axes[0,1].set_title(f'{chosen_country} CO2 Emissions per Capita', fontsize=14, fontweight='bold')
axes[0,1].set_ylabel('CO2 per Capita (metric tons)', fontsize=11)
axes[0,1].legend()
axes[0,1].grid(True, alpha=0.3)

# Drop years with no energy values: the left merge leaves NaN for them, which would
# otherwise make the "data available" checks below pass on an all-NaN series
global_energy = merged_data.groupby('Year')['Energy_use_per_capita'].mean().dropna()
# chosen_country already equals norway_country when Norway was found, so one lookup covers both cases
country_energy = merged_data[merged_data['Country'] == chosen_country].groupby('Year')['Energy_use_per_capita'].mean().dropna()

axes[1,0].plot(global_energy.index, global_energy.values, 'g-', linewidth=3, label='Global Average')
axes[1,0].fill_between(global_energy.index, global_energy.values, alpha=0.3, color='green')
axes[1,0].set_title('Global Energy Use per Capita', fontsize=14, fontweight='bold')
axes[1,0].set_ylabel('Energy Use per Capita (kg oil eq.)', fontsize=11)
axes[1,0].legend()
axes[1,0].grid(True, alpha=0.3)

if len(country_energy) > 0 and not country_energy.empty:
    axes[1,1].plot(country_energy.index, country_energy.values, 'orange', linewidth=3, label=chosen_country)
    axes[1,1].fill_between(country_energy.index, country_energy.values, alpha=0.3, color='orange')
    axes[1,1].set_ylim(bottom=0)
else:
    if len(country_co2) > 0:
        synthetic_energy = country_co2 * 200
        axes[1,1].plot(synthetic_energy.index, synthetic_energy.values, 'orange', linewidth=3, 
                      label=f'{chosen_country} (estimated)', linestyle='--')
        axes[1,1].fill_between(synthetic_energy.index, synthetic_energy.values, alpha=0.3, color='orange')
        axes[1,1].text(0.05, 0.95, 'Estimated from CO2 data', transform=axes[1,1].transAxes, 
                      bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.7), fontsize=10)

axes[1,1].set_title(f'{chosen_country} Energy Use per Capita', fontsize=14, fontweight='bold')
axes[1,1].set_ylabel('Energy Use per Capita (kg oil eq.)', fontsize=11)
axes[1,1].legend()
axes[1,1].grid(True, alpha=0.3)

global_temp = merged_data.groupby('Year')['Temperature'].mean()
global_temp = global_temp.dropna()

if len(global_temp) > 10:
    axes[2,0].plot(global_temp.index, global_temp.values, 'purple', linewidth=3, label='Global Temperature')
    axes[2,0].fill_between(global_temp.index, global_temp.values, alpha=0.3, color='purple')
    
    baseline_temp = global_temp.iloc[:min(10, len(global_temp))].mean()
    temp_anomaly = global_temp - baseline_temp
    
    axes[2,1].plot(temp_anomaly.index, temp_anomaly.values, 'brown', linewidth=3, label='Temperature Anomaly')
    axes[2,1].fill_between(temp_anomaly.index, temp_anomaly.values, alpha=0.3, color='brown')
    axes[2,1].axhline(y=0, color='black', linestyle='--', alpha=0.5)
else:
    years = range(1960, 2020)
    synthetic_temp = [14.0 + 0.01 * (year - 1960) + np.random.normal(0, 0.1) for year in years]
    axes[2,0].plot(years, synthetic_temp, 'purple', linewidth=3, label='Global Temperature (synthetic)')
    axes[2,0].fill_between(years, synthetic_temp, alpha=0.3, color='purple')
    
    baseline_temp = np.mean(synthetic_temp[:10])
    temp_anomaly = [temp - baseline_temp for temp in synthetic_temp]
    axes[2,1].plot(years, temp_anomaly, 'brown', linewidth=3, label='Temperature Anomaly (synthetic)')
    axes[2,1].fill_between(years, temp_anomaly, alpha=0.3, color='brown')
    axes[2,1].axhline(y=0, color='black', linestyle='--', alpha=0.5)

axes[2,0].set_title('Global Temperature Trend', fontsize=14, fontweight='bold')
axes[2,0].set_ylabel('Temperature (°C)', fontsize=11)
axes[2,0].set_xlabel('Year', fontsize=11)
axes[2,0].legend()
axes[2,0].grid(True, alpha=0.3)

axes[2,1].set_title('Global Temperature Anomaly', fontsize=14, fontweight='bold')
axes[2,1].set_ylabel('Temperature Anomaly (°C)', fontsize=11)
axes[2,1].set_xlabel('Year', fontsize=11)
axes[2,1].legend()
axes[2,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nAnalysis Summary for {chosen_country}:")
if len(country_co2) > 0:
    print(f"- Average CO2 emissions: {country_co2.mean():.2f} metric tons per capita")
    print(f"- CO2 trend: {((country_co2.iloc[-1] - country_co2.iloc[0]) / country_co2.iloc[0] * 100):.1f}% change over period")

if len(country_energy) > 0 and not country_energy.empty:
    print(f"- Average energy use: {country_energy.mean():.0f} kg oil equivalent per capita")
else:
    print("- Energy use data: Not available or incomplete")

# Use the same threshold as the plotting code above so the summary matches what was drawn
if len(global_temp) > 10:
    # Real climate series was plotted, so temp_anomaly is a pandas Series
    print(f"- Global temperature increase: {temp_anomaly.iloc[-1]:.2f}°C above baseline")
else:
    # The plotting fallback was used, so temp_anomaly holds synthetic values
    print("- Temperature data: Using synthetic data")

print(f"\nData availability check:")
print(f"- CO2 data points for {chosen_country}: {len(country_co2)}")
print(f"- Energy data points for {chosen_country}: {len(country_energy)}")
print(f"- Global temperature data points: {len(global_temp)}")

### 5) Scatter plots with trend lines

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

scatter_data = merged_data.dropna(subset=['CO2_per_capita', 'Energy_use_per_capita'])
if len(scatter_data) > 0:
    ax1.scatter(scatter_data['Energy_use_per_capita'], scatter_data['CO2_per_capita'], 
               alpha=0.6, s=30, color='steelblue')
    
    z = np.polyfit(scatter_data['Energy_use_per_capita'], scatter_data['CO2_per_capita'], 1)
    p = np.poly1d(z)
    ax1.plot(scatter_data['Energy_use_per_capita'], 
            p(scatter_data['Energy_use_per_capita']), "r--", alpha=0.8, linewidth=2)
    
    corr_energy_co2 = scatter_data['Energy_use_per_capita'].corr(scatter_data['CO2_per_capita'])
    ax1.text(0.05, 0.95, f'r = {corr_energy_co2:.3f}', transform=ax1.transAxes, 
            bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8), fontsize=12)

ax1.set_xlabel('Energy Use per Capita (kg oil equivalent)', fontsize=11)
ax1.set_ylabel('CO2 per Capita (metric tons)', fontsize=11)
ax1.set_title('CO2 vs Energy Use per Capita', fontsize=14, fontweight='bold')
ax1.grid(True, alpha=0.3)

scatter_data2 = merged_data.dropna(subset=['CO2_per_capita', 'Temperature'])
if len(scatter_data2) > 0:
    ax2.scatter(scatter_data2['Temperature'], scatter_data2['CO2_per_capita'], 
               alpha=0.6, s=30, color='forestgreen')
    
    z2 = np.polyfit(scatter_data2['Temperature'], scatter_data2['CO2_per_capita'], 1)
    p2 = np.poly1d(z2)
    ax2.plot(scatter_data2['Temperature'], 
            p2(scatter_data2['Temperature']), "r--", alpha=0.8, linewidth=2)
    
    corr_temp_co2 = scatter_data2['Temperature'].corr(scatter_data2['CO2_per_capita'])
    ax2.text(0.05, 0.95, f'r = {corr_temp_co2:.3f}', transform=ax2.transAxes, 
            bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8), fontsize=12)

ax2.set_xlabel('Temperature (°C)', fontsize=11)
ax2.set_ylabel('CO2 per Capita (metric tons)', fontsize=11)
ax2.set_title('CO2 vs Temperature', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

The heatmap graph form easily shows the intensity of high emissions over the past few decades, with a disproportionate number of high emissions falling among oil-exporting nations. The spatial and temporal patterns of color intensity show that while some countries experienced relatively stable emissions over time, others experienced sheer peaks that register as dark red patches. These temporal and spatial intensities of high emissions carry strong implications for projecting historic responsibility and future reduction potential of different types of economies.

## 4. Data Analysis

### 1) Calculate Mean and SD for emissions and temperature for chosen country

In [None]:
# chosen_country was set in the facet cell: Norway if found, otherwise the top emitter
country_analysis = merged_data[merged_data['Country'] == chosen_country].dropna(subset=['CO2_per_capita', 'Temperature'])

if len(country_analysis) > 0:
    co2_mean = country_analysis['CO2_per_capita'].mean()
    co2_std = country_analysis['CO2_per_capita'].std()
    co2_median = country_analysis['CO2_per_capita'].median()
    
    temp_mean = country_analysis['Temperature'].mean()
    temp_std = country_analysis['Temperature'].std()
    temp_median = country_analysis['Temperature'].median()

    print(f"\n=== STATISTICAL SUMMARY FOR {chosen_country.upper()} ===")
    print(f"\nCO2 Emissions per Capita:")
    print(f"  Mean: {co2_mean:.2f} metric tons")
    print(f"  Median: {co2_median:.2f} metric tons")
    print(f"  Standard Deviation: {co2_std:.2f} metric tons")
    print(f"  Range: {country_analysis['CO2_per_capita'].min():.2f} - {country_analysis['CO2_per_capita'].max():.2f} metric tons")
    
    print(f"\nTemperature:")
    print(f"  Mean: {temp_mean:.2f}°C")
    print(f"  Median: {temp_median:.2f}°C")
    print(f"  Standard Deviation: {temp_std:.2f}°C")
    print(f"  Range: {country_analysis['Temperature'].min():.2f} - {country_analysis['Temperature'].max():.2f}°C")
    
    print(f"\nData Coverage:")
    print(f"  Years analyzed: {country_analysis['Year'].min():.0f} - {country_analysis['Year'].max():.0f}")
    print(f"  Total data points: {len(country_analysis)}")
    
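    global_co2_mean = merged_data['CO2_per_capita'].mean()
    # Temperature is merged on Year only, so this mean differs from the country's
    # only through year coverage, not through a country-specific climate signal
    global_temp_mean = merged_data['Temperature'].mean()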
    
    print(f"\nGlobal Comparison:")
    print(f"  {chosen_country} CO2 vs Global: {((co2_mean - global_co2_mean) / global_co2_mean * 100):+.1f}% difference")
    print(f"  {chosen_country} Temperature vs Global: {(temp_mean - global_temp_mean):+.2f}°C difference")
    
else:
    print(f"Warning: Insufficient data for {chosen_country} analysis")

### 2) Calculate correlation coefficient for emissions and temperature

In [None]:
if len(country_analysis) > 5:
    correlation = country_analysis['CO2_per_capita'].corr(country_analysis['Temperature'])
    
    print(f"\n=== CORRELATION ANALYSIS FOR {chosen_country.upper()} ===")
    print(f"\nCorrelation coefficient between CO2 emissions and temperature: {correlation:.4f}")

    if abs(correlation) > 0.7:
        strength = "strong"
    elif abs(correlation) > 0.5:
        strength = "moderate to strong"
    elif abs(correlation) > 0.3:
        strength = "moderate"
    elif abs(correlation) > 0.1:
        strength = "weak"
    else:
        strength = "very weak"

    if correlation > 0:
        direction = "positive"
    else:
        direction = "negative"

    print(f"This indicates a {strength} {direction} correlation.")
    
    # stats was imported from scipy in the first cell
    corr_coef, p_value = stats.pearsonr(country_analysis['CO2_per_capita'], country_analysis['Temperature'])
    
    print(f"\nStatistical significance:")
    print(f"  Pearson correlation coefficient: {corr_coef:.4f}")
    print(f"  P-value: {p_value:.4f}")
    
    if p_value < 0.05:
        print(f"  Result: Statistically significant at α = 0.05")
    else:
        print(f"  Result: Not statistically significant at α = 0.05")
    
    print(f"\nAdditional Correlations:")
    
    if 'Energy_use_per_capita' in country_analysis.columns:
        co2_energy_corr = country_analysis['CO2_per_capita'].corr(country_analysis['Energy_use_per_capita'])
        print(f"  CO2 vs Energy Use: {co2_energy_corr:.4f}")
    
    co2_year_corr = country_analysis['CO2_per_capita'].corr(country_analysis['Year'])
    temp_year_corr = country_analysis['Temperature'].corr(country_analysis['Year'])
    
    print(f"  CO2 vs Year: {co2_year_corr:.4f}")
    print(f"  Temperature vs Year: {temp_year_corr:.4f}")
    
else:
    print(f"Warning: Insufficient data points for correlation analysis ({len(country_analysis)} points)")

### 3) Scaled scatter plot showing relationship between correlation and linear regression

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

if len(country_analysis) > 5:
    analysis_data = country_analysis[['CO2_per_capita', 'Temperature']].dropna()
    
    if len(analysis_data) > 5:
        scaler = StandardScaler()
        scaled_data = scaler.fit_transform(analysis_data)
        scaled_df = pd.DataFrame(scaled_data, columns=['CO2_scaled', 'Temperature_scaled'])
        
        X_orig = analysis_data['Temperature'].values.reshape(-1, 1)
        y_orig = analysis_data['CO2_per_capita'].values
        reg_orig = LinearRegression().fit(X_orig, y_orig)
        y_pred_orig = reg_orig.predict(X_orig)
        r2_orig = r2_score(y_orig, y_pred_orig)
        
        X_scaled = scaled_df['Temperature_scaled'].values.reshape(-1, 1)
        y_scaled = scaled_df['CO2_scaled'].values
        reg_scaled = LinearRegression().fit(X_scaled, y_scaled)
        y_pred_scaled = reg_scaled.predict(X_scaled)
        r2_scaled = r2_score(y_scaled, y_pred_scaled)
        
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
        
        ax1.scatter(analysis_data['Temperature'], analysis_data['CO2_per_capita'], 
                   alpha=0.7, s=60, color='darkblue', edgecolor='white', linewidth=0.5)
        ax1.plot(analysis_data['Temperature'], y_pred_orig, 'r-', linewidth=3, 
                label=f'Regression Line (R² = {r2_orig:.3f})')
        
        ax1.set_xlabel('Temperature (°C)', fontsize=12)
        ax1.set_ylabel('CO2 per Capita (metric tons)', fontsize=12)
        ax1.set_title(f'Original Scale: CO2 vs Temperature\n{chosen_country}', fontsize=14, fontweight='bold')
        ax1.legend(fontsize=11)
        ax1.grid(True, alpha=0.3)
        
        ax1.text(0.05, 0.95, f'Correlation: {correlation:.4f}\nSlope: {reg_orig.coef_[0]:.3f}', 
                transform=ax1.transAxes, 
                bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.8), 
                fontsize=11, verticalalignment='top')
        
        ax2.scatter(scaled_df['Temperature_scaled'], scaled_df['CO2_scaled'], 
                   alpha=0.7, s=60, color='darkgreen', edgecolor='white', linewidth=0.5)
        ax2.plot(scaled_df['Temperature_scaled'], y_pred_scaled, 'r-', linewidth=3, 
                label=f'Regression Line (R² = {r2_scaled:.3f})')
        
        ax2.set_xlabel('Temperature (Standardized)', fontsize=12)
        ax2.set_ylabel('CO2 per Capita (Standardized)', fontsize=12)
        ax2.set_title(f'Standardized Scale: CO2 vs Temperature\n{chosen_country}', fontsize=14, fontweight='bold')
        ax2.legend(fontsize=11)
        ax2.grid(True, alpha=0.3)
        
        scaled_correlation = scaled_df['Temperature_scaled'].corr(scaled_df['CO2_scaled'])
        ax2.text(0.05, 0.95, f'Correlation: {scaled_correlation:.4f}\nSlope: {reg_scaled.coef_[0]:.3f}', 
                transform=ax2.transAxes, 
                bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.8), 
                fontsize=11, verticalalignment='top')
        
        plt.tight_layout()
        plt.show()
        
        print(f"\n=== REGRESSION ANALYSIS SUMMARY ===")
        print(f"\nOriginal Scale:")
        print(f"  R-squared: {r2_orig:.4f}")
        print(f"  Slope: {reg_orig.coef_[0]:.4f} metric tons CO2 per °C")
        print(f"  Intercept: {reg_orig.intercept_:.4f} metric tons CO2")
        
        print(f"\nStandardized Scale:")
        print(f"  R-squared: {r2_scaled:.4f}")
        print(f"  Slope: {reg_scaled.coef_[0]:.4f} (standardized units)")
        print(f"  Intercept: {reg_scaled.intercept_:.4f} (standardized units)")
        
        print(f"\nInterpretation:")
        print(f"  - Correlation remains the same across scales: {correlation:.4f}")
        print(f"  - R-squared values are identical: {r2_orig:.4f}")
        print(f"  - Standardization preserves relationships while enabling comparison")
        
        # The slope describes an association in this sample, not a causal effect
        direction_word = "higher" if reg_orig.coef_[0] > 0 else "lower"
        print(f"  - Each 1°C higher temperature is associated with {abs(reg_orig.coef_[0]):.3f} metric tons per capita {direction_word} CO2 emissions")
    
    else:
        print("Warning: Insufficient data for regression analysis after removing missing values")
else:
    print("Warning: Insufficient data points for scaling and regression analysis")

The faceted comparison shows that Norway far exceeds the world average for per-capita emissions, averaging 7.28 metric tons per capita. Norway's emissions rise steadily from 1950 to the 1970s, then become more volatile while continuing to increase overall. Norway's energy use per capita is also much higher than the global average, and it correlates strongly with the high emissions. This pattern reflects Norway's energy-intensive economy, which features heavy oil and gas extraction despite the nation's leadership in renewable energy for domestic use.

The scatter plots show a strong positive correlation (r = 0.827) between CO2 emissions and energy use per capita, consistent with the basic link between energy consumption and carbon emissions. The much weaker correlation (r = 0.098) between temperature and CO2 in the pooled global data shows that the relationship is more complex and is driven by many factors beyond per-capita emissions alone. For Norway on its own, the correlation between temperature and emissions is considerably higher (r = 0.606), which suggests that the country's emissions may track temperature more closely, possibly because of heating requirements or seasonal economic activity.
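One possible contributor to the gap between the pooled and single-country correlations is structural. In the merge step, temperature is joined on `Year` alone, so every country shares the same temperature value in a given year while per-capita emissions differ widely between countries. The sketch below illustrates that effect on a hypothetical two-country panel; the country names and all numbers are invented for illustration and are not taken from the datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: two countries share one temperature value per year,
# mirroring a merge on Year only. All numbers are illustrative, not real data.
years = np.arange(2000, 2010)
temperature = 14.0 + 0.02 * (years - 2000)

panel = pd.concat([
    pd.DataFrame({"Country": "High", "Year": years,
                  "Temperature": temperature,
                  "CO2_per_capita": 20.0 + 0.5 * (years - 2000)}),
    pd.DataFrame({"Country": "Low", "Year": years,
                  "Temperature": temperature,
                  "CO2_per_capita": 1.0 + 0.05 * (years - 2000)}),
], ignore_index=True)

# Pooled correlation mixes between-country level differences with the time trend
pooled_r = panel["Temperature"].corr(panel["CO2_per_capita"])

# Within-country correlation only sees how each country moves over time
per_country_r = {
    name: group["Temperature"].corr(group["CO2_per_capita"])
    for name, group in panel.groupby("Country")
}

print(f"pooled r = {pooled_r:.3f}")
print(per_country_r)
```

In this toy panel each country is perfectly correlated with temperature over time, yet the pooled coefficient is small because the level difference between the two countries dominates the variance. Whether the same mechanism explains the results reported here would need to be checked against the actual data.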

Norway's regression analysis shows a moderate-to-strong positive relationship between CO2 emissions and temperature (R^2 = 0.367) that is statistically significant (p < 0.001). The fitted slope implies that each 1°C increase in temperature is associated with an increase of approximately 0.97 metric tons of CO2 per capita. Standardizing both variables leaves the correlation and R^2 unchanged, which confirms that the result does not depend on the measurement scale. Norway's CO2 emissions also correlate very strongly with both energy consumption (r = 0.951) and year (r = 0.945), indicating that the country's emissions are closely tied to its energy use and to a long-term upward trend.
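The scale invariance follows from how simple linear regression relates to the Pearson coefficient: with one predictor, R^2 equals r squared, and after standardizing both variables the fitted slope equals r itself. That is consistent with the reported values, since 0.606 squared is about 0.367. The sketch below checks these identities with NumPy only; the `x` and `y` arrays are hypothetical stand-ins for temperature and CO2 per capita, and `fit_line` and `standardize` are illustrative helpers rather than functions from the notebook.

```python
import numpy as np

# Hypothetical values standing in for temperature (x) and CO2 per capita (y)
x = np.array([5.1, 5.4, 5.0, 5.9, 6.2, 6.0, 6.8, 7.1])
y = np.array([6.0, 6.9, 6.2, 7.4, 7.1, 8.0, 8.3, 8.1])

def fit_line(x, y):
    """Least-squares slope, intercept, and R^2 for y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    # With an intercept the residuals have zero mean, so this is 1 - SS_res / SS_tot
    r2 = 1.0 - residuals.var() / y.var()
    return slope, intercept, r2

def standardize(v):
    # Same convention as scikit-learn's StandardScaler: population std (ddof=0)
    return (v - v.mean()) / v.std()

r = np.corrcoef(x, y)[0, 1]
slope_raw, _, r2_raw = fit_line(x, y)
slope_std, intercept_std, r2_std = fit_line(standardize(x), standardize(y))

print(f"r = {r:.4f}")
print(f"raw slope = {slope_raw:.4f}, R^2 = {r2_raw:.4f}")
print(f"standardized slope = {slope_std:.4f}, R^2 = {r2_std:.4f}")
```

The raw slope carries the units (metric tons per °C) and equals r multiplied by the ratio of the standard deviations of y and x, which is why it changes with scale while r and R^2 do not.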

## Summary and Conclusions

This analysis of global CO2 emissions, viewed through the case of Norway, shows how differently countries contribute to climate warming. Norway is a high per-capita emitter, with emissions 357% above the world average despite its reputation as an environmental champion. Its emissions show strong statistical relationships with temperature, energy use, and time, which suggests that its emissions trajectory is closely tied to economic activity and climatic factors.

The comparative analysis shows that emission trends diverge sharply across countries and over time, reflecting different phases of economic development, resource availability, and policy regimes. While the global correlation between energy consumption and emissions is very strong, correlations between emissions and temperature are complex and highly uncertain at the country level. Norway's experience illustrates the challenge most developed nations face in balancing economic growth with emission reductions, particularly economies rich in energy resources.

These findings highlight the need for national climate policies that are tailored to each country's circumstances while still contributing to global emission reduction targets. The strong correlations in Norway's data suggest that policy levers in energy efficiency and economic diversification could have large effects on the country's emission trajectory. Effective climate action requires knowledge of international trends as well as local circumstances in order to design accurate and efficient policy measures.
