### 🌍🌡️ **Europe’s Average Monthly Temperature Change (2012–2025)**
-----------------------------
#### 📌 ***Overview***:
- 📅 Date: 23 May 2025
- 🎯 Goal: Determine the trend line of the average monthly temperature change
- 📄 Dataset source: ECMWF Climate Data Store (Copernicus CDS)
- 🌍 Bounding box for continental Europe:

| Direction | Degrees          | Region                                     |
|-----------|------------------|--------------------------------------------|
| North     | 60.0° N          | Southern edge of Norway and central Sweden |
| South     | 36.0° N          | Southern Greece                            |
| West      | 10.5° W (-10.5°) | Portugal                                   |
| East      | 28.0° E          | Eastern Baltics (stopping before Russia)   |
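
A minimal sketch of how a single grid point can be tested against this bounding box. It assumes longitudes may be stored either in the -180° to 180° convention or in the 0° to 360° convention used by some GRIB products; the helper names `normalize_lon` and `in_bbox` are hypothetical and not part of the notebook.

```python
# Bounding box from the table (degrees; western longitudes are negative)
BBOX = {"north": 60.0, "south": 36.0, "west": -10.5, "east": 28.0}


def normalize_lon(lon: float) -> float:
    """Map a longitude from either convention (0..360 or -180..180) to [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def in_bbox(lat: float, lon: float, bbox: dict = BBOX) -> bool:
    """Return True if the point lies inside the bounding box (edges included)."""
    lon = normalize_lon(lon)
    return (bbox["south"] <= lat <= bbox["north"]
            and bbox["west"] <= lon <= bbox["east"])


# Lisbon (about 38.7° N, 9.1° W), written in the 0..360 convention as 350.9°
print(in_bbox(38.7, 350.9))   # True
# Moscow (about 55.75° N, 37.6° E) lies east of 28° E
print(in_bbox(55.75, 37.6))   # False
```

Normalizing first keeps the comparison a simple pair of range checks, which only works because this box does not cross the antimeridian.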

In [None]:
# Code: Plot map pictures for general info
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# List of filenames
image_files = [
    'copernicus_map_eu_temp_0000h_multiband.png',
    'copernicus_map_eu_temp_0000h_singleband_linear.png',
    'copernicus_map_eu_temp_0000h_singleband_discrete.png'
]

# Create subplots
fig, axes = plt.subplots(1, 3, figsize=(18, 6))  # 1 row, 3 columns

for ax, filename in zip(axes, image_files):
    img = mpimg.imread(filename)
    # img = mpimg.imread(f'../input/copernicus-era5-europe-map-avg-temp-c/{filename}')
    ax.imshow(img)
    ax.set_title(filename, fontsize=10)
    ax.axis('off')

plt.tight_layout()
plt.show()

In [None]:
# Code: Activate the Kaggle (Plotly offline) notebook environment
# Uncomment this part when the notebook is run online on Kaggle
# from plotly.offline import init_notebook_mode
# init_notebook_mode(connected=True)

# # Suppress the Seaborn "use_inf_as_na" deprecation warning in Kaggle (or any notebook)
# import warnings
# warnings.filterwarnings("ignore", message="use_inf_as_na option is deprecated*")


### **Task 1**: Obtain the average temperature from ECMWF ERA5 meteorological and climate data

📦 File type:

GRIB (GRIdded Binary)  
A binary file format for storing meteorological and climate data, standardized by the World Meteorological Organization (WMO).

📁 For ERA5 monthly averaged data:

The GRIB file from Copernicus typically includes:
- 2 m temperature (`t2m`, air temperature 2 m above the surface, in Kelvin)
- Monthly averaged values
- Data on a global or regional grid (e.g., 0.25° × 0.25°)
- Dimensions: time, latitude, longitude
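
As a rough size check, the number of grid points per time step follows from the bounding box and the grid spacing. This sketch assumes the download covered exactly the bounding box above on a 0.25° × 0.25° grid with both edges included; the actual file may differ, so compare the result with `ds.sizes` after loading.

```python
def grid_points(lo: float, hi: float, step: float) -> int:
    """Number of grid lines from lo to hi (both edges included) at the given spacing."""
    return round((hi - lo) / step) + 1


n_lat = grid_points(36.0, 60.0, 0.25)    # (60 - 36) / 0.25 + 1
n_lon = grid_points(-10.5, 28.0, 0.25)   # (28 + 10.5) / 0.25 + 1
points_per_step = n_lat * n_lon          # grid points per monthly time step

print(n_lat, n_lon, points_per_step)     # -> 97 155 15035
```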

In [None]:
# Code: Install cfgrib and its dependency eccodes   - Kaggle only
# Install cfgrib and eccodes quietly (no output shown)
# !pip install --quiet cfgrib > /dev/null 2>&1
# !apt-get update > /dev/null 2>&1
# !apt-get install -y libeccodes0 > /dev/null 2>&1

In [None]:
# Code: Load GRIB dataset using cfgrib engine & display stats
import cfgrib  # <- This registers the engine with xarray
import xarray as xr
import pandas as pd

# # Remote on Kaggle -  Load GRIB dataset using cfgrib engine
# ds = xr.open_dataset(
#     "/kaggle/input/copernicus-cds-climate-data-temp-eu-2011-2025/copernicus_cds_climate_data_avgmotemp_eu_2011-2025.grib",
#     engine="cfgrib",
#     backend_kwargs={"decode_timedelta": False}
# )

# Local version - Load GRIB dataset using cfgrib engine
file_name = "copernicus_cds_climate_data_avgmotemp_1400h_eu_2011-2025.grib"
file_name2 = "copernicus_cds_climate_data_avgmotemp_0000h_eu_2011-2025.grib"
ds = xr.open_dataset(file_name, engine="cfgrib")

# Print dataset structure
# print(ds)

# --- Summary Statistics ---
print("\n--- Dataset Summary ---")
print(f"Dimensions: {ds.dims}")  # dict: dimension name -> size
print(f"Coordinates: {list(ds.coords)}")
print(f"Data Variables: {list(ds.data_vars)}")

# Loop over variables to extract shape, element count, and basic statistics
for var in ds.data_vars:
    da = ds[var]
    print(f"\nVariable: {var}")
    print(f" - Shape: {da.shape}")
    print(f" - Total elements: {da.size}")
    print(f" - Mean: {da.mean().item():.3f}")
    print(f" - Min: {da.min().item():.3f}")
    print(f" - Max: {da.max().item():.3f}")

# Optional: Convert to DataFrame
df = ds.to_dataframe().reset_index()
print(f"\nConverted to DataFrame with {df.shape[0]} rows and {df.shape[1]} columns")
# display(df.head())


In [None]:
# Code: Assign geographical latitude bands and save the DataFrame to a CSV file
# Select a single variable if multiple exist
ds_var = ds["t2m"]

# Convert to a long-format DataFrame (one row per time/latitude/longitude combination)
df = ds_var.to_dataframe().reset_index()

# Add geographical Latitude Bands
def get_lat_band(lat):
    if lat < 45.0:
        return 'Südlich (< 45°N)'
    elif lat <= 55.0:
        return 'Zentral (45–55°N)'
    else:
        return 'Nördlich (> 55°N)'

df['lat_band'] = df['latitude'].apply(get_lat_band)
display(df.head())
# print(df.columns.to_list())

# Save to CSV
df.to_csv("copernicus_cds_climate_data_avgmotemp_eu_2011-2025.csv", index=False)


In [None]:
# Code: Load from CSV into a new DataFrame
import pandas as pd

# Load from CSV into a new DataFrame
df = pd.read_csv("copernicus_cds_climate_data_avgmotemp_eu_2011-2025.csv")

# Optional: preview the newest and oldest rows (sorted by time, descending)
df_sorted = df.sort_values(by="time", ascending=False)
display(df_sorted.head(10))
display(df_sorted.tail(10))

### 📈 Descriptive Statistical Summary of the t2m Variable

In [None]:
# Code: Summary statistics (KPIs) for t2m

display(df['t2m'].describe())
print()
# display(df.info())

# Print min, max, mean, median for both Kelvin and Celsius
print("Min t2m:", round(df['t2m'].min(), 3), "K  / ", round((df['t2m'] - 273.15).min(), 3), "°C")
print("Max t2m:", round(df['t2m'].max(), 3), "K  / ", round((df['t2m'] - 273.15).max(), 3), "°C")
print("Mean t2m:", round(df['t2m'].mean(), 3), "K  / ", round((df['t2m'] - 273.15).mean(), 3), "°C")
print("Median t2m:", round(df['t2m'].median(), 3), "K  / ", round((df['t2m'] - 273.15).median(), 3), "°C")


### 📈 Histogram of the Temperature Distribution
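
The quartile lines in the histogram come from `Series.quantile`, which by default interpolates linearly between sorted values. The standard-library sketch below reproduces that definition with `statistics.quantiles(..., method="inclusive")` on a few made-up Kelvin values, converted to °C first; the sample values are hypothetical.

```python
import statistics

# Hypothetical 2 m temperatures in Kelvin
t2m_kelvin = [268.15, 271.15, 274.15, 277.15, 280.15, 283.15, 286.15, 289.15, 292.15]

# Convert to °C (same offset as in the notebook)
t2m_celsius = [k - 273.15 for k in t2m_kelvin]

# method="inclusive" interpolates linearly between order statistics,
# the same rule as pandas' default quantile(interpolation="linear")
q1, q2, q3 = statistics.quantiles(t2m_celsius, n=4, method="inclusive")
print(round(q1, 2), round(q2, 2), round(q3, 2))  # -> 1.0 7.0 13.0
```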

In [None]:
# Code: Distribution of temperature values
import plotly.graph_objects as go

# Add a Celsius column (if not already created)
df['t2m_celsius'] = df['t2m'] - 273.15

# Compute quartiles
q1 = df['t2m_celsius'].quantile(0.25)
q2 = df['t2m_celsius'].quantile(0.50)
q3 = df['t2m_celsius'].quantile(0.75)

# Create figure
fig11 = go.Figure()

# Histogram
fig11.add_trace(go.Histogram(
    x=df['t2m_celsius'],
    nbinsx=50,
    name='Temperature distribution',
    marker_color='salmon',
    opacity=0.75
))

# Add vertical lines for quartiles with annotation including value
fig11.add_vline(
    x=q1,
    line_dash='dash',
    line_color='blue',
    annotation_text=f'Q1 (25%) = {q1:.2f}°C',
    annotation_position='bottom left'
)

fig11.add_vline(
    x=q2,
    line_dash='dash',
    line_color='green',
    annotation_text=f'Q2 (Median) = {q2:.2f}°C',
    annotation_position='top right'
)

fig11.add_vline(
    x=q3,
    line_dash='dash',
    line_color='red',
    annotation_text=f'Q3 (75%) = {q3:.2f}°C',
    annotation_position='bottom right'
)

## Quartile lines as scatter traces (so they show in legend)
# for value, color, name in [(q1, 'blue', 'Q1 (25%)'), (q2, 'green', 'Q2 (Median)'), (q3, 'red', 'Q3 (75%)')]:
#    fig11.add_trace(go.Scatter(
#        x=[value, value],
#        y=[0, df['t2m_celsius'].count() / 5],  # Estimate y max for line height
#        mode='lines',
#        line=dict(color=color, dash='dash'),
#        name=name,
#        showlegend=True
#    ))

# Layout
fig11.update_layout(
    title="Interaktive Histogramm der Temperaturverteilung mit Quartilen (°C)",
    xaxis_title="t2m (°C)",
    yaxis_title="Häufigkeit",
    template='plotly_white',
    bargap=0.05
)

# Show
fig11.show()

# # Export to HTML and display via iframe
# fig11.write_html("figure11.html", include_plotlyjs='cdn')

# from IPython.display import IFrame
# IFrame("figure11.html", width=860, height=500)


### 🌍 Temporal Map Visualization of Temperature Variations Across Europe

In [None]:
# Code: Plot a map with temperature points overlay and time slider
import pandas as pd
import plotly.express as px

# Add a Celsius column (if not already created)
df['t2m_celsius'] = df['t2m'] - 273.15

# Sample preparation: extract month and year
df['date'] = pd.to_datetime(df['time'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['Jahr-Monat'] = df['date'].dt.to_period('M').astype(str)

# Average temp per location and month
monthly_avg = df.groupby(['Jahr-Monat', 'latitude', 'longitude'])['t2m_celsius'].mean().round(2).reset_index()

# Plot using scatter_map
fig = px.scatter_map(
    monthly_avg,
    lat='latitude',
    lon='longitude',
    color='t2m_celsius',
    animation_frame='Jahr-Monat',
    color_continuous_scale='RdYlBu_r',
    range_color=(df['t2m_celsius'].min().round(2), df['t2m_celsius'].max().round(2)),
    size_max=8,
    zoom=3,
    center={"lat": 49.5, "lon": 8.75},
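    title='Monthly average temperature distribution in Europe',
    labels={'t2m_celsius': 'Temperature (°C)'}
)

# scatter_map draws on a MapLibre "map" subplot, so the basemap is set with map_style
# (mapbox_style only applies to the older Mapbox-based scatter_mapbox)
fig.update_layout(
    map_style='carto-positron',
    height=650,
    width=750,
    margin={"r": 5, "t": 40, "l": 10, "b": 10}
)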

fig.show()


In [None]:
# Code: Plot a map with temperature points overlay and time slider
# Alternative version for Kaggle (uses scatter_geo instead of scatter_map)
import pandas as pd
import plotly.express as px

# Add a Celsius column (if not already created)
df['t2m_celsius'] = df['t2m'] - 273.15

# Sample preparation: extract month and year
df['date'] = pd.to_datetime(df['time'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['Jahr-Monat'] = df['date'].dt.to_period('M').astype(str)

# Average temp per location and month
monthly_avg = df.groupby(['Jahr-Monat', 'latitude', 'longitude'])['t2m_celsius'].mean().round(2).reset_index()

# Plot using scatter_geo
fig12 = px.scatter_geo(
    monthly_avg,
    lat='latitude',
    lon='longitude',
    color='t2m_celsius',
    animation_frame='Jahr-Monat',
    color_continuous_scale='RdYlBu_r',
    range_color=(round(df['t2m_celsius'].min(), 2), round(df['t2m_celsius'].max(), 2)),
    projection='natural earth',
    title='Monthly average temperature distribution in Europe',
    labels={'t2m_celsius': 'Temperature (°C)'},
    height=700
)

# Zoom to Europe by setting geo bounds
fig12.update_geos(
    scope='world',     # 'europe'
    resolution=50,
    lataxis_range=[34, 65],
    lonaxis_range=[-15, 32],
    showcountries=True,
)
# mapbox_style is not used by scatter_geo (geo subplots are styled via update_geos)
fig12.update_layout(
    height=650,
    width=750,
    margin={"r": 5, "t": 40, "l": 10, "b": 10}
)

# Show
fig12.show()
# # Export to HTML and display via iframe
# fig12.write_html("figure12.html", include_plotlyjs='cdn')

# from IPython.display import IFrame
# IFrame("figure12.html", width=770, height=670)


### **Task 2**: Calculate the average monthly temperature across all locations, grouped by year and month

✅ **Goal 1**: The monthly average temperature change from the previous year  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*(i.e., the year-over-year change for each month from 2012 to 2025)*  
✅ **Goal 2**: The monthly average temperature change from the baseline year  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*(i.e., the change relative to the same month of 2011, for each month from 2012 to 2025)*


| Year | Month | Avg\_Temp\_C (°C) | Baseline\_2011\_Avg\_Temp (°C) | Avg\_Temp\_Change\_Prev\_Year (°C) | Avg\_Temp\_Change\_From\_2011 (°C) |
| ---- | ----- | ----------------- | ------------------------------ | ---------------------------------- | ---------------------------------- |
| 2012 | Jan   | +3.53             | +2.00                          | +1.53                              | +1.53                              |
| 2012 | Feb   | +1.60             | +1.00                          | +0.60                              | +0.60                              |
| ...  | ...   | ...               | ...                            | ...                                | ...                                |
| 2024 | Dec   | +8.35             | +6.80                          | +0.42                              | +1.55                              |
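
A small pure-Python sketch of both change columns, using hypothetical monthly means keyed by `(year, month)`. It also makes one property of the definitions explicit: for 2012 the previous year *is* the 2011 baseline, so both changes must coincide.

```python
# Hypothetical monthly mean temperatures (°C), keyed by (year, month)
monthly_mean = {
    (2011, 1): 2.00, (2011, 2): 1.00,
    (2012, 1): 3.53, (2012, 2): 1.60,
    (2013, 1): 2.90, (2013, 2): 1.75,
}


def change_prev_year(year: int, month: int):
    """Change versus the same month of the previous year (None if unavailable)."""
    prev = monthly_mean.get((year - 1, month))
    return None if prev is None else round(monthly_mean[(year, month)] - prev, 3)


def change_from_baseline(year: int, month: int, baseline_year: int = 2011):
    """Change versus the same month of the baseline year (None if unavailable)."""
    base = monthly_mean.get((baseline_year, month))
    return None if base is None else round(monthly_mean[(year, month)] - base, 3)


for year, month in sorted(monthly_mean):
    if year > 2011:  # 2011 is the baseline, so it has no change values
        print(year, month, change_prev_year(year, month), change_from_baseline(year, month))
```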


In [None]:
# Code: Step-by-step data conversion and average temp calculation
import pandas as pd

# Load the dataset
df = pd.read_csv("copernicus_cds_climate_data_avgmotemp_eu_2011-2025.csv")

# Convert time column to datetime (overwrite safely)
df['time'] = pd.to_datetime(df['time'], errors='coerce')  # coerce handles invalid values gracefully

# Drop rows with invalid datetime
df = df.dropna(subset=['time'])

# ✅ Convert temperature from Kelvin to Celsius
df['t2m_celsius'] = (df['t2m'] - 273.15).round(3)

# Extract year and month
df['Year'] = df['time'].dt.year
df['Month'] = df['time'].dt.month

# Group by Year and Month, average across all locations
monthly_avg = df.groupby(['Year', 'Month'])['t2m_celsius'].mean().reset_index()
monthly_avg.columns = ['Year', 'Month', 'Avg_Temp_C']

# Round average temp to 3 decimals
monthly_avg['Avg_Temp_C'] = monthly_avg['Avg_Temp_C'].round(3)

# Sort chronologically
monthly_avg = monthly_avg.sort_values(['Year', 'Month'])

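# Calculate year-over-year change per calendar month
# (grouping by Month compares each value with the same month of the previous year,
#  which avoids misalignment if a month is missing, unlike a positional diff(12))
monthly_avg['Avg_Temp_Change_Prev_Year'] = monthly_avg.groupby('Month')['Avg_Temp_C'].diff().round(3)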

# Get 2011 monthly averages as baseline
baseline_2011 = monthly_avg[monthly_avg['Year'] == 2011][['Month', 'Avg_Temp_C']]
baseline_2011 = baseline_2011.rename(columns={'Avg_Temp_C': 'Baseline_2011_Avg_Temp'})

# Merge baseline 2011 values back into main dataframe by month
monthly_avg = pd.merge(monthly_avg, baseline_2011, on='Month', how='left')

# Calculate change from same month of 2011
monthly_avg['Avg_Temp_Change_From_2011'] = (
    monthly_avg['Avg_Temp_C'] - monthly_avg['Baseline_2011_Avg_Temp']
).round(3)

# Keep only rows with a valid year-on-year difference
monthly_change = monthly_avg[monthly_avg['Year'] > 2011].reset_index(drop=True)

# Convert Year and Month to a single YYYY-MM string
monthly_change['YearMonth'] = pd.to_datetime(monthly_change[['Year', 'Month']].assign(DAY=1)).dt.strftime('%Y-%m')

# ✅ Add datetime version of YearMonth
monthly_change['Date'] = pd.to_datetime(monthly_change['YearMonth'], format='%Y-%m')

# Display and save final table
display(monthly_change[['Date', 'Year', 'YearMonth', 'Avg_Temp_C', 'Baseline_2011_Avg_Temp', 'Avg_Temp_Change_Prev_Year', 'Avg_Temp_Change_From_2011']])

monthly_change[['Date', 'Year', 'YearMonth', 'Avg_Temp_C', 'Baseline_2011_Avg_Temp', 'Avg_Temp_Change_Prev_Year', 'Avg_Temp_Change_From_2011']].to_csv("monthly_avg_temp_change_eu_2012_2025.csv", index=False)


### **Task 3**: Visualize the temperature change trend using line and scatter charts  
### 📊 Plotly Line Chart: Rolling Average + Linear Trend + Time Slider

- X-axis: Date
- Y-axis: Average temperature change (°C)

🔍 Legend:
-  📈 Blue line: Temperature change from 2011 (hidden by default)
-  📈 Green line: Year-over-year change (hidden by default)
-  📉 Orange dotted line: 6-month centered rolling average of the change from 2011 (smoothed trend)
-  📉 Red dashed line: Linear regression trend line of the change from 2011
-  ⏳ Range slider: Zoom and scroll through time periods easily
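
The red trend line is a first-degree least-squares fit (`np.polyfit(x_numeric, y, 1)`) against the month index. A minimal standard-library sketch of the same closed-form fit is below; multiplying the per-month slope by 120 expresses it in °C per decade. Like the month index in the notebook, it assumes consecutive months without gaps, and the series used here is synthetic.

```python
def linear_fit(y):
    """Ordinary least-squares fit y ≈ slope * x + intercept with x = 0, 1, 2, ..."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Synthetic monthly series: +0.01 °C per month on top of a 0.5 °C offset
series = [0.5 + 0.01 * m for m in range(36)]
slope, intercept = linear_fit(series)

per_decade = slope * 120  # 120 months per decade
print(f"slope: {slope:.4f} °C/month, {per_decade:.2f} °C/decade")
# -> slope: 0.0100 °C/month, 1.20 °C/decade
```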

In [None]:
import pandas as pd
import plotly.graph_objects as go
import numpy as np

# Ensure 'Date' column exists and is datetime
if 'Date' not in monthly_change.columns:
    monthly_change['Date'] = pd.to_datetime(monthly_change['YearMonth'], format='%Y-%m')

# ✅ Calculate 6-month rolling average
monthly_change['Rolling_Avg_From_2011'] = monthly_change['Avg_Temp_Change_From_2011'].rolling(window=6, center=True).mean()

# ✅ Calculate linear trendline
x_numeric = np.arange(len(monthly_change))
y = monthly_change['Avg_Temp_Change_From_2011'].values
coeffs = np.polyfit(x_numeric, y, 1)
trendline = coeffs[0] * x_numeric + coeffs[1]

# Create the figure
fig0 = go.Figure()

# Line: Change from 2011 baseline
fig0.add_trace(go.Scatter(
    x=monthly_change['Date'],
    y=monthly_change['Avg_Temp_Change_From_2011'],
    mode='lines+markers',
    name='Change vs. 2011 baseline',
    line=dict(color='blue'),
    visible='legendonly'  # ❗ Hidden by default
))

# Line: Year-over-year change
fig0.add_trace(go.Scatter(
    x=monthly_change['Date'],
    y=monthly_change['Avg_Temp_Change_Prev_Year'],
    mode='lines+markers',
    name='Change vs. previous year',
    line=dict(color='green'),
    visible='legendonly'  # ❗ Hidden by default
))

# Line: 6-month rolling average
fig0.add_trace(go.Scatter(
    x=monthly_change['Date'],
    y=monthly_change['Rolling_Avg_From_2011'],
    mode='lines',
    name='6-month rolling average',
    line=dict(color='orange', dash='dot')
))

# Line: Linear trendline
fig0.add_trace(go.Scatter(
    x=monthly_change['Date'],
    y=trendline,
    mode='lines',
    name='Linear trend line',
    line=dict(color='red', dash='dash')
))

# Layout with time slider and selectors
fig0.update_layout(
    title='Monthly Average Temperature Change (EU, 2012–2025)',
    xaxis_title='Date',
    yaxis_title='Temperature change (°C)',
    template='plotly_white',
    hovermode='x unified',
    width=900,
    height=500,
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=1, label="1y", step="year", stepmode="backward"),
                dict(count=3, label="3y", step="year", stepmode="backward"),
                dict(count=5, label="5y", step="year", stepmode="backward"),
                dict(step="all")
            ])
        ),
        rangeslider=dict(visible=True),
        type="date"
    ),
     # ✅ Move legend underneath
    legend=dict(
        orientation='h',
        yanchor='top',
        y=-0.6,
        xanchor='center',
        x=0.5
    )
    
)

# Show the chart
fig0.show()
# # Export to HTML and display via iframe - use only in Kaggle notebook through web browser
# fig0.write_html("figure0.html", include_plotlyjs='cdn')

# from IPython.display import IFrame
# IFrame("figure0.html", width=800, height=600)

### 📊 Plotly Scatter Chart 1:

- X-axis: Average temperature change (year-over-year)
- Y-axis: Average temperature change (from 2011)

🟢 This shows whether larger year-over-year changes correlate with a long-term increase since 2011.
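
Whether the two changes correlate can be summarized by the Pearson correlation coefficient r (close to +1: strong positive linear association, close to 0: none). The sketch below computes it from scratch; Python 3.10+ also provides `statistics.correlation`. The input lists are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical year-over-year changes vs. changes from 2011 (°C)
yoy = [-0.4, 0.1, 0.3, 0.6, 1.0]
from_2011 = [0.2, 0.5, 0.9, 1.1, 1.6]
print(round(pearson_r(yoy, from_2011), 3))
```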

In [None]:
import plotly.express as px

# Scatter plot: Year-over-year change vs change from 2011
fig = px.scatter(
    monthly_change,
    x='Avg_Temp_Change_Prev_Year',
    y='Avg_Temp_Change_From_2011',
    color='Year',  # Optional: Color points by year
    hover_data=['YearMonth'],
    title='Scatter plot: Year-over-year temperature change vs.<br>change from the 2011 baseline',
    labels={
        'Avg_Temp_Change_Prev_Year': 'Change from previous year (°C)',
        'Avg_Temp_Change_From_2011': 'Change since 2011 (°C)'
    },
    width=800,   # Adjust width
    height=600   # Adjust height for a more balanced look
)

fig.update_traces(marker=dict(size=8, line=dict(width=1, color='DarkSlateGrey')))
fig.update_layout(template='plotly_white')
fig.show()
# # Export to HTML and display via iframe - use only in Kaggle notebook through web browser
# fig.write_html("figure.html", include_plotlyjs='cdn')

# from IPython.display import IFrame
# IFrame("figure.html", width=800, height=600)

### 📊 Plotly Scatter Chart 2 (by 5-Year Group):

- X-axis: Average temperature change (year-over-year)
- Y-axis: Average temperature change (from 2011)

🟢 Data points grouped into 5-year bins (2011–2015, 2016–2020, 2021–2025)
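
The binning relies on integer division: `(year - 2011) // 5` counts how many full 5-year blocks lie between 2011 and the given year. A parameterized version (the `start_year` and `width` arguments are additions for illustration) shows where the bin edges fall. Because rows for 2011 are dropped before plotting, the first bin effectively contains only 2012 to 2015.

```python
def group_years(year: int, start_year: int = 2011, width: int = 5) -> str:
    """Label of the `width`-year bin containing `year`, counting from `start_year`."""
    start = (year - start_year) // width * width + start_year
    return f"{start}-{start + width - 1}"


for y in (2012, 2015, 2016, 2020, 2021, 2025):
    print(y, "->", group_years(y))
```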

In [None]:
import plotly.express as px

# Create a new column grouping years into 5-year bins
def group_years(y):
    start = (y - 2011) // 5 * 5 + 2011
    end = start + 4
    return f"{start}-{end}"

monthly_change['Jahresgruppe'] = monthly_change['Year'].apply(group_years)

# Scatter plot: Year-over-year change vs. change from 2011 baseline
fig = px.scatter(
    monthly_change,
    x='Avg_Temp_Change_Prev_Year',
    y='Avg_Temp_Change_From_2011',
    color='Jahresgruppe',  # Color by grouped 5-year period
    hover_data=['YearMonth'],
    title='Scatter plot: Year-over-year temperature change vs.<br>change from the 2011 baseline',
    labels={
        'Avg_Temp_Change_Prev_Year': 'Change from previous year (°C)',
        'Avg_Temp_Change_From_2011': 'Change since 2011 (°C)'
    },
    width=800,   # Adjust width
    height=600   # Adjust height for a more balanced look
)

fig.update_traces(marker=dict(size=8, line=dict(width=1, color='DarkSlateGrey')))
fig.update_layout(template='plotly_white')
fig.show()
# # Export to HTML and display via iframe - use only in Kaggle notebook through web browser
# fig.write_html("figure.html", include_plotlyjs='cdn')

# from IPython.display import IFrame
# IFrame("figure.html", width=800, height=600)


### 📊 Plotly Scatter Chart 2a (Color-Coded by Quarter):

- X-axis: Average temperature change (year-over-year)  
- Y-axis: Average temperature change (from 2011)  

🟢 Data points grouped into quarterly intervals  

🎨 Legend:  
    🔵 Q1: Jan–Mar  
    🟢 Q2: Apr–Jun  
    🟠 Q3: Jul–Sep  
    🔴 Q4: Oct–Dec  
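
The quarter assignment can also be written arithmetically. This compact alternative is a sketch, not the notebook's `assign_quartal` function (which returns text labels); it maps months 1 to 12 to quarters with integer division. pandas exposes the same number through the `.dt.quarter` accessor.

```python
def month_to_quarter(month: int) -> int:
    """Calendar quarter (1-4) of a month number (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return (month - 1) // 3 + 1


print([month_to_quarter(m) for m in range(1, 13)])
# -> [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```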


In [None]:
import plotly.express as px
import pandas as pd

# Step 1: Group years into 5-year bins (optional)
def group_years(y):
    start = (y - 2011) // 5 * 5 + 2011
    end = start + 4
    return f"{start}-{end}"

monthly_change['Jahresgruppe'] = monthly_change['Year'].apply(group_years)

# Step 2: Extract month from YearMonth and assign quarter ("Quartal")
monthly_change['Month'] = pd.to_datetime(monthly_change['YearMonth']).dt.month

def assign_quartal(month):
    if month in [1, 2, 3]:
        return 'Q1 (Jan-Mär)'
    elif month in [4, 5, 6]:
        return 'Q2 (Apr-Jun)'
    elif month in [7, 8, 9]:
        return 'Q3 (Jul-Sep)'
    else:
        return 'Q4 (Okt-Dez)'

monthly_change['Quartal'] = monthly_change['Month'].apply(assign_quartal)

# Step 3: Plot the scatter chart with color = Quartal
fig = px.scatter(
    monthly_change,
    x='Avg_Temp_Change_Prev_Year',
    y='Avg_Temp_Change_From_2011',
    color='Quartal',
    hover_data=['YearMonth'],
    title='Streudiagramm: Temperaturveränderung im Jahresvergleich vs.<br>Veränderung gegenüber dem Basisjahr 2011 (Farbkodiert nach Quartal)',
    labels={
        'Avg_Temp_Change_Prev_Year': 'Veränderung zum Vorjahr (°C)',
        'Avg_Temp_Change_From_2011': 'Veränderung seit 2011 (°C)'
    },
    width=800,
    height=600
)

# Final styling
fig.update_traces(marker=dict(size=8, line=dict(width=1, color='DarkSlateGrey')))
fig.update_layout(template='plotly_white')
fig.show()
# # Export to HTML and display via iframe - use only in Kaggle notebook through web browser
# fig.write_html("figure.html", include_plotlyjs='cdn')

# from IPython.display import IFrame
# IFrame("figure.html", width=800, height=600)


### 📊 Plotly Scatter Chart 3: Clustering by Dual Categorization

✅ Dual-categorization scatter plot:
- Color: based on the 5-year group (`Jahresgruppe`)
- Marker symbol: based on the magnitude of the change from 2011 (`Veränd. Gruppe`)

🟦 Color: groups such as 2011–2015, 2016–2020, etc.  
🔺 Symbol: change within ±0.5 °C, ±1.0 °C, ±1.5 °C, or more than ±1.5 °C

This allows a richer visual comparison across both the magnitude of change and the time grouping.
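
The symbol classes compare the *absolute* value of the change from 2011 against fixed thresholds, so a warming of +0.8 °C and a cooling of -0.8 °C land in the same class. An equivalent lookup with the standard-library `bisect` module makes the inclusive upper bounds explicit; this is a sketch with shortened labels, and `categorize_change_bisect` is a hypothetical name.

```python
from bisect import bisect_left

# Inclusive upper bounds (°C) of the magnitude classes; anything above falls in the last class
BOUNDS = [0.5, 1.0, 1.5]
LABELS = ["±0.5 °C", "±1.0 °C", "±1.5 °C", "> ±1.5 °C"]


def categorize_change_bisect(value: float) -> str:
    """Classify a temperature change by its absolute magnitude."""
    # bisect_left returns the first bound >= |value|, which makes each bound inclusive
    return LABELS[bisect_left(BOUNDS, abs(value))]


for v in (0.5, -0.8, 1.0, -1.2, 2.0):
    print(v, "->", categorize_change_bisect(v))
```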

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Step 1–3: same preparation
clustering_data = monthly_change[['Year', 'YearMonth', 'Avg_Temp_Change_Prev_Year', 'Avg_Temp_Change_From_2011']].dropna().copy()

def group_years(y):
    start = (y - 2011) // 5 * 5 + 2011
    end = start + 4
    return f"{start}-{end}"

clustering_data['Jahresgruppe'] = clustering_data['Year'].apply(group_years)

def categorize_change(val):
    abs_val = abs(val)
    if abs_val <= 0.5:
        return 'Veränd. ±0.5 °C'
    elif abs_val <= 1.0:
        return 'Veränd. ±1.0 °C'
    elif abs_val <= 1.5:
        return 'Veränd. ±1.5 °C'
    else:
        return 'Veränd. > ±1.5 °C'

category_order = ['Veränd. ±0.5 °C', 'Veränd. ±1.0 °C', 'Veränd. ±1.5 °C', 'Veränd. > ±1.5 °C']
clustering_data['Veränd. Gruppe'] = pd.Categorical(
    clustering_data['Avg_Temp_Change_From_2011'].apply(categorize_change),
    categories=category_order,
    ordered=True
)

# Step 4: Use Plotly Express to create figure with correct color/symbol legend
fig_px = px.scatter(
    clustering_data,
    x='Avg_Temp_Change_Prev_Year',
    y='Avg_Temp_Change_From_2011',
    color='Jahresgruppe',
    symbol='Veränd. Gruppe',
    hover_data=['YearMonth'],
    title='Scatter plot: <br>Year-over-year temperature change vs. change since 2011',
    labels={
        'Avg_Temp_Change_Prev_Year': 'Change from previous year (°C)',
        'Avg_Temp_Change_From_2011': 'Change since 2011 (°C)'
    },
    category_orders={'Veränd. Gruppe': category_order},
    width=800,
    height=600
)

# Step 5: Apply fixed ±3°C axis scale with 1:1 ratio
fig_px.update_layout(
    template='plotly_white',
    xaxis=dict(range=[-3, 3], constrain='domain'),
    yaxis=dict(range=[-3, 3], scaleanchor='x', scaleratio=1)
)

# Step 6: Add buttons to toggle temperature bins
# Create a mapping from temperature bin to trace indices
symbol_map = {cat: [] for cat in category_order}
for i, trace in enumerate(fig_px.data):
    name = trace.name.split(', ')[-1]
    if name in symbol_map:
        symbol_map[name].append(i)

# Build buttons
buttons = []
for cat in category_order:
    visibility = [i in symbol_map[cat] for i in range(len(fig_px.data))]
    buttons.append(dict(
        label=cat,
        method='update',
        args=[{'visible': visibility},
              {'title': f'Scatter plot: {cat}'}]
    ))

# Add "All" button
buttons.insert(0, dict(
    label='All',
    method='update',
    args=[{'visible': [True] * len(fig_px.data)},
          {'title': 'Scatter plot: <br>Year-over-year temperature change vs. change since 2011'}]
))

# Add update menu (buttons)
fig_px.update_layout(
    updatemenus=[dict(
        type='buttons',
        direction='down',
        x=1.10,
        y=0.35,
        xanchor='left',
        yanchor='top',
        buttons=buttons,
        showactive=True
    )]
)

# Final style
fig_px.update_traces(marker=dict(size=8, line=dict(width=1, color='DarkSlateGrey')))
fig_px.show()

# # Step 7: Display in Kaggle (export to HTML and show via IFrame)
# fig_px.write_html("figure4_filtered.html", include_plotlyjs='cdn')
# from IPython.display import IFrame
# IFrame("figure4_filtered.html", width=850, height=650)

### 📊 Plotly Scatter Chart 4: Clustering

✅ Plan: Use KMeans

Cluster based on:
- X-axis: Average temperature change (year-over-year)
- Y-axis: Average temperature change (from 2011)

📌 KMeans is a clustering algorithm that partitions the data points into *k* groups: each point is assigned to the nearest cluster center, and each center is then moved to the mean of its assigned points, repeating until the assignments stop changing.
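
A minimal, dependency-free sketch of what `KMeans` does internally (Lloyd's algorithm), together with a z-score scaling that mirrors `StandardScaler`. Unlike scikit-learn, which picks starting centers with k-means++ by default and can keep the best of several runs (`n_init`), this sketch takes the initial centers as an explicit argument; the toy points are hypothetical.

```python
import math


def standardize(points):
    """Z-score each coordinate (population std, like StandardScaler); constant columns get scale 1."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        std = math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n)
        stds.append(std if std > 0 else 1.0)
    return [tuple((p[d] - means[d]) / stds[d] for d in range(dims)) for p in points]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest center
        new_labels = [min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centers


# Hypothetical (year-over-year change, change from 2011) pairs in °C
pairs = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (2.0, 2.1), (2.1, 1.9), (1.9, 2.0)]
scaled = standardize(pairs)
labels, centers = kmeans(scaled, init_centers=[scaled[0], scaled[3]])
print(labels)
```

Standardizing first matters because k-means uses Euclidean distance: without it, the axis with the larger spread would dominate the cluster assignment.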

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Step 1: Prepare the data
clustering_data = monthly_change[['Avg_Temp_Change_Prev_Year', 'Avg_Temp_Change_From_2011']].dropna()

# Standardize features
scaler = StandardScaler()
scaled_data = scaler.fit_transform(clustering_data)

# Step 2: Apply KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=42, n_init='auto')
clusters = kmeans.fit_predict(scaled_data)

# Step 3: Add cluster labels and date info
clustering_data = clustering_data.copy()
clustering_data['Cluster'] = clusters.astype(str)  # string labels -> discrete legend colors instead of a continuous scale
clustering_data['YearMonth'] = monthly_change.loc[clustering_data.index, 'YearMonth']

# Step 4: Build scatter plot with clusters
fig = px.scatter(
    clustering_data,
    x='Avg_Temp_Change_Prev_Year',
    y='Avg_Temp_Change_From_2011',
    color='Cluster',
    hover_data=['YearMonth'],
    title='Clustered scatter plot: <br>Year-over-year temperature change vs. change since 2011',
    labels={
        'Avg_Temp_Change_Prev_Year': 'Change from previous year (°C)',
        'Avg_Temp_Change_From_2011': 'Change since 2011 (°C)'
    },
    width=800,   # Adjust width
    height=600   # Adjust height for a more balanced look
)

# Step 5: Add cluster center annotations
# Inverse transform cluster centers to original scale
centers = scaler.inverse_transform(kmeans.cluster_centers_)

# Add each cluster center as a marker and label
for i, (x, y) in enumerate(centers):
    fig.add_trace(go.Scatter(
        x=[x], y=[y],
        mode='markers+text',
        marker=dict(size=12, color='black', symbol='x'),
        text=[f'Center {i}'],
        textposition='top center',
        showlegend=False
    ))

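# Final layout: style only the data-point traces (mode='markers'), so the
# black center markers added above keep their size of 12
fig.update_traces(
    marker=dict(size=8, line=dict(width=1, color='DarkSlateGrey')),
    selector=dict(mode='markers')
)
fig.update_layout(template='plotly_white')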

# Show chart
fig.show()
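
The cell above clusters standardized values and maps the centers back to °C with `scaler.inverse_transform`. A minimal NumPy sketch of that round trip on hypothetical toy values: `StandardScaler` subtracts each column's mean and divides by its population standard deviation (`ddof=0`), so the inverse is "multiply by the standard deviation, then add the mean".

```python
import numpy as np

# Hypothetical feature matrix: column 0 = YoY change, column 1 = change since 2011 (°C)
X = np.array([[-1.0, 0.5], [0.0, 1.0], [1.0, 1.5], [2.0, 3.0]])

# z-score per column, the same formula StandardScaler applies (population std, ddof=0)
mean = X.mean(axis=0)
std = X.std(axis=0)
Z = (X - mean) / std

# A center computed in scaled space is mapped back to °C by inverting the formula
center_scaled = Z[:2].mean(axis=0)
center_original = center_scaled * std + mean
print(center_original)  # the plain mean of the first two rows, in °C
```

Scaling matters here because KMeans uses Euclidean distance: without it, whichever feature has the larger spread would dominate the cluster assignment.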


### 📊 Plotly Scatter Chart 5 + Linear Regression Line per Year Group:

### What This Does:

* Scatter points are grouped into 5-year periods and colored by `Jahresgruppe` (year group).
* For each group, a **dashed linear trendline** is added, with its R² shown in the legend.

✅ What is R² (R-squared)?

R², or the coefficient of determination, is a statistical measure that shows how well a linear model explains the variability of the data it's trying to predict.
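Written out, with $y_i$ the observed values, $\hat{y}_i$ the values on the fitted line and $\bar{y}$ the mean of the observed values (the quantities the chart code below calls `ss_res` and `ss_tot`):

$$
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \qquad
SS_{\text{res}} = \sum_i \left(y_i - \hat{y}_i\right)^2, \qquad
SS_{\text{tot}} = \sum_i \left(y_i - \bar{y}\right)^2
$$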

📐 In Simple Terms:

R² answers the question:

> "How much of the variation in Y can be explained by X using a linear model?"

🧠 How to Interpret R²:

| R² Value | Interpretation                                 |
| -------- | ---------------------------------------------- |
| 1.0      | Perfect fit — all points lie on the line       |
| 0.8      | 80% of the variation is explained by the model |
| 0.5      | 50% explained — moderate fit                   |
| 0.0      | Model explains none of the variability         |
| < 0      | Worse than simply predicting the mean (cannot happen for a least-squares line with intercept evaluated on its own data) |

✅ Summary:

 -   High R² = strong linear relationship (model explains the trend well).
 -   Low R² = weak or no linear relationship.
 -   Used to evaluate the quality of regression lines, such as the per-group trendlines in this chart.

⚠️ **Wichtig:** Dieses Diagramm dient nur zur Veranschaulichung. Es besteht kein tatsächlicher Zusammenhang zwischen den beiden Temperaturveränderungswerten, was den niedrigen R²-Koeffizienten erklärt.

⚠️ **Important:** This chart is only an illustration. There is no actual relation between the two temperature change values, hence the low R² coefficient.
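Because each trendline is an ordinary least-squares line with an intercept, its R² equals the squared Pearson correlation between the two change values, so a low R² simply means a weak linear correlation. A small NumPy check on hypothetical toy data (the helper `r_squared` is illustrative and mirrors the `ss_res`/`ss_tot` computation in the following cell), which also shows that a prediction worse than the mean gives a negative R²:

```python
import numpy as np

def r_squared(y, y_pred):
    """Hypothetical helper: coefficient of determination, 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

# Least-squares line with intercept, as np.polyfit(x, y, 1) in the chart code
slope, intercept = np.polyfit(x, y, 1)
r2_fit = r_squared(y, slope * x + intercept)

# For a simple linear fit with intercept, R² equals the squared Pearson correlation
r = np.corrcoef(x, y)[0, 1]
print(round(r2_fit, 4), round(r ** 2, 4))

# A "model" that predicts worse than the flat mean line yields a negative R²
r2_bad = r_squared(y, -x)
print(r2_bad)
```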



In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np

# Step 1: Group years into 5-year periods
def group_years(y):
    start = (y - 2011) // 5 * 5 + 2011
    end = start + 4
    return f"{start}-{end}"

monthly_change['Jahresgruppe'] = monthly_change['Year'].apply(group_years)

# Step 2: Base scatter plot
fig = px.scatter(
    monthly_change,
    x='Avg_Temp_Change_Prev_Year',
    y='Avg_Temp_Change_From_2011',
    color='Jahresgruppe',
    hover_data=['YearMonth'],
    title = 'Streudiagramm mit Trendlinien und R²: <br>Temperaturveränderung im Jahresvergleich vs. Veränderung seit 2011',
    labels={
        'Avg_Temp_Change_Prev_Year': 'Veränderung zum Vorjahr (°C)',
        'Avg_Temp_Change_From_2011': 'Veränderung seit 2011 (°C)'
    },
    width=800,
    height=700
)

# Step 3: Compute and overlay trendlines with R²
groups = monthly_change['Jahresgruppe'].unique()

for group in groups:
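    # Drop rows with missing values only in the two plotted columns
    subset = monthly_change[monthly_change['Jahresgruppe'] == group].dropna(
        subset=['Avg_Temp_Change_Prev_Year', 'Avg_Temp_Change_From_2011']
    )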

    if len(subset) >= 2:
        x = subset['Avg_Temp_Change_Prev_Year'].values
        y = subset['Avg_Temp_Change_From_2011'].values

        # Fit linear model
        coeffs = np.polyfit(x, y, 1)
        trend_y = coeffs[0] * x + coeffs[1]

        # Compute R²
        ss_res = np.sum((y - trend_y) ** 2)
        ss_tot = np.sum((y - np.mean(y)) ** 2)
        r2 = 1 - (ss_res / ss_tot)

        # Generate x/y for smooth line
        trend_x = np.linspace(x.min(), x.max(), 100)
        trend_y_line = coeffs[0] * trend_x + coeffs[1]

        # Add trendline with R² in name
        fig.add_trace(go.Scatter(
            x=trend_x,
            y=trend_y_line,
            mode='lines',
            name=f'Trendlinie {group} (R² = {r2:.2f})',
            line=dict(dash='dash'),
            showlegend=True
        ))

# Step 4: Fix both axes to ±3 °C and enforce an equal aspect ratio
fig.update_layout(
    template='plotly_white',
    xaxis=dict(range=[-3, 3], constrain='domain'),
    yaxis=dict(range=[-3, 3], scaleanchor='x', scaleratio=1),
)

# Step 5: Final styling of the scatter markers (template already set in Step 4)
fig.update_traces(marker=dict(size=8, line=dict(width=1, color='DarkSlateGrey')))
fig.show()
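
The cell above bins years into 5-year groups and fits one least-squares line per group. The same two steps on a hypothetical toy DataFrame (columns `x`, `y` and `Group` stand in for the real `monthly_change` columns), using `groupby` instead of an explicit loop; `group_years` copies the binning formula from the cell:

```python
import numpy as np
import pandas as pd

def group_years(y):
    # Same 5-year binning as the chart code: 2011-2015, 2016-2020, 2021-2025, ...
    start = (y - 2011) // 5 * 5 + 2011
    return f"{start}-{start + 4}"

# Hypothetical rows: year, x (YoY change) and y (change since 2011), in °C
df = pd.DataFrame({
    'Year': [2012, 2013, 2014, 2017, 2018, 2019],
    'x':    [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    'y':    [0.0, 0.5, 1.0, 1.0, 1.0, 1.0],
})
df['Group'] = df['Year'].apply(group_years)

# One least-squares line (slope, intercept) per group
fits = {name: np.polyfit(g['x'], g['y'], 1) for name, g in df.groupby('Group')}
print(fits)
```

Note that a group whose y-values are all identical (like the second toy group) has `ss_tot = 0`, so the R² formula in the cell above would divide by zero for it.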


### 📊 Plotly-Streudiagramm 6: Clustering nach Temperaturveränderung seit 2011

✅ Cluster basierend auf der durchschnittlichen Temperaturveränderung im Vergleich zum Basisjahr 2011:  

 -   Veränderung bis ±0,5 °C  
 -   Veränderung über ±0,5 °C bis ±1,0 °C  
 -   Veränderung über ±1,0 °C bis ±1,5 °C  
 -   Veränderung > ±1,5 °C  

+ X-Achse: Durchschnittliche Temperaturveränderung (im Jahresvergleich)  
+ Y-Achse: Durchschnittliche Temperaturveränderung (seit 2011)  


### 📊 Plotly Scatter Chart 6: Clustering by Temperature change from 2011

✅ Cluster based on avg. temperature change compared to 2011 baseline:  

 -   Change up to ±0.5 °C
 -   Change above ±0.5 °C up to ±1.0 °C
 -   Change above ±1.0 °C up to ±1.5 °C
 -   Change > ±1.5 °C

+ X-axis: Average Temperature Change (Year-on-Year)
+ Y-axis: Average Temperature Change (From 2011)
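The row-wise `categorize_change` function in the next cell can also be expressed as a single vectorized `pd.cut` call on the absolute change. A sketch with hypothetical values and shortened English labels (not the labels used in the chart): right-closed bins reproduce its `<=` thresholds, and `pd.cut` with `labels` returns an ordered categorical, like the `pd.Categorical` built in the cell.

```python
import numpy as np
import pandas as pd

# Hypothetical values of Avg_Temp_Change_From_2011 (°C)
change = pd.Series([0.2, -0.5, 0.7, -1.2, 1.5, 2.3, -3.0])

labels = ['|Δ| ≤ 0.5 °C', '0.5 < |Δ| ≤ 1.0 °C', '1.0 < |Δ| ≤ 1.5 °C', '|Δ| > 1.5 °C']

# Right-closed bins on the absolute change; include_lowest keeps 0.0 in the first bin
groups = pd.cut(change.abs(), bins=[0, 0.5, 1.0, 1.5, np.inf],
                labels=labels, include_lowest=True)
print(groups.tolist())
```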

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Step 1: Prepare the data (drop rows with NaNs in required fields)
clustering_data = monthly_change[['Avg_Temp_Change_Prev_Year', 'Avg_Temp_Change_From_2011']].dropna().copy()
clustering_data['YearMonth'] = monthly_change.loc[clustering_data.index, 'YearMonth']

# Step 2: Define temperature change groups
def categorize_change(val):
    abs_val = abs(val)
    if abs_val <= 0.5:
        return 'Veränd. ±0.5 °C'
    elif abs_val <= 1.0:
        return 'Veränd. ±1.0 °C'
    elif abs_val <= 1.5:
        return 'Veränd. ±1.5 °C'
    else:
        return 'Veränd. > ±1.5 °C'

# Apply category logic
category_order = [
    'Veränd. ±0.5 °C',
    'Veränd. ±1.0 °C',
    'Veränd. ±1.5 °C',
    'Veränd. > ±1.5 °C'
]
clustering_data['Veränd. Gruppe'] = pd.Categorical(
    clustering_data['Avg_Temp_Change_From_2011'].apply(categorize_change),
    categories=category_order,
    ordered=True
)


# Step 3: Plot with Plotly Express using custom categories
fig = px.scatter(
    clustering_data,
    x='Avg_Temp_Change_Prev_Year',
    y='Avg_Temp_Change_From_2011',
    color='Veränd. Gruppe',
    hover_data=['YearMonth'],
    title = 'Kategorisiertes Streudiagramm:  <br>Temperaturveränderung im Vergleich zum Basisjahr 2011',
    labels={
        'Avg_Temp_Change_Prev_Year': 'Veränderung zum Vorjahr (°C)',
        'Avg_Temp_Change_From_2011': 'Veränderung seit 2011 (°C)'
    },
    category_orders={'Veränd. Gruppe': category_order},
    width=700,
    height=800
)

# Step 4: Fix both axes to ±3 °C and enforce an equal aspect ratio
fig.update_layout(
    template='plotly_white',
    xaxis=dict(range=[-3, 3], constrain='domain'),
    yaxis=dict(range=[-3, 3], scaleanchor='x', scaleratio=1),
)

# Step 5: Optional styling
fig.update_traces(marker=dict(size=8, line=dict(width=1, color='DarkSlateGrey')))

# Show chart
fig.show()


### 📊 Liniendiagramm 7:  
✅ Visualisierungscode mit Trendlinie (Matplotlib + Seaborn)


### 📊 Line Chart 7:
✅ Visualization Code with Trendline (Matplotlib + Seaborn)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset if needed
# monthly_change = pd.read_csv("monthly_avg_temp_change_2011_2024.csv")

# Ensure 'Date' column exists and is datetime
if 'Date' not in monthly_change.columns:
    monthly_change['Date'] = pd.to_datetime(monthly_change['YearMonth'], format='%Y-%m')
else:
    monthly_change['Date'] = pd.to_datetime(monthly_change['Date'])

# Set visual theme
sns.set_theme(style="whitegrid")

# Create figure
plt.figure(figsize=(14, 6))

# Plot the actual line
sns.lineplot(
    data=monthly_change,
    x='Date',
    y='Avg_Temp_Change_From_2011',
    label='Monthly Temp Change from 2011',
    marker='o'
)

# Add a linear trendline: fit on matplotlib date numbers (days since epoch),
# then draw it against the real dates so the x-axis stays a datetime axis
import numpy as np
from matplotlib.dates import date2num
monthly_change['Date_Num'] = date2num(monthly_change['Date'])
trend_data = monthly_change[['Date', 'Date_Num', 'Avg_Temp_Change_From_2011']].dropna()
slope, intercept = np.polyfit(trend_data['Date_Num'], trend_data['Avg_Temp_Change_From_2011'], 1)

# Plot trendline
plt.plot(
    trend_data['Date'],
    slope * trend_data['Date_Num'] + intercept,
    color='red',
    linestyle='dashed',
    label='Trendline'
)

# Rotate the date labels for readability
plt.xticks(rotation=45, ha='right')
plt.xlabel('Year-Month')
plt.ylabel('Temperature Change (°C)')
plt.title('Monthly Avg Temp Change vs. 2011 Baseline (EU, 2012–2025)')
plt.legend()
plt.tight_layout()

# Show the plot
plt.show()
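
The trendline is fitted against `Date_Num`, i.e. matplotlib date numbers, which count days, so its slope comes out in °C per day. A sketch on synthetic data (a hypothetical series rising by exactly 0.03 °C per year) showing the conversion to the more readable °C per decade, assuming an average year of 365.25 days:

```python
import numpy as np
import pandas as pd
from matplotlib.dates import date2num

# Hypothetical monthly series rising by exactly 0.03 °C per year
dates = pd.date_range('2012-01-01', periods=48, freq='MS')
days = date2num(dates)
years_elapsed = (days - days[0]) / 365.25
values = 0.03 * years_elapsed

# Fit in the same units as the chart (matplotlib date numbers = days)
slope_per_day, intercept = np.polyfit(days, values, 1)
slope_per_decade = slope_per_day * 365.25 * 10
print(round(slope_per_decade, 3))  # → 0.3
```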
