# Temperature Statistics for Europe

Derived from climate projections

This dataset provides temperature exposure statistics for Europe based on daily 2-meter air temperature (mean, minimum, and maximum). Statistics are available for entire years as well as for the winter (DJF) and summer (JJA) seasons. They are derived from bias-adjusted EURO-CORDEX data and computed as a smoothed 30-year average over 1971 to 2100, which results in a time series from 1986 to 2085. Model ensemble averages and the associated standard deviations are included. Temperature percentiles of this kind are used in public health and epidemiology to assess health risks and impacts, and they allow regions to be compared under different climate change scenarios.

**Information on Dataset:**
* Source: [Temperature Statistics for Europe](https://cds.climate.copernicus.eu/datasets/sis-temperature-statistics?tab=overview)
* Author:
* Notebook Version: 1.0 (Updated: December 2, 2024)

## 1. Specifying the paths and working directories

In [1]:
import os

''' ---- Specify the directories here ---- '''
download_folder = r".\data\sis-temperature-statistics\download"
working_folder = r".\data\sis-temperature-statistics\working"
geotiff_folder = r".\data\sis-temperature-statistics\geotiff"
csv_folder = r".\data\sis-temperature-statistics\csv"
output_folder = r".\data\sis-temperature-statistics\output"
''' ----- End of inputs ---- '''

os.makedirs(download_folder, exist_ok=True)
os.makedirs(working_folder, exist_ok=True)
os.makedirs(geotiff_folder, exist_ok=True)
os.makedirs(csv_folder, exist_ok=True)
os.makedirs(output_folder, exist_ok=True)
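
The five `os.makedirs` calls can also be written as a loop over a dictionary of folders. The following is a minimal sketch, assuming the same sub-folder names as above. The names `base_folder` and `folders` are illustrative and do not appear in the notebook, and a temporary directory stands in for `.\data` so that the sketch runs anywhere.

```python
import os
import tempfile

# Assumption: a temporary directory stands in for the notebook's data root.
base_folder = os.path.join(tempfile.mkdtemp(), "sis-temperature-statistics")

# One entry per sub-folder used in this notebook.
folders = {
    name: os.path.join(base_folder, name)
    for name in ("download", "working", "geotiff", "csv", "output")
}

for path in folders.values():
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
```

Keeping the paths in one dictionary means a new folder only has to be added in one place.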

## 2. Download and Extract Dataset

### 2.1 Authentication

In [2]:
import cdsapi

def main():
    api_key = "fdae60fd-35d4-436f-825c-c63fedab94a4"
    api_url = "https://cds.climate.copernicus.eu/api"
    client = cdsapi.Client(url=api_url, key=api_key)
    return client
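
A key that is written directly into a notebook is easy to leak when the notebook is shared. One possible alternative is to read the credentials from environment variables. The sketch below assumes the variables are called `CDSAPI_KEY` and `CDSAPI_URL`; the helpers `load_cds_credentials` and `make_client` are hypothetical names introduced only for this illustration.

```python
import os

DEFAULT_CDS_URL = "https://cds.climate.copernicus.eu/api"

def load_cds_credentials(env=os.environ):
    """Return (url, key) read from a mapping of environment variables."""
    key = env.get("CDSAPI_KEY")  # assumed variable name
    if not key:
        raise RuntimeError("Set the CDSAPI_KEY environment variable first.")
    url = env.get("CDSAPI_URL", DEFAULT_CDS_URL)  # assumed variable name
    return url, key

def make_client():
    """Create a CDS client without storing the key in the notebook."""
    import cdsapi  # imported lazily so the helper above works without cdsapi
    url, key = load_cds_credentials()
    return cdsapi.Client(url=url, key=key)
```

With this in place, `client = make_client()` could replace `client = main()` in the download cell.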

### 2.2 Request Definition and Download

In [3]:
# Define additional request fields to ensure the request stays within the file size limit.
# These coordinates were obtained using the BBox Extractor tool:
# https://str-ucture.github.io/bbox-extractor/

bbox_wgs84_deutschland = [56.0, 5.8, 47.2, 15.0]
bbox_wgs84_konstanz = [47.9, 8.9, 47.6, 9.3]

# Alternatively, use a shapefile for precise geographic filtering
import geopandas as gpd
import math

# Example: Load shapefile of Germany (WGS84 projection)
de_shapefile = r"./shapefiles/de_boundary.shp"
de_gdf = gpd.read_file(de_shapefile)
de_bounds = de_gdf.total_bounds

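# Round the bounds outward to one decimal place and add a 0.1 degree buffer.
# total_bounds is ordered (minx, miny, maxx, maxy), i.e. (west, south, east, north).
de_bounds_adjusted = [(math.floor(de_bounds[0]* 10)/10)-0.1,
                      (math.floor(de_bounds[1]* 10)/10)-0.1,
                      (math.ceil(de_bounds[2]* 10)/10)+0.1,
                      (math.ceil(de_bounds[3]* 10)/10)+0.1]

# Reorder to [north, west, south, east], the order used for the "area" field
bbox_de_bounds_adjusted = [de_bounds_adjusted[3], de_bounds_adjusted[0],
                           de_bounds_adjusted[1], de_bounds_adjusted[2]]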

bbox_de_bounds_adjusted

[55.2, 5.7, 47.1, 15.2]
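
The rounding, buffering, and reordering can be collected in one small function. This is a sketch under assumptions: `cds_area_from_bounds` is a hypothetical helper, and `example_bounds` holds illustrative numbers rather than values read from the shapefile. The final rounding to one decimal place is an addition that removes floating-point noise; it only makes sense while the buffer itself has one decimal place.

```python
import math

def cds_area_from_bounds(bounds, buffer=0.1):
    """Convert (minx, miny, maxx, maxy) into [north, west, south, east]."""
    minx, miny, maxx, maxy = bounds
    # Round outward to one decimal place, then widen the box by the buffer.
    west = math.floor(minx * 10) / 10 - buffer
    south = math.floor(miny * 10) / 10 - buffer
    east = math.ceil(maxx * 10) / 10 + buffer
    north = math.ceil(maxy * 10) / 10 + buffer
    # Round to one decimal to remove floating-point noise from the arithmetic.
    return [round(v, 1) for v in (north, west, south, east)]

# Illustrative bounds (assumption), not read from de_boundary.shp
example_bounds = (5.87, 47.27, 15.04, 55.06)
area = cds_area_from_bounds(example_bounds)
print(area)
```

Inside the notebook, the call would be `cds_area_from_bounds(de_gdf.total_bounds)`.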

In [4]:
dataset = "sis-temperature-statistics"
request = {
    "variable": "average_temperature",  # Options: "average_temperature", "minimum_temperature", "maximum_temperature"
    "period": "year",
    "statistic": [
        "time_average",
        "10th_percentile",
        "90th_percentile"
    ],
    "experiment": [
        "rcp4_5",
        "rcp8_5"
    ],
    "ensemble_statistic": [
        "ensemble_members_average",
        "ensemble_members_standard_deviation"
    ],
    "area": bbox_de_bounds_adjusted
}

In [5]:
# Run this cell to download the dataset (the download is skipped if the file already exists):

def main_retrieve():
    dataset_filename = f"{dataset}_{request['variable']}.zip"
    dataset_filepath = os.path.join(download_folder, dataset_filename)

    # Download the dataset only if the dataset has not been downloaded before
    if not os.path.isfile(dataset_filepath):
        # Download the dataset with the defined request parameters
        client.retrieve(dataset, request, dataset_filepath)
    else:
        print("Dataset already downloaded.")

if __name__ == "__main__":
    client = main()
    main_retrieve()

2024-12-17 09:50:28,888 INFO [2024-09-28T00:00:00] **Welcome to the New Climate Data Store (CDS)!** This new system is in its early days of full operations and still undergoing enhancements and fine tuning. Some disruptions are to be expected. Your [feedback](https://jira.ecmwf.int/plugins/servlet/desk/portal/1/create/202) is key to improve the user experience on the new CDS for the benefit of everyone. Thank you.

2024-12-17 09:50:28,891 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.

2024-12-17 09:50:28,892 INFO [2024-09-16T00:00:00] Remember that you need to have an ECMWF account to use the new CDS. **Your old CDS credentials will not work in new CDS!**

Dataset already downloaded.


### 2.3 Extract the ZIP archive

In [6]:
import zipfile

dataset_filename = f"{dataset}_{request['variable']}.zip"
dataset_filepath = os.path.join(download_folder, dataset_filename)
extract_folder = os.path.join(working_folder, f"{request['variable']}")

# Extract the zip file
try:
    os.makedirs(extract_folder, exist_ok=True)
    
    if not os.listdir(extract_folder):
        with zipfile.ZipFile(dataset_filepath, 'r') as zip_ref:
            zip_ref.extractall(extract_folder)
            print(f"Successfully extracted files to: {extract_folder}")
    else:
        print("Folder is not empty. Skipping extraction.")
except FileNotFoundError:
    print(f"Error: The file {dataset_filepath} was not found.")
except zipfile.BadZipFile:
    print(f"Error: The file {dataset_filepath} is not a valid zip file.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Folder is not empty. Skipping extraction.
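
The "folder is not empty" check skips extraction as soon as any file is present, including the case where an earlier extraction was interrupted. A per-file check is one possible alternative. The sketch below is self-contained: it builds a small throwaway archive with placeholder files (`a.nc`, `b.nc` are made-up names) instead of using the real download.

```python
import os
import tempfile
import zipfile

# Build a small throwaway archive so the sketch runs without the real download.
tmp_dir = tempfile.mkdtemp()
archive_path = os.path.join(tmp_dir, "example.zip")
with zipfile.ZipFile(archive_path, "w") as zf:
    zf.writestr("a.nc", b"placeholder")
    zf.writestr("b.nc", b"placeholder")

target_dir = os.path.join(tmp_dir, "extracted")
os.makedirs(target_dir, exist_ok=True)

with zipfile.ZipFile(archive_path, "r") as zf:
    members = zf.namelist()
    # Extract only the members that are not already present in the target folder.
    missing = [m for m in members
               if not os.path.exists(os.path.join(target_dir, m))]
    zf.extractall(target_dir, members=missing)

print(f"{len(missing)} of {len(members)} files extracted")
```

Running the extraction block a second time extracts nothing, because every member already exists.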


## 3. Read the netCDF file and print the metadata

In [7]:
import re
import pandas as pd

def meta(filename):
    # Expected scheme: <variable>_<temperature type>_Yearly_<rcp>_<statistic>_v<version>.<ext>
    match = re.search(r'^(mean|p10|p90)_(Tmean|Tmax|Tmin)_Yearly_(rcp\d{2})_(mean|stdev)_v(\d+\.\d+)\.', filename)
    if not match:
        raise ValueError("the given filename does not fit the expected naming scheme")
    
    return dict(
        filename=filename,
        path=os.path.join(extract_folder, filename),
        variable=match.group(1),   # The variable (mean, p10 or p90)
        temperature_type=match.group(2),  # The temperature type (Tmean, Tmax or Tmin)
        rcp=match.group(3),         # RCP scenario (e.g. rcp85)
        statistic=match.group(4),   # Ensemble statistic (mean or stdev)
        version=match.group(5),      # Version
    )

# Example directory (adapt to your environment)
nc_files = [meta(f) for f in os.listdir(extract_folder) if f.endswith('.nc')]
df_nc_files = pd.DataFrame.from_dict(nc_files)

# Modify pandas display options
pd.options.display.max_colwidth = 30

# Display the DataFrame
df_nc_files

| | filename | path | variable | temperature_type | rcp | statistic | version |
|---|---|---|---|---|---|---|---|
| 0 | mean_Tmean_Yearly_rcp45_me... | .\data\sis-temperature-sta... | mean | Tmean | rcp45 | mean | 1.0 |
| 1 | mean_Tmean_Yearly_rcp45_st... | .\data\sis-temperature-sta... | mean | Tmean | rcp45 | stdev | 1.0 |
| 2 | mean_Tmean_Yearly_rcp85_me... | .\data\sis-temperature-sta... | mean | Tmean | rcp85 | mean | 1.0 |
| 3 | mean_Tmean_Yearly_rcp85_st... | .\data\sis-temperature-sta... | mean | Tmean | rcp85 | stdev | 1.0 |
| 4 | p10_Tmean_Yearly_rcp45_mea... | .\data\sis-temperature-sta... | p10 | Tmean | rcp45 | mean | 1.0 |
| 5 | p10_Tmean_Yearly_rcp45_std... | .\data\sis-temperature-sta... | p10 | Tmean | rcp45 | stdev | 1.0 |
| 6 | p10_Tmean_Yearly_rcp85_mea... | .\data\sis-temperature-sta... | p10 | Tmean | rcp85 | mean | 1.0 |
| 7 | p10_Tmean_Yearly_rcp85_std... | .\data\sis-temperature-sta... | p10 | Tmean | rcp85 | stdev | 1.0 |
| 8 | p90_Tmean_Yearly_rcp45_mea... | .\data\sis-temperature-sta... | p90 | Tmean | rcp45 | mean | 1.0 |
| 9 | p90_Tmean_Yearly_rcp45_std... | .\data\sis-temperature-sta... | p90 | Tmean | rcp45 | stdev | 1.0 |
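
To see what the regular expression extracts without touching the file system, it can be applied to a single name. This is a minimal sketch: `parse_filename` is a hypothetical stand-alone variant of `meta`, and the example filename is constructed to fit the pattern rather than copied from the archive, whose full names are truncated in the listing.

```python
import re

FILENAME_PATTERN = re.compile(
    r"^(mean|p10|p90)_(Tmean|Tmax|Tmin)_Yearly_(rcp\d{2})_(mean|stdev)_v(\d+\.\d+)\."
)

def parse_filename(filename):
    """Split a filename into its named components, or raise ValueError."""
    match = FILENAME_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    keys = ("variable", "temperature_type", "rcp", "statistic", "version")
    return dict(zip(keys, match.groups()))

# Hypothetical filename constructed to fit the pattern (assumption)
example = parse_filename("p90_Tmax_Yearly_rcp85_stdev_v1.0.nc")
print(example)
```

Note that the pattern only accepts `Yearly` files, so seasonal downloads would need an extended pattern.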


### 3.1 For variable = 'mean'

In [8]:
import netCDF4 as nc

# Open the NetCDF file in read mode
nc_dataset = nc.Dataset(df_nc_files['path'][0], mode='r')

# List all variables in the dataset
variables_list = nc_dataset.variables.keys()
print(f"Available variables: {list(variables_list)}")

Available variables: ['mean_Tmean_Yearly', 'height', 'lat', 'lon', 'time']


In [9]:
# Define variable name from available variables and read variable data
variable_name = 'mean_Tmean_Yearly'
variable_data = nc_dataset[variable_name]

# Generate summary of the primary variable
summary = {
    "Variable Name": variable_name,
    "Data Type": variable_data.dtype,
    "Shape": variable_data.shape,
    "Variable Info": f"{variable_name}({', '.join(variable_data.dimensions)})",
    "Units": getattr(variable_data, "units", "N/A"),
    "Long Name": getattr(variable_data, "long_name", "N/A"),
}

# Display dataset summary as a DataFrame for better visualization
nc_summary = pd.DataFrame(list(summary.items()), columns=['Description', 'Remarks'])

# Display the summary DataFrame
nc_summary

| | Description | Remarks |
|---|---|---|
| 0 | Variable Name | mean_Tmean_Yearly |
| 1 | Data Type | float32 |
| 2 | Shape | (100, 82, 95) |
| 3 | Variable Info | mean_Tmean_Yearly(time, la... |
| 4 | Units | degrees C |
| 5 | Long Name | Ensemble members average o... |


In [10]:
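# Select the first file from the list and open it in read mode
nc_file = nc_files[0]

nc_dataset = nc.Dataset(nc_file['path'], 'r')

# Read the time axis together with the attributes needed to decode it
time_var = nc_dataset.variables['time']
time_units = time_var.units
time_calendar = getattr(time_var, "calendar", "standard")  # fall back to the standard calendar
# Convert the numeric time values into date objects
cftime = nc.num2date(time_var[:], units=time_units, calendar=time_calendar)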
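
`nc.num2date` converts the numeric time axis into date objects using the `units` and `calendar` attributes. For intuition, the standard-calendar case can be reproduced with the standard library alone. This is a minimal sketch, assuming a units string of the form `days since YYYY-MM-DD`; the actual units of this file are not shown above, and `decode_days_since` is a hypothetical helper that is not part of netCDF4.

```python
from datetime import datetime, timedelta

def decode_days_since(values, units):
    """Decode numeric offsets for a units string like 'days since 1986-01-01'."""
    unit, since, reference = units.split(maxsplit=2)
    if unit != "days" or since != "since":
        raise ValueError(f"unsupported units: {units}")
    # Keep only the date part; a time-of-day suffix is ignored in this sketch.
    origin = datetime.strptime(reference.split()[0], "%Y-%m-%d")
    return [origin + timedelta(days=float(v)) for v in values]

# Illustrative values (assumption), not read from the dataset
dates = decode_days_since([0, 365, 730], "days since 1986-01-01")
print([d.year for d in dates])
```

For non-standard calendars such as `360_day` or `noleap`, this shortcut does not apply, which is why the notebook relies on `nc.num2date`.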