# Temperature Statistics for Europe

Derived from climate projections

This dataset provides temperature exposure statistics for Europe, based on daily 2-meter air temperature (mean, minimum, and maximum) for entire years as well as for the winter (DJF) and summer (JJA) seasons. The statistics are derived from bias-adjusted EURO-CORDEX data covering 1971 to 2100 and smoothed with a 30-year average, which results in a time series from 1986 to 2085. Model ensemble averages and the associated standard deviations are included. Temperature statistics such as these percentiles are used in public health and epidemiology to assess health risks and impacts, and they allow regions to be compared under different climate change scenarios.

**Information on Dataset:**
* Source: [Temperature Statistics for Europe](https://cds.climate.copernicus.eu/datasets/sis-temperature-statistics?tab=overview)
* Author: T. Tewes (Stadt Konstanz) 
* Resolution: 0.1° x 0.1°
* Notebook Version: 1.1 (Updated: December 17, 2024)

## 1. Specifying the paths and working directories

In [1]:
import os

''' ---- Specify the directories here ---- '''
download_folder = r".\data\sis-temperature-statistics\download"
working_folder = r".\data\sis-temperature-statistics\working"
geotiff_folder = r".\data\sis-temperature-statistics\geotiff"
csv_folder = r".\data\sis-temperature-statistics\csv"
output_folder = r".\data\sis-temperature-statistics\output"
''' ----- End of inputs ---- '''

os.makedirs(download_folder, exist_ok=True)
os.makedirs(working_folder, exist_ok=True)
os.makedirs(geotiff_folder, exist_ok=True)
os.makedirs(csv_folder, exist_ok=True)
os.makedirs(output_folder, exist_ok=True)
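The raw strings above use Windows-style backslashes, which are not treated as path separators on Linux or macOS. As a minimal sketch (not part of the original notebook), the same folder layout could be built portably with `pathlib`; the helper name `make_folders` and its `base_dir` parameter are illustrative assumptions.

```python
from pathlib import Path


def make_folders(base_dir="data/sis-temperature-statistics"):
    """Create the five sub-folders used in this notebook and return them as Paths.

    `base_dir` is a hypothetical parameter; adjust it to your environment.
    """
    base = Path(base_dir)
    names = ["download", "working", "geotiff", "csv", "output"]
    folders = {name: base / name for name in names}
    for folder in folders.values():
        # parents=True creates intermediate directories, exist_ok avoids errors on reruns
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

A call such as `folders = make_folders()` followed by `download_folder = folders["download"]` would then stand in for the individual assignments.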

## 2. Download and Extract Dataset

### 2.1 Authentication

In [2]:
import cdsapi

def main():
    api_key = "fdae60fd-35d4-436f-825c-c63fedab94a4"
    api_url = "https://cds.climate.copernicus.eu/api"
    client = cdsapi.Client(url=api_url, key=api_key)
    return client
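Hard-coding the API key in a notebook exposes it to everyone the notebook is shared with. One possible alternative, sketched here under assumptions, is to read the credentials from environment variables. The names `CDSAPI_URL` and `CDSAPI_KEY` follow the convention of the `cdsapi` package as far as we know, but treat them as an assumption; the helper names `load_cds_credentials` and `make_client` are hypothetical.

```python
import os

DEFAULT_URL = "https://cds.climate.copernicus.eu/api"


def load_cds_credentials(env=None):
    """Return (url, key) taken from environment variables.

    `env` can be any mapping, which makes the function easy to test;
    by default the process environment is used.
    """
    env = os.environ if env is None else env
    key = env.get("CDSAPI_KEY")
    if not key:
        raise RuntimeError("CDSAPI_KEY is not set; export it before running the notebook.")
    url = env.get("CDSAPI_URL", DEFAULT_URL)
    return url, key


def make_client(env=None):
    """Create a cdsapi client from the credentials found in the environment."""
    import cdsapi  # imported lazily so load_cds_credentials works without cdsapi installed

    url, key = load_cds_credentials(env)
    return cdsapi.Client(url=url, key=key)
```

With this variant, `client = make_client()` would take the place of `client = main()`, and the key no longer appears in the notebook itself.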

### 2.2 Request Definition and Download

In [3]:
# Define additional request fields to ensure the request stays within the file size limit.
# These coordinates were obtained using the BBox Extractor tool:
# https://str-ucture.github.io/bbox-extractor/

bbox_wgs84_deutschland = [56.0, 5.8, 47.2, 15.0]
bbox_wgs84_konstanz = [47.9, 8.9, 47.6, 9.3]

# Alternatively, use a shapefile for precise geographic filtering
import geopandas as gpd
import math

# Example: Load shapefile of Germany (WGS84 projection)
de_shapefile = r"./shapefiles/de_boundary.shp"
de_gdf = gpd.read_file(de_shapefile)
de_bounds = de_gdf.total_bounds

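# Snap the bounds outward to the 0.1° grid and add a 0.1° buffer on each side.
# total_bounds is ordered [minx, miny, maxx, maxy], i.e. [West, South, East, North].
de_bounds_adjusted = [(math.floor(de_bounds[0]* 10)/10)-0.1,
                      (math.floor(de_bounds[1]* 10)/10)-0.1,
                      (math.ceil(de_bounds[2]* 10)/10)+0.1,
                      (math.ceil(de_bounds[3]* 10)/10)+0.1]

# Reorder to the area format used in the request: [North, West, South, East]
bbox_de_bounds_adjusted = [de_bounds_adjusted[3], de_bounds_adjusted[0],
                           de_bounds_adjusted[1], de_bounds_adjusted[2]]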
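The snapping and reordering above can also be written as two small functions, which makes the coordinate order explicit. This is only a sketch: the function names are hypothetical, the input bounds are made-up illustrative values rather than the real shapefile extent, and the extra `round` call is an addition to suppress floating-point noise such as `5.699999999999999`.

```python
import math


def snap_and_buffer_bounds(bounds, decimals=1, buffer=0.1):
    """Snap [west, south, east, north] outward to the grid and add a buffer."""
    west, south, east, north = bounds
    factor = 10 ** decimals
    snapped = [
        math.floor(west * factor) / factor - buffer,
        math.floor(south * factor) / factor - buffer,
        math.ceil(east * factor) / factor + buffer,
        math.ceil(north * factor) / factor + buffer,
    ]
    # Rounding removes floating-point noise introduced by the arithmetic above
    return [round(value, decimals) for value in snapped]


def to_cds_area(bounds):
    """Reorder [west, south, east, north] to [north, west, south, east]."""
    west, south, east, north = bounds
    return [north, west, south, east]


# Illustrative bounds only (not read from the shapefile)
adjusted = snap_and_buffer_bounds((5.866, 47.270, 15.042, 55.059))
print(adjusted)               # [5.7, 47.1, 15.2, 55.2]
print(to_cds_area(adjusted))  # [55.2, 5.7, 47.1, 15.2]
```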

In [4]:
# Define the available options for periods and temperature variables
period_list = ["year", "summer", "winter"]
variable_list = ["average_temperature", "minimum_temperature", "maximum_temperature"]

# Select period and variable for processing; modify these as needed
selected_period = period_list[0]  # Default: "year"
selected_variable = variable_list[0]  # Default: "average_temperature"

# Display the selected variable and period
print(f"Selected Variable: {selected_variable}\nSelected Period: {selected_period}")

# Define statistics based on the selected variable
# For "average_temperature", include additional statistics; otherwise, use only "time_average"
statistic = (
    ['time_average', '10th_percentile', '90th_percentile']
    if selected_variable == "average_temperature"
    else ['time_average']
)

Selected Variable: average_temperature
Selected Period: year


In [5]:
dataset = "sis-temperature-statistics"
request = {
    "variable": selected_variable,
    "period": selected_period,
    "statistic": statistic,
    "experiment": [
        "rcp4_5",
        "rcp8_5"
    ],
    "ensemble_statistic": [
        "ensemble_members_average",
        "ensemble_members_standard_deviation"
    ],
    "area": bbox_de_bounds_adjusted
}

In [6]:
# Run this cell to download the dataset:

def main_retrieve():
    dataset_filename = f"{dataset}_{selected_period}_{selected_variable}.zip"
    dataset_filepath = os.path.join(download_folder, dataset_filename)

    # Download the dataset only if the dataset has not been downloaded before
    if not os.path.isfile(dataset_filepath):
        # Download the dataset with the defined request parameters
        client.retrieve(dataset, request, dataset_filepath)
    else:
        print("Dataset already downloaded.")

if __name__ == "__main__":
    client = main()
    main_retrieve()

Dataset already downloaded.
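The extraction step in section 2.3 looks for one archive per variable, whereas the cell above downloads only the currently selected variable. A sketch of how one request per variable could be prepared is shown below; `build_requests` is a hypothetical helper, and the request keys simply mirror those used in the request cell.

```python
def build_requests(period, variables, area):
    """Return a dict mapping each variable name to its request dictionary."""
    requests = {}
    for variable in variables:
        # Percentiles are only requested for the average temperature
        statistic = (
            ["time_average", "10th_percentile", "90th_percentile"]
            if variable == "average_temperature"
            else ["time_average"]
        )
        requests[variable] = {
            "variable": variable,
            "period": period,
            "statistic": statistic,
            "experiment": ["rcp4_5", "rcp8_5"],
            "ensemble_statistic": [
                "ensemble_members_average",
                "ensemble_members_standard_deviation",
            ],
            "area": area,
        }
    return requests


requests = build_requests(
    "year",
    ["average_temperature", "minimum_temperature", "maximum_temperature"],
    [56.0, 5.8, 47.2, 15.0],
)
for variable, request in requests.items():
    print(variable, request["statistic"])
```

Each request could then be passed to `client.retrieve(dataset, request, dataset_filepath)` with a file name built from the variable, in the same way as in `main_retrieve()`.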


### 2.3 Extract the Zip folder

In [7]:
import zipfile

extract_folder = os.path.join(working_folder, selected_period)
os.makedirs(extract_folder, exist_ok=True)

# Extract one zip archive per variable, but only if the target folder is still empty
try:
    if not os.listdir(extract_folder):
        for variable in variable_list:
            dataset_filename = f"{dataset}_{selected_period}_{variable}.zip"
            dataset_filepath = os.path.join(download_folder, dataset_filename)
            
            with zipfile.ZipFile(dataset_filepath, 'r') as zip_ref:
                zip_ref.extractall(extract_folder)
                print(f"Successfully extracted files to: {extract_folder}")
    else:
        print("Folder is not empty. Skipping extraction.")
except FileNotFoundError:
    print(f"Error: The file {dataset_filepath} was not found.")
except zipfile.BadZipFile:
    print(f"Error: The file {dataset_filepath} is not a valid zip file.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Folder is not empty. Skipping extraction.
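The "folder is not empty" check skips extraction entirely as soon as any file is present, even if only some archives were unpacked earlier. A possible refinement, sketched with the standard library only, checks each archive member individually; `extract_missing` is a hypothetical helper name.

```python
import os
import zipfile


def extract_missing(zip_path, target_dir):
    """Extract only those archive members that are not yet present in target_dir.

    Returns the list of member names that were extracted in this call.
    """
    os.makedirs(target_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for member in zip_ref.namelist():
            if not os.path.exists(os.path.join(target_dir, member)):
                zip_ref.extract(member, target_dir)
                extracted.append(member)
    return extracted
```

Calling it once per downloaded archive, for example `extract_missing(dataset_filepath, extract_folder)`, makes reruns cheap while still picking up archives that were added later.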


## 3. Read the netCDF file and print the metadata

In [8]:
# Modify the selected period here
selected_period = period_list[0] # 0: year, 1: summer, 2: winter
print(selected_period)

extract_folder = os.path.join(working_folder, selected_period)
os.makedirs(extract_folder, exist_ok=True)

year


In [9]:
import re
import pandas as pd

def meta(filename):
    match = re.search(r'^(mean|p10|p90)_(Tmean|Tmax|Tmin)_(Yearly|Winter|Summer)_(rcp\d{2})_(mean|stdev)_v(\d+\.\d+)\.', filename)
    if not match:
        raise ValueError("the given filename does not fit the expected naming scheme")
    
    variable_name=f"{match.group(1)}_{match.group(2)}_{match.group(3)}"
    return dict(
        filename=filename,
        path=os.path.join(extract_folder, filename),
        ds_period=match.group(3),
        ds_variable=match.group(2),
        ds_statistic=match.group(1),
        variable_name=variable_name,
        rcp=match.group(4),
        rcp_statistic=match.group(5),
    )

# Example directory (adjust to your environment)
nc_files = [meta(f) for f in os.listdir(extract_folder) if f.endswith('.nc')]
nc_files_sorted = sorted(nc_files, key=lambda x: (x['ds_variable'], x['ds_statistic']))

df_nc_files = pd.DataFrame.from_dict(nc_files_sorted)

# Modify pandas display options
pd.options.display.max_colwidth = 30

# Display the DataFrame without displaying path
df_nc_files.loc[:, df_nc_files.columns != 'path']

|  | filename | ds_period | ds_variable | ds_statistic | variable_name | rcp | rcp_statistic |
|---|---|---|---|---|---|---|---|
| 0 | mean_Tmax_Yearly_rcp45_mea... | Yearly | Tmax | mean | mean_Tmax_Yearly | rcp45 | mean |
| 1 | mean_Tmax_Yearly_rcp45_std... | Yearly | Tmax | mean | mean_Tmax_Yearly | rcp45 | stdev |
| 2 | mean_Tmax_Yearly_rcp85_mea... | Yearly | Tmax | mean | mean_Tmax_Yearly | rcp85 | mean |
| 3 | mean_Tmax_Yearly_rcp85_std... | Yearly | Tmax | mean | mean_Tmax_Yearly | rcp85 | stdev |
| 4 | mean_Tmean_Yearly_rcp45_me... | Yearly | Tmean | mean | mean_Tmean_Yearly | rcp45 | mean |
| 5 | mean_Tmean_Yearly_rcp45_st... | Yearly | Tmean | mean | mean_Tmean_Yearly | rcp45 | stdev |
| 6 | mean_Tmean_Yearly_rcp85_me... | Yearly | Tmean | mean | mean_Tmean_Yearly | rcp85 | mean |
| 7 | mean_Tmean_Yearly_rcp85_st... | Yearly | Tmean | mean | mean_Tmean_Yearly | rcp85 | stdev |
| 8 | p10_Tmean_Yearly_rcp45_mea... | Yearly | Tmean | p10 | p10_Tmean_Yearly | rcp45 | mean |
| 9 | p10_Tmean_Yearly_rcp45_std... | Yearly | Tmean | p10 | p10_Tmean_Yearly | rcp45 | stdev |
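To see how the regular expression in `meta()` splits a file name into its parts, the following sketch applies the same pattern to a single name. The file name `mean_Tmax_Yearly_rcp45_mean_v1.0.nc` is a hypothetical example constructed to match the pattern (the real names are truncated in the table), and `parse_name` is an illustrative helper, not part of the notebook.

```python
import re

# Same pattern as in meta(), split over two lines for readability
PATTERN = re.compile(
    r'^(mean|p10|p90)_(Tmean|Tmax|Tmin)_(Yearly|Winter|Summer)'
    r'_(rcp\d{2})_(mean|stdev)_v(\d+\.\d+)\.'
)


def parse_name(filename):
    """Split a file name into statistic, variable, period, scenario and version."""
    match = PATTERN.search(filename)
    if not match:
        raise ValueError("the given filename does not fit the expected naming scheme")
    statistic, variable, period, rcp, rcp_statistic, version = match.groups()
    return {
        "ds_statistic": statistic,
        "ds_variable": variable,
        "ds_period": period,
        "rcp": rcp,
        "rcp_statistic": rcp_statistic,
        "version": version,
    }


# Hypothetical file name, constructed to fit the naming scheme
print(parse_name("mean_Tmax_Yearly_rcp45_mean_v1.0.nc"))
```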


### 3.1 Short info on downloaded data structure

The diagram below shows the structure of the **sis-temperature-statistics** data. It is organized into four levels:

1. **Period**: Represents the time unit (year, summer, or winter).

2. **Variable**: Three variables:
    - **Average Temperature**  
    - **Minimum Temperature**  
    - **Maximum Temperature**

3. **Statistic**: For each variable, specific statistics are calculated:
    - **Time Average**  
    - **10th Percentile** (for Average Temperature Only)  
    - **90th Percentile** (for Average Temperature Only)

4. **Experiment**: Data is provided under two climate scenarios:
    - **rcp45**
    - **rcp85**
    - For each experiment, the ensemble mean and standard deviation:  
        - rcp45_mean, rcp45_stdev
        - rcp85_mean, rcp85_stdev

The diagram flows from **Period** to **Variable**, then to **Statistic**, and finally to **Experiment** outputs. It shows how temperature data is structured and analyzed.

<img src="./images/sis-temperature-statistics-data-structure_v2.jpg" width="550" style="display: block; margin: 0 auto; border: 1px solid #aeaeae">
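The four levels can be spelled out by enumerating every combination for one period. The sketch below assumes the naming parts seen in this notebook (`mean`, `p10`, `p90`, `Tmean`, `Tmax`, `Tmin`, `rcp45`, `rcp85`, `mean`, `stdev`); the helper `expected_combinations` is hypothetical.

```python
from itertools import product

# Statistics available per variable: percentiles exist only for the average temperature
STATISTICS = {
    "Tmean": ["mean", "p10", "p90"],
    "Tmax": ["mean"],
    "Tmin": ["mean"],
}
EXPERIMENTS = ["rcp45", "rcp85"]
ENSEMBLE_STATISTICS = ["mean", "stdev"]


def expected_combinations(period="Yearly"):
    """List every statistic/variable/experiment/ensemble combination for one period."""
    combos = []
    for variable, statistics in STATISTICS.items():
        for statistic, rcp, ensemble in product(statistics, EXPERIMENTS, ENSEMBLE_STATISTICS):
            combos.append(f"{statistic}_{variable}_{period}_{rcp}_{ensemble}")
    return combos


combos = expected_combinations()
print(len(combos))  # 20 combinations: (3 + 1 + 1) statistics x 2 experiments x 2 ensemble statistics
```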

In [10]:
import netCDF4 as nc

seen_variables = set()
for i, nc_file in enumerate(nc_files):
    variable_name = nc_file['variable_name']
    
    if variable_name in seen_variables:
        continue

    # Open the NetCDF file in read mode and list all variables it contains;
    # the context manager closes the file again afterwards
    with nc.Dataset(nc_file['path'], mode='r') as nc_dataset:
        variables_list = list(nc_dataset.variables.keys())
    print(f"{i+1:<2} {variable_name:<17}: Available variables: {variables_list}")
    
    # Add the variable name to the seen set
    seen_variables.add(variable_name)

1  mean_Tmax_Yearly : Available variables: ['mean_Tmax_Yearly', 'height', 'lat', 'lon', 'time']
5  mean_Tmean_Yearly: Available variables: ['mean_Tmean_Yearly', 'height', 'lat', 'lon', 'time']
9  mean_Tmin_Yearly : Available variables: ['mean_Tmin_Yearly', 'height', 'lat', 'lon', 'time']
13 p10_Tmean_Yearly : Available variables: ['p10_Tmean_Yearly', 'quantile', 'lat', 'lon', 'time']
17 p90_Tmean_Yearly : Available variables: ['p90_Tmean_Yearly', 'quantile', 'lat', 'lon', 'time']


In [11]:
seen_variables = set()
for i, nc_file in enumerate(nc_files):
    variable_name = nc_file['variable_name']
    
    if variable_name in seen_variables:
        continue
    
    nc_dataset = nc.Dataset(nc_file['path'], mode='r')
    variable_data = nc_dataset[variable_name]
    
    # Generate summary of the primary variable
    summary = {
        "Variable Name": variable_name,
        "Data Type": variable_data.dtype,
        "Shape": variable_data.shape,
        "Variable Info": f"{variable_data.dimensions}",
        "Units": getattr(variable_data, "units", "N/A"),
        "Long Name": getattr(variable_data, "long_name", "N/A"),
    }
    
    # Display dataset summary as a DataFrame for better visualization
    nc_summary = pd.DataFrame(list(summary.items()), columns=['Description', 'Remarks'])

    # Display the summary DataFrame
    print(f"{i+1}.")
    display(nc_summary)
    
    # Add the variable name to the seen set
    seen_variables.add(variable_name)
    if len(seen_variables)>=2:
        print("....")
        break

1.


|  | Description | Remarks |
|---|---|---|
| 0 | Variable Name | mean_Tmax_Yearly |
| 1 | Data Type | float32 |
| 2 | Shape | (100, 82, 95) |
| 3 | Variable Info | ('time', 'lat', 'lon') |
| 4 | Units | degrees C |
| 5 | Long Name | Ensemble members average o... |


5.


|  | Description | Remarks |
|---|---|---|
| 0 | Variable Name | mean_Tmean_Yearly |
| 1 | Data Type | float32 |
| 2 | Shape | (100, 82, 95) |
| 3 | Variable Info | ('time', 'lat', 'lon') |
| 4 | Units | degrees C |
| 5 | Long Name | Ensemble members average o... |


....


## 4. Export Dataset to CSV

In [12]:
# import numpy as np
# import netCDF4 as nc

# def netcdf_to_dataframe(
#     nc_file,
#     bounding_box=None):
#     """
#     Converts a netCDF file to a DataFrame, optionally filtering by a bounding box.

#     Parameters:
#         nc_file (dict): Dictionary with keys.
#         bounding_box (list): Bounding box as [lon_min, lat_min, lon_max, lat_max] (optional).

#     Returns:
#         pd.DataFrame: DataFrame with time, latitude, longitude, and the variable's values.
#     """
#     # Open the netCDF file
#     nc_dataset = nc.Dataset(nc_file['path'], 'r')
#     lon = nc_dataset['lon'][:]
#     lat = nc_dataset['lat'][:]
    
#     # Extract time variable and convert it to readable dates
#     time_var = nc_dataset.variables['time']
#     time_units = time_var.units
#     time_calendar = getattr(time_var, "calendar", "standard")
#     cftime = nc.num2date(time_var[:], units=time_units, calendar=time_calendar)

#     # Extract temperature/variable data
#     variable_data = nc_dataset.variables[nc_file['variable_name']]
    
#     # Filter by bounding box if provided
#     if bounding_box:
#         lon_min, lat_min, lon_max, lat_max = bounding_box
        
#         indices_lat = np.where((lat >= lat_min) & (lat <= lat_max))[0]
#         indices_lon = np.where((lon >= lon_min) & (lon <= lon_max))[0]
        
#         start_lat, end_lat = indices_lat[0], indices_lat[-1] + 1
#         start_lon, end_lon = indices_lon[0], indices_lon[-1] + 1
        
#         filtered_lat = lat[start_lat:end_lat]
#         filtered_lon = lon[start_lon:end_lon]
#         variable_data_subset = variable_data[:, start_lat:end_lat, start_lon:end_lon]
#     else:
#         filtered_lat = lat
#         filtered_lon = lon
#         variable_data_subset = variable_data        
            
#     # Create a column name for the variable
#     variable_column_name = f"{nc_file['ds_period']}_{nc_file['ds_variable']}_{nc_file['ds_statistic']}_{nc_file['rcp']}_{nc_file['rcp_statistic']}"
#     print(variable_column_name)
    
#     # Create rows for the DataFrame
#     rows = []
#     for t in range(variable_data_subset.shape[0]):
#         for i in range(variable_data_subset.shape[1]):
#             for j in range(variable_data_subset.shape[2]):
#                 if not np.ma.is_masked(variable_data_subset[t, i, j]):
#                     rows.append({
#                         'time': cftime[t],
#                         'latitude': filtered_lat[i],
#                         'longitude': filtered_lon[j],
#                         variable_column_name: variable_data_subset[t, i, j]
#                     })

#     # Create a DataFrame from the rows
#     df = pd.DataFrame(rows)
#     df['time'] = pd.to_datetime(df['time'].map(str))
#     df['latitude'] = pd.to_numeric(df['latitude'])
#     df['longitude'] = pd.to_numeric(df['longitude'])
#     df[variable_column_name] = pd.to_numeric(df[variable_column_name])
    
#     # Set the index to time, latitude, and longitude
#     return df.set_index(['time', 'latitude', 'longitude'])

In [13]:
import numpy as np
import pandas as pd
import netCDF4 as nc

def netcdf_to_dataframe(nc_file, bounding_box=None):
    """
    Converts a netCDF file to a DataFrame, optionally filtering by a bounding box.
    
    Parameters:
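        nc_file (dict): File metadata as returned by meta(); the keys 'path', 'variable_name',
            'ds_period', 'ds_variable', 'ds_statistic', 'rcp' and 'rcp_statistic' are used.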
        bounding_box (list): Bounding box as [lon_min, lat_min, lon_max, lat_max] (optional).
        
    Returns:
        pd.DataFrame: DataFrame with time, latitude, longitude, and the variable's values.
    """
    # Open the netCDF file
    with nc.Dataset(nc_file['path'], 'r') as nc_dataset:
        lon = nc_dataset['lon'][:]
        lat = nc_dataset['lat'][:]
        
        # Extract time and convert it to readable dates
        time_var = nc_dataset.variables['time']
        time_units = time_var.units
        time_calendar = getattr(time_var, "calendar", "standard")
        cftime = nc.num2date(time_var[:], units=time_units, calendar=time_calendar)

        # Extract variable data
        variable_data = nc_dataset.variables[nc_file['variable_name']]

        # Filter by bounding box if provided
        if bounding_box:
            lon_min, lat_min, lon_max, lat_max = bounding_box
            lat_mask = (lat >= lat_min) & (lat <= lat_max)
            lon_mask = (lon >= lon_min) & (lon <= lon_max)

            lat_indices = np.where(lat_mask)[0]
            lon_indices = np.where(lon_mask)[0]

            filtered_lat = lat[lat_indices]
            filtered_lon = lon[lon_indices]
            variable_data_subset = variable_data[:, lat_indices, :][:, :, lon_indices]
        else:
            filtered_lat = lat
            filtered_lon = lon
            variable_data_subset = variable_data[:]

    # Flatten the data using NumPy
    time_size, lat_size, lon_size = variable_data_subset.shape
    variable_column_name = f"{nc_file['ds_period']}_{nc_file['ds_variable']}_{nc_file['ds_statistic']}_{nc_file['rcp']}_{nc_file['rcp_statistic']}"
    
    # Replace masked values with NaN so that they can be dropped with the validity mask below
    masked_data = variable_data_subset.filled(np.nan)
    valid_mask = ~np.isnan(masked_data)

    # Generate time, latitude, longitude mesh grids
    time_grid, lat_grid, lon_grid = np.meshgrid(
        cftime, filtered_lat, filtered_lon, indexing='ij'
    )

    # Extract valid (non-NaN) data
    time_flat = time_grid[valid_mask]
    lat_flat = lat_grid[valid_mask]
    lon_flat = lon_grid[valid_mask]
    values_flat = masked_data[valid_mask]

    # Construct a DataFrame
    df = pd.DataFrame({
        'time': time_flat,
        'latitude': lat_flat,
        'longitude': lon_flat,
        variable_column_name: values_flat
    })

    # Convert time to datetime
    df['time'] = pd.to_datetime(df['time'].astype(str))

    # Set the index
    return df.set_index(['time', 'latitude', 'longitude'])
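The flattening step in `netcdf_to_dataframe` can be illustrated on a tiny synthetic array. The sketch below uses made-up coordinates and values (not data from the dataset) and shows how `np.meshgrid` with `indexing='ij'` produces coordinate grids of the same shape as the data, so that one boolean mask selects matching time, latitude, longitude and value entries.

```python
import numpy as np

# Synthetic coordinates and data with shape (time=2, lat=2, lon=3); values are illustrative
times = np.array(["1986-01-01", "1987-01-01"])
lats = np.array([47.6, 47.7])
lons = np.array([9.0, 9.1, 9.2])
values = np.ma.masked_array(np.arange(12, dtype="float32").reshape(2, 2, 3), mask=False)
values[0, 1, 2] = np.ma.masked  # pretend one grid cell has no data

# Masked entries become NaN and are excluded by the validity mask
filled = values.filled(np.nan)
valid = ~np.isnan(filled)

# Each grid has the same shape as the data, so the mask applies to all of them
time_grid, lat_grid, lon_grid = np.meshgrid(times, lats, lons, indexing="ij")

rows = list(zip(time_grid[valid], lat_grid[valid], lon_grid[valid], filled[valid]))
print(len(rows))  # 11 of the 12 cells remain
print(rows[0])
```

Compared with the triple loop of the earlier, commented-out version, this approach avoids Python-level iteration over every grid cell.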


### 4.1 Create DataFrame and Export as merged CSV file

In [14]:
from tqdm.notebook import tqdm

csv_filename = "sis-temperature-statistics.csv.zip"
csv_path = os.path.join(csv_folder, csv_filename)

if not os.path.isfile(csv_path):
    dataframes = [netcdf_to_dataframe(nc_file) for nc_file in tqdm(nc_files_sorted)]
    df_merged = pd.concat(dataframes, axis=1)
    df_merged.to_csv(csv_path, sep=',', encoding='utf8', compression='zip')
else:
    print(f"File already exists at {csv_path}. Skipping export.")
    df_merged = pd.read_csv(csv_path, parse_dates=['time']).set_index(['time', 'latitude', 'longitude'])
    
# Display DataFrame
df_merged

File already exists at .\data\sis-temperature-statistics\csv\sis-temperature-statistics.csv.zip. Skipping export.


| time | latitude | longitude | Yearly_Tmax_mean_rcp45_mean | Yearly_Tmax_mean_rcp45_stdev | Yearly_Tmax_mean_rcp85_mean | Yearly_Tmax_mean_rcp85_stdev | Yearly_Tmean_mean_rcp45_mean | Yearly_Tmean_mean_rcp45_stdev | Yearly_Tmean_mean_rcp85_mean | Yearly_Tmean_mean_rcp85_stdev | Yearly_Tmean_p10_rcp45_mean | Yearly_Tmean_p10_rcp45_stdev | Yearly_Tmean_p10_rcp85_mean | Yearly_Tmean_p10_rcp85_stdev | Yearly_Tmean_p90_rcp45_mean | Yearly_Tmean_p90_rcp45_stdev | Yearly_Tmean_p90_rcp85_mean | Yearly_Tmean_p90_rcp85_stdev | Yearly_Tmin_mean_rcp45_mean | Yearly_Tmin_mean_rcp45_stdev | Yearly_Tmin_mean_rcp85_mean | Yearly_Tmin_mean_rcp85_stdev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1986-01-01 | 47.1 | 5.7 | 15.120951 | 0.152071 | 15.120951 | 0.152222 | 10.768780 | 0.235325 | 10.768780 | 0.224792 | 1.005748 | 0.263874 | 1.005748 | 0.239380 | 20.235330 | 0.338345 | 20.235330 | 0.318809 | 6.603446 | 0.389397 | 6.603446 | 0.370019 |
| 1986-01-01 | 47.1 | 5.8 | 14.910078 | 0.113431 | 14.910078 | 0.102336 | 10.568228 | 0.225237 | 10.568228 | 0.222283 | 0.820628 | 0.245507 | 0.820628 | 0.237274 | 20.066036 | 0.328781 | 20.066036 | 0.317933 | 6.417229 | 0.406068 | 6.417229 | 0.398072 |
| 1986-01-01 | 47.1 | 5.9 | 14.444681 | 0.140488 | 14.444681 | 0.127091 | 10.202229 | 0.158560 | 10.202229 | 0.151266 | 0.552755 | 0.209239 | 0.552755 | 0.202924 | 19.661780 | 0.260792 | 19.661780 | 0.254812 | 6.144354 | 0.307193 | 6.144354 | 0.292839 |
| 1986-01-01 | 47.1 | 6.0 | 13.965413 | 0.213235 | 13.965413 | 0.201410 | 9.743344 | 0.215348 | 9.743344 | 0.205612 | 0.187260 | 0.269148 | 0.187260 | 0.255193 | 19.152573 | 0.253804 | 19.152573 | 0.236893 | 5.692410 | 0.331262 | 5.692410 | 0.312642 |
| 1986-01-01 | 47.1 | 6.1 | 13.423471 | 0.224684 | 13.423471 | 0.208183 | 9.231763 | 0.213139 | 9.231763 | 0.203974 | -0.302477 | 0.232850 | -0.302477 | 0.226820 | 18.674124 | 0.266804 | 18.674124 | 0.250322 | 5.201129 | 0.382862 | 5.201129 | 0.366070 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2085-01-01 | 55.2 | 14.7 | 13.948444 | 0.474240 | 15.219458 | 0.366860 | 9.910995 | 0.545287 | 11.214604 | 0.442595 | 0.218722 | 0.899802 | 1.683859 | 0.812074 | 19.610245 | 0.528781 | 20.835072 | 0.419922 | 6.008667 | 0.877768 | 7.353578 | 0.790407 |
| 2085-01-01 | 55.2 | 14.8 | 14.119750 | 0.468124 | 15.389576 | 0.352643 | 9.740071 | 0.535297 | 11.057458 | 0.439172 | -0.216917 | 0.960186 | 1.263995 | 0.879039 | 19.664570 | 0.517702 | 20.890070 | 0.417657 | 5.528337 | 0.933898 | 6.903546 | 0.855117 |
| 2085-01-01 | 55.2 | 14.9 | 14.119610 | 0.486920 | 15.394135 | 0.372972 | 9.598280 | 0.496425 | 10.927850 | 0.404170 | -0.461368 | 0.863642 | 1.030694 | 0.782519 | 19.614447 | 0.522562 | 20.861654 | 0.449814 | 5.252343 | 0.803925 | 6.648636 | 0.732422 |
| 2085-01-01 | 55.2 | 15.0 | 14.059088 | 0.462216 | 15.333551 | 0.330266 | 9.665852 | 0.493948 | 10.989685 | 0.392008 | -0.301334 | 0.768905 | 1.186376 | 0.693664 | 19.650105 | 0.538851 | 20.899890 | 0.461169 | 5.443778 | 0.689393 | 6.826317 | 0.608994 |


### 4.2 Filter by Area, Create DataFrame and Export as merged CSV file

In [15]:
# Define the bounding box used to filter the data before export.
# These coordinates were obtained using the BBox Extractor tool:
# https://str-ucture.github.io/bbox-extractor/

# Bounding box for the Konstanz region (WGS84 projection):
bbox_wgs84_konstanz = [47.9, 8.9, 47.6, 9.3]  # Format: [North, West, South, East]
bbox_wgs84_konstanz_standard = [9.0, 47.6, 9.3, 47.8]  # Standard format: [West, South, East, North]

csv_filename = "sis-temperature-statistics-subset.csv.zip"
csv_path = os.path.join(csv_folder, csv_filename)

if not os.path.isfile(csv_path):
    dataframes = [netcdf_to_dataframe(nc_file, bounding_box=bbox_wgs84_konstanz_standard) for nc_file in tqdm(nc_files_sorted)]
    df_merged = pd.concat(dataframes, axis=1)
    df_merged.to_csv(csv_path, sep=',', encoding='utf8', compression='zip')
else:
    print(f"File already exists at {csv_path}. Skipping export.")
    df_merged = pd.read_csv(csv_path, parse_dates=['time']).set_index(['time', 'latitude', 'longitude'])
    
# Display DataFrame
df_merged

File already exists at .\data\sis-temperature-statistics\csv\sis-temperature-statistics-subset.csv.zip. Skipping export.


| time | latitude | longitude | Yearly_Tmax_mean_rcp45_mean | Yearly_Tmax_mean_rcp45_stdev | Yearly_Tmax_mean_rcp85_mean | Yearly_Tmax_mean_rcp85_stdev | Yearly_Tmean_mean_rcp45_mean | Yearly_Tmean_mean_rcp45_stdev | Yearly_Tmean_mean_rcp85_mean | Yearly_Tmean_mean_rcp85_stdev | Yearly_Tmean_p10_rcp45_mean | Yearly_Tmean_p10_rcp45_stdev | Yearly_Tmean_p10_rcp85_mean | Yearly_Tmean_p10_rcp85_stdev | Yearly_Tmean_p90_rcp45_mean | Yearly_Tmean_p90_rcp45_stdev | Yearly_Tmean_p90_rcp85_mean | Yearly_Tmean_p90_rcp85_stdev | Yearly_Tmin_mean_rcp45_mean | Yearly_Tmin_mean_rcp45_stdev | Yearly_Tmin_mean_rcp85_mean | Yearly_Tmin_mean_rcp85_stdev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1986-01-01 | 47.6 | 9.0 | 14.331696 | 0.367926 | 14.331696 | 0.327256 | 9.969560 | 0.197087 | 9.969560 | 0.170345 | -0.129010 | 0.353363 | -0.129010 | 0.338264 | 20.145437 | 0.313090 | 20.145437 | 0.266228 | 5.737940 | 0.212152 | 5.737940 | 0.206423 |
| 1986-01-01 | 47.6 | 9.1 | 14.445174 | 0.360613 | 14.445174 | 0.318268 | 10.089812 | 0.194873 | 10.089812 | 0.163032 | -0.042739 | 0.318802 | -0.042739 | 0.302732 | 20.317759 | 0.323794 | 20.317759 | 0.271609 | 5.856429 | 0.201906 | 5.856429 | 0.192904 |
| 1986-01-01 | 47.6 | 9.2 | 14.473366 | 0.292059 | 14.473366 | 0.252910 | 10.179042 | 0.204742 | 10.179042 | 0.181399 | 0.114491 | 0.323799 | 0.114491 | 0.306196 | 20.345865 | 0.266876 | 20.345865 | 0.223692 | 5.982708 | 0.241907 | 5.982708 | 0.237161 |
| 1986-01-01 | 47.7 | 9.0 | 14.462224 | 0.248100 | 14.462224 | 0.228836 | 10.037110 | 0.132729 | 10.037110 | 0.132351 | -0.132800 | 0.351744 | -0.132800 | 0.334977 | 20.276878 | 0.237635 | 20.276878 | 0.206161 | 5.749771 | 0.172277 | 5.749771 | 0.179768 |
| 1986-01-01 | 47.7 | 9.1 | 14.613771 | 0.275597 | 14.613771 | 0.249408 | 10.159163 | 0.136263 | 10.159163 | 0.123785 | -0.127978 | 0.306949 | -0.127978 | 0.289142 | 20.515713 | 0.287676 | 20.515713 | 0.244613 | 5.848050 | 0.167173 | 5.848050 | 0.164890 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2085-01-01 | 47.7 | 9.1 | 16.738144 | 0.448282 | 18.435940 | 0.585367 | 12.292133 | 0.327352 | 13.986012 | 0.431585 | 2.179368 | 0.405776 | 3.832271 | 0.468495 | 22.783861 | 0.618649 | 24.897171 | 1.033617 | 8.019099 | 0.239511 | 9.714018 | 0.282824 |
| 2085-01-01 | 47.7 | 9.2 | 16.790102 | 0.435207 | 18.491148 | 0.565896 | 12.381050 | 0.327880 | 14.077535 | 0.429868 | 2.243479 | 0.388891 | 3.911699 | 0.444539 | 22.888561 | 0.642582 | 24.996714 | 1.047297 | 8.143120 | 0.246540 | 9.841093 | 0.286129 |
| 2085-01-01 | 47.8 | 9.0 | 15.870772 | 0.412527 | 17.584803 | 0.535101 | 11.779332 | 0.428921 | 13.486767 | 0.472268 | 1.970922 | 0.405307 | 3.617135 | 0.429932 | 21.863422 | 0.616352 | 23.961403 | 0.972253 | 7.831337 | 0.521602 | 9.535279 | 0.512175 |
| 2085-01-01 | 47.8 | 9.1 | 16.073100 | 0.512709 | 17.783950 | 0.576360 | 11.782446 | 0.395966 | 13.484940 | 0.431787 | 1.851335 | 0.424747 | 3.503872 | 0.483110 | 22.078798 | 0.567791 | 24.174349 | 0.896124 | 7.654391 | 0.361682 | 9.353566 | 0.355475 |
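Once the merged table is available, the scenarios can be compared column by column. As a small sketch, the code below rebuilds two rows of the subset output (grid cell 47.7°N, 9.1°E, years 1986 and 2085, annual mean temperature only) by hand and computes the change between them for both scenarios. In the notebook itself the same selection could be applied to `df_merged`; that usage is an assumption, not something shown in the original cells.

```python
import pandas as pd

# Two rows copied from the subset output: grid cell 47.7 N, 9.1 E
rows = [
    ("1986-01-01", 47.7, 9.1, 10.159163, 10.159163),
    ("2085-01-01", 47.7, 9.1, 12.292133, 13.986012),
]
columns = [
    "time", "latitude", "longitude",
    "Yearly_Tmean_mean_rcp45_mean", "Yearly_Tmean_mean_rcp85_mean",
]
df = pd.DataFrame(rows, columns=columns)
df["time"] = pd.to_datetime(df["time"])
df = df.set_index(["time", "latitude", "longitude"])

# Select one grid cell; the result is indexed by time only
cell = df.xs((47.7, 9.1), level=["latitude", "longitude"])

# Change of the smoothed annual mean temperature between the first and last time step
change = cell.iloc[-1] - cell.iloc[0]
print(change.round(2))
```

For this grid cell the smoothed annual mean temperature rises by about 2.13 °C under rcp45 and about 3.83 °C under rcp85 between the two time steps.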


## 5. Export Dataset to GeoTIFF

### 5.1 Function to export the Dataset as GeoTIFF File(s)

In [16]:
import numpy as np
from rasterio.transform import from_origin
import rasterio