# SIS Health Vector

Suitability for the occurrence and seasonal activity of the tiger mosquito (Aedes albopictus) in Europe

This script processes the **SIS Health Vector** dataset from the Copernicus Climate Data Store. The dataset describes how suitable the environmental conditions are for the tiger mosquito and how long its seasonal activity lasts. It was developed as part of the C3S European Health Service. The information is available for several future time periods and climate change scenarios.

**Dataset information**:

* Source: [SIS Health Vector](https://cds.climate.copernicus.eu/datasets/sis-health-vector?tab=overview)
* Author: T. Tewes (Stadt Konstanz) 
* Notebook-Version: 1.1 (Updated: December 02, 2024)

## 1. Specifying the paths and working directories

In [1]:
import os

''' ---- Specify the directories here ---- '''
download_folder = r".\data\sis-health-vector\download"
working_folder = r".\data\sis-health-vector\working"
geotiff_folder = r".\data\sis-health-vector\geotiff"
csv_folder = r".\data\sis-health-vector\csv"
output_folder = r".\data\sis-health-vector\output"
''' ----- End of user input ---- '''

os.makedirs(download_folder, exist_ok=True)
os.makedirs(working_folder, exist_ok=True)
os.makedirs(geotiff_folder, exist_ok=True)
os.makedirs(csv_folder, exist_ok=True)
os.makedirs(output_folder, exist_ok=True)

## 2. Download and Extract Dataset

### 2.1 API Authentication

In [2]:
import cdsapi

def main():
    api_key = "YOUR_CDS_API_KEY"  # Placeholder: insert your personal CDS API key; never commit a real key
    api_url = "https://cds.climate.copernicus.eu/api"
    client = cdsapi.Client(url=api_url, key=api_key)
    return client
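
Rather than writing the key into the notebook, the credentials can be read from the environment. The following is a minimal sketch, assuming the key is stored in the `CDSAPI_KEY` environment variable (and optionally the URL in `CDSAPI_URL`); the helper name `load_cds_credentials` is hypothetical and not part of `cdsapi`.

```python
import os

# Default endpoint, taken from the cell above
DEFAULT_CDS_URL = "https://cds.climate.copernicus.eu/api"

def load_cds_credentials(env=os.environ):
    """Return (url, key) read from a mapping of environment variables.

    Hypothetical helper: it only reads CDSAPI_URL and CDSAPI_KEY and
    raises if no key is configured.
    """
    url = env.get("CDSAPI_URL", DEFAULT_CDS_URL)
    key = env.get("CDSAPI_KEY")
    if not key:
        raise RuntimeError("CDSAPI_KEY is not set; export it before running the notebook.")
    return url, key

# Example with an explicit mapping instead of the real environment
demo_url, demo_key = load_cds_credentials({"CDSAPI_KEY": "demo-key"})

# With real credentials, the result can be passed to the client as in the cell above:
# client = cdsapi.Client(url=demo_url, key=demo_key)
```

This keeps the secret out of version control while leaving the `cdsapi.Client(url=..., key=...)` call unchanged.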

### 2.2 Request Definition and Download

In [3]:
# Define additional request fields to ensure the request stays within the file size limit.
# These coordinates were obtained using the BBox Extractor tool:
# https://str-ucture.github.io/bbox-extractor/

bbox_wgs84_deutschland = [56.0, 5.8, 47.2, 15.0]
bbox_wgs84_konstanz = [47.9, 8.9, 47.6, 9.3]
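
The two lists above follow the order the CDS `area` field expects: north, west, south, east. Because a swapped pair is easy to overlook, a small check can be run before the request is sent. This is a sketch only; `validate_area` is a hypothetical helper, not a CDS or `cdsapi` function.

```python
def validate_area(bbox):
    """Check a [north, west, south, east] bounding box in WGS84 degrees.

    Hypothetical helper: returns the box as a labelled dict or raises
    ValueError when the values are out of range or in the wrong order.
    """
    if len(bbox) != 4:
        raise ValueError("A bounding box needs exactly four values.")
    north, west, south, east = bbox
    if not (-90 <= south < north <= 90):
        raise ValueError(f"Invalid latitudes: south={south}, north={north}")
    if not (-180 <= west < east <= 180):
        raise ValueError(f"Invalid longitudes: west={west}, east={east}")
    return {"north": north, "west": west, "south": south, "east": east}

# The two boxes defined in the cell above
area_deutschland = validate_area([56.0, 5.8, 47.2, 15.0])
area_konstanz = validate_area([47.9, 8.9, 47.6, 9.3])
```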

In [4]:
dataset = "sis-health-vector"
request = {
    "variable": [
        "suitability",
        "season_length"
    ],
    "experiment": [
        "rcp4_5",
        "rcp8_5"
    ],
    "ensemble_statistic": [
        "ensemble_members_average",
        "ensemble_members_standard_deviation"
    ],
    "area": bbox_wgs84_deutschland
}

In [5]:
# Run this cell to download the dataset:

def main_retrieve():
    dataset_filename = f"{dataset}.zip"
    dataset_filepath = os.path.join(download_folder, dataset_filename)

    # Download the dataset only if the dataset has not been downloaded before
    if not os.path.isfile(dataset_filepath):
        # Download the dataset with the defined request parameters
        client.retrieve(dataset, request, dataset_filepath)
    else:
        print("Dataset already downloaded.")

if __name__ == "__main__":
    client = main()
    main_retrieve()

Dataset already downloaded.


### 2.3 Extract the Zip folder

In [6]:
import zipfile

dataset_filename = f"{dataset}.zip"
dataset_filepath = os.path.join(download_folder, dataset_filename)
extract_folder = working_folder

# Extract the zip file
try:
    os.makedirs(extract_folder, exist_ok=True)
    
    if not os.listdir(extract_folder):
        with zipfile.ZipFile(dataset_filepath, 'r') as zip_ref:
            zip_ref.extractall(extract_folder)
            print(f"Successfully extracted files to: {extract_folder}")
    else:
        print("Folder is not empty. Skipping extraction.")
except FileNotFoundError:
    print(f"Error: The file {dataset_filepath} was not found.")
except zipfile.BadZipFile:
    print(f"Error: The file {dataset_filepath} is not a valid zip file.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Folder is not empty. Skipping extraction.


## 3. Data Processing and Visualization

### 3.1 Recording of available RCP scenarios and statistics

In [7]:
import re
import pandas as pd

def meta(filename):
    match = re.search(r'mosquito_(suit|seas)_(rcp\d{2})_(\w+)_v(\d+\.\d+)\.', filename) # For subset area
    if not match:
        raise ValueError("the given filename does not fit the expected naming scheme")
        
    return dict(
        filename = filename,
        path = os.path.join(working_folder, filename),
        variable = match.group(1),
        rcp = match.group(2),
        statistic = match.group(3),
        version = match.group(4),
    )

nc_files = [meta(f) for f in os.listdir(working_folder) if f.endswith('.nc')]

df_nc_files = pd.DataFrame.from_dict(nc_files)
df_nc_files

| | filename | path | variable | rcp | statistic | version |
|---|---|---|---|---|---|---|
| 0 | mosquito_seas_rcp45_mean_v1.0.area-subset.56.0... | .\data\sis-health-vector\working\mosquito_seas... | seas | rcp45 | mean | 1.0 |
| 1 | mosquito_seas_rcp45_stdev_v1.0.area-subset.56.... | .\data\sis-health-vector\working\mosquito_seas... | seas | rcp45 | stdev | 1.0 |
| 2 | mosquito_seas_rcp85_mean_v1.0.area-subset.56.0... | .\data\sis-health-vector\working\mosquito_seas... | seas | rcp85 | mean | 1.0 |
| 3 | mosquito_seas_rcp85_stdev_v1.0.area-subset.56.... | .\data\sis-health-vector\working\mosquito_seas... | seas | rcp85 | stdev | 1.0 |
| 4 | mosquito_suit_rcp45_mean_v1.0.area-subset.56.0... | .\data\sis-health-vector\working\mosquito_suit... | suit | rcp45 | mean | 1.0 |
| 5 | mosquito_suit_rcp45_stdev_v1.0.area-subset.56.... | .\data\sis-health-vector\working\mosquito_suit... | suit | rcp45 | stdev | 1.0 |
| 6 | mosquito_suit_rcp85_mean_v1.0.area-subset.56.0... | .\data\sis-health-vector\working\mosquito_suit... | suit | rcp85 | mean | 1.0 |
| 7 | mosquito_suit_rcp85_stdev_v1.0.area-subset.56.... | .\data\sis-health-vector\working\mosquito_suit... | suit | rcp85 | stdev | 1.0 |
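
To see how the naming scheme maps to the columns listed above, the regular expression from `meta` can be tried on a single name. The sketch below reuses the same pattern; the filename `mosquito_suit_rcp85_stdev_v1.0.nc` is a shortened, hypothetical example, since the real names carry an additional area-subset suffix.

```python
import re

# Same pattern as in meta(): variable, RCP scenario, statistic, version
FILENAME_PATTERN = re.compile(r"mosquito_(suit|seas)_(rcp\d{2})_(\w+)_v(\d+\.\d+)\.")

def parse_name(filename):
    """Split a dataset filename into its four naming components."""
    match = FILENAME_PATTERN.search(filename)
    if not match:
        raise ValueError(f"Filename does not fit the expected naming scheme: {filename}")
    variable, rcp, statistic, version = match.groups()
    return {"variable": variable, "rcp": rcp, "statistic": statistic, "version": version}

# Hypothetical, shortened filename used only for illustration
parsed = parse_name("mosquito_suit_rcp85_stdev_v1.0.nc")
print(parsed)
```

The greedy `(\w+)` group first swallows `stdev_v1` and then backtracks until `_v` and the version number can match, which is why the statistic is captured cleanly.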


In [8]:
import os
import pandas as pd
from netCDF4 import Dataset, num2date
from tqdm import tqdm

def netcdf_to_dataframe(nc_file):
    
    dataset = Dataset(nc_file['path'], 'r')
    variable = nc_file['variable']

    if variable == 'suit':
        variable_name = 'suitability'
    elif variable == 'seas':
        variable_name = 'season_length'
    else:
        raise ValueError(f"Unexpected variable: {variable}")
    
    if variable_name in dataset.variables and 'time' in dataset.variables:
        # Read the gridded values (suitability or season length) and the coordinate axes
        values = dataset.variables[variable_name][:]
        time = dataset.variables['time'][:]
        lat = dataset.variables['lat'][:]
        lon = dataset.variables['lon'][:]
        
        # Convert the numeric time axis into date objects
        time_units = dataset.variables['time'].units
        time_calendar = dataset.variables['time'].calendar if hasattr(dataset.variables['time'], 'calendar') else 'standard'
        time = num2date(time, units=time_units, calendar=time_calendar)
        
        variable_column_name = f"{variable}-{nc_file['rcp']}-{nc_file['statistic']}"
        
        # Build one row per (time, latitude, longitude) grid cell
        rows = []
        for t in range(values.shape[0]):
            for i in range(values.shape[1]):
                for j in range(values.shape[2]):
                    rows.append({
                        'time': time[t],
                        'latitude': lat[i],
                        'longitude': lon[j],
                        variable_column_name: values[t, i, j]
                    })
        
        df = pd.DataFrame(rows)
        df['time'] = pd.to_datetime(df['time'].map(str))
        df['latitude'] = pd.to_numeric(df['latitude'])
        df['longitude'] = pd.to_numeric(df['longitude'])
        # df[variable_column_name] = pd.to_numeric(df[variable_column_name])
        
        # Set the index to time, latitude and longitude
        return df.set_index(['time', 'latitude', 'longitude'])

    else:
        # Resolve the path from the nc_file dictionary separately
        path = nc_file['path']
        raise ValueError(f"Variables not found in {path}")

dataframes = [netcdf_to_dataframe(nc_file) for nc_file in tqdm(nc_files)]

# Combine all data into a single table
df = pd.concat(dataframes, axis=1)
df

# Note: exporting and compiling this cell takes more than 30 s and fails in Sphinx;
# it runs without error locally.

100%|██████████| 8/8 [01:09<00:00,  8.72s/it]


| time | latitude | longitude | seas-rcp45-mean | seas-rcp45-stdev | seas-rcp85-mean | seas-rcp85-stdev | suit-rcp45-mean | suit-rcp45-stdev | suit-rcp85-mean | suit-rcp85-stdev |
|---|---|---|---|---|---|---|---|---|---|---|
| 1986-01-01 | 47.2 | 5.8 | 136.145752 | 6.466001 | 136.145752 | 6.855072 | 81.692055 | 1.9252 | 81.692055 | 2.159897 |
| 1986-01-01 | 47.2 | 5.9 | 133.019821 | 6.768153 | 133.019821 | 6.986507 | 80.040009 | 2.139028 | 80.040009 | 2.273399 |
| 1986-01-01 | 47.2 | 6.0 | 127.814919 | 10.154747 | 127.814919 | 8.994880 | 75.537628 | 2.213439 | 75.537628 | 2.110248 |
| 1986-01-01 | 47.2 | 6.1 | 118.103043 | 11.235444 | 118.103043 | 9.275740 | 68.944649 | 2.641829 | 68.944649 | 2.369284 |
| 1986-01-01 | 47.2 | 6.2 | 107.021515 | 12.717433 | 107.021515 | 10.117734 | 62.545948 | 3.177054 | 62.545948 | 2.793351 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2085-01-01 | 56.0 | 14.6 | 125.408073 | 15.855171 | 145.052460 | 12.160140 | 67.942757 | 8.457479 | 85.176582 | 5.36737 |
| 2085-01-01 | 56.0 | 14.7 | 78.901611 | 8.216899 | 90.656075 | 5.875802 | -- | -- | -- | -- |
| 2085-01-01 | 56.0 | 14.8 | 75.737267 | 6.302417 | 83.190483 | 5.433066 | -- | -- | -- | -- |
| 2085-01-01 | 56.0 | 14.9 | 72.758110 | 6.329628 | 80.011604 | 5.470108 | -- | -- | -- | -- |
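
The conversion above visits every grid cell in Python, which explains the runtime of roughly nine seconds per file. A vectorized variant is sketched below as one possible alternative, not as the notebook's method. It assumes the values are already loaded as a `(time, lat, lon)` array and that the time axis has been converted to pandas timestamps. The function name `grid_to_dataframe` and the small demo arrays are hypothetical. Unlike the loop, it stores masked cells as `NaN` rather than `--`.

```python
import numpy as np
import pandas as pd

def grid_to_dataframe(values, time, lat, lon, column_name):
    """Flatten a (time, lat, lon) array into a DataFrame indexed by those axes."""
    # Masked cells (no data) are replaced by NaN
    filled = np.ma.filled(np.ma.asarray(values, dtype=float), np.nan)
    # from_product varies the last level fastest, matching C-order ravel()
    index = pd.MultiIndex.from_product(
        [time, lat, lon], names=["time", "latitude", "longitude"]
    )
    return pd.DataFrame({column_name: filled.ravel()}, index=index)

# Synthetic 1 x 2 x 2 grid with one missing cell (values are made up)
demo_values = np.ma.masked_invalid(np.array([[[1.0, 2.0], [3.0, np.nan]]]))
demo_df = grid_to_dataframe(
    demo_values,
    time=pd.to_datetime(["1986-01-01"]),
    lat=[47.2, 47.3],
    lon=[5.8, 5.9],
    column_name="suit-rcp45-mean",
)
print(demo_df)
```

Because no Python-level loop runs per cell, the cost is dominated by a single array copy and the index construction.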


In [9]:
# csv_path = os.path.join(csv_folder, 'sis-health-vector.csv.zip')


# if not os.path.isfile(csv_path):
#     df.to_csv(csv_path, sep=',', encoding='utf8', compression='zip')
#     print(f"Summary data exported to {csv_path}")
# else:
#     print(f"{csv_path} file already exists. Skipping export.")
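
Before exporting, the combined table can also be narrowed to a smaller region such as the Konstanz bounding box from section 2.2. The sketch below filters on the `latitude` and `longitude` index levels. `subset_bbox` is a hypothetical helper, and the demo table is synthetic. With the real data, coordinates stored as 32-bit floats may differ slightly from the rounded values shown, so cells exactly on the box edge could fall outside.

```python
import pandas as pd

# CDS "area" order: [north, west, south, east]
bbox_wgs84_konstanz = [47.9, 8.9, 47.6, 9.3]

def subset_bbox(frame, bbox):
    """Return the rows whose latitude/longitude index levels lie inside bbox."""
    north, west, south, east = bbox
    lat = frame.index.get_level_values("latitude")
    lon = frame.index.get_level_values("longitude")
    inside = (lat >= south) & (lat <= north) & (lon >= west) & (lon <= east)
    return frame[inside]

# Synthetic stand-in for the combined table (values are made up)
demo_index = pd.MultiIndex.from_product(
    [pd.to_datetime(["1986-01-01"]), [47.5, 47.7, 48.0], [8.8, 9.0, 9.4]],
    names=["time", "latitude", "longitude"],
)
demo_frame = pd.DataFrame({"suit-rcp45-mean": list(range(9))}, index=demo_index)
demo_konstanz = subset_bbox(demo_frame, bbox_wgs84_konstanz)
print(demo_konstanz)
```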
