# Exploring the effects of changing agroclimatological conditions on potential occurrence of major winter wheat diseases: A spatio-temporal analysis for Germany from 1960 to today

## Step 1. Exploring the data

From the __[DWD website](https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/)__ we can access the following data:

1. air temperature
2. cloud types
3. cloudiness
4. dew point
5. extreme wind
6. moisture
7. precipitation
8. pressure
9. soil temperature
10. solar
11. sun
12. visibility
13. weather phenomena
14. wind
15. wind synop

In [1]:
import pandas as pd
import re
import requests
from requests_html import HTMLSession
from typing import List
from functions import get_date, hide_toggle
import random

In [13]:
def get_links(parameters: List[str], time: str = "hourly") -> dict:
    """Collect the links found in each parameter's 'historical' directory listing."""
    base_url = 'https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/'
    # Start from empty sets so that a failed request still leaves an iterable entry
    dwd_links = {key: set() for key in parameters}
    session = HTMLSession()  # one session is reused for all requests
    for parameter in parameters:
        url = base_url + str(time) + '/' + parameter + '/historical/'
        try:
            response = session.get(url)
            dwd_links[parameter] = response.html.absolute_links
        except requests.exceptions.RequestException as e:
            print(e)
    return dwd_links

def count_datapoints(dwd_links: dict, parameter: str, start_year: int, end_year: int) -> int:
    """Count the links whose interval spans start_year through end_year."""
    count = 0
    for link in dwd_links[parameter]:
        try:
            start_interval = int(get_date(link)[0])
            end_interval = int(get_date(link)[1])
        except Exception:
            # Skip links that carry no parsable date interval
            continue
        if start_interval <= start_year and end_interval >= end_year:
            count += 1
    return count
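
`count_datapoints` depends on `get_date`, which is imported from the local `functions` module and is not shown in this notebook. The following is only a minimal sketch of what such a helper might look like. It assumes that each link ends in a file name carrying a start and an end date as two `YYYYMMDD` fields, as in the illustrative name used below; the name `get_date_sketch` and the regular expression are hypothetical and not taken from the original module.

```python
import re
from typing import Optional, Tuple

# Assumed layout: ..._<station id>_<YYYYMMDD start>_<YYYYMMDD end>_hist.zip
_INTERVAL = re.compile(r"_(\d{4})\d{4}_(\d{4})\d{4}_")


def get_date_sketch(link: str) -> Optional[Tuple[str, str]]:
    """Return (start_year, end_year) as strings, or None if no interval is found."""
    match = _INTERVAL.search(link)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Illustrative file name; the five-digit station ID is skipped because
# the pattern requires eight digits between the underscores.
example = "stundenwerte_TU_00003_19500401_20110331_hist.zip"
print(get_date_sketch(example))
```

Returning `None` for links without an interval fits the way the helper is called above: indexing `None` raises an exception, so such links are skipped by the `try`/`except` block.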

In [14]:

def show_available_data(dwd_links: dict, parameters: List[str]):
    decades = range(1950, 2020, 10)
    data_balance = pd.DataFrame(columns=parameters, index=[str(i) + "'s - present" for i in decades])
    for parameter in parameters:
        for i in decades:
            # .loc avoids chained indexing, which may assign to a temporary copy
            data_balance.loc[str(i) + "'s - present", parameter] = count_datapoints(dwd_links, parameter, i, 2020)
    print(data_balance)

Of these parameters, the most relevant for disease models are air temperature, dew point, moisture and precipitation. The following table shows, for each parameter and each period, the number of stations whose hourly records span the whole period from the start of the given decade to 2020.

In [16]:
parameters =["air_temperature","dew_point", "moisture", "precipitation"]
dwd_links = get_links(parameters)
show_available_data(dwd_links,parameters)

                 air_temperature dew_point moisture precipitation
1950's - present              29        48       48             0
1960's - present              67        57       57             0
1970's - present              75        59       59             0
1980's - present              97       118      118             0
1990's - present             129       149      149             0
2000's - present             155       185      185           144
2010's - present             481       481      481           927
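
Each count in the table comes from the same test: a station is counted for a period if its record starts in or before the first year of that decade and ends in 2020 or later. The sketch below illustrates this criterion on three hypothetical stations; the station names and intervals are invented for illustration and are not DWD data.

```python
def covers(start: int, end: int, start_year: int, end_year: int) -> bool:
    """True if the record [start, end] spans the requested period."""
    return start <= start_year and end >= end_year


# Hypothetical stations: name -> (first year, last year) of the record
stations = {"A": (1948, 2021), "B": (1965, 2021), "C": (1950, 2011)}

counts = {
    decade: sum(covers(s, e, decade, 2020) for s, e in stations.values())
    for decade in range(1950, 2020, 10)
}
print(counts)
# {1950: 1, 1960: 1, 1970: 2, 1980: 2, 1990: 2, 2000: 2, 2010: 2}
```

Station C never counts because its record ends before 2020, and station B only counts from the 1970s row onward. Since a station that qualifies for an earlier decade also qualifies for every later one, the counts in each column can only grow from top to bottom.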


In [30]:
def ids_datapoints(dwd_links: dict, parameter: str, start_year: int, end_year: int) -> List[str]:
    """Return the station IDs of the links whose interval spans start_year through end_year."""
    station_ids = []
    for link in dwd_links[parameter]:
        try:
            start_interval = int(get_date(link)[0])
            end_interval = int(get_date(link)[1])
            if start_interval <= start_year and end_interval >= end_year:
                # The station ID is the five-digit field between underscores
                station_ids.append(re.findall(r"_\d{5}_", str(link))[0])
        except Exception:
            # Skip links without a parsable interval or station ID
            pass
    return station_ids


def common_stations(parameter1: str, parameter2: str):
    # Relies on the module-level dwd_links built by get_links
    print(f"{parameter1} and {parameter2} have the following number of stations that measured in:")
    for i in range(1950, 2020, 10):
        ids1 = set(ids_datapoints(dwd_links, parameter1, i, 2020))
        ids2 = set(ids_datapoints(dwd_links, parameter2, i, 2020))
        print(str(i) + " - present" + ": " + str(len(ids1 & ids2)))


In [31]:
common_stations("dew_point", "air_temperature")

dew_point and air_temperature have the following number of stations that measured in:
1950 - present: 24
1960 - present: 46
1970 - present: 52
1980 - present: 87
1990 - present: 116
2000 - present: 143
2010 - present: 480
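
The comparison above covers two parameters at a time. If more than two parameters are needed by a disease model, the same set intersection can be extended to any number of them. The following is a minimal sketch under that assumption; `common_station_ids` and the station IDs are hypothetical and only mimic the `_NNNNN_` format extracted by the regular expression above.

```python
from typing import Dict, Iterable, Set


def common_station_ids(ids_by_parameter: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the station IDs that occur under every parameter."""
    sets = [set(ids) for ids in ids_by_parameter.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical station IDs per parameter
ids_by_parameter = {
    "air_temperature": ["_00003_", "_00044_", "_00071_"],
    "dew_point": ["_00044_", "_00071_", "_00073_"],
    "moisture": ["_00071_", "_00044_"],
}
shared = common_station_ids(ids_by_parameter)
print(sorted(shared))
# ['_00044_', '_00071_']
```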


## Step 2. Downloading the data