# Download the Temperature Time Series for DWD Stations in NRW 

## 1. About the DWD Open Data Portal 

The data of the Climate Data Center (CDC) of the DWD (Deutscher Wetterdienst, German Weather Service) is provided on an **FTP server**. <br> **FTP** stands for _File Transfer Protocol_.

Open the FTP link ftp://opendata.dwd.de/climate_environment/CDC/ in your browser (copy-paste) and find out how it is structured hierarchically.

You can also open the link with **HTTPS** (Hypertext Transfer Protocol Secure): https://opendata.dwd.de/climate_environment/CDC/

**Download and read** the document https://opendata.dwd.de/climate_environment/CDC/Readme_intro_CDC_ftp.pdf


## 2. Download the Station Meta Data 

We are interested in observations with the following properties:

1. The observations are taken in Germany.
1. It is temperature data.
1. The temporal resolution is annual (yearly).
1. The data are historical, not recent.


Download the corresponding station metadata file (description) from the FTP server. The file you have to download is named `KL_Jahreswerte_Beschreibung_Stationen.txt`. The elements of the file name denote:

* KL, Klima:     Ensemble of Climate Data, 
* Jahreswerte:   Annual Values, 
* Beschreibung:  Description, 
* Stationen:     Stations
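
The four criteria correspond to levels of the server's directory hierarchy. The following is a minimal sketch, assuming the directory layout used further down in this notebook (`observations_germany/climate/annual/kl/historical/`), of how the HTTPS address of the description file could be assembled. The variable names are illustrative only.

```python
import posixpath

# Root of the CDC tree (taken from the links in section 1).
base_url = "https://opendata.dwd.de/climate_environment/CDC/"

# One path element per selection criterion (assumed layout, see lead-in).
path_parts = ["observations_germany", "climate", "annual", "kl", "historical"]
fname = "KL_Jahreswerte_Beschreibung_Stationen.txt"

# posixpath.join always uses "/" as separator, which is what URLs need.
station_desc_url = base_url + posixpath.join(*path_parts, fname)
print(station_desc_url)
```

Pasting the printed address into a browser should show the same file that is fetched via FTP later on.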


## 3. FTP Connection

This connection is used to download the metadata text file `KL_Jahreswerte_Beschreibung_Stationen.txt`, which lists the meteorological stations that provide annual climate values, including temperature. The data in the text file are fixed-width formatted, i.e. the values are arranged in columns of constant character width. Several of the stations have already been abandoned. Whether a station is still active can be inferred from the column `bis_datum` (the end date of its record). Of course, you could download this single file directly with your browser or an FTP client, but later it will become clear how downloads can be automated with `ftplib`.
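
To illustrate what "fixed width" means in practice, here is a small sketch that cuts lines into fields by character position and flags stations whose `bis_datum` equals the latest date in the file. The records, the column boundaries and the "active" rule are assumptions made for this example; the real file has more columns and different widths.

```python
# Made-up records (NOT real DWD stations), written in a fixed-width layout.
records = [
    (1, "19310101", "19851231", "Kleinstadt am Berg", "Musterland"),
    (44, "19710301", "20231231", "Beispielhausen", "Nordrhein-Westfalen"),
]
lines = [
    f"{sid:05d} {von} {bis} {name:<20} {state}"
    for sid, von, bis, name, state in records
]

# Assumed column boundaries (start, end) of this sample layout.
colspecs = {
    "stations_id": (0, 5),
    "von_datum": (6, 14),
    "bis_datum": (15, 23),
    "stationsname": (24, 44),
    "bundesland": (45, None),
}

def parse_line(line):
    """Cut one fixed-width line into fields by character position."""
    return {key: line[start:end].strip() for key, (start, end) in colspecs.items()}

stations = [parse_line(line) for line in lines]

# Assumption: a station counts as active if its end date is the latest one found.
# Dates in YYYYMMDD form can be compared as plain strings.
latest = max(s["bis_datum"] for s in stations)
for s in stations:
    s["active"] = s["bis_datum"] == latest
    print(s["stations_id"], s["stationsname"], s["bis_datum"], s["active"])
```

Splitting on whitespace would not work here because station names may contain spaces, which is why the fields are cut by position.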

### Directory Definition and Station Description Filename Pattern

In [None]:
# The topic of interest: historical annual temperature data (as part of the KL data ensemble). 
topic_dir = "/annual/kl/historical/"

# This is the search pattern common to ALL station description file names. 
station_desc_pattern = "_Beschreibung_Stationen.txt"

# Below this directory tree node all climate data are stored.
climate_data_dir = "/climate_environment/CDC/observations_germany/climate/"
ftp_dir =  climate_data_dir + topic_dir

# To keep the folders tidy the subdirectory tree of the FTP is replicated.
local_ts_dir = "data/DWD/" + topic_dir # TS stands for "time series". Better add a trailing "/" to make life easier ... 
local_station_dir = local_ts_dir # station info directory.

# Directory trees are created. Ignore errors if they already exist.
import os
os.makedirs(local_ts_dir,exist_ok = True) # it does not complain if the dir already exists.
os.makedirs(local_station_dir,exist_ok = True) # it does not complain if the dir already exists.
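
Because `topic_dir` starts with a slash, plain string concatenation yields `data/DWD//annual/kl/historical/`. The operating system tolerates the double slash, but path-joining functions behave differently. A minimal sketch of the pitfall and one possible way around it (using `posixpath` so the result is the same on every platform):

```python
import posixpath

topic_dir = "/annual/kl/historical/"

# A leading "/" marks the second argument as absolute, so join() drops the first.
dropped_root = posixpath.join("data/DWD/", topic_dir)
print(dropped_root)

# Stripping the leading "/" keeps the local root directory.
local_dir = posixpath.join("data/DWD", topic_dir.lstrip("/"))
print(local_dir)
```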

### FTP Connect

In [None]:
# Anonymous login: no personal account is needed.
server = "opendata.dwd.de"
user   = "anonymous"
passwd = ""

In [None]:
# Open the FTP session. Log in. If the connection idles for too long it will time out.
import ftplib
ftp = ftplib.FTP(server)
res = ftp.login(user=user, passwd = passwd)
print(res)

# Just check whether the connection is still open (i.e. has not yet timed out).
#ret = ftp.cwd(".")

# How to log out.
#ftp.quit()
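
Since an idle session can time out, it may help to test the connection before each batch of downloads. The helper below is a sketch, not part of `my_dwd`; the name `is_alive` is made up. It sends the harmless FTP command `NOOP` and treats any failure as a dead connection.

```python
import ftplib

def is_alive(ftp):
    """Return True if the server still answers a NOOP command."""
    try:
        ftp.voidcmd("NOOP")
        return True
    except (*ftplib.all_errors, AttributeError):
        # AttributeError is raised if the object has no open socket,
        # e.g. because it was never connected or quit() was called.
        return False

# Possible usage with the variables defined above:
# if not is_alive(ftp):
#     ftp = ftplib.FTP(server)
#     ftp.login(user=user, passwd=passwd)
```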

### Generate Pandas Dataframe from FTP Directory Listing

In [None]:
import pandas as pd

In [None]:
from my_dwd import gen_df_from_ftp_dir_listing

In [None]:
# Generate a pandas dataframe from the FTP directory listing 
df_ftpdir = gen_df_from_ftp_dir_listing(ftp, ftp_dir)
df_ftpdir.head(10)

Read the following output carefully. <br>
Q: What does `station_id = -1` mean? <br>
Q: What does the field `ext` mean? <br>
Q: What is the name of the file describing the stations, i.e. the file that lists the stations with their names, coordinates, and other attributes? 
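
`gen_df_from_ftp_dir_listing` comes from the course module `my_dwd`, whose source is not shown here. As a rough, hypothetical sketch of what such a helper might do for each line of a Unix-style FTP `LIST` response, the function below splits the line into fields and derives the file extension and a station id. The listing lines, the archive name pattern and the field names are assumptions for illustration.

```python
import os

def parse_listing_line(line):
    """Parse one Unix-style FTP LIST line into a dict (assumed layout)."""
    parts = line.split(maxsplit=8)   # the 9th field is the file name
    name = parts[8]
    ext = os.path.splitext(name)[1]
    station_id = -1                  # placeholder when the name carries no station id
    if ext == ".zip":
        # Assumed pattern: <prefix>_<KL>_<station id>_<from>_<to>_hist.zip
        station_id = int(name.split("_")[2])
    return {
        "station_id": station_id,
        "name": name,
        "ext": ext,
        "size": int(parts[4]),
        "type": parts[0][0],         # "-" for files, "d" for directories
    }

# Made-up listing lines in the usual Unix "ls -l" style.
sample = [
    "-rw-r--r--    1 ftp      ftp          1234 Jan 01  2024 jahreswerte_KL_00001_19310101_19851231_hist.zip",
    "-rw-r--r--    1 ftp      ftp        123456 Jan 01  2024 KL_Jahreswerte_Beschreibung_Stationen.txt",
]
parsed = [parse_listing_line(line) for line in sample]
for row in parsed:
    print(row)
```

A list of such dicts can be turned into a dataframe with `pd.DataFrame(parsed)`.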

### Dataframe with TS Zip Files

Create a dataframe with the names of the zip files only. These zip archives contain the actual measurement data. The measured variable (precipitation, temperature, etc.) is time dependent. A sequence of data over time is called a **time series**.

In [None]:
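#df_ftpdir["ext"]==".zip"
# Filter and set the index in one step. Calling set_index(..., inplace=True)
# on a filtered slice can trigger a SettingWithCopyWarning.
df_zips = df_ftpdir[df_ftpdir["ext"]==".zip"].set_index("station_id")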
df_zips.head(10)

#### Excursion: How to Extract Data and Series from a Dataframe 

In [None]:
# Extract a column: It is a series
print(df_zips["name"], "\n")
print("Type: ", type(df_zips["name"]))

In [None]:
# Extract a row: It is a series
print(df_zips.loc[1078], "\n")
print(type(df_zips.loc[1078]))

In [None]:
# Extract a value: address row label and column name in a single .loc call
print(df_zips.loc[1078, "name"])

### Download the Station Description File

#### Find the Station Description File in the FTP Directory Dataframe

In [None]:
# regex=False: treat the pattern as a literal string, so "." matches only a dot.
station_fname = df_ftpdir[df_ftpdir['name'].str.contains(station_desc_pattern, regex=False)]["name"].values[0]
print("Pattern matched: ", station_fname)

# Alternative
#station_fname2 = df_ftpdir[df_ftpdir["name"].str.match("^.*Beschreibung_Stationen.*txt$")]["name"].values[0]
#print(station_fname2)
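
By default, `str.contains` interprets its argument as a regular expression, in which `.` matches any character. The short sketch below uses the standard `re` module to show the difference between the raw and the escaped pattern. The look-alike file name is invented for the demonstration.

```python
import re

pattern = "_Beschreibung_Stationen.txt"
lookalike = "KL_Jahreswerte_Beschreibung_Stationen_txt"  # invented, not a real file

# As a regular expression, "." also matches the underscore before "txt".
matches_raw = bool(re.search(pattern, lookalike))
# After escaping, a literal dot is required.
matches_escaped = bool(re.search(re.escape(pattern), lookalike))

print(matches_raw, matches_escaped)
```

For the DWD directory this makes no practical difference, but escaping the pattern or passing `regex=False` avoids surprises with other file names.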

#### Grab Station Description with FTP Grab File Function

In [None]:
from my_dwd import grabFile

In [None]:
print("grab file: " + station_fname + "\nfrom ftp dir: " + ftp_dir)
grabFile(ftp, ftp_dir + station_fname, local_station_dir + station_fname)
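
`grabFile` is provided by `my_dwd` and its implementation is not reproduced in this notebook. A minimal sketch of how a download helper of this kind could be written with `ftplib` is given below; the function name `grab_file_sketch` is hypothetical. It relies on `FTP.retrbinary`, which passes each received block of bytes to a callback.

```python
def grab_file_sketch(ftp, remote_path, local_path):
    """Download remote_path over an open FTP session and store it at local_path."""
    with open(local_path, "wb") as fh:
        # Each received data block is handed to fh.write and appended to the file.
        ftp.retrbinary("RETR " + remote_path, fh.write)

# Possible usage with the variables defined above:
# grab_file_sketch(ftp, ftp_dir + station_fname, local_station_dir + station_fname)
```

Binary mode transfers the file byte for byte, so text files and zip archives can be handled by the same function.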

In [None]:
from my_dwd import station_desc_txt_to_csv

In [None]:
basename = os.path.splitext(station_fname)[0]
df_stations = station_desc_txt_to_csv(local_station_dir + station_fname, local_station_dir + basename + ".csv")
df_stations.head()

### Select Stations: located in NRW & still operational & with long time series

In [None]:
#station_ids_nrw = df_stations[df_stations['state'].str.contains("Nordrhein")].index
df_stations['state'].str.contains("Nordrhein")

In [None]:
max_date  = df_stations['date_to'].max()
print(max_date)
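
The three selection criteria from the heading can be combined as boolean masks. The sketch below uses a toy dataframe with made-up stations. Only the columns `state` and `date_to` appear in this notebook; `date_from`, the datetime column types and the 30-year threshold are assumptions.

```python
import pandas as pd

# Toy data (made up). 'date_from' and the datetime dtypes are assumptions.
df_toy = pd.DataFrame(
    {
        "name": ["A-Stadt", "B-Dorf", "C-Berg", "D-Tal"],
        "state": ["Nordrhein-Westfalen", "Nordrhein-Westfalen", "Bayern", "Nordrhein-Westfalen"],
        "date_from": pd.to_datetime(["1950-01-01", "2015-01-01", "1900-01-01", "1900-01-01"]),
        "date_to": pd.to_datetime(["2023-12-31", "2023-12-31", "2023-12-31", "1980-12-31"]),
    },
    index=pd.Index([101, 102, 103, 104], name="station_id"),
)

min_years = 30  # arbitrary threshold for a "long" time series

is_nrw = df_toy["state"].str.contains("Nordrhein")
is_active = df_toy["date_to"] == df_toy["date_to"].max()
years = (df_toy["date_to"] - df_toy["date_from"]).dt.days / 365.25
is_long = years >= min_years

# Combine the masks element-wise with "&".
df_selected = df_toy[is_nrw & is_active & is_long]
print(df_selected)
```

Applied to `df_stations`, the same pattern would work once the date columns hold datetime values.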

In [None]:
# to be continued ...