# Get DWD CDC Station List for Climate Data

## 1. About the DWD Open Data Portal 

The data of the Climate Data Center (CDC) of the DWD (Deutscher Wetterdienst, German Weather Service) is provided on an **FTP server**. <br> **FTP** stands for _File Transfer Protocol_.

Open the FTP link ftp://opendata.dwd.de/climate_environment/CDC/ in your browser (copy-paste) and find out how it is structured hierarchically.

You can also open the link with **HTTPS** (Hypertext Transfer Protocol Secure): https://opendata.dwd.de/climate_environment/CDC/
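
The two links above differ only in the protocol prefix. As a minimal sketch, the FTP link can be turned into its HTTPS form with the standard library. The helper name `ftp_to_https` is hypothetical, and the sketch assumes that every path on the server is mirrored the same way as the one link shown here.

```python
from urllib.parse import urlsplit, urlunsplit


def ftp_to_https(url: str) -> str:
    """Return the HTTPS form of an ftp:// URL, keeping host and path unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError(f"expected an ftp:// URL, got {url!r}")
    # Only the scheme is swapped; host, path, query and fragment stay as they are.
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))


https_url = ftp_to_https("ftp://opendata.dwd.de/climate_environment/CDC/")
print(https_url)
```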

**Download and read** the document https://opendata.dwd.de/climate_environment/CDC/Readme_intro_CDC_ftp.pdf

**Q1:** In which temporal resolutions are the time series provided?

**Q2:** What is the difference between _historical_ and _recent_ data, also with respect to quality control?

**Q3:** Are all meteorological parameters provided at the same temporal resolution?


## 2. Download the Station Meta Data 

We are interested in observations with the following properties:

1. The observations are taken in Germany.
1. They are climate data.
1. The temporal resolution is annual.
1. They are historical data, not recent.


Download the corresponding station meta data file (description) from the FTP server. The file you have to download is named `KL_Jahreswerte_Beschreibung_Stationen.txt`. The elements of the file name denote:

* KL: Klima, Climate, 
* Jahreswerte: Annual Values, 
* Beschreibung: Description, 
* Stationen: Stations

**Q1:** Under which path (directory, folder) on the FTP server do you find the file?

**Q2:** The Python FTP client we use is provided through the library _ftplib_: <br>
https://pythonprogramming.net/ftp-transfers-python-ftplib/ <br>
How do you use it?

**Q3:** Look at the code below. In which folder is the data stored locally? What are relative and absolute paths?

In [None]:
server = "opendata.dwd.de"
user = "anonymous"
passwd = ""
# COMPLETE THE PATH: dir = "/climate_environment/CDC/observations_germany/..."
filename = "KL_Jahreswerte_Beschreibung_Stationen.txt"
localpath = "data"
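
To explore the difference between relative and absolute paths, here is a minimal sketch using only the standard library. It assumes the same `localpath` and `filename` values as the cell above. The absolute path it prints depends on the directory from which the notebook is run.

```python
import os

localpath = "data"  # relative path: interpreted relative to the current working directory
filename = "KL_Jahreswerte_Beschreibung_Stationen.txt"

relative = os.path.join(localpath, filename)
absolute = os.path.abspath(relative)  # prefixes the current working directory

print("working directory:", os.getcwd())
print("relative path:    ", relative)
print("absolute path:    ", absolute)
print("is absolute?      ", os.path.isabs(relative), os.path.isabs(absolute))
```

Changing the working directory changes where a relative path points, while an absolute path always names the same location.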

In [None]:
from ftplib import FTP

In [None]:
#domain name or server ip:
ftp = FTP(server)
res = ftp.login(user=user, passwd = passwd)
print(res)
res = ftp.cwd(dir)
print(res)
ftp.dir()

In [None]:
def grabFile(filename, localpath):
    # Write the remote file in binary mode; the with-block closes it even if the transfer fails.
    with open(localpath + "/" + filename, 'wb') as localfile:
        ftp.retrbinary('RETR ' + filename, localfile.write, 1024)

In [None]:
grabFile(filename,localpath)

In [None]:
# Finally disconnect from the FTP server
res = ftp.quit()
print(res)
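
The download step above can also be sketched as a single function that receives the connection as an argument instead of relying on a global variable. This is one possible variant, not the only way to do it. The name `download_file` is hypothetical, and creating the local folder when it is missing is an added assumption.

```python
import os


def download_file(ftp, remote_name, local_dir):
    """Download remote_name from the server's current directory into local_dir."""
    os.makedirs(local_dir, exist_ok=True)  # assumption: create the folder if it is missing
    local_file = os.path.join(local_dir, remote_name)
    with open(local_file, "wb") as fh:  # the file is closed even if the transfer fails
        ftp.retrbinary("RETR " + remote_name, fh.write)
    return local_file


# Usage against the real server (not executed here, it needs network access):
# from ftplib import FTP
# with FTP("opendata.dwd.de") as ftp:
#     ftp.login()          # anonymous login
#     ftp.cwd(remote_dir)  # remote_dir: the path you found in Q1
#     download_file(ftp, "KL_Jahreswerte_Beschreibung_Stationen.txt", "data")
```

Passing `ftp` explicitly makes the function usable with any object that offers a `retrbinary` method, which also makes it easy to try out without a network connection.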

## 3. Read the Station Data into a Pandas Dataframe

The Station Data is in fixed column format. Pandas provides a reader for text files with fixed column width.  

Search the Pandas doc https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html for this fixed column reader. Learn how to use it and read the station data file into a dataframe.

Hint: Count the characters per column (column width) in a text editor.
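
To see what "fixed column width" means, here is a minimal sketch that splits lines by character positions with plain string slicing. The toy lines and the widths are invented for illustration and are not the DWD file layout. `pd.read_fwf()` does the same job when you pass it `widths` or `colspecs`.

```python
from itertools import accumulate

# Toy fixed-width text, invented for illustration (not DWD data).
lines = [
    "id    name      height",
    "00001 Alpha        120",
    "00002 Beta          45",
]

widths = [6, 10, 6]               # characters per column, counted by hand
ends = list(accumulate(widths))   # [6, 16, 22]
starts = [0] + ends[:-1]          # [0, 6, 16]


def split_fixed(line):
    """Cut one line at the fixed column boundaries and strip the padding."""
    return [line[s:e].strip() for s, e in zip(starts, ends)]


rows = [split_fixed(line) for line in lines]
for row in rows:
    print(row)
```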

### Extract column names and translate them from DE to EN.

In [None]:
# Extract the column names from the header line. They are in German (de).
with open(localpath+"/"+filename,"r") as file:
    r = file.readline()
colnames_de = r.split()
colnames_de

In [None]:
# translation dictionary
translate = \
{'Stations_id':'station_id',
 'von_datum':'date_from',
 'bis_datum':'date_to',
 'Stationshoehe':'altitude',
 'geoBreite': <fill in!>,
 'geoLaenge': <fill in!>,
 'Stationsname':'name',
 'Bundesland':'state'}

In [None]:
for h in colnames_de:
    print(translate[h])

In [None]:
# Pythonic
colnames_en = [translate[h] for h in colnames_de]
print(colnames_en)

### Read the formatted data with pd.read_fwf().

In [None]:
import pandas as pd

In [None]:
help(pd.read_fwf)

In [None]:
# Skip the first two rows and set the column names.
df = pd.read_fwf(localpath+"/"+filename,skip...<fill in!>,names=colnames_en)
df.head()

In [None]:
# Better: parse the dates! Column 0 should be used as the index. This makes the later export with df.to_csv() easier.
df = pd.read_fwf(localpath+"/"+filename,skip...<fill in!>,names=colnames_en, parse_dates=["date_from","date_to"],index_col = 0)
df.head()

In [None]:
df.shape

## 4. Export the dataframe as a CSV file

Use semicolons as field delimiters.

In [None]:
# extract basename (Filename) without extension
import os
fname = os.path.splitext(filename)[0]
csvname = fname + ".csv"
print(csvname)

df.to_csv(localpath+"/"+csvname, sep =";")
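
Any program that reads the exported file has to be told that the delimiter is a semicolon. Here is a minimal sketch with the standard library `csv` module. The toy text is invented for illustration and is not DWD data.

```python
import csv
import io

# Toy semicolon-delimited text, invented for illustration (not DWD data).
text = "station_id;name;altitude\n1;Alpha;120\n2;Beta;45\n"

reader = csv.DictReader(io.StringIO(text), delimiter=";")
rows = list(reader)

print(reader.fieldnames)
print(rows[0])
```

To check the real export in the same way, replace `io.StringIO(text)` with `open(localpath+"/"+csvname, newline="")`.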

## 5. Import the CSV as a point vector layer into QGIS.

## 6. Download the zip-Archive with the Digital Administrative Boundaries



Download the archive `dvg1_EPSG25832_Shape.zip` from https://www.opengeodata.nrw.de/produkte/geobasis/tsk/dvg/dvg1/

DVG stands for _Digitale Verwaltungsgrenzen_ (digital administrative boundaries). DVG1 is more detailed than DVG2.

How to use the data: https://www.opengeodata.nrw.de/produkte/geobasis/tsk/dvg/dvg1/Nutzerinformationen.pdf

Download the PDF and upload it to Google Translate (GT) to translate it.

https://www.bezreg-koeln.nrw.de/brk_internet/geobasis/topographie_sonderkarten/verwaltungsgrenzen/index.html
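
A shapefile consists of several files (for example `.shp`, `.shx`, `.dbf`, `.prj`) that must stay together, so the archive has to be unpacked before QGIS can load a layer from a folder. The following is a minimal sketch with the standard library. The function name `extract_archive` and the target folder `data/dvg1` are hypothetical, and the sketch assumes the archive was saved to the local `data` folder.

```python
import os
import zipfile


def extract_archive(zip_path, target_dir):
    """Unpack zip_path into target_dir and return the names of all archive members."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Example call (assumed local paths, not executed here):
# names = extract_archive("data/dvg1_EPSG25832_Shape.zip", "data/dvg1")
# print([n for n in names if n.endswith(".shp")])
```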

## Homework: Create a Map in QGIS

Follow the tutorial http://www.qgistutorials.com/en/docs/3/making_a_map.html

In class we created a point vector layer (point shapefile) with the coordinates of the DWD CDC climate stations. The layer is based on a CSV file that we generated from the station meta data file downloaded from the DWD open data FTP archive (yearly values, temperature).

Create a map of the DWD climate stations located in NRW. Use a shapefile of the NRW administrative boundaries.

Use the EPSG:25832 coordinate reference system (projection), which matches the code in the file name `dvg1_EPSG25832_Shape.zip`. We will learn later what it is.
