# Read a DWD Station Description File into a Pandas Data Frame


* FTP: ftp://opendata.dwd.de/climate_environment/CDC/observations_germany/
* HTTPS: https://opendata.dwd.de/climate_environment/CDC/observations_germany/

The DWD organizes its open climate data on the FTP server according to the following hierarchy:

temporal resolution -> variable -> time span 

```
./hourly/precipitation/recent/
./hourly/precipitation/historical/
```
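
The three hierarchy levels map directly onto a directory path, which the following minimal sketch illustrates. The helper name `dwd_dir` is hypothetical and not part of any DWD tooling; the root path is the one used in the URLs listed in this notebook.

```python
import posixpath

# Root of the climate observations tree on opendata.dwd.de.
CLIMATE_ROOT = "/climate_environment/CDC/observations_germany/climate/"

def dwd_dir(resolution, variable, span, root=CLIMATE_ROOT):
    """Build a server directory path from the three hierarchy levels (hypothetical helper)."""
    # posixpath is used because FTP and URL paths always use forward slashes.
    return posixpath.join(root, resolution, variable, span) + "/"

print(dwd_dir("hourly", "precipitation", "recent"))
print(dwd_dir("daily", "kl", "recent"))
```
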
## Some Data Source Examples: ##

**Hourly precipitation recent (RR data format)**
* FTP directory: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/precipitation/recent/
* Dataset description: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/precipitation/recent/DESCRIPTION_obsgermany_climate_hourly_precipitation_recent_en.pdf
* Station description: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/precipitation/recent/RR_Stundenwerte_Beschreibung_Stationen.txt

**Hourly precipitation historical (RR data format)**
* FTP directory: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/precipitation/historical/
* Dataset description: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/precipitation/historical/DESCRIPTION_obsgermany_climate_hourly_precipitation_historical_en.pdf
* Station description: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/precipitation/historical/RR_Stundenwerte_Beschreibung_Stationen.txt

**Hourly temperature recent and historical (TU data format)**
* https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent/
* https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/historical/

**Daily temperature recent (KL data format)**
* https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/daily/kl/recent/

**Daily precipitation recent (RR data format)**
* https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/daily/more_precip/recent/

**Annual values** 
* FTP directory: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/annual/kl/historical/
* Dataset description:
* Station description: 



## FTP Connection

### Connection Parameters

In [1]:
server = "opendata.dwd.de"
user   = "anonymous"
passwd = ""

### FTP Directory Definition and Station Description Filename Pattern

In [2]:
# The topic of interest.
topic_dir = "/hourly/precipitation/recent/"
#topic_dir = "/annual/kl/historical/"

# This is the search pattern common to ALL station description file names 
station_desc_pattern = "_Beschreibung_Stationen.txt"

# Below this directory tree node all climate data are stored.
ftp_climate_data_dir = "/climate_environment/CDC/observations_germany/climate/"
ftp_dir =  ftp_climate_data_dir + topic_dir

### Local Directories

In [3]:
local_ftp_dir         = "../data/original/DWD/"      # Local copies of the FTP data, i.e. the local data source (input data)
local_ftp_station_dir = local_ftp_dir + topic_dir # Local directory where the station description file is stored
local_ftp_ts_dir      = local_ftp_dir + topic_dir # Local directory where time series downloaded from FTP are stored

local_generated_dir   = "../data/generated/DWD/" # Generated or derived data, in contrast to local_ftp_dir
local_station_dir     = local_generated_dir + topic_dir # Derived station data, i.e. the CSV file
local_ts_merged_dir   = local_generated_dir + topic_dir # Parallelly merged time series, wide data frame with one TS per column
local_ts_appended_dir = local_generated_dir + topic_dir # Serially appended time series, long data frame for QGIS TimeManager Plugin


In [4]:
print(local_ftp_dir)
print(local_ftp_station_dir)
print(local_ftp_ts_dir)
print()
print(local_generated_dir)
print(local_station_dir)
print(local_ts_merged_dir)
print(local_ts_appended_dir)

../data/original/DWD/
../data/original/DWD//hourly/precipitation/recent/
../data/original/DWD//hourly/precipitation/recent/

../data/generated/DWD/
../data/generated/DWD//hourly/precipitation/recent/
../data/generated/DWD//hourly/precipitation/recent/
../data/generated/DWD//hourly/precipitation/recent/
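
The doubled slashes in the printed paths come from concatenating a base directory that ends with `/` and a `topic_dir` that starts with `/`. They are harmless on POSIX file systems, but a cleaner path can be built as in the sketch below. Note that `os.path.join` discards everything before an absolute component, so the leading slash has to be stripped first.

```python
import os.path

local_ftp_dir = "../data/original/DWD/"
topic_dir = "/hourly/precipitation/recent/"

# Strip the leading slash: os.path.join would otherwise treat topic_dir
# as an absolute path and drop local_ftp_dir entirely.
local_ftp_station_dir = os.path.join(local_ftp_dir, topic_dir.lstrip("/"))

print(local_ftp_station_dir)  # ../data/original/DWD/hourly/precipitation/recent/
```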


In [5]:
import os
os.makedirs(local_ftp_dir,exist_ok = True) # it does not complain if the dir already exists.
os.makedirs(local_ftp_station_dir,exist_ok = True)
os.makedirs(local_ftp_ts_dir,exist_ok = True)

os.makedirs(local_generated_dir,exist_ok = True)
os.makedirs(local_station_dir,exist_ok = True)
os.makedirs(local_ts_merged_dir,exist_ok = True)
os.makedirs(local_ts_appended_dir,exist_ok = True)

### FTP Connect

In [12]:
import ftplib
ftp = ftplib.FTP(server)
res = ftp.login(user=user, passwd = passwd)
print(res)

230 Login successful.


In [13]:
ret = ftp.cwd(".")

In [14]:
#ftp.quit()

### Generate Pandas Dataframe from FTP Directory Listing

In [15]:
from my_dwd import gen_df_from_ftp_dir_listing

In [16]:
df_ftpdir = gen_df_from_ftp_dir_listing(ftp, ftp_dir)

In [17]:
df_ftpdir.head(5)

,station_id,name,ext,size,type
0,-1,BESCHREIBUNG_obsgermany_climate_hourly_precipi...,.pdf,68888,-
1,-1,DESCRIPTION_obsgermany_climate_hourly_precipit...,.pdf,68313,-
2,-1,RR_Stundenwerte_Beschreibung_Stationen.txt,.txt,209079,-
3,20,stundenwerte_RR_00020_akt.zip,.zip,43913,-
4,44,stundenwerte_RR_00044_akt.zip,.zip,44195,-
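
The source of `gen_df_from_ftp_dir_listing` in `my_dwd` is not shown here. As a rough idea of how such a table could be derived, the sketch below parses Unix-style `LIST` lines into the same columns. The function `parse_listing_line` is hypothetical, the listing format is an assumption, and the sample lines are made up (only the file names and sizes are taken from the table above).

```python
import os
import re

def parse_listing_line(line):
    """Parse one Unix-style LIST line into a dict (hypothetical helper)."""
    # Assumed field order: perms, links, owner, group, size, month, day, time, name
    fields = line.split(maxsplit=8)
    name = fields[8]
    # Station files carry a five-digit station id between underscores.
    match = re.search(r"_(\d{5})_", name)
    return {
        "station_id": int(match.group(1)) if match else -1,
        "name": name,
        "ext": os.path.splitext(name)[1],
        "size": int(fields[4]),
        "type": line[0],  # "-" for a regular file, "d" for a directory
    }

# Made-up listing lines for illustration.
sample = [
    "-rw-r--r--    1 ftp      ftp        209079 Feb 02 09:10 RR_Stundenwerte_Beschreibung_Stationen.txt",
    "-rw-r--r--    1 ftp      ftp         43913 Feb 02 09:10 stundenwerte_RR_00020_akt.zip",
]
rows = [parse_listing_line(line) for line in sample]
print(rows[1])
```

With a live connection, the lines could be collected with `ftp.retrlines("LIST " + ftp_dir, lines.append)` and the resulting list of dicts passed to `pd.DataFrame(rows)`.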


### Download the Station Description File

In [18]:
import pandas as pd

In [19]:
from my_dwd import grabFile

In [20]:
station_fname = df_ftpdir[df_ftpdir['name'].str.contains(station_desc_pattern)]["name"].values[0]
print("Station description file name:\n%s" % (station_fname))

# Alternative: match the file name with a regular expression
#station_fname2 = df_ftpdir[df_ftpdir["name"].str.match("^.*Beschreibung_Stationen.*txt$")]["name"].values[0]
#print(station_fname2)

Station description file name:
RR_Stundenwerte_Beschreibung_Stationen.txt


In [21]:
src = ftp_dir + station_fname
dest = local_ftp_station_dir + station_fname
print("grabFile(ftp, src, dest):")
print("FTP source: " + src)
print("Local dest:   " + dest)
grabFile(ftp, src, dest)

grabFile(ftp, src, dest):
FTP source: /climate_environment/CDC/observations_germany/climate//hourly/precipitation/recent/RR_Stundenwerte_Beschreibung_Stationen.txt
Local dest:   ../data/original/DWD//hourly/precipitation/recent/RR_Stundenwerte_Beschreibung_Stationen.txt
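
`grabFile` is also imported from `my_dwd`, and its implementation is not shown. A download helper of this kind can be written with `ftplib`'s `retrbinary`, as in the following minimal sketch. The name `grab_file_sketch` is hypothetical, and the function is an assumption about the behavior, not the author's code.

```python
import os

def grab_file_sketch(ftp, src, dest):
    """Download `src` over an open FTP connection into the local file `dest`."""
    # Make sure the target directory exists before opening the file.
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    with open(dest, "wb") as fh:
        # retrbinary calls fh.write once for every received block.
        ftp.retrbinary("RETR " + src, fh.write)
```

It would be called like `grabFile` above, for example `grab_file_sketch(ftp, src, dest)`.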


In [22]:
# Extract the column names from the first line of the file. They are in German (de).
# The file is opened with an explicit encoding because of the German umlauts.
def read_station_desc_txt_translate(txtfile):
    # Only the header line is needed to get the German column names.
    with open(txtfile, "r", encoding="utf-8") as file:
        colnames_de = file.readline().split()

    # Mapping from the German column names to English ones.
    translate = {
        'Stations_id': 'station_id',
        'von_datum': 'date_from',
        'bis_datum': 'date_to',
        'Stationshoehe': 'altitude',
        'geoBreite': 'latitude',
        'geoLaenge': 'longitude',
        'Stationsname': 'name',
        'Bundesland': 'state',
    }

    colnames_en = [translate[h] for h in colnames_de]

    # Skip the first two rows of the file and set the translated column names.
    df = pd.read_fwf(txtfile, skiprows=2, names=colnames_en,
                     parse_dates=["date_from", "date_to"], index_col=0)

    return df

In [23]:
df_stations = read_station_desc_txt_translate(local_ftp_station_dir + station_fname)
df_stations.head()

station_id,date_from,date_to,altitude,latitude,longitude,name,state
3,1995-09-01,2011-04-01,202,50.7827,6.0941,Aachen,Nordrhein-Westfalen
20,2004-08-14,2021-02-02,432,48.922,9.9129,Abtsgmünd-Untergröningen,Baden-Württemberg
44,2007-04-01,2021-02-02,44,52.9336,8.237,Großenkneten,Niedersachsen
53,2005-10-01,2021-02-02,60,52.585,13.5634,Ahrensfelde,Brandenburg
71,2004-10-22,2020-01-01,759,48.2156,8.9784,Albstadt-Badkap,Baden-Württemberg
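
To try the fixed-width parsing without downloading anything, the sketch below builds a two-station sample in memory and reads it with `pd.read_fwf`. The column widths and the layout of the two header lines are assumptions made for this example only; the station values are taken from the table above. Unlike the function above, it passes explicit `widths` and converts the dates with an explicit format, which avoids relying on column and date inference.

```python
import io
import pandas as pd

names = ["station_id", "date_from", "date_to", "altitude",
         "latitude", "longitude", "name", "state"]
# Assumed column widths, valid for this made-up sample only.
widths = [6, 9, 9, 15, 12, 10, 41, 24]

records = [
    ("00003", "19950901", "20110401", "202", "50.7827", "6.0941",
     "Aachen", "Nordrhein-Westfalen"),
    ("00020", "20040814", "20210202", "432", "48.9220", "9.9129",
     "Abtsgmünd-Untergröningen", "Baden-Württemberg"),
]

def fixed_width(fields):
    # Pad every field to its column width.
    return "".join(f.ljust(w) for f, w in zip(fields, widths))

header = ("Stations_id", "von_datum", "bis_datum", "Stationshoehe",
          "geoBreite", "geoLaenge", "Stationsname", "Bundesland")
lines = [" ".join(header), "-" * 40] + [fixed_width(r) for r in records]
text = "\n".join(lines)

# Skip the two header lines and assign the English column names.
df = pd.read_fwf(io.StringIO(text), widths=widths, names=names,
                 skiprows=2, index_col=0)

# Dates are stored as YYYYMMDD, so convert them with an explicit format.
for col in ("date_from", "date_to"):
    df[col] = pd.to_datetime(df[col].astype(str), format="%Y%m%d")

print(df[["date_from", "altitude", "name"]])
```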


In [24]:
basename = os.path.splitext(station_fname)[0]
df_stations.to_csv(local_station_dir + basename + ".csv", sep=";")

### Select Operational Stations Located in NRW

In [25]:
#station_ids_selected = df_stations[df_stations['state'].str.contains("Nordrhein")].index
#station_ids_selected

In [26]:
# Boolean mask: True if the state is Nordrhein-Westfalen

# isNRW = df_stations['state'] == "Nordrhein-Westfalen"
isNRW = df_stations['state'].str.contains("Nordrhein")

# Boolean mask: True if date_to equals the latest date in the file (indicates operation up to now)
isOperational = df_stations['date_to'] == df_stations.date_to.max() 

#isBefore1950 = df_stations['date_from'] < '1950'
#dfNRW = df_stations[isNRW & isOperational & isBefore1950]

# Select the rows that satisfy both conditions

dfNRW = df_stations[isNRW & isOperational]

dfNRW.to_csv(local_station_dir + basename + "_NRW" + ".csv", sep=";")

dfNRW

station_id,date_from,date_to,altitude,latitude,longitude,name,state
216,2004-10-01,2021-02-02,298,51.1143,7.8807,Attendorn-Neulisternohl,Nordrhein-Westfalen
389,2009-11-01,2021-02-02,436,51.0148,8.4318,"Berleburg, Bad-Arfeld",Nordrhein-Westfalen
554,1995-09-01,2021-02-02,23,51.8293,6.5365,Bocholt-Liedern (Wasserwerk),Nordrhein-Westfalen
603,1999-03-03,2021-02-02,147,50.7293,7.2040,Königswinter-Heiderhof,Nordrhein-Westfalen
613,2004-11-01,2021-02-02,206,51.5677,9.2324,Borgentreich,Nordrhein-Westfalen
...,...,...,...,...,...,...,...
13671,2007-12-01,2021-02-02,221,50.9655,7.2753,Overath-Böke,Nordrhein-Westfalen
13696,2007-12-01,2021-02-02,60,51.5966,7.4048,Waltrop-Abdinghof,Nordrhein-Westfalen
13700,2008-05-01,2021-02-02,205,51.3329,7.3411,Gevelsberg-Oberbröking,Nordrhein-Westfalen
13713,2007-11-01,2021-02-02,386,51.0899,7.6289,Meinerzhagen-Redlendorf,Nordrhein-Westfalen
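
The selection logic can be checked on a tiny data frame. The sketch below uses three stations from the tables above and applies the same two masks. It assumes, as this notebook does, that a `date_to` equal to the latest date in the file marks a station as operational.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date_to": pd.to_datetime(["2011-04-01", "2021-02-02", "2021-02-02"]),
        "state": ["Nordrhein-Westfalen", "Nordrhein-Westfalen", "Niedersachsen"],
    },
    index=pd.Index([3, 216, 44], name="station_id"),
)

is_nrw = df["state"] == "Nordrhein-Westfalen"
is_operational = df["date_to"] == df["date_to"].max()

# `&` combines the masks element-wise: a row is kept only if both are True.
selected = df[is_nrw & is_operational]
print(selected.index.tolist())  # [216]
```

Station 3 is dropped because it closed in 2011, and station 44 because it lies in Niedersachsen.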


## Create a Geo Data Frame with Geopandas

In [90]:
import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point
import fiona
from pyproj import CRS

#df = pd.read_csv('data.csv')
df = dfNRW

geometry = [Point(xy) for xy in zip(df.longitude, df.latitude)]
crs = CRS("epsg:4326") # WGS 84, geographic longitude/latitude in degrees
stations_gdf = GeoDataFrame(df, crs=crs, geometry=geometry)

stations_gdf.head(5)

station_id,date_from,date_to,altitude,latitude,longitude,name,state,geometry
216,2004-10-01,2021-02-02,298,51.1143,7.8807,Attendorn-Neulisternohl,Nordrhein-Westfalen,POINT (7.88070 51.11430)
389,2009-11-01,2021-02-02,436,51.0148,8.4318,"Berleburg, Bad-Arfeld",Nordrhein-Westfalen,POINT (8.43180 51.01480)
554,1995-09-01,2021-02-02,23,51.8293,6.5365,Bocholt-Liedern (Wasserwerk),Nordrhein-Westfalen,POINT (6.53650 51.82930)
603,1999-03-03,2021-02-02,147,50.7293,7.204,Königswinter-Heiderhof,Nordrhein-Westfalen,POINT (7.20400 50.72930)
613,2004-11-01,2021-02-02,206,51.5677,9.2324,Borgentreich,Nordrhein-Westfalen,POINT (9.23240 51.56770)


In [None]:
# https://geopandas.org/io.html

stations_gdf.to_file(driver="GPKG",filename="stations.gpkg", layer='stations')

In [84]:


# The ESRI Shapefile driver does not support datetime fields, so the direct call
# stations_gdf.to_file(driver='ESRI Shapefile', filename='data.shp')
# raises a DriverSupportError. Convert the date columns to strings first.

stations_gdf_esri = stations_gdf.copy() 

stations_gdf_esri["date_to"]=stations_gdf_esri["date_to"].astype(str)
stations_gdf_esri["date_from"]=stations_gdf_esri["date_from"].astype(str)

stations_gdf_esri.to_file(driver='ESRI Shapefile', filename='data.shp')
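
When a data frame has several datetime columns, the conversion can be done in one pass instead of naming each column. The following sketch works on a plain pandas data frame with made-up column values; formatting with `strftime` is a design choice that pins the string layout to ISO dates.

```python
import pandas as pd

df = pd.DataFrame({
    "date_from": pd.to_datetime(["2004-10-01"]),
    "date_to": pd.to_datetime(["2021-02-02"]),
    "altitude": [298],
})

# Find all datetime columns and replace them with ISO date strings.
datetime_cols = df.select_dtypes(include=["datetime"]).columns
for col in datetime_cols:
    df[col] = df[col].dt.strftime("%Y-%m-%d")

print(df.iloc[0].tolist())  # ['2004-10-01', '2021-02-02', 298]
```

Shapefile attribute names are also limited to 10 characters, so longer column names would be truncated on export; the names used here are short enough.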