Add a run script for data_collection.py with fire and add a table of available data per TimeResolution
meteoDaniel authored and meteoDaniel committed Jun 20, 2020
1 parent 90e96b7 commit 9a5928d
Showing 6 changed files with 57 additions and 9 deletions.
2 changes: 1 addition & 1 deletion DWD_FTP_STRUCTURE.md
@@ -1,6 +1,6 @@
# Folder structure of the DWD FTP server

[LINK TO DWD](ftp://opendata.dwd.de/climate_environment/CDC/observations_germany/climate)
[LINK TO DWD](https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate)

| Timescale | Variable | Abbreviation | Period | Filename |
| --- | --- | --- | --- | --- |
4 changes: 3 additions & 1 deletion Dockerfile
@@ -7,4 +7,6 @@ ENV TERM linux
COPY ./requirements.txt /opt/requirements.txt
RUN pip install -r /opt/requirements.txt

WORKDIR /app
WORKDIR /app

ENV PYTHONPATH /app/
38 changes: 34 additions & 4 deletions README.md
@@ -98,19 +98,43 @@ metadata = python_dwd.metadata_for_dwd_data(parameter=Parameter.PRECIPITATION_MO
```


## 4. Listing server files
## 4. Availability table

It is also possible to use enumeration keywords. The availability table below maps each enumeration keyword to the availability of data via python_dwd and on the CDC server.

|Parameter/Granularity |1_minute | 10_minutes |hourly | daily |monthly | annual|
|----------------|-------------------------------|-----------------------------|-----------------------------|-----------------------------|-----------------------------|-----------------------------|
| `TEMPERATURE_SOIL = "soil_temperature"` | :x: | :x: | :heavy_check_mark:|:heavy_check_mark: |:x: | :x:|
| `TEMPERATURE_AIR = "air_temperature"` |:x: | :heavy_check_mark:| :heavy_check_mark:| :x:|:x: |:x: |
| `PRECIPITATION = "precipitation"` | :heavy_check_mark: | :heavy_check_mark: |:x: | :x:| :x:|:x: |
| `TEMPERATURE_EXTREME = "extreme_temperature"` | :x:|:heavy_check_mark: | :x:|:x: | :x:|:x: |
| `WIND_EXTREME = "extreme_wind" ` |:x: | :heavy_check_mark: | :x:| :x:|:x: |:x: |
| `SOLAR = "solar"` | :x: | :heavy_check_mark: | :heavy_check_mark:| :heavy_check_mark:| :x:|:x: |
| `WIND = "wind" ` |:x: |:heavy_check_mark: | :heavy_check_mark:|:x: |:x: |:x: |
| `CLOUD_TYPE = "cloud_type"` |:x: | :x: | :heavy_check_mark:|:x: |:x: |:x: |
| `CLOUDINESS = "cloudiness" ` | :x: | :x: |:heavy_check_mark: | :x:| :x:| :x:|
| `SUNSHINE_DURATION = "sun"` |:x: |:x: | :heavy_check_mark:| :x:|:x:|:x: |
| `VISBILITY = "visibility"` | :x:| :x:|:heavy_check_mark: |:x: | :x:| :x:|
| `WATER_EQUIVALENT = "water_equiv"` | :x:| :x: |:x: |:heavy_check_mark: |:x: | :x:|
| `PRECIPITATION_MORE = "more_precip" ` | :x: | :x: |:x: | :heavy_check_mark:|:heavy_check_mark: | :heavy_check_mark:|
| `PRESSURE = "pressure"` | :x:| :x: | :heavy_check_mark:|:x: |:x:|:x: |
| `CLIMATE_SUMMARY = "kl"` |:x: | :x: |:x: | :heavy_check_mark:|:heavy_check_mark: |:heavy_check_mark: |
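The enumeration keywords in the table can be thought of as a lookup. The following is an illustrative sketch only: the names mirror python_dwd's enumerations, but the classes and the hand-picked excerpt of the availability mapping are written here for demonstration, not taken from the library's code.

```python
from enum import Enum


# Illustrative stand-ins for python_dwd's enumerations (not the library's code).
class Parameter(Enum):
    CLIMATE_SUMMARY = "kl"
    PRECIPITATION_MORE = "more_precip"


class TimeResolution(Enum):
    HOURLY = "hourly"
    DAILY = "daily"
    MONTHLY = "monthly"


# A small excerpt of the availability table as a lookup; missing pairs mean :x:.
AVAILABILITY = {
    (Parameter.CLIMATE_SUMMARY, TimeResolution.DAILY): True,
    (Parameter.CLIMATE_SUMMARY, TimeResolution.MONTHLY): True,
    (Parameter.PRECIPITATION_MORE, TimeResolution.DAILY): True,
    (Parameter.PRECIPITATION_MORE, TimeResolution.MONTHLY): True,
}


def is_available(parameter: Parameter, resolution: TimeResolution) -> bool:
    """Return whether the combination is marked available in the table."""
    return AVAILABILITY.get((parameter, resolution), False)
```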


## 5. Listing server files

The server is constantly updated with new values. Existing station data is extended with newly measured data approximately once a year, shortly after New Year. On this occasion the toolset has to retrieve a new **filelist**, which has to be initiated by the user when an error about this is raised. For this purpose a function scans the server folder for a given parameter set if requested.

The created filelist is also used for the metadata, namely for the column **HAS_FILE**. This is necessary because not every station listed in the given metadata also has a corresponding file. With this information one can simply filter the metadata with **HAS_FILE == True** to get only those stations that really have a file on the server.
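The **HAS_FILE** filter can be sketched with pandas as follows; the column names follow the README, while the station ids and flags are made up for illustration.

```python
import pandas as pd

# Made-up metadata with the HAS_FILE column described above.
metadata = pd.DataFrame({
    "STATION_ID": [1048, 1050, 2290],
    "HAS_FILE": [True, False, True],
})

# Keep only stations that really have a file on the server.
stations_with_files = metadata[metadata["HAS_FILE"]]
```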

## 5. About the metadata
## 6. About the metadata

The metadata for a set of parameters is not stored in a usual .csv but in a .txt file next to the station data. This file has to be parsed first, as unfortunately there is no regular separator in those files. After parsing the text, a .csv is created which can then be read easily. There is one exception to this: for 1-minute precipitation data, the metadata is stored in separate zip files, which contain more detailed information. For this reason, calling metadata_dwd with those parameters will download and read in all the files and store them in a similar DataFrame to provide seamless functionality across all parameter types.
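One common way to parse such files without a regular separator is to split on runs of whitespace; the following is a sketch under that assumption, not the library's actual parser, and the sample line is made up. Columns are set apart by at least two spaces, while station names may contain single spaces.

```python
import re

# A made-up line in the style of the metadata .txt files described above.
line = "01048  19340101  Dresden Klotzsche   Sachsen"

# Split on runs of two or more whitespace characters so that single spaces
# inside station names are preserved.
fields = re.split(r"\s{2,}", line.strip())
```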

This data also doesn't include the **STATE** information, which can be useful to filter the data for a certain region. To get this data into our metadata, we run another metadata request for the parameters of historical daily precipitation data, as we expect it to have the most information, because it is the most common station type in Germany. In some cases there may still be no STATE information, as some stations are only run to individually measure the performance of certain values at a special site.
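The enrichment step can be sketched as a left join with pandas; the station ids and state names below are made up for illustration, and the real column names in the library may differ.

```python
import pandas as pd

# Metadata for the requested parameter set, lacking STATE information.
metadata = pd.DataFrame({"STATION_ID": [1048, 2290, 9999]})

# Metadata from the second request (historical daily precipitation),
# which carries the STATE column.
precip_metadata = pd.DataFrame({
    "STATION_ID": [1048, 2290],
    "STATE": ["Sachsen", "Bayern"],
})

# A left join keeps every station; stations without STATE information get NaN.
metadata = metadata.merge(precip_metadata, on="STATION_ID", how="left")
```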

## 6. Conclusion
## 7. Conclusion

Feel free to use the library if you want to automate data access and analyze the German climate. Be aware that the server may block the FTP client once in a while. It can therefore be useful to wrap the request in a try-except block and retry. For further examples of this library check the notebook **python_dwd_example.ipynb** in the **example** folder!
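The suggested try-except retry can be sketched as follows; `fetch` here is a stand-in for any flaky server call, e.g. a call to `collect_dwd_data`.

```python
import time


def fetch_with_retries(fetch, attempts: int = 3, delay: float = 0.0):
    """Call fetch up to attempts times, re-raising the last error on failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except IOError as error:  # e.g. the server temporarily blocks the client
            last_error = error
            time.sleep(delay)
    raise last_error
```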

@@ -130,5 +154,11 @@ To run the tests in the given environment, just call
docker run -ti -v $(pwd):/app python_dwd:latest pytest tests/
```
from the main directory. To work in an iPython shell you just have to change the command `pytest tests/` to `ipython`.
Soon there will be a `fire` based command line script.

#### Command line script
You can download data as CSV files after building the Docker container. Currently only `collect_dwd_data` is supported by this service.

```
docker run -ti -v $(pwd):/app python_dwd:latest python3 python_dwd/run.py collect_dwd_data "[1048]" "kl" "daily" "historical" /app/dwd_data/ False False True False True True
```
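Under the hood, fire parses each positional token into a Python value before calling the function: list and boolean literals are evaluated, everything else stays a string. The converter below is a simplified stand-in to illustrate the idea, not fire's actual implementation.

```python
import ast


def parse_cli_token(token: str):
    """Parse a CLI token as a Python literal where possible."""
    try:
        return ast.literal_eval(token)
    except (ValueError, SyntaxError):
        return token  # plain strings such as "kl" or "daily" stay strings


# The first few positional arguments from the docker command above.
args = [parse_cli_token(t) for t in ["[1048]", "kl", "daily", "False", "True"]]
```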

11 changes: 8 additions & 3 deletions python_dwd/data_collection.py
@@ -1,7 +1,7 @@
""" Data collection pipeline """
import logging
from pathlib import Path
from typing import List, Union
from typing import List, Union, Optional
import pandas as pd

from python_dwd.constants.column_name_mapping import GERMAN_TO_ENGLISH_COLUMNS_MAPPING_HUMANIZED
@@ -26,7 +26,8 @@ def collect_dwd_data(station_ids: List[int],
parallel_download: bool = False,
write_file: bool = False,
create_new_filelist: bool = False,
humanize_column_names: bool = False) -> pd.DataFrame:
humanize_column_names: bool = False,
run_download_only: bool = False) -> Optional[pd.DataFrame]:
"""
Function that organizes the complete pipeline of data collection, either
from the internet or from a local file. It therefor goes through every given
@@ -45,6 +46,7 @@ def collect_dwd_data(station_ids: List[int],
write_file: boolean if to write data to local storage
create_new_filelist: boolean if to create a new filelist for the data selection
humanize_column_names: boolean to yield column names better for human consumption
run_download_only: boolean to run only the download and storing process
Returns:
a pandas DataFrame with all the data given by the station ids
@@ -86,7 +88,10 @@ def collect_dwd_data(station_ids: List[int],
station_data, station_id, parameter, time_resolution, period_type, folder)

data.append(station_data)


if run_download_only:
return None

data = pd.concat(data)

# Assign meaningful column names (humanized).
10 changes: 10 additions & 0 deletions python_dwd/run.py
@@ -0,0 +1,10 @@
""" entrypoints ro tun scripts via Docker or command line """
import fire

from python_dwd.data_collection import collect_dwd_data


if __name__ == '__main__':
fire.Fire({
'collect_dwd_data': collect_dwd_data
})
1 change: 1 addition & 0 deletions requirements.txt
@@ -14,3 +14,4 @@ fire==0.3.1
docopt==0.6.2
munch==2.5.0
dateparser==0.7.4
fire==0.3.1
