diff --git a/DWD_FTP_STRUCTURE.md b/DWD_FTP_STRUCTURE.md
index 9dc448c7d..ac564b2f1 100644
--- a/DWD_FTP_STRUCTURE.md
+++ b/DWD_FTP_STRUCTURE.md
@@ -1,6 +1,6 @@
 # Folder structure of dwd ftp server
 ###
-[LINK TO DWD](ftp://opendata.dwd.de/climate_environment/CDC/observations_germany/climate)
+[LINK TO DWD](https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate)
 
 | Timescale | Variable | Abbreviation | Period | Filename |
 | --- | --- | --- | --- | --- |
diff --git a/Dockerfile b/Dockerfile
index 7598bf82e..237ddfde0 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -7,4 +7,6 @@ ENV TERM linux
 COPY ./requirements.txt /opt/requirements.txt
 RUN pip install -r /opt/requirements.txt
 
-WORKDIR /app
\ No newline at end of file
+WORKDIR /app
+
+ENV PYTHONPATH /app/
\ No newline at end of file
diff --git a/README.md b/README.md
index e48f4c023..d015b96df 100644
--- a/README.md
+++ b/README.md
@@ -98,19 +98,43 @@ metadata = python_dwd.metadata_for_dwd_data(parameter=Parameter.PRECIPITATION_MO
 ```
 
-## 4. Listing server files
+## 4. Availability table
+
+It is also possible to use enumeration keywords. The availability table below maps each enumeration keyword to the data available via python_dwd and on the CDC server.
+
+| Parameter/Granularity | 1_minute | 10_minutes | hourly | daily | monthly | annual |
+| --- | --- | --- | --- | --- | --- | --- |
+| `TEMPERATURE_SOIL = "soil_temperature"` | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: |
+| `TEMPERATURE_AIR = "air_temperature"` | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :x: |
+| `PRECIPITATION = "precipitation"` | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
+| `TEMPERATURE_EXTREME = "extreme_temperature"` | :x: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
+| `WIND_EXTREME = "extreme_wind"` | :x: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
+| `SOLAR = "solar"` | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: |
+| `WIND = "wind"` | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :x: |
+| `CLOUD_TYPE = "cloud_type"` | :x: | :x: | :heavy_check_mark: | :x: | :x: | :x: |
+| `CLOUDINESS = "cloudiness"` | :x: | :x: | :heavy_check_mark: | :x: | :x: | :x: |
+| `SUNSHINE_DURATION = "sun"` | :x: | :x: | :heavy_check_mark: | :x: | :x: | :x: |
+| `VISBILITY = "visibility"` | :x: | :x: | :heavy_check_mark: | :x: | :x: | :x: |
+| `WATER_EQUIVALENT = "water_equiv"` | :x: | :x: | :x: | :heavy_check_mark: | :x: | :x: |
+| `PRECIPITATION_MORE = "more_precip"` | :x: | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| `PRESSURE = "pressure"` | :x: | :x: | :heavy_check_mark: | :x: | :x: | :x: |
+| `CLIMATE_SUMMARY = "kl"` | :x: | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+
+
+## 5. Listing server files
 
 The server is constantly updated with new values. Existing station data is appended with newly measured data approximately once a year, shortly after New Year.
 When this happens, the toolset has to retrieve a new **filelist**, which has to be initiated by the user on getting a corresponding error. For this purpose a function scans the server folder for a given parameter set on request. The created filelist is also used for the metadata, namely for the column **HAS_FILE**, because not every station listed in the metadata also has a corresponding file. With this information one can simply filter the metadata with **HAS_FILE == True** to get only those stations that really have a file on the server.
 
-## 5. About the metadata
+## 6. About the metadata
 
 The metadata for a set of parameters is not stored in a usual .csv but instead put in a .txt file next to the station data. This file has to be parsed first, as unfortunately there is no regular separator in those files. After parsing the text from those files, a .csv is created which can then be read easily. There is one exception: for 1-minute precipitation data, the metadata is stored in separate zip files, which contain more detailed information. For this reason, calling metadata_dwd with those parameters will download and read in all the files and store them in a similar DataFrame to provide seamless functionality across all parameter types. This data also does not include the **STATE** information, which can be useful to filter the data for a certain region. To get this information into our metadata, we run another metadata request for the parameters of historical daily precipitation data, as we expect it to have the most information, this being the most common station type in Germany. In some cases there may still be no STATE information, as some stations are only run to individually measure certain values at a special site.
 
-## 6. Conclusion
+## 7. Conclusion
 
 Feel free to use the library if you want to automate the data access and analyze the German climate.
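As a side note on the **HAS_FILE == True** filter described in the section on listing server files, a minimal sketch of that step (using a hypothetical, hand-made metadata frame; the real one is returned by `metadata_for_dwd_data` and has many more columns):

```python
import pandas as pd

# Hypothetical metadata frame; column names follow the README,
# the station ids and flags here are made up for illustration.
metadata = pd.DataFrame({
    "STATION_ID": [1048, 2290, 5779],
    "HAS_FILE": [True, False, True],
})

# Keep only stations that actually have a data file on the server.
stations_with_files = metadata[metadata["HAS_FILE"]]
print(stations_with_files["STATION_ID"].tolist())  # -> [1048, 5779]
```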
 Be aware that the server may block the FTP client once in a while; it can be useful to wrap the request in a try-except block and retry to get the data.
 
 For further examples of this library check the notebook **python_dwd_example.ipynb** in the **example** folder!
@@ -130,5 +154,11 @@ To run the tests in the given environment, just call
 ```
 docker run -ti -v $(pwd):/app python_dwd:latest pytest tests/
 ```
 from the main directory. To work in an iPython shell you just have to change the command `pytest tests/` to `ipython`.
-Soon there will be a `fire` based command line script.
+
+#### Command line script
+After building the docker container you can download data as csv files from the command line. Currently only `collect_dwd_data` is supported by this entrypoint.
+
+```
+docker run -ti -v $(pwd):/app python_dwd:latest python3 python_dwd/run.py collect_dwd_data "[1048]" "kl" "daily" "historical" /app/dwd_data/ False False True False True True
+```
diff --git a/python_dwd/data_collection.py b/python_dwd/data_collection.py
index b5ec8dfd6..f281fa305 100644
--- a/python_dwd/data_collection.py
+++ b/python_dwd/data_collection.py
@@ -1,7 +1,7 @@
 """ Data collection pipeline """
 import logging
 from pathlib import Path
-from typing import List, Union
+from typing import List, Union, Optional
 
 import pandas as pd
 
 from python_dwd.constants.column_name_mapping import GERMAN_TO_ENGLISH_COLUMNS_MAPPING_HUMANIZED
@@ -26,7 +26,8 @@ def collect_dwd_data(station_ids: List[int],
                      parallel_download: bool = False,
                      write_file: bool = False,
                      create_new_filelist: bool = False,
-                     humanize_column_names: bool = False) -> pd.DataFrame:
+                     humanize_column_names: bool = False,
+                     run_download_only: bool = False) -> Optional[pd.DataFrame]:
     """
     Function that organizes the complete pipeline of data collection, either
     from the internet or from a local file.
     It therefore goes through every given
@@ -45,6 +46,7 @@ def collect_dwd_data(station_ids: List[int],
         write_file: boolean if to write data to local storage
         create_new_filelist: boolean if to create a new filelist for the data selection
         humanize_column_names: boolean to yield column names better for human consumption
+        run_download_only: boolean to run only the download and storing process
 
     Returns:
         a pandas DataFrame with all the data given by the station ids
@@ -86,7 +88,10 @@ def collect_dwd_data(station_ids: List[int],
                 station_data, station_id, parameter, time_resolution, period_type, folder)
 
         data.append(station_data)
-
+
+    if run_download_only:
+        return None
+
     data = pd.concat(data)
 
     # Assign meaningful column names (humanized).
diff --git a/python_dwd/run.py b/python_dwd/run.py
new file mode 100644
index 000000000..107daf729
--- /dev/null
+++ b/python_dwd/run.py
@@ -0,0 +1,10 @@
+""" Entrypoints to run scripts via Docker or command line """
+import fire
+
+from python_dwd.data_collection import collect_dwd_data
+
+
+if __name__ == '__main__':
+    fire.Fire({
+        'collect_dwd_data': collect_dwd_data
+    })
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
index 963df357c..1b67abb9d 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -14,3 +14,4 @@ fire==0.3.1
 docopt==0.6.2
 munch==2.5.0
 dateparser==0.7.4
+fire==0.3.1
\ No newline at end of file
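For reviewers wondering how the `"[1048]"` station list and the positional booleans in the docker command above survive the trip through `fire`: fire evaluates each command-line token roughly like a Python literal, falling back to the raw string. A stdlib-only sketch of that idea (illustrative only, not fire's actual implementation):

```python
import ast


def parse_cli_token(token: str):
    """Roughly mimic how python-fire turns a CLI string into a value:
    evaluate it as a Python literal if possible, else keep the string."""
    try:
        return ast.literal_eval(token)
    except (ValueError, SyntaxError):
        return token


# Tokens as they appear in the docker command for run.py above.
tokens = ["[1048]", "kl", "daily", "historical", "False", "True"]
print([parse_cli_token(t) for t in tokens])
# -> [[1048], 'kl', 'daily', 'historical', False, True]
```

This is why `collect_dwd_data` receives a real `List[int]` and real booleans even though everything starts life as a shell string.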