Emit dataframe to different data sinks
- Export dataframe to different data sinks like
  SQLite, DuckDB, InfluxDB and CrateDB
- Query results with SQL, based on in-memory DuckDB
- Many features from cli.py now available through io.py
- Tests for cli.py, run.py and io.py
amotl committed Sep 22, 2020
1 parent 624b56e commit a5d80ef
Showing 17 changed files with 1,612 additions and 691 deletions.
2 changes: 0 additions & 2 deletions .coveragerc
@@ -7,7 +7,5 @@ source =
show_missing = true
fail_under = 0
omit =
    wetterdienst/cli.py
    wetterdienst/run.py
    tests/*
    noxfile.py
2 changes: 1 addition & 1 deletion .flake8
@@ -2,4 +2,4 @@
select = B,B9,BLK,C,E,F,W,S
ignore = E203,W503
max-line-length = 88
per-file-ignores = tests/*:S101, wetterdienst/__init__.py:F401, wetterdienst/cli.py:E501,B950
per-file-ignores = tests/*:S101, wetterdienst/__init__.py:F401, wetterdienst/cli.py:E501,B950, wetterdienst/io.py:E501,B950
2 changes: 2 additions & 0 deletions CHANGELOG.rst
@@ -6,6 +6,8 @@ Development

- Add TTL-based persistent caching using dogpile.cache
- Add ``example/radolan.py`` and adjust documentation
- Export dataframe to different data sinks like SQLite, DuckDB, InfluxDB and CrateDB
- Query results with SQL, based on in-memory DuckDB

0.7.0 (16.09.2020)
==================
10 changes: 6 additions & 4 deletions README.rst
@@ -64,13 +64,15 @@ Details
- Get metadata for a set of Parameter, PeriodType and TimeResolution.
- Get station(s) nearby a selected location for a given set.
- Store/recover collected data.
- Docker image to run the library dockerized.
- Client to run the library from command line.
- Command line interface.
- Run SQL queries on the results.
- Export results to databases and other data sinks.
- Public Docker image on ghcr.io.


Setup
*****
Run the following to make ``wetterdienst`` available in your current environment:
Run this to make ``wetterdienst`` available in your current environment:

.. code-block:: bash
@@ -115,7 +117,7 @@ Documentation
We strongly recommend reading the full documentation, which will be updated continuously
as we make progress with this library:

- https://wetterdienst.readthedocs.io/
- https://wetterdienst.readthedocs.io/

For the whole functionality, check out the `Wetterdienst API`_ section of our
documentation, which will be constantly updated. To stay up to date with the
88 changes: 83 additions & 5 deletions docs/pages/api.rst
@@ -1,19 +1,30 @@
###
API
###
The API offers access to different data products. They are
outlined in more detail within the :ref:`data-coverage` chapter.

.. contents::
    :local:
    :depth: 1

************
Introduction
************
The API offers access to different data products. They are
outlined in more detail within the :ref:`data-coverage` chapter.
Please also check out complete examples about how to use the API in the
`example <https://github.com/earthobservations/wetterdienst/tree/master/example>`_
folder.
In order to explore all features interactively,
you might want to try the :ref:`cli`.


************
Observations
************
Acquire historical weather data by specifying a *parameter*,
a *time resolution* and a *period type*.


Request arguments
=================
The options *parameter*, *time resolution* and *period type* can be used in three ways:
@@ -146,9 +157,70 @@ can be used to download the observation data:
Et voilà: we now have the data we wanted for our location and are ready to analyse
historical temperature developments.

Please also check out more advanced examples in the
`example <https://github.com/earthobservations/wetterdienst/tree/master/example>`_
folder on Github.

SQL support
===========
SQL queries on the result data are supported through an in-memory DuckDB_ database.
In order to explore what is possible, please have a look at the `DuckDB SQL introduction`_.

The result data is provided through a virtual table called ``data``.

.. code-block:: python

    from wetterdienst import DWDStationRequest, DataPackage
    from wetterdienst import Parameter, PeriodType, TimeResolution

    request = DWDStationRequest(
        station_ids=[1048],
        parameter=[Parameter.TEMPERATURE_AIR],
        time_resolution=TimeResolution.HOURLY,
        start_date="2019-01-01",
        end_date="2020-01-01",
        tidy_data=True,
        humanize_column_names=True,
        prefer_local=True,
        write_file=True,
    )

    data = DataPackage(request=request)
    data.lowercase_fieldnames()
    df = data.filter_by_sql("SELECT * FROM data WHERE element='temperature_air_200' AND value < -7.0;")

    print(df)

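
To illustrate what such a query selects, here is a minimal sketch that runs the same ``WHERE`` clause against a few made-up tidy records. It makes no claim about how ``filter_by_sql`` works internally: the standard library's ``sqlite3`` module stands in for DuckDB, and both the sample rows and the ``run_query`` helper are hypothetical.

```python
import sqlite3

# Hypothetical tidy records (station_id, date, element, value); not real DWD data.
ROWS = (
    (1048, "2019-01-21T06:00", "temperature_air_200", -8.3),
    (1048, "2019-01-21T12:00", "temperature_air_200", -2.1),
    (1048, "2019-01-21T06:00", "humidity", 83.0),
)


def run_query(sql, rows=ROWS):
    """Load rows into an in-memory table named ``data`` and run ``sql`` on it."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE data "
            "(station_id INTEGER, date TEXT, element TEXT, value REAL)"
        )
        con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)
        return con.execute(sql).fetchall()
    finally:
        con.close()


# Same filter as in the example above: only readings colder than -7.0 remain.
cold = run_query(
    "SELECT * FROM data WHERE element='temperature_air_200' AND value < -7.0;"
)
print(cold)
```

Only the first record passes both conditions; the second is too warm and the third belongs to a different element.
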
Data export
===========
Data can be exported to SQLite_, DuckDB_, InfluxDB_, CrateDB_ and more targets.
A target is identified by a connection string.

Examples:

- sqlite:///dwd.sqlite?table=weather
- duckdb:///dwd.duckdb?table=weather
- influxdb://localhost/?database=dwd&table=weather
- crate://localhost/?database=dwd&table=weather
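
The connection strings above follow the usual URL layout: a scheme naming the sink, an optional host, an optional file path, and query parameters such as ``database`` and ``table``. As a rough sketch of how such a string can be taken apart, the following uses only ``urllib.parse`` from the standard library. The ``parse_target`` helper is hypothetical and is not claimed to be the parsing used by ``DataPackage.export``.

```python
from urllib.parse import parse_qs, urlsplit


def parse_target(target):
    """Split a target connection string into its parts (hypothetical helper)."""
    url = urlsplit(target)
    # parse_qs returns a list per key; keep the first value of each option.
    options = {key: values[0] for key, values in parse_qs(url.query).items()}
    return {
        "protocol": url.scheme,
        "host": url.hostname,  # None for file-based targets
        "path": url.path.lstrip("/"),  # file name for SQLite/DuckDB targets
        "database": options.get("database"),
        "table": options.get("table"),
    }


for target in (
    "sqlite:///dwd.sqlite?table=weather",
    "influxdb://localhost/?database=dwd&table=weather",
):
    print(parse_target(target))
```

File-based sinks carry the file name in the path component, while server-based sinks carry a host name and pass the database as a query parameter.
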

.. code-block:: python

    from wetterdienst import DWDStationRequest, DataPackage
    from wetterdienst import Parameter, PeriodType, TimeResolution

    request = DWDStationRequest(
        station_ids=[1048],
        parameter=[Parameter.TEMPERATURE_AIR],
        time_resolution=TimeResolution.HOURLY,
        start_date="2019-01-01",
        end_date="2020-01-01",
        tidy_data=True,
        humanize_column_names=True,
        prefer_local=True,
        write_file=True,
    )

    data = DataPackage(request=request)
    data.lowercase_fieldnames()
    data.export("influxdb://localhost/?database=dwd&table=weather")

******
MOSMIX
@@ -193,3 +265,9 @@ For a more thorough example, please have a look at `example/radolan.py`_.
.. _wradlib: https://wradlib.org/
.. _example/radolan.py: https://github.com/earthobservations/wetterdienst/blob/master/example/radolan.py

.. _SQLite: https://www.sqlite.org/
.. _DuckDB: https://duckdb.org/docs/sql/introduction
.. _DuckDB SQL introduction: https://duckdb.org/docs/sql/introduction
.. _InfluxDB: https://github.com/influxdata/influxdb
.. _CrateDB: https://github.com/crate/crate
60 changes: 57 additions & 3 deletions docs/pages/cli.rst
@@ -1,3 +1,5 @@
.. _cli:

######################
Command line interface
######################
@@ -7,10 +9,11 @@ Command line interface
$ wetterdienst --help

Usage:
wetterdienst stations --parameter=<parameter> --resolution=<resolution> --period=<period> [--station=<station>] [--latitude=<latitude>] [--longitude=<longitude>] [--number=<number>] [--distance=<distance>] [--persist] [--format=<format>]
wetterdienst readings --parameter=<parameter> --resolution=<resolution> --period=<period> --station=<station> [--persist] [--date=<date>] [--format=<format>]
wetterdienst readings --parameter=<parameter> --resolution=<resolution> --period=<period> --latitude=<latitude> --longitude=<longitude> [--number=<number>] [--distance=<distance>] [--persist] [--date=<date>] [--format=<format>]
wetterdienst stations --parameter=<parameter> --resolution=<resolution> --period=<period> [--station=<station>] [--latitude=<latitude>] [--longitude=<longitude>] [--number=<number>] [--distance=<distance>] [--persist] [--sql=<sql>] [--format=<format>]
wetterdienst readings --parameter=<parameter> --resolution=<resolution> --period=<period> --station=<station> [--persist] [--date=<date>] [--sql=<sql>] [--format=<format>] [--target=<target>]
wetterdienst readings --parameter=<parameter> --resolution=<resolution> --period=<period> --latitude=<latitude> --longitude=<longitude> [--number=<number>] [--distance=<distance>] [--persist] [--date=<date>] [--sql=<sql>] [--format=<format>] [--target=<target>]
wetterdienst about [parameters] [resolutions] [periods]
wetterdienst about coverage [--parameter=<parameter>] [--resolution=<resolution>] [--period=<period>]
wetterdienst --version
wetterdienst (-h | --help)

@@ -26,7 +29,9 @@ Command line interface
--persist Save and restore data to filesystem w/o going to the network
--date=<date> Date for filtering data. Can be either a single date(time) or
an ISO-8601 time interval, see https://en.wikipedia.org/wiki/ISO_8601#Time_intervals.
--sql=<sql> SQL query to apply to DataFrame.
--format=<format> Output format. [Default: json]
--target=<target> Output target for storing data into different data sinks.
--version Show version information
--debug Enable debug messages
-h --help Show this screen
@@ -87,3 +92,52 @@ Command line interface
# Acquire stations and readings by geoposition, request stations within specific radius.
wetterdienst stations --resolution=daily --parameter=kl --period=recent --lat=49.9195 --lon=8.9671 --distance=25
wetterdienst readings --resolution=daily --parameter=kl --period=recent --lat=49.9195 --lon=8.9671 --distance=25 --date=2020-06-30

Examples using SQL filtering:

# Find stations by state.
wetterdienst stations --parameter=kl --resolution=daily --period=recent --sql="SELECT * FROM data WHERE state='Sachsen'"

# Find stations by name (LIKE query).
wetterdienst stations --parameter=kl --resolution=daily --period=recent --sql="SELECT * FROM data WHERE lower(station_name) LIKE lower('%dresden%')"

# Find stations by name (regexp query).
wetterdienst stations --parameter=kl --resolution=daily --period=recent --sql="SELECT * FROM data WHERE regexp_matches(lower(station_name), lower('.*dresden.*'))"

# Filter measurements: Display daily climate observation readings where the maximum temperature is below two degrees.
wetterdienst readings --station=1048,4411 --parameter=kl --resolution=daily --period=recent --sql="SELECT * FROM data WHERE element='temperature_air_max_200' AND value < 2.0;"

Examples for inquiring metadata:

# Display list of available parameters (air_temperature, precipitation, pressure, ...)
wetterdienst about parameters

# Display list of available resolutions (10_minutes, hourly, daily, ...)
wetterdienst about resolutions

# Display list of available periods (historical, recent, now)
wetterdienst about periods

# Display coverage/correlation between parameters, resolutions and periods.
# This can answer questions like ...
wetterdienst about coverage

# Tell me all periods and resolutions available for 'air_temperature'.
wetterdienst about coverage --parameter=air_temperature

# Tell me all parameters available for 'daily' resolution.
wetterdienst about coverage --resolution=daily

Examples for exporting data to databases:

# Shortcut command for fetching readings from DWD
alias fetch="wetterdienst readings --station=1048,4411 --parameter=kl --resolution=daily --period=recent"

# Store readings to DuckDB
fetch --target="duckdb:///dwd.duckdb?table=weather"

# Store readings to InfluxDB
fetch --target="influxdb://localhost/?database=dwd&table=weather"

# Store readings to CrateDB
fetch --target="crate://localhost/?database=dwd&table=weather"
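
The ``--date`` option described above accepts either a single date(time) or an ISO 8601 time interval. A minimal sketch of telling the two apart could look like the following. The ``parse_date_option`` helper is hypothetical and not taken from ``cli.py``; it only covers the ``<start>/<end>`` form of an interval, not the duration-based forms that ISO 8601 also allows.

```python
from datetime import datetime
from typing import Tuple


def parse_date_option(value: str) -> Tuple[datetime, datetime]:
    """Return (start, end) for a single date(time) or a ``start/end`` interval."""
    if "/" in value:
        # ISO 8601 separates the two ends of an interval with a solidus.
        start, end = value.split("/", 1)
        return datetime.fromisoformat(start), datetime.fromisoformat(end)
    # A single date(time) is treated as an interval of zero length.
    moment = datetime.fromisoformat(value)
    return moment, moment


print(parse_date_option("2020-06-30"))
print(parse_date_option("2020-05-01/2020-05-05"))
```
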
2 changes: 1 addition & 1 deletion docs/pages/development.rst
@@ -50,6 +50,6 @@ This will inform you in case of problems with tests and your code format.
In order to run the tests more **quickly**::

poetry install --extras=excel
poetry install --extras=sql --extras=excel
poetry shell
pytest -vvvv -m "not (remote or slow)"
54 changes: 54 additions & 0 deletions example/sql.py
@@ -0,0 +1,54 @@
"""
=====
About
=====
Acquire measurement information from DWD and filter using SQL.

=====
Setup
=====
::

    pip install wetterdienst[sql]

"""
import logging

from wetterdienst import DWDStationRequest, DataPackage
from wetterdienst import Parameter, PeriodType, TimeResolution

log = logging.getLogger()


def sql_example():

    request = DWDStationRequest(
        station_ids=[1048],
        parameter=[Parameter.TEMPERATURE_AIR],
        time_resolution=TimeResolution.HOURLY,
        start_date="2019-01-01",
        end_date="2020-01-01",
        tidy_data=True,
        humanize_column_names=True,
        prefer_local=True,
        write_file=True,
    )

    sql = "SELECT * FROM data WHERE element='temperature_air_200' AND value < -7.0;"
    log.info(f"Invoking SQL query '{sql}'")

    data = DataPackage(request=request)
    data.lowercase_fieldnames()
    df = data.filter_by_sql(sql)

    print(df)


def main():
    logging.basicConfig(level=logging.INFO)
    sql_example()


if __name__ == "__main__":
    main()
13 changes: 10 additions & 3 deletions noxfile.py
@@ -11,15 +11,21 @@
@nox.session(python=["3.6", "3.7", "3.8"])
def tests(session):
    """Run tests."""
    session.run("poetry", "install", "--no-dev", "--extras=excel", external=True)
    install_with_constraints(session, "pytest", "pytest-notebook", "matplotlib", "mock")
    session.run(
        "poetry", "install", "--no-dev", "--extras=sql", "--extras=excel", external=True
    )
    install_with_constraints(
        session, "pytest", "pytest-notebook", "matplotlib", "mock", "surrogate"
    )
    session.run("pytest")


@nox.session(python=["3.7"])
def coverage(session: Session) -> None:
    """Run tests and upload coverage data."""
    session.run("poetry", "install", "--no-dev", "--extras=excel", external=True)
    session.run(
        "poetry", "install", "--no-dev", "--extras=sql", "--extras=excel", external=True
    )
    install_with_constraints(
        session,
        "coverage[toml]",
@@ -28,6 +34,7 @@ def coverage(session: Session) -> None:
        "matplotlib",
        "pytest-cov",
        "mock",
        "surrogate",
    )
    session.run("pytest", "--cov=wetterdienst", "tests/")
    session.run("coverage", "xml")