In [None]:
from rich import print
import logging

logging.basicConfig(level=logging.INFO)

# Combining data + CLI

Here we show how to combine multiple datasets, export them to recipes, and run
these from the command line.

## Combining datasets

In the previous section we've seen that one of the main goals of springtime is
to harmonize datasets from different sources, such that we can easily combine
them. Here, we walk through an example with data from PEP725 and EOBS to show
how this is done.

We start with basic observations from PEP725.


In [None]:
from springtime.datasets import PEP725Phenor
from springtime.utils import germany

pep725 = PEP725Phenor(
    species="Syringa vulgaris",
    years=[2000, 2002],
    area=germany,
)

df_pep725 = pep725.load()

Next, we want to find matching meteo data from E-OBS:


In [None]:
from springtime.datasets import EOBS
from springtime.utils import PointsFromOther

eobs = EOBS(
    area=germany,
    years=["2000", "2002"],
    variables=["mean_temperature", "minimum_temperature"],
    resample={"frequency": "M", "operator": "mean"},
    points=PointsFromOther(source="pep725"),
)

Notice that we're using a special object called `PointsFromOther`. This helper can retrieve the records from our pep725 dataset and use them to subselect the E-OBS data. To do so, we call its `get_points` method with the pep725 dataframe as input. This may seem convoluted, but as we will see later, it helps to write very succinct recipes.


In [None]:
eobs.points.get_points(df_pep725)
df_eobs = eobs.load()

Now, we're ready to join our dataframes.


In [18]:
from springtime.utils import join_dataframes

join_dataframes([df_pep725, df_eobs])

year,geometry,day,mean_temperature|31,mean_temperature|59,mean_temperature|60,mean_temperature|90,mean_temperature|91,mean_temperature|120,mean_temperature|121,mean_temperature|151,mean_temperature|152,...,minimum_temperature|243,minimum_temperature|244,minimum_temperature|273,minimum_temperature|274,minimum_temperature|304,minimum_temperature|305,minimum_temperature|334,minimum_temperature|335,minimum_temperature|365,minimum_temperature|366
2000,POINT (10.00000 49.48330),129,0.323548,,3.664483,,5.358709,,9.966999,,14.801293,...,,12.718388,,9.747333,,7.286451,,2.840333,,0.369677
2000,POINT (10.00000 50.85000),120,0.943226,,3.795517,,5.660645,,10.001336,,14.421612,...,,11.400322,,9.931665,,6.439678,,2.687333,,-0.179032
2000,POINT (10.00000 51.71670),116,1.694194,,4.053448,,5.399354,,10.563000,,14.321937,...,,12.211291,,10.457333,,6.996452,,3.725999,,1.026129
2000,POINT (10.00000 52.10000),120,2.531935,,4.937242,,5.771289,,10.993333,,14.817741,...,,12.660645,,11.131998,,8.224839,,5.002000,,2.268064
2000,POINT (10.00000 53.08330),121,2.119677,,3.988276,,4.812258,,9.906999,,14.663547,...,,11.897419,,10.317667,,7.165806,,3.831666,,0.932581
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2002,POINT (9.96667 50.15000),120,0.033871,5.121071,,5.777419,,8.711332,,13.648706,,...,14.005805,,7.406667,,4.896451,,3.722333,,-1.399677,
2002,POINT (9.96667 50.95000),131,0.667097,5.050714,,4.759677,,7.534667,,13.160967,,...,13.421612,,6.840666,,4.556774,,3.135667,,-2.176774,
2002,POINT (9.96667 52.81670),131,2.531290,4.775357,,4.818065,,7.735334,,14.055484,,...,15.073547,,9.269666,,4.023226,,2.149333,,-3.318065,
2002,POINT (9.98333 49.76670),118,0.221613,5.803572,,6.415483,,9.338666,,14.178065,,...,14.762579,,8.864333,,5.864517,,4.532332,,-0.294839,
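
For intuition, the idea of joining on a shared `(year, geometry)` index can be sketched with plain pandas. This is a minimal illustration with made-up toy values, not springtime's implementation; whether `join_dataframes` works this way internally is an assumption.

```python
import pandas as pd

# Toy stand-ins for the two datasets; all values are made up for illustration.
index = pd.MultiIndex.from_tuples(
    [(2000, "POINT (10 49)"), (2000, "POINT (10 50)"), (2001, "POINT (10 49)")],
    names=["year", "geometry"],
)
observations = pd.DataFrame({"day": [129, 120, 125]}, index=index)
weather = pd.DataFrame(
    {
        "mean_temperature|31": [0.3, 0.9, 1.2],
        "mean_temperature|59": [3.6, 3.8, 4.1],
    },
    index=index,
)

# Align rows on the shared (year, geometry) index and put the columns side by side.
combined = pd.concat([observations, weather], axis=1)
print(combined)
```

Because both dataframes share the same index, each row of the result holds the observation and the predictors for one year and location.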


## From datasets to workflow

We've already had a sneak preview of YAML for individual datasets. We can also combine the two datasets into what we call a "workflow".


In [None]:
from springtime.main import Workflow, Session

workflow = Workflow(datasets={"pep725": pep725, "eobs": eobs})
print(workflow)

To execute the workflow, we first create a session. With the log level set to info, as done at the top of this notebook, we get a bit more information about the progress. The data is automatically stored in a dedicated output folder.


In [21]:
session = Session()
workflow.execute(session)

INFO:springtime.main:Dataset pep725 loaded with 4723 rows
INFO:springtime.datasets.meteo.eobs:Locating data
INFO:springtime.datasets.meteo.eobs:Looking for variable mean_temperature in period 2000-2002...
INFO:springtime.datasets.meteo.eobs:Found /home/peter/.cache/springtime/e-obs/tg_ens_mean_0.1deg_reg_1995-2010_v26.0e.nc
INFO:springtime.datasets.meteo.eobs:Looking for variable minimum_temperature in period 2000-2002...
INFO:springtime.datasets.meteo.eobs:Found /home/peter/.cache/springtime/e-obs/tn_ens_mean_0.1deg_reg_1995-2010_v26.0e.nc


INFO:springtime.main:Dataset eobs loaded with 4723 rows
INFO:springtime.main:Datasets joined to shape: (4729, 47)
INFO:springtime.main:Data saved to: /tmp/output/data.csv


Workflows can also be represented in recipes.


In [20]:
recipe = workflow.to_recipe()
print(recipe)
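
To reuse a recipe later, it can be written to a file. The helper below is a hypothetical sketch, not part of springtime; it assumes that the value returned by `to_recipe()` is text, or at least converts to the YAML text via `str()`.

```python
from pathlib import Path


def save_recipe(recipe_text, path="recipe_pep_eobs.yaml"):
    """Write the recipe text to a YAML file and return the file path.

    Hypothetical helper for illustration; not part of springtime.
    """
    target = Path(path)
    # str() is a precaution in case the recipe is not a plain string.
    target.write_text(str(recipe_text), encoding="utf-8")
    return target
```

In this notebook, calling `save_recipe(recipe)` would write the recipe to `recipe_pep_eobs.yaml` in the current working directory.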

## Springtime's command line interface

Springtime recipes can also be executed from the command line. If we saved the recipe above as `recipe_pep_eobs.yaml`, we could execute it as follows:

```bash
springtime recipe_pep_eobs.yaml
```

The `springtime` command is available after installing springtime in your Python environment, for example with pip.

Executing recipes from the command line makes it easy to automate tasks or submit them as long-running compute jobs.
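
As a rough sketch of such automation, the script below runs every recipe in a folder, one after another. It only uses the `springtime <recipe>` form shown above; the folder layout and the helper names `build_command` and `run_recipes` are assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(recipe_path):
    """Return the command line for one recipe: springtime <recipe>."""
    return ["springtime", str(recipe_path)]


def run_recipes(recipe_dir="."):
    """Run every *.yaml file in recipe_dir (hypothetical layout), in sorted order."""
    if shutil.which("springtime") is None:
        raise RuntimeError("springtime command not found; is the package installed?")
    for recipe_path in sorted(Path(recipe_dir).glob("*.yaml")):
        # check=True raises an error, and thereby stops the loop, if a recipe fails.
        subprocess.run(build_command(recipe_path), check=True)
```

Calling `run_recipes("recipes")` would then execute all recipes stored in a `recipes` folder; the same pattern could be placed in a job script for a compute cluster.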
