# Pastas Projects using PyStore<a id="top"></a>

This notebook shows how [Pastas](https://pastas.readthedocs.io/en/latest/) timeseries and models can be managed and stored using PyStore.

## Content
1. [Getting started](#1)
2. [The PystorePastas object](#2)
3. [Managing timeseries](#3)
   1. [Accessing timeseries and metadata](#3.1)
   2. [Adding oseries and stresses](#3.2)
   3. [Overview of oseries and stresses](#3.3)
4. [Managing Pastas models](#4)
   1. [Creating a model](#4.1)
   2. [Storing a model](#4.2)
   3. [Loading a model](#4.3)
   4. [Overview of models](#4.4)
5. [Bulk operations](#5)

<hr>

## [1. Getting started](#top)<a id="1"></a>

Use the following steps to get your PC ready for this notebook:

1. Install Snappy (see the [PyStore GitHub](https://github.com/ranaroussi/pystore) for instructions). Windows users can find installers on [this page](https://www.lfd.uci.edu/~gohlke/pythonlibs/#python-snappy).
2. Install Pystore using pip: `pip install pystore`.
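
A quick way to confirm that both steps worked is to check whether the packages can be found by Python. The snippet below is a minimal sketch, assuming that python-snappy is imported as `snappy` and PyStore as `pystore`; the helper name `check_requirements` is made up for this illustration.

```python
from importlib.util import find_spec

# Assumed import names: python-snappy -> "snappy", PyStore -> "pystore".
REQUIRED = ("snappy", "pystore")


def check_requirements(names=REQUIRED):
    """Return a dict mapping each module name to True if Python can find it."""
    return {name: find_spec(name) is not None for name in names}


status = check_requirements()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If one of the packages is reported as missing, repeat the corresponding installation step before continuing.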

<hr>

## [2. The PystorePastas object](#top)<a id="2"></a>
This section shows how to initialize a `PystorePastas` object to create a new store or link to an existing one.

Import `PystorePastas` and some other modules:

In [1]:
import pandas as pd
import pastas as ps

import sys
sys.path.insert(1, "../..")

from pastas_projects import PystorePastas


Set the Pastas logger to be a little quieter:

In [2]:
ps.set_log_level("ERROR")

Path to the pystore. This is the directory in which the pystore is located, or where it will be created.

In [3]:
path = "C:/Github/traval/extracted_data/pystore"

Initialize the PystorePastas object. In this case we are linking to existing stores. If the stores do not yet exist they will be created.

In [4]:
pr = PystorePastas("aaenmaas", path, oseries_store="aaenmaas", stresses_store="knmi2", model_store="models")

Let's take a look at `pr`. This shows us how many oseries, stresses and models are contained in the stores:

In [5]:
pr

<PystorePastas object> 'aaenmaas': 789 oseries, 27 stresses, 0 models

The PystorePastas object consists of three stores:
- oseries
- stresses
- models

The store handles can be obtained through `pr.get_store()` or through the attributes `pr.lib_<storename>`:


In [6]:
oseries_store = pr.get_store("oseries")
oseries_store

PyStore.datastore <C:\Github\traval\extracted_data\pystore\aaenmaas>

In [7]:
pr.lib_oseries

PyStore.datastore <C:\Github\traval\extracted_data\pystore\aaenmaas>

The oseries and stresses stores consist of collections, each of which represents a location. A collection is essentially a folder containing timeseries data. A collection holds one or more items, and each item represents a DataFrame or Series.
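
The store, collection and item hierarchy described above can be pictured as nested dictionaries. This is only an illustrative sketch: the names come from this notebook, the data values are placeholders, and the helpers `list_items` and `get_item` are hypothetical and not part of PyStore or `PystorePastas`.

```python
# Hypothetical in-memory picture of one store. The values are placeholders,
# not real measurements.
store = {
    "103JVM_boven_O": {                     # collection = one location
        "GW.meting.totaalcorrectie": {      # item = one timeseries
            "data": [("2015-09-10 18:00", 1.23), ("2015-09-10 19:00", 1.25)],
            "metadata": {"x": 162831.0, "y": 417973.0, "units": "m"},
        },
    },
}


def list_items(store, collection):
    """Return the sorted item names stored for one location."""
    return sorted(store[collection])


def get_item(store, collection, item):
    """Return the (data, metadata) pair of a single item."""
    entry = store[collection][item]
    return entry["data"], entry["metadata"]


print(list_items(store, "103JVM_boven_O"))
```

This is why most methods in the following sections take both a location name and an `item` argument: the first selects the collection, the second the timeseries inside it.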

<hr>

## [3. Managing timeseries](#top)<a id="3"></a>

This section explains how timeseries can be added to or loaded from the database.

### [3.1 Accessing timeseries and metadata](#top)<a id="3.1"></a>

Timeseries metadata can be accessed through `pr.get_metadata()`. Use the `kind` parameter to select the store to read from.

In [8]:
ts = pr.get_metadata(['103JVM_boven_O','103KBC_beneden_O', '103SZB_beneden_O', '114DZS_O', '253GC_boven_O'], 
                     item="GW.meting.totaalcorrectie", kind="oseries")

100%|██████████| 5/5 [00:00<00:00, 1007.71it/s]


Timeseries can be accessed through `pr.get_oseries()` or `pr.get_stresses()`. These methods accept single names or lists of names. In the latter case a dictionary of dataframes is returned.

In [9]:
oseries = pr.get_oseries(['103JVM_boven_O','103KBC_beneden_O', '103SZB_beneden_O', '114DZS_O', '253GC_boven_O'], 
                         item="GW.meting.totaalcorrectie")

100%|██████████| 5/5 [00:02<00:00,  2.42it/s]


In [10]:
stresses = pr.get_stresses(["356", "896"], item=None)

100%|██████████| 2/2 [00:00<00:00, 90.97it/s]
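
The convention that `pr.get_oseries()` and `pr.get_stresses()` accept either a single name or a list of names can be sketched as follows. This is not the actual implementation: `get_series` and `fake_loader` are hypothetical names, and the loader returns a string instead of a DataFrame so the sketch runs without a store on disk.

```python
def get_series(names, loader):
    """Load one series for a single name, or a dict of series for a list.

    `loader` is a hypothetical callable that returns the data for one name.
    """
    if isinstance(names, str):
        return loader(names)
    return {name: loader(name) for name in names}


def fake_loader(name):
    """Stand-in for reading a DataFrame from the store."""
    return f"<data for {name}>"


single = get_series("356", fake_loader)
several = get_series(["356", "896"], fake_loader)
print(single)
print(sorted(several))
```

With a list as input, individual series are then picked from the result by name, as in `oseries['103JVM_boven_O']`.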


### [3.2 Adding oseries and stresses](#top)<a id="3.2"></a>

Since this notebook uses an existing store, a timeseries that was loaded above is added again under a new name to illustrate how new timeseries are added.

In [11]:
pr.add_oseries(oseries['103JVM_boven_O'], collection="test", item="test", metadata={})

To delete that same timeseries:

In [12]:
pr.del_oseries("test")

The same methods exist for stresses: `pr.add_stress` and `pr.del_stress`.

### [3.3 Overview of oseries and stresses](#top)<a id="3.3"></a>

An overview of the oseries and stresses is available through `pr.oseries` and `pr.stresses`. These are dataframes containing the metadata of all the timeseries. These dataframes are cached for performance. The cache is cleared when a timeseries is added or modified in the database. 

In [13]:
pr.oseries.head(3)

| | type | locationId | parameterId | timeStep | startDate | endDate | missVal | longName | x | y | units | sourceOrganisation | sourceSystem | fileDescription | region | _updated | moduleInstanceId | stationName | lat | lon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 103JVM_boven_O | instantaneous | 103JVM_boven_O | G.meting.inhangdiepte | {'unit': 'nonequidistant'} | {'date': '2015-09-10', 'time': '18:00:00'} | {'date': '2019-06-21', 'time': '08:00:00'} | | | 162831.0 | 417973.0 | m | Artesia | art_diver03.582b | | | 2019-09-12 13:01:27.856126 | | | | |
| 103KBC_beneden_O | instantaneous | 103KBC_beneden_O | G.meting.inhangdiepte | {'unit': 'nonequidistant'} | {'date': '2015-09-10', 'time': '18:00:00'} | {'date': '2019-06-21', 'time': '09:00:00'} | | | 166688.0 | 416940.0 | m | Artesia | art_diver03.582b | | | 2019-09-12 13:01:28.235103 | | | | |
| 103SZB_beneden_O | instantaneous | 103SZB_beneden_O | G.meting.inhangdiepte | {'unit': 'nonequidistant'} | {'date': '2015-09-10', 'time': '18:00:00'} | {'date': '2019-06-21', 'time': '08:00:00'} | | | 164713.0 | 418134.0 | m | Artesia | art_diver03.582b | | | 2019-09-12 13:01:28.571129 | | | | |


In [14]:
pr.stresses.head()

| | x | y | station | name | kind | group | _updated |
|---|---|---|---|---|---|---|---|
| 356 | 138628 | 428968 | 356 | EV24 Herwijnen | evap | knmi_data | 2019-11-22 18:06:08.878006 |
| 370 | 157018 | 384446 | 370 | EV24 Eindhoven | evap | knmi_data | 2019-11-22 18:06:08.896000 |
| 375 | 176615 | 406739 | 375 | EV24 Volkel | evap | knmi_data | 2019-11-22 18:06:08.916002 |
| 377 | 181488 | 356705 | 377 | EV24 Ell | evap | knmi_data | 2019-11-22 18:06:08.933001 |
| 380 | 182824 | 325194 | 380 | EV24 Maastricht | evap | knmi_data | 2019-11-22 18:06:08.951008 |
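
The caching behaviour of `pr.oseries` and `pr.stresses` (build the overview once, discard it when a timeseries is added or modified) follows a common pattern. The class below is a minimal sketch of that pattern under assumed names; `MetadataOverview` and its attributes are hypothetical and do not mirror the internals of `PystorePastas`.

```python
class MetadataOverview:
    """Cache an expensive overview and drop it whenever the data changes."""

    def __init__(self):
        self._series = {}    # name -> metadata dict
        self._cache = None   # cached overview; None means "stale"
        self.rebuilds = 0    # counts how often the overview was rebuilt

    def add(self, name, metadata):
        """Store metadata for a series and invalidate the cached overview."""
        self._series[name] = dict(metadata)
        self._cache = None

    @property
    def overview(self):
        """Return the overview, rebuilding it only when it is stale."""
        if self._cache is None:
            self.rebuilds += 1
            self._cache = sorted(self._series.items())
        return self._cache


ov = MetadataOverview()
ov.add("356", {"kind": "evap"})
ov.overview
ov.overview                      # served from the cache, no rebuild
ov.add("370", {"kind": "evap"})
ov.overview                      # rebuilt because the cache was cleared
print(ov.rebuilds)
```

Repeated reads between modifications are cheap, while the first read after a change pays the cost of rebuilding the overview.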


<hr>

## [4. Managing Pastas models](#top)<a id="4"></a>

This section shows how Pastas models can be created, stored, and loaded from the pystore.

### [4.1 Creating a model](#top)<a id="4.1"></a>

Creating a new model is straightforward using `pr.create_model()`. The `add_recharge` keyword argument (default `True`) controls whether recharge is automatically added to the model using the closest precipitation and evaporation stations in the stresses library. Pass both the location (collection) and the timeseries (item) to specify which oseries to use to build the model.

In [15]:
ml = pr.create_model('103JVM_boven_O', "GW.meting.totaalcorrectie", add_recharge=True)
ml

Model(oseries=103JVM_boven_O, name=103JVM_boven_O, constant=True, noisemodel=True)
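
Selecting the closest station for the recharge can be illustrated with the coordinates from the overview tables in section 3.3. This is a sketch only: it assumes straight-line distance in the x, y coordinates, it is limited to the five evaporation stations shown by `pr.stresses.head()`, and `nearest_station` is a hypothetical helper rather than the method used by `pr.create_model()`.

```python
from math import hypot

# Evaporation stations and coordinates as listed in the stresses overview.
EVAP_STATIONS = {
    "356": (138628, 428968),
    "370": (157018, 384446),
    "375": (176615, 406739),
    "377": (181488, 356705),
    "380": (182824, 325194),
}


def nearest_station(x, y, stations):
    """Return the name of the station with the smallest straight-line distance."""
    return min(
        stations,
        key=lambda name: hypot(stations[name][0] - x, stations[name][1] - y),
    )


# Coordinates of oseries 103JVM_boven_O from the oseries overview.
closest = nearest_station(162831.0, 417973.0, EVAP_STATIONS)
print(closest)
```

The same selection would be repeated for the precipitation stations, which are not shown in the overview excerpt above.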

### [4.2 Storing a model](#top)<a id="4.2"></a>
The model that was created in the previous step is not automatically stored in the model store. Use `pr.add_model()` to store the model. The overwrite keyword argument allows the user to overwrite an existing model with the same name.

**Note:**
The model is stored without the timeseries. The timeseries are assumed to be present in the oseries or stresses stores already, so storing them again with the model would be redundant in most cases. The downside is that modifications made to a timeseries before it is used in a model are not saved. In this implementation, the user is expected to add the modified timeseries under a new name or version to the oseries or stresses store and create a new model from that data.

In [16]:
pr.add_model(ml)

As we can see, the project now contains one model:

In [17]:
pr

<PystorePastas object> 'aaenmaas': 789 oseries, 27 stresses, 1 models

### [4.3 Loading a model](#top)<a id="4.3"></a>

Loading a stored model is simple using `pr.get_models()`.

The model is stored as a dictionary in the metadata of an item (see `ml.to_dict()`) without the timeseries data. The timeseries in the model are picked up based on the names of those series from the respective stores (oseries or stresses). The data that is stored with the model metadata is a dummy empty dataframe to allow the storage of the model with the Pystore library. This slightly 'hacky' method allows everything to be stored within the same directory.

In [18]:
ml2 = pr.get_models(ml.name)
ml2

Model(oseries=103JVM_boven_O, name=103JVM_boven_O, constant=True, noisemodel=True)
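
The idea of storing a model without its timeseries and picking the series up again by name can be sketched with plain dictionaries. The dictionary layout below is a simplified assumption and does not reproduce the real output of `ml.to_dict()`; `strip_series` and `restore_series` are hypothetical helpers.

```python
def strip_series(model_dict):
    """Return a copy of the model dict that keeps only the oseries name."""
    stripped = dict(model_dict)
    stripped["oseries"] = {"name": model_dict["oseries"]["name"]}
    return stripped


def restore_series(stripped, oseries_store):
    """Look the series up by name and re-attach its data to the model dict."""
    name = stripped["oseries"]["name"]
    restored = dict(stripped)
    restored["oseries"] = {"name": name, "series": oseries_store[name]}
    return restored


# Placeholder data standing in for a real timeseries.
oseries_store = {"103JVM_boven_O": [1.23, 1.25, 1.24]}
model = {
    "name": "103JVM_boven_O",
    "oseries": {"name": "103JVM_boven_O", "series": oseries_store["103JVM_boven_O"]},
}

stored = strip_series(model)                     # what goes into the model store
loaded = restore_series(stored, oseries_store)   # what comes back on loading
print(stored["oseries"])
```

Because the link is made by name only, a loaded model always uses the data that is currently in the oseries and stresses stores.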

### [4.4 Overview of models](#top)<a id="4.4"></a>

An overview of the models is available through `pr.models` which lists the names of all the models:

In [19]:
pr.models

['103JVM_boven_O']

Deleting a model is done using `pr.del_model()`:

In [20]:
pr.del_model(ml2.name)

<hr>

## [5. Bulk operations](#top)<a id="5"></a>

The following bulk operations are available:
- `create_models`: create models for all oseries in the database
- `solve_models`: solve all or a selection of the models in the database
- `model_results`: get results for all or a selection of the models in the database

The `pr.create_models()` method allows the user to create models for all or a selection of oseries in the database. Options include:
- selecting specific oseries to create models for
- automatically adding recharge based on nearest precipitation and evaporation stresses
- solving the models
- storing the models in the models library

These methods work virtually the same as the ArcticPastas methods. The only difference is that the item name must be passed to determine which timeseries should be loaded for each location.

**Note**: when using the progressbar (default), for a prettier result the pastas log level should be turned off or set to ERROR using: `ps.set_log_level("ERROR")`.

In [21]:
mls = pr.create_models(oseries=["WIJB020_G", "STRA001_G"], item="GW.meting.totaalcorrectie", store=True)

100%|██████████| 2/2 [00:00<00:00,  5.11it/s]


To solve all or a selection of models use `pr.solve_models()`. Options for this method include:
- selecting models to solve
- store results in models library
- raise error (or not) when solving fails
- print solve reports

In [22]:
pr.solve_models(mls=["WIJB020_G", "STRA001_G"], store_result=True)

100%|██████████| 2/2 [00:01<00:00,  1.09it/s]
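
The option to raise an error (or not) when solving fails amounts to a choice between stopping at the first failure and collecting failures while the loop continues. The sketch below shows that pattern with hypothetical names: `solve_all`, its `ignore_errors` parameter, `fake_solve` and the model name `BROKEN` are all assumptions and not the signature of `pr.solve_models()`.

```python
def solve_all(models, solve, ignore_errors=True):
    """Apply `solve` to every model name; collect or re-raise failures."""
    solved, failed = [], {}
    for name in models:
        try:
            solve(name)
            solved.append(name)
        except Exception as err:
            if not ignore_errors:
                raise                  # stop at the first failure
            failed[name] = str(err)    # remember the failure and continue
    return solved, failed


def fake_solve(name):
    """Stand-in solver that fails for one hypothetical model."""
    if name == "BROKEN":
        raise ValueError("solver did not converge")


solved, failed = solve_all(["WIJB020_G", "BROKEN", "STRA001_G"], fake_solve)
print(solved)
print(failed)
```

Collecting failures is convenient for long bulk runs, whereas re-raising is more useful while debugging a single problematic model.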


Obtaining the model results (parameters, EVP and some other statistics) requires the art_tools module. Results can be obtained for all or a selection of models. The result is a DataFrame with the results:

In [23]:
results = pr.model_results(mls=["WIJB020_G", "STRA001_G"])
results.T


100%|██████████| 2/2 [00:00<00:00,  3.58it/s]


| | WIJB020_G | STRA001_G |
|---|---|---|
| recharge_A | 50.8946 | 779.352 |
| recharge_n | 1.70619 | 1.38515 |
| recharge_a | 2.95011 | 80.6056 |
| recharge_f | -1.05884 | -0.720349 |
| constant_d | 6.98767 | 17.1878 |
| noise_alpha | 164.205 | 4998.12 |
| recharge_A_stderr | 0.472425 | 25.5664 |
| recharge_n_stderr | 0.00646726 | 0.00399895 |
| recharge_a_stderr | 0.0296395 | 1.9844 |
| recharge_f_stderr | 0.0292283 | 0.0171425 |


In [24]:
for m in results.index:
    pr.del_model(m)

In [25]:
pr

<PystorePastas object> 'aaenmaas': 789 oseries, 27 stresses, 0 models