# Managing timeseries and Pastas models with PastaStore<a id="top"></a>

This notebook shows how [Pastas](https://pastas.readthedocs.io/en/latest/) timeseries and models can be managed and stored on disk. Two storage systems are currently implemented:
- [Arctic](https://arctic.readthedocs.io/en/latest/) is a timeseries/dataframe database that sits atop [MongoDB](https://www.mongodb.com). Arctic supports pandas.DataFrames.
- [PyStore](https://github.com/ranaroussi/pystore) is a datastore (inspired by Arctic) created for storing pandas dataframes (especially timeseries) on disk. Data is stored using fastparquet and compressed with Snappy.

## Content
1. [Getting started](#1)
2. [The Connector objects](#2)
   1. [ArcticConnector](#2.1)
   2. [PystoreConnector](#2.2)
   3. [Database structure](#2.3)
3. [Initializing a PastaStore object](#3)
4. [Managing timeseries](#4)
   1. [Adding oseries and stresses](#4.1)
   2. [Accessing timeseries and metadata](#4.2)
   3. [Deleting oseries and stresses](#4.3)
   4. [Overview of oseries and stresses](#4.4)
5. [Managing Pastas models](#5)
   1. [Creating a model](#5.1)
   2. [Storing a model](#5.2)
   3. [Loading a model](#5.3)
   4. [Overview of models](#5.4)
   5. [Deleting models](#5.5)
6. [Bulk operations](#6)
7. [Deleting databases](#7)

<hr>


## [1. Getting started](#top)<a id="1"></a>

Use the following steps to get your PC ready for this notebook if you haven't done so already:

### Getting ready for Arctic
1. Install [Docker Desktop](https://www.docker.com/products/docker-desktop).
2. Run `docker-compose up -d` in a terminal from the `./dockerfiles` directory.

### Getting ready for Pystore
1. Install Snappy (see the [Pystore github](https://github.com/ranaroussi/pystore#dependencies) page for instructions). For Windows users, see [this page](https://www.lfd.uci.edu/~gohlke/pythonlibs/#python-snappy).
2. Install Pystore using pip: `pip install pystore`.


If no errors were encountered, you're all set. 
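
One way to double-check the setup is to ask Python whether the packages can be found at all. The sketch below is only a sanity check, and the module names `arctic`, `pystore` and `snappy` are assumptions about how these packages are imported in your environment:

```python
from importlib.util import find_spec


def check_importable(module_names):
    """Return {module name: True/False}, True if Python can locate the module."""
    return {name: find_spec(name) is not None for name in module_names}


# Assumed import names; adjust them if your installation differs.
status = check_importable(["arctic", "pystore", "snappy"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A module reported as missing only matters for the connector that needs it: the PystoreConnector does not need Arctic, and vice versa.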

<hr>

## [2. The Connector objects](#top)<a id="2"></a>
This section shows how to initialize a connection to a new database (connecting to an existing database works the same way).

Import `pastastore` and some other modules:

In [1]:
import os
import pandas as pd
import pastas as ps

import sys
sys.path.insert(1, "../..")

import pastastore as pst

### [2.1 ArcticConnector](#top)<a id="2.1"></a>

Provide information about the database. The connection string tells Arctic where the database is running (by default, if running locally, the address is `mongodb://localhost:<port number>`). The project name is the user-specified name for the database. 

If the database already exists, Arctic will connect to that existing database. In this case we're using a new database.

In [2]:
connstr = "mongodb://localhost:27017/"  # for docker container with name 'mongodb' running mongodb
name = "my_connector"
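
If you want to verify a connection string before handing it to the connector, the standard library can split it into its parts. This is a minimal sketch for the simple single-host form used here; it is not part of `pastastore` and does not cover every valid MongoDB URI (multiple hosts, for example):

```python
from urllib.parse import urlsplit

connstr = "mongodb://localhost:27017/"

# Split the connection string into scheme, host and port.
parts = urlsplit(connstr)
print(parts.scheme, parts.hostname, parts.port)
```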

Initialize an ArcticConnector object. In this case the object initializes a new database and provides the connection to that database.

_Note: You can ignore the warnings arctic throws at you about enabling sharding._

In [3]:
conn = pst.ArcticConnector(name, connstr)



Let's take a look at `conn`. This shows us how many oseries, stresses and models are contained in the database:

In [4]:
conn

<ArcticConnector object> 'my_connector': 0 oseries, 0 stresses, 0 models

As you can see, the database is empty.

### [2.2 PystoreConnector](#top)<a id="2.2"></a>

The PystoreConnector requires the path to the directory containing the stores and a name for the connector. If the store already exists, pystore will link to that existing store. In this case we're creating a new store.

In [5]:
path = "./pystore"
name = "my_second_connector"

Initialize the PystoreConnector object:

In [6]:
conn2 = pst.PystoreConnector(name, path)

Let's take a look at `conn2`. This shows us how many oseries, stresses and models are contained in the store:

In [7]:
conn2

<PystoreConnector object> 'my_second_connector': 2 oseries, 4 stresses, 2 models

### [2.3 Database structure](#top)<a id="2.3"></a>

Regardless of the type of Connector that is used, the database/store contains 3 libraries or collections. Each of these contains specific data related to the project. The three libraries are:
- oseries
- stresses
- models

These libraries can be accessed through `conn.get_library()`:

In [8]:
# using the ArcticConnector
conn.get_library("oseries")

<VersionStore at 0x276dc6c2d68>
    <ArcticLibrary at 0x276dc3ff940, arctic_my_connector.oseries>
        <Arctic at 0x276d68506a0, connected to MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, maxpoolsize=4, sockettimeoutms=600000, connecttimeoutms=2000, serverselectiontimeoutms=30000)>

In [9]:
# using the PystoreConnector
conn2.get_library("stresses")

PyStore.collection <stresses>

The library handles are generally not used directly; internally, they manage the reading, writing and deleting of data in the database/store. The two handles to the libraries above are completely different objects from two different packages (`arctic` and `pystore`). To understand what they're capable of and how they work, please refer to the documentation of their respective packages.

<hr>

## [3. Initializing a PastaStore object](#top)<a id="3"></a>

The `PastaStore` object is used to process and use the data in the database. The connector objects only manage the reading/writing/deleting of data. The `PastaStore` contains all kinds of methods to actually _do_ stuff with that data. 

In order to access the data the `PastaStore` object must be initialized with a Connector object. In this example, I'm using the `PystoreConnector`, but I could just as easily have used the `ArcticConnector`.

In [10]:
store = pst.PastaStore("my_first_project", conn2)

Let's take a look at the object:

In [11]:
store

<PastasProject> my_first_project: 
 - <PystoreConnector object> 'my_second_connector': 2 oseries, 4 stresses, 2 models

In [12]:
store.conn.stresses

| name | x | y | kind | _updated |
|---|---|---|---|---|
| prec2 | 100000.0 | 400000.0 | prec | 2020-03-17 12:12:52.878861 |
| evap2 | 100000.0 | 400000.0 | evap | 2020-03-17 12:12:52.896896 |
| evap1 | 100300.0 | 400400.0 | evap | 2020-03-17 12:12:50.761864 |
| prec1 | 100300.0 | 400400.0 | prec | 2020-03-17 12:12:50.728865 |


The most important thing to remember about the `PastaStore` is that the connector is accessible through `store.conn`. So all of the methods defined in the connector objects can be accessed through e.g. `store.conn.get_library`.

<hr>

## [4. Managing timeseries](#top)<a id="4"></a>

This section explains how timeseries can be added, retrieved or deleted from the database. We'll be using the `PastaStore` object we created before.

### [4.1 Adding oseries and stresses](#top)<a id="4.1"></a>

Let's read some data to put into the database as an oseries. The data we are using is in the `tests/data` directory.

In [13]:
datadir = "../../tests/data/"  # relative path to data directory
oseries1 = pd.read_csv(os.path.join(datadir, "head_nb1.csv"), index_col=0, parse_dates=True)
oseries1.head()

| date | head |
|---|---|
| 1985-11-14 | 27.61 |
| 1985-11-28 | 27.73 |
| 1985-12-14 | 27.91 |
| 1985-12-28 | 28.13 |
| 1986-01-13 | 28.32 |


Add the timeseries to the oseries library using `store.conn.add_oseries`. Metadata can optionally be provided as a dictionary. In this example, a dictionary containing x and y coordinates is passed as metadata, which is convenient later for automatically creating Pastas models. 

In [14]:
store.conn.add_oseries(oseries1, "oseries1", metadata={"x": 100300, "y": 400400})

The series was added to the oseries library. Let's confirm by looking at the `store` object:

In [15]:
store

<PastasProject> my_first_project: 
 - <PystoreConnector object> 'my_second_connector': 2 oseries, 4 stresses, 2 models

Stresses can be added similarly using `store.conn.add_stress`. The only thing to keep in mind when adding stresses is to pass the `kind` argument so that different types of stresses (e.g. precipitation or evaporation) can be distinguished. The code below reads the precipitation and evaporation csv-files and adds them to our project:

In [16]:
# prec 1
s = pd.read_csv(os.path.join(datadir, "rain_nb1.csv"), index_col=0, parse_dates=True)
store.conn.add_stress(s, "prec1", kind="prec", metadata={"x": 100300,
                                                    "y": 400400})

# evap 1
s = pd.read_csv(os.path.join(datadir, "evap_nb1.csv"), index_col=0, parse_dates=True)
store.conn.add_stress(s, "evap1", kind="evap", metadata={"x": 100300,
                                                    "y": 400400})

In [17]:
store

<PastasProject> my_first_project: 
 - <PystoreConnector object> 'my_second_connector': 2 oseries, 4 stresses, 2 models

### [4.2 Accessing timeseries and metadata](#top)<a id="4.2"></a>

Timeseries can be accessed through `store.conn.get_oseries()` or `store.conn.get_stresses()`. These methods accept just a name or a list of names. In the latter case, a dictionary of dataframes keyed by name is returned.

In [18]:
ts = store.conn.get_oseries("oseries1")
ts.head()

| date | head |
|---|---|
| 1985-11-14 | 27.61 |
| 1985-11-28 | 27.73 |
| 1985-12-14 | 27.91 |
| 1985-12-28 | 28.13 |
| 1986-01-13 | 28.32 |


Using a list of names:

In [19]:
stresses = store.conn.get_stresses(['prec1', 'evap1'])
stresses

{'prec1':               rain
 date              
 1980-01-01  0.0033
 1980-01-02  0.0025
 1980-01-03  0.0003
 1980-01-04  0.0075
 1980-01-05  0.0080
 ...            ...
 2016-10-27  0.0000
 2016-10-28  0.0000
 2016-10-29  0.0003
 2016-10-30  0.0000
 2016-10-31  0.0000
 
 [13454 rows x 1 columns], 'evap1':               evap
 date              
 1980-01-01  0.0002
 1980-01-02  0.0003
 1980-01-03  0.0002
 1980-01-04  0.0001
 1980-01-05  0.0001
 ...            ...
 2016-11-18  0.0004
 2016-11-19  0.0003
 2016-11-20  0.0005
 2016-11-21  0.0003
 2016-11-22  0.0005
 
 [13476 rows x 1 columns]}
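
The name-or-list behaviour shown above can be sketched with plain Python. The `get_series` function and the in-memory `library` dictionary are hypothetical stand-ins for illustration, not the connector's actual implementation; the values are the first two rows of each stress:

```python
def get_series(library, names):
    """Return one item for a single name, or a dict keyed by name for a list."""
    if isinstance(names, str):
        return library[names]
    return {name: library[name] for name in names}


# Hypothetical in-memory stand-in for the stresses library.
library = {"prec1": [0.0033, 0.0025], "evap1": [0.0002, 0.0003]}

single = get_series(library, "prec1")
several = get_series(library, ["prec1", "evap1"])
print(single)
print(list(several))
```

Checking for `str` first matters because a string is itself iterable; without that check a single name would be treated as a list of characters.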

The metadata of a timeseries can be accessed through `store.conn.get_metadata()`. Provide the library and the name to load the metadata for an oseries...

In [20]:
meta = store.conn.get_metadata('oseries', "oseries1")
meta

| name | x | y | _updated |
|---|---|---|---|
| oseries1 | 100300 | 400400 | 2020-03-17 13:01:52.854806 |


or for multiple stresses:

In [21]:
meta = store.conn.get_metadata('stresses', ["prec1", "evap1"])
meta

| name | x | y | kind | _updated |
|---|---|---|---|---|
| prec1 | 100300.0 | 400400.0 | prec | 2020-03-17 13:01:52.918807 |
| evap1 | 100300.0 | 400400.0 | evap | 2020-03-17 13:01:52.958770 |


### [4.3 Deleting oseries and stresses](#top)<a id="4.3"></a>

Deleting timeseries can be done using `store.conn.del_oseries` or `store.conn.del_stresses`. These functions accept a single name or list of names of timeseries to delete.

### [4.4 Overview of oseries and stresses](#top)<a id="4.4"></a>

An overview of the oseries and stresses is available through `store.conn.oseries` and `store.conn.stresses`. These are dataframes containing the metadata of all the timeseries. These dataframes are cached for performance. The cache is cleared when a timeseries is added or modified in the database. 

In [22]:
store.conn.oseries

| name | x | y | _updated |
|---|---|---|---|
| oseries2 | 100000.0 | 400000.0 | 2020-03-17 12:12:52.860898 |
| oseries1 | 100300.0 | 400400.0 | 2020-03-17 13:01:52.854806 |


In [23]:
store.conn.stresses

| name | x | y | kind | _updated |
|---|---|---|---|---|
| prec2 | 100000.0 | 400000.0 | prec | 2020-03-17 12:12:52.878861 |
| evap2 | 100000.0 | 400000.0 | evap | 2020-03-17 12:12:52.896896 |
| evap1 | 100300.0 | 400400.0 | evap | 2020-03-17 13:01:52.958770 |
| prec1 | 100300.0 | 400400.0 | prec | 2020-03-17 13:01:52.918807 |
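
The caching described in this section (build the overview once, clear it whenever a timeseries is added) follows a common pattern. The class below is a hypothetical, simplified illustration of that pattern and not the connector's real code; the `rebuilds` counter exists only to make the caching visible:

```python
class MetadataLibrary:
    """Toy library that caches its overview until new data is added."""

    def __init__(self):
        self._metadata = {}
        self._overview = None  # cached overview, built on first access
        self.rebuilds = 0      # counts how often the overview was rebuilt

    def add(self, name, metadata):
        self._metadata[name] = dict(metadata)
        self._overview = None  # adding data invalidates the cache

    @property
    def overview(self):
        if self._overview is None:
            self._overview = [{"name": n, **m} for n, m in self._metadata.items()]
            self.rebuilds += 1
        return self._overview


lib = MetadataLibrary()
lib.add("oseries1", {"x": 100300, "y": 400400})
lib.overview
lib.overview  # second access is served from the cache
rebuilds_before = lib.rebuilds

lib.add("oseries2", {"x": 100000, "y": 400000})
lib.overview  # cache was cleared, so the overview is rebuilt
rebuilds_after = lib.rebuilds
print(rebuilds_before, rebuilds_after)
```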


<hr>

## [5. Managing Pastas models](#top)<a id="5"></a>

This section shows how Pastas models can be created, stored, and loaded from the database.

### [5.1 Creating a model](#top)<a id="5.1"></a>
Creating a new model is straightforward using `store.create_model()`. The `add_recharge` keyword argument allows the user to choose (default is True) whether recharge is automatically added to the model using the closest precipitation and evaporation stations in the stresses library.

In [24]:
ml = store.create_model("oseries1", add_recharge=True)
ml

INFO: Cannot determine frequency of series oseries1
INFO:pastas.timeseries:Cannot determine frequency of series oseries1
INFO: Inferred frequency from time series prec1: freq=D 
INFO:pastas.timeseries:Inferred frequency from time series prec1: freq=D 
INFO: Inferred frequency from time series evap1: freq=D 
INFO:pastas.timeseries:Inferred frequency from time series evap1: freq=D 


Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)
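
To illustrate what "closest station" can mean, the sketch below picks the nearest stress of a given kind from the metadata shown earlier. It assumes straight-line (Euclidean) distance on the x and y coordinates; how `pastastore` actually selects stations is not shown in this notebook, and `nearest_stress` is a hypothetical helper:

```python
from math import hypot

# Stress metadata as stored in the stresses library (from the overview above).
stresses = {
    "prec1": {"x": 100300.0, "y": 400400.0, "kind": "prec"},
    "evap1": {"x": 100300.0, "y": 400400.0, "kind": "evap"},
    "prec2": {"x": 100000.0, "y": 400000.0, "kind": "prec"},
    "evap2": {"x": 100000.0, "y": 400000.0, "kind": "evap"},
}


def nearest_stress(x, y, kind, stresses):
    """Return the name of the stress of the given kind closest to (x, y)."""
    candidates = {n: m for n, m in stresses.items() if m["kind"] == kind}
    return min(
        candidates,
        key=lambda n: hypot(candidates[n]["x"] - x, candidates[n]["y"] - y),
    )


# oseries1 is located at x=100300, y=400400.
print(nearest_stress(100300, 400400, "prec", stresses))  # prec1
print(nearest_stress(100300, 400400, "evap", stresses))  # evap1
```

This is also why the `kind` entry and the coordinates in the metadata matter: without them there is nothing to filter or measure on.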

### [5.2 Storing a model](#top)<a id="5.2"></a>
The model that was created in the previous step is not automatically stored in the models library. Use `store.conn.add_model()` to store the model. If the model already exists, an Exception is raised warning the user the model is already in the library. Use `add_version=True` to add the model anyway.

**Note:**
The model is stored without the timeseries. It is assumed the timeseries are already stored in the oseries or stresses libraries, making it redundant to store these again in most cases. Obviously this has the potential downside that modifications to a timeseries prior to using it in a model will not be saved. In this implementation, the user is expected to add a new timeseries under a new name or version to the oseries and stresses libraries and create a new model using that data.

In [25]:
store.conn.add_model(ml, add_version=True)
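
The guard against accidentally storing a model twice can be pictured as follows. This is a hypothetical in-memory sketch: the `ModelLibrary` class, the `ValueError` and the list of versions are assumptions made for illustration, and the real connector may store versions differently:

```python
class ModelLibrary:
    """Toy models library that refuses duplicates unless add_version=True."""

    def __init__(self):
        self._versions = {}  # model name -> list of stored model dicts

    def add_model(self, name, model_dict, add_version=False):
        if name in self._versions and not add_version:
            raise ValueError(
                f"Model '{name}' is already in the library. "
                "Use add_version=True to add it anyway."
            )
        self._versions.setdefault(name, []).append(model_dict)

    def n_versions(self, name):
        return len(self._versions.get(name, []))


models = ModelLibrary()
models.add_model("oseries1", {"name": "oseries1"})

try:
    models.add_model("oseries1", {"name": "oseries1"})
except ValueError as err:
    print(err)

models.add_model("oseries1", {"name": "oseries1"}, add_version=True)
print(models.n_versions("oseries1"))
```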

### [5.3 Loading a model](#top)<a id="5.3"></a>

Loading a stored model is simple using `store.conn.get_models()`.

The model is stored as a dictionary (see `ml.to_dict()`) without the timeseries data. The timeseries in the model are picked up based on the names of those series from the respective libraries (oseries or stresses).

In [26]:
ml2 = store.conn.get_models("oseries1")
ml2

INFO: Cannot determine frequency of series oseries1
INFO:pastas.timeseries:Cannot determine frequency of series oseries1


Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)
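
The idea of storing a model without its timeseries and picking the series up again by name can be sketched with plain dictionaries. Everything below is a simplified, hypothetical stand-in (the real model dictionary from `ml.to_dict()` contains far more); the values are the first rows of the series used in this notebook:

```python
# Hypothetical in-memory stand-ins for the oseries and stresses libraries.
oseries_lib = {"oseries1": [27.61, 27.73, 27.91]}
stresses_lib = {
    "prec1": [0.0033, 0.0025, 0.0003],
    "evap1": [0.0002, 0.0003, 0.0002],
}


def to_stored(model):
    """Keep the series names, drop the series data."""
    return {
        "name": model["name"],
        "oseries": model["oseries"]["name"],
        "stresses": [s["name"] for s in model["stresses"]],
    }


def from_stored(stored):
    """Rebuild the model by looking up each series by name."""
    return {
        "name": stored["name"],
        "oseries": {
            "name": stored["oseries"],
            "values": oseries_lib[stored["oseries"]],
        },
        "stresses": [
            {"name": n, "values": stresses_lib[n]} for n in stored["stresses"]
        ],
    }


model = {
    "name": "oseries1",
    "oseries": {"name": "oseries1", "values": oseries_lib["oseries1"]},
    "stresses": [
        {"name": "prec1", "values": stresses_lib["prec1"]},
        {"name": "evap1", "values": stresses_lib["evap1"]},
    ],
}

stored = to_stored(model)
restored = from_stored(stored)
print(stored)
```

The sketch also shows the downside mentioned in section 5.2: if a series in the library changes after the model was stored, the restored model silently picks up the new data.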

### [5.4 Overview of models](#top)<a id="5.4"></a>

An overview of the models is available through `store.conn.models` which lists the names of all the models:

In [27]:
store.conn.models

{'oseries1', 'oseries2'}

### [5.5 Deleting models](#top)<a id="5.5"></a>

Deleting the model is done with `store.conn.del_models`:

In [28]:
store.conn.del_models("oseries1")

Checking to see if it was indeed deleted:

In [29]:
store

<PastasProject> my_first_project: 
 - <PystoreConnector object> 'my_second_connector': 2 oseries, 4 stresses, 1 models

In [30]:
store.conn.models

{'oseries2'}

<hr>

## [6. Bulk operations](#top)<a id="6"></a>

The following bulk operations are available:
- `create_models`: create models for all or a selection of oseries in database
- `solve_models`: solve all or selection of models in database
- `model_results`: get results for all or selection of models in database. Requires the `art_tools` module!

Let's add some more data to the pystore to show how the bulk operations work.

In [31]:
# oseries 2
o = pd.read_csv(os.path.join(datadir, "obs.csv"), index_col=0, parse_dates=True)
o.index.name = "oseries2"
store.conn.add_oseries(o, "oseries2", metadata={"x": 100000,
                                           "y": 400000})

# prec 2
s = pd.read_csv(os.path.join(datadir, "rain.csv"), index_col=0, parse_dates=True)
store.conn.add_stress(s, "prec2", kind="prec", metadata={"x": 100000,
                                                    "y": 400000})

# evap 2
s = pd.read_csv(os.path.join(datadir, "evap.csv"), index_col=0, parse_dates=True)
store.conn.add_stress(s, "evap2", kind="evap", metadata={"x": 100000,
                                                    "y": 400000})

Let's take a look at our `PastaStore`:

In [32]:
store

<PastasProject> my_first_project: 
 - <PystoreConnector object> 'my_second_connector': 2 oseries, 4 stresses, 1 models


Let's try using the bulk methods on our database. The `store.create_models()` method allows the user to get models for all or a selection of oseries in the database. Options include:
- selecting specific oseries to create models for
- automatically adding recharge based on nearest precipitation and evaporation stresses
- solving the models
- storing the models in the models library

**Note**: when using the progressbar, for a prettier result the pastas log level should be set to ERROR using: `ps.set_log_level("ERROR")` or `ps.logger.setLevel("ERROR")`.

In [33]:
# to suppress most of the log messages
ps.logger.setLevel("ERROR")

In [34]:
mls = store.create_models(store=True)

100%|██████████| 2/2 [00:00<00:00,  6.77it/s]


To solve all or a selection of models use `store.solve_models()`. Options for this method include:
- selecting models to solve
- store results in models library
- raise error (or not) when solving fails
- print solve reports

In [35]:
store

<PastasProject> my_first_project: 
 - <PystoreConnector object> 'my_second_connector': 2 oseries, 4 stresses, 2 models

In [36]:
store.solve_models(store_result=True, report=False)

100%|██████████| 2/2 [00:01<00:00,  1.53it/s]
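
The option to raise an error (or not) when solving fails comes down to a simple pattern for bulk loops. The sketch below is an illustration only: `apply_to_all`, its `ignore_errors` argument and `fake_solve` are hypothetical names, not part of `pastastore`:

```python
def apply_to_all(names, func, ignore_errors=True):
    """Apply func to every name; collect failures instead of stopping if asked."""
    results, failed = {}, {}
    for name in names:
        try:
            results[name] = func(name)
        except Exception as exc:
            if not ignore_errors:
                raise
            failed[name] = exc
    return results, failed


def fake_solve(name):
    """Stand-in for solving a model; fails for one name on purpose."""
    if name == "broken":
        raise RuntimeError(f"could not solve {name}")
    return f"{name} solved"


results, failed = apply_to_all(["oseries1", "broken", "oseries2"], fake_solve)
print(results)
print(list(failed))
```

Collecting failures keeps one problematic model from aborting a long run, while `ignore_errors=False` is more useful when debugging a single failure.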


Obtaining the model results (parameters, EVP and some other statistics) requires the `art_tools` module. Results can be obtained for all or a selection of models. The result is a DataFrame with the results:

In [37]:
results = store.model_results()
results.transpose()

flopy is installed in c:\github\flopy\flopy
100%|██████████| 2/2 [00:00<00:00,  4.72it/s]


| | oseries2 | oseries1 |
|---|---|---|
| recharge_A | 600.68 | 683.214 |
| recharge_n | 1.02046 | 1.01644 |
| recharge_a | 143.071 | 151.557 |
| recharge_f | -1.3681 | -1.27579 |
| constant_d | 28.0242 | 27.8888 |
| noise_alpha | 65.2511 | 49.721 |
| recharge_A_stderr | 125.12 | 35.7075 |
| recharge_n_stderr | 0.0404976 | 0.0181703 |
| recharge_a_stderr | 30.5517 | 11.4023 |
| recharge_f_stderr | 0.174908 | 0.0608557 |


## [7. Deleting databases](#top)<a id="7"></a>

The `pastastore.util` submodule contains functions for deleting databases:

In [38]:
pst.util.delete_arctic(conn.connstr, conn.name)

Deleting database: 'my_connector' ...
 - deleted: my_connector.oseries
 - deleted: my_connector.stresses
 - deleted: my_connector.models


In [40]:
pst.util.delete_pystore(conn2.path, conn2.name)

Deleting pystore: 'my_second_connector' ... Done!
