# Parallel processing with Pastastore

This notebook demonstrates the parallel processing capabilities of `PastaStore`.


<div class="alert alert-warning">

<strong>Note:</strong> Parallel processing is platform dependent and may not
always work. The current implementation works well for Linux users, though this
will likely change with Python 3.13 and higher. For Windows users, parallel
solving does not work when called directly from Jupyter Notebooks or IPython.
To use parallel solving on Windows, the following code should be used in a
Python file:

<pre><code class="python">
from multiprocessing import freeze_support

if __name__ == "__main__":
    freeze_support()
    pstore.apply("models", some_func, parallel=True)
</code></pre>

</div>
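The platform dependence mentioned in the note comes largely from how `multiprocessing` starts worker processes. The sketch below uses only the standard library to report the default start method on the current platform. The helper names `default_start_method` and `needs_main_guard` are hypothetical and are not part of pastastore.

```python
import multiprocessing as mp


def default_start_method() -> str:
    """Return the start method multiprocessing uses by default on this platform."""
    # The first entry of get_all_start_methods() is documented as the default.
    return mp.get_all_start_methods()[0]


def needs_main_guard(method: str) -> bool:
    """Return True if worker processes import the main module again."""
    # "spawn" and "forkserver" start fresh interpreters that import the main
    # module, so unguarded top-level code would run again in every worker.
    return method in {"spawn", "forkserver"}


if __name__ == "__main__":
    method = default_start_method()
    print(f"default start method: {method}")
    print(f"main guard needed: {needs_main_guard(method)}")
```

On platforms where the default is `spawn`, such as Windows, the `if __name__ == "__main__":` guard shown in the note keeps the parallel call from being executed again inside each worker.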

In [1]:
import pastas as ps

import pastastore as pst
from pastastore.datasets import example_pastastore

ps.logger.setLevel("ERROR")  # silence Pastas logger for this notebook
pst.show_versions()

Pastastore version : 1.7.2

Python version     : 3.11.10
Pandas version     : 2.2.2
Matplotlib version : 3.9.2
Pastas version     : 1.7.0
PyYAML version     : 6.0.2





## Example pastastore

Load some example data, create models and solve them to showcase parallel processing.

In [2]:
# get the example pastastore
conn = pst.PasConnector("my_connector", "./temp")
# conn = pst.ArcticDBConnector("my_connector", "lmdb://./temp")
pstore = example_pastastore(conn)
pstore.create_models_bulk();

PasConnector: library 'oseries' created in '/home/david/github/pastastore/examples/notebooks/temp/my_connector/oseries'
PasConnector: library 'stresses' created in '/home/david/github/pastastore/examples/notebooks/temp/my_connector/stresses'
PasConnector: library 'models' created in '/home/david/github/pastastore/examples/notebooks/temp/my_connector/models'
PasConnector: library 'oseries_models' created in '/home/david/github/pastastore/examples/notebooks/temp/my_connector/oseries_models'


Bulk creation models:   0%|          | 0/5 [00:00<?, ?it/s]

## Solving models

The `PastaStore.solve_models()` method supports parallel processing.

In [3]:
pstore.solve_models(parallel=True)

Solving models (parallel):   0%|          | 0/5 [00:00<?, ?it/s]

## Parallel processing using `.apply()`

Define a function that takes a model name as input and returns a result. In this case,
it returns the $R^2$ value for each model.

In [4]:
def rsq(model_name: str) -> float:
    """Compute the R-squared value of a Pastas model."""
    ml = pstore.get_models(model_name)
    return ml.stats.rsq()

We can apply this function to all models in the pastastore using `pstore.apply()`.
By default, the function is applied to the models sequentially.

In [5]:
pstore.apply("models", rsq, progressbar=True)

Applying rsq:   0%|          | 0/5 [00:00<?, ?it/s]

head_mw     0.159352
head_nb5    0.438129
oseries2    0.931883
oseries1    0.904480
oseries3    0.030468
dtype: float64

To run this function in parallel, pass `parallel=True` as a keyword argument.

In [6]:
pstore.apply("models", rsq, progressbar=True, parallel=True)

Applying rsq (parallel):   0%|          | 0/5 [00:00<?, ?it/s]

head_mw     0.159352
head_nb5    0.438129
oseries2    0.931883
oseries1    0.904480
oseries3    0.030468
dtype: float64
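The pattern behind the two calls above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not pastastore's actual implementation: `apply_to_names` and `name_length` are hypothetical names, and the result is a plain `dict` instead of a pandas Series.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def name_length(name: str) -> int:
    """Toy stand-in for a per-model function such as `rsq`."""
    return len(name)


def apply_to_names(
    names: Iterable[str],
    func: Callable[[str], object],
    parallel: bool = False,
    max_workers: Optional[int] = None,
) -> Dict[str, object]:
    """Apply `func` to every name and collect the results by name."""
    names = list(names)
    if parallel:
        # executor.map returns results in input order, so the parallel and
        # sequential branches build identical dictionaries.
        with ProcessPoolExecutor(max_workers=max_workers) as executor:
            results = list(executor.map(func, names))
    else:
        results = [func(name) for name in names]
    return dict(zip(names, results))


# Sequential run; passing parallel=True would send the same work to a process pool.
print(apply_to_names(["head_mw", "head_nb5", "oseries1"], name_length))
```

In the parallel branch, the function has to be sent to the worker processes, which is why it is defined at module level here.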

## Get model statistics

The `pstore.get_statistics()` method also supports parallel processing.

In [7]:
pstore.get_statistics(["rsq", "mae"])

| | rsq | mae |
|---|---|---|
| head_mw | 0.159352 | 0.631499 |
| head_nb5 | 0.438129 | 0.318361 |
| oseries2 | 0.931883 | 0.08707 |
| oseries1 | 0.90448 | 0.091339 |
| oseries3 | 0.030468 | 0.106254 |


In [8]:
pstore.get_statistics(["rsq", "mae"], parallel=True)

| _get_statistics | rsq | mae |
|---|---|---|
| head_mw | 0.159352 | 0.631499 |
| head_nb5 | 0.438129 | 0.318361 |
| oseries2 | 0.931883 | 0.08707 |
| oseries1 | 0.90448 | 0.091339 |
| oseries3 | 0.030468 | 0.106254 |


## Compute prediction intervals

Next, pass a more complex function to `.apply()` and run it with parallel
processing. In this case we want to compute the prediction interval for each
model, and pass along the $\alpha$ value via the `kwargs` argument.

In [9]:
def prediction_interval(model_name, **kwargs):
    """Compute the prediction interval for a Pastas model."""
    ml = pstore.get_models(model_name)
    return ml.solver.prediction_interval(**kwargs)

In [10]:
pstore.apply("models", prediction_interval, kwargs={"alpha": 0.05})

Applying prediction_interval:   0%|          | 0/5 [00:00<?, ?it/s]

| | head_mw (0.025) | head_mw (0.975) | head_nb5 (0.025) | head_nb5 (0.975) | oseries2 (0.025) | oseries2 (0.975) | oseries1 (0.025) | oseries1 (0.975) | oseries3 (0.025) | oseries3 (0.975) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1960-04-29 | 6.255135 | 9.433007 | | | | | | | | |
| 1960-04-30 | 6.269678 | 9.418478 | | | | | | | | |
| 1960-05-01 | 6.269093 | 9.446798 | | | | | | | | |
| 1960-05-02 | 6.300421 | 9.496691 | | | | | | | | |
| 1960-05-03 | 6.238175 | 9.458558 | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-01-17 | | | 7.958785 | 9.637916 | | | | | | |
| 2020-01-18 | | | 7.945845 | 9.633597 | | | | | | |
| 2020-01-19 | | | 7.960407 | 9.672532 | | | | | | |
| 2020-01-20 | | | 7.956232 | 9.653112 | | | | | | |


In [11]:
pstore.apply("models", prediction_interval, kwargs={"alpha": 0.05}, parallel=True)

Applying prediction_interval (parallel):   0%|          | 0/5 [00:00<?, ?it/s]

| | head_mw (0.025) | head_mw (0.975) | head_nb5 (0.025) | head_nb5 (0.975) | oseries2 (0.025) | oseries2 (0.975) | oseries1 (0.025) | oseries1 (0.975) | oseries3 (0.025) | oseries3 (0.975) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1960-04-29 | 6.240644 | 9.460150 | | | | | | | | |
| 1960-04-30 | 6.349329 | 9.506166 | | | | | | | | |
| 1960-05-01 | 6.247266 | 9.401046 | | | | | | | | |
| 1960-05-02 | 6.175220 | 9.274749 | | | | | | | | |
| 1960-05-03 | 6.127692 | 9.413533 | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-01-17 | | | 7.920101 | 9.642716 | | | | | | |
| 2020-01-18 | | | 7.909466 | 9.597625 | | | | | | |
| 2020-01-19 | | | 7.962732 | 9.637139 | | | | | | |
| 2020-01-20 | | | 7.870152 | 9.619891 | | | | | | |
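Forwarding keyword arguments to the applied function, as done with `kwargs={"alpha": 0.05}` above, can be sketched with `functools.partial`. This is only an illustration of the idea, and whether pastastore binds the arguments this way internally is an assumption. The names `quantile_levels` and `apply_with_kwargs` are hypothetical.

```python
from functools import partial
from typing import Callable, Dict, Iterable, Optional, Tuple


def quantile_levels(name: str, alpha: float = 0.05) -> Tuple[float, float]:
    """Toy stand-in: return the lower and upper quantile levels for `alpha`."""
    return (alpha / 2, 1 - alpha / 2)


def apply_with_kwargs(
    names: Iterable[str],
    func: Callable[..., object],
    kwargs: Optional[dict] = None,
) -> Dict[str, object]:
    """Bind `kwargs` to `func` once, then call the bound function per name."""
    bound = partial(func, **(kwargs or {}))
    return {name: bound(name) for name in names}


print(apply_with_kwargs(["head_mw", "head_nb5"], quantile_levels, kwargs={"alpha": 0.05}))
```

With $\alpha = 0.05$ the two levels are 0.025 and 0.975, which matches the column labels in the tables above. A `partial` object can be pickled as long as the wrapped function can, so the same binding also suits a process pool.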


## Load models

Load models using `.apply()`, first sequentially and then in parallel.

In [12]:
pstore.apply("models", pstore.get_models, fancy_output=True)

Applying get_models:   0%|          | 0/5 [00:00<?, ?it/s]

{'head_mw': Model(oseries=head_mw, name=head_mw, constant=True, noisemodel=False),
 'head_nb5': Model(oseries=head_nb5, name=head_nb5, constant=True, noisemodel=False),
 'oseries2': Model(oseries=oseries2, name=oseries2, constant=True, noisemodel=False),
 'oseries1': Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=False),
 'oseries3': Model(oseries=oseries3, name=oseries3, constant=True, noisemodel=False)}

The `max_workers` keyword argument sets the number of workers that are spawned. The default value is often fine, but it can be set explicitly.

The following works for `PasConnector`. See alternative code below for `ArcticDBConnector`.  

In [13]:
pstore.apply(
    "models", pstore.get_models, fancy_output=True, parallel=True, max_workers=5
)

Applying get_models (parallel):   0%|          | 0/5 [00:00<?, ?it/s]

{'head_mw': Model(oseries=head_mw, name=head_mw, constant=True, noisemodel=False),
 'head_nb5': Model(oseries=head_nb5, name=head_nb5, constant=True, noisemodel=False),
 'oseries2': Model(oseries=oseries2, name=oseries2, constant=True, noisemodel=False),
 'oseries1': Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=False),
 'oseries3': Model(oseries=oseries3, name=oseries3, constant=True, noisemodel=False)}
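When choosing `max_workers` explicitly, a common rule of thumb is to request no more workers than there are tasks or CPU cores. The helper below is a hypothetical sketch of that rule using only the standard library. It does not describe how pastastore picks its default.

```python
import os
from typing import Optional


def choose_workers(n_tasks: int, max_workers: Optional[int] = None) -> int:
    """Pick a worker count that does not exceed the number of tasks."""
    # os.cpu_count() may return None, so fall back to a single worker.
    available = os.cpu_count() or 1
    requested = available if max_workers is None else max_workers
    # Always return at least one worker, even when there are no tasks.
    return max(1, min(requested, n_tasks))


print(choose_workers(n_tasks=5, max_workers=8))
```

For the five models in this example, asking for more than five workers would leave the extra workers idle.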

For `ArcticDBConnector` the underlying objects that manage the database connection cannot be pickled. Therefore, passing a method directly from the `PastaStore` or `ArcticDBConnector` classes will not work in parallel mode. 

The solution is to write a simple function that assumes there is a global connector object `conn` and uses it to obtain data from the database.

In [14]:
# Simple function to get models from database
def get_model(model_name):
    return conn.get_model(model_name)

In [15]:
pstore.apply("models", get_model, fancy_output=True, parallel=True, max_workers=5)

Applying get_model (parallel):   0%|          | 0/5 [00:00<?, ?it/s]

{'head_mw': Model(oseries=head_mw, name=head_mw, constant=True, noisemodel=False),
 'head_nb5': Model(oseries=head_nb5, name=head_nb5, constant=True, noisemodel=False),
 'oseries2': Model(oseries=oseries2, name=oseries2, constant=True, noisemodel=False),
 'oseries1': Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=False),
 'oseries3': Model(oseries=oseries3, name=oseries3, constant=True, noisemodel=False)}
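The pickling problem can be reproduced without a database. In the sketch below, `ToyConnector` is a hypothetical stand-in that holds a `threading.Lock` in place of a real database handle, and `is_picklable` is a small helper written for this illustration. Neither is part of pastastore or ArcticDB.

```python
import pickle
import threading


class ToyConnector:
    """Hypothetical connector that holds a handle which cannot be pickled."""

    def __init__(self) -> None:
        self._handle = threading.Lock()  # stand-in for a database handle
        self._models = {"head_mw": "model data"}

    def get_model(self, name: str) -> str:
        return self._models[name]


def is_picklable(obj: object) -> bool:
    """Return True if `obj` can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


conn = ToyConnector()


def get_model(name: str) -> str:
    """Module-level function that looks up the global `conn` when called."""
    return conn.get_model(name)


# Pickling a bound method also pickles its instance, including the lock.
print(is_picklable(conn.get_model))
# A module-level function is pickled by reference to its name only.
print(is_picklable(get_model))
```

Because the wrapper function is sent to the workers by name, the connector itself never has to be pickled. Each worker process then needs its own `conn`, either inherited when the process is forked or created again when the module is imported.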

Clean up temporary pastastore.

In [16]:
pst.util.delete_pastastore(pstore)

Deleting PasConnector database: 'my_connector' ...  Done!
