# Parallel processing with Pastastore

This notebook demonstrates the parallel processing capabilities of `PastaStore`.


<div class="alert alert-warning">

<strong>Note</strong> 

Parallel processing is platform dependent and may not
always work. The current implementation works well on Linux, though this
will likely change with Python 3.13 and higher. On Windows, parallel
solving does not work when called directly from Jupyter Notebooks or IPython.
To use parallel solving on Windows, run the following code from a
Python file.

</div>

```python
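from multiprocessing import freeze_support

# `pstore` (a PastaStore) and `some_func` (a function that takes a model name)
# are placeholders, assumed to be defined earlier in this file.
if __name__ == "__main__":
    freeze_support()
    pstore.apply("models", some_func, parallel=True)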
```

In [1]:
import pastas as ps

import pastastore as pst
from pastastore.datasets import example_pastastore

ps.logger.setLevel("ERROR")  # silence Pastas logger for this notebook
pst.show_versions()

Pastastore version : 1.11.0.dev0

Python version     : 3.13.5
Pandas version     : 2.3.1
Matplotlib version : 3.10.5
Pastas version     : 1.10.0
PyYAML version     : 6.0.2





## Example pastastore

Load some example data, create models and solve them to showcase parallel processing.

In [2]:
# get the example pastastore
conn = pst.PasConnector("my_connector", "./temp")
# conn = pst.ArcticDBConnector("my_connector", "lmdb://./temp")
pstore = example_pastastore(conn)
pstore.create_models_bulk();

PasConnector: library 'oseries' already exists. Linking to existing directory: '/home/vonkm/repos/pastastore/examples/notebooks/temp/my_connector/oseries'
PasConnector: library 'stresses' already exists. Linking to existing directory: '/home/vonkm/repos/pastastore/examples/notebooks/temp/my_connector/stresses'
PasConnector: library 'models' already exists. Linking to existing directory: '/home/vonkm/repos/pastastore/examples/notebooks/temp/my_connector/models'
PasConnector: library 'oseries_models' already exists. Linking to existing directory: '/home/vonkm/repos/pastastore/examples/notebooks/temp/my_connector/oseries_models'


Bulk creation models: 100%|██████████| 5/5 [00:00<00:00, 21.43it/s]


## Solving models

The `PastaStore.solve_models()` method supports parallel processing.

In [3]:
pstore.solve_models(parallel=True)

Solving models (parallel): 100%|██████████| 5/5 [00:00<00:00, 17.32it/s]


## Parallel processing using `.apply()`

Define a function that takes a model name as input and returns a result. In this case,
the function returns the $R^2$ value of the model.

In [4]:
def rsq(model_name: str) -> float:
    """Compute the R-squared value of a Pastas model."""
    ml = pstore.get_models(model_name)
    return ml.stats.rsq()

We can apply this function to all models in the pastastore using `pstore.apply()`.
By default, the function is applied sequentially.

In [5]:
pstore.apply("models", rsq, progressbar=True)

Applying rsq: 100%|██████████| 5/5 [00:00<00:00, 28.98it/s]


head_nb5    0.438129
oseries1    0.904480
oseries3    0.030468
head_mw     0.159352
oseries2    0.931883
dtype: float64

To run this function in parallel, pass `parallel=True`.

In [6]:
pstore.apply("models", rsq, progressbar=True, parallel=True)

Applying rsq (parallel): 100%|██████████| 5/5 [00:00<00:00, 66.07it/s]


head_nb5    0.438129
oseries1    0.904480
oseries3    0.030468
head_mw     0.159352
oseries2    0.931883
dtype: float64
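
Conceptually, an apply-style helper maps a function over a list of names and collects the results by name. The following is a minimal sketch of that pattern using only the standard library. The names `apply_to_names` and `name_length` are hypothetical and this is not pastastore's actual implementation. A thread pool is used here only as a stand-in for the parallel workers, so that the sketch runs anywhere, including inside a notebook.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_to_names(names, func, parallel=False, max_workers=None):
    """Map func over names and return a {name: result} dict (hypothetical helper)."""
    if parallel:
        # Assumption: a thread pool stands in for the parallel workers.
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # executor.map returns results in the order of the inputs
            results = list(executor.map(func, names))
    else:
        results = [func(name) for name in names]
    return dict(zip(names, results))


def name_length(name):
    """Toy stand-in for a per-model computation such as rsq."""
    return len(name)


names = ["head_nb5", "oseries1", "oseries3", "head_mw", "oseries2"]
sequential = apply_to_names(names, name_length)
threaded = apply_to_names(names, name_length, parallel=True)

# Both modes should give the same results in the same order.
print(sequential == threaded)
```

Because the results are keyed by name and returned in input order, the sequential and parallel calls can be compared directly, just like the two `rsq` outputs above.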

## Get model statistics

The `pstore.get_statistics()` method also supports parallel processing.

In [7]:
pstore.get_statistics(["rsq", "mae"])

|  | rsq | mae |
|---|---|---|
| head_nb5 | 0.438129 | 0.318361 |
| oseries1 | 0.90448 | 0.091339 |
| oseries3 | 0.030468 | 0.106254 |
| head_mw | 0.159352 | 0.631499 |
| oseries2 | 0.931883 | 0.08707 |


In [8]:
pstore.get_statistics(["rsq", "mae"], parallel=True)

| _get_statistics | rsq | mae |
|---|---|---|
| head_nb5 | 0.438129 | 0.318361 |
| oseries1 | 0.90448 | 0.091339 |
| oseries3 | 0.030468 | 0.106254 |
| head_mw | 0.159352 | 0.631499 |
| oseries2 | 0.931883 | 0.08707 |


## Compute prediction intervals

Next, pass a more complex function to `pstore.apply()` and run it in
parallel. In this case we want to compute the prediction interval
and pass the $\alpha$ value along via the `kwargs` argument.

In [9]:
def prediction_interval(model_name, **kwargs):
    """Compute the prediction interval for a Pastas model."""
    ml = pstore.get_models(model_name)
    return ml.solver.prediction_interval(**kwargs)

In [10]:
pstore.apply("models", prediction_interval, kwargs={"alpha": 0.05})

Applying prediction_interval: 100%|██████████| 5/5 [00:09<00:00,  2.00s/it]


|  | head_nb5 (0.025) | head_nb5 (0.975) | oseries1 (0.025) | oseries1 (0.975) | oseries3 (0.025) | oseries3 (0.975) | head_mw (0.025) | head_mw (0.975) | oseries2 (0.025) | oseries2 (0.975) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1960-04-29 |  |  |  |  |  |  | 6.313584 | 9.462565 |  |  |
| 1960-04-30 |  |  |  |  |  |  | 6.228746 | 9.437007 |  |  |
| 1960-05-01 |  |  |  |  |  |  | 6.174325 | 9.421278 |  |  |
| 1960-05-02 |  |  |  |  |  |  | 6.409471 | 9.478547 |  |  |
| 1960-05-03 |  |  |  |  |  |  | 6.176680 | 9.544383 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-01-17 | 7.925080 | 9.679467 |  |  |  |  |  |  |  |  |
| 2020-01-18 | 7.983536 | 9.645777 |  |  |  |  |  |  |  |  |
| 2020-01-19 | 7.906361 | 9.603561 |  |  |  |  |  |  |  |  |
| 2020-01-20 | 7.892349 | 9.672112 |  |  |  |  |  |  |  |  |


In [11]:
pstore.apply("models", prediction_interval, kwargs={"alpha": 0.05}, parallel=True)

Applying prediction_interval (parallel): 100%|██████████| 5/5 [00:05<00:00,  1.10s/it]


|  | head_nb5 (0.025) | head_nb5 (0.975) | oseries1 (0.025) | oseries1 (0.975) | oseries3 (0.025) | oseries3 (0.975) | head_mw (0.025) | head_mw (0.975) | oseries2 (0.025) | oseries2 (0.975) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1960-04-29 |  |  |  |  |  |  | 6.363005 | 9.496189 |  |  |
| 1960-04-30 |  |  |  |  |  |  | 6.221109 | 9.525906 |  |  |
| 1960-05-01 |  |  |  |  |  |  | 6.252436 | 9.402250 |  |  |
| 1960-05-02 |  |  |  |  |  |  | 6.094721 | 9.480736 |  |  |
| 1960-05-03 |  |  |  |  |  |  | 6.198057 | 9.318729 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-01-17 | 7.863516 | 9.600071 |  |  |  |  |  |  |  |  |
| 2020-01-18 | 7.962156 | 9.610467 |  |  |  |  |  |  |  |  |
| 2020-01-19 | 7.895273 | 9.660795 |  |  |  |  |  |  |  |  |
| 2020-01-20 | 7.940898 | 9.655595 |  |  |  |  |  |  |  |  |
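
Forwarding extra keyword arguments to every call, as done with `kwargs={"alpha": 0.05}` above, can be sketched with `functools.partial`. This is one plausible way to do it, not necessarily how pastastore implements it. The names `interval_bounds` and `apply_with_kwargs` are hypothetical, and the toy function only returns the two quantiles that belong to a given `alpha` instead of a real prediction interval.

```python
from functools import partial


def interval_bounds(name, alpha=0.05):
    """Toy stand-in: return the lower and upper quantile for a given alpha."""
    return (alpha / 2, 1 - alpha / 2)


def apply_with_kwargs(names, func, kwargs=None):
    """Call func(name, **kwargs) for every name (hypothetical helper)."""
    # Bind the extra keyword arguments once, then call with the name only.
    bound = partial(func, **(kwargs or {}))
    return {name: bound(name) for name in names}


results = apply_with_kwargs(
    ["head_nb5", "head_mw"], interval_bounds, kwargs={"alpha": 0.05}
)
for name, (lower, upper) in results.items():
    print(name, round(lower, 3), round(upper, 3))
```

With `alpha=0.05` the two quantiles correspond to the `0.025` and `0.975` columns in the output tables above.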


## Get signatures

The `pstore.get_signatures()` method does not explicitly support parallel processing, but it can be used in combination with `pstore.apply()`.

In [12]:
signatures = [
    "cv_period_mean",
    "cv_date_min",
    "cv_date_max",
    "cv_fall_rate",
    "cv_rise_rate",
]

In [13]:
pstore.get_signatures(signatures=signatures)

|  | head_nb5 | oseries1 | oseries3 | head_mw | oseries2 |
|---|---|---|---|---|---|
| cv_period_mean | 0.061879 | 0.013066 | 0.029168 | 0.145062 | 0.015199 |
| cv_date_min | 0.246021 | 0.145884 | 1.394852 | 0.254627 | 0.128636 |
| cv_date_max | 1.262425 | 0.300328 | 0.444442 | 1.083929 | 0.722945 |
| cv_fall_rate | -1.13645 | -0.744797 | -1.032837 | -1.4302 | -0.722718 |
| cv_rise_rate | 1.25945 | 0.862981 | 0.931181 | 1.097257 | 0.836678 |


In [14]:
pstore.apply(
    "oseries", pstore.get_signatures, kwargs={"signatures": signatures}, parallel=True
)

Applying get_signatures (parallel): 100%|██████████| 5/5 [00:00<00:00, 36.23it/s]


| get_signatures | head_nb5 | oseries1 | oseries3 | head_mw | oseries2 |
|---|---|---|---|---|---|
| cv_period_mean | 0.061879 | 0.013066 | 0.029168 | 0.145062 | 0.015199 |
| cv_date_min | 0.246021 | 0.145884 | 1.394852 | 0.254627 | 0.128636 |
| cv_date_max | 1.262425 | 0.300328 | 0.444442 | 1.083929 | 0.722945 |
| cv_fall_rate | -1.13645 | -0.744797 | -1.032837 | -1.4302 | -0.722718 |
| cv_rise_rate | 1.25945 | 0.862981 | 0.931181 | 1.097257 | 0.836678 |


## Load models

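Models can also be loaded with `pstore.apply()`. The first call below runs sequentially; the second call passes `parallel=True`.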

In [15]:
pstore.apply("models", pstore.get_models, fancy_output=True)

Applying get_models: 100%|██████████| 5/5 [00:00<00:00, 13.07it/s]


{'head_nb5': Model(oseries=head_nb5, name=head_nb5, constant=True, noisemodel=False),
 'oseries1': Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=False),
 'oseries3': Model(oseries=oseries3, name=oseries3, constant=True, noisemodel=False),
 'head_mw': Model(oseries=head_mw, name=head_mw, constant=True, noisemodel=False),
 'oseries2': Model(oseries=oseries2, name=oseries2, constant=True, noisemodel=False)}

The `max_workers` keyword argument sets the number of workers that are spawned. The default value is often fine, but it can be set explicitly.

The following works for `PasConnector`. See the alternative code below for `ArcticDBConnector`.

In [16]:
pstore.apply(
    "models", pstore.get_models, fancy_output=True, parallel=True, max_workers=5
)

Applying get_models (parallel): 100%|██████████| 5/5 [00:00<00:00, 25.60it/s]


{'head_nb5': Model(oseries=head_nb5, name=head_nb5, constant=True, noisemodel=False),
 'oseries1': Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=False),
 'oseries3': Model(oseries=oseries3, name=oseries3, constant=True, noisemodel=False),
 'head_mw': Model(oseries=head_mw, name=head_mw, constant=True, noisemodel=False),
 'oseries2': Model(oseries=oseries2, name=oseries2, constant=True, noisemodel=False)}
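
When setting `max_workers` explicitly, a common rule of thumb is to use no more workers than there are tasks or available CPU cores. The sketch below illustrates that rule with a hypothetical helper, `choose_max_workers`. It is not pastastore's default logic, only an assumption about how a sensible value could be picked.

```python
import os


def choose_max_workers(n_tasks, requested=None):
    """Pick a worker count that never exceeds the number of tasks (hypothetical)."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    limit = requested if requested is not None else available
    return max(1, min(limit, n_tasks))


# Five models: asking for eight workers is capped at five.
print(choose_max_workers(n_tasks=5, requested=8))
# Asking for two workers is respected.
print(choose_max_workers(n_tasks=5, requested=2))
```

For the five models in this example, `max_workers=5` already gives every model its own worker, so a larger value would not help.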

For `ArcticDBConnector`, the underlying objects that manage the database connection cannot be pickled. Therefore, passing a method directly from the `PastaStore` or `ArcticDBConnector` classes will not work in parallel mode.

The solution is to write a simple function that assumes there is a global connector object `conn` and uses it to obtain data from the database.

In [17]:
# Simple function to get models from database
def get_model(model_name):
    """ArcticDBConnector alternative for getting models from database."""
    return conn.get_model(model_name)

In [18]:
pstore.apply("models", get_model, fancy_output=True, parallel=True, max_workers=5)

Applying get_model (parallel): 100%|██████████| 5/5 [00:00<00:00, 26.68it/s]


{'head_nb5': Model(oseries=head_nb5, name=head_nb5, constant=True, noisemodel=False),
 'oseries1': Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=False),
 'oseries3': Model(oseries=oseries3, name=oseries3, constant=True, noisemodel=False),
 'head_mw': Model(oseries=head_mw, name=head_mw, constant=True, noisemodel=False),
 'oseries2': Model(oseries=oseries2, name=oseries2, constant=True, noisemodel=False)}
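
The pickling constraint described above can be illustrated with the standard library alone. In this sketch, `FakeConnector` is a hypothetical class whose `threading.Lock` stands in for an unpicklable database handle; it is not the real `ArcticDBConnector`. Pickling a bound method also pickles the instance it belongs to, whereas a module-level function is pickled by reference to its name.

```python
import pickle
import threading


class FakeConnector:
    """Hypothetical connector holding a resource that cannot be pickled."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for a database handle
        self._data = {"head_mw": 1.0}

    def get_item(self, name):
        return self._data[name]


def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


fake_conn = FakeConnector()


def get_item(name):
    """Module-level function that looks up the global connector when called."""
    return fake_conn.get_item(name)


# A bound method drags its instance (and the lock) into the pickle.
print("bound method picklable:", is_picklable(fake_conn.get_item))
# A module-level function is pickled by reference to its qualified name.
print("plain function picklable:", is_picklable(get_item))
```

The wrapper only works in a worker process if the global connector is also available there, which is the assumption the `get_model` function above makes about `conn`.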

Clean up temporary pastastore.

In [19]:
pst.util.delete_pastastore(pstore)

Deleting PasConnector database: 'my_connector' ...  Done!
