## Batch Learning of Models

You can learn several models from data in one batch by describing them in a specification dictionary and passing it to the utility functions in `fit_models.py`.

Here is the documentation:

In [3]:
import pandas as pd
from mb_modelbase.utils import fit_models

print(fit_models.__doc__)
# it should also pop up at the bottom on execution of this cell:
fit_models??

Fits models according to provided specs and returns a dict of the learned models.

    Args:
        spec (dict): Dictionary of <name> to model specifications. A single model specification may either be a dict or
            a callable (no arguments) that returns a dict. Either way, the configuration dict is as follows:
                * 'class': Usually <class-object of model> but can be any function that returns a model when called.
                * 'data': Optional. The data frame of data to use for fitting. If not specified the 'class' is expected
                    to return a fitted model.
                * 'classopts': Optional. A dict passed as keyword-arguments to 'class'.
                * 'fitopts': Optional. A dict passed as keyword-arguments to the .fit method of the created model
                    instance.
            The idea of the callable is to delay data acquisition until model selection.
        verbose (bool): Optional. Defaults to False. More verbose logging.
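
The following is a minimal sketch of how a single specification entry could be resolved according to the docstring above. It is not the code of `fit_models`; the helpers `resolve_spec` and `fit_one` and the class `DummyModel` are hypothetical names introduced only for illustration, and the real function may differ in details such as how the model name is passed to `'class'`.

```python
def resolve_spec(spec):
    """Return the configuration dict for one specification entry.

    A specification is either a dict or a zero-argument callable returning a dict.
    """
    config = spec() if callable(spec) else spec
    if 'class' not in config:
        raise ValueError("a model specification needs a 'class' entry")
    # fill in the optional keys so later code can rely on them
    return {'classopts': {}, 'fitopts': {}, **config}


class DummyModel:
    """Hypothetical stand-in for a model class; not part of mb_modelbase."""

    def __init__(self, **classopts):
        self.classopts = classopts
        self.fitted_on = None
        self.fitopts = None

    def fit(self, df, **fitopts):
        self.fitted_on = df
        self.fitopts = fitopts
        return self


def fit_one(spec):
    """Create a model from a specification and fit it if data is given."""
    config = resolve_spec(spec)
    model = config['class'](**config['classopts'])
    if 'data' in config:
        model.fit(config['data'], **config['fitopts'])
    return model


# an eager specification: the data is already loaded when the dict is built
eager = {'class': DummyModel, 'data': [1, 2, 3]}
# a lazy specification: the data is only acquired when the callable is invoked
lazy = lambda: {'class': DummyModel, 'data': [4, 5, 6], 'fitopts': {'iterations': 1}}

print(fit_one(eager).fitted_on)
print(fit_one(lazy).fitopts)
```

Wrapping a specification in a callable keeps expensive data loading out of the definition of the specification dictionary, which is the motivation the docstring gives for supporting callables.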

Here, we specify three models to learn from the data in `./data`, as follows:

In [4]:
# import various model types
from mb_modelbase.models_core.mixable_cond_gaussian import MixableCondGaussianModel
from mb_modelbase.models_core.spnmodel import SPNModel
from mb_modelbase.models_core.empirical_model import EmpiricalModel

# titanic.py provides preprocessing of the titanic data set
import data.titanic as titanic

# actual specifications
specs = {
    'emp_iris': {'class': EmpiricalModel, 'data': pd.read_csv('./data/iris.csv')},
    'mcg_iris': {'class': MixableCondGaussianModel, 'data': pd.read_csv('./data/iris.csv'), 'fitopts': {'fit_algo': 'map'}},    
    'spn_titanic': lambda: ({'class': SPNModel, 'data': titanic.continuous(), 'fitopts': {'iterations': 1}}),
}

Now we learn the models using `fit_models`:

In [5]:
models = fit_models(specs)

16:03:03.621 INFO :: Fitted 3 models in total: {'emp_iris', 'spn_titanic', 'mcg_iris'}


`models` is a dict that maps each model name to the learned model and some additional status information about the fitting process:

In [6]:
models

{'emp_iris': {'model': <mb_modelbase.models_core.empirical_model.EmpiricalModel at 0x7fa6890cabe0>,
  'status': 'SUCCESS'},
 'mcg_iris': {'model': <mb_modelbase.models_core.mixable_cond_gaussian.MixableCondGaussianModel at 0x7fa6890ca438>,
  'status': 'SUCCESS'},
 'spn_titanic': {'model': <mb_modelbase.models_core.spnmodel.SPNModel at 0x7fa6890feba8>,
  'status': 'SUCCESS'}}
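
Since every entry carries a `status`, the result can be split into fitted models and failures before further use. The sketch below assumes only the entry layout visible in the outputs of this notebook (`'model'`, `'status'` with the values `'SUCCESS'` and `'FAIL'`, and an optional `'message'`); the helper `split_by_status` is a hypothetical name, and the sample data uses a placeholder object in place of a real model.

```python
def split_by_status(results):
    """Split a fit result dict into fitted models and failure messages."""
    fitted = {name: entry['model']
              for name, entry in results.items()
              if entry.get('status') == 'SUCCESS'}
    failed = {name: entry.get('message', '')
              for name, entry in results.items()
              if entry.get('status') != 'SUCCESS'}
    return fitted, failed


# sample data in the same layout as the output of fit_models
results = {
    'emp_iris': {'model': object(), 'status': 'SUCCESS'},
    'mcg_iris': {'model': None, 'status': 'FAIL',
                 'message': 'Unexpected error: \nmcg_iris'},
}

fitted, failed = split_by_status(results)
print(sorted(fitted))  # prints ['emp_iris']
print(sorted(failed))  # prints ['mcg_iris']
```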

Apparently everything went well and the fitted models are available under the key `model`.
We can now save the models in a common directory via another utility function `save_models`:

In [7]:
from mb_modelbase.utils import save_models

save_models(models, './models')

16:03:03.647 INFO :: Files under ./models are watched for changes


That directory now contains a new `.mdl` file for each learned model
(as well as `Allbus_CondGauss.mdl`, a model that already shipped with lumen):

In [8]:
%ls models

16:03:04.160 INFO :: Loaded model from file emp_iris.mdl
16:03:04.166 INFO :: Loaded model from file emp_iris.mdl
16:03:04.177 INFO :: Loaded model from file mcg_iris.mdl
16:03:04.185 INFO :: Loaded model from file mcg_iris.mdl
16:03:04.194 INFO :: Loaded model from file spn_titanic.mdl


Allbus_CondGauss.mdl  emp_iris.mdl  mcg_iris.mdl  spn_titanic.mdl
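
Outside of a notebook, where the `%ls` magic is not available, the same listing can be produced with the standard library. This is a small sketch using `pathlib`; the helper `list_model_files` is a hypothetical name, and the demonstration runs on a temporary directory with empty placeholder files rather than on real models.

```python
import tempfile
from pathlib import Path


def list_model_files(directory):
    """Return the sorted names of all .mdl files directly inside directory."""
    return sorted(p.name for p in Path(directory).glob('*.mdl'))


# demonstration on a temporary directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ('mcg_iris.mdl', 'emp_iris.mdl', 'notes.txt'):
        (Path(tmp) / name).touch()
    print(list_model_files(tmp))  # prints ['emp_iris.mdl', 'mcg_iris.mdl']
```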


## Example: Set specification to also learn a PCI graph

This example illustrates how a specification can pass arguments to the fitting process.
In this particular case we enable learning of the PCI graph (pair-wise conditional independence graph, see `pci_graph.py`).
It is quite simple: we just have to set the `pci_graph` flag in `fitopts` to `True`:

In [10]:
spec_with_pci_graph_enabled = {
    'mcg_iris': {'class': MixableCondGaussianModel,
                 'data': pd.read_csv('./data/iris.csv'),
                 'fitopts': {'fit_algo': 'map', 'pci_graph': True}},
}

Now we use it to learn the model.
Here, we also directly extract the entry for `mcg_iris` from the resulting dict.

In [13]:
iris_model = fit_models(spec_with_pci_graph_enabled)['mcg_iris']
print(iris_model)

Traceback (most recent call last):
  File "/home/luca_ph/Documents/projects/graphical_models/code/modelbase/mb_modelbase/utils/fit_models.py", line 172, in fit_models
    model.fit(df, **config['fitopts'])
  File "/home/luca_ph/Documents/projects/graphical_models/code/modelbase/mb_modelbase/models_core/models.py", line 660, in fit
    return self.set_data(df, **kwargs).fit(auto_extend=auto_extend, **kwargs)
  File "/home/luca_ph/Documents/projects/graphical_models/code/modelbase/mb_modelbase/models_core/models.py", line 520, in set_data
    raise NotImplementedError("Cannot compute PCI graph for now. See https://github.com/lumen-org/modelbase/issues/93")
NotImplementedError: Cannot compute PCI graph for now. See https://github.com/lumen-org/modelbase/issues/93

16:06:25.740 INFO :: Fitted 0 models in total: <none>
16:06:25.746 ERROR :: I did not fit a single model! Something must be wrong!


{'model': None, 'status': 'FAIL', 'message': 'Unexpected error: \nmcg_iris'}


Normally, the model would now have learned a suitable PCI graph and stored it with the model.
ISSUE: as of July 2020 the PCI feature is disabled, see https://github.com/lumen-org/modelbase/issues/93 .
Hence, fitting fails with the `NotImplementedError` shown above, and the returned entry contains `None` instead of a model.

In [14]:
print(iris_model.pci_graph)

AttributeError: 'dict' object has no attribute 'pci_graph'
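
The `AttributeError` arises because `iris_model` holds the result entry (a dict), not a model. A guarded access avoids this. The sketch below assumes the entry layout shown in the outputs of this notebook; the helper `get_pci_graph` is a hypothetical name and not part of `mb_modelbase`.

```python
def get_pci_graph(entry):
    """Return the PCI graph of a fit result entry, or None if unavailable."""
    if entry.get('status') != 'SUCCESS' or entry.get('model') is None:
        return None
    # a model fitted without the flag may not have the attribute at all
    return getattr(entry['model'], 'pci_graph', None)


# the failed entry exactly as printed above
failed_entry = {'model': None, 'status': 'FAIL',
                'message': 'Unexpected error: \nmcg_iris'}
print(get_pci_graph(failed_entry))  # prints None
```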