## Batch Learning of Models

You can learn several models from data in one batch by describing them in a specification dictionary and passing it to the utility functions in `fit_models.py`.

Here is the documentation of `fit_models`:

In [10]:
import pandas as pd
from mb_modelbase.utils import fit_models

print(fit_models.__doc__)
# the documentation should also pop up at the bottom when this cell is executed:
fit_models??

Fits models according to provided specs and returns a dict of the learned models.

    Args:
        spec (dict): Dictionary of <name> to model specifications. A single model specification may either be a dict or
            a callable (no arguments) that returns a dict. Either way, the configuration dict is as follows:
                * 'class': Usually <class-object of model> but can be any function that returns a model when called.
                * 'data': Optional. The data frame of data to use for fitting. If not specified, the 'class' is expected to return a fitted model.
                * 'classopts': Optional. A dict passed as keyword-arguments to 'class'.
                * 'fitopts': Optional. A dict passed as keyword-arguments to the .fit method of the created model instance.
            The idea of the callable is to delay data acquisition until model selection.
        verbose (bool): Optional. Defaults to False. More verbose logging iff set to true.
        include (list
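
To make the specification format concrete, here is a minimal sketch of how a single entry could be resolved into a model. It is not the code in `fit_models.py`: `ToyModel` and `resolve_and_fit` are hypothetical names invented for illustration, and passing the data as the first argument of `.fit` is an assumption.

```python
class ToyModel:
    """Hypothetical stand-in for a model class; not part of mb_modelbase."""

    def __init__(self, name='toy'):
        self.name = name
        self.fitted = False
        self.mean = None

    def fit(self, data, scale=1.0):
        # "Fitting" here just stores a scaled mean of the data.
        self.mean = scale * sum(data) / len(data)
        self.fitted = True
        return self


def resolve_and_fit(spec):
    """Turn one specification (dict or zero-argument callable) into a model."""
    # A callable spec is only evaluated here, which delays data acquisition.
    config = spec() if callable(spec) else spec
    # 'classopts' are passed as keyword arguments to 'class'.
    model = config['class'](**config.get('classopts', {}))
    # Without 'data', 'class' is expected to have returned a fitted model.
    if 'data' in config:
        model.fit(config['data'], **config.get('fitopts', {}))
    return model


eager = resolve_and_fit({'class': ToyModel, 'data': [1, 2, 3]})
lazy = resolve_and_fit(lambda: {'class': ToyModel,
                                'classopts': {'name': 'lazy'},
                                'data': [2, 4],
                                'fitopts': {'scale': 2.0}})
print(eager.mean, lazy.name, lazy.mean)  # 2.0 lazy 6.0
```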

Here we specify three models to learn from the data in `./data`:

In [17]:
# import various model types
from mb_modelbase.models_core.mixable_cond_gaussian import MixableCondGaussianModel
from mb_modelbase.models_core.spnmodel import SPNModel
from mb_modelbase.models_core.empirical_model import EmpiricalModel

# titanic.py provides preprocessing of the titanic data set
import data.titanic as titanic

# actual specifications
specs = {
    'emp_iris': {'class': EmpiricalModel, 'data': pd.read_csv('./data/iris.csv')},
    'mcg_iris': {'class': MixableCondGaussianModel, 'data': pd.read_csv('./data/iris.csv'), 'fitopts': {'fit_algo': 'map'}},
    'spn_titanic': lambda: ({'class': SPNModel, 'data': titanic.continuous(), 'fitopts': {'iterations': 1}}),
}
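
The specifications above mix both forms: the two iris entries are plain dicts, so their data is read as soon as the dictionary is built, while the `lambda` entry postpones loading until it is called. The following sketch illustrates that difference in timing with a hypothetical loader (`load_data`) and placeholder entries; it does not use `mb_modelbase`.

```python
load_calls = []


def load_data():
    """Hypothetical expensive loader that records each invocation."""
    load_calls.append(1)
    return [1.0, 2.0, 3.0]


# 'class' is a placeholder here; only the timing of load_data() matters.
toy_specs = {
    'eager': {'class': object, 'data': load_data()},         # loads immediately
    'lazy': lambda: {'class': object, 'data': load_data()},  # loads when called
}

calls_after_definition = len(load_calls)  # only the eager entry has loaded
config = toy_specs['lazy']()              # resolving the lazy entry loads now
calls_after_resolution = len(load_calls)

print(calls_after_definition, calls_after_resolution)  # 1 2
```

A callable entry is therefore useful when loading is expensive and the model may end up not being selected for fitting.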

Now we learn the models using `fit_models`:

In [18]:
models = fit_models(specs)

16:35:44.187 INFO :: Fitted 3 models in total: {'mcg_iris', 'emp_iris', 'spn_titanic'}


`models` is a dict that maps each model name to the learned model and some additional status information about the fitting process:

In [19]:
models

{'emp_iris': {'model': <mb_modelbase.models_core.empirical_model.EmpiricalModel at 0x7f964bb31fd0>,
  'status': 'SUCCESS'},
 'mcg_iris': {'model': <mb_modelbase.models_core.mixable_cond_gaussian.MixableCondGaussianModel at 0x7f964dbcfd30>,
  'status': 'SUCCESS'},
 'spn_titanic': {'model': <mb_modelbase.models_core.spnmodel.SPNModel at 0x7f964dd1a4a8>,
  'status': 'SUCCESS'}}

Apparently everything went well and the fitted models are available under the key `model`.
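
With many models it can be convenient to check the status programmatically instead of reading the output. The sketch below works on a hand-written dict of the same shape as the result above. Only `'SUCCESS'` appears in this output, so the `'FAIL'` value and the `broken` entry are hypothetical, and treating every other status as a failure is an assumption.

```python
# Hand-written stand-in for the result of fit_models; object() replaces real models.
results = {
    'emp_iris': {'model': object(), 'status': 'SUCCESS'},
    'mcg_iris': {'model': object(), 'status': 'SUCCESS'},
    'broken': {'model': None, 'status': 'FAIL'},  # hypothetical failure entry
}

# Keep only the successfully fitted model objects, keyed by name.
fitted = {name: entry['model']
          for name, entry in results.items()
          if entry['status'] == 'SUCCESS'}

# Assumption: any status other than 'SUCCESS' means the fit did not succeed.
failed = sorted(name for name, entry in results.items()
                if entry['status'] != 'SUCCESS')

print(sorted(fitted), failed)  # ['emp_iris', 'mcg_iris'] ['broken']
```
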
We can now save the models to a common directory with another utility function, `save_models`:

In [20]:
from mb_modelbase.utils import save_models

save_models(models, './models')

That directory now contains a new `.mdl` file for each learned model (along with `Allbus_CondGauss.mdl`, a model that already shipped with Lumen):

In [24]:
%ls models

Allbus_CondGauss.mdl  emp_iris.mdl  mcg_iris.mdl  spn_titanic.mdl
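
The same check can be scripted. This sketch relies only on the `<name>.mdl` naming visible in the listing above; `missing_model_files` is a hypothetical helper, and a temporary directory with empty files stands in for `./models` so the example runs on its own.

```python
import tempfile
from pathlib import Path


def missing_model_files(directory, names):
    """Return the names that have no '<name>.mdl' file in the directory."""
    present = {path.stem for path in Path(directory).glob('*.mdl')}
    return sorted(set(names) - present)


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files for two of the three expected models.
    for name in ('emp_iris', 'mcg_iris'):
        (Path(tmp) / f'{name}.mdl').touch()
    missing = missing_model_files(tmp, ['emp_iris', 'mcg_iris', 'spn_titanic'])

print(missing)  # ['spn_titanic']
```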
