This is an introduction to using the sum-product networks (SPNs) from mb_modelbase/models_core/spflow.py.
They are implemented with the Python library SPFlow (https://github.com/SPFlow/SPFlow).

We begin by loading the data on which the SPNs will be trained.

In [11]:
import pandas as pd
dataset = pd.read_csv('data/allbus2016.csv', index_col=0)
dataset.head()

Unnamed: 0,age,sex,educ,income,eastwest,lived_abroad,spectrum
0,47,Female,3,1800,East,No,1
1,52,Male,3,2000,East,No,5
2,61,Male,2,2500,West,No,6
3,54,Female,2,860,West,Yes,1
4,49,Male,3,2500,West,No,6


In [12]:
dataset.dtypes


age              int64
sex             object
educ             int64
income           int64
eastwest        object
lived_abroad    object
spectrum         int64
dtype: object

In [23]:
iris = pd.read_csv('data/iris.csv')
iris.head()


Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


We construct an SPN and a mixed SPN (MSPN), both to be trained on the dataset.

In [13]:
from mb_modelbase.models_core.spflow import SPNModel

spn_model = SPNModel(
    'Allbus SPN',
    spn_type='spn'
)

mspn_model = SPNModel(
    'Allbus MSPN',
    spn_type='mspn'
)
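To make concrete what such a network computes, here is a minimal hand-built sketch in plain Python. It does not use SPFlow or mb_modelbase; the helper names (`bernoulli`, `product`, `weighted_sum`) and all parameters are made up for illustration. A sum node mixes two product nodes, and each product node multiplies Bernoulli leaves over disjoint variables.

```python
def bernoulli(p):
    """Leaf: returns P(X = x) for a binary variable with P(X = 1) = p."""
    return lambda x: p if x == 1 else 1.0 - p

def product(*children):
    """Product node over (variable_index, leaf) pairs with disjoint variables."""
    def evaluate(row):
        result = 1.0
        for var, leaf in children:
            result *= leaf(row[var])
        return result
    return evaluate

def weighted_sum(weights, children):
    """Sum node: a mixture of its children; the weights should sum to 1."""
    def evaluate(row):
        return sum(w * child(row) for w, child in zip(weights, children))
    return evaluate

# Two mixture components over the binary variables x0 and x1.
component_a = product((0, bernoulli(0.9)), (1, bernoulli(0.8)))
component_b = product((0, bernoulli(0.2)), (1, bernoulli(0.1)))
toy_spn = weighted_sum([0.3, 0.7], [component_a, component_b])

# The network defines a valid distribution: probabilities sum to 1.
total = sum(toy_spn((x0, x1)) for x0 in (0, 1) for x1 in (0, 1))
p_11 = toy_spn((1, 1))  # 0.3 * 0.9 * 0.8 + 0.7 * 0.2 * 0.1
```

The models constructed above learn such a structure and its parameters from data instead of having them written by hand.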


To train a sum-product network, we have to choose a distribution type for each feature of our dataset.

In [14]:
import spn.structure.leaves.parametric.Parametric as spn_parameter_types

spn_types = {
    'age': spn_parameter_types.Poisson,
    'sex': spn_parameter_types.Bernoulli,
    'educ': spn_parameter_types.Categorical,
    'income': spn_parameter_types.Poisson,
    'eastwest': spn_parameter_types.Bernoulli,
    'lived_abroad': spn_parameter_types.Bernoulli,
    'spectrum': spn_parameter_types.Categorical
}
spn_types


{'age': spn.structure.leaves.parametric.Parametric.Poisson,
 'sex': spn.structure.leaves.parametric.Parametric.Bernoulli,
 'educ': spn.structure.leaves.parametric.Parametric.Categorical,
 'income': spn.structure.leaves.parametric.Parametric.Poisson,
 'eastwest': spn.structure.leaves.parametric.Parametric.Bernoulli,
 'lived_abroad': spn.structure.leaves.parametric.Parametric.Bernoulli,
 'spectrum': spn.structure.leaves.parametric.Parametric.Categorical}
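As a rough illustration of what each of these distribution types stands for, the sketch below estimates their parameters by maximum likelihood on the five rows shown by `dataset.head()`. It is plain Python and not SPFlow's fitting code; the function names are hypothetical, and an actual SPN fits its leaves on subsets of the rows rather than on the whole column.

```python
import math
from collections import Counter

# First five rows of three columns, copied from dataset.head() above.
age = [47, 52, 61, 54, 49]
sex = ['Female', 'Male', 'Male', 'Female', 'Male']
educ = [3, 3, 2, 2, 3]

def fit_poisson(values):
    """Maximum-likelihood Poisson rate: the sample mean."""
    return sum(values) / len(values)

def fit_bernoulli(values, positive):
    """Maximum-likelihood Bernoulli parameter: share of `positive` outcomes."""
    return sum(v == positive for v in values) / len(values)

def fit_categorical(values):
    """Maximum-likelihood categorical distribution: relative frequencies."""
    counts = Counter(values)
    return {k: c / len(values) for k, c in sorted(counts.items())}

def poisson_log_pmf(k, rate):
    """log P(X = k) for X ~ Poisson(rate)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

age_rate = fit_poisson(age)             # count-valued feature
p_male = fit_bernoulli(sex, 'Male')     # two-valued feature
educ_probs = fit_categorical(educ)      # feature with a few distinct levels
```

A Poisson leaf suits count-like integers such as age, a Bernoulli leaf suits two-valued features, and a Categorical leaf suits features with a small set of levels.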

To train our MSPN, we only have to specify the meta type of each variable.

In [24]:
import spn.structure.StatisticalTypes as spn_statistical_types

mspn_metatypes = {
    'age': spn_statistical_types.MetaType.DISCRETE,
    'sex': spn_statistical_types.MetaType.DISCRETE,
    'educ': spn_statistical_types.MetaType.DISCRETE,
    'income': spn_statistical_types.MetaType.DISCRETE,
    'eastwest': spn_statistical_types.MetaType.DISCRETE,
    'lived_abroad': spn_statistical_types.MetaType.DISCRETE,
    'spectrum': spn_statistical_types.MetaType.DISCRETE
}

mspn_metatypes

{'age': <MetaType.DISCRETE: 3>,
 'sex': <MetaType.DISCRETE: 3>,
 'educ': <MetaType.DISCRETE: 3>,
 'income': <MetaType.DISCRETE: 3>,
 'eastwest': <MetaType.DISCRETE: 3>,
 'lived_abroad': <MetaType.DISCRETE: 3>,
 'spectrum': <MetaType.DISCRETE: 3>}
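As a rough aid for filling in such a mapping, the following sketch suggests a meta type name from each column's dtype name. The helper `suggest_meta_types` and its rule (float columns become REAL, everything else DISCRETE) are assumptions for illustration and not part of mb_modelbase or SPFlow. It returns plain strings so that it runs without SPFlow installed.

```python
def suggest_meta_types(dtypes):
    """Map column name -> 'REAL' or 'DISCRETE' from a dtype name.

    Hypothetical helper: floats are treated as REAL, all other dtypes
    (integers, objects/strings) as DISCRETE.
    """
    suggestions = {}
    for column, dtype in dtypes.items():
        is_float = str(dtype).startswith('float')
        suggestions[column] = 'REAL' if is_float else 'DISCRETE'
    return suggestions

# dtype names as printed by dataset.dtypes above
allbus_dtypes = {
    'age': 'int64', 'sex': 'object', 'educ': 'int64', 'income': 'int64',
    'eastwest': 'object', 'lived_abroad': 'object', 'spectrum': 'int64',
}
allbus_suggestions = suggest_meta_types(allbus_dtypes)

# A float column, as in the iris data, would be suggested as REAL.
iris_suggestions = suggest_meta_types({'sepal_length': 'float64',
                                       'species': 'object'})
```

With pandas, the input could be built as `dataset.dtypes.astype(str).to_dict()`, and each name turned into an enum member with `getattr(spn_statistical_types.MetaType, name)`. The suggestions are only a starting point and should be reviewed by hand.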

After associating the variables with types, we can train our models.

In [26]:
spn_model.fit(
    df=dataset,
    var_types=spn_types
)

mspn_model.fit(
    df=dataset,
    var_types=mspn_metatypes
)


<mb_modelbase.models_core.spflow.SPNModel at 0x7fde48186780>

Finally, we can use the model interface, for example to sample from the learned distribution.

In [20]:
spn_model.sample()

[[1.000e+00 1.000e+00 0.000e+00 4.500e+01 3.000e+00 1.266e+03 6.000e+00]]
[0, 1, 2, 3, 4, 5, 6]
[1.0, 1.0, 0.0, 45.0, 3.0, 1266.0, 6.0]


Unnamed: 0,sex,eastwest,lived_abroad,age,educ,income,spectrum
0,Male,West,No,45.0,3.0,1266.0,6.0
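The printed raw sample is purely numeric, while the returned row contains labels such as Male and West. The sketch below shows one way such a decoding step could work. It assumes that categorical labels are coded as 0, 1, ... in alphabetical order, which is consistent with the output above but is not confirmed from the mb_modelbase source; `decode_sample` is a hypothetical helper.

```python
# Assumed category orderings: labels sorted alphabetically, coded 0, 1, ...
categories = {
    'sex': ['Female', 'Male'],
    'eastwest': ['East', 'West'],
    'lived_abroad': ['No', 'Yes'],
}

# Column order as it appears in the sampled row above.
columns = ['sex', 'eastwest', 'lived_abroad',
           'age', 'educ', 'income', 'spectrum']

def decode_sample(values, columns, categories):
    """Replace integer codes of categorical columns by their labels."""
    decoded = {}
    for name, value in zip(columns, values):
        if name in categories:
            decoded[name] = categories[name][int(value)]
        else:
            decoded[name] = value  # numeric columns are passed through
    return decoded

raw = [1.0, 1.0, 0.0, 45.0, 3.0, 1266.0, 6.0]  # raw sample printed above
decoded = decode_sample(raw, columns, categories)
```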


In [21]:
mspn_model.sample()


AssertionError: No lambda function associated with type: Histogram
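The error message indicates that no sampling function is registered for the Histogram leaves of the MSPN. To illustrate what sampling from such a leaf involves, here is a minimal sketch in plain Python. It is not SPFlow's implementation, and `sample_histogram`, the breaks and the densities are made up for illustration: a bin is drawn with probability proportional to its mass (density times width), then a value is drawn uniformly within that bin.

```python
import random

def sample_histogram(breaks, densities, rng):
    """Draw one value from a piecewise-constant density.

    breaks:    bin edges, length n + 1, increasing
    densities: density value per bin, length n
    rng:       a random.Random instance
    """
    widths = [hi - lo for lo, hi in zip(breaks[:-1], breaks[1:])]
    masses = [d * w for d, w in zip(densities, widths)]
    # Pick a bin proportionally to its probability mass ...
    bin_index = rng.choices(range(len(masses)), weights=masses, k=1)[0]
    # ... then a point uniformly inside the chosen bin.
    return rng.uniform(breaks[bin_index], breaks[bin_index + 1])

rng = random.Random(0)            # fixed seed for reproducibility
breaks = [0.0, 1.0, 3.0, 4.0]
densities = [0.5, 0.2, 0.1]       # bin masses: 0.5, 0.4, 0.1
samples = [sample_histogram(breaks, densities, rng) for _ in range(1000)]
share_first_bin = sum(s < 1.0 for s in samples) / len(samples)
```

For integer-valued features, the drawn value would additionally have to be mapped back to a valid integer or category code.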