# Machine Learning

In general, _machine learning_ is the automatic creation of a model using training data.
A common kind of model is a _classifier_, which predicts a value for a _target_ random variable given values for some _source_ random variables. Within CK, the most common kind of model is a probabilistic graphical model (PGM).

For probabilistic graphical models, two kinds of learning are possible: parameter learning and structure learning.

# Parameter Learning

Parameter learning is the process of setting the values of a model's parameters using training data.

To perform parameter learning, a model structure must first be defined. In CK this is done by creating a `PGM` object with random variables and factors. There is no need to set the potential functions of the factors; just leave each factor with its default potential function, a `ZeroPotentialFunction`.

The following code creates a PGM with the structure of the Student Bayesian network.

In [1]:
from ck.pgm import PGM, RVMap

pgm = PGM('Student')

difficult = pgm.new_rv('difficult', ['y', 'n'])
intelligent = pgm.new_rv('intelligent', ['y', 'n'])
grade = pgm.new_rv('grade', ['low', 'medium', 'high'])
award = pgm.new_rv('award', ['y', 'n'])
letter = pgm.new_rv('letter', ['y', 'n'])

pgm.new_factor(difficult)
pgm.new_factor(intelligent)
pgm.new_factor(grade, intelligent, difficult)
pgm.new_factor(award, intelligent)
pgm.new_factor(letter, grade)

pgm.dump()

PGM id=2611469627472
  name: Student
  number of random variables: 5
  number of indicators: 11
  number of states: 48
  log 2 of states: 5.585
  number of factors: 5
  number of functions: 5
  number of non-zero functions: 0
  number of parameters: 26
  number of functions (excluding ZeroPotentialFunction): 0
  number of parameters (excluding ZeroPotentialFunction): 0
  Bayesian structure: True
  CPT factors: True
random variables (5)
    0 'difficult' (2) ['y', 'n']
    1 'intelligent' (2) ['y', 'n']
    2 'grade' (3) ['low', 'medium', 'high']
    3 'award' (2) ['y', 'n']
    4 'letter' (2) ['y', 'n']
factors (5)
    0 rvs=('difficult') function=<ZeroPotentialFunction>
    1 rvs=('intelligent') function=<ZeroPotentialFunction>
    2 rvs=('grade', 'intelligent', 'difficult') function=<ZeroPotentialFunction>
    3 rvs=('award', 'intelligent') function=<ZeroPotentialFunction>
    4 rvs=('letter', 'grade') function=<ZeroPotentialFunction>
functions, excluding ZeroPotentialFunction (0)
en

Next we create an example training dataset, using the PGM random variables as the random variables of the dataset.

In [2]:
from ck.dataset.dataset_from_csv import hard_dataset_from_csv

rvs = (difficult, intelligent, grade, award, letter)

csv = """
0,1,2,0,1
1,1,2,0,1
1,1,2,0,1
0,0,2,0,0
0,1,1,1,0
1,1,1,1,1
1,1,0,0,0
1,1,0,0,1
1,0,0,0,0
"""

dataset = hard_dataset_from_csv(rvs, csv.splitlines())

dataset.dump()

rvs: [difficult, intelligent, grade, award, letter]
instances (9, with total weight 9.0):
(0, 1, 2, 0, 1) * 1.0
(1, 1, 2, 0, 1) * 1.0
(1, 1, 2, 0, 1) * 1.0
(0, 0, 2, 0, 0) * 1.0
(0, 1, 1, 1, 0) * 1.0
(1, 1, 1, 1, 1) * 1.0
(1, 1, 0, 0, 0) * 1.0
(1, 1, 0, 0, 1) * 1.0
(1, 0, 0, 0, 0) * 1.0


## Bayesian Network Maximum-likelihood Training

Parameter training for a PGM involves determining the parameter values for its potential functions.

In particular, `train_generative_bn` assumes the PGM represents a Bayesian network and provides
parameter values representing conditional probability tables (CPTs).

The returned parameter values can then be used to update the PGM's potential functions.
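
To make the idea of maximum-likelihood CPT estimation concrete, here is a minimal sketch in plain Python. It is not CK's implementation; the helper `estimate_cpt` is a hypothetical name introduced only for illustration. It estimates each conditional probability as a relative frequency, using the nine training rows defined earlier.

```python
from collections import Counter

# The nine training rows from the example dataset, as state indexes in the
# order (difficult, intelligent, grade, award, letter).
rows = [
    (0, 1, 2, 0, 1),
    (1, 1, 2, 0, 1),
    (1, 1, 2, 0, 1),
    (0, 0, 2, 0, 0),
    (0, 1, 1, 1, 0),
    (1, 1, 1, 1, 1),
    (1, 1, 0, 0, 0),
    (1, 1, 0, 0, 1),
    (1, 0, 0, 0, 0),
]


def estimate_cpt(rows, child, parents):
    """Hypothetical helper: estimate Pr(child | parents) by relative frequency.

    Returns {(child_state, *parent_states): probability}, containing only
    the combinations that occur at least once in the data.
    """
    joint = Counter((row[child], *(row[p] for p in parents)) for row in rows)
    parent_totals = Counter(tuple(row[p] for p in parents) for row in rows)
    return {key: count / parent_totals[key[1:]] for key, count in joint.items()}


AWARD, INTELLIGENT = 3, 1  # column positions within each row
cpt = estimate_cpt(rows, AWARD, [INTELLIGENT])
for key in sorted(cpt):
    print(key, cpt[key])
```

Combinations that never occur in the data, such as `award=1` with `intelligent=0`, are simply absent from the result of this sketch and correspond to a probability of zero. How a library treats parent configurations that are never observed is a design choice that this sketch does not address.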

In [3]:
from ck.learning.train_generative_bn import train_generative_bn

# Learn parameter values for `pgm` using the training data `dataset`.
# This updates the PGM's potential functions.
train_generative_bn(pgm, dataset)


Here are the factors of the updated PGM and their parameter values.

In [4]:
for factor in pgm.factors:
    potential_function = factor.function
    print(f'Factor: {factor} {type(potential_function)}')
    for instance, _, param_value in potential_function.keys_with_param:
        print(f'Factor{instance} = {param_value}')
    print()

Factor: ('difficult') <class 'ck.pgm.DensePotentialFunction'>
Factor(0,) = 0.3333333333333333
Factor(1,) = 0.6666666666666666

Factor: ('intelligent') <class 'ck.pgm.DensePotentialFunction'>
Factor(0,) = 0.2222222222222222
Factor(1,) = 0.7777777777777778

Factor: ('grade', 'intelligent', 'difficult') <class 'ck.pgm.DensePotentialFunction'>
Factor(0, 0, 0) = 0.0
Factor(0, 0, 1) = 1.0
Factor(0, 1, 0) = 0.0
Factor(0, 1, 1) = 0.4
Factor(1, 0, 0) = 0.0
Factor(1, 0, 1) = 0.0
Factor(1, 1, 0) = 0.5
Factor(1, 1, 1) = 0.2
Factor(2, 0, 0) = 1.0
Factor(2, 0, 1) = 0.0
Factor(2, 1, 0) = 0.5
Factor(2, 1, 1) = 0.4

Factor: ('award', 'intelligent') <class 'ck.pgm.DensePotentialFunction'>
Factor(0, 0) = 1.0
Factor(0, 1) = 0.7142857142857143
Factor(1, 0) = 0.0
Factor(1, 1) = 0.2857142857142857

Factor: ('letter', 'grade') <class 'ck.pgm.DensePotentialFunction'>
Factor(0, 0) = 0.6666666666666666
Factor(0, 1) = 0.5
Factor(0, 2) = 0.25
Factor(1, 0) = 0.3333333333333333
Factor(1, 1) = 0.5
Factor(1, 2) = 0.75

Here is an example of using the resulting trained model. (Do not interpret the probabilities as real-world values; the training data is fictitious.)

In [5]:
from ck.pgm_circuit.wmc_program import WMCProgram
from ck.pgm_compiler import DEFAULT_PGM_COMPILER

wmc = WMCProgram(DEFAULT_PGM_COMPILER(pgm))

print('Probabilities from trained PGM:')
for i in intelligent.indicators:
    for d in difficult.indicators:
        w = wmc.marginal_distribution(grade, condition=(d, i))
        print(f'Pr({grade} | {pgm.indicator_str(d, i)}) = {w}')


Probabilities from trained PGM:
Pr(grade | difficult=y, intelligent=y) = [0. 0. 1.]
Pr(grade | difficult=n, intelligent=y) = [1. 0. 0.]
Pr(grade | difficult=y, intelligent=n) = [0.  0.5 0.5]
Pr(grade | difficult=n, intelligent=n) = [0.4 0.2 0.4]
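
As a cross-check on what such a query means, the following plain-Python sketch computes the unconditioned marginal Pr(grade) by brute-force enumeration over the parents of `grade`. It is an illustration only and not how CK evaluates queries. The CPT values are copied from the factor listing printed earlier, and the variable names are assumptions made for this sketch.

```python
from itertools import product

# Learned values copied from the printed factors (state index 0 = 'y', 1 = 'n').
p_difficult = [1 / 3, 2 / 3]
p_intelligent = [2 / 9, 7 / 9]

# p_grade[(intelligent, difficult)] is the distribution over
# grade states ['low', 'medium', 'high'].
p_grade = {
    (0, 0): [0.0, 0.0, 1.0],
    (0, 1): [1.0, 0.0, 0.0],
    (1, 0): [0.0, 0.5, 0.5],
    (1, 1): [0.4, 0.2, 0.4],
}

# Pr(grade) = sum over i, d of Pr(i) * Pr(d) * Pr(grade | i, d).
# The 'award' and 'letter' factors sum out to 1, so they can be ignored here.
marginal = [0.0, 0.0, 0.0]
for i, d in product(range(2), range(2)):
    weight = p_intelligent[i] * p_difficult[d]
    for g in range(3):
        marginal[g] += weight * p_grade[(i, d)][g]

print([round(x, 4) for x in marginal])  # [0.3556, 0.2333, 0.4111]
```

Note that this marginal differs slightly from the raw frequency of each grade in the training data (for example 3/9 for `low`), because the Bayesian network treats `difficult` and `intelligent` as independent.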


# Structure Learning

Structure learning describes a process where the structure of a model is learned from training data.
Typically, structure learning methods will also learn parameter values for the model.


Presently, the only structure learning method available in CK is learning from a collection of cross-tables.
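
A cross-table records, for a chosen subset of random variables, the total weight of training instances for each combination of their states. The following plain-Python sketch shows the idea. The function `cross_table` and the sample rows are hypothetical and do not reflect CK's API or data structures.

```python
from collections import Counter


def cross_table(weighted_rows, columns):
    """Hypothetical helper: total weight for each combination of the chosen columns."""
    table = Counter()
    for row, weight in weighted_rows:
        table[tuple(row[c] for c in columns)] += weight
    return table


# Made-up weighted instances, as state indexes over three variables.
weighted_rows = [
    ((0, 1, 2), 1.0),
    ((1, 1, 2), 2.0),
    ((0, 0, 1), 1.0),
    ((0, 1, 2), 0.5),
]

# Cross-table over the third and second variables (column positions 2 and 1).
table = cross_table(weighted_rows, [2, 1])
for key in sorted(table):
    print(key, table[key])
```

Because a cross-table only keeps the projected counts, several cross-tables over overlapping variable subsets can summarise a dataset without storing every instance.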

The following example runs `model_from_cross_tables` to learn a structure and parameter values, using data sampled from the example `Student` PGM.

In [6]:
from ck.dataset import HardDataset
from ck.dataset.sampled_dataset import dataset_from_sampler
from ck import example

# Create a dataset based on the "Student" example PGM
number_of_samples: int = 10000  # How many instances to make for the model dataset
model: PGM = example.Student()
model_dataset: HardDataset = dataset_from_sampler(
    WMCProgram(DEFAULT_PGM_COMPILER(model)).sample_direct(),
    number_of_samples,
)

# Clone the model, without factors, and transport the dataset to the new PGM
pgm = PGM()
dataset = HardDataset(weights=model_dataset.weights)
for model_rv in model.rvs:
    rv = pgm.new_rv(model_rv.name, model_rv.states)
    dataset.add_rv_from_state_idxs(rv, model_dataset.state_idxs(model_rv))

For this example, we make some cross-tables from the dataset.

In [7]:
from ck.dataset.cross_table import cross_table_from_hard_dataset

rvs = RVMap(pgm)

cross_tables = [
    cross_table_from_hard_dataset(dataset, [rvs.grade, rvs.difficult, rvs.intelligent]),
    cross_table_from_hard_dataset(dataset, [rvs.sat, rvs.intelligent]),
    cross_table_from_hard_dataset(dataset, [rvs.letter, rvs.grade]),
]

Now add structure and parameters to the PGM from the cross-tables.

In [8]:
from ck.learning.model_from_cross_tables import model_from_cross_tables

model_from_cross_tables(pgm, cross_tables)

pgm.dump()

PGM id=2611474270640
  name: PGM_2611474270640
  number of random variables: 5
  number of indicators: 11
  number of states: 48
  log 2 of states: 5.585
  number of factors: 5
  number of functions: 5
  number of non-zero functions: 5
  number of parameters: 26
  number of functions (excluding ZeroPotentialFunction): 5
  number of parameters (excluding ZeroPotentialFunction): 26
  Bayesian structure: True
  CPT factors: True
random variables (5)
    0 'difficult' (2) ['Yes', 'No']
    1 'intelligent' (2) ['Yes', 'No']
    2 'grade' (3) ['1', '2', '3']
    3 'sat' (2) ['High', 'Low']
    4 'letter' (2) ['Yes', 'No']
factors (5)
    0 rvs=('difficult') function=2611490966128: DensePotentialFunction
    1 rvs=('intelligent') function=2611998865168: DensePotentialFunction
    2 rvs=('grade', 'difficult', 'intelligent') function=2611998865984: DensePotentialFunction
    3 rvs=('sat', 'intelligent') function=2611998864544: DensePotentialFunction
    4 rvs=('letter', 'grade') function=261199

Below we compile the true model and the trained PGM, then compare their probability distributions to show the accuracy of the trained PGM.

In [9]:
from ck.probability import divergence

model_probabilities = WMCProgram(DEFAULT_PGM_COMPILER(model))
pgm_probabilities = WMCProgram(DEFAULT_PGM_COMPILER(pgm))
print('HI', divergence.hi(model_probabilities, pgm_probabilities))
print('KL', divergence.kl(model_probabilities, pgm_probabilities))

HI 0.982727503778387
KL 0.0016500159561904108
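
To help read these two numbers, here is a minimal sketch of the two measures for a pair of small discrete distributions. It assumes that `HI` denotes histogram intersection and `KL` denotes Kullback-Leibler divergence. The function names, the example distributions, and the use of the natural logarithm are assumptions of this sketch, so CK's own definitions may differ in detail.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) using the natural logarithm; requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def histogram_intersection(p, q):
    """Sum of the element-wise minimum; equals 1 when the distributions are identical."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


# Two made-up distributions over the same three states.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print('HI', histogram_intersection(p, q))
print('KL', kl_divergence(p, q))
```

Under these definitions, a histogram intersection close to 1 and a KL divergence close to 0 both indicate that the two distributions are nearly the same.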
