# Machine Learning

In general, _machine learning_ is the automatic creation of a model from training data.
A common kind of model is a _classifier_, which predicts the value of a _target_ random variable given values for some _source_ random variables. Within CK, the most common kind of model is a probabilistic graphical model (PGM).

For probabilistic graphical models, two kinds of learning are possible: parameter learning and structure learning.

# <a name="ParameterLearning"></a>Parameter Learning

Parameter learning is the process of setting the values of a model's parameters using training data.

To perform parameter learning, a model structure must first be defined. In CK, this is done by creating a `PGM` object with random variables and factors. There is no need to set the potential functions of the factors; each factor can keep its default potential function, which is the `ZeroPotentialFunction`.

The following code creates a PGM with the structure of the Student Bayesian network.

In [1]:
from ck.pgm import PGM

pgm = PGM('Student')

difficult = pgm.new_rv('difficult', ['y', 'n'])
intelligent = pgm.new_rv('intelligent', ['y', 'n'])
grade = pgm.new_rv('grade', ['low', 'medium', 'high'])
award = pgm.new_rv('award', ['y', 'n'])
letter = pgm.new_rv('letter', ['y', 'n'])

pgm.new_factor(difficult)
pgm.new_factor(intelligent)
pgm.new_factor(grade, intelligent, difficult)
pgm.new_factor(award, intelligent)
pgm.new_factor(letter, grade)

pgm.dump()

PGM id=2830421507440 name='Student'
  name: Student
  number of random variables: 5
  number of indicators: 11
  number of states: 48
  log 2 of states: 5.585
  number of factors: 5
  number of functions: 5
  number of non-zero functions: 0
  number of parameters: 26
  number of functions (excluding ZeroPotentialFunction): 0
  number of parameters (excluding ZeroPotentialFunction): 0
  Bayesian structure: True
  CPT factors: True
random variables (5)
    0 'difficult' (2) ['y', 'n']
    1 'intelligent' (2) ['y', 'n']
    2 'grade' (3) ['low', 'medium', 'high']
    3 'award' (2) ['y', 'n']
    4 'letter' (2) ['y', 'n']
factors (5)
    0 rvs=[0] function=<zero>
    1 rvs=[1] function=<zero>
    2 rvs=[2, 1, 0] function=<zero>
    3 rvs=[3, 1] function=<zero>
    4 rvs=[4, 2] function=<zero>
functions (5)
end PGM id=2830421507440
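
The summary figures in the dump can be checked by hand. The sketch below is plain Python, not CK. It assumes that `number of indicators` is the total number of states over all random variables, that `number of states` is the product of the state counts, and that `number of parameters` counts one entry per cell of each factor's table. Under those assumptions, the arithmetic reproduces the values printed above.

```python
import math

# State counts of the five random variables, as declared with new_rv above.
rv_sizes = {'difficult': 2, 'intelligent': 2, 'grade': 3, 'award': 2, 'letter': 2}

# Random variables of each factor, as declared with new_factor above.
factor_rvs = [
    ('difficult',),
    ('intelligent',),
    ('grade', 'intelligent', 'difficult'),
    ('award', 'intelligent'),
    ('letter', 'grade'),
]

# Assumption: one indicator per state of each random variable.
number_of_indicators = sum(rv_sizes.values())

# Assumption: the state space is the Cartesian product of all random variables.
number_of_states = math.prod(rv_sizes.values())
log2_of_states = math.log2(number_of_states)

# Assumption: a dense factor table has one parameter per combination of its
# random variables' states.
number_of_parameters = sum(
    math.prod(rv_sizes[name] for name in names) for names in factor_rvs
)

print(number_of_indicators)      # 11
print(number_of_states)          # 48
print(round(log2_of_states, 3))  # 5.585
print(number_of_parameters)      # 26
```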


Next we create an example training dataset, using the PGM random variables as the random variables of the dataset.

In [2]:
from ck.dataset.dataset_from_csv import hard_dataset_from_csv

rvs = (difficult, intelligent, grade, award, letter)

csv = """
0,1,2,0,1
1,1,2,0,1
1,1,2,0,1
0,0,2,0,0
0,1,1,1,0
1,1,1,1,1
1,1,0,0,0
1,1,0,0,1
1,0,0,0,0
"""

dataset = hard_dataset_from_csv(rvs, csv.splitlines())

dataset.dump()

rvs: [difficult, intelligent, grade, award, letter]
instances (9, with total weight 9.0):
(0, 1, 2, 0, 1) * 1.0
(1, 1, 2, 0, 1) * 1.0
(1, 1, 2, 0, 1) * 1.0
(0, 0, 2, 0, 0) * 1.0
(0, 1, 1, 1, 0) * 1.0
(1, 1, 1, 1, 1) * 1.0
(1, 1, 0, 0, 0) * 1.0
(1, 1, 0, 0, 1) * 1.0
(1, 0, 0, 0, 0) * 1.0
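
Each CSV row appears to hold one state index per random variable, in the order of the `rvs` tuple, with index 0 denoting the first state in the list passed to `new_rv`. That reading is inferred from the dump above and the state lists, not from the CK documentation. The following plain Python sketch (the `decode` helper is hypothetical, not part of CK) turns the rows back into state labels to make the dataset easier to read.

```python
# State labels in the same order as the `rvs` tuple used for the dataset.
rv_states = {
    'difficult': ['y', 'n'],
    'intelligent': ['y', 'n'],
    'grade': ['low', 'medium', 'high'],
    'award': ['y', 'n'],
    'letter': ['y', 'n'],
}

csv = """
0,1,2,0,1
1,1,2,0,1
1,1,2,0,1
0,0,2,0,0
0,1,1,1,0
1,1,1,1,1
1,1,0,0,0
1,1,0,0,1
1,0,0,0,0
"""

# Parse each non-empty line into a tuple of state indexes.
rows = [
    tuple(int(value) for value in line.split(','))
    for line in csv.splitlines()
    if line.strip()
]


def decode(row):
    """Hypothetical helper: map state indexes to state labels.

    Assumes index k refers to the k-th state of the corresponding random variable.
    """
    return {
        name: states[index]
        for (name, states), index in zip(rv_states.items(), row)
    }


for row in rows:
    print(row, decode(row))
```

Under this reading, the first instance `(0, 1, 2, 0, 1)` is a difficult course taken by a student who is not intelligent, with a high grade, an award, and no letter.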


## Bayesian Network Maximum-likelihood Training

Parameter training for a PGM involves determining the parameter values for its potential functions.

In particular, `train_generative_bn` assumes that the PGM represents a Bayesian network and returns
parameter values that represent conditional probability tables (CPTs).

The returned parameter values can then be used to update the PGM's potential functions.

In [3]:
from ck.learning.train_generative import train_generative_bn, ParameterValues

# Learn parameter values for `pgm` using the training data `dataset`.
parameter_values: ParameterValues = train_generative_bn(pgm, dataset)

# Use the resulting parameter values to update the PGM potential functions.
parameter_values.set_sparse()
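
To make the maximum-likelihood estimate concrete, the sketch below computes one CPT by counting, in plain Python and without CK. It assumes the textbook estimate with no smoothing or prior: the probability of a child state given a parent configuration is the number of matching instances divided by the number of instances with that parent configuration. Whether `train_generative_bn` works this way internally is not stated here, but the results agree with the distributions the trained model reports at the end of this section. The `grade_cpt_row` helper is hypothetical.

```python
from collections import Counter

# The nine training instances as (difficult, intelligent, grade, award, letter).
rows = [
    (0, 1, 2, 0, 1),
    (1, 1, 2, 0, 1),
    (1, 1, 2, 0, 1),
    (0, 0, 2, 0, 0),
    (0, 1, 1, 1, 0),
    (1, 1, 1, 1, 1),
    (1, 1, 0, 0, 0),
    (1, 1, 0, 0, 1),
    (1, 0, 0, 0, 0),
]

DIFFICULT, INTELLIGENT, GRADE = 0, 1, 2  # column positions
NUM_GRADES = 3                           # low, medium, high


def grade_cpt_row(d, i):
    """Estimate Pr(grade | difficult=d, intelligent=i) by relative frequency.

    Assumes at least one instance has this parent configuration; an unseen
    configuration would need smoothing or a prior, which is not covered here.
    """
    counts = Counter(
        row[GRADE]
        for row in rows
        if row[DIFFICULT] == d and row[INTELLIGENT] == i
    )
    total = sum(counts.values())
    return [counts[g] / total for g in range(NUM_GRADES)]


states = ['y', 'n']
for i in (0, 1):
    for d in (0, 1):
        print(f'Pr(grade | difficult={states[d]}, intelligent={states[i]}) = {grade_cpt_row(d, i)}')
```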


Here are the updated PGM and its parameter values.

In [4]:
pgm.dump()
print()

for factor in pgm.factors:
    function = factor.function
    for instance, _, param_value in function.keys_with_param:
        print(f'{factor}{instance} = {param_value}')

PGM id=2830421507440 name='Student'
  name: Student
  number of random variables: 5
  number of indicators: 11
  number of states: 48
  log 2 of states: 5.585
  number of factors: 5
  number of functions: 5
  number of non-zero functions: 5
  number of parameters: 20
  number of functions (excluding ZeroPotentialFunction): 5
  number of parameters (excluding ZeroPotentialFunction): 20
  Bayesian structure: True
  CPT factors: True
random variables (5)
    0 'difficult' (2) ['y', 'n']
    1 'intelligent' (2) ['y', 'n']
    2 'grade' (3) ['low', 'medium', 'high']
    3 'award' (2) ['y', 'n']
    4 'letter' (2) ['y', 'n']
factors (5)
    0 rvs=[0] function=2830422134416: SparsePotentialFunction
    1 rvs=[1] function=2830422144736: SparsePotentialFunction
    2 rvs=[2, 1, 0] function=2830422144784: SparsePotentialFunction
    3 rvs=[3, 1] function=2830422144832: SparsePotentialFunction
    4 rvs=[4, 2] function=2830422144880: SparsePotentialFunction
functions (5)
  2830422134416: SparsePo
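
The number of parameters drops from 26 before training to 20 after. One plausible explanation, offered as an assumption and not as a description of CK internals, is that a `SparsePotentialFunction` keeps one parameter per non-zero CPT entry. With maximum-likelihood estimates, a CPT entry is non-zero exactly when its combination of states occurs in the training data, so the count can be checked in plain Python:

```python
# The nine training instances as (difficult, intelligent, grade, award, letter).
rows = [
    (0, 1, 2, 0, 1),
    (1, 1, 2, 0, 1),
    (1, 1, 2, 0, 1),
    (0, 0, 2, 0, 0),
    (0, 1, 1, 1, 0),
    (1, 1, 1, 1, 1),
    (1, 1, 0, 0, 0),
    (1, 1, 0, 0, 1),
    (1, 0, 0, 0, 0),
]

# Column indexes of each factor's random variables, copied from the
# `rvs=[...]` entries in the dump.
factors = [(0,), (1,), (2, 1, 0), (3, 1), (4, 2)]

# Assumption: one stored parameter per state combination seen in the data.
non_zero_per_factor = [
    len({tuple(row[c] for c in factor) for row in rows})
    for factor in factors
]

print(non_zero_per_factor)       # [2, 2, 7, 3, 6]
print(sum(non_zero_per_factor))  # 20
```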

Here is an example of using the resulting trained model. (Do not read real-world meaning into the probabilities; the training data is fictitious.)

In [5]:
from ck.pgm_circuit.wmc_program import WMCProgram
from ck.pgm_compiler import DEFAULT_PGM_COMPILER

wmc = WMCProgram(DEFAULT_PGM_COMPILER(pgm))

print('Probabilities from trained PGM:')
for i in intelligent.indicators:
    for d in difficult.indicators:
        w = wmc.marginal_distribution(grade, condition=(d, i))
        print(f'Pr({grade} | {pgm.indicator_str(d, i)}) = {w}')


Probabilities from trained PGM:
Pr(grade | difficult=y, intelligent=y) = [0. 0. 1.]
Pr(grade | difficult=n, intelligent=y) = [1. 0. 0.]
Pr(grade | difficult=y, intelligent=n) = [0.  0.5 0.5]
Pr(grade | difficult=n, intelligent=n) = [0.4 0.2 0.4]
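
The same conditional distributions can be recovered by brute-force enumeration, which may help clarify what a weighted model count is. The sketch below is plain Python and is not how CK computes its answers (CK compiles the PGM first, as the code above shows). It assumes maximum-likelihood CPTs obtained by counting, takes the weight of a complete state to be the product of the five CPT entries, and sums the weights of all states consistent with the condition. The `cpt`, `weight`, and `marginal` helpers are hypothetical, and the approach is only practical here because the model has just 48 states.

```python
from collections import Counter
from itertools import product

# The nine training instances as (difficult, intelligent, grade, award, letter).
rows = [
    (0, 1, 2, 0, 1), (1, 1, 2, 0, 1), (1, 1, 2, 0, 1),
    (0, 0, 2, 0, 0), (0, 1, 1, 1, 0), (1, 1, 1, 1, 1),
    (1, 1, 0, 0, 0), (1, 1, 0, 0, 1), (1, 0, 0, 0, 0),
]
sizes = (2, 2, 3, 2, 2)  # number of states per column

# Each factor as (child, *parents), using column indexes from the dump.
factors = [(0,), (1,), (2, 1, 0), (3, 1), (4, 2)]


def cpt(child, *parents):
    """Maximum-likelihood CPT as {(child, *parents) states: probability}."""
    joint = Counter(tuple(r[c] for c in (child, *parents)) for r in rows)
    parent_totals = Counter(tuple(r[p] for p in parents) for r in rows)
    return {key: n / parent_totals[key[1:]] for key, n in joint.items()}


cpts = [cpt(*factor) for factor in factors]


def weight(state):
    """Product of the CPT entries selected by a complete state."""
    w = 1.0
    for factor, table in zip(factors, cpts):
        # Combinations never seen in training have probability zero.
        w *= table.get(tuple(state[c] for c in factor), 0.0)
    return w


def marginal(rv, condition):
    """Pr(rv | condition), where condition maps column index to state index."""
    totals = [0.0] * sizes[rv]
    for state in product(*(range(n) for n in sizes)):
        if all(state[c] == v for c, v in condition.items()):
            totals[state[rv]] += weight(state)
    z = sum(totals)
    return [t / z for t in totals]


# Pr(grade | difficult=y, intelligent=n); close to [0, 0.5, 0.5] up to rounding.
print(marginal(2, {0: 0, 1: 1}))
```

Because `difficult` and `intelligent` are exactly the parents of `grade`, conditioning on both simply selects one row of the `grade` CPT, which is why the result matches the counts in the training data.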
