Bayesian Networks in Scala

Major features:

  • Model representation
    • discrete factor
    • cluster graph
  • Inference
    • Loopy Belief Propagation in a Cluster Graph
  • Learning parameters
    • Expectation Maximisation - CPT Factors, Complete/Incomplete data, BN and Unrolled DBN

Roadmap:

  • Bayesian Inference in hybrid networks (mix of discrete and continuous variables)
  • EM learning in hybrid networks with hidden continuous variables

Getting Started - Loopy Belief Propagation in a Cluster Graph [1]

Consider the following example of a Bayesian Network [1], created with the SamIam tool [2].

Student Bayesian Network

Example 1: Create cluster graph and compute marginal for Grade variable (source code)

Create cluster graph:

//Create variables
val difficultyVar = Var(1, 2)
val intelliVar = Var(2, 2)
val gradeVar = Var(3, 3)
val satVar = Var(4, 2)
val letterVar = Var(5, 2)

//Create factors
val difficultyFactor = Factor(difficultyVar, Array(0.6, 0.4))
val intelliFactor = Factor(intelliVar, Array(0.7, 0.3))
val gradeFactor = Factor(intelliVar, difficultyVar, gradeVar, Array(0.3, 0.4, 0.3, 0.05, 0.25, 0.7, 0.9, 0.08, 0.02, 0.5, 0.3, 0.2))
val satFactor = Factor(intelliVar, satVar, Array(0.95, 0.05, 0.2, 0.8))
val letterFactor = Factor(gradeVar, letterVar, Array(0.1, 0.9, 0.4, 0.6, 0.99, 0.01))

//Create cluster graph
val clusterGraph = GenericClusterGraph()
clusterGraph.addCluster(1, difficultyFactor)
clusterGraph.addCluster(2, intelliFactor)
clusterGraph.addCluster(3, gradeFactor)
clusterGraph.addCluster(4, satFactor)
clusterGraph.addCluster(5, letterFactor)

//Add edges between clusters in a cluster graph
clusterGraph.addEdges((1, 3), (2, 3), (2, 4), (3, 5))

Calibrate cluster graph and get Grade marginal:

//Calibrate cluster graph
val loopyBP = LoopyBP(clusterGraph)
loopyBP.calibrate()
 
//Get marginal for Grade variable
val gradeMarginal = loopyBP.marginal(gradeVar.id)
gradeMarginal.getVariables() // Var(3,3)
gradeMarginal.getValues() // List(0.3620, 0.2884, 0.3496)
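
The same calibrated graph can be queried for any other variable in the network; for example, the marginal for the Letter variable (numeric output omitted here, since it is not given in the source):

//Get marginal for Letter variable from the same calibrated graph
val letterMarginal = loopyBP.marginal(letterVar.id)
letterMarginal.getVariables() // Var(5,2)
letterMarginal.getValues() // marginal probabilities of the two Letter states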

Example 2: Compute marginal for Grade variable given SAT test is high (source code)

Set evidence for SAT variable and compute marginal for Grade variable:

loopyBP.setEvidence(satVar.id, 0)
loopyBP.calibrate()

val gradeMarginal = loopyBP.marginal(gradeVar.id)
gradeMarginal.getVariables() // Var(3,3)
gradeMarginal.getValues() // List(0.2446, 0.3257, 0.4295)

Getting Started - Learning parameters with Expectation Maximisation in Bayesian Networks from incomplete data [1]

In this example we learn the parameters of a Sprinkler Bayesian Network [3] (source code).

Sprinkler Bayesian Network

Create Sprinkler Network with table CPT parameters:

//Create variables
val winterVar = Var(1, 2)
val sprinklerVar = Var(2, 2)
val rainVar = Var(3, 2)
val wetGrassVar = Var(4, 2)
val slipperyRoadVar = Var(5, 2)

//Create factors
val winterFactor = Factor(winterVar, Array(0.2, 0.8))
val sprinklerFactor = Factor(winterVar, sprinklerVar, Array(0.6, 0.4, 0.55, 0.45))
val rainFactor = Factor(winterVar, rainVar, Array(0.1, 0.9, 0.3, 0.7))
val wetGrassFactor = Factor(sprinklerVar, rainVar, wetGrassVar, Array(0.85, 0.15, 0.3, 0.7, 0.35, 0.65, 0.55, 0.45))
val slipperyRoadFactor = Factor(rainVar, slipperyRoadVar, Array(0.5, 0.5, 0.4, 0.6))

//Create cluster graph	
val sprinklerGraph = ClusterGraph()

sprinklerGraph.addCluster(winterVar.id, winterFactor)
sprinklerGraph.addCluster(sprinklerVar.id, sprinklerFactor)
sprinklerGraph.addCluster(rainVar.id, rainFactor)
sprinklerGraph.addCluster(wetGrassVar.id, wetGrassFactor)
sprinklerGraph.addCluster(slipperyRoadVar.id, slipperyRoadFactor)

sprinklerGraph.addEdges((1, 2), (1, 3), (2, 4), (3, 4), (3, 5))

Learn parameters of Sprinkler Network from samples (sprinkler_10k_samples_5pct_missing_values.dat):

val maxIterNum = 5
val variableIds = Array(winterVar.id, rainVar.id, sprinklerVar.id, slipperyRoadVar.id, wetGrassVar.id)
val dataSet = DataSet.fromFile("src/test/resources/sprinkler_data/sprinkler_10k_samples_5pct_missing_values.dat", variableIds)

GenericEMLearn.learn(sprinklerGraph, dataSet, maxIterNum)

sprinklerGraph.getCluster(winterVar.id).getFactor() //Factor(winterVar, Array(0.6086, 0.3914))
sprinklerGraph.getCluster(sprinklerVar.id).getFactor() //Factor(winterVar, sprinklerVar, Array(0.2041, 0.7958, 0.7506, 0.2493))
sprinklerGraph.getCluster(rainVar.id).getFactor() //Factor(winterVar, rainVar, Array(0.8066, 0.1933, 0.0994, 0.9005))
sprinklerGraph.getCluster(wetGrassVar.id).getFactor() //Factor(sprinklerVar, rainVar, wetGrassVar, Array(0.9481, 0.0518, 0.9052, 0.0947, 0.7924, 0.2075, 0.00001, 0.9999))
sprinklerGraph.getCluster(slipperyRoadVar.id).getFactor() //Factor(rainVar, slipperyRoadVar, Array(0.6984, 0.3015, 0.00001, 0.9999))
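
Once the parameters are learned, the Sprinkler graph can be queried with the same Loopy Belief Propagation API as in the first example. A minimal sketch, assuming the learned sprinklerGraph can be calibrated directly:

//Calibrate the learned Sprinkler graph
val sprinklerLoopyBP = LoopyBP(sprinklerGraph)
sprinklerLoopyBP.calibrate()

//Get marginal for Wet Grass variable under the learned parameters
val wetGrassMarginal = sprinklerLoopyBP.marginal(wetGrassVar.id)
wetGrassMarginal.getValues() // marginal probabilities of the two Wet Grass states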

Getting Started - Learning parameters with Expectation Maximisation in Unrolled Dynamic Bayesian Networks from incomplete data [1]

In this scenario we learn the parameters of a Dynamic Bayesian Network designed for predicting the outcomes of tennis matches.

There are two types of variables in this network:

Variable Type    Values               Description
Player Skill     High, Medium, Low    How good a player is at tennis
Match Outcome    Win, Loss            Outcome of a tennis match between two players

Player Skill variables are hidden: they are never observed directly, and we infer their values over time for all tennis players. Match Outcome variables are always observed, since historical tennis results are available to us.

The tennis network is sliced by time, e.g. into weeks. Within a single time slice, tennis matches are represented by Match Outcome variables, whereas players are characterised by Player Skill variables. For a better understanding of this structure, look at the diagram below, which reflects the following sequence of tennis matches.

player_id_1, player_id_2, player_1_won (won,lost), time
1,2,won,0
1,2,won,1
2,3,lost,1
1,2,?,2 //This game has not been played yet and we would like to infer the likelihood of its outcome. 
1,3,lost,2
2,3,won,2

Tennis Dynamic Bayesian Network
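
As a hedged sketch (the variable ids below are illustrative assumptions, not taken from the source), the variables of this unrolled network can be declared with the same Var API as in the earlier examples: one three-state Player Skill variable per player per time slice, and one two-state Match Outcome variable per match.

//Player Skill variables (3 states: High, Medium, Low), one per player per time slice;
//player 3 first appears at time 1, so it has no time-0 variable. Ids are assumed for illustration.
val player1Time0Var = Var(1, 3)
val player1Time1Var = Var(2, 3)
val player1Time2Var = Var(3, 3)
val player2Time0Var = Var(4, 3)
val player2Time1Var = Var(5, 3)
val player2Time2Var = Var(6, 3)
val player3Time1Var = Var(7, 3)
val player3Time2Var = Var(8, 3)

//Match Outcome variables (2 states: Win, Loss), one per match per time slice
val match1v2Time0Var = Var(9, 2)
val match1v2Time1Var = Var(10, 2)
val match2v3Time1Var = Var(11, 2)
val match1v2Time2Var = Var(12, 2)
val match1v3Time2Var = Var(13, 2)
val match2v3Time2Var = Var(14, 2)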

The probability distribution of the tennis data is modelled with prior, emission and transition parameters:

Parameter Type   Description
Prior            Initial player skill, at the time of the player's very first tennis game in the training data
Emission         Probability of winning a tennis game given the skills of both players
Transition       Probability of a player's skill changing between time slices

These parameters are shared by the corresponding variables; for example, the emission parameter is shared by all Match Outcome variables. The following figure presents the initial guess about the network parameters, which we will learn from historical data by applying the Expectation Maximisation algorithm.

Parameters for Tennis Dynamic Bayesian Network
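
As a rough sketch of how these shared parameters map onto CPT factors of the right shape (the numbers below are illustrative initial guesses, not the values from the figure):

//Prior: P(skill) over the 3 skill states - illustrative initial guess
val priorValues = Array(0.3, 0.4, 0.3)

//Transition: P(skill at time t | skill at time t-1), a 3x3 table flattened row by row
val transitionValues = Array(
  0.98, 0.01, 0.01,
  0.01, 0.98, 0.01,
  0.01, 0.01, 0.98)

//Emission: P(match outcome | skill of player 1, skill of player 2), a 3x3x2 table flattened row by row
val emissionValues = Array(
  0.5, 0.5, 0.65, 0.35, 0.8, 0.2,
  0.35, 0.65, 0.5, 0.5, 0.65, 0.35,
  0.2, 0.8, 0.35, 0.65, 0.5, 0.5)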

In the remainder of this tutorial we build a cluster graph for the Tennis Network and learn its prior, emission and transition parameters.

Create cluster graph for Tennis Network (source code):

val tennisClusterGraph = createTennisClusterGraph()
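
createTennisClusterGraph() is defined in the linked source. A minimal sketch of its general shape, using the illustrative variables and parameter arrays above and assuming the same ClusterGraph/Factor API as the Sprinkler example (how the shared prior, transition and emission parameters are tied together for EM is handled in the source and not shown here):

def createTennisClusterGraph() = {
  val graph = ClusterGraph()

  //Prior clusters: the first time slice of each player's skill
  graph.addCluster(player1Time0Var.id, Factor(player1Time0Var, priorValues))
  graph.addCluster(player2Time0Var.id, Factor(player2Time0Var, priorValues))
  graph.addCluster(player3Time1Var.id, Factor(player3Time1Var, priorValues))

  //Transition clusters: P(skill at time t | skill at time t-1)
  graph.addCluster(player1Time1Var.id, Factor(player1Time0Var, player1Time1Var, transitionValues))
  graph.addCluster(player1Time2Var.id, Factor(player1Time1Var, player1Time2Var, transitionValues))
  graph.addCluster(player2Time1Var.id, Factor(player2Time0Var, player2Time1Var, transitionValues))
  graph.addCluster(player2Time2Var.id, Factor(player2Time1Var, player2Time2Var, transitionValues))
  graph.addCluster(player3Time2Var.id, Factor(player3Time1Var, player3Time2Var, transitionValues))

  //Emission clusters: P(match outcome | skills of both players)
  graph.addCluster(match1v2Time0Var.id, Factor(player1Time0Var, player2Time0Var, match1v2Time0Var, emissionValues))
  graph.addCluster(match1v2Time1Var.id, Factor(player1Time1Var, player2Time1Var, match1v2Time1Var, emissionValues))
  graph.addCluster(match2v3Time1Var.id, Factor(player2Time1Var, player3Time1Var, match2v3Time1Var, emissionValues))
  graph.addCluster(match1v2Time2Var.id, Factor(player1Time2Var, player2Time2Var, match1v2Time2Var, emissionValues))
  graph.addCluster(match1v3Time2Var.id, Factor(player1Time2Var, player3Time2Var, match1v3Time2Var, emissionValues))
  graph.addCluster(match2v3Time2Var.id, Factor(player2Time2Var, player3Time2Var, match2v3Time2Var, emissionValues))

  //Edges between clusters that share a variable: skill chains over time,
  //and each match cluster connected to the skill clusters of both players in that time slice
  graph.addEdges(
    (player1Time0Var.id, player1Time1Var.id), (player1Time1Var.id, player1Time2Var.id),
    (player2Time0Var.id, player2Time1Var.id), (player2Time1Var.id, player2Time2Var.id),
    (player3Time1Var.id, player3Time2Var.id),
    (player1Time0Var.id, match1v2Time0Var.id), (player2Time0Var.id, match1v2Time0Var.id),
    (player1Time1Var.id, match1v2Time1Var.id), (player2Time1Var.id, match1v2Time1Var.id),
    (player2Time1Var.id, match2v3Time1Var.id), (player3Time1Var.id, match2v3Time1Var.id),
    (player1Time2Var.id, match1v2Time2Var.id), (player2Time2Var.id, match1v2Time2Var.id),
    (player1Time2Var.id, match1v3Time2Var.id), (player3Time2Var.id, match1v3Time2Var.id),
    (player2Time2Var.id, match2v3Time2Var.id), (player3Time2Var.id, match2v3Time2Var.id))

  graph
}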

Learn parameters of Tennis Network from samples (source code, tennis_3_players_network.dat):

//Prepare training set
val variableIds = Array(
player1Time0Var.id, player1Time1Var.id, player1Time2Var.id,
player2Time0Var.id, player2Time1Var.id, player2Time2Var.id,
player3Time1Var.id, player3Time2Var.id,
match1v2Time0Var.id, match1v2Time1Var.id, match2v3Time1Var.id, match1v2Time2Var.id, match1v3Time2Var.id, match2v3Time2Var.id)

val dataSet = DataSet.fromFile("src/test/resources/tennis_data/tennis_3_players_network.dat", variableIds)

//Learn parameters
val maxIterNum = 5
GenericEMLearn.learn(tennisClusterGraph, dataSet, maxIterNum)

//Prior parameter - Array(0.4729, 0.2323, 0.2947)
tennisClusterGraph.getCluster(player1Time0Var.id).getFactor()

//Transition parameter - Array(0.9998, 0.0001, 0.0001, 0.0083, 0.9720, 0.0197, 0.0020, 0.0091, 0.9890)
tennisClusterGraph.getCluster(player2Time1Var.id).getFactor()

//Emission parameter - Array(0.0000, 1.0000, 0.0000, 1.0000, 0.0000, 1.0000, 0.9930, 0.0070, 0.9198, 0.0802, 0.8337, 0.1663, 0.9980, 0.0020, 0.9956, 0.0044, 0.9960, 0.0040)
tennisClusterGraph.getCluster(match1v2Time2Var.id).getFactor()
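
With the learned parameters, the likelihood of the not-yet-played match between players 1 and 2 at time 2 (the "?" row in the match sequence above) can be estimated with the same inference API as in the first example. A minimal sketch, assuming the learned graph can be calibrated with LoopyBP and that match outcomes are encoded as state 0 = won and 1 = lost for the first listed player:

//Enter the observed match outcomes as evidence (0 = won, 1 = lost; encoding assumed)
val tennisLoopyBP = LoopyBP(tennisClusterGraph)
tennisLoopyBP.setEvidence(match1v2Time0Var.id, 0)
tennisLoopyBP.setEvidence(match1v2Time1Var.id, 0)
tennisLoopyBP.setEvidence(match2v3Time1Var.id, 1)
tennisLoopyBP.setEvidence(match1v3Time2Var.id, 1)
tennisLoopyBP.setEvidence(match2v3Time2Var.id, 0)
tennisLoopyBP.calibrate()

//Marginal over the outcome of the unplayed match between players 1 and 2 at time 2
val match1v2Time2Marginal = tennisLoopyBP.marginal(match1v2Time2Var.id)
match1v2Time2Marginal.getValues() // P(player 1 wins), P(player 1 loses)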

References

  1. Daphne Koller, Nir Friedman. Probabilistic Graphical Models, Principles and Techniques, 2009
  2. Automated Reasoning Group of Professor Adnan Darwiche at UCLA. SamIam: Sensitivity Analysis, Modelling, Inference and More, version 3.0
  3. Adnan Darwiche. Modeling and Reasoning with Bayesian Networks, 2009
  4. Stuart Russell, Peter Norvig. Artificial Intelligence - A Modern Approach, Third Edition, 2010
