Bayesian Modeling

Author: Hari Ravichandran

Bayesian Network Analysis

Factor Model

A factor is a function that maps each variable in a set to a number greater than or equal to zero.

Given two sets of variables A and B , and the set of shared variables, , the instantiation of the aforementioned variables a and b are consistent, if the instantiation of the shared variables c in a and b are the same.

At the beginning, the Bayesian Network for the sprinkler was shown. For this network, the factors are as follows:

Factor A over Rain, Sprinkler:

Rain	Sprinkler	Value
FALSE	FALSE	0.45
FALSE	TRUE	0.55
TRUE	FALSE	0.95
TRUE	TRUE	0.05

Factor B over Rain:

Rain	Value
TRUE	0.3
FALSE	0.7

Factor C over Sprinkler, Rain, Grass Wet:

Sprinkler	Rain	Grass Wet	Value
FALSE	FALSE	TRUE	0
FALSE	TRUE	TRUE	0.95
TRUE	FALSE	TRUE	0.85
TRUE	TRUE	TRUE	0.97
FALSE	FALSE	FALSE	1
FALSE	TRUE	FALSE	0.05
TRUE	FALSE	FALSE	0.15
TRUE	TRUE	FALSE	0.03

This repository implements two factor operations that each produce a new factor:

Summing-out operation: Where is a factor over variables , and is a variable in . By summing out variable from factor , we get a result that is another factor over the set of variables with removed.

For example, if we sum out the "Sprinkler" Variable from Factor C, we get the following resulting factor:

Rain	Grass Wet	Value
FALSE	FALSE	1.15
FALSE	TRUE	0.85
TRUE	FALSE	0.08
TRUE	TRUE	1.92

In the above example, we essentially sum the probabilities while disregarding the summed-out variable.

Multiplication operation: When we multiply two factors , we get another factor that has the variables . In the equation above, is consistent with and . Here is the result of multiplying the example factors A and B:

Rain	Sprinkler	Value
FALSE	FALSE	0.315
FALSE	TRUE	0.385
TRUE	FALSE	0.285
TRUE	TRUE	0.015

Repo Overview

Description

As defined in the problem statement, a factor 'is a function over a set of variables'. The 'Factor' class implemented in 'factor.py' provides a framework that facilitates factor operations. These operations include:

constructor – create Factor object from CSV file or Pandas DataFrame
save – save Factor to CSV file
get_data – fetch data from Factor object
print_data – print factor data to console
get_vars – get set of all variables in Factor
summation – summing over operation
multiplication – multiplication operation

In addition to the class, 'Factor.py' includes some driver code with examples, and the unit tests. Running the code saves the results from the summation and multiplication examples into the files 'summation.csv' and 'multiplication.csv' respectively. All of the example factors (A, B, and C) are stored in the main directory in CSV format.

Data Structure Choice

The Factor data itself is stored in a Pandas DataFrame within the object. This facilitates easy array level operations on the Factors. Data frames can only be as large as the memory that the system has. In case of massive datasets that exceed the memory capacity, Apache Spark can be used. Each row in a factor will be a key-value pair. Each key could be an immutable tuple that has one to many factors. These keys can be processed across many compute nodes in a parallel fashion. Apache Data Frames can be potentially employed for this purpose, and the operations have to be parallelized.

Summing-Out Algorithm – O(N) Time Complexity, O(N) Space Complexity

The summing out algorithm works as follows: getting the desired column list by removing the 'Value' column and Value of Interest, dropping the column with the variable of interest, and grouping by the desired columns to find the aggregated sum. Since the aggregation operation has to look at all the elements in the dataframe once, the time complexity is O(N). The space complexity is O(N) since a new data structure is made to store the results.

Multiplying Algorithm – O(N) Time Complexity, O(N) Space Complexity

The multiplication algorithm assumes that when multiplying Factor A and Factor B, the variables in Factor B are a subset of the variables in Factor A. If Factor B is a superset of Factor A, then the order of arguments can be switched inside the multiplication operation as a first step. Using this assumption, we can get the set of variables in both factors, as well as their unions and intersections. Then, we can iterate over all the rows in Factor A and match the relevant variables with those in Factor B to get the product. Since each row is examined only once, the time complexity is O(N). The space complexity is O(N) since we make a new DataFrame to store the result.

Unit Tests

Included in 'factor.py' are unit tests that test the summation and multiplication algorithms. Unit tests for the Factor class constructor for additional code coverage are also included. Together, these tests cover nearly all of the lines of code.

Materials Used

Python 'pandas' documentation: https://pandas.pydata.org/docs/
Python 'unittest' documentation: https://docs.python.org/3/library/unittest.html

Instructions for Running Code

Building the Code

The following libraries are required to run the code, as listed in requirements.txt:

pandas==1.2.1

The program was developed using Python 3.8.3.

To install all the needed packages to run this code using pip, go to the program directory and run the following command:

python3 -m pip install -r requirements.txt

More information can be found here.

Running the Code

To run the code, navigate to the directory in the command line, and run the following command:

python ./factor.py

Alternatively, if Python 3 is accessible through the 'python3' command:

python3 ./factor.py

The code can also be run in an IDE. For this project, Spyder 4 was used. Simply open the 'factor.py' file and click the run button in Spyder to run the code.

Files

The files in the directory are as follows:

README: Readme file for the code

requirements.txt: Contains required Python libraries

factor.py: code for the factor class, driver code, and unit tests

FactorA.csv: stored form of factor A (CSV format)

FactorB.csv: stored form of factor B (CSV format)

FactorC.csv: stored form of factor C (CSV format)

summationResult.csv: manually computed result for summing Factor C over the Sprinkler variable, for testing purposes (CSV format)

multiplicationResult.csv: manually computed result for multiplying Factors A and B, for testing purposes (CSV format)

SampleBayesianNetwork.png: shows the example Bayesian Network image

Output

Once the code has finished running, it will print out the results of the summing out and multiplying operation on the examples given. It will also print whether all the unit tests passed or failed. The driver code also stores the results of the summation operation on Factor C over the Sprinkler variable insummation.csv. In addition, the results of the multiplication operation on Factor A and Factor B from the driver code are stored in multiplication.csv.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bayesian Modeling

Table of Contents

Factor Model

Repo Overview

Description

Data Structure Choice

Summing-Out Algorithm – O(N) Time Complexity, O(N) Space Complexity

Multiplying Algorithm – O(N) Time Complexity, O(N) Space Complexity

Unit Tests

Materials Used

Instructions for Running Code

Building the Code

Running the Code

Files

Output

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
FactorA.csv		FactorA.csv
FactorB.csv		FactorB.csv
FactorC.csv		FactorC.csv
README.md		README.md
SampleBayesianNetwork.PNG		SampleBayesianNetwork.PNG
factor.py		factor.py
multiplicationResult.csv		multiplicationResult.csv
requirements.txt		requirements.txt
summationResult.csv		summationResult.csv

hariravichandran/bayesian-modeling

Folders and files

Latest commit

History

Repository files navigation

Bayesian Modeling

Table of Contents

Factor Model

Repo Overview

Description

Data Structure Choice

Summing-Out Algorithm – O(N) Time Complexity, O(N) Space Complexity

Multiplying Algorithm – O(N) Time Complexity, O(N) Space Complexity

Unit Tests

Materials Used

Instructions for Running Code

Building the Code

Running the Code

Files

Output

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages