Bayesian Parameter Estimation #696

Merged
merged 3 commits into from Jun 16, 2016

@chrisittner
Contributor

chrisittner commented Jun 10, 2016

1. Adds basic support for working with missing data.

I added a state_count method to BaseEstimator that is used in both MLE and Bayesian estimation.
It can deal with missing data in two basic ways:

  • If complete_samples_only=True, it ignores incomplete samples in the data (rows where at least one value is np.NaN).
  • If complete_samples_only=False, it counts every row where the variable itself AND all of its parents are not np.NaN. This assumes that data values are missing at random (not conditional on the parents' states); otherwise a more sophisticated approach is needed.
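The two counting modes described above can be sketched with plain pandas. This is only an illustration of the idea, not the code from this PR: the function name `state_counts_sketch`, its signature, and the toy data are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def state_counts_sketch(data, variable, parents=(), complete_samples_only=True):
    """Count states of `variable` per parent configuration (illustrative only)."""
    parents = list(parents)
    if complete_samples_only:
        # Drop every row that has a NaN in any column.
        usable = data.dropna()
    else:
        # Keep rows where the variable and all of its parents are observed.
        usable = data.dropna(subset=[variable] + parents)

    if not parents:
        return usable[variable].value_counts().sort_index()
    # Rows: states of `variable`; columns: parent configurations.
    return usable.groupby([variable] + parents).size().unstack(parents, fill_value=0)


# Hypothetical toy data: the last row is missing a value for A.
data = pd.DataFrame({
    "A": [0, 0, 1, np.nan],
    "B": [0, 1, 0, 1],
    "C": [1, 1, 0, 0],
})

strict = state_counts_sketch(data, "B", complete_samples_only=True)
lenient = state_counts_sketch(data, "B", complete_samples_only=False)
with_parents = state_counts_sketch(data, "C", parents=["A", "B"],
                                   complete_samples_only=False)
```

In this toy data, the strict mode discards the last row entirely, while the lenient mode still uses it when counting B, because B has no parents and is itself observed in that row. When counting C with parents A and B, both modes drop the last row, since a parent value is missing.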

complete_samples_only can be set when initializing the (MLE/Bayesian) estimator objects, which is also where you pass the data, or it can be passed explicitly when calling state_count manually. It currently defaults to True. Another option would be to raise an error on missing values by default.

(There are better ways to deal with missing data, e.g. being Bayesian about it and considering the possible ways to complete the data, but I think that can wait until use cases come up where it is needed. I used this as an intro: http://www.ismll.uni-hildesheim.de/lehre/bn-15s/script/bn-10-paramlearning_missing_values-2up.pdf)

2. Adds class for Bayesian Parameter Estimation.

I added Bayesian Parameter Estimation with Dirichlet priors, similar to how it was in the book branch. I'm open to suggestions for how the parameters should work. Please see the docstring and let me know.

>>> import pandas as pd
>>> from pgmpy.models import BayesianModel
>>> from pgmpy.estimators import BayesianEstimator
>>> data = pd.DataFrame(data={'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 1, 0]})
>>> model = BayesianModel([('A', 'C'), ('B', 'C')])
>>> estimator = BayesianEstimator(model, data)
>>> print(estimator.estimate_cpd('A', prior_type="dirichlet", pseudo_counts=[1, 1]))
╒══════╤═════╕
│ A(0) │ 0.6 │
├──────┼─────┤
│ A(1) │ 0.4 │
╘══════╧═════╛
>>> print(estimator.estimate_cpd('A', prior_type="dirichlet", pseudo_counts=[2, 2]))
╒══════╤══════════╕
│ A(0) │ 0.571429 │
├──────┼──────────┤
│ A(1) │ 0.428571 │
╘══════╧══════════╛
>>> print(estimator.estimate_cpd('A', prior_type="dirichlet", pseudo_counts=[5, 5]))
╒══════╤══════════╕
│ A(0) │ 0.538462 │
├──────┼──────────┤
│ A(1) │ 0.461538 │
╘══════╧══════════╛
>>> for cpd in estimator.get_parameters(prior_type="BDeu", equivalent_sample_size=10):
...     print(cpd)
... 
╒══════╤══════════╕
│ A(0) │ 0.538462 │
├──────┼──────────┤
│ A(1) │ 0.461538 │
╘══════╧══════════╛
╒══════╤═════════════════════╤═════════════════════╤═════════════════════╤══════╕
│ A    │ A(0)                │ A(0)                │ A(1)                │ A(1) │
├──────┼─────────────────────┼─────────────────────┼─────────────────────┼──────┤
│ B    │ B(0)                │ B(1)                │ B(0)                │ B(1) │
├──────┼─────────────────────┼─────────────────────┼─────────────────────┼──────┤
│ C(0) │ 0.35714285714285715 │ 0.35714285714285715 │ 0.6428571428571429  │ 0.5  │
├──────┼─────────────────────┼─────────────────────┼─────────────────────┼──────┤
│ C(1) │ 0.6428571428571429  │ 0.6428571428571429  │ 0.35714285714285715 │ 0.5  │
╘══════╧═════════════════════╧═════════════════════╧═════════════════════╧══════╛
╒══════╤══════════╕
│ B(0) │ 0.538462 │
├──────┼──────────┤
│ B(1) │ 0.461538 │
╘══════╧══════════╛
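The probabilities printed above are consistent with taking the posterior mean under a Dirichlet prior: each state's observed count plus its pseudo count, divided by the total of both. The sketch below reproduces them by hand from the three-row example data. The helper names are made up for illustration, and the BDeu rule used here (spreading equivalent_sample_size evenly over all cells of the CPD) is an assumption that happens to match the output shown, not a statement about how the PR implements it.

```python
def dirichlet_posterior_mean(counts, pseudo_counts):
    """Posterior mean of a categorical distribution under a Dirichlet prior."""
    total = sum(counts) + sum(pseudo_counts)
    return [(n + a) / total for n, a in zip(counts, pseudo_counts)]


def bdeu_pseudo_count(equivalent_sample_size, node_cardinality, parent_configurations=1):
    """Assumed BDeu rule: spread the equivalent sample size evenly over all CPD cells."""
    return equivalent_sample_size / (node_cardinality * parent_configurations)


# Observed counts of A in the example data: A = [0, 0, 1].
counts_A = [2, 1]

for pseudo in ([1, 1], [2, 2], [5, 5]):
    print(pseudo, [round(p, 6) for p in dirichlet_posterior_mean(counts_A, pseudo)])

# BDeu with equivalent_sample_size=10 for the root node A (2 states, no parents).
alpha_A = bdeu_pseudo_count(10, node_cardinality=2)
bdeu_A = dirichlet_posterior_mean(counts_A, [alpha_A, alpha_A])

# BDeu for C given A=0, B=0: C has 2 states and 4 parent configurations,
# and the only matching row in the data has C=1.
alpha_C = bdeu_pseudo_count(10, node_cardinality=2, parent_configurations=4)
bdeu_C = dirichlet_posterior_mean([0, 1], [alpha_C, alpha_C])
```

Under this rule, a parent configuration that never occurs in the data (A=1, B=1 in the example) falls back to the uniform prior, which is why that column shows 0.5 for both states of C.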

3. Updates the BayesianModel.fit() method to pass extra arguments to BayesianEstimator and to handle missing data values.

Again, happy about API suggestions.

@coveralls

coveralls commented Jun 10, 2016

Coverage Status

Coverage increased (+0.07%) to 96.17% when pulling d3f6dcb on chrisittner:bayesian_parameter_estimation into 00e38fd on pgmpy:dev.

    def __init__(self, model, data, **kwargs):
        """
        Class used to compute parameters for a model using Bayesian Parameter Estimation.
        See `MaximumLikelihoodEstimator` for constructor parameters.

@ankurankan

ankurankan Jun 16, 2016

Member

@chrisittner Please complete this docstring. Add Parameters and Examples sections.

@ankurankan ankurankan merged commit d3f6dcb into pgmpy:dev Jun 16, 2016

3 of 4 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
code-quality/landscape Code quality decreased by -0.13%
continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls Coverage increased (+0.07%) to 96.17%