
HC structure search #718

Merged
merged 2 commits into from Aug 18, 2016

@chrisittner
Contributor

chrisittner commented Aug 17, 2016

This adds the HillClimbSearch structure estimator for BayesianModels.

Usage:

import pandas as pd
import numpy as np
from pgmpy.estimators import HillClimbSearch, BicScore

# create data sample with 9 random variables:
data = pd.DataFrame(np.random.randint(0, 5, size=(5000, 9)), columns=list('ABCDEFGHI'))
# add 10th dependent variable
data['J'] = data['A'] * data['B']

est = HillClimbSearch(data, scoring_method=BicScore(data))
best_model = est.estimate()

print(sorted(best_model.nodes()))
print(sorted(best_model.edges()))

Output:

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
[('A', 'J'), ('B', 'J')]

HC search starts from some DAG and proceeds by iteratively modifying the graph to maximise its score. The legal modifications are "add one edge", "flip one edge", and "remove one edge". See Sections 18.4.3 and A.4.3 in the Koller & Friedman PGM book; the implementation follows Algorithms A.5 (page 1155) and A.6 (page 1157) there.
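The loop described above can be sketched in plain Python. This is a simplified, self-contained illustration of the search procedure, not pgmpy's implementation; the graph representation (a set of edge tuples) and the `toy_score` function are made up for the example:

```python
# Self-contained sketch of score-based hill climbing over DAGs.
# Illustrative only -- a toy score function, not pgmpy's implementation.
from collections import defaultdict
from itertools import permutations

def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indeg = {n: 0 for n in nodes}
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return seen == len(nodes)

def neighbors(nodes, edges):
    """All graphs one 'add', 'remove', or 'flip' modification away."""
    for u, v in permutations(nodes, 2):
        if (u, v) not in edges:
            yield edges | {(u, v)}                  # add one edge
    for u, v in set(edges):
        yield edges - {(u, v)}                      # remove one edge
        yield (edges - {(u, v)}) | {(v, u)}         # flip one edge

def hill_climb(nodes, score):
    """Greedily apply the best legal modification until no move improves the score."""
    edges, current = set(), score(set())            # start: disconnected graph
    while True:
        best, best_score = None, current
        for cand in neighbors(nodes, edges):
            if is_acyclic(nodes, cand):
                s = score(cand)
                if s > best_score:
                    best, best_score = cand, s
        if best is None:
            return edges                            # local maximum reached
        edges, current = best, best_score

# Toy score: reward the two "true" edges, penalize model size.
def toy_score(edges):
    return sum(e in {('A', 'J'), ('B', 'J')} for e in edges) - 0.1 * len(edges)

print(sorted(hill_climb(['A', 'B', 'J'], toy_score)))  # [('A', 'J'), ('B', 'J')]
```

In pgmpy the score is additionally decomposable per node, so each candidate move only requires re-scoring the affected node's parent set rather than the whole network.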

The estimate method has three optional parameters:

  • start: an optional starting DAG. Default: disconnected graph.
  • tabu_length: if set to n, the last n modifications cannot be reversed by a search step. Default: 0.
  • max_indegree: if set to n, the search space is restricted to networks where each node has at most n parents.
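To illustrate how the last two constraints prune candidate moves, here is a hypothetical helper (the `is_legal`/`inverse` names and the operation encoding are invented for this sketch; pgmpy's internals differ): a tabu list remembers recent operations and rejects their inverses, while max_indegree rejects any "add" or "flip" that would give a node too many parents.

```python
# Hypothetical sketch of tabu_length and max_indegree pruning.
from collections import deque

def inverse(op):
    """The operation that undoes `op`, where op = (action, (u, v))."""
    action, (u, v) = op
    return {'add': ('remove', (u, v)),
            'remove': ('add', (u, v)),
            'flip': ('flip', (v, u))}[action]

def is_legal(op, edges, tabu, max_indegree=None):
    """Reject moves that undo a recent (tabu) operation or exceed max_indegree."""
    action, (u, v) = op
    if inverse(op) in tabu:        # would reverse a recent modification
        return False
    if max_indegree is not None:
        if action == 'add' and sum(c == v for _, c in edges) + 1 > max_indegree:
            return False           # v would get too many parents
        if action == 'flip' and sum(c == u for _, c in edges) + 1 > max_indegree:
            return False           # u would get too many parents
    return True

tabu = deque(maxlen=2)             # tabu_length = 2: remember the last 2 operations
tabu.append(('add', ('A', 'J')))
print(is_legal(('remove', ('A', 'J')), {('A', 'J')}, tabu))  # False: reverses recent add
print(is_legal(('add', ('C', 'J')), {('A', 'J'), ('B', 'J')},
               deque(), max_indegree=2))                     # False: J would have 3 parents
```

A `deque(maxlen=n)` is a natural fit for the tabu list, since appending a new operation silently drops the oldest one once n operations are stored.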
@coveralls

coveralls commented Aug 17, 2016

Coverage Status

Changes Unknown when pulling 7abb37b on chrisittner:hc-structure-search into pgmpy:dev.

@chrisittner

Contributor

chrisittner commented Aug 17, 2016

Results depend on scoring_method used:

import pandas as pd
import numpy as np
from pgmpy.estimators import HillClimbSearch

# create data sample with 9 random variables:
data = pd.DataFrame(np.random.randint(0, 5, size=(5000, 9)), columns=list('ABCDEFGHI'))
# add 10th dependent variable
data['J'] = data['A'] * data['B']

est = HillClimbSearch(data)  # use default K2Score
best_model = est.estimate()

print(best_model.edges())

Output:

[('B', 'A'), ('J', 'A'), ('J', 'B')]

@ankurankan ankurankan merged commit 7abb37b into pgmpy:dev Aug 18, 2016

3 checks passed

code-quality/landscape Code quality decreased by -0.28%
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls First build on dev at 96.414%
Details

@chrisittner chrisittner deleted the chrisittner:hc-structure-search branch Aug 18, 2016
