HC structure search #718

Merged
merged 2 commits into pgmpy:dev from chrisittner:hc-structure-search on Aug 18, 2016

Conversation

3 participants
Contributor

chrisittner commented Aug 17, 2016 • edited

This adds the `HillClimbSearch` structure estimator for `BayesianModel`s. Usage:

```python
import pandas as pd
import numpy as np
from pgmpy.estimators import HillClimbSearch, BicScore

# create data sample with 9 random variables:
data = pd.DataFrame(np.random.randint(0, 5, size=(5000, 9)), columns=list('ABCDEFGHI'))
# add 10th dependent variable
data['J'] = data['A'] * data['B']

est = HillClimbSearch(data, scoring_method=BicScore(data))
best_model = est.estimate()
print(sorted(best_model.nodes()))
print(sorted(best_model.edges()))
```

Output:

```
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
[('A', 'J'), ('B', 'J')]
```

HC search starts from some DAG and proceeds by iteratively modifying the graph to maximize its score. Legal modifications are "add one edge", "flip one edge", and "remove one edge". See Sections 18.4.3 and A.4.3 in the Koller & Friedman PGM book; the implementation follows Algorithms A.5 (page 1155) and A.6 (page 1157) there.

The `estimate` method has three optional parameters:

- `start`: an optional starting point for the search. Default: the disconnected graph.
- `tabu_length`: if set to `n`, the last `n` modifications cannot be reversed by a search step. Default: 0.
- `max_indegree`: if set to `n`, the search space is restricted to networks where each node has at most `n` parents.
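The add/remove/flip search loop can be sketched in plain Python. This is a toy illustration of the technique, not pgmpy's implementation; the edge weights and the per-edge penalty in the score are made up for the example:

```python
import itertools

def is_acyclic(nodes, edges):
    """Kahn's algorithm: True iff (nodes, edges) is a DAG."""
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(nodes)

def hill_climb(nodes, score, max_indegree=None):
    """Greedy structure search: start from the disconnected graph and
    repeatedly apply the best single add/remove/flip edge move until
    no move improves the score."""
    edges = set()
    current = score(edges)
    while True:
        best_move, best_score = None, current
        neighbors = []
        for u, v in itertools.permutations(nodes, 2):
            if (u, v) not in edges and (v, u) not in edges:
                neighbors.append(edges | {(u, v)})           # add one edge
        for e in edges:
            neighbors.append(edges - {e})                    # remove one edge
        for u, v in edges:
            neighbors.append((edges - {(u, v)}) | {(v, u)})  # flip one edge
        for cand in neighbors:
            if not is_acyclic(nodes, cand):
                continue  # only DAGs are legal networks
            if max_indegree is not None and any(
                    sum(1 for _, t in cand if t == n) > max_indegree
                    for n in nodes):
                continue  # respect the parent-count restriction
            s = score(cand)
            if s > best_score:
                best_move, best_score = cand, s
        if best_move is None:
            return edges  # local maximum reached
        edges, current = best_move, best_score

# Toy decomposable score: reward two "true" edges, penalize model size.
weights = {('A', 'J'): 2.0, ('B', 'J'): 2.0}
score = lambda edges: sum(weights.get(e, 0.0) for e in edges) - len(edges)

print(sorted(hill_climb(list('ABJ'), score)))  # → [('A', 'J'), ('B', 'J')]
```

The tabu list from the PR would slot into this loop by skipping any candidate that reverses one of the last `n` applied moves.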

chrisittner added some commits Aug 17, 2016

- `a85ea11` Added HillClimbSearch structure estimator for discrete BNs
- `7abb37b` Tests for HillClimbSearch

coveralls commented Aug 17, 2016 • edited

Changes Unknown when pulling 7abb37b on chrisittner:hc-structure-search into pgmpy:dev.
Contributor

chrisittner commented Aug 17, 2016

Results depend on the `scoring_method` used:

```python
import pandas as pd
import numpy as np
from pgmpy.estimators import HillClimbSearch

# create data sample with 9 random variables:
data = pd.DataFrame(np.random.randint(0, 5, size=(5000, 9)), columns=list('ABCDEFGHI'))
# add 10th dependent variable
data['J'] = data['A'] * data['B']

est = HillClimbSearch(data)  # uses the default K2Score
best_model = est.estimate()
print(best_model.edges())
```

Output:

```
[('B', 'A'), ('J', 'A'), ('J', 'B')]
```
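The score-dependence can be illustrated with a toy model: under the same greedy search, a score with a harsher per-edge complexity penalty (in the spirit of BIC) keeps fewer edges than a more lenient one (in the spirit of K2). The edge weights and penalty values below are invented for the illustration, and acyclicity checks are omitted for brevity:

```python
import itertools

def greedy_additions(nodes, score):
    """Repeatedly add the first single edge that improves `score`
    (additions only, for brevity); stop when no addition helps."""
    edges = set()
    current = score(edges)
    improved = True
    while improved:
        improved = False
        for u, v in itertools.permutations(nodes, 2):
            if (u, v) in edges or (v, u) in edges:
                continue
            s = score(edges | {(u, v)})
            if s > current:
                edges, current, improved = edges | {(u, v)}, s, True
    return edges

nodes = ['A', 'B', 'J']
# Made-up "fit" weights: two strong dependencies and one weak one.
weights = {('A', 'J'): 2.0, ('B', 'J'): 2.0, ('B', 'A'): 1.2}

# Two toy scores differing only in the per-edge complexity penalty.
lenient = lambda edges: sum(weights.get(e, 0.0) for e in edges) - 1.0 * len(edges)
strict = lambda edges: sum(weights.get(e, 0.0) for e in edges) - 1.5 * len(edges)

print(sorted(greedy_additions(nodes, lenient)))  # keeps the weak ('B', 'A') edge
print(sorted(greedy_additions(nodes, strict)))   # drops it
```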

ankurankan merged commit `7abb37b` into pgmpy:dev on Aug 18, 2016

3 checks passed

code-quality/landscape Code quality decreased by -0.28%
continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls First build on dev at 96.414%