chrisittner commented Aug 17, 2016 • edited

This adds the `HillClimbSearch` structure estimator for BayesianModels. Usage:

```
import pandas as pd
import numpy as np
from pgmpy.estimators import HillClimbSearch, BicScore

# create data sample with 9 random variables:
data = pd.DataFrame(np.random.randint(0, 5, size=(5000, 9)), columns=list('ABCDEFGHI'))
# add 10th dependent variable
data['J'] = data['A'] * data['B']

est = HillClimbSearch(data, scoring_method=BicScore(data))
best_model = est.estimate()

print(sorted(best_model.nodes()))
print(sorted(best_model.edges()))
```

Output:

```
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
[('A', 'J'), ('B', 'J')]
```

HC search starts from some DAG and proceeds by iteratively modifying the graph to maximise its score. Legal modifications are "add one edge", "flip one edge" and "remove one edge". See Sections 18.4.3 and A.4.3 in the Koller & Friedman PGM book. The implementation follows Algorithm A5 (page 1155) and A6 (page 1157) in the book.

The `estimate` method has three optional parameters:

- `start`: an optional starting point. Default: disconnected graph.
- `tabu_length`: if set to `n`, the last `n` modifications cannot be reversed by a search step. Default: 0
- `max_indegree`: if set to `n`, the search space is restricted to networks where each node has at most `n` parents.
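To make the search step concrete, here is a minimal, self-contained sketch of greedy hill climbing over DAGs in plain Python. It is not the code in this PR: the helper names (`has_path`, `legal_operations`, `apply_op`, `inverse`, `hill_climb`) and the toy `score` function are hypothetical and chosen only for illustration, and the sketch assumes a best-improvement strategy, which may differ in detail from the pgmpy implementation.

```python
from itertools import permutations


def has_path(edges, src, dst):
    """Return True if dst is reachable from src along directed edges."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(v for (u, v) in edges if u == node)
    return False


def legal_operations(nodes, edges, tabu=(), max_indegree=None):
    """Yield ('+' | '-' | 'flip', (u, v)) operations that keep the graph acyclic."""
    edges = set(edges)
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1

    def fits(node):
        # True if `node` can take one more parent under the indegree limit.
        return max_indegree is None or indegree[node] < max_indegree

    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            if ('-', (u, v)) not in tabu:
                yield ('-', (u, v))
            # Flipping u->v is safe only if no other path u ~> v remains.
            if (('flip', (u, v)) not in tabu and fits(u)
                    and not has_path(edges - {(u, v)}, u, v)):
                yield ('flip', (u, v))
        elif (v, u) not in edges:
            # Adding u->v is safe only if v cannot already reach u.
            if (('+', (u, v)) not in tabu and fits(v)
                    and not has_path(edges, v, u)):
                yield ('+', (u, v))


def apply_op(edges, op):
    """Return a new frozenset of edges with the operation applied."""
    kind, (u, v) = op
    if kind == '+':
        return edges | {(u, v)}
    if kind == '-':
        return edges - {(u, v)}
    return (edges - {(u, v)}) | {(v, u)}


def inverse(op):
    """Return the operation that would undo `op`."""
    kind, (u, v) = op
    if kind == '+':
        return ('-', (u, v))
    if kind == '-':
        return ('+', (u, v))
    return ('flip', (v, u))


def hill_climb(nodes, score, start=(), tabu_length=0, max_indegree=None):
    """Apply the best-scoring legal operation until no operation improves the score."""
    current, tabu = frozenset(start), []
    while True:
        ops = list(legal_operations(nodes, current, tabu, max_indegree))
        scored = [(score(apply_op(current, op)), op) for op in ops]
        if not scored:
            return current
        best_score, best_op = max(scored, key=lambda item: item[0])
        if best_score <= score(current):
            return current
        current = apply_op(current, best_op)
        # Remember the inverse so the last `tabu_length` steps cannot be undone.
        tabu = (tabu + [inverse(best_op)])[-tabu_length:] if tabu_length else []


# Toy score (hypothetical): reward edges of a known target graph, penalise others.
target = {('A', 'J'), ('B', 'J')}


def score(edges):
    return len(edges & target) - len(edges - target)


result = hill_climb(list('ABJ'), score, tabu_length=2)
print(sorted(result))
```

In the real estimator, the role of the toy `score` is played by the `scoring_method` (for example `BicScore`) evaluated on the data.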

``` Added HillClimbSearch structure estimator for discrete BNs ```
``` Tests for HillClimbSearch ```
Results depend on the `scoring_method` used:

```
import pandas as pd
import numpy as np
from pgmpy.estimators import HillClimbSearch

# create data sample with 9 random variables:
data = pd.DataFrame(np.random.randint(0, 5, size=(5000, 9)), columns=list('ABCDEFGHI'))
# add 10th dependent variable
data['J'] = data['A'] * data['B']

est = HillClimbSearch(data)  # use default K2Score
best_model = est.estimate()

print(best_model.edges())
```

Output:

```
[('B', 'A'), ('J', 'A'), ('J', 'B')]
```

