Feature selection is a central problem in machine learning. Many algorithms can select a useful subset of features, but most of them are local search methods, i.e., they sequentially delete redundant features. Such methods handle feature interactions poorly: when two features are only informative together, a step-by-step search may discard them. When enough computational resources are available, a global search method is therefore often the better choice.
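To make the interaction problem concrete, here is a minimal sketch on hypothetical toy data, not taken from the tutorial's dataset. The target is the XOR of two binary features, and `lookup_accuracy` is an illustrative helper that scores a feature subset with a majority-vote lookup table. Judged one at a time, every feature looks useless; judged together, the pair is perfect.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy data: three binary features, target = XOR of features 0 and 1.
# Feature 2 is irrelevant.
rows = list(product([0, 1], repeat=3))
target = [a ^ b for a, b, _ in rows]

def lookup_accuracy(feature_idx):
    """Accuracy of a majority-vote lookup table built on the chosen features."""
    votes = defaultdict(Counter)
    for row, t in zip(rows, target):
        key = tuple(row[i] for i in feature_idx)
        votes[key][t] += 1
    # For each observed key, the table predicts the most common target.
    correct = sum(max(counter.values()) for counter in votes.values())
    return correct / len(rows)

single_scores = {i: lookup_accuracy([i]) for i in range(3)}
pair_score = lookup_accuracy([0, 1])
print(single_scores)  # each single feature scores 0.5 (chance level)
print(pair_score)     # features 0 and 1 together score 1.0
```

A selector that only looks at one feature at a time has no reason to prefer features 0 and 1 over the irrelevant feature 2, which is the situation a search over whole subsets avoids.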

This tutorial shows how to use the evolutionary parameter grid to solve the feature selection problem.

First, we load the experimental dataset and define the search space. For feature selection, the search space is the set of all possible combinations of features.

In [1]:
import random
import numpy as np
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
from sklearn.datasets import load_boston
from sklearn.feature_selection import RFE, RFECV
from sklearn.model_selection import cross_val_score, ParameterGrid, RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor
from evolutionary_grid.EvolutionaryParameterGrid import EAParameterGrid

# Seed both random number generators for reproducibility.
random.seed(0)
np.random.seed(0)

X, y = load_boston(return_X_y=True)
# One binary switch per feature column: 1 keeps the feature, 0 drops it.
param_grid = {i: [0, 1] for i in range(0, X.shape[1])}
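
To get a feel for the size of this search space, the sketch below enumerates every keep/drop mask for a grid of the same shape. It assumes 13 feature columns, as in the Boston housing data, and uses only the standard library.

```python
from itertools import product

n_features = 13  # assumption: the 13 feature columns of the Boston housing data
param_grid = {i: [0, 1] for i in range(n_features)}

# Each grid point is one keep/drop mask over the features.
masks = list(product(*param_grid.values()))
n_masks = len(masks)
# The all-zero mask selects no features, so it is not a usable candidate.
n_nonempty = sum(1 for mask in masks if any(mask))
print(n_masks, n_nonempty)  # 8192 8191
```

The space doubles with every added feature, so exhaustive evaluation quickly becomes impractical and a guided search becomes attractive.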

After defining the search space, we can use the evolutionary parameter grid to search for the best feature combination.

In [2]:
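grid = EAParameterGrid()
for param in grid.grid(param_grid):
    tree = DecisionTreeRegressor(random_state=0)
    # Drop every column whose switch is 0 in the current candidate.
    x_selected = np.delete(X, [k for k, v in param.items() if v == 0], axis=1)
    # The mean 5-fold cross-validated score is reported back as the fitness.
    score = cross_val_score(tree, x_selected, y, cv=5)
    grid.set_fitness(np.mean(score))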

gen  nevals  avg        std        min         max
0    50      -0.471925  0.854812   -2.70684    0.343539
1    50      0.12145    0.594852   -3.5049     0.399196
2    50      0.305183   0.116072   -0.0294227  0.451824
3    50      0.35572    0.158297   -0.44833    0.451824
4    50      0.403926   0.10041    0.107902    0.451824
5    50      0.434792   0.0715176  0.0374705   0.451824
6    50      0.440331   0.051978   0.129272    0.451824
7    50      0.381472   0.314724   -1.74677    0.451824
8    50      0.423004   0.0759862  0.0994232   0.451824
9    50      0.419499   0.119304   -0.265396   0.465544
10   50      0.363552   0.177005   -0.265396   0.465544
11   50      0.407677   0.245077   -1.23243    0.465544
12   50      0.386496   0.188441   -0.40633    0.465544
13   50      0.436505   0.0866881  0.0185817   0.465544
14   50      0.416819   0.184955   -0.489309   0.465544
15   50      0.425154   0.132511   -0.40633    0.465544
16   50      0.452015   0.0541419  0.194809    0.465544
17   50      0.433233   0.0933886  0.0249828   0.465544
18   50      0.407202   0.119377   0.0384408   0
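
The loop above only asks for a candidate and reports its fitness back. As a rough illustration of what an evolutionary search over bit masks can look like, here is a minimal genetic algorithm sketch with tournament selection, one-point crossover, bit-flip mutation and elitism. It is not the implementation used by `EAParameterGrid`; `evolve` and `toy_fitness` are hypothetical names, and the fitness function is a made-up stand-in for the cross-validated score.

```python
import random

def evolve(fitness, n_bits, pop_size=20, generations=15, seed=0):
    """Maximise `fitness` over bit masks of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        next_pop = [scored[0][:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(scored, 3), key=fitness)
            p2 = max(rng.sample(scored, 3), key=fitness)
            # One-point crossover.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, on average one flip per child.
            child = [bit ^ 1 if rng.random() < 1.0 / n_bits else bit
                     for bit in child]
            next_pop.append(child)
        population = next_pop
    return max(population, key=fitness), history

def toy_fitness(mask):
    """Hypothetical score: features 0 and 3 only help together; each kept feature costs 0.05."""
    bonus = 1.0 if mask[0] and mask[3] else 0.0
    return bonus + 0.5 * mask[5] - 0.05 * sum(mask)

best, history = evolve(toy_fitness, n_bits=8)
print(best, toy_fitness(best))
```

Because the best mask is always copied into the next generation, the best fitness in this sketch never decreases from one generation to the next.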

The best individual found by the search keeps 7 of the 13 features.

In [3]:
selected_features_num = sum(grid.best_individuals()[0].values())
selected_features_num

7
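
Beyond the count, it is usually useful to know which columns were kept. The sketch below decodes a mask in the same `{feature_index: 0 or 1}` format as `param_grid`; the `best` dictionary is a hypothetical example, not the result of the run above.

```python
# Hypothetical best individual, in the same {feature_index: 0 or 1} format as param_grid.
best = {0: 1, 1: 0, 2: 0, 3: 1, 4: 1, 5: 0}

selected = [k for k, v in sorted(best.items()) if v == 1]
dropped = [k for k, v in sorted(best.items()) if v == 0]
print(selected)  # [0, 3, 4]
print(dropped)   # [1, 2, 5]
```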

Is this a good result? We can compare it with training on all features. After feature selection, the mean cross-validated score rises from about 0.15 to about 0.47.

In [4]:
tree = DecisionTreeRegressor(random_state=0)
score = cross_val_score(tree, X, y, cv=5)
print(np.mean(score))

0.15017814401747404
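
The scores in this tutorial are R² values, the default score of scikit-learn regressors. R² is 1 for perfect predictions, 0 for a model that always predicts the mean, and negative for a model that does worse than that, which is why some entries in the evolution log are below zero. A minimal sketch of the formula, with `r2` as an illustrative helper:

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1, 2, 3]
print(r2(y_true, [1, 2, 3]))  # 1.0, perfect predictions
print(r2(y_true, [2, 2, 2]))  # 0.0, always predicting the mean
print(r2(y_true, [3, 2, 1]))  # -3.0, worse than predicting the mean
```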


We can also compare the evolutionary approach with other feature selection methods. Here we compare it with recursive feature elimination with cross-validation (RFECV).

In [5]:
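# Recursive feature elimination, removing one feature per step.
# It keeps at least as many features as the evolutionary search selected.
rfe = RFECV(estimator=DecisionTreeRegressor(random_state=0),
            min_features_to_select=selected_features_num,
            cv=5,
            step=1)
rfe.fit(X, y)
# Score a fresh tree on the columns that RFECV retained.
tree = DecisionTreeRegressor(random_state=0)
score = cross_val_score(tree, rfe.transform(X), y, cv=5)
print(np.mean(score))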

0.16566806168325074
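
One way to see why a step-by-step elimination can fall short is to run it against an exhaustive search on a tiny problem. The sketch below uses greedy backward elimination driven by subset scores, which is a simplified stand-in for RFE (RFE removes features according to the estimator's importances rather than by re-scoring subsets). The score table is entirely hypothetical and chosen so that the greedy path misses the best pair.

```python
from itertools import combinations

# Hypothetical subset scores for four features; higher is better.
scores = {
    frozenset({0, 1, 2, 3}): 0.50,
    frozenset({1, 2, 3}): 0.70, frozenset({0, 2, 3}): 0.60,
    frozenset({0, 1, 3}): 0.55, frozenset({0, 1, 2}): 0.55,
    frozenset({2, 3}): 0.75, frozenset({1, 3}): 0.60, frozenset({1, 2}): 0.60,
    frozenset({0, 1}): 1.00, frozenset({0, 2}): 0.50, frozenset({0, 3}): 0.50,
}

def backward_elimination(n_features, target_size):
    """Repeatedly drop the feature whose removal gives the best score."""
    current = frozenset(range(n_features))
    while len(current) > target_size:
        current = max((current - {f} for f in current), key=scores.__getitem__)
    return current

def exhaustive(n_features, target_size):
    """Score every subset of the requested size and return the best one."""
    subsets = (frozenset(c) for c in combinations(range(n_features), target_size))
    return max(subsets, key=scores.__getitem__)

greedy_pick = backward_elimination(4, 2)
global_pick = exhaustive(4, 2)
print(sorted(greedy_pick), scores[greedy_pick])  # [2, 3] 0.75
print(sorted(global_pick), scores[global_pick])  # [0, 1] 1.0
```

In this toy table, removing feature 0 looks best at the first step, so the greedy path can never reach the pair of features 0 and 1 that a search over whole subsets finds.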


Recursive feature elimination improves the score only slightly (about 0.17 versus 0.15 with all features) and remains far below the roughly 0.47 reached by the evolutionary search.