This tutorial covers advanced usage of the evolutionary parameter grid. It includes the following topics:
* Time limitations
* Incremental evolution
* Multiple processors

In [1]:
import importlib
import multiprocessing
import random
import time
import numpy as np
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_val_score, ParameterGrid, RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor
from evolutionary_grid.EvolutionaryParameterGrid import EAParameterGrid

random.seed(0)
np.random.seed(0)

X, y = load_boston(return_X_y=True)
param_grid = {
    "max_depth": np.random.randint(1, (X.shape[1] * .85), 20),
    "max_features": np.random.randint(1, X.shape[1], 20),
    "min_samples_leaf": [2, 3, 4, 5, 6],
    # scikit-learn 1.0 renamed these criteria to "squared_error" and "absolute_error".
    "criterion": ["mse", "mae"],
}

## Time limitations
First, consider time limits. In many AutoML tasks a time budget is a practical constraint: if the whole search must finish within one day, for example, the search procedure has to respect that limit.
Because the evolutionary parameter grid yields its proposals through an ordinary loop, a time limit is easy to implement: check the clock on each iteration and stop once the budget is exhausted.

In [2]:
%%time
# 2 seconds
time_limit = 2
end_time = time.time() + time_limit
grid = EAParameterGrid()
for param in grid.grid(param_grid):
    if time.time() > end_time:
        break
    tree = DecisionTreeRegressor(**param, random_state=0)
    score = cross_val_score(tree, X, y, cv=5)
    grid.set_fitness(np.mean(score))

gen	nevals	avg     	std     	min     	max     
0  	50    	0.172936	0.427611	-2.23543	0.521039
1  	50    	0.397736	0.11816 	0.0286067	0.522334
2  	50    	0.476128	0.101181	0.156791 	0.522334
Wall time: 2.01 s


Although the search was stopped early, we can still retrieve the best parameters found so far.

In [3]:
best_param = grid.best_individuals()[0]
tree = DecisionTreeRegressor(**best_param, random_state=0)
score = cross_val_score(tree, X, y, cv=5)
print(np.mean(score))

0.5223341948032781
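
The time-limited loop above can be wrapped in a small reusable helper. The following is only a sketch: `run_with_time_budget`, `toy_proposals`, and `toy_evaluate` are hypothetical names introduced for illustration and are not part of the evolutionary grid package. The toy generator stands in for `grid.grid(param_grid)` so that the example runs without any third-party dependency.

```python
import time


def run_with_time_budget(proposals, evaluate, budget_seconds):
    """Evaluate proposals until the iterable ends or the time budget runs out.

    Returns (best_proposal, best_score, n_evaluated).
    """
    # time.monotonic() is not affected by system clock adjustments.
    deadline = time.monotonic() + budget_seconds
    best_proposal, best_score, n_evaluated = None, float("-inf"), 0
    for proposal in proposals:
        # The clock is checked before each evaluation starts; an evaluation
        # that is already running is never interrupted.
        if time.monotonic() > deadline:
            break
        score = evaluate(proposal)
        n_evaluated += 1
        if score > best_score:
            best_proposal, best_score = proposal, score
    return best_proposal, best_score, n_evaluated


def toy_proposals():
    """Hypothetical stand-in for a proposal generator: depths 1, 2, 3, ..."""
    depth = 1
    while True:
        yield {"max_depth": depth}
        depth += 1


def toy_evaluate(param):
    """Hypothetical scoring function whose best score is at max_depth == 5."""
    time.sleep(0.01)  # simulate a slow model fit
    return -abs(param["max_depth"] - 5)


best, score, n = run_with_time_budget(toy_proposals(), toy_evaluate, budget_seconds=0.5)
print(best, score, n)
```

Because the deadline is only checked between evaluations, the total wall time can exceed the budget by up to one evaluation. With the evolutionary grid, the `evaluate` callback would also need to report each score through `grid.set_fitness`, as the loop in cell 2 does.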


## Incremental evolution
After a grid search finishes, we may be unsatisfied with the result and want to search the parameter grid again. Starting from scratch would be wasteful, because part of the search space has already been explored. A better approach is incremental evolution, i.e., evolving new proposals based on the evolution history. With the evolutionary parameter grid this is simple, because the full history is recorded: copy the previous grid's `history_dict` into the new grid before iterating.

In [4]:
history_grid = grid.history_dict
grid = EAParameterGrid()
grid.history_dict.update(history_grid)
for param in grid.grid(param_grid):
    tree = DecisionTreeRegressor(**param, random_state=0)
    score = cross_val_score(tree, X, y, cv=5)
    grid.set_fitness(np.mean(score))



gen	nevals	avg     	std     	min      	max    
0  	50    	0.246747	0.167262	-0.220486	0.50912
1  	50    	0.407083	0.130203	-0.220486	0.522334
2  	50    	0.471337	0.130051	-0.126144	0.522334
3  	50    	0.5188  	0.0158054	0.41169  	0.522334
4  	50    	0.500592	0.0812368	0.0234607	0.522334
5  	50    	0.498683	0.0720047	0.11605  	0.522334
6  	50    	0.508421	0.0605207	0.116765 	0.522334
7  	50    	0.469123	0.155112 	-0.15608 	0.522334
8  	50    	0.517625	0.0240139	0.371285 	0.522334
9  	50    	0.485604	0.136243 	-0.364595	0.522334
10 	50    	0.502044	0.0700165	0.217832 	0.522334
11 	50    	0.495371	0.125475 	-0.290536	0.522334
12 	50    	0.466801	0.170676 	-0.290536	0.522334
13 	50    	0.480139	0.101222 	0.116765 	0.522334
14 	50    	0.505413	0.0592415	0.232679 	0.522334
15 	50    	0.480048	0.127543 	-0.290536	0.522334
16 	50    	0.493504	0.12166  	-0.290536	0.522334
17 	50    	0.502109	0.104936 	-0.15608 	0.522334
18 	50    	0.502689	0.0785047	0.116765 	0.522334
19 	50    	0.517424	0.0252
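
The benefit of carrying a history forward can be illustrated independently of the library. The sketch below assumes a history that maps the string form of a parameter dict to its score, mirroring the `str(param)` keys used for batch evaluation in this tutorial. The actual layout of `history_dict` is not documented here, so treat the key format and the helper `evaluate_with_history` as hypothetical.

```python
def evaluate_with_history(param, evaluate, history):
    """Return the score for `param`, evaluating it only if it is not in `history`."""
    key = str(param)  # assumed key format, not confirmed for history_dict
    if key not in history:
        history[key] = evaluate(param)
    return history[key]


calls = []  # records every parameter set that is actually evaluated


def toy_evaluate(param):
    """Hypothetical scoring function that logs each real evaluation."""
    calls.append(param)
    return 1.0 / param["min_samples_leaf"]


# First search: three parameter sets are evaluated.
first_run = {}
for leaf in [2, 3, 4]:
    evaluate_with_history({"min_samples_leaf": leaf}, toy_evaluate, first_run)

# Second search: start from the recorded history instead of an empty one.
second_run = {}
second_run.update(first_run)
for leaf in [3, 4, 5]:
    evaluate_with_history({"min_samples_leaf": leaf}, toy_evaluate, second_run)

print(len(calls))  # only 2, 3, 4 and 5 were ever evaluated
```

In this toy setting the second search pays for one new evaluation instead of three, which is the saving that incremental evolution aims for.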

## Multiple processors

Nowadays, multiprocessor support is a basic requirement of any machine learning package. Most packages handle parallelism internally, which is inconvenient when we want to dispatch resources manually, for example to send tasks to different remote machines. In contrast, the evolutionary parameter grid lets users implement parallel evaluation themselves, so they can choose the parallelization strategy that best suits their task.

To use multiple processors, first tell the grid that evaluation is done in batches (`batch_evaluate=True`). Unlike the single-processor case, the proposal generator then yields several proposals at once, which we can send to a worker pool to evaluate simultaneously. Afterwards, we map the evaluated results back to their parameters and pass that mapping to the evolutionary grid.

Note that the number of proposals in each generation is at most the population size. It can be smaller, because the evolutionary grid automatically eliminates duplicate individuals.
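
A minimal sketch of why a batch can shrink: if two individuals in a population encode the same parameters, only one of them needs to be evaluated. The helper `unique_proposals` below is hypothetical and only illustrates the idea; how the evolutionary grid detects duplicates internally is not shown in this tutorial.

```python
def unique_proposals(population):
    """Drop repeated parameter dicts while keeping first-seen order."""
    seen = set()
    unique = []
    for param in population:
        key = str(param)  # dicts are unhashable, so compare by string form
        if key not in seen:
            seen.add(key)
            unique.append(param)
    return unique


population = [
    {"max_depth": 3, "min_samples_leaf": 2},
    {"max_depth": 5, "min_samples_leaf": 4},
    {"max_depth": 3, "min_samples_leaf": 2},  # duplicate of the first individual
    {"max_depth": 7, "min_samples_leaf": 2},
]
proposals = unique_proposals(population)
print(len(population), len(proposals))
```

Here a population of four individuals produces only three proposals to evaluate.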

In [5]:
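# Import the submodule explicitly: `import multiprocessing` alone does not
# guarantee that `multiprocessing.pool` has been loaded.
from multiprocessing.pool import ThreadPool

# A pool of 4 worker threads for evaluating proposals concurrently.
p = ThreadPool(processes=4)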


def evaluation_fun(param):
    tree = DecisionTreeRegressor(**param, random_state=0)
    score = cross_val_score(tree, X, y, cv=5)
    return np.mean(score)


grid = EAParameterGrid(batch_evaluate=True)
for params in grid.grid(param_grid):
    tasks = []
    for param in params:
        tasks.append((p.apply_async(evaluation_fun, args=(param, )), param))
    score_dict = {}
    for task, param in tasks:
        score_dict[str(param)] = task.get()
    grid.set_fitness(score_dict)



gen	nevals	avg      	std     	min     	max    
0  	50    	0.0664927	0.596926	-2.55928	0.52856
1  	50    	0.440414 	0.106535	0.13672 	0.541591
2  	50    	0.482913 	0.106317	0.0489679	0.541591
3  	50    	0.504207 	0.0767082	0.202489 	0.541591
4  	50    	0.511732 	0.0827495	0.180097 	0.541591
5  	50    	0.53644  	0.0267158	0.357141 	0.541591
6  	50    	0.526469 	0.0750886	0.102186 	0.541591
7  	50    	0.490479 	0.170253 	-0.530637	0.541591
8  	50    	0.497233 	0.119635 	0.00246661	0.541591
9  	50    	0.534798 	0.0269887	0.419505  	0.541591
10 	50    	0.508658 	0.0905341	0.202489  	0.541591
11 	50    	0.516521 	0.0769312	0.149913  	0.541591
12 	50    	0.535127 	0.0282272	0.366034  	0.541591
13 	50    	0.497246 	0.125107 	-0.0683365	0.541591
14 	50    	0.515921 	0.153101 	-0.548001 	0.541591
15 	50    	0.51819  	0.0793175	0.149913  	0.541591
16 	50    	0.527128 	0.0589629	0.219092  	0.541591
17 	50    	0.52316  	0.0766338	0.102186  	0.541591
18 	50    	0.499539 	0.162666 	-0.548001 	0.54159
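
Since the pool is managed by the user, other executors can be swapped in. As one possible alternative, the batch step can be sketched with the standard library's `concurrent.futures`. The helper `evaluate_batch` and the toy scoring function are hypothetical; only the shape of the returned dictionary (scores keyed by `str(param)`) follows the cell above.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_batch(params, evaluate, max_workers=4):
    """Evaluate a batch of proposals concurrently and key the scores by str(param)."""
    params = list(params)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the inputs.
        scores = list(executor.map(evaluate, params))
    return {str(param): score for param, score in zip(params, scores)}


def toy_evaluate(param):
    """Hypothetical scoring function whose best score is at max_depth == 4."""
    return -abs(param["max_depth"] - 4)


batch = [{"max_depth": 2}, {"max_depth": 4}, {"max_depth": 9}]
score_dict = evaluate_batch(batch, toy_evaluate)
print(score_dict)
```

The resulting `score_dict` has the same form as the one passed to `grid.set_fitness` in the batch loop. For CPU-bound evaluation functions written in pure Python, `ProcessPoolExecutor` offers the same interface, provided the function and its arguments can be pickled.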