**Hello everyone**! I am going to compare different nature inspired algorithms for optimizing hyperparameters of the sklearn models. I will run test on classification and one on regression. For each of the problem type I'll 3 different models.

For the nature inspired algorithm I'll use these:
* Artificial Bee Colony (ABC)
* Cuckoo Search (CS)
* Genetic Algorithm (GA)
* Grey Wolf Optimization (GWO)
* Particle Swarm Optimization (PSO)
* Inertia Weight Particle Swarm Optimization (IWPSO)
* Simulated Annealing (SA)

These algorithms were implemented in my library [HypONIC](https://github.com/slewie/HypONIC)

In [1]:
!pip install hyponic

Collecting hyponic
  Downloading hyponic-0.1.1-py3-none-any.whl (28 kB)
Installing collected packages: hyponic
Successfully installed hyponic-0.1.1
[0m

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from hyponic.hyponic import HypONIC
from hyponic.optimizers.swarm_based.ABC import ABC
from hyponic.optimizers.swarm_based.ACO import ACO
from hyponic.optimizers.swarm_based.CS import CS
from hyponic.optimizers.swarm_based.GWO import GWO
from hyponic.optimizers.swarm_based.PSO import PSO, IWPSO
from hyponic.optimizers.physics_based.SA import SA
from hyponic.optimizers.genetic_based.GA import GA
from hyponic.metrics.regression import mse
from hyponic.metrics.classification import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor
from time import time

In [4]:
def run(models, optimizers, test):
    time_results = {}
    score_results = {}
    time_results = {}
    score_results = {}
    for model in models:
        for optimizer in optimizers:
            results = test(model, optimizer, X_train_sc, np.array(y_train), X_test_sc, np.array(y_test))
            time_results[(model.__class__.__name__, optimizer.__name__)] = results[0]
            score_results[(model.__class__.__name__, optimizer.__name__)] = results[1]
        print("\nBest results:")
        print(f"Optimizer: {min(time_results, key=time_results.get)[1]}, Time: {min(time_results.values())}")
        print(f"Optimizer: {max(score_results, key=score_results.get)[1]}, Score: {max(score_results.values())}\n")
        print("==================\n")
    return time_results, score_results

In [5]:
optimizer_kwargs = {
    'epoch': 50,
    'population_size': 50,
}
optimizers = [ABC, CS, GWO, PSO, IWPSO, SA, GA]

# Classification

Models: SVM, KNN, Decision Tree

Dataset: Heart Failure Prediction Dataset

### Models and Optimizers hyperparameters

In [6]:
hyperparams_svc = {
    'C': (0.1, 10),
    'gamma': (0.001, 1),
    'kernel': ['rbf', 'poly', 'sigmoid'],
    'degree': [1, 2, 3, 4, 5]
}

hyperparams_knnc = {
    'n_neighbors': range(1, 20),
    'weights': ['uniform', 'distance'],
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
}

hyperparams_dtc = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    'max_depth': range(1, 20),
    'min_samples_split': range(2, 20),
    'min_samples_leaf': range(1, 20)
}

In [7]:
model_params_cs = {
    'SVC': hyperparams_svc,
    'KNeighborsClassifier': hyperparams_knnc,
    'DecisionTreeClassifier': hyperparams_dtc
}

### Preparing Dataset

In [8]:
df = pd.read_csv("/kaggle/input/heart-failure-prediction/heart.csv")

In [9]:
cat_columns = ['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope']
num_columns = ['Age', 'RestingBP', 'Cholesterol', 'MaxHR', 'Oldpeak', 'FastingBS']
cat_encoded = pd.get_dummies(df[cat_columns])
X = pd.concat([df[num_columns], cat_encoded], axis=1)
y = df['HeartDisease']

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [11]:
scaler = MinMaxScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)

### Run tests

In [12]:
def run_test_classification(model, optimizer, X_train, y_train, X_test, y_test):
    print(f"Running {model.__class__.__name__} with {optimizer.__name__}")
    hyperparams = model_params_cs[model.__class__.__name__]
    hyponic = HypONIC(model, X_train, y_train, "log_loss", optimizer, **optimizer_kwargs)
    start = time()
    hyponic.optimize(hyperparams)
    end = time()
    time_taken = end - start
    print(f"Time taken: {time_taken}")
    optimized_model = hyponic.get_optimized_model()
    y_pred = optimized_model.predict(X_test)
    f1 = f1_score(y_test, y_pred)
    print(f"F1 Score: {f1}\n")
    return time_taken, f1

In [13]:
models_cs = [DecisionTreeClassifier(), SVC(), KNeighborsClassifier()]

In [14]:
results_cs = run(models_cs, optimizers, run_test_classification)

Running DecisionTreeClassifier with ABC
Time taken: 20.592409372329712
F1 Score: 0.8300000000000002

Running DecisionTreeClassifier with CS
Time taken: 1.97603440284729
F1 Score: 0.8059701492537314

Running DecisionTreeClassifier with GWO
Time taken: 8.027542114257812
F1 Score: 0.8349514563106797

Running DecisionTreeClassifier with PSO
Time taken: 8.87399673461914
F1 Score: 0.8275862068965516

Running DecisionTreeClassifier with IWPSO
Time taken: 9.28302812576294
F1 Score: 0.8316831683168315

Running DecisionTreeClassifier with SA
Time taken: 6.780801296234131
F1 Score: 0.8516746411483254

Running DecisionTreeClassifier with GA
Time taken: 6.513257741928101
F1 Score: 0.8333333333333334


Best results:
Optimizer: CS, Time: 1.97603440284729
Optimizer: SA, Score: 0.8516746411483254


Running SVC with ABC
Time taken: 298.15651869773865
F1 Score: 0.7132867132867133

Running SVC with CS
Time taken: 26.309800148010254
F1 Score: 0.832535885167464

Running SVC with GWO
Time taken: 132.55893325

# Regression
Models: Decision Tree, KNN, Boosting

Dataset: Fish market

### Models and Optimizers hyperparameters

In [15]:
hyperparams_dtr = {
    'criterion': ['squared_error', 'friedman_mse', 'absolute_error'],
    'splitter': ['best', 'random'],
    'max_depth': [i for i in range(1, 20)],
    'min_samples_split': [i for i in range(2, 20)],
    'min_samples_leaf': [i for i in range(1, 20)]
}

hyperparams_knnr = {
    'n_neighbors': [i for i in range(1, 20)],
    'weights': ['uniform', 'distance'],
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
}

hyperparams_gbr = {
    'learning_rate': [0.001, 0.01, 0.1, 1],
    'n_estimators': [i for i in range(1, 100, 5)],
    'loss': ['squared_error', 'absolute_error'],
    'max_depth': [i for i in range(1, 20)],
    'min_samples_split': [i for i in range(2, 20)],
    'min_samples_leaf': [i for i in range(1, 20)]
}

In [16]:
model_params_reg = {
    'DecisionTreeRegressor': hyperparams_dtr,
    'KNeighborsRegressor': hyperparams_knnr,
    'GradientBoostingRegressor': hyperparams_gbr
}


### Preparing Dataset

In [17]:
df = pd.read_csv('/kaggle/input/fish-market/Fish.csv')

In [18]:
cat_columns = ['Species']
num_columns = ['Weight', 'Length1', 'Length2', 'Length3', 'Height', 'Width']
cat_encoded = pd.get_dummies(df[cat_columns])
X = pd.concat([df[num_columns], cat_encoded], axis=1)
y = df['Weight']

In [19]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [20]:
scaler = MinMaxScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)

### Running tests

In [21]:
def run_test_regression(model, optimizer, X_train, y_train, X_test, y_test):
    print(f"Running {model.__class__.__name__} with {optimizer.__name__}")
    hyperparams = model_params_reg[model.__class__.__name__]
    hyponic = HypONIC(model, X_train, y_train, mse, optimizer, **optimizer_kwargs)
    start = time()
    hyponic.optimize(hyperparams)
    end = time()
    time_taken = end - start
    print(f"Time taken: {time_taken}")
    optimized_model = hyponic.get_optimized_model()
    y_pred = optimized_model.predict(X_test)
    mse_score = mse(y_test, y_pred)
    print(f"MSE: {mse_score}\n")
    return time_taken, mse_score

In [22]:
models_reg = [KNeighborsRegressor(),DecisionTreeRegressor(),  GradientBoostingRegressor()]

In [23]:
results_reg = run(models_reg, optimizers, run_test_regression)

Running KNeighborsRegressor with ABC
Time taken: 13.176540613174438
MSE: 2597.2815625000003

Running KNeighborsRegressor with CS
Time taken: 1.5923922061920166
MSE: 2597.2815625

Running KNeighborsRegressor with GWO
Time taken: 5.540287971496582
MSE: 2597.2815625000003

Running KNeighborsRegressor with PSO
Time taken: 5.66847038269043
MSE: 1893.5906339586938

Running KNeighborsRegressor with IWPSO
Time taken: 5.9352240562438965
MSE: 18525.90878409437

Running KNeighborsRegressor with SA
Time taken: 6.364042043685913
MSE: 13831.54518528037

Running KNeighborsRegressor with GA
Time taken: 7.3911449909210205
MSE: 18506.286819876466


Best results:
Optimizer: CS, Time: 1.5923922061920166
Optimizer: IWPSO, Score: 18525.90878409437


Running DecisionTreeRegressor with ABC
Time taken: 9.806201934814453
MSE: 907.1253125000001

Running DecisionTreeRegressor with CS
Time taken: 1.036245346069336
MSE: 3001.1565625

Running DecisionTreeRegressor with GWO
Time taken: 4.2139892578125
MSE: 3027.43781

# Results

**Classification**:
Best result in terms of time shows Cuckoo Search, also it shows good performance in terms of F1 score. Best F1 score shows IWPSO. I think that CS is a best choice, because it very fast, therefore we can increase population size or number of epoch and achieve better scores.

**Regression**:
In this test, algorithms have strange MSE. Scores vary greatly from algorithm to algorithm. In terms of time, CS again performs better than other algorithms, IWPSO is also not far behindю

**Conclusion**:
CS and IWPSO are the best algorithms in terms of metrics and working time.