In [18]:
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm

import optuna

Выберем классификатор для нашего приложения, который будет лучше предсказывать.
На выбор возьмем RandomForestClassifier и SVC. Для определения лучшего из них и подбора гиперпараметров будем использовать optuna.

In [17]:
def objective(trial):
    iris = sklearn.datasets.load_iris()
    x, y = iris.data, iris.target

    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier_name == "SVC":
        svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
        classifier_obj = sklearn.svm.SVC(C=svc_c, gamma="auto", random_state=1)
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        rf_min_samples_split = trial.suggest_int("rf_max_depth", 2, 12, log=True)
        rf_min_samples_leaf = trial.suggest_int("rf_max_depth", 1, 5, log=True)
        classifier_obj = sklearn.ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, min_samples_split=rf_min_samples_split, min_samples_leaf=rf_min_samples_leaf, 
            random_state=1, n_estimators=100
        )

    score = sklearn.model_selection.cross_val_score(classifier_obj, x, y, n_jobs=-1, cv=5)
    accuracy = score.mean()
    return accuracy



study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_trial)

[I 2020-07-20 13:24:28,012] Finished trial#0 with value: 0.9333333333333333 with parameters: {'classifier': 'SVC', 'svc_c': 3.9287858439559256e-05}. Best is trial#0 with value: 0.9333333333333333.

Inconsistent parameter values for distribution with name "rf_max_depth"! This might be a configuration mistake. Optuna allows to call the same distribution with the same name more then once in a trial. When the parameter values are inconsistent optuna only uses the values of the first call and ignores all following. Using these values: {'low': 2, 'high': 32, 'step': 1}

[I 2020-07-20 13:24:28,355] Finished trial#1 with value: 0.9466666666666665 with parameters: {'classifier': 'RandomForest', 'rf_max_depth': 2}. Best is trial#1 with value: 0.9466666666666665.

Inconsistent parameter values for distribution with name "rf_max_depth"! This might be a configuration mistake. Optuna allows to call the same distribution with the same name more then once in a trial. When the parameter values are inco

[I 2020-07-20 13:24:32,907] Finished trial#15 with value: 0.9533333333333334 with parameters: {'classifier': 'RandomForest', 'rf_max_depth': 4}. Best is trial#9 with value: 0.9533333333333334.

Inconsistent parameter values for distribution with name "rf_max_depth"! This might be a configuration mistake. Optuna allows to call the same distribution with the same name more then once in a trial. When the parameter values are inconsistent optuna only uses the values of the first call and ignores all following. Using these values: {'low': 2, 'high': 32, 'step': 1}

[I 2020-07-20 13:24:33,248] Finished trial#16 with value: 0.9533333333333334 with parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial#9 with value: 0.9533333333333334.

Inconsistent parameter values for distribution with name "rf_max_depth"! This might be a configuration mistake. Optuna allows to call the same distribution with the same name more then once in a trial. When the parameter values are inconsi

[I 2020-07-20 13:24:37,135] Finished trial#28 with value: 0.9466666666666665 with parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial#9 with value: 0.9533333333333334.
[I 2020-07-20 13:24:37,244] Finished trial#29 with value: 0.9400000000000001 with parameters: {'classifier': 'SVC', 'svc_c': 3140.0860676971733}. Best is trial#9 with value: 0.9533333333333334.

Inconsistent parameter values for distribution with name "rf_max_depth"! This might be a configuration mistake. Optuna allows to call the same distribution with the same name more then once in a trial. When the parameter values are inconsistent optuna only uses the values of the first call and ignores all following. Using these values: {'low': 2, 'high': 32, 'step': 1}

[I 2020-07-20 13:24:37,590] Finished trial#30 with value: 0.9533333333333334 with parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial#9 with value: 0.9533333333333334.

Inconsistent parameter values for distributio

[I 2020-07-20 13:24:41,468] Finished trial#42 with value: 0.9533333333333334 with parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial#9 with value: 0.9533333333333334.

Inconsistent parameter values for distribution with name "rf_max_depth"! This might be a configuration mistake. Optuna allows to call the same distribution with the same name more then once in a trial. When the parameter values are inconsistent optuna only uses the values of the first call and ignores all following. Using these values: {'low': 2, 'high': 32, 'step': 1}

[I 2020-07-20 13:24:41,819] Finished trial#43 with value: 0.9466666666666665 with parameters: {'classifier': 'RandomForest', 'rf_max_depth': 2}. Best is trial#9 with value: 0.9533333333333334.

Inconsistent parameter values for distribution with name "rf_max_depth"! This might be a configuration mistake. Optuna allows to call the same distribution with the same name more then once in a trial. When the parameter values are inconsi

[I 2020-07-20 13:24:45,771] Finished trial#55 with value: 0.9533333333333334 with parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial#9 with value: 0.9533333333333334.

Inconsistent parameter values for distribution with name "rf_max_depth"! This might be a configuration mistake. Optuna allows to call the same distribution with the same name more then once in a trial. When the parameter values are inconsistent optuna only uses the values of the first call and ignores all following. Using these values: {'low': 2, 'high': 32, 'step': 1}

[I 2020-07-20 13:24:46,125] Finished trial#56 with value: 0.9533333333333334 with parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial#9 with value: 0.9533333333333334.

Inconsistent parameter values for distribution with name "rf_max_depth"! This might be a configuration mistake. Optuna allows to call the same distribution with the same name more then once in a trial. When the parameter values are inconsi

[I 2020-07-20 13:24:50,094] Finished trial#68 with value: 0.96 with parameters: {'classifier': 'SVC', 'svc_c': 0.1623737520701852}. Best is trial#68 with value: 0.96.
[I 2020-07-20 13:24:50,204] Finished trial#69 with value: 0.96 with parameters: {'classifier': 'SVC', 'svc_c': 0.11264223471833289}. Best is trial#68 with value: 0.96.
[I 2020-07-20 13:24:50,314] Finished trial#70 with value: 0.96 with parameters: {'classifier': 'SVC', 'svc_c': 0.1099813109932592}. Best is trial#68 with value: 0.96.
[I 2020-07-20 13:24:50,424] Finished trial#71 with value: 0.96 with parameters: {'classifier': 'SVC', 'svc_c': 0.16030232484545778}. Best is trial#68 with value: 0.96.
[I 2020-07-20 13:24:50,535] Finished trial#72 with value: 0.9800000000000001 with parameters: {'classifier': 'SVC', 'svc_c': 0.5009608173441277}. Best is trial#72 with value: 0.9800000000000001.
[I 2020-07-20 13:24:50,643] Finished trial#73 with value: 0.96 with parameters: {'classifier': 'SVC', 'svc_c': 0.14499632834066295}. Be

FrozenTrial(number=96, value=0.9866666666666667, datetime_start=datetime.datetime(2020, 7, 20, 13, 24, 53, 43039), datetime_complete=datetime.datetime(2020, 7, 20, 13, 24, 53, 151946), params={'classifier': 'SVC', 'svc_c': 3.6616809069626406}, distributions={'classifier': CategoricalDistribution(choices=('SVC', 'RandomForest')), 'svc_c': LogUniformDistribution(high=10000000000.0, low=1e-10)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=96, state=TrialState.COMPLETE)


Лучшее качество показала модеь SVC с параметром регуляризации C = 3.6616809069626406. Эту модель будем использовать для финального предсказания