## For English course students

- The Japanese and English text say the same thing, so reading the English alone is sufficient.

- My English is not perfect, so I apologize for any grammatical or other mistakes.

## Readings:
- [Grid search](https://note.com/okonomiyaki011/n/n5fb0365b5141)
- [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)
- [RandomizedSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html)
- [Bayesian optimization](https://qiita.com/masasora/items/cc2f10cb79f8c0a6bbaa)
- [BayesSearchCV](https://scikit-optimize.github.io/stable/modules/generated/skopt.BayesSearchCV.html)
- [OPTUNA](https://tech.preferred.jp/ja/blog/optuna-release/)
- [OPTUNA tutorial](https://optuna.readthedocs.io/en/stable/tutorial/001_first.html#first)
- [vcopt](https://vigne-cla.com/vcopt-specification/)

--- 

## Overview
1. What is a hyperparameter?
1. Introduction to hyperparameter optimization methods
    1. Grid search
    1. Random search
    1. Bayesian optimization
    1. Genetic algorithm (GA)
1. Exercise
1. Additional Problems

---
# What is a hyperparameter?
A hyperparameter is a parameter that controls the behavior of a machine learning algorithm. In deep learning in particular, hyperparameters are the parameters that cannot be optimized by gradient-based methods. Examples include the learning rate, the batch size, and the number of training iterations. The number of layers and channels in a neural network, and the choice between Momentum SGD and Adam for training, are also hyperparameters.

Tuning hyperparameters is almost always necessary for a machine learning algorithm to perform well. Deep learning in particular tends to have many hyperparameters, and their tuning has a large impact on performance.
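
To make the distinction concrete, here is a minimal sketch in plain Python. The toy data and the function name `train_linear_model` are assumptions made for illustration only. The weight `w` is a model parameter updated by the gradient, while `learning_rate` and `n_iterations` are hyperparameters that must be chosen from outside.

```python
def train_linear_model(learning_rate, n_iterations):
    """Fit y = w * x by gradient descent; return (w, mean squared error)."""
    xs = [1.0, 2.0, 3.0]
    ys = [2.0, 4.0, 6.0]  # toy data generated with the true weight w = 2
    w = 0.0               # model parameter: updated by the gradient
    for _ in range(n_iterations):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return w, mse


# The same algorithm behaves very differently depending on the hyperparameters.
w_good, mse_good = train_linear_model(learning_rate=0.1, n_iterations=50)
w_bad, mse_bad = train_linear_model(learning_rate=0.3, n_iterations=50)
print(f"learning_rate=0.1: w={w_good:.4f}, mse={mse_good:.3g}")
print(f"learning_rate=0.3: w={w_bad:.4g}, mse={mse_bad:.3g}")
```

With the smaller learning rate the weight converges to 2, while the larger one overshoots on every step and diverges. No gradient tells us which learning rate to pick, which is why a separate search is needed.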

--

# ハイパーパラメータとは？
ハイパーパラメータとは、機械学習アルゴリズムの挙動を制御するパラメータのこと。特に深層学習では勾配法によって最適化できないパラメータに相当する。例えば、学習率やバッチサイズ、学習イテレーション数といったようなものがハイパーパラメータである。また、ニューラルネットワークの層数やチャンネル数、学習に Momentum SGD を用いるかそれとも Adam を用いるか、といったような選択もハイパーパラメータである。

ハイパーパラメータの調整は機械学習アルゴリズムが力を発揮するためにほぼ不可欠である。特に、深層学習はハイパーパラメータの数が多い傾向がある上に、その調整が性能を大きく左右する。

---
# Introduction to hyperparameter optimization methods

In this material, I will introduce typical hyperparameter optimization methods and hyperparameter optimization libraries.
I will also introduce the concept of a Pareto-optimal solution, which expresses the trade-off relationship in multi-objective optimization.
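
As a small illustration of Pareto optimality, the sketch below filters a list of candidates down to the ones that no other candidate beats on both objectives at once. The two objectives (validation error and training time, both to be minimized) and the numbers are hypothetical.

```python
def pareto_front(points):
    """Return the points that are not dominated by any other point.

    Each point is (objective_1, objective_2) and both are minimized.
    A point q dominates p when q is no worse in both objectives and differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (validation error, training time in minutes) per hyperparameter setting
candidates = [(0.10, 9.0), (0.08, 15.0), (0.12, 8.0), (0.09, 16.0), (0.15, 8.5)]
front = pareto_front(candidates)
print(front)  # [(0.1, 9.0), (0.08, 15.0), (0.12, 8.0)]
```

Every point on the front is a valid trade-off: lowering the error further costs more training time, and vice versa.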

--

# ハイパーパラメータ最適化の手法紹介

今回は、ハイパーパラメータの最適化手法の代表的なものの紹介と、ハイパーパラメータ最適化ライブラリの紹介、
及びパレート最適解と言う、多目的最適化に於いてトレードオフの関係を表す概念を紹介する。

___
## Grid Search
As the name implies, grid search explores the points of a grid: each hyperparameter is restricted to a discrete set of candidate values, and every combination of those values is tried.
One problem with grid search is that when a hyperparameter is actually continuous and is sampled only at the grid points, the global optimum cannot be found if it lies between two grid points. In addition, because every combination of hyperparameter values is evaluated, the search generally takes longer than the other methods described here. However, if the hyperparameters take only discrete values and the number of combinations is not very large, the search can finish very quickly.

[Advantages]

Useful when you already have a rough idea of which values to try (e.g., when the hyperparameter takes discrete values).

It is also useful when the number of values to be adjusted is small.

[Disadvantages]

The number of model training runs grows multiplicatively with each added hyperparameter, so the search takes a long time (even a grid of 11×2×12×3×3×3×2×3 requires 42768 evaluations).

Computational cost is very high.
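
The evaluation count quoted above can be checked in a couple of lines. This sketch only multiplies the number of candidate values per hyperparameter; the list name `grid_sizes` is introduced here for illustration.

```python
import math

# Number of candidate values for each hyperparameter in the grid
grid_sizes = [11, 2, 12, 3, 3, 3, 2, 3]

n_evaluations = math.prod(grid_sizes)
print(n_evaluations)  # 42768

# Adding one more hyperparameter with 3 candidate values triples the cost
print(math.prod(grid_sizes + [3]))  # 128304
```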

The following figure illustrates grid search.

--

## グリッドサーチ
まず、グリッドサーチであるが、その名の通り、格子点の点を探査していく探査手法で、各ハイパーパラメータの値を離散値として全ての値のハイパーパラメータの値の組み合わせを試すものである。
グリッドサーチの問題点としては、ハイパーパラメータが値として離散値を取る場合に、格子点と格子点の間に大域最適解が存在する場合に、大域最適解を見つけることができなくなってしまうという問題がある。また、全てのハイパーパラメータの値の組み合わせを試す為、一般的には今回挙げた他の探査手法よりも探査時間がかかってしまうという問題点がある。しかしながら、ハイパーパラメータの値として離散値しか取らず、組み合わせの数もそれほど多くないような場合は探査時間は非常に少なくて済む。

【メリット】

調整する値のあたりが付いている場合に有用 (値が離散値を取る場合等)

調整する値の数が少ない場合にも有用

【デメリット】

モデル訓練回数が増えるので時間が掛かる(11×2×12×3×3×3×2×3程度でも42768回の評価が必要)

計算コストが非常に高い

次図はグリッドサーチのイメージである。

<img src="implements_13/gridsearch.png" width="350">

A simple, easy-to-understand implementation nests one `for` loop per hyperparameter, as follows.

--

感覚的に理解しやすい簡単な実装としては次のようにfor文をひたすら回す形となる。

In [1]:
def calc_score(param01, param02, param03, param04, param05, param06, param07, param08, param09, param10):
    #Write process to calculate the score
    result = param01 + param02 + param03 + param04 + param05 + param06 + param07 + param08 + param09 + param10
    return result

param_dist = {"param01": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
              "param02": [1, 2],
              "param03": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
              "param04": [1],
              "param05": [1, 2, 3],
              "param06": [1, 2, 3],
              "param07": [1, 2, 3],
              "param08": [1, 2],
              "param09": [1, 2, 3],
              "param10": [1],
              }

score = 0
params = [0] * 10  # best combination found so far

for a in param_dist["param01"]:
    param01 = a
    for b in param_dist["param02"]:
        param02 = b
        for c in param_dist["param03"]:
            param03 = c
            for d in param_dist["param04"]:
                param04 = d
                for e in param_dist["param05"]:
                    param05 = e
                    for f in param_dist["param06"]:
                        param06 = f
                        for g in param_dist["param07"]:
                            param07 = g
                            for h in param_dist["param08"]:
                                param08 = h
                                for i in param_dist["param09"]:
                                    param09 = i
                                    for j in param_dist["param10"]:
                                        param10 = j
                                        t = calc_score(param01, param02, param03, param04, param05, param06, param07, param08, param09, param10)
                                        
                                        if t > score:
                                            score = t
                                            params[0] = param01
                                            params[1] = param02
                                            params[2] = param03
                                            params[3] = param04
                                            params[4] = param05
                                            params[5] = param06
                                            params[6] = param07
                                            params[7] = param08
                                            params[8] = param09
                                            params[9] = param10

print(params)

[11, 2, 12, 1, 3, 3, 3, 2, 3, 1]
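
The ten nested loops above can be written more compactly with `itertools.product`, which yields every combination of the candidate values. This is a sketch of the same exhaustive search, with `calc_score` reduced to a plain sum as in the cell above.

```python
import itertools


def calc_score(*params):
    # Placeholder score: replace with the real evaluation
    return sum(params)


param_dist = {
    "param01": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "param02": [1, 2],
    "param03": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "param04": [1],
    "param05": [1, 2, 3],
    "param06": [1, 2, 3],
    "param07": [1, 2, 3],
    "param08": [1, 2],
    "param09": [1, 2, 3],
    "param10": [1],
}

best_score = float("-inf")
best_params = None

# product(...) iterates over every combination, like the nested for loops
for combo in itertools.product(*param_dist.values()):
    score = calc_score(*combo)
    if score > best_score:
        best_score, best_params = score, list(combo)

print(best_params)  # [11, 2, 12, 1, 3, 3, 3, 2, 3, 1]
```

The number of evaluations is unchanged; only the code gets shorter and no longer needs editing when a hyperparameter is added.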


scikit-learn provides a class called GridSearchCV, which performs the cross-validation and evaluation automatically.

- [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)

fit(), predict(), score(), get_params(), set_params()

↑ A custom estimator must implement these methods under exactly these names.
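
As a smaller illustration of this interface, here is a minimal sketch of a custom estimator used with GridSearchCV. The class `ThresholdClassifier` and the toy data are hypothetical. Note that an estimator inheriting from `BaseEstimator` gets `get_params()` and `set_params()` for free, provided `__init__` stores its arguments unchanged under the same names.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV


class ThresholdClassifier(BaseEstimator):
    """Toy estimator (hypothetical): predicts 1 when the feature exceeds `threshold`."""

    def __init__(self, threshold=0.0):
        # Store the argument as-is so that get_params()/set_params() work
        self.threshold = threshold

    def fit(self, X, y):
        # Nothing to learn in this toy example
        return self

    def predict(self, X):
        return (np.asarray(X)[:, 0] > self.threshold).astype(int)

    def score(self, X, y):
        # Accuracy; GridSearchCV uses this when `scoring` is not given
        return float(np.mean(self.predict(X) == np.asarray(y)))


# Toy data: the label is 1 exactly when x > 9.5
X = np.arange(20, dtype=float).reshape(-1, 1)
y = (X[:, 0] > 9.5).astype(int)

search = GridSearchCV(ThresholdClassifier(), {"threshold": [2.5, 9.5, 15.5]}, cv=2)
search.fit(X, y)

best_threshold = search.best_params_["threshold"]
print(best_threshold)  # 9.5
```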

--

ライブラリとしてはgridsearchCVというものがある。これは交差検証まで自動で行い、評価を行ってくれるものである。

- [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)

fit(), predict(), score(), get_params(), set_params()

↑これらの関数は完全一致の関数名で必ず実装する必要がある。

In [2]:
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.base import BaseEstimator

class MyEstimator(BaseEstimator):
    def __init__(self, param01, param02, param03, param04, param05, param06, param07):
        self.param01 = int(param01)
        self.param02 = int(param02)
        self.param03 = int(param03)
        self.param04 = int(param04)
        self.param05 = int(param05)
        self.param06 = int(param06)
        self.param07 = int(param07)
        
        self.make_string(self.param01, self.param02, self.param03, self.param04,self.param05, self.param06, self.param07)
        self.df = pd.read_csv(CSV_PATH, header=0)
    
    def make_string(self, param01, param02, param03, param04, param05, param06, param07):
        #param1
        noiseReduction = ["Gaussian:7 ","Gaussian:7 Gaussian:7 ","Gaussian:7 Gaussian:7 Gaussian:7 "]
        #param2
        filtersize_1 = ["3 ","5 ","7 ","9 "]
        #param3
        filtersize_2 = ["3 ","5 ","7 ","9 "]
        #param4
        padding = ["1 ","3 ","5 "]
        #param5
        activation_function_1 = ["softmax ","relu ","sigmoid ","tanh "]
        #param6
        activation_function_2 = ["softmax ","relu ","sigmoid ","tanh "]
        #param7
        optimization = ["Adam","RMSProp","AdaGrad"]
        
        self.param01_str = noiseReduction[param01 - 1]
        self.param02_str = filtersize_1[param02 - 1]
        self.param03_str = filtersize_2[param03 - 1]
        self.param04_str = padding[param04 - 1]
        self.param05_str = activation_function_1[param05 - 1]
        self.param06_str = activation_function_2[param06 - 1]
        self.param07_str = optimization[param07 - 1]
        
        self.search_str = (self.param01_str + self.param02_str + self.param03_str + self.param04_str + self.param05_str + self.param06_str + self.param07_str)
    
    def fit(self, x, y):
        return self 
    
    def predict(self, x):
        return [1.0]*len(x)
    
    def score(self, x, y):
        self.make_string(self.param01, self.param02, self.param03, self.param04, self.param05, self.param06, self.param07)
        judge_value = float(MyEstimator(self.param01, self.param02, self.param03, self.param04, self.param05, self.param06, self.param07).search_data())
        return judge_value
    
    def get_params(self, deep=True):
        return {'param01': self.param01, 'param02': self.param02, 'param03': self.param03, 'param04': self.param04, 'param05': self.param05, 'param06': self.param06, 'param07': self.param07}
                
    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self
    
    def search_data(self):
        self.make_string(self.param01, self.param02, self.param03, self.param04, self.param05, self.param06, self.param07)
        
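        # Extract the first matching row explicitly; calling float() on a Series is deprecated in recent pandas
        return float(self.df.loc[self.df['param'] == self.search_str, 'Evaluation Value'].iloc[0])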

param01 = 1
param02 = 1
param03 = 1
param04 = 1
param05 = 1
param06 = 1
param07 = 1
CSV_PATH  = "data_13/data.csv"

searchEstimator = MyEstimator(param01, param02, param03, param04, param05, param06, param07)
value = searchEstimator.search_data()

param_dist = {"param01":[1,2,3],
              "param02":[1,2,3,4],
              "param03":[1,2,3,4],
              "param04":[1,2,3],
              "param05":[1,2,3,4],
              "param06":[1,2,3,4],
              "param07":[1,2,3],
              }

model_grid = GridSearchCV(estimator=searchEstimator,
                          param_grid=param_dist,
                          cv=2,                      # cross-validation splitting strategy (default: None)
                          return_train_score=False,  # do not compute training scores
                          n_jobs=-1,                 # number of CPU cores; -1 uses all of them
                          refit=True,                # refit on the full data with the best parameters
                          # scoring="accuracy",      # metric used to evaluate the model
                          # verbose=0,
                          )

x = searchEstimator.df["param"]
y = searchEstimator.df["Evaluation Value"].values

model_grid.fit(x,y)
result_df = pd.DataFrame(model_grid.cv_results_)
print(result_df)
print("##################################")
print(model_grid.best_params_)

      mean_fit_time  std_fit_time  mean_score_time  std_score_time  \
0          0.001500      0.000500         0.064462        0.005496   
1          0.001499      0.000499         0.092447        0.001499   
2          0.001000      0.000000         0.071459        0.008497   
3          0.001498      0.000498         0.073959        0.024984   
4          0.000498      0.000498         0.072957        0.001001   
...             ...           ...              ...             ...   
6907       0.001000      0.000002         0.029982        0.002000   
6908       0.000500      0.000500         0.031483        0.001498   
6909       0.000500      0.000500         0.032483        0.002498   
6910       0.000499      0.000499         0.041984        0.002990   
6911       0.001500      0.000502         0.061463        0.014493   

     param_param01 param_param02 param_param03 param_param04 param_param05  \
0                1             1             1             1             1   
1  

___ 
## Random Search
Next is random search, which picks the value of each hyperparameter at random, evaluates the resulting combination, and repeats this to find the most accurate combination of hyperparameter values.
In a random search, whether the best hyperparameter values are found is a matter of "luck". However, with refinements such as re-sampling whenever an already-tried combination comes up, it is generally possible to find the global optimum faster than with grid search. Also, when a hyperparameter is continuous, random search has the advantage of being able to find an optimum that lies between grid points, which is difficult to find with grid search.
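
The "between grid points" advantage can be seen on a one-dimensional toy problem. In this sketch the objective function, its peak at x = 0.37, and the sample counts are all assumptions chosen for illustration. The two budgets differ deliberately: the point is only that random samples are not restricted to the grid.

```python
import random


def objective(x):
    """Hypothetical score with its maximum at x = 0.37."""
    return -(x - 0.37) ** 2


# Grid search: only these five values can ever be tried
grid_points = [0.0, 0.25, 0.5, 0.75, 1.0]
best_grid = max(grid_points, key=objective)

# Random search: any value in [0, 1) can be drawn
rng = random.Random(0)  # fixed seed for reproducibility
random_points = [rng.random() for _ in range(1000)]
best_random = max(random_points, key=objective)

print("grid  :", best_grid, "distance from optimum:", abs(best_grid - 0.37))
print("random:", best_random, "distance from optimum:", abs(best_random - 0.37))
```

The grid can get no closer than 0.25, whereas the random samples land much nearer to the true peak.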

The following figure illustrates random search.

--

## ランダムサーチ
続いてランダムサーチであるが、これは各ハイパーパラメータの値をランダムに決定し、それを用いて最も精度の良いハイパーパラメータの値の組み合わせを探す手法である。ランダムにハイパーパラメータの値を決定し、評価していく。
ランダムサーチでは最適なハイパーパラメータの値が見つかるかどうかは"運"である。しかしながら、一度試した組み合わせが出た場合はリトライする等の工夫を加えることで、一般的にはグリッドサーチよりも早く大域最適解を見つることができる。また、ハイパーパラメータが離散値を取りうる場合は、グリッドサーチでは発見困難であった格子点と格子点の間の大域最適解を見つけることが可能となるという利点がある、



次図はランダムサーチのイメージである。

<img src="implements_13/randomsearch.png" width="350">

A simple, easy-to-understand implementation sets each parameter value with a random-number function (here `random.randint()`) and repeats the process many times, as shown below.

--

感覚的に理解しやすい簡単な実装としては、次のようにrand()関数でパラメータの値を設定し、それを何度も繰り返す形となる。

In [82]:
import random

def calc_score(param01, param02, param03, param04, param05, param06, param07, param08, param09, param10):
    #Write process to calculate the score
    result = param01 + param02 + param03 + param04 + param05 + param06 + param07 + param08 + param09 + param10
    return result

param_dist = {"param01": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
              "param02": [1, 2],
              "param03": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
              "param04": [1],
              "param05": [1, 2, 3],
              "param06": [1, 2, 3],
              "param07": [1, 2, 3],
              "param08": [1, 2],
              "param09": [1, 2, 3],
              "param10": [1],
              }

score = 0
params = [0] * 10  # best combination found so far

for _ in range(40000):
    param01 = random.randint(1,11)
    param02 = random.randint(1,2)
    param03 = random.randint(1,12)
    param04 = 1
    param05 = random.randint(1,3)
    param06 = random.randint(1,3)
    param07 = random.randint(1,3)
    param08 = random.randint(1,2)
    param09 = random.randint(1,3)
    param10 = 1
    
    t = calc_score(param01, param02, param03, param04, param05, param06, param07, param08, param09, param10)

    if t > score:
        score = t  # remember the best score seen so far
        params[0] = param01
        params[1] = param02
        params[2] = param03
        params[3] = param04
        params[4] = param05
        params[5] = param06
        params[6] = param07
        params[7] = param08
        params[8] = param09
        params[9] = param10

print(params)
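
The loop above can draw the same combination more than once, which wastes evaluations. The following sketch adds the re-sampling refinement mentioned earlier: combinations that were already tried are skipped and drawn again. The names `tried` and `n_trials`, the seed, and the trial count are assumptions for illustration.

```python
import random

param_dist = {
    "param01": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "param02": [1, 2],
    "param03": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "param04": [1],
    "param05": [1, 2, 3],
    "param06": [1, 2, 3],
    "param07": [1, 2, 3],
    "param08": [1, 2],
    "param09": [1, 2, 3],
    "param10": [1],
}


def calc_score(params):
    # Placeholder score: replace with the real evaluation
    return sum(params)


rng = random.Random(0)  # fixed seed for reproducibility
tried = set()           # combinations that were already evaluated
best_score, best_params = float("-inf"), None
n_trials = 2000         # must not exceed the 42768 possible combinations

while len(tried) < n_trials:
    candidate = tuple(rng.choice(values) for values in param_dist.values())
    if candidate in tried:
        continue  # already evaluated: draw again instead of re-evaluating
    tried.add(candidate)
    score = calc_score(candidate)
    if score > best_score:
        best_score, best_params = score, candidate

print(best_score, best_params)
```

Each of the 2000 evaluations is now spent on a distinct combination.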


scikit-learn also provides a class called RandomizedSearchCV, which performs the cross-validation and evaluation automatically.

- [RandomizedSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html)

As with GridSearchCV,

fit(), predict(), score(), get_params(), set_params()

↑ a custom estimator must implement these methods under exactly these names.

--

ライブラリとしてはRandomizedSearchCVというものがあるが、これは交差検証まで自動で行い、評価を行ってくれるものである。

- [RandomezedSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html)

gridsearchCV同様に

fit(), predict(), score(), get_params(), set_params()

関数は完全一致の関数名で必ず実装する必要がある。

In [103]:
from sklearn.model_selection import RandomizedSearchCV

class MyEstimator(BaseEstimator):
    def __init__(self, param01, param02, param03, param04, param05, param06, param07):
        self.param01 = int(param01)
        self.param02 = int(param02)
        self.param03 = int(param03)
        self.param04 = int(param04)
        self.param05 = int(param05)
        self.param06 = int(param06)
        self.param07 = int(param07)
        
        self.make_string(self.param01, self.param02, self.param03, self.param04,self.param05, self.param06, self.param07)
        self.df = pd.read_csv(CSV_PATH, header=0)
    
    def make_string(self, param01, param02, param03, param04, param05, param06, param07):
        #param1
        noiseReduction = ["Gaussian:7 ","Gaussian:7 Gaussian:7 ","Gaussian:7 Gaussian:7 Gaussian:7 "]
        #param2
        filtersize_1 = ["3 ","5 ","7 ","9 "]
        #param3
        filtersize_2 = ["3 ","5 ","7 ","9 "]
        #param4
        padding = ["1 ","3 ","5 "]
        #param5
        activation_function_1 = ["softmax ","relu ","sigmoid ","tanh "]
        #param6
        activation_function_2 = ["softmax ","relu ","sigmoid ","tanh "]
        #param7
        optimization = ["Adam","RMSProp","AdaGrad"]
        
        self.param01_str = noiseReduction[param01 - 1]
        self.param02_str = filtersize_1[param02 - 1]
        self.param03_str = filtersize_2[param03 - 1]
        self.param04_str = padding[param04 - 1]
        self.param05_str = activation_function_1[param05 - 1]
        self.param06_str = activation_function_2[param06 - 1]
        self.param07_str = optimization[param07 - 1]
        
        self.search_str = (self.param01_str + self.param02_str + self.param03_str + self.param04_str + self.param05_str + self.param06_str + self.param07_str)
    
    def fit(self, x, y):
        return self 
    
    def predict(self, x):
        return [1.0]*len(x)
    
    def score(self, x, y):
        self.make_string(self.param01, self.param02, self.param03, self.param04, self.param05, self.param06, self.param07)
        judge_value = float(MyEstimator(self.param01, self.param02, self.param03, self.param04, self.param05, self.param06, self.param07).search_data())
        return judge_value
    
    def get_params(self, deep=True):
        return {'param01': self.param01, 'param02': self.param02, 'param03': self.param03, 'param04': self.param04, 'param05': self.param05, 'param06': self.param06, 'param07': self.param07}
                
    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self
    
    def search_data(self):
        self.make_string(self.param01, self.param02, self.param03, self.param04, self.param05, self.param06, self.param07)
        
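        # Extract the first matching row explicitly; calling float() on a Series is deprecated in recent pandas
        return float(self.df.loc[self.df['param'] == self.search_str, 'Evaluation Value'].iloc[0])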

param01 = 1
param02 = 1
param03 = 1
param04 = 1
param05 = 1
param06 = 1
param07 = 1
CSV_PATH  = "data_13/data.csv"

searchEstimator = MyEstimator(param01, param02, param03, param04, param05, param06, param07)
value = searchEstimator.search_data()

param_dist = {"param01":[1,2,3],
              "param02":[1,2,3,4],
              "param03":[1,2,3,4],
              "param04":[1,2,3],
              "param05":[1,2,3,4],
              "param06":[1,2,3,4],
              "param07":[1,2,3],
              }

model_random = RandomizedSearchCV(estimator=searchEstimator,
                                  param_distributions=param_dist,
                                  n_iter=10,     # number of parameter settings sampled (default: 10)
                                  n_jobs=-1,     # number of CPU cores; -1 uses all of them
                                  refit=False,   # do not refit on the full data with the best parameters
                                  # cv=3,                     # cross-validation splitting strategy (default: None)
                                  # scoring="accuracy",       # metric used to evaluate the model
                                  # verbose=0,
                                  # random_state=2,           # random seed
                                  # return_train_score=True,  # also compute training scores
                                  )


x = searchEstimator.df["param"]
y = searchEstimator.df["Evaluation Value"].values
model_random.fit(x,y)
result_df = pd.DataFrame(model_random.cv_results_)

print(result_df)
print("##################################")
print(model_random.best_params_)

   mean_fit_time  std_fit_time  mean_score_time  std_score_time param_param07  \
0       0.001332  4.693475e-04         0.039312        0.007360             3   
1       0.000666  4.712019e-04         0.031982        0.002449             1   
2       0.000999  1.184119e-06         0.035980        0.003265             2   
3       0.001000  5.840039e-07         0.037646        0.003088             3   
4       0.001000  8.104673e-07         0.036644        0.001248             3   
5       0.001000  8.778064e-07         0.038644        0.008803             3   
6       0.001333  4.714828e-04         0.054302        0.003297             3   
7       0.000667  4.713150e-04         0.038310        0.004781             2   
8       0.002665  1.699160e-03         0.050971        0.004318             1   
9       0.001001  6.861413e-06         0.037642        0.003859             2   

  param_param06 param_param05 param_param04 param_param03 param_param02  \
0             2             1    

___
## Bayesian Optimization
Next is Bayesian optimization, a type of optimization algorithm that uses uncertainty to decide which value to evaluate next. A Gaussian process is used as the surrogate model that estimates the objective function.

Simply put, it is a method that determines the next value to try based on the previous results. It is similar to a human treasure hunt.

In Bayesian optimization, optimization proceeds sequentially using two strategies, "exploration" and "exploitation". "Exploitation" means that when a good result is obtained, we continue to investigate its vicinity.
On the other hand, "exploration" is a strategy of deliberately investigating areas far from the current position, on the assumption that a better combination exists elsewhere. Intuitively, if the results in the region being searched are unsatisfactory, it is like using a random-number function to jump to a different location.

In this way, Bayesian optimization chooses each next trial in a balanced manner based on the previous results.


Explaining the detailed theory would take more time than we have, so I will skip it. If you want to know more, please refer to the following URL.

- [Bayesian optimization](https://qiita.com/masasora/items/cc2f10cb79f8c0a6bbaa)

For now, just keep in mind that it is a way of using previous results to move gradually toward the optimum.
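
The balance between staying near a good result and jumping elsewhere can be illustrated without a Gaussian process. The sketch below is a deliberately simplified epsilon-greedy search, not Bayesian optimization itself: with probability `epsilon` it jumps to a random point, and otherwise it tries a neighbour of the best point so far. The search space, `epsilon`, and the scoring function are assumptions for illustration.

```python
import random

# Hypothetical discrete search space
param_dist = {
    "param01": list(range(1, 12)),
    "param02": [1, 2],
    "param03": list(range(1, 13)),
}
names = list(param_dist)


def calc_score(point):
    # Placeholder score: replace with the real evaluation
    return sum(point)


def random_point(rng):
    return [rng.choice(param_dist[name]) for name in names]


def neighbour(point, rng):
    """Move one randomly chosen hyperparameter to an adjacent candidate value."""
    i = rng.randrange(len(names))
    values = param_dist[names[i]]
    idx = values.index(point[i]) + rng.choice([-1, 1])
    idx = min(max(idx, 0), len(values) - 1)  # stay inside the candidate list
    new_point = list(point)
    new_point[i] = values[idx]
    return new_point


rng = random.Random(0)  # fixed seed for reproducibility
epsilon = 0.2           # probability of jumping to a random point
best = random_point(rng)
best_score = calc_score(best)
history = [best_score]

for _ in range(200):
    if rng.random() < epsilon:
        candidate = random_point(rng)     # look somewhere far away
    else:
        candidate = neighbour(best, rng)  # look near the best result so far
    score = calc_score(candidate)
    if score > best_score:
        best, best_score = candidate, score
    history.append(best_score)

print(best, best_score)
```

Bayesian optimization replaces the fixed `epsilon` with a surrogate model: it prefers points where the predicted score is high or where the model is still uncertain.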

--

## ベイズ最適
次にベイズ推定であるが、これは不確かさを利用して次に探索を行うべき値を探していく最適化アルゴリズムの一種である。目的関数を推定する代理モデルにガウス過程が使われる。

簡単に言えば、前回の結果を基に次に調べる値を決める手法。人間が宝探しをするような感覚に近い。

ベイズ最適化では"探索"と"活用"の2つの戦略を使って最適化を順次的に行う。"活用"とは、良い結果が出た場合は継続してその近辺を調べることである。
一方、"探索"は現在の位置よりも、もっと良い組み合わせがあると考えてあえて現在位置から離れた異なる部分を調べる戦略である。イメージ的には、ある部分を探査していて、探査結果が思わしくないようであれば、rand()関数を用いて現在位置とは別の場所へ探査へ行くイメージである。

このようにベイズ最適化では前回の結果を踏まえて次をバランス良く試すことが可能である。


細かな理論まで説明するとこの時間では終わらない為、割愛する。詳しく知りたい人は次のURLを参照して下さい。

- [Bayesian optimization](https://qiita.com/masasora/items/cc2f10cb79f8c0a6bbaa)

前の結果を利用して、最適な方向へ徐々に進んでいく方法なんだなくらいにつかんでおいて下さい。

<img src="implements_13/bayesian0.jpg" width="350">



<img src="implements_13/bayesian.jpg" width="350">

Libraries for this include BayesSearchCV (from scikit-optimize) and Optuna.

- [BayesSearchCV](https://scikit-optimize.github.io/stable/modules/generated/skopt.BayesSearchCV.html)

- [Optuna tutorial](https://optuna.readthedocs.io/en/stable/tutorial/001_first.html#first)

This time, I will show an example using BayesSearchCV. I have also used Optuna before, so if you would like to see an implementation example using Optuna, please let me know.

--

ライブラリとしてはBayesSearchCVやoptunaというものがある。

- [BayesSearchCV](https://scikit-optimize.github.io/stable/modules/generated/skopt.BayesSearchCV.html)

- [Optuna tutorial](https://optuna.readthedocs.io/en/stable/tutorial/001_first.html#first)

今回は、BayesSearchCVを例示する。optunaの実装経験はあるので、もし、optunaを用いた実装例が見たい方は声をかけて下さい。

In [106]:
import warnings
from skopt import BayesSearchCV
from sklearn.base import BaseEstimator

warnings.simplefilter('ignore', UserWarning)

class MyEstimator(BaseEstimator):
    def __init__(self, param01, param02, param03, param04, param05, param06, param07):
        self.param01 = int(param01)
        self.param02 = int(param02)
        self.param03 = int(param03)
        self.param04 = int(param04)
        self.param05 = int(param05)
        self.param06 = int(param06)
        self.param07 = int(param07)
        
        self.make_string(self.param01, self.param02, self.param03, self.param04,self.param05, self.param06, self.param07)
        self.df = pd.read_csv(CSV_PATH, header=0)
    
    def make_string(self, param01, param02, param03, param04, param05, param06, param07):
        #param1
        noiseReduction = ["Gaussian:7 ","Gaussian:7 Gaussian:7 ","Gaussian:7 Gaussian:7 Gaussian:7 "]
        #param2
        filtersize_1 = ["3 ","5 ","7 ","9 "]
        #param3
        filtersize_2 = ["3 ","5 ","7 ","9 "]
        #param4
        padding = ["1 ","3 ","5 "]
        #param5
        activation_function_1 = ["softmax ","relu ","sigmoid ","tanh "]
        #param6
        activation_function_2 = ["softmax ","relu ","sigmoid ","tanh "]
        #param7
        optimization = ["Adam","RMSProp","AdaGrad"]
        
        self.param01_str = noiseReduction[param01 - 1]
        self.param02_str = filtersize_1[param02 - 1]
        self.param03_str = filtersize_2[param03 - 1]
        self.param04_str = padding[param04 - 1]
        self.param05_str = activation_function_1[param05 - 1]
        self.param06_str = activation_function_2[param06 - 1]
        self.param07_str = optimization[param07 - 1]
        
        self.search_str = (self.param01_str + self.param02_str + self.param03_str + self.param04_str + self.param05_str + self.param06_str + self.param07_str)
    
    def fit(self, x, y):
        return self 
    
    def predict(self, x):
        return [1.0]*len(x)
    
    def score(self, x, y):
        self.make_string(self.param01, self.param02, self.param03, self.param04, self.param05, self.param06, self.param07)
        judge_value = float(MyEstimator(self.param01, self.param02, self.param03, self.param04, self.param05, self.param06, self.param07).search_data())
        return judge_value
    
    def get_params(self, deep=True):
        return {'param01': self.param01, 'param02': self.param02, 'param03': self.param03, 'param04': self.param04, 'param05': self.param05, 'param06': self.param06, 'param07': self.param07}
                
    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self
    
    def search_data(self):
        self.make_string(self.param01, self.param02, self.param03, self.param04, self.param05, self.param06, self.param07)
        
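        # Extract the first matching row explicitly; calling float() on a Series is deprecated in recent pandas
        return float(self.df.loc[self.df['param'] == self.search_str, 'Evaluation Value'].iloc[0])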
    
param01 = 1
param02 = 1
param03 = 1
param04 = 1
param05 = 1
param06 = 1
param07 = 1
CSV_PATH  = "data_13/data.csv"

bayesEstimator = MyEstimator(param01, param02, param03, param04, param05, param06, param07)
value = bayesEstimator.search_data()

param_dist = {"param01":[1,2,3],
              "param02":[1,2,3,4],
              "param03":[1,2,3,4],
              "param04":[1,2,3],
              "param05":[1,2,3,4],
              "param06":[1,2,3,4],
              "param07":[1,2,3],
              }

model_bayes = BayesSearchCV(estimator=bayesEstimator,
                            search_spaces=param_dist,
                            cv=2,        # cross-validation splitting strategy (default: None)
                            n_iter=10,   # number of parameter settings sampled (BayesSearchCV default: 50);
                                         # trades off runtime against solution quality.
                                         # Value chosen because the sampled points converged within roughly 22 iterations.
                            n_jobs=-1,   # number of CPU cores; -1 uses all of them
                            # scoring="accuracy",       # metric used to evaluate the model
                            # verbose=0,
                            # random_state=2,           # random seed
                            # return_train_score=True,  # also compute training scores
                            # refit=True,
                            )

x = bayesEstimator.df["param"]
y = bayesEstimator.df["Evaluation Value"].values


model_bayes.fit(x,y)
result_df = pd.DataFrame(model_bayes.cv_results_)

print(result_df)
print("##################################")
print(model_bayes.best_params_)

   split0_test_score  split1_test_score  mean_test_score  std_test_score  \
0           0.135945           0.135945         0.135945    0.000000e+00   
1           1.884874           1.884874         1.884874    0.000000e+00   
2           0.319997           0.319997         0.319997    5.551115e-17   
3           0.769721           0.769721         0.769721    0.000000e+00   
4           0.015238           0.015238         0.015238    0.000000e+00   
5           0.316085           0.316085         0.316085    0.000000e+00   
6           0.949880           0.949880         0.949880    0.000000e+00   
7           0.230639           0.230639         0.230639    0.000000e+00   
8           0.538090           0.538090         0.538090    0.000000e+00   
9           0.818719           0.818719         0.818719    0.000000e+00   

   rank_test_score  mean_fit_time  std_fit_time  mean_score_time  \
0                9       0.000998  4.768372e-07         0.026485   
1                1       0.

---
## Optimization by GA

Finally, optimization by a genetic algorithm (GA). This method uses a genetic algorithm to search for the best combination of hyperparameters. First, genes (candidate combinations) are generated randomly, and the ones with the best accuracy are passed on to the next generation. Mutation and crossover are applied along the way to find the best combination of hyperparameters.

My impression from practical use is that when the hyperparameters take discrete values, a GA can find accurate combinations very quickly.

One available library is vcopt, developed by a startup from Tokyo Institute of Technology.
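
Before turning to the library, the steps described above (random initial genes, selection of the best, crossover, mutation) can be sketched in plain Python. This is a minimal illustration and not how vcopt is implemented; the population size, elite size, mutation rate, and scoring function are all assumptions.

```python
import random

param_dist = {
    "param01": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "param02": [1, 2],
    "param03": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "param04": [1],
    "param05": [1, 2, 3],
    "param06": [1, 2, 3],
    "param07": [1, 2, 3],
    "param08": [1, 2],
    "param09": [1, 2, 3],
    "param10": [1],
}
value_lists = list(param_dist.values())


def calc_score(genes):
    # Placeholder score: replace with the real evaluation
    return sum(genes)


def random_individual(rng):
    return [rng.choice(values) for values in value_lists]


def crossover(parent_a, parent_b, rng):
    # One-point crossover: head of one parent, tail of the other
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(genes, rng, rate=0.1):
    # Each gene is replaced by a random candidate value with probability `rate`
    return [rng.choice(values) if rng.random() < rate else gene
            for gene, values in zip(genes, value_lists)]


rng = random.Random(0)  # fixed seed for reproducibility
population = [random_individual(rng) for _ in range(20)]
best_per_generation = []

for _ in range(30):
    # Selection: the 5 best individuals survive unchanged (elitism)
    population.sort(key=calc_score, reverse=True)
    best_per_generation.append(calc_score(population[0]))
    elite = population[:5]
    # The rest of the next generation are mutated children of elite parents
    children = [
        mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
        for _ in range(len(population) - len(elite))
    ]
    population = elite + children

best = max(population, key=calc_score)
print(best, calc_score(best))
```

Because the elite individuals are carried over unchanged, the best score never decreases from one generation to the next.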

--

## GAによる最適化

最後にGAによる最適解であるが、これは遺伝的アルゴリズムを利用し、最良のハイパーパラメータの組み合わせを探査する手法である。まずはランダムに遺伝子を生成し、その中から精度の良いものを次の世代へ受け継ぐ。その中で突然変異や、交叉等を行い、ハイパーパラメータの組み合わせとして最良のものを探し出す。

私が実務で使ったなりの感想ではあるが、ハイパーパラメータが値として離散値を取る場合は、非常に高速に、精度の良いものを見つけることができる印象であった。

ライブラリとしては東工大発のベンチャーが開発したvcoptというものがある。


In [90]:
import numpy as np
import sys
import pandas as pd
from vcopt import vcopt
import time

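# Evaluation function: look up the precomputed score of one parameter combination
def score_func(params):
    search_str = make_string_for_GA(params[0],params[1],params[2],
                                    params[3],params[4],params[5],params[6])

    # Take the first matching row explicitly; calling float() on a whole Series is deprecated in pandas
    judge_value = float(df.loc[df['param'] == search_str, 'Evaluation Value'].iloc[0])

    return judge_value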

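def show_pool_func(pool, **info):
    # Status information about the running GA is passed in the `info` dictionary
    # and can be read and used here
    gen = info['gen']                # current generation
    best_index = info['best_index']  # index of the elite individual
    best_score = info['best_score']  # score of the elite individual
    mean_score = info['mean_score']  # mean score of the population
    mean_gap = info['mean_gap']      # mean absolute gap between the target value and the scores
    elapsed = info['time']           # elapsed time in seconds (renamed so it does not shadow the time module)

    # Report progress
    print(gen, best_score, best_index, elapsed)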

def make_string_for_GA(param01, param02, param03, param04, param05, param06, param07):
    #param1
    noiseReduction = ["Gaussian:7 ","Gaussian:7 Gaussian:7 ","Gaussian:7 Gaussian:7 Gaussian:7 "]
    #param2
    filtersize_1 = ["3 ","5 ","7 ","9 "]
    #param3
    filtersize_2 = ["3 ","5 ","7 ","9 "]
    #param4
    padding = ["1 ","3 ","5 "]
    #param5
    activation_function_1 = ["softmax ","relu ","sigmoid ","tanh "]
    #param6
    activation_function_2 = ["softmax ","relu ","sigmoid ","tanh "]
    #param7
    optimization = ["Adam","RMSProp","AdaGrad"]

    param01_str = noiseReduction[param01 - 1]
    param02_str = filtersize_1[param02 - 1]
    param03_str = filtersize_2[param03 - 1]
    param04_str = padding[param04 - 1]
    param05_str = activation_function_1[param05 - 1]
    param06_str = activation_function_2[param06 - 1]
    param07_str = optimization[param07 - 1]

    return (param01_str + param02_str + param03_str + param04_str + param05_str + param06_str + param07_str)

param01 = 1
param02 = 1
param03 = 1
param04 = 1
param05 = 1
param06 = 1
param07 = 1
CSV_PATH  = "data_13/data.csv"

# Load the data and set up the arrays
df = pd.read_csv(CSV_PATH, header=0)
dataX = df["param"]
dataY = df["Evaluation Value"].values

# Score of the combination given by param01..param07 (for reference; not used by the GA below)
target_str = make_string_for_GA(param01, param02, param03, param04, param05, param06, param07)
target_value = float(df.loc[df['param'] == target_str, 'Evaluation Value'].iloc[0])

search_str = ""
param_range = [[1,2,3], [2,3,4], [1,2,3,4], [1,2,3], [1,2,3,4], [1,2,3,4], [1,2,3]]

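para, score = vcopt().dcGA(param_range,
                           score_func,
                           9999,                    # target value; far above any score here, so this acts as maximization
                           show_pool_func='data/',  # a path string is passed; the show_pool_func defined above is not used
                           seed=None,               # random seed (None: not fixed)
                           pool_num=100,            # population size
                           max_gen=2000)            # maximum number of generations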

print(para, score)

________________________________________ info ________________________________________
para_range     : n=7
score_func     : <class 'function'>
aim            : ==9999.0
show_pool_func : 'data/'
seed           : None
pool_num       : 100
max_gen        : 2000
core_num       : 1 (*vcopt, vc-grendel)
_______________________________________ start ________________________________________
Scoring first gen 100/100        
|                                       +<                                        | gen=21, best_score=1.9998
_______________________________________ result _______________________________________
para = np.array([1, 4, 4, 3, 2, 3, 1])
score = 1.99977001
________________________________________ end _________________________________________
[1 4 4 3 2 3 1] 1.99977001
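
The `score_func` above filters the whole DataFrame every time the GA scores an individual. One possible alternative, shown here only as a sketch, is to build a dictionary from parameter string to score once and look scores up from it. The CSV text, its two rows, and the names `score_table` and `lookup_score` are hypothetical stand-ins for illustration; they are not taken from the lecture's dataset.

```python
import csv
import io

# Hypothetical miniature of the lecture's CSV (made-up rows and scores).
CSV_TEXT = """param,Evaluation Value
Gaussian:7 3 3 1 relu relu Adam,0.42
Gaussian:7 5 3 1 relu tanh Adam,1.37
"""

# Build the lookup table once: parameter string -> score
score_table = {
    row["param"]: float(row["Evaluation Value"])
    for row in csv.DictReader(io.StringIO(CSV_TEXT))
}


def lookup_score(param_str, default=float("-inf")):
    """Return the stored score, or `default` when the combination is missing."""
    return score_table.get(param_str, default)


print(lookup_score("Gaussian:7 5 3 1 relu tanh Adam"))
print(lookup_score("not in the table"))
```

Returning a very low default for a missing combination lets a maximizing search simply avoid it instead of stopping with an error. Whether that is desirable depends on whether missing rows indicate a bug in the data.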


___
## Exercise 
(I thought you would be very busy with your graduation thesis, so I reduced the number of questions this time.)

(1) Name the most appropriate hyperparameter optimization method for each of the following cases.

1-1 The hyperparameters take discrete values and the number of combinations is not very large.

1-2 The hyperparameters take continuous values and the number of combinations is very large. (There is more than one answer, so please write the one you like.)

(2) Describe your research briefly, and state which of the methods covered in this lecture you think is appropriate for optimizing hyperparameters in your research, including your reasons.

--

## 練習問題 
(卒業論文で非常に忙しいと思いましたので、今回は問題数を減らしました。)

(1) 次の場合に、ハイパーパラメータの探査手法として最も適切なものを書け。

1-1 ハイパーパラメータが値として離散値を取り、組み合わせ数もそれほど多くない場合。

1-2 ハイパーパラメータが値として連続値を取り、組み合わせの数が非常に多い場合。(解答は一つではないので、好きなものを書いて下さい。)

(2) あなたの研究内容を簡単に記述し、あなたの研究でハイパーパラメータの最適化を行う際に今回挙げた中では手法が適切と考えられるか、理由も含めて書け。

## Additional Problems 
(This does not affect your score, but if you have time, please work on it.)

(3) Add the number of learning iterations as a hyperparameter to the GridSearchCV code.

(4) Add the number of learning iterations as a hyperparameter to the RandomizedSearchCV code.

(5) Add the number of learning iterations as a hyperparameter to the Bayesian optimization code.

(6) Add the number of learning iterations as a hyperparameter to the GA code.


(The dataset does not contain the number of learning iterations, so you do not need to run the code.)

--

## 追加問題 
(評価に影響はありませんが、もし、物足りないという方が居たら取り組んでみて下さい。)

(3) GridSearchCVのコードに学習イテレーションをハイパーパラメータとして加えよ。

(4) RandomizedSearchCVのコードに学習イテレーションをハイパーパラメータとして加えよ。

(5) Bayesian optimizationのコードに学習イテレーションをハイパーパラメータとして加えよ。

(6) GAのコードに学習イテレーションをハイパーパラメータとして加えよ。


(データセットには学習イテレーションの項目は含まれていないので、実行はしなくて大丈夫です。)