#### sklearn.model_selection.GridSearchCV

* class sklearn.model_selection.GridSearchCV(estimator, param_grid, *, scoring=None, n_jobs=None, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)

* Class: made up of properties and methods
* This class provides methods

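* estimator : the model (algorithm) to tune
* param_grid : a dict (or list of dicts) mapping parameter names to lists of candidate values; often stored in a variable named parameters or param_grid
* scoring : the evaluation metric; for classification you can specify a metric other than accuracy, such as recall
* n_jobs : number of jobs to run in parallel
* refit : refit the estimator on the whole dataset using the best found parameters; default=True
* cv : cross-validation strategy, i.e. how many folds; the default None means 5-fold
* verbose : controls how much progress output is printed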
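
As a minimal sketch of how these parameters fit together, the call below passes each one by name. The decision tree, the `max_depth` values, and `random_state=0` are assumptions chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),  # model to tune (illustrative choice)
    param_grid={"max_depth": [2, 3, 5]},               # candidate values per hyperparameter
    scoring="accuracy",                                # metric used to rank candidates
    cv=5,                                              # 5-fold cross-validation
    n_jobs=1,                                          # number of parallel jobs
    refit=True,                                        # retrain the best candidate on all of X, y
    verbose=0,                                         # 0 = no progress messages
)
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])
print(n_candidates, search.n_splits_)
```

Each candidate is trained once per fold, so this search runs 3 x 5 fits plus one final refit.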

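* attributes: in object-oriented programming, an object consists of methods and properties. How is an object created? From a class.
* class: `def __init__(self, ...)` sets instance variables as self.xxxx
* method: a function, called as obj.method(); attribute: a value, accessed as obj.xxxx
* attribute: values that become available after something is run, e.g. cv_results_, which holds keys such as param_gamma, param_kernel, param_degree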
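
The distinction between attributes and methods can be sketched with a toy class. `MeanPredictor` is a hypothetical name and not part of scikit-learn; it only mimics the convention that values learned during `fit` get a trailing underscore.

```python
class MeanPredictor:
    """Toy estimator (hypothetical, for illustration only)."""

    def __init__(self, offset=0.0):
        # __init__ stores constructor arguments on self as attributes
        self.offset = offset

    def fit(self, values):
        # A method does work when called; results learned here are stored
        # as attributes with a trailing underscore
        self.mean_ = sum(values) / len(values)
        return self

    def predict(self):
        # Another method: combines a learned attribute with a constructor one
        return self.mean_ + self.offset


model = MeanPredictor(offset=1.0).fit([1.0, 2.0, 3.0])
print(model.offset)     # attribute set in __init__
print(model.mean_)      # attribute created by fit()
print(model.predict())  # method call, note the parentheses
```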

* best_estimator_ : the estimator fitted with the best parameters, e.g. clf.best_estimator_
* best_score_ : the highest mean cross-validated score
* best_params_ : the parameter setting that gave the best score
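
A small sketch of reading these attributes after fitting. The `SVC` estimator and the `kernel`/`gamma` grid are assumptions, picked so that the `param_kernel` and `param_gamma` columns mentioned above appear in `cv_results_`.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative grid: 2 kernels x 2 gamma values = 4 candidates
param_grid = {"kernel": ["linear", "rbf"], "gamma": [0.1, 1.0]}
clf = GridSearchCV(SVC(), param_grid, cv=3).fit(X, y)

# cv_results_ is a dict of arrays; a DataFrame makes it easier to read
results = pd.DataFrame(clf.cv_results_)
print(results[["param_kernel", "param_gamma", "mean_test_score", "rank_test_score"]])

print(clf.best_params_)     # parameter setting with the best mean score
print(clf.best_score_)      # that best mean cross-validated score
print(clf.best_estimator_)  # estimator refit with best_params_
```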

* Commonly used methods
- fit(X[, y, groups]) : Run fit with all sets of parameters.
- predict(X) : Call predict on the estimator with the best found parameters.
- predict_proba(X): Call predict_proba on the estimator with the best found parameters.
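
With `refit=True`, `predict` and `predict_proba` on the search object delegate to `best_estimator_`. The sketch below checks that on the iris data; the grid values and `random_state=0` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(DecisionTreeClassifier(random_state=0), {"max_depth": [2, 3]}, cv=3)
grid.fit(X, y)  # tries every candidate, then refits the best one on all of X, y

# Predictions from the search object match those of the refit best estimator
same_labels = np.array_equal(grid.predict(X), grid.best_estimator_.predict(X))
print(same_labels)

# One row per sample, one column per class; each row sums to 1
proba = grid.predict_proba(X)
print(proba.shape)
```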



In [5]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.datasets import load_iris
import numpy as np

In [10]:
iris_data = load_iris()
data = iris_data.data      # feature matrix
label = iris_data.target   # class labels

dt_clf = DecisionTreeClassifier(random_state = 156)

In [6]:
scores = cross_val_score(dt_clf, data, label, scoring='accuracy', cv=3)
print('Accuracy per fold:', np.round(scores,4))
print('Mean validation accuracy:', np.round(np.mean(scores),4))

Accuracy per fold: [0.98 0.94 0.98]
Mean validation accuracy: 0.9667
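
`cross_validate` is imported above but not used. As a hedged sketch, it differs from `cross_val_score` in that it accepts several metrics at once and also reports timings. The metric names and `return_train_score=True` are choices made here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

iris_data = load_iris()
dt_clf = DecisionTreeClassifier(random_state=156)

# Returns a dict of arrays, one entry per fold for each key
cv_out = cross_validate(
    dt_clf,
    iris_data.data,
    iris_data.target,
    scoring=["accuracy", "recall_macro"],  # multiple metrics in one run
    cv=3,
    return_train_score=True,               # also score on the training folds
)

for key in sorted(cv_out):
    print(key, cv_out[key])
```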


In [17]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split

In [22]:
iris = load_iris()
train_data = iris.data
train_label = iris.target
# Fit on the full dataset (nothing is held out yet)
dt_clf.fit(train_data, train_label)

# Hold out 30% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=121)

In [26]:
iris = load_iris()
# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=121)

dtree = DecisionTreeClassifier()

# Candidate values for each hyperparameter
parameters = {'max_depth':[2,3,5], 'min_samples_split': [2,3]}
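
To see which combinations a grid like this expands to, `ParameterGrid` can enumerate them. This is only a sketch for inspecting the grid; `GridSearchCV` does not require you to build this list yourself.

```python
from sklearn.model_selection import ParameterGrid

parameters = {'max_depth': [2, 3, 5], 'min_samples_split': [2, 3]}

# Every combination of the listed values: 3 x 2 candidates
candidates = list(ParameterGrid(parameters))
print(len(candidates))
for params in candidates:
    print(params)
```

With `cv=3`, each of these candidates is trained three times, followed by one refit of the best candidate when `refit=True`.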

In [31]:
import pandas as pd

grid_dtree = GridSearchCV(dtree, param_grid=parameters, cv=3, refit=True)
grid_dtree.fit(X_train,  y_train)

# Collect the per-candidate CV results and keep the most useful columns
scores_df = pd.DataFrame(grid_dtree.cv_results_)
scores_df = scores_df[['params', 'mean_test_score', 'rank_test_score', 'split0_test_score', 'split1_test_score', 'split2_test_score']]

In [32]:
grid_dtree.best_estimator_

In [33]:
grid_dtree.best_params_

{'max_depth': 3, 'min_samples_split': 2}

In [34]:
grid_dtree.predict(X_test)

array([1, 2, 1, 0, 0, 1, 1, 1, 1, 2, 2, 1, 1, 0, 0, 2, 1, 0, 2, 0, 2, 2,
       1, 1, 1, 1, 0, 0, 2, 2])
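
The predictions can be compared with the held-out labels to get a test accuracy. The sketch below repeats the setup from the cells above so that it runs on its own. Using `accuracy_score` is a choice made here, and no expected value is shown because the tree is created without a fixed `random_state`.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=121
)

parameters = {'max_depth': [2, 3, 5], 'min_samples_split': [2, 3]}
grid_dtree = GridSearchCV(DecisionTreeClassifier(), param_grid=parameters, cv=3, refit=True)
grid_dtree.fit(X_train, y_train)

# Score the refit best estimator on data it never saw during the search
pred = grid_dtree.predict(X_test)
test_accuracy = accuracy_score(y_test, pred)
print('Test accuracy:', round(test_accuracy, 4))
```

With the default `scoring=None`, `grid_dtree.score(X_test, y_test)` returns the same accuracy.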

In [35]:
grid_dtree.predict_proba(X_test)

array([[0.        , 0.97142857, 0.02857143],
       [0.        , 0.        , 1.        ],
       [0.        , 0.97142857, 0.02857143],
       [1.        , 0.        , 0.        ],
       [1.        , 0.        , 0.        ],
       [0.        , 0.97142857, 0.02857143],
       [0.        , 0.97142857, 0.02857143],
       [0.        , 0.97142857, 0.02857143],
       [0.        , 0.97142857, 0.02857143],
       [0.        , 0.        , 1.        ],
       [0.        , 0.        , 1.        ],
       [0.        , 0.97142857, 0.02857143],
       [0.        , 0.97142857, 0.02857143],
       [1.        , 0.        , 0.        ],
       [1.        , 0.        , 0.        ],
       [0.        , 0.        , 1.        ],
       [0.        , 0.97142857, 0.02857143],
       [1.        , 0.        , 0.        ],
       [0.        , 0.        , 1.        ],
       [1.        , 0.        , 0.        ],
       [0.        , 0.        , 1.        ],
       [0.        , 0.        , 1.        ],
       [0.

In [36]:
grid_dtree.best_params_

{'max_depth': 3, 'min_samples_split': 2}