P047 Decision Trees - Training a Decision Tree Classification Model

In [1]:
import numpy as np
import pandas as pd

In [2]:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

In [3]:
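# Seed NumPy's global RNG so the unseeded split and tree below repeat when the cells run in order
np.random.seed(42)
# make_moons returns (features, labels): two interleaving half-circles with Gaussian noise
raw_data = make_moons(n_samples=2000, noise=0.25, random_state=42)
data = raw_data[0]    # feature matrix, shape (n_samples, 2)
target = raw_data[1]  # binary class labels, shape (n_samples,)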

In [4]:
data.shape, target.shape

((2000, 2), (2000,))

In [5]:
# Default test_size is 0.25, so this yields 1500 training and 500 test samples
x_train, x_test, y_train, y_test = train_test_split(data, target)

In [6]:
classifer = DecisionTreeClassifier()
classifer.fit(x_train, y_train)

In [7]:
classifer.score(x_test, y_test)

0.902
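
The test score above says nothing about how well the unrestricted tree fits its own training data. The sketch below is a standalone illustration, not part of the original notebook: it rebuilds the same kind of dataset and compares training and test accuracy. The names `X_tr`, `X_te`, `tree` and the fixed `random_state=42` on the split and the tree are assumptions added for repeatability, so the numbers need not match the cell output above.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same generator settings as the notebook; the split seed is an assumption.
X, y = make_moons(n_samples=2000, noise=0.25, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# No depth or leaf-size limit: the tree keeps splitting until every leaf is pure.
tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

train_acc = tree.score(X_tr, y_tr)
test_acc = tree.score(X_te, y_te)
print(f"depth={tree.get_depth()}, leaves={tree.get_n_leaves()}")
print(f"train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")
```

A gap between the two accuracies suggests the tree is memorizing noise, which is the motivation for the constraints explored in the next sections.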

P048 Decision Trees - max_depth: Maximum Tree Depth

In [8]:
classifer = DecisionTreeClassifier(max_depth=6)
classifer.fit(x_train, y_train)

In [9]:
classifer.score(x_test, y_test)

0.928
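
To see how `max_depth` trades training fit against generalization, one option is to sweep a few values and record both accuracies. This is a minimal sketch under assumptions: the depth list, the `results` dictionary, and the fixed seeds are illustrative choices, not taken from the notebook.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=2000, noise=0.25, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# None means "no limit"; the other depths are arbitrary sample points.
depths = [1, 2, 4, 6, 10, None]
results = {}
for d in depths:
    tree = DecisionTreeClassifier(max_depth=d, random_state=42).fit(X_tr, y_tr)
    # Store the depth actually reached plus train and test accuracy.
    results[d] = (tree.get_depth(), tree.score(X_tr, y_tr), tree.score(X_te, y_te))

for d, (actual, train_acc, test_acc) in results.items():
    print(f"max_depth={d}: depth={actual}, train={train_acc:.3f}, test={test_acc:.3f}")
```

`get_depth()` reports the depth the fitted tree actually reached, which can be smaller than the limit if the tree runs out of useful splits first.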

P049 Decision Trees - min_samples_leaf: Minimum Number of Samples Required at a Leaf Node

In [10]:
classifer = DecisionTreeClassifier(max_depth=6, min_samples_leaf=6)
classifer.fit(x_train, y_train)

In [11]:
classifer.score(x_test, y_test)

0.93
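
The effect of `min_samples_leaf` can be checked directly on the fitted tree. The sketch below is illustrative and uses assumed names (`is_leaf`, `leaf_sizes`) and fixed seeds. It reads the low-level `tree_` attribute, where a node whose `children_left` entry is `-1` is a leaf and `n_node_samples` holds the number of training samples that reached each node.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=2000, noise=0.25, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(
    max_depth=6, min_samples_leaf=6, random_state=42
).fit(X_tr, y_tr)

# Leaves are the nodes without a left child (marked with -1).
is_leaf = tree.tree_.children_left == -1
leaf_sizes = tree.tree_.n_node_samples[is_leaf]

print("number of leaves:", int(is_leaf.sum()))
print("smallest leaf holds", int(leaf_sizes.min()), "training samples")
```

Every leaf should hold at least 6 training samples, so no prediction rests on a single noisy point.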

P050 Decision Trees - Using Grid Search to Find the Optimal Model Parameters

In [12]:
from sklearn.model_selection import GridSearchCV

In [13]:
params = {
    "max_depth": np.arange(1, 10),         # candidate depths 1..9
    "min_samples_leaf": np.arange(1, 20),  # candidate leaf sizes 1..19
}
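
Before launching a search it can help to count how many models the grid implies. The sketch below uses scikit-learn's `ParameterGrid` to enumerate the combinations; `n_candidates` and `n_fits` are illustrative names, and the factor of 5 assumes 5-fold cross-validation.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

params = {
    "max_depth": np.arange(1, 10),
    "min_samples_leaf": np.arange(1, 20),
}

# ParameterGrid expands the dictionary into every combination of values.
n_candidates = len(ParameterGrid(params))
n_fits = n_candidates * 5  # one fit per candidate per CV fold

print(n_candidates, n_fits)  # 171 855
```

With the default `refit=True`, `GridSearchCV` also trains the best candidate once more on the full training set after the search.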

In [14]:
grid_search = GridSearchCV(
    classifer,
    param_grid=params,
    scoring="accuracy",
    cv=5
)

In [15]:
grid_search.fit(x_train, y_train)

In [16]:
grid_search.best_params_

{'max_depth': 6, 'min_samples_leaf': 6}
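
`best_params_` is chosen by cross-validated accuracy on the training set only, so a natural follow-up is to check the refitted model on the held-out test set. This is a standalone sketch with assumed names (`search`, `cv_acc`, `test_acc`) and fixed seeds; the selected parameters and scores may differ from the notebook output above.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=2000, noise=0.25, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={
        "max_depth": np.arange(1, 10),
        "min_samples_leaf": np.arange(1, 20),
    },
    scoring="accuracy",
    cv=5,
)
search.fit(X_tr, y_tr)

cv_acc = search.best_score_          # mean CV accuracy of the best candidate
test_acc = search.score(X_te, y_te)  # accuracy of the refitted best model on test data

print("best params:", search.best_params_)
print(f"cv accuracy={cv_acc:.3f}, test accuracy={test_acc:.3f}")
```

`search.best_estimator_` exposes the refitted tree itself for further inspection or prediction.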