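### Hyperparameters
Definition: parameters that must be set before the algorithm runs.

### Model parameters
Definition: parameters that the algorithm learns during training.

kNN has no model parameters, only hyperparameters (for example, the number of neighbors k).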
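
The distinction can be sketched as follows. This is a minimal illustration, assuming scikit-learn is installed; the toy data and the use of `LinearRegression` as a contrasting model are assumptions made for the example and are not part of the original notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Hyperparameters are chosen by the user before fitting.
knn = KNeighborsClassifier(n_neighbors=3, weights="uniform")
knn_hyperparams = knn.get_params()
print(knn_hyperparams["n_neighbors"], knn_hyperparams["weights"])

# Model parameters are learned from data. Linear regression is used here only
# as a contrast: it learns coef_ and intercept_ during fit().
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([1.0, 3.0, 5.0, 7.0])  # generated from y = 2x + 1
lin_reg = LinearRegression().fit(X_toy, y_toy)
print(lin_reg.coef_, lin_reg.intercept_)
```

kNN only stores the training set during `fit`, which is why every value that shapes its behavior has to be supplied up front.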

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets 

In [2]:
digits = datasets.load_digits()

In [3]:
X = digits.data
y = digits.target

In [4]:
from sklearn.model_selection import train_test_split

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

In [6]:
from mymodule.kNN import KNNClassifier

In [7]:
my_knn_clf = KNNClassifier(k=3)

In [8]:
my_knn_clf.fit(X_train,y_train)

KNN(k=3)

In [18]:
y_predict = my_knn_clf.predict(X_test)

In [19]:
sum(y_predict == y_test) / len(y_test)

0.98444444444444446
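
The source of `mymodule.kNN.KNNClassifier` is not shown in these notes. The sketch below is a hypothetical stand-in (the class name `SimpleKNNClassifier` and the toy data are assumptions) showing one plausible way such a class could work: memorize the training data in `fit`, then take a majority vote among the k nearest samples under Euclidean distance in `predict`.

```python
from collections import Counter

import numpy as np


class SimpleKNNClassifier:
    """Hypothetical stand-in for mymodule.kNN.KNNClassifier (not the original)."""

    def __init__(self, k):
        self.k = k
        self._X_train = None
        self._y_train = None

    def fit(self, X_train, y_train):
        # kNN has no training step beyond memorizing the data.
        self._X_train = np.asarray(X_train, dtype=float)
        self._y_train = np.asarray(y_train)
        return self

    def predict(self, X):
        return np.array([self._predict_one(x) for x in np.asarray(X, dtype=float)])

    def _predict_one(self, x):
        # Euclidean distance from x to every training sample.
        distances = np.sqrt(np.sum((self._X_train - x) ** 2, axis=1))
        nearest = np.argsort(distances)[: self.k]
        # Majority vote among the labels of the k nearest samples.
        votes = Counter(self._y_train[nearest].tolist())
        return votes.most_common(1)[0][0]


X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
clf = SimpleKNNClassifier(k=3).fit(X_toy, y_toy)
y_pred = clf.predict(np.array([[0.5, 0.5], [5.5, 5.5]]))

# Same accuracy formula as in the cell above: correct predictions / total.
accuracy = np.sum(y_pred == np.array([0, 1])) / 2
print(y_pred, accuracy)
```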

In [20]:
from sklearn.metrics import accuracy_score

### Computing classification accuracy with sklearn

In [27]:
from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier(n_neighbors=3)

In [28]:
knn_clf.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=3, p=2,
           weights='uniform')

In [29]:
knn_clf.score(X_test,y_test)

0.98222222222222222

`KNeighborsClassifier.score` evaluates the model by classification accuracy.
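
As a quick check of that statement, the sketch below compares `score` with `accuracy_score` applied to explicit predictions. It reloads the digits data so that it runs on its own; the variable names `score` and `manual` are introduced only for this example.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=666
)

knn_clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# score() predicts on X_test internally and compares the result with y_test.
score = knn_clf.score(X_test, y_test)
# accuracy_score() needs the predictions to be computed explicitly.
manual = accuracy_score(y_test, knn_clf.predict(X_test))
print(score, manual)
```

Both calls should report the same number, so `score` is a convenient shortcut when the predictions themselves are not needed.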

### Finding the best k

Procedure:
1. Iterate over every candidate value of k.
2. Keep the k with the highest score.

In [43]:
best_score = 0.0
best_k = -1
for k in range(1, 20):  # candidate k values: 1 through 19
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(X_train, y_train)
    score = knn_clf.score(X_test, y_test)
    # Keep the k with the highest score seen so far.
    if score > best_score:
        best_k = k
        best_score = score
print('best_score:', best_score)
print('best_k', best_k)

best_score: 0.986666666667
best_k 5


### Weight neighbors by distance, or not?

In [47]:
best_method = ''
best_score = 0.0
best_k = -1
# 'uniform': every neighbor gets one vote; 'distance': closer neighbors count more.
for method in ['uniform', 'distance']:
    for k in range(1, 20):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_method = method
print('best_score:', best_score)
print('best_k', best_k)
print('best_method', best_method)

best_score: 0.986666666667
best_k 5
best_method uniform
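
To see why the two weighting schemes can disagree, here is a small hand-worked sketch. The neighbor list and the `vote` helper are hypothetical; the sketch assumes nonzero distances and uses the inverse of the distance as the weight, which is what `weights='distance'` means in `KNeighborsClassifier`.

```python
from collections import defaultdict

# Hypothetical neighbors of one query point: (label, distance to the query).
neighbors = [("red", 1.0), ("blue", 3.0), ("blue", 4.0)]


def vote(neighbors, weights):
    """Return the winning label under 'uniform' or 'distance' weighting."""
    totals = defaultdict(float)
    for label, distance in neighbors:
        # uniform: every neighbor counts 1; distance: closer neighbors count more.
        totals[label] += 1.0 if weights == "uniform" else 1.0 / distance
    return max(totals, key=totals.get)


print(vote(neighbors, "uniform"))   # blue wins 2 votes to 1
print(vote(neighbors, "distance"))  # red wins: 1/1 = 1.0 > 1/3 + 1/4 ≈ 0.583
```

Distance weighting also breaks the ties that uniform voting can produce when k is even.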


### Searching the Minkowski distance parameter p

In [55]:
%%time

best_p = -1
best_score = 0.0
best_k = -1
best_method = 'distance'  # p is searched only with weights='distance'
for k in range(1, 20):
    for p in range(1, 6):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights='distance', p=p)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_p = p
print('best_score:', best_score)
print('best_k', best_k)
print('best_method', best_method)
# Report best_p; the loop variable p always ends at 5 once the loops finish.
print('best_p', best_p)

best_score: 0.986666666667
best_k 5
best_method distance
best_p 5
CPU times: user 46.1 s, sys: 505 ms, total: 46.6 s
Wall time: 47.6 s
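
For reference, the `p` being searched is the exponent of the Minkowski distance, (Σ|aᵢ − bᵢ|ᵖ)^(1/p). The helper below is an illustrative sketch (the function name `minkowski_distance` is introduced here and is not a scikit-learn call); it shows that p=1 gives the Manhattan distance and p=2 the Euclidean distance.

```python
import numpy as np


def minkowski_distance(a, b, p):
    """Minkowski distance: (sum_i |a_i - b_i| ** p) ** (1 / p)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


a, b = [0, 0], [3, 4]
d1 = minkowski_distance(a, b, 1)  # Manhattan: 3 + 4 = 7
d2 = minkowski_distance(a, b, 2)  # Euclidean: sqrt(9 + 16) = 5
print(d1, d2)
```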


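### Grid search

1. Prepare an estimator instance and the parameter grid to search.
2. Pass the estimator and the grid to `GridSearchCV`, then call `fit`.
3. Read the best parameters and the cross-validated accuracy from the fitted search.
4. Pass `n_jobs=-1` to use all CPU cores and `verbose=2` to print details of the search.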

In [23]:
param_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': list(range(1, 11))
    },
    {
        'weights': ['distance'],
        'n_neighbors': list(range(1, 11)),
        'p': list(range(1, 6))
    }
]
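
To see how many parameter combinations this grid expands to, one option is scikit-learn's `ParameterGrid`, which enumerates the candidates of a grid. The sketch below repeats the grid so that it runs on its own; the variable name `candidates` is introduced only for this example.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {"weights": ["uniform"], "n_neighbors": list(range(1, 11))},
    {
        "weights": ["distance"],
        "n_neighbors": list(range(1, 11)),
        "p": list(range(1, 6)),
    },
]

candidates = list(ParameterGrid(param_grid))
# 10 uniform candidates + 10 * 5 distance candidates
print(len(candidates))
print(candidates[0])
```

The total of 60 agrees with the "60 candidates" line in the verbose log near the end of these notes. Splitting the grid into two dictionaries avoids searching `p` for `weights='uniform'`.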

In [24]:
from sklearn.neighbors import KNeighborsClassifier

In [25]:
knn_clf  = KNeighborsClassifier()

In [26]:
# Import the grid search with cross-validation
# and build the search around the knn_clf estimator.
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(knn_clf, param_grid)

In [27]:
%%time
# Fit the grid search on the training data
grid_search.fit(X_train, y_train)

CPU times: user 2min 29s, sys: 1.59 s, total: 2min 31s
Wall time: 2min 35s


GridSearchCV(cv=None, error_score='raise',
       estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
       fit_params=None, iid=True, n_jobs=1,
       param_grid=[{'weights': ['uniform'], 'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}, {'weights': ['distance'], 'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'p': [1, 2, 3, 4, 5]}],
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

In [28]:
# Inspect the best estimator found by the search.
# It uses weights='distance' and p=3.
grid_search.best_estimator_

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=3,
           weights='distance')

In [29]:
# Best cross-validated accuracy
grid_search.best_score_

0.98663697104677062

In [30]:
# Best parameter combination found by the search
grid_search.best_params_

{'n_neighbors': 5, 'p': 3, 'weights': 'distance'}

The trailing underscore in names such as `best_params_` marks an attribute that is not passed in by the user but computed during fitting.
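
The same convention holds for ordinary estimators. A minimal sketch, assuming a toy one-feature dataset; the names `fitted_before` and `fitted_after` are introduced only for this example:

```python
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=1)

# n_neighbors is passed in by the user, so it exists right away.
print(clf.n_neighbors)
# classes_ is computed by fit(), so it does not exist yet.
fitted_before = hasattr(clf, "classes_")

clf.fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
fitted_after = hasattr(clf, "classes_")
print(fitted_before, fitted_after, clf.classes_)
```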

In [31]:
# grid_search.best_estimator_ is itself a KNeighborsClassifier instance.
knn_clf = grid_search.best_estimator_

In [33]:
# Score the best estimator on the test data.
knn_clf.score(X_test, y_test)

0.98222222222222222
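
The test score differs from `best_score_` because the two numbers are measured on different data: `best_score_` is the mean accuracy over the cross-validation folds of `X_train`, while the score above uses `X_test`. The sketch below illustrates this with a deliberately small grid (`small_grid` and `cv=3` are assumptions chosen to keep the run short, not the settings used above).

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=666
)

# A deliberately small grid so the sketch runs quickly.
small_grid = {"n_neighbors": [3, 5], "weights": ["uniform", "distance"]}
grid_search = GridSearchCV(KNeighborsClassifier(), small_grid, cv=3)
grid_search.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy of the winning candidate,
# computed only from X_train.
cv_mean = grid_search.cv_results_["mean_test_score"][grid_search.best_index_]
print(grid_search.best_score_, cv_mean)

# With refit=True (the default), score() delegates to best_estimator_,
# which was refitted on all of X_train.
test_score = grid_search.score(X_test, y_test)
best_estimator_score = grid_search.best_estimator_.score(X_test, y_test)
print(test_score, best_estimator_score)
```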

### GridSearchCV parameters

In [39]:
# n_jobs=-1 uses all CPU cores; verbose=2 prints details for each fit.
grid_search = GridSearchCV(knn_clf, param_grid, n_jobs=-1, verbose=2)

In [40]:
grid_search.fit(X_train,y_train)

Fitting 3 folds for each of 60 candidates, totalling 180 fits
[CV] n_neighbors=1, weights=uniform ..................................
[CV] n_neighbors=1, weights=uniform ..................................
[CV] n_neighbors=1, weights=uniform ..................................
[CV] n_neighbors=2, weights=uniform ..................................
[CV] ................... n_neighbors=1, weights=uniform, total=   0.8s
[CV] n_neighbors=2, weights=uniform ..................................
[CV] ................... n_neighbors=1, weights=uniform, total=   0.8s
[CV] n_neighbors=2, weights=uniform ..................................
[CV] ................... n_neighbors=1, weights=uniform, total=   0.9s
[CV] n_neighbors=3, weights=uniform ..................................
[CV] ................... n_neighbors=2, weights=uniform, total=   0.9s
[CV] n_neighbors=3, weights=uniform ..................................
[CV] ................... n_neighbors=2, weights=uniform, total=   1.0s
[CV] ..........

[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   18.3s


[CV] .................. n_neighbors=10, weights=uniform, total=   0.9s
[CV] n_neighbors=1, p=3, weights=distance ............................
[CV] ............. n_neighbors=1, p=3, weights=distance, total=   0.8s
[CV] n_neighbors=1, p=3, weights=distance ............................
[CV] .................. n_neighbors=10, weights=uniform, total=   1.1s
[CV] n_neighbors=1, p=4, weights=distance ............................
[CV] ............. n_neighbors=1, p=3, weights=distance, total=   0.8s
[CV] n_neighbors=1, p=4, weights=distance ............................
[CV] ............. n_neighbors=1, p=3, weights=distance, total=   0.8s
[CV] n_neighbors=1, p=4, weights=distance ............................
[CV] .................. n_neighbors=10, weights=uniform, total=   1.1s
[CV] n_neighbors=1, p=5, weights=distance ............................
[CV] ............. n_neighbors=1, p=4, weights=distance, total=   0.8s
[CV] n_neighbors=1, p=5, weights=distance ............................
[CV] .

[CV] ............. n_neighbors=5, p=2, weights=distance, total=   0.1s
[CV] n_neighbors=5, p=2, weights=distance ............................
[CV] ............. n_neighbors=5, p=2, weights=distance, total=   0.1s
[CV] n_neighbors=5, p=3, weights=distance ............................
[CV] ............. n_neighbors=5, p=2, weights=distance, total=   0.1s
[CV] n_neighbors=5, p=3, weights=distance ............................
[CV] ............. n_neighbors=4, p=5, weights=distance, total=   1.0s
[CV] n_neighbors=5, p=3, weights=distance ............................
[CV] ............. n_neighbors=4, p=5, weights=distance, total=   1.2s
[CV] n_neighbors=5, p=4, weights=distance ............................
[CV] ............. n_neighbors=5, p=3, weights=distance, total=   1.0s
[CV] n_neighbors=5, p=4, weights=distance ............................
[CV] ............. n_neighbors=5, p=3, weights=distance, total=   0.9s
[CV] n_neighbors=5, p=4, weights=distance ............................
[CV] .

[CV] ............. n_neighbors=9, p=1, weights=distance, total=   0.1s
[CV] n_neighbors=9, p=2, weights=distance ............................
[CV] ............. n_neighbors=9, p=2, weights=distance, total=   0.1s
[CV] n_neighbors=9, p=2, weights=distance ............................
[CV] ............. n_neighbors=8, p=5, weights=distance, total=   1.0s
[CV] n_neighbors=9, p=2, weights=distance ............................
[CV] ............. n_neighbors=9, p=2, weights=distance, total=   0.1s
[CV] n_neighbors=9, p=3, weights=distance ............................
[CV] ............. n_neighbors=9, p=2, weights=distance, total=   0.1s
[CV] n_neighbors=9, p=3, weights=distance ............................


[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:  1.1min


[CV] ............. n_neighbors=8, p=5, weights=distance, total=   0.9s
[CV] n_neighbors=9, p=3, weights=distance ............................
[CV] ............. n_neighbors=8, p=5, weights=distance, total=   1.0s
[CV] n_neighbors=9, p=4, weights=distance ............................
[CV] ............. n_neighbors=9, p=3, weights=distance, total=   0.9s
[CV] n_neighbors=9, p=4, weights=distance ............................
[CV] ............. n_neighbors=9, p=3, weights=distance, total=   1.0s
[CV] n_neighbors=9, p=4, weights=distance ............................
[CV] ............. n_neighbors=9, p=3, weights=distance, total=   0.9s
[CV] n_neighbors=9, p=5, weights=distance ............................
[CV] ............. n_neighbors=9, p=4, weights=distance, total=   1.1s
[CV] n_neighbors=9, p=5, weights=distance ............................
[CV] ............. n_neighbors=9, p=4, weights=distance, total=   1.1s
[CV] n_neighbors=9, p=5, weights=distance ............................
[CV] .

[Parallel(n_jobs=-1)]: Done 180 out of 180 | elapsed:  1.4min finished


GridSearchCV(cv=None, error_score='raise',
       estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=3,
           weights='distance'),
       fit_params=None, iid=True, n_jobs=-1,
       param_grid=[{'weights': ['uniform'], 'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}, {'weights': ['distance'], 'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'p': [1, 2, 3, 4, 5]}],
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=2)