### Grid Search on Breast Cancer Dataset and KNN Classifier

Previous example: [/examples/shallow/pca_and_svm.ipynb](https://github.com/serhatsoyer/py4ML/blob/main/examples/shallow/pca_and_svm.ipynb)  
Next example: [/examples/tensorboard.ipynb](https://github.com/serhatsoyer/py4ML/blob/main/examples/tensorboard.ipynb)

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
```

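```python
dataset = load_breast_cancer()
print(f'{type(dataset) = }')
# Without parentheses this prints the bound method, not the field names; dataset.keys() lists them
print(f'{dataset.keys = }')

# Print container type, shape and dtype of one field of the Bunch
def print_field(field): print(type(dataset[field]), dataset[field].shape, dataset[field].dtype)

print_field('data')    # feature matrix: one row per sample, one column per feature
print_field('target')  # class label of each sample
```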

```
type(dataset) = <class 'sklearn.utils._bunch.Bunch'>
dataset.keys = <built-in method keys of Bunch object at 0x1290157b0>
<class 'numpy.ndarray'> (569, 30) float64
<class 'numpy.ndarray'> (569,) int64
```
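A minimal sketch of listing the Bunch fields by calling `keys()` and of checking what the two class labels mean. The exact set of keys can differ between scikit-learn versions, so no expected output is given for that line.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()
# Calling keys() returns the field names instead of the bound method
print(list(dataset.keys()))
# Label 0 and label 1 map to these names, in this order
print(dataset.target_names)
# Number of samples per class (index 0 -> label 0, index 1 -> label 1)
print(np.bincount(dataset.target))
```

Label 0 is `'malignant'` and label 1 is `'benign'`, so the support columns in the reports below (159 + 53 and 267 + 90) add up to the 212 malignant and 357 benign samples of the dataset.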


```python
def print_data(msg, data): print(f'{msg:25}: {data.shape}')

# Random split; the default test_size=0.25 gives 426 training and 143 test samples
X_train, X_test, y_train, y_test = train_test_split(dataset['data'], dataset['target'])

# Fit the scaler on the training split only, then apply the same transform to both splits
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

print_data('X_train', X_train)
print_data('y_train', y_train)
print_data('X_test', X_test)
print_data('y_test', y_test)
```

```
X_train                  : (426, 30)
y_train                  : (426,)
X_test                   : (143, 30)
y_test                   : (143,)
```
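`StandardScaler` subtracts the per-feature mean and divides by the per-feature standard deviation, both estimated on the data passed to `fit`. The sketch below checks this by hand. It assumes `random_state=0` purely for reproducibility, which the cell above does not use, so the exact split differs from the notebook's.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# random_state=0 is an assumption added only to make this sketch reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)  # statistics come from the training split only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Same transform by hand: training mean and training (population, ddof=0) standard deviation
manual_test = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(manual_test, X_test_s))

# Training features end up with mean 0 and std 1; test features only approximately so
print(np.allclose(X_train_s.mean(axis=0), 0), np.allclose(X_train_s.std(axis=0), 1))
print(X_test_s.mean(axis=0).round(2)[:5])
```

Reusing the training statistics on the test split keeps any information about the test data out of the preprocessing step.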


```python
# Print func(gt, pred), e.g. a classification report or a confusion matrix, under a header
def disp_results(func, gt, pred, msg): print(f'{func.__name__} of {msg}:\n{func(gt, pred)}')

def fit_and_analyse(model):
    """Fit the model on the training split and report metrics on both splits."""
    model.fit(X_train, y_train)
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    disp_results(classification_report, y_train, y_train_pred, 'training data')
    disp_results(confusion_matrix, y_train, y_train_pred, 'training data')
    print()
    disp_results(classification_report, y_test, y_test_pred, 'test data')
    disp_results(confusion_matrix, y_test, y_test_pred, 'test data')
    return model
```

```python
model = fit_and_analyse(KNeighborsClassifier())
print(f'\n{model.classes_ = }')
print(f'{model.n_neighbors = }')
print(f'{model = }')
```

```
classification_report of training data:
              precision    recall  f1-score   support

           0       0.99      0.94      0.97       159
           1       0.97      1.00      0.98       267

    accuracy                           0.98       426
   macro avg       0.98      0.97      0.97       426
weighted avg       0.98      0.98      0.98       426

confusion_matrix of training data:
[[150   9]
 [  1 266]]

classification_report of test data:
              precision    recall  f1-score   support

           0       1.00      0.91      0.95        53
           1       0.95      1.00      0.97        90

    accuracy                           0.97       143
   macro avg       0.97      0.95      0.96       143
weighted avg       0.97      0.97      0.96       143

confusion_matrix of test data:
[[48  5]
 [ 0 90]]

model.classes_ = array([0, 1])
model.n_neighbors = 5
model = KNeighborsClassifier()
```
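The numbers in the classification report follow directly from the confusion matrix. A short NumPy sketch using the test-set matrix of the default model, `[[48, 5], [0, 90]]`, with scikit-learn's convention that rows are true classes and columns are predicted classes:

```python
import numpy as np

# Test-set confusion matrix of the default KNeighborsClassifier (rows: true, columns: predicted)
cm = np.array([[48, 5],
               [0, 90]])

tp = np.diag(cm)                    # correctly classified samples per class
support = cm.sum(axis=1)            # row sums: number of true samples per class
precision = tp / cm.sum(axis=0)     # column sums: everything predicted as that class
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

print('precision   :', precision.round(2))
print('recall      :', recall.round(2))
print('f1-score    :', f1.round(2))
print('accuracy    :', round(accuracy, 2))
print('macro f1    :', round(f1.mean(), 2))
print('weighted f1 :', round(np.average(f1, weights=support), 2))
```

Rounded to two decimals these give precision 1.00 / 0.95, recall 0.91 / 1.00, f1-score 0.95 / 0.97, accuracy 0.97, macro f1 0.96 and weighted f1 0.96, matching the test report. The five malignant samples (label 0) predicted as benign are what lower the recall of class 0 and the precision of class 1.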


```python
new_model = GridSearchCV(
    KNeighborsClassifier(),
    {'n_neighbors': [1, 2, 5, 10],
     'weights': ['uniform', 'distance'],
     'algorithm': ['ball_tree', 'kd_tree', 'brute']},
    refit=True)

new_model = fit_and_analyse(new_model)
print(f'\n{new_model.best_params_ = }')
print(f'{new_model.best_estimator_ = }')
print(f'{new_model.classes_ = }')
print(f'{new_model = }')
```

```
classification_report of training data:
              precision    recall  f1-score   support

           0       0.99      0.93      0.96       159
           1       0.96      0.99      0.98       267

    accuracy                           0.97       426
   macro avg       0.97      0.96      0.97       426
weighted avg       0.97      0.97      0.97       426

confusion_matrix of training data:
[[148  11]
 [  2 265]]

classification_report of test data:
              precision    recall  f1-score   support

           0       1.00      0.94      0.97        53
           1       0.97      1.00      0.98        90

    accuracy                           0.98       143
   macro avg       0.98      0.97      0.98       143
weighted avg       0.98      0.98      0.98       143

confusion_matrix of test data:
[[50  3]
 [ 0 90]]

new_model.best_params_ = {'algorithm': 'ball_tree', 'n_neighbors': 10, 'weights': 'uniform'}
new_model.best_estimator_ = KNeighborsClassifier(algorithm='ball_tr
```
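With no `cv` argument, `GridSearchCV` scores each of the 4 × 2 × 3 = 24 parameter combinations with 5-fold (stratified, for classifiers) cross-validation on the training set. It keeps the combination with the best mean validation score and, with `refit=True`, retrains it on the whole training set. The sketch below reproduces that loop by hand with `ParameterGrid` and `cross_val_score`. It assumes `random_state=0` for the split, so its scores will not match the notebook's exactly.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# random_state=0 is an assumption for reproducibility; the notebook's split is random
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

param_grid = {'n_neighbors': [1, 2, 5, 10],
              'weights': ['uniform', 'distance'],
              'algorithm': ['ball_tree', 'kd_tree', 'brute']}

# Manual version: mean 5-fold CV accuracy of every combination on the training set
best_score, best_params = -np.inf, None
for params in ParameterGrid(param_grid):
    score = cross_val_score(KNeighborsClassifier(**params), X_train, y_train, cv=5).mean()
    if score > best_score:  # strict '>' keeps the first of several tied candidates
        best_score, best_params = score, params

# Library version with the same grid and the same folds
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5).fit(X_train, y_train)

print(len(ParameterGrid(param_grid)), 'candidates')
print('manual :', best_params, round(best_score, 4))
print('library:', grid.best_params_, round(grid.best_score_, 4))
# refit=True (the default) has already retrained the best candidate on all of X_train
print('test accuracy:', grid.best_estimator_.score(X_test, y_test))
```

The three `algorithm` values are all exact neighbour searches, so they typically find the same neighbours and tie on score. When candidates tie, `GridSearchCV` reports the first one in grid order, which likely explains why `'ball_tree'` appears in `best_params_` above.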
