# k-Nearest Neighbors (Classification)

Use k-nearest neighbors (KNN) to classify an observation by finding the k training points closest to it and taking a majority vote of their labels.
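To make the idea concrete, here is a minimal from-scratch sketch. It assumes Euclidean distance and an unweighted majority vote. The function name `knn_predict` and the toy data are illustrative only. This is not how `cuanalytics` implements KNN internally.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Predict a label for `query` by majority vote among its k nearest training points.

    Assumes Euclidean distance and equal (uniform) votes. A tied vote goes to the
    label encountered first in distance order, because Counter.most_common keeps
    insertion order for equal counts.
    """
    # Distance from the query to every training point, paired with that point's label
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data with two features (not taken from the breast cancer dataset)
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["B", "B", "B", "M", "M", "M"]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))  # B
print(knn_predict(train_X, train_y, (4.9, 5.0), k=3))  # M
```

Nothing is "learned" up front. All the work happens at prediction time, when distances to every training point are computed.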

## Setup: Load the Breast Cancer Dataset

In [2]:
import cuanalytics as ca

In [3]:
# Load data and create train/test split
df = ca.load_breast_cancer_data()
train_df, test_df = ca.split_data(df, test_size=0.2, random_state=42)
train_df.head()


| Unnamed: 0 | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | diagnosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 68 | 9.029 | 17.33 | 58.79 | 250.5 | 0.1066 | 0.1413 | 0.313 | 0.04375 | 0.2111 | 0.08046 | ... | 22.65 | 65.5 | 324.7 | 0.1482 | 0.4365 | 1.252 | 0.175 | 0.4228 | 0.1175 | B |
| 181 | 21.09 | 26.57 | 142.7 | 1311.0 | 0.1141 | 0.2832 | 0.2487 | 0.1496 | 0.2395 | 0.07398 | ... | 33.48 | 176.5 | 2089.0 | 0.1491 | 0.7584 | 0.678 | 0.2903 | 0.4098 | 0.1284 | M |
| 63 | 9.173 | 13.86 | 59.2 | 260.9 | 0.07721 | 0.08751 | 0.05988 | 0.0218 | 0.2341 | 0.06963 | ... | 19.23 | 65.59 | 310.1 | 0.09836 | 0.1678 | 0.1397 | 0.05087 | 0.3282 | 0.0849 | B |
| 248 | 10.65 | 25.22 | 68.01 | 347.0 | 0.09657 | 0.07234 | 0.02379 | 0.01615 | 0.1897 | 0.06329 | ... | 35.19 | 77.98 | 455.7 | 0.1499 | 0.1398 | 0.1125 | 0.06136 | 0.3409 | 0.08147 | B |
| 60 | 10.17 | 14.88 | 64.55 | 311.9 | 0.1134 | 0.08061 | 0.01084 | 0.0129 | 0.2743 | 0.0696 | ... | 17.45 | 69.86 | 368.6 | 0.1275 | 0.09866 | 0.02168 | 0.02579 | 0.3557 | 0.0802 | B |
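The preview shows features on very different scales. `mean area` runs into the hundreds or thousands, while `mean smoothness` sits near 0.1. For a distance-based method like KNN, large-scale features can dominate the distance. This notebook doesn't show whether `ca.fit_knn_classifier` rescales features internally. Treat the following as an illustration only: a minimal z-score standardization sketch that learns each column's mean and standard deviation on training rows and reuses them on test rows. The helper names `fit_standardizer` and `transform` are hypothetical.

```python
import statistics


def fit_standardizer(rows):
    """Learn per-column mean and (population) standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for a constant column so we never divide by zero
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply (x - mean) / std column by column, using statistics learned on training data."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Two features on very different scales: mean area and mean smoothness from the preview rows
train_rows = [[250.5, 0.1066], [1311.0, 0.1141], [260.9, 0.07721]]
test_rows = [[347.0, 0.09657]]

means, stds = fit_standardizer(train_rows)
train_scaled = transform(train_rows, means, stds)
# Reuse the training statistics so no information leaks from the test rows
test_scaled = transform(test_rows, means, stds)
```

After scaling, each training column has mean 0 and standard deviation 1. A one-unit difference in area then carries about as much weight as a one-unit difference in smoothness.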


## Step 1: Fit a KNN Classifier

In [4]:
knn = ca.fit_knn_classifier(train_df, formula='diagnosis ~ .', k=5)



KNN Classifier fitted successfully!
  Classes: ['B', 'M']
  Features: 30
  Training samples: 455
  k: 5
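The formula `'diagnosis ~ .'` puts the target on the left of `~` and the predictors on the right. By the usual R-style convention, `.` means "every other column", which is consistent with the 30 features reported above. This notebook doesn't show how `cuanalytics` parses formulas. The helper below is a hypothetical illustration of that convention, not the library's code.

```python
def expand_dot_formula(formula, columns):
    """Split 'target ~ .' into (target, predictors), where '.' means every other column.

    Only the 'target ~ .' and 'target ~ a + b' shapes are handled in this sketch.
    """
    lhs, rhs = (side.strip() for side in formula.split("~", 1))
    if rhs == ".":
        # '.' expands to all columns except the target
        predictors = [c for c in columns if c != lhs]
    else:
        # Otherwise take the explicitly listed terms
        predictors = [term.strip() for term in rhs.split("+")]
    return lhs, predictors


target, predictors = expand_dot_formula("diagnosis ~ .", ["mean radius", "mean texture", "diagnosis"])
print(target, predictors)  # diagnosis ['mean radius', 'mean texture']
```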


## Step 2: Evaluate Performance

In [5]:
train_report = knn.score(train_df)
test_report = knn.score(test_df)
print(f"Train accuracy: {train_report['accuracy']:.2%}")
print(f"Test accuracy: {test_report['accuracy']:.2%}")



SCORE REPORT
Accuracy: 94.07%
Kappa: 0.8702

Confusion Matrix:
          Pred B  Pred M
Actual B     281       5
Actual M      22     147

Per-Class Metrics:
   precision  recall  sensitivity  specificity     f1
B     0.9274  0.9825       0.9825       0.8698 0.9542
M     0.9671  0.8698       0.8698       0.9825 0.9159

SCORE REPORT
Accuracy: 95.61%
Kappa: 0.9045

Confusion Matrix:
          Pred B  Pred M
Actual B      71       0
Actual M       5      38

Per-Class Metrics:
   precision  recall  sensitivity  specificity     f1
B     0.9342  1.0000       1.0000       0.8837 0.9660
M     1.0000  0.8837       0.8837       1.0000 0.9383
Train accuracy: 94.07%
Test accuracy: 95.61%
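Every figure in these reports follows from the confusion matrix, where rows are actual classes and columns are predictions. The sketch below recomputes accuracy, Cohen's kappa, and the per-class precision, recall (sensitivity), specificity, and F1 from the training matrix as a check. The function name is ours, not part of `cuanalytics`. We are also assuming the reported Kappa is Cohen's kappa, although the printed 0.8702 and 0.9045 both match that formula.

```python
def report_from_confusion(matrix, classes):
    """Compute accuracy, Cohen's kappa, and per-class metrics from a square
    confusion matrix whose rows are actual classes and columns are predictions."""
    n = sum(sum(row) for row in matrix)
    k = len(classes)
    accuracy = sum(matrix[i][i] for i in range(k)) / n

    row_totals = [sum(matrix[i]) for i in range(k)]                       # actual counts
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted counts
    # Agreement expected by chance, given the row and column totals
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    kappa = (accuracy - expected) / (1 - expected)

    per_class = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fp = col_totals[i] - tp   # predicted as this class but actually another
        fn = row_totals[i] - tp   # actually this class but predicted as another
        tn = n - tp - fp - fn
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)   # identical to sensitivity
        specificity = tn / (tn + fp)
        f1 = 2 * precision * recall / (precision + recall)
        per_class[name] = {"precision": precision, "recall": recall,
                           "specificity": specificity, "f1": f1}
    return accuracy, kappa, per_class


# Training confusion matrix from the first report above
acc, kappa, metrics = report_from_confusion([[281, 5], [22, 147]], ["B", "M"])
print(f"Accuracy: {acc:.2%}")  # Accuracy: 94.07%
print(f"Kappa: {kappa:.4f}")   # Kappa: 0.8702
print({c: round(m["f1"], 4) for c, m in metrics.items()})
```

With two classes, the specificity of one class is the recall of the other. That is why the B and M rows of each report mirror each other.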


## Step 3: Summary

In [6]:
knn.summary()



KNN CLASSIFIER SUMMARY
Target: diagnosis
Classes: ['B', 'M']
Features: 30
k: 5
Weights: uniform
Metric: minkowski
Training samples: 455

TRAINING FIT:
----------------------------------------------------------------------
Training Accuracy: 94.07%

Training Confusion Matrix:
          Pred B  Pred M
Actual B     281       5
Actual M      22     147

Kappa: 0.8702
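The summary reports `Metric: minkowski` and `Weights: uniform`. The Minkowski distance of order p is the p-th root of the sum of |x_i - y_i|^p. Setting p = 2 gives Euclidean distance, and p = 1 gives Manhattan distance. The summary does not print p, so the order used here is an assumption. scikit-learn's `KNeighborsClassifier` defaults to p = 2, but we have not confirmed that `cuanalytics` wraps it. "Uniform" weights give each of the k neighbors one equal vote. The sketch below shows the distance and contrasts uniform voting with distance-weighted voting. The helper names are hypothetical.

```python
from collections import defaultdict


def minkowski(x, y, p=2):
    """Minkowski distance of order p: (sum of |x_i - y_i| ** p) ** (1 / p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def vote(neighbors, weights="uniform"):
    """Pick a label from (distance, label) pairs.

    'uniform' gives every neighbor one vote. 'distance' (shown only for contrast)
    weights each vote by 1 / distance, so closer neighbors count more.
    """
    tally = defaultdict(float)
    for dist, label in neighbors:
        if weights == "uniform":
            tally[label] += 1.0
        else:
            # Guard against division by zero when a neighbor coincides with the query
            tally[label] += 1.0 / max(dist, 1e-12)
    return max(tally, key=tally.get)


print(minkowski([0, 0], [3, 4], p=2))  # 5.0 (Euclidean)
print(minkowski([0, 0], [3, 4], p=1))  # 7.0 (Manhattan)

neighbors = [(0.5, "M"), (2.0, "B"), (2.5, "B")]
print(vote(neighbors, "uniform"))   # B: two votes against one
print(vote(neighbors, "distance"))  # M: 1/0.5 = 2.0 outweighs 1/2.0 + 1/2.5 = 0.9
```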



## 🎓 Your Turn

- Try different values of k (e.g., 1, 3, 7, 11).
- Compare KNN to logistic regression on the same dataset.
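For the first exercise, one way to compare k values without touching the test set is leave-one-out cross-validation on the training data. Each training point is predicted from all the others, and the fraction predicted correctly is recorded. The sketch below does this in plain Python on made-up toy data so it runs on its own. The helper names are hypothetical. With this notebook's data, you could instead loop over k with `ca.fit_knn_classifier(train_df, formula='diagnosis ~ .', k=k)`. Keep in mind that each `knn.score(...)` call prints a full report. In a two-class problem, odd k values also avoid tied votes.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda xy: math.dist(xy[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


# Hypothetical toy data: two loosely separated clusters
X = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.6), (2.6, 2.4), (3.0, 3.1), (3.4, 2.9), (2.2, 2.0), (3.8, 3.5)]
y = ["B", "B", "B", "B", "M", "M", "M", "M"]

for k in (1, 3, 5, 7):
    print(f"k={k}: leave-one-out accuracy = {loo_accuracy(X, y, k):.2f}")
```

If k becomes large relative to a class's size, the vote is pulled toward whichever class dominates the neighborhood. That is one reason accuracy can fall as k grows.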