**k-Nearest Neighbour (kNN) on Breast Cancer Dataset**

1: Import Libraries

In [5]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix
import pandas as pd


2: Load Dataset

In [6]:
cancer = load_breast_cancer()

X = cancer.data
y = cancer.target

print("Feature shape:", X.shape)
print("Target shape:", y.shape)


Feature shape: (569, 30)
Target shape: (569,)
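Before interpreting predictions, it helps to confirm what the labels 0 and 1 mean. In scikit-learn's copy of this dataset, `target_names` gives the class name for each integer label. The short check below prints that mapping and the size of each class, using NumPy's `bincount` to count labels.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
y = cancer.target

# Show which integer label corresponds to which diagnosis
for label, name in enumerate(cancer.target_names):
    print(label, name)

# Count samples per class; np.bincount's index i holds the count for label i
counts = np.bincount(y)
for name, count in zip(cancer.target_names, counts):
    print(f"{name}: {count}")
```

Label 0 is malignant and label 1 is benign, with 212 malignant and 357 benign samples. One consequence is that scikit-learn metrics defaulting to `pos_label=1` treat benign as the positive class.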


3: Convert to DataFrame

In [7]:
df = pd.DataFrame(X, columns=cancer.feature_names)
df['target'] = y

print(df.head())


   mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
0        17.99         10.38          122.80     1001.0          0.11840   
1        20.57         17.77          132.90     1326.0          0.08474   
2        19.69         21.25          130.00     1203.0          0.10960   
3        11.42         20.38           77.58      386.1          0.14250   
4        20.29         14.34          135.10     1297.0          0.10030   

   mean compactness  mean concavity  mean concave points  mean symmetry  \
0           0.27760          0.3001              0.14710         0.2419   
1           0.07864          0.0869              0.07017         0.1812   
2           0.15990          0.1974              0.12790         0.2069   
3           0.28390          0.2414              0.10520         0.2597   
4           0.13280          0.1980              0.10430         0.1809   

   mean fractal dimension  ...  worst texture  worst perimeter  worst area  \
...
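The `head()` output already hints at why scaling is needed later: `mean area` runs into the hundreds or thousands, while `mean smoothness` sits around 0.1. Summarising just those two columns (picked only for illustration) makes the gap explicit. With unscaled Euclidean distance, differences in `mean area` would swamp differences in `mean smoothness`.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)

# Compare the spread of a large-valued and a small-valued feature
cols = ["mean area", "mean smoothness"]
summary = df[cols].describe().loc[["mean", "std", "min", "max"]]
print(summary)

# Ratio of standard deviations: a rough indication of how unequally the
# two features would contribute to raw (unscaled) distances
ratio = df["mean area"].std() / df["mean smoothness"].std()
print(f"std(mean area) / std(mean smoothness) = {ratio:.0f}")
```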

4: Train–Test Split

In [8]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training samples:", X_train.shape)
print("Testing samples:", X_test.shape)


Training samples: (455, 30)
Testing samples: (114, 30)
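The split above is random but not stratified, so the malignant/benign ratio in the test set can drift from that of the full dataset. Passing `stratify=y` to `train_test_split` keeps class proportions roughly equal in both parts. This is an optional variant, not what the notebook ran. Using it would change the test set, and therefore the accuracy and confusion matrix reported below.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Stratified variant: class proportions are preserved in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fraction of benign samples (label 1) in each subset
for name, labels in [("full", y), ("train", y_tr), ("test", y_te)]:
    print(f"{name:>5}: {labels.mean():.3f} benign")
```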


5: Feature Scaling

In [9]:
scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print("Scaling completed")


Scaling completed
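`StandardScaler` subtracts each feature's training-set mean and divides by its training-set standard deviation (the population standard deviation, `ddof=0`). The NumPy sketch below reproduces this on a small made-up matrix. Note that the test rows are transformed with statistics from the training rows only, which is why the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.

```python
import numpy as np

# Toy data (made up for illustration): 4 training rows, 2 test rows, 2 features
X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]])
X_test = np.array([[2.5, 500.0], [5.0, 1000.0]])

# "fit": statistics come from the training data only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # ddof=0, matching StandardScaler

# "transform": apply the same mean and std to both sets
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled)
print(X_test_scaled)
```

The second feature is 200 times larger than the first in raw units, yet after scaling both columns are identical. This is the equal footing a distance-based method like kNN needs.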


6: Apply kNN

In [10]:
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print("Model trained")


Model trained
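Calling `fit` on a `KNeighborsClassifier` mainly stores the training data (and, depending on the `algorithm` parameter, may build a search structure). The real work happens at prediction time. With the defaults used here (`weights='uniform'`, Minkowski metric with `p=2`, i.e. Euclidean distance), each test point receives the majority label among its `n_neighbors` nearest training points. The following minimal NumPy sketch shows that idea. It is not scikit-learn's implementation, and its tie handling is only this sketch's own choice.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Predict labels for X_query by majority vote among the k nearest
    training points (Euclidean distance, uniform weights)."""
    preds = []
    for x in X_query:
        # Euclidean distance from x to every training point
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k smallest distances
        nearest = np.argsort(dists)[:k]
        # Majority vote; argmax returns the smaller label on a tie
        votes = np.bincount(y_train[nearest])
        preds.append(votes.argmax())
    return np.array(preds)

# Tiny illustrative dataset: two well-separated clusters labelled 0 and 1
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

queries = np.array([[0.3, 0.3], [4.5, 4.5]])
print(knn_predict(X_toy, y_toy, queries, k=3))  # → [0 1]
```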


7: Prediction

In [11]:
y_pred = knn.predict(X_test)

print("Predicted values:", y_pred[:10])
print("Actual values:   ", y_test[:10])


Predicted values: [1 0 0 1 1 0 0 0 0 1]
Actual values:    [1 0 0 1 1 0 0 0 1 1]
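Comparing the two printed arrays element by element shows that only one of the first ten predictions is wrong. A vectorised comparison finds it directly. The arrays below are copied from the output above.

```python
import numpy as np

# First ten predictions and true labels, copied from the printed output
y_pred_10 = np.array([1, 0, 0, 1, 1, 0, 0, 0, 0, 1])
y_true_10 = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 1])

# Positions where the prediction differs from the true label
wrong = np.flatnonzero(y_pred_10 != y_true_10)
print("Misclassified positions:", wrong)  # → [8]
```

Position 8 (the ninth test sample) is benign (label 1) but was predicted malignant (label 0). On the full test set, the same check is `np.flatnonzero(y_pred != y_test)`.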


8: Accuracy & Confusion Matrix

In [12]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))


Accuracy: 0.9473684210526315
Confusion Matrix:
 [[40  3]
 [ 3 68]]
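In scikit-learn's `confusion_matrix`, rows are true labels and columns are predicted labels, both ordered by label value (0 = malignant, 1 = benign). The reported accuracy can be recovered from the matrix, and per-class recall and precision follow from the same four numbers.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class
cm = [[40, 3],   # true malignant (0): 40 predicted malignant, 3 predicted benign
      [3, 68]]   # true benign (1): 3 predicted malignant, 68 predicted benign

total = sum(sum(row) for row in cm)
correct = cm[0][0] + cm[1][1]
print(f"Accuracy: {correct}/{total} = {correct / total:.4f}")  # 108/114 = 0.9474

names = ["malignant", "benign"]
for i, name in enumerate(names):
    recall = cm[i][i] / sum(cm[i])                # correct / all true members of class i
    precision = cm[i][i] / (cm[0][i] + cm[1][i])  # correct / all predicted as class i
    print(f"{name:>9}: recall={recall:.3f} precision={precision:.3f}")
```

The 3 in the top-right cell are malignant tumours that the model labelled benign.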


**Conclusion**

In this assignment, scikit-learn's k-Nearest Neighbour (kNN) classifier was applied in Python to the Breast Cancer Wisconsin (Diagnostic) dataset, a UCI classification dataset. The data was split 80/20 into training and test sets, and the features were standardised using statistics from the training set only. Scaling is essential for distance-based algorithms like kNN, because features with large numeric ranges would otherwise dominate the distance. With k = 5, the model correctly classified 108 of the 114 unseen test samples (accuracy ≈ 94.7%), making 3 errors in each class. This experiment illustrated the working principle of kNN and the role of preprocessing in its performance. It also showed why the value of k must be chosen with care: only k = 5 was evaluated here, so selecting k systematically (for example, by cross-validation) is a natural next step.
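One common way to choose k is 5-fold cross-validation on the training split only. This is sketched here as an extension, not something the notebook ran. Scaling is placed inside a `Pipeline`, so each fold is scaled using only its own training portion. The candidate range (odd k from 1 to 21) is an arbitrary choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler inside the pipeline is refit on each CV training fold (no leakage)
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Odd k avoids voting ties between two classes with uniform weights
param_grid = {"kneighborsclassifier__n_neighbors": list(range(1, 22, 2))}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
print("Best k:", best_k)
print(f"Mean CV accuracy: {search.best_score_:.3f}")
# Evaluate the refit best model once on the held-out test set
print(f"Test accuracy: {search.score(X_test, y_test):.3f}")
```

The test set is touched only once, after k has been chosen. Otherwise the reported test accuracy would be an optimistic estimate.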