# k-Nearest Neighbors (KNN)

**This script demonstrates a basic machine learning workflow using the Iris dataset and the k-Nearest Neighbors (k-NN) algorithm. It covers:**

- Exploratory Data Analysis: Inspecting the dataset's structure and contents.
- Data Splitting: Dividing data into training and testing sets.
- Visualization: Plotting scatter matrices to explore feature relationships.
- Model Training: Training a k-NN classifier.
- Prediction and Evaluation: Making predictions on new data and evaluating model performance on the test set.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

# Load Iris dataset
data = load_iris()

# Inspect dataset keys
print("Keys of iris_dataset:\n", data.keys())

Keys of iris_dataset:
 dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])


## Explore dataset details

In [2]:
# Classes in the dataset
print("Target names:", data['target_names']) 

Target names: ['setosa' 'versicolor' 'virginica']


In [3]:
# Feature column names
print("Feature names:\n", data['feature_names']) 

Feature names:
 ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [9]:
# Type of the object holding the features (a NumPy array)
print("Type of data:", type(data['data']))

# Dataset shape
print("Shape of data:", data['data'].shape) 

Type of data: <class 'numpy.ndarray'>
Shape of data: (150, 4)


In [None]:
# Preview first 5 rows
print("First five rows of data:\n", data['data'][:5])

First five rows of data:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
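
The raw array is easier to read once the columns are labeled. The following is a minimal sketch, assuming pandas is installed (it is imported in the first cell); the names `iris` and `df` and the `species` column are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Wrap the feature matrix in a DataFrame, using the dataset's feature names as column labels
df = pd.DataFrame(iris["data"], columns=iris["feature_names"])

# Add a readable species column by indexing target_names with the integer labels
df["species"] = iris["target_names"][iris["target"]]

# Show the first five rows with labeled columns
print(df.head())
```

The first four columns hold the same values as the array preview above; the extra column only makes each row's class visible.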


In [None]:
# Shape of labels
print("Shape of target:", data['target'].shape)

Shape of target: (150,)


In [13]:
# Label array (0 = setosa, 1 = versicolor, 2 = virginica)
print("Target:\n", data['target'])

Target:
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
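
The printed labels are sorted by class, so it can help to confirm how many samples each class has. A small sketch using `np.bincount`; the variable names `iris` and `counts` are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Count how many samples carry each integer label (index i holds the count for class i)
counts = np.bincount(iris["target"])

# Pair each count with its class name
for name, count in zip(iris["target_names"], counts):
    print(f"{name}: {count}")
```

Because the samples are ordered by class, taking the last rows as a test set without shuffling would yield a single class. `train_test_split` shuffles by default, which avoids this.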


## Data splitting

In [14]:

# Import train_test_split for data splitting
from sklearn.model_selection import train_test_split

# Split data into training and testing sets (test_size defaults to 0.25, i.e. a 75/25 split)
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], random_state=0)

# Inspect split data shapes
print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)

X_train shape: (112, 4)
y_train shape: (112,)
X_test shape: (38, 4)
y_test shape: (38,)
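
The split above relies on the defaults of `train_test_split`. As a sketch of one possible variation (not what the notebook itself does), the test fraction can be written out explicitly and the split can be stratified so each class keeps roughly the same proportion in both sets. The variable names below are chosen to avoid overwriting `X_train` and friends.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Same 75/25 proportions as the default, but stated explicitly and stratified by class
X_tr, X_te, y_tr, y_te = train_test_split(
    iris["data"],
    iris["target"],
    test_size=0.25,
    stratify=iris["target"],
    random_state=0,
)

print("Train/test sizes:", X_tr.shape[0], X_te.shape[0])
print("Test-set class counts:", np.bincount(y_te))
```

A stratified split selects different rows than the unstratified one, so any scores computed later would not necessarily match those from the original split.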
