# Case Study: Classifying Fruits Using K-NN (K-Nearest Neighbors)


In this case study, we walk through the steps to classify different types of fruit from a few measured features using the K-Nearest Neighbors (K-NN) algorithm. K-NN is a supervised learning algorithm that assigns a class to a data point based on the majority class among its K nearest neighbors in the feature space.
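The majority-vote rule can be sketched directly in NumPy. This is a minimal illustration under simple assumptions (Euclidean distance, unweighted votes, one query point at a time), not scikit-learn's implementation; the helper name `knn_predict` and the toy points are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Hypothetical helper: label one query point by majority vote of its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Positions of the k smallest distances, closest first
    nearest = np.argsort(distances)[:k]
    # Count the neighbors' labels; on a tie, Counter keeps the label seen first (the closest one)
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array(['A', 'A', 'A', 'B', 'B', 'B'])
print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3))  # A
```

Note that there is no training phase beyond storing the data: all the work happens at prediction time, when distances to every stored point are computed.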







# Step 1: Problem Understanding
The task is to classify fruits by their physical characteristics: weight, size, and color. We will use K-NN to predict the fruit type from these features. The dataset consists of the following features:

1. Weight (grams)
2. Diameter (cm)
3. Color (encoded numerically: 0 for red, 1 for yellow, 2 for green)
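A numeric color code like this can be produced with a simple lookup table. The sketch below is illustrative: the `color_codes` mapping and the sample values are assumptions chosen to match the codes used in this case study. It also shows a side effect that matters for a distance-based model: integer codes make red look closer to yellow than to green, whereas one-hot encoding (a different technique, not the one used in this case study) puts every pair of distinct colors at the same distance.

```python
import numpy as np
import pandas as pd

colors = pd.Series(['red', 'yellow', 'green', 'red'])

# Integer codes matching this case study: 0 = red, 1 = yellow, 2 = green
color_codes = {'red': 0, 'yellow': 1, 'green': 2}
encoded = colors.map(color_codes)
print(encoded.tolist())  # [0, 1, 2, 0]

# With integer codes, red-green (|0 - 2| = 2) is twice as far apart as red-yellow (|0 - 1| = 1)
print(abs(color_codes['red'] - color_codes['green']), abs(color_codes['red'] - color_codes['yellow']))

# One-hot alternative: one 0/1 column per color (columns sorted: green, red, yellow),
# so every pair of distinct colors is sqrt(2) apart
one_hot = pd.get_dummies(colors, dtype=int)
red, yellow, green = one_hot.iloc[0], one_hot.iloc[1], one_hot.iloc[2]
print(np.linalg.norm(red - green), np.linalg.norm(red - yellow))
```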

Our goal is to:

1. Load and preprocess the dataset.
2. Train a K-NN model.
3. Evaluate its performance.
4. Make predictions for new fruit instances.

# Step 2: Dataset Preparation
Let's create a synthetic dataset for this case study. We'll generate three types of fruit: Apple, Banana, and Grapes.

**1. Apple: Weight ~150g, Diameter ~8 cm, Color = Red (0)**

**2. Banana: Weight ~120g, Diameter ~10 cm, Color = Yellow (1)**

**3. Grapes: Weight ~50g, Diameter ~3 cm, Color = Green (2)**

Each fruit type is represented by a few hand-written samples close to these typical values.
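Instead of typing samples by hand, you could draw them at random around the typical values listed above. This is a sketch under stated assumptions: the function name `make_fruit_data` is hypothetical, and the standard deviations (10 g for weight, 0.5 cm for diameter) are illustrative choices, not values from this case study.

```python
import numpy as np
import pandas as pd

# Typical values from the descriptions above: (weight in g, diameter in cm, color code)
CENTERS = {
    'Apple': (150, 8.0, 0),
    'Banana': (120, 10.0, 1),
    'Grapes': (50, 3.0, 2),
}
# Assumed spreads (standard deviations); illustrative only
WEIGHT_SD, DIAMETER_SD = 10.0, 0.5


def make_fruit_data(n_per_class=3, seed=42):
    """Hypothetical helper: sample n_per_class noisy fruits around each class center."""
    rng = np.random.default_rng(seed)
    rows = []
    for fruit, (weight, diameter, color) in CENTERS.items():
        weights = rng.normal(weight, WEIGHT_SD, n_per_class)
        diameters = rng.normal(diameter, DIAMETER_SD, n_per_class)
        for w, d in zip(weights, diameters):
            # Color stays fixed per class; weight and diameter get Gaussian noise
            rows.append({'Weight': int(round(w)), 'Diameter': float(round(d, 1)),
                         'Color': color, 'Fruit': fruit})
    return pd.DataFrame(rows)


df_generated = make_fruit_data(n_per_class=5)
print(df_generated['Fruit'].value_counts())
```

Fixing the `seed` keeps the generated dataset reproducible between runs.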

```python
import numpy as np
import pandas as pd

# Create a synthetic dataset of fruits
data = {
    'Weight': [150, 120, 50, 160, 130, 55, 140, 110, 60],  # Weight in grams
    'Diameter': [8, 10, 3, 8.5, 10.5, 3.5, 8.2, 9.8, 3.2],  # Diameter in cm
    'Color': [0, 1, 2, 0, 1, 2, 0, 1, 2],  # 0: Red (Apple), 1: Yellow (Banana), 2: Green (Grapes)
    'Fruit': ['Apple', 'Banana', 'Grapes', 'Apple', 'Banana', 'Grapes', 'Apple', 'Banana', 'Grapes']  # Labels
}

# Convert to DataFrame
df = pd.DataFrame(data)
print(df)
```

```
   Weight  Diameter  Color   Fruit
0     150       8.0      0   Apple
1     120      10.0      1  Banana
2      50       3.0      2  Grapes
3     160       8.5      0   Apple
4     130      10.5      1  Banana
5      55       3.5      2  Grapes
6     140       8.2      0   Apple
7     110       9.8      1  Banana
8      60       3.2      2  Grapes
```


# Step 3: Data Preprocessing
In this step, we will:

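1. Split the data into features (X) and labels (y).

2. Split the dataset into training and testing sets for model evaluation.

3. Standardize the features (zero mean, unit variance) so that each contributes on a comparable scale to the distance calculation in K-NN. The scaler is fitted on the training set only and then applied to the test set, so no information from the test data leaks into preprocessing.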
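Standardization computes a z-score per feature, `z = (x - mean) / std`. The sketch below checks that a manual computation matches `StandardScaler` (which uses the population standard deviation) and shows why scaling matters for distances: the raw 30 g weight gap between the first Apple and the first Banana dwarfs their 2 cm diameter gap, while after scaling the two gaps are of similar size. For brevity it uses only the Weight and Diameter columns and, unlike the pipeline above, fits the scaler on all nine rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Weight (g) and Diameter (cm) for the nine fruits in the dataset; Color is left out for brevity
X = np.array([[150, 8.0], [120, 10.0], [50, 3.0], [160, 8.5], [130, 10.5],
              [55, 3.5], [140, 8.2], [110, 9.8], [60, 3.2]])

# Manual z-score per column: subtract the mean, divide by the population std (ddof=0, as StandardScaler does)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
X_sklearn = StandardScaler().fit_transform(X)
print(np.allclose(X_manual, X_sklearn))  # True

# Per-feature gap between the first Apple (row 0) and the first Banana (row 1)
raw_gap = np.abs(X[0] - X[1])
scaled_gap = np.abs(X_sklearn[0] - X_sklearn[1])
print('raw gap (g, cm):', raw_gap)
print('scaled gap:', scaled_gap.round(2))
```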

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Features and labels
X = df[['Weight', 'Diameter', 'Color']]  # Features
y = df['Fruit']  # Labels

# Split data into training and test sets (90% train, 10% test).
# With only 9 rows, test_size=0.1 leaves a single test sample.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# Standardize the features: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
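Because `test_size=0.1` holds out only one of the nine rows, the evaluation cannot cover all three classes. One option, sketched here under the assumption that holding out one sample per class is acceptable, is a stratified split: `stratify=y` preserves the class proportions in both sets, and an integer `test_size=3` requests exactly three test rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same nine rows as the case-study dataset
df = pd.DataFrame({
    'Weight': [150, 120, 50, 160, 130, 55, 140, 110, 60],
    'Diameter': [8, 10, 3, 8.5, 10.5, 3.5, 8.2, 9.8, 3.2],
    'Color': [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'Fruit': ['Apple', 'Banana', 'Grapes'] * 3,
})
X = df[['Weight', 'Diameter', 'Color']]
y = df['Fruit']

# stratify=y keeps 1/3 of each split per class; with 3 test rows that means one fruit of each type
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, stratify=y, random_state=42
)
print(y_test.value_counts())
```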


# Step 4: Train the K-NN Model
Next, we train a K-NN model on the scaled training data using K = 3 (you can experiment with other values of K).

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Initialize the K-NN classifier with K=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model on the scaled training data
knn.fit(X_train_scaled, y_train)

# Predict on the scaled test data
y_pred = knn.predict(X_test_scaled)

# Evaluate accuracy (with a single test sample, this is either 0% or 100%)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of K-NN model: {accuracy * 100:.2f}%')
```

```
Accuracy of K-NN model: 100.00%
```
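To experiment with different values of K without relying on a single held-out sample, one possible approach is cross-validation. The sketch below is an illustration, not part of the original workflow: it wraps the scaler and classifier in a scikit-learn `Pipeline` (so the scaler is re-fitted on each training fold only) and compares mean 3-fold accuracy for K = 1, 3, and 5. The fold count and the K values are arbitrary choices.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same nine rows as the case-study dataset
df = pd.DataFrame({
    'Weight': [150, 120, 50, 160, 130, 55, 140, 110, 60],
    'Diameter': [8, 10, 3, 8.5, 10.5, 3.5, 8.2, 9.8, 3.2],
    'Color': [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'Fruit': ['Apple', 'Banana', 'Grapes'] * 3,
})
X = df[['Weight', 'Diameter', 'Color']]
y = df['Fruit']

# 3 stratified folds: each fold tests on one fruit of each class and trains on the other six rows
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

scores_by_k = {}
for k in (1, 3, 5):
    # The scaler sits inside the pipeline, so it is fitted on each training fold only
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=cv)
    scores_by_k[k] = scores.mean()
    print(f'K={k}: mean CV accuracy = {scores.mean():.2f}')
```

On a dataset this small, large K values stop making sense quickly: each training fold holds only two rows per class, so with K = 5 at least three of the five votes must come from other classes.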


# Step 5: Making Predictions
Now that the model is trained, we can make predictions for new instances of fruit based on their features.

```python
# New fruit data: weight = 140 g, diameter = 9 cm, color = yellow (1)
# Using a DataFrame with the training column names avoids scikit-learn's feature-name warning
new_fruit = pd.DataFrame([[140, 9, 1]], columns=['Weight', 'Diameter', 'Color'])

# Scale the new fruit with the same scaler fitted on the training data
new_fruit_scaled = scaler.transform(new_fruit)

# Predict the fruit type
predicted_fruit = knn.predict(new_fruit_scaled)
print(f'The predicted fruit is: {predicted_fruit[0]}')
```

```
The predicted fruit is: Banana
```
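To see why the model made this prediction, you can ask it which training points it used. The sketch below relies on `KNeighborsClassifier.kneighbors` (distances and row positions of the K closest points) and `predict_proba` (the share of the K votes per class). As a simplifying assumption, it fits on all nine rows with no train/test split, so the returned indices map directly to rows of `df`.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Same nine rows as the case-study dataset
df = pd.DataFrame({
    'Weight': [150, 120, 50, 160, 130, 55, 140, 110, 60],
    'Diameter': [8, 10, 3, 8.5, 10.5, 3.5, 8.2, 9.8, 3.2],
    'Color': [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'Fruit': ['Apple', 'Banana', 'Grapes'] * 3,
})
features = ['Weight', 'Diameter', 'Color']

# Assumption for this sketch: fit on all nine rows so neighbor indices refer straight to df rows
scaler = StandardScaler().fit(df[features])
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(df[features]), df['Fruit'])

new_fruit = pd.DataFrame([[140, 9, 1]], columns=features)
new_scaled = scaler.transform(new_fruit)

# kneighbors returns the distances and row positions of the K closest training points
distances, indices = knn.kneighbors(new_scaled)
neighbors = df.iloc[indices[0]].assign(Distance=distances[0].round(3))
print(neighbors)

# predict_proba gives the fraction of the K votes each class received
print(dict(zip(knn.classes_, knn.predict_proba(new_scaled)[0])))
```

Here all three neighbors are Bananas even though the query's 140 g weight exactly matches one of the Apples: after scaling, the one-step difference in the Color code (yellow vs. red) adds more distance than the Weight and Diameter gaps to the Bananas.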


