In [1]:
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.DataFrame({
    'Fruit_Size': [1, 2, 3, 4, 5, 6, 7],
    'Fruit_Weight': [100, 200, 300, 400, 500, 600, 700],
    'Fruit_Name': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'WaterMelon', 'WaterMelon']
})
df

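|   | Fruit_Size | Fruit_Weight | Fruit_Name |
| - | ---------- | ------------ | ---------- |
| 0 | 1          | 100          | Apple      |
| 1 | 2          | 200          | Apple      |
| 2 | 3          | 300          | Apple      |
| 3 | 4          | 400          | Orange     |
| 4 | 5          | 500          | Orange     |
| 5 | 6          | 600          | WaterMelon |
| 6 | 7          | 700          | WaterMelon |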


### KNN (K-Nearest Neighbors) algorithm
KNN (K-Nearest Neighbors) is a supervised learning algorithm that can be used for both classification and regression, though it is most commonly used for classification.

Key Idea
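- KNN is a "lazy learner": it does not fit an explicit model (no weights or rules) during training.
- Instead, it stores the entire training dataset and predicts by finding the K stored data points (neighbors) closest to a new input, based on distance. For classification, the predicted label is the majority label among those K neighbors.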
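
The idea above can be sketched without any library. This is a minimal illustration, assuming Euclidean distance and a plain majority vote; the helper name `knn_predict` is hypothetical, and this is not how scikit-learn implements the algorithm internally.

```python
import math
from collections import Counter

# Same toy data as the DataFrame above: (size, weight) -> fruit name
points = [(1, 100), (2, 200), (3, 300), (4, 400), (5, 500), (6, 600), (7, 700)]
labels = ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'WaterMelon', 'WaterMelon']

def knn_predict(new_point, k=3):
    """Hypothetical helper: majority vote among the k closest stored points."""
    # "Training" is just keeping the data; all the work happens at prediction time.
    distances = [(math.dist(new_point, p), label) for p, label in zip(points, labels)]
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((10, 900)))  # WaterMelon
```

For the point (10, 900), the three closest stored fruits are the two watermelons and one orange, so the vote goes to WaterMelon.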

When to Use KNN
| Condition                     | Suitability                                                          |
| ----------------------------- | -------------------------------------------------------------------- |
| Small to medium dataset       | ✅ Good                                                               |
| Large dataset                 | ❌ Slow during prediction (each query is compared with stored points) |
| Features on different scales  | ⚠️ Scale them first (very important!)                                |
| Data has outliers             | ⚠️ Can distort distances; scale features and check outliers first    |


In [2]:
# Features (X) and target (y). X must be 2D (a DataFrame or a 2D array).
X = df[['Fruit_Size', 'Fruit_Weight']]
y = df['Fruit_Name']

# Create and train the model with .fit(). n_neighbors is K: the number of
# nearest neighbors the algorithm looks at when making a prediction.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Take user inputs for prediction
new_fruit_size = float(input("Enter fruit size: "))
new_fruit_weight = float(input("Enter fruit weight: "))

# model.predict() expects 2D input, so wrap the single sample in a one-row
# DataFrame with the same column names that were used for training.
new_fruit = pd.DataFrame({'Fruit_Size': [new_fruit_size], 'Fruit_Weight': [new_fruit_weight]})
fruit_name = model.predict(new_fruit)

# Output the prediction (predict() returns an array with one label per input row)
print("Your fruit is:", fruit_name)  # example input: size 10, weight 900

Your fruit is: ['WaterMelon']
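
To see why the model answered WaterMelon, it can help to inspect the neighbors it actually used. The sketch below rebuilds the same model and assumes the example input from above (size 10, weight 900); `kneighbors()` and `predict_proba()` are standard `KNeighborsClassifier` methods, while the variable names are only illustrative.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.DataFrame({
    'Fruit_Size': [1, 2, 3, 4, 5, 6, 7],
    'Fruit_Weight': [100, 200, 300, 400, 500, 600, 700],
    'Fruit_Name': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'WaterMelon', 'WaterMelon']
})
X = df[['Fruit_Size', 'Fruit_Weight']]
y = df['Fruit_Name']
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Assumed example input: size 10, weight 900
new_fruit = pd.DataFrame({'Fruit_Size': [10.0], 'Fruit_Weight': [900.0]})

# kneighbors() returns the distances to, and row positions of, the K nearest training points
distances, indices = model.kneighbors(new_fruit)
neighbor_labels = y.iloc[indices[0]].tolist()
print(neighbor_labels)  # ['WaterMelon', 'WaterMelon', 'Orange']

# predict_proba() reports the share of neighbors voting for each class
proba = model.predict_proba(new_fruit)[0]
print(dict(zip(model.classes_, proba.round(2))))
```

Two of the three neighbors are watermelons, which is why the majority vote returns WaterMelon.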


How to Choose K
| K Value                | Behavior                                            |
| ---------------------- | --------------------------------------------------- |
| Very small (e.g., K=1) | Model becomes noisy / overfits                      |
| Very large             | Model becomes too general / underfits               |
| **Rule of thumb**      | Start with **K = √n** (where n = number of samples) |
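
Beyond the rule of thumb, one common way to compare K values is cross-validation. The sketch below is only illustrative: the candidate values (1, 3, 5) are an assumption, leave-one-out is used because the toy dataset has just 7 rows, and with so little data the scores say very little about real-world performance.

```python
import pandas as pd
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

df = pd.DataFrame({
    'Fruit_Size': [1, 2, 3, 4, 5, 6, 7],
    'Fruit_Weight': [100, 200, 300, 400, 500, 600, 700],
    'Fruit_Name': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'WaterMelon', 'WaterMelon']
})
X = df[['Fruit_Size', 'Fruit_Weight']]
y = df['Fruit_Name']

# Candidate K values are an assumption; each must not exceed the training fold size (6 here)
scores = {}
for k in (1, 3, 5):
    model = KNeighborsClassifier(n_neighbors=k)
    # Leave-one-out: train on 6 rows, test on the held-out row, repeat for every row
    fold_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
    scores[k] = fold_scores.mean()

for k, accuracy in scores.items():
    print(f"K={k}: mean accuracy {accuracy:.2f}")

best_k = max(scores, key=scores.get)
print("Best K among candidates:", best_k)
```

On a realistic dataset, a k-fold split (for example `cv=5`) over a wider range of K values would be the more usual choice.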


Note: Important Step → Always Scale Data
- Because KNN relies on distance, features with larger numeric ranges dominate the result. Here, Fruit_Weight (100 to 700) outweighs Fruit_Size (1 to 7).
- Common scalers: StandardScaler(), MinMaxScaler()
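
A minimal sketch of scaling before KNN, assuming `StandardScaler` and scikit-learn's `make_pipeline` so that the same scaling is applied to both the training data and any new input. The example input (size 10, weight 900) is the one used earlier.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'Fruit_Size': [1, 2, 3, 4, 5, 6, 7],
    'Fruit_Weight': [100, 200, 300, 400, 500, 600, 700],
    'Fruit_Name': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'WaterMelon', 'WaterMelon']
})
X = df[['Fruit_Size', 'Fruit_Weight']]
y = df['Fruit_Name']

# The pipeline standardizes each feature (mean 0, unit variance) and then runs KNN
scaled_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_model.fit(X, y)

# New inputs pass through the same fitted scaler automatically
new_fruit = pd.DataFrame({'Fruit_Size': [10.0], 'Fruit_Weight': [900.0]})
prediction = scaled_model.predict(new_fruit)
print(prediction[0])  # WaterMelon
```

In this toy dataset size and weight rise together, so the prediction matches the unscaled model. On data where features vary independently, scaling can change which neighbors are nearest.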