Here’s a **simple from-scratch Python implementation of K-Nearest Neighbors (KNN)** for both **classification and regression**, along with detailed step-by-step explanations for each part.

We won’t use libraries like `scikit-learn`. This first version uses only the Python standard library: `math` for distance calculation and `collections.Counter` for voting.

---

# ✅ Step-by-Step Plan

We’ll build:

1. A basic dataset
2. KNN classes for classification and regression
3. Distance calculation
4. Prediction logic
5. Classification: majority vote
6. Regression: average of the K neighbors' target values
7. A test run to demonstrate usage
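For reference, the distance and the two prediction rules in this plan can be written compactly. The notation is ours: $\mathbf{x}$ is the query point with $m$ features, $(\mathbf{x}_i, y_i)$ are the training samples, and $N_k(\mathbf{x})$ is the set of indices of the $k$ training points closest to $\mathbf{x}$. The last line is a worked example using the test point $[4, 4]$ from later in this notebook.

$$
d(\mathbf{x}, \mathbf{x}_i) = \sqrt{\sum_{j=1}^{m} \left(x_j - x_{i,j}\right)^2}
$$

$$
\hat{y}_{\text{class}} = \arg\max_{c} \sum_{i \in N_k(\mathbf{x})} \mathbb{1}\left[y_i = c\right]
\qquad
\hat{y}_{\text{reg}} = \frac{1}{k} \sum_{i \in N_k(\mathbf{x})} y_i
$$

$$
d\big((4, 4), (3, 3)\big) = \sqrt{(4-3)^2 + (4-3)^2} = \sqrt{2} \approx 1.414
$$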

---

# 🧪 Sample Dataset

```python
# Dataset: [feature1, feature2], label
classification_data = [
    ([1, 2], 0),
    ([2, 3], 0),
    ([3, 3], 1),
    ([6, 5], 1),
    ([7, 7], 1)
]

regression_data = [
    ([1, 2], 10),
    ([2, 3], 12),
    ([3, 3], 14),
    ([6, 5], 18),
    ([7, 7], 22)
]
```


# 🛠️ KNN Classification Implementation

```python
import math
from collections import Counter

class KNN:
    """k-nearest-neighbors classifier: predicts by majority vote."""

    def __init__(self, k=3):
        self.k = k
        self.X = []  # training feature vectors
        self.y = []  # training labels

    def fit(self, data):
        # KNN is a lazy learner: "fitting" only splits and stores the data
        self.X = [item[0] for item in data]
        self.y = [item[1] for item in data]

    def _euclidean_distance(self, a, b):
        # Square root of the sum of squared coordinate differences
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def _get_neighbors(self, x):
        # Compute the distance from x to every training point
        distances = [(self._euclidean_distance(x, xi), yi)
                     for xi, yi in zip(self.X, self.y)]
        # Sort by distance and keep the k nearest (distance, label) pairs
        neighbors = sorted(distances, key=lambda d: d[0])[:self.k]
        return neighbors

    def predict(self, x):
        neighbors = self._get_neighbors(x)
        # Majority vote over the neighbors' labels. The list is ordered
        # nearest-first, so Counter resolves a tie in favour of the tied
        # label whose neighbor is closest.
        labels = [label for _, label in neighbors]
        most_common = Counter(labels).most_common(1)[0][0]
        return most_common
```
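Two details of the vote are easy to miss. In Python 3.7+, `Counter.most_common` orders equal counts by first appearance, and `_get_neighbors` returns neighbors nearest-first, so a tied vote goes to whichever tied label has the closer neighbor. The sketch below uses made-up neighbor label lists (not taken from the dataset) to show this behaviour, and to show that an odd K can still tie once there are three or more classes.

```python
from collections import Counter

# Hypothetical neighbor labels, ordered nearest to farthest as _get_neighbors returns them.
# k = 4 (even) is chosen on purpose to produce a 2-2 tie between labels 0 and 1.
neighbor_labels = [0, 1, 1, 0]
winner = Counter(neighbor_labels).most_common(1)[0][0]
print(winner)  # 0: label 0 is encountered first, so it wins the tie

# k = 5 with three classes: labels 0 and 1 both get two votes.
# Label 0 wins because its first occurrence (position 1) precedes label 1's (position 2),
# even though the single nearest neighbor has label 2.
neighbor_labels = [2, 0, 1, 1, 0]
winner = Counter(neighbor_labels).most_common(1)[0][0]
print(winner)  # 0
```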

        

# 🛠️ KNN Regression Implementation

```python
import math

class KNNRegressor:
    """k-nearest-neighbors regressor: predicts the mean of the k nearest targets."""

    def __init__(self, k=3):
        self.k = k
        self.X = []  # training feature vectors
        self.y = []  # training target values

    def fit(self, data):
        # Split features and targets; nothing else is learned
        self.X = [item[0] for item in data]
        self.y = [item[1] for item in data]

    def _euclidean_distance(self, a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def _get_neighbors(self, x):
        # Compute distance to all training points
        distances = [(self._euclidean_distance(x, xi), yi)
                     for xi, yi in zip(self.X, self.y)]
        # Sort by distance and pick k nearest
        neighbors = sorted(distances, key=lambda d: d[0])[:self.k]
        return neighbors

    def predict(self, x):
        neighbors = self._get_neighbors(x)
        # Average the target values of the k nearest neighbors
        values = [value for _, value in neighbors]
        return sum(values) / len(values)
```

Only `predict()` differs from the classifier. The class gets its own name, `KNNRegressor`, so that running this cell does not overwrite the classifier `KNN` defined above.


# 🔍 Explanation of Each Step

| Step                    | What it does             | Explanation                                                                                   |
| ----------------------- | ------------------------ | --------------------------------------------------------------------------------------------- |
| `fit()`                 | Stores the training data | KNN is a lazy learner: "training" only stores the samples and their labels or targets          |
| `_euclidean_distance()` | Computes one distance    | Euclidean distance between the query point and one training point                             |
| `_get_neighbors()`      | Finds the k closest      | Computes the distance to every training point, sorts, and keeps the first k                   |
| `predict()`             | Makes a prediction       | - **Classification** (`KNN`): majority vote<br> - **Regression** (`KNNRegressor`): mean value |

---

# ✅ Test: Classification

```python
print("=== Classification ===")
knn_cls = KNN(k=3)
knn_cls.fit(classification_data)

test_point = [4, 4]
predicted_class = knn_cls.predict(test_point)
print(f"Test point: {test_point} → Predicted class: {predicted_class}")
```


### Output
```text
Test point: [4, 4] → Predicted class: 1
```
---

# ✅ Test: Regression

```python
print("\n=== Regression ===")
knn_reg = KNNRegressor(k=3)
knn_reg.fit(regression_data)

test_point = [4, 4]
predicted_value = knn_reg.predict(test_point)
print(f"Test point: {test_point} → Predicted value: {predicted_value:.2f}")
```


### Output 
```text
Test point: [4, 4] → Predicted value: 14.67
```
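A quick hand check confirms this number. The snippet below recomputes every distance from `[4, 4]` using `math.dist` (Python 3.8+, a standard-library shortcut the classes above do not use) and averages the three nearest targets. Sorting `(distance, target)` tuples breaks the tie between the two points at distance √5 by target value, which does not matter here because both are among the three nearest.

```python
import math

# Same regression data as above: ([feature1, feature2], target)
regression_data = [([1, 2], 10), ([2, 3], 12), ([3, 3], 14), ([6, 5], 18), ([7, 7], 22)]
query = [4, 4]

# Distance from the query to every training point, nearest first
ranked = sorted((math.dist(query, x), y) for x, y in regression_data)
for d, y in ranked:
    print(f"distance {d:.3f} -> target {y}")

k = 3
nearest = [y for _, y in ranked[:k]]
print(f"k={k}: targets {nearest} -> mean {sum(nearest) / k:.2f}")
# k=3: targets [14, 12, 18] -> mean 14.67
```

The same three points, `[3, 3]`, `[2, 3]` and `[6, 5]`, carry the classification labels 1, 0 and 1, which is why the classifier predicts class 1.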

## 📌 Notes on Choosing K

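* **Odd values** of K prevent tied votes in binary classification; with three or more classes, ties are still possible.
* Try different K using **cross-validation**.
* Plot accuracy (classification) or mean squared error (regression) against K, and pick the value that performs best on held-out data.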
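A minimal sketch of comparing K values with leave-one-out cross-validation: each point is predicted from all the others, and accuracy is averaged over points. The helpers `knn_predict` and `loo_accuracy` are hypothetical, written for this example rather than taken from the classes above, and they reuse the same sort-then-vote logic. `math.dist` requires Python 3.8+.

```python
import math
from collections import Counter

# Same classification data as above: ([feature1, feature2], label)
classification_data = [([1, 2], 0), ([2, 3], 0), ([3, 3], 1), ([6, 5], 1), ([7, 7], 1)]

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (hypothetical helper)."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out: predict each point from all the others, return the hit rate."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every point except the held-out one
        hits += knn_predict(train, x, k) == y
    return hits / len(data)

for k in (1, 3):
    print(f"k={k}: leave-one-out accuracy = {loo_accuracy(classification_data, k):.2f}")
```

On this five-point toy set, k=1 scores 0.60 and k=3 scores 0.40, so the default `k=3` is not automatically the best choice. With so few points these numbers are very noisy; the procedure, not the specific K it picks here, is what carries over to real data.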

---

Let’s now implement **K-Nearest Neighbors (KNN)** from scratch using **NumPy**, which vectorizes the distance computation: each query point is compared against all training points in a single array operation instead of a Python loop.

We'll build:

* A simple dataset
* A `KNN` class supporting both **classification** and **regression**
* Euclidean distance using NumPy
* Prediction logic using vectorized operations

---

## ✅ Step-by-Step KNN from Scratch (Using NumPy)

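```python
import numpy as np
from collections import Counter

class KNN:
    def __init__(self, k=3, task='classification'):
        # Fail early on a typo; otherwise predict() would have no branch to run
        if task not in ('classification', 'regression'):
            raise ValueError("task must be 'classification' or 'regression'")
        self.k = k
        self.task = task
        self.X_train = None
        self.y_train = None

    def fit(self, X, y):
        # Store the training data as NumPy arrays; KNN learns nothing else
        self.X_train = np.array(X, dtype=float)
        self.y_train = np.array(y)

    def _euclidean_distance(self, x1):
        # Broadcasting: (n_samples, n_features) - (n_features,) gives one row per
        # training point; summing over axis=1 yields one distance per training point
        return np.sqrt(np.sum((self.X_train - x1) ** 2, axis=1))

    def predict(self, X_test):
        X_test = np.array(X_test, dtype=float)
        predictions = []

        for x in X_test:
            distances = self._euclidean_distance(x)

            # Indices of the k nearest neighbors; kind='stable' keeps equidistant
            # points in training order, like sorted() in the pure-Python version
            k_indices = np.argsort(distances, kind='stable')[:self.k]
            k_labels = self.y_train[k_indices]

            if self.task == 'classification':
                # Majority vote
                label = Counter(k_labels).most_common(1)[0][0]
            else:
                # Regression: mean of the neighbors' target values
                label = np.mean(k_labels)

            predictions.append(label)

        return np.array(predictions)
```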


## 🔍 Step-by-Step Explanation

| Step                      | Description                                                                                                                                   |
| ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `fit()`                   | Stores training data as NumPy arrays                                                                                                          |
| `_euclidean_distance(x1)` | Computes distance from `x1` to all training points using broadcasting                                                                         |
| `predict(X_test)`         | For each test point:<br>1. Compute distances<br>2. Get `k` nearest neighbors<br>3. Use majority vote (classification) or average (regression) |
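The class above still loops over test points in Python. As a sketch of taking broadcasting one step further, the hypothetical helper `knn_predict_batch` below computes all test-to-training distances in a single array operation and uses `np.argpartition` to select the k smallest per row without fully sorting it. It covers only the regression case to stay short, assumes `k` is at most the number of training points, and builds an `(n_test, n_train, n_features)` intermediate array, so memory grows with both dataset sizes.

```python
import numpy as np

def knn_predict_batch(X_train, y_train, X_test, k=3):
    """Hypothetical helper: KNN regression for all test points at once."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)

    # (n_test, 1, n_features) - (1, n_train, n_features) -> (n_test, n_train, n_features)
    diffs = X_test[:, None, :] - X_train[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))  # shape (n_test, n_train)

    # After argpartition with kth=k-1, the first k columns of each row hold the
    # indices of that row's k smallest distances (in no particular order)
    k_indices = np.argpartition(distances, k - 1, axis=1)[:, :k]

    # y_train[k_indices] has shape (n_test, k); average across the neighbors
    return y_train[k_indices].mean(axis=1)

X_reg = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 7]]
y_reg = [10, 12, 14, 18, 22]
preds = knn_predict_batch(X_reg, y_reg, [[4, 4], [1, 2]], k=3)
print(preds)
```

Because the neighbor order inside the first k columns is arbitrary, this shortcut suits averaging; a majority vote that relies on nearest-first order for tie-breaking would need those k columns sorted by distance first.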

---

## 🧪 Example: Classification

```python
# Classification dataset
X_cls = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 7]]
y_cls = [0, 0, 1, 1, 1]

knn_cls = KNN(k=3, task='classification')
knn_cls.fit(X_cls, y_cls)

X_test = [[4, 4]]
pred = knn_cls.predict(X_test)
print(f"Classification prediction: {pred[0]}")
```

## 🧪 Example: Regression

```python
# Regression dataset
X_reg = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 7]]
y_reg = [10, 12, 14, 18, 22]

knn_reg = KNN(k=3, task='regression')
knn_reg.fit(X_reg, y_reg)

X_test = [[4, 4]]
pred = knn_reg.predict(X_test)
print(f"Regression prediction: {pred[0]:.2f}")
```

## ✅ Output (example):

```
Classification prediction: 1
Regression prediction: 14.67
```