# **Problem Statement**  
## **3. Implement k-Nearest Neighbors (k-NN) classifier without using sklearn.**

Implement the k-Nearest Neighbors (k-NN) classifier from scratch, without using sklearn.

Given:
- A training dataset (X_train, y_train)
- A test point x_test
- A value k

The algorithm should:
- Compute the distance between x_test and all training points.
- Select the k closest neighbors.
- Predict the class using majority voting.
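The same steps in symbols. This is a sketch: $d$ is the number of features, and $N_k(\mathbf{x})$ is notation introduced here for the indices of the $k$ training points closest to $\mathbf{x}$.

$$
\operatorname{dist}(\mathbf{x}, \mathbf{x}_i) = \sqrt{\sum_{j=1}^{d} (x_j - x_{ij})^2},
\qquad
\hat{y} = \arg\max_{c} \sum_{i \in N_k(\mathbf{x})} \mathbb{I}[y_i = c]
$$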

### Constraints & Example Inputs/Outputs

- Distance metric: Euclidean distance
- k must be ≤ number of training samples
- Features must be numeric
- Classification only (not regression)
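The constraints above can be checked up front. A minimal sketch, assuming inputs are plain Python lists (or NumPy arrays). The helper name `validate_knn_inputs` is hypothetical and not part of the original solution.

```python
import numbers

def validate_knn_inputs(X_train, y_train, x_test, k):
    """Hypothetical helper: raise ValueError if the inputs break the stated constraints."""
    n = len(X_train)
    if n == 0:
        raise ValueError("X_train must contain at least one sample")
    if len(y_train) != n:
        raise ValueError("X_train and y_train must have the same length")
    # k must be an integer in [1, n]; numbers.Integral also accepts NumPy integers
    if not isinstance(k, numbers.Integral) or not 1 <= k <= n:
        raise ValueError(f"k must be an integer with 1 <= k <= {n}, got {k!r}")
    d = len(x_test)
    for row in [*X_train, x_test]:
        if len(row) != d:
            raise ValueError("every point must have the same number of features")
        # Numeric check: real numbers only (bool is rejected even though it subclasses int)
        if not all(isinstance(v, numbers.Real) and not isinstance(v, bool) for v in row):
            raise ValueError(f"non-numeric feature in {row!r}")
```

It returns `None` on valid input, so it could be called as the first line of `knn_bruteforce` or `knn_optimized` without changing their results.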

Example Input:
```python
X_train = [[1,2], [2,3], [3,3], [6,5], [7,7]]
y_train = [0, 0, 0, 1, 1]
k = 3
x_test = [3,4]
```

Example Output:
```python
Predicted Class: 0
```
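Working the example by hand, using squared Euclidean distances from `x_test = [3, 4]`:

$$
\begin{aligned}
\lVert (3,4) - (1,2) \rVert^2 &= 2^2 + 2^2 = 8 && \text{label } 0\\
\lVert (3,4) - (2,3) \rVert^2 &= 1^2 + 1^2 = 2 && \text{label } 0\\
\lVert (3,4) - (3,3) \rVert^2 &= 0^2 + 1^2 = 1 && \text{label } 0\\
\lVert (3,4) - (6,5) \rVert^2 &= (-3)^2 + (-1)^2 = 10 && \text{label } 1\\
\lVert (3,4) - (7,7) \rVert^2 &= (-4)^2 + (-3)^2 = 25 && \text{label } 1
\end{aligned}
$$

The three smallest squared distances (1, 2, 8) all carry label 0, so the vote is 3 to 0 and the prediction is 0. Because the square root is monotonic, ranking by squared distance selects the same neighbors as ranking by Euclidean distance.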

### Solution Approach

#### How KNN Works
1. Store all training data (lazy learning).
2. For each test point:
   - Compute the distance to every training point.
   - Sort the distances.
   - Pick the k nearest neighbors.
   - Predict the most frequent label among them.

#### Why no training phase?
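k-NN is a lazy learner: "training" only stores the dataset, and all of the work (distance computation, neighbor selection, voting) happens at prediction time. The trade-off is that every prediction scans the full training set.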
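To make the "no training phase" point concrete, here is a minimal sketch of a k-NN class whose `fit` step only memorizes the data. The class name `LazyKNN` and the method `predict_one` are hypothetical; the `fit`/`predict` naming mirrors the scikit-learn estimator convention, but no sklearn code is used. `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math
from collections import Counter

class LazyKNN:
    """Hypothetical minimal k-NN classifier: fit() stores data, predict() does the work."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X_train, y_train):
        # "Training" is pure memorization: no parameters are learned here.
        self.X_train = [list(map(float, row)) for row in X_train]
        self.y_train = list(y_train)
        return self

    def predict_one(self, x_test):
        # Distance from x_test to every stored point, paired with its label.
        dists = [(math.dist(x, x_test), label) for x, label in zip(self.X_train, self.y_train)]
        dists.sort(key=lambda pair: pair[0])
        # Majority vote over the k closest labels.
        votes = Counter(label for _, label in dists[: self.k])
        return votes.most_common(1)[0][0]

    def predict(self, X_test):
        return [self.predict_one(x) for x in X_test]
```

Usage on the example data: `LazyKNN(k=3).fit(X_train, y_train).predict([[3, 4], [6, 6]])` returns `[0, 1]`.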

### Solution Code

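```python
# Approach 1: Brute Force KNN (Loop Based, Easy to Understand)
import math
from collections import Counter

def euclidean_distance(p1, p2):
    # Square root of the sum of squared per-feature differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

def knn_bruteforce(X_train, y_train, x_test, k):
    # Pair each training point's distance to x_test with its label
    distances = []
    
    for i in range(len(X_train)):
        dist = euclidean_distance(X_train[i], x_test)
        distances.append((dist, y_train[i]))
    
    # Sort by distance and keep the k closest
    distances.sort(key=lambda x: x[0])
    k_nearest = distances[:k]
    
    # Majority vote; among equal counts, most_common keeps first-encountered order,
    # so ties go to the label whose closest member is nearest
    labels = [label for _, label in k_nearest]
    return Counter(labels).most_common(1)[0][0]
```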
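Sorting all n distances is more work than needed when only the k smallest are used. A sketch of the same loop-based approach using `heapq.nsmallest`, which the Python docs describe as equivalent to `sorted(iterable, key=key)[:n]` while only maintaining a heap of size k, for O(n log k) selection. The function name `knn_heap` is hypothetical.

```python
import heapq
import math
from collections import Counter

def knn_heap(X_train, y_train, x_test, k):
    # Keep only the k (point, label) pairs closest to x_test: O(n log k) selection.
    nearest = heapq.nsmallest(
        k,
        zip(X_train, y_train),
        key=lambda pair: math.dist(pair[0], x_test),
    )
    # nsmallest returns its results nearest first, so tie-breaking matches knn_bruteforce.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

On the example data, `knn_heap(X_train, y_train, [3, 4], 3)` gives the same prediction as `knn_bruteforce`, namely 0.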


### Alternative Solution

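```python
# Approach 2: Vectorized NumPy (no Python loop over the training points)
import numpy as np

def knn_optimized(X_train, y_train, x_test, k):
    X_train = np.array(X_train)
    y_train = np.array(y_train)
    x_test = np.array(x_test)
    
    # Vectorized distance computation: broadcasting subtracts x_test from every row
    distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    
    # Indices of the k smallest distances (argsort still sorts all n)
    k_indices = np.argsort(distances)[:k]
    k_labels = y_train[k_indices]
    
    # bincount requires non-negative integer labels; argmax breaks ties toward the smallest label
    return np.bincount(k_labels).argmax()
```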
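Two limitations of the version above are the full `argsort` and the `bincount` vote, which only works for non-negative integer labels. A sketch that addresses both, assuming `k <= len(X_train)`: `np.argpartition(a, kth)` moves the k smallest entries into the first k slots (in no particular order) in linear time, and `np.unique(..., return_counts=True)` counts arbitrary sortable labels. The function name `knn_argpartition` is hypothetical.

```python
import numpy as np

def knn_argpartition(X_train, y_train, x_test, k):
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x_test = np.asarray(x_test, dtype=float)

    # Squared distances rank neighbors exactly like Euclidean distances (sqrt is monotonic).
    sq_dists = np.sum((X_train - x_test) ** 2, axis=1)

    # The k smallest distances end up in positions 0..k-1, unordered; no full sort.
    k_indices = np.argpartition(sq_dists, k - 1)[:k]

    # Works for any sortable labels (strings, negative ints), unlike np.bincount.
    labels, counts = np.unique(y_train[k_indices], return_counts=True)
    # argmax returns the first maximum, so ties go to the smallest label, as with bincount.
    return labels[np.argmax(counts)]
```

For example, `knn_argpartition(X_train, ["a", "a", "a", "b", "b"], [6, 6], 3)` returns `"b"`.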


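### Alternative Approaches

1. Brute Force (Nested Loops)
   - Simple and intuitive
   - Time per query: O(n × d) for the distances plus O(n log n) for the sort

2. Vectorized NumPy (Best for small/medium data)
   - Faster execution: the distance loop runs inside compiled NumPy code
   - Same asymptotic complexity, lower constant factors

3. KD-Tree / Ball Tree
   - Used for large datasets: partitioning space lets a query skip most training points, which works best in low dimensions
   - More complex to implement from scratch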
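For a sense of what item 3 involves, here is a minimal pure-Python KD-tree sketch (not a Ball Tree, and not a library API). The helper names `build_kdtree` and `knn_kdtree` are hypothetical. The tree splits on one feature per level at the median; the search descends toward the query first and only visits the other side of a split when the splitting plane is closer than the current k-th nearest point. Sorting at every level makes the build O(n log² n); a production version would use a linear-time median selection.

```python
import heapq
from collections import Counter

def build_kdtree(points, labels, depth=0):
    """Hypothetical helper: build a KD-tree as nested tuples (point, label, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through features level by level
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    mid = len(order) // 2  # split at the median to keep the tree balanced
    left, right = order[:mid], order[mid + 1:]
    return (
        points[order[mid]],
        labels[order[mid]],
        axis,
        build_kdtree([points[i] for i in left], [labels[i] for i in left], depth + 1),
        build_kdtree([points[i] for i in right], [labels[i] for i in right], depth + 1),
    )

def knn_kdtree(tree, x_test, k):
    """Hypothetical helper: majority vote over the k nearest points found in the tree."""
    best = []  # max-heap of the k best so far, stored as (-squared_distance, visit_order, label)
    visits = 0

    def search(node):
        nonlocal visits
        if node is None:
            return
        point, label, axis, left, right = node
        d2 = sum((a - b) ** 2 for a, b in zip(point, x_test))
        if len(best) < k:
            heapq.heappush(best, (-d2, visits, label))
        elif d2 < -best[0][0]:  # closer than the current k-th nearest
            heapq.heapreplace(best, (-d2, visits, label))
        visits += 1

        diff = x_test[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        search(near)
        # The far side can only help if the splitting plane is closer than the k-th nearest so far
        if len(best) < k or diff * diff < -best[0][0]:
            search(far)

    search(tree)
    nearest_first = sorted(best, key=lambda entry: -entry[0])
    return Counter(label for _, _, label in nearest_first).most_common(1)[0][0]
```

Usage: `knn_kdtree(build_kdtree(X_train, y_train), [3, 4], 3)` returns 0. The tree is built once and can then answer many queries, which is where the savings over brute force come from.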

### Test Cases

```python
# Test Case 1: Simple 2D Classification

X_train = [
    [1, 2],
    [2, 3],
    [3, 3],
    [6, 5],
    [7, 7]
]
y_train = [0, 0, 0, 1, 1]

x_test = [3, 4]
k = 3

print("Brute Force Prediction:", knn_bruteforce(X_train, y_train, x_test, k))
print("Optimized Prediction:", knn_optimized(X_train, y_train, x_test, k))
```

Output:
```
Brute Force Prediction: 0
Optimized Prediction: 0
```


```python
# Test Case 2: Different k values

for k in [1, 3, 5]:
    print(f"k={k}, Prediction:", knn_optimized(X_train, y_train, x_test, k))
```

Output:
```
k=1, Prediction: 0
k=3, Prediction: 0
k=5, Prediction: 0
```


```python
# Test Case 3: Multiple Test Points

X_test = [[2, 2], [6, 6], [4, 4]]

for x in X_test:
    print(x, "→", knn_optimized(X_train, y_train, x, k=3))
```

Output:
```
[2, 2] → 0
[6, 6] → 1
[4, 4] → 0
```


```python
# Test Case 4: Edge Case (k = 1)

print(knn_optimized(X_train, y_train, [7, 6], k=1))
```

Output:
```
1
```
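One edge case the tests above do not hit: with an even k, the vote can tie, and the two approaches break ties differently. `Counter.most_common` keeps first-encountered order among equal counts (so the nearest neighbor's label wins when labels are listed nearest first), while `np.bincount(...).argmax()` returns the first index with the maximum count (so the smallest label wins). A small self-contained illustration, using a hypothetical tied neighbor list:

```python
from collections import Counter
import numpy as np

# Labels of the k=2 nearest neighbors, ordered nearest first (hypothetical tie).
k_labels = [1, 0]

# Counter: among equal counts, the label encountered first (the nearest) wins.
counter_vote = Counter(k_labels).most_common(1)[0][0]

# bincount + argmax: among equal counts, the smallest label value wins.
bincount_vote = int(np.bincount(k_labels).argmax())

print(counter_vote, bincount_vote)  # 1 0
```

For binary classification, choosing an odd k avoids this situation entirely.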


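## Complexity Analysis

For a single query with n training points and d features:

### Time Complexity
- Distance computation: O(n × d)
- Sorting: O(n log n)
- Total: O(n × d + n log n); the sort term dominates only when log n exceeds d
- Only the k smallest distances are needed, so the full sort can be replaced by a size-k heap (O(n log k)) or `np.argpartition` (O(n)), for a total of O(n × d + n log k) or O(n × d)

### Space Complexity
- O(n) for distance storage per query (the vectorized version also creates an n × d temporary for `X_train - x_test`)
- O(n × d) to keep the training set itself, since lazy learning stores all of it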
