<a href="https://colab.research.google.com/github/pravinkr05/Data-Mining-Project/blob/main/KNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The steps of the K-Nearest Neighbors (KNN) algorithm:

1. **Choose K**: Determine the number of neighbors (K) to consider.
2. **Calculate Distance**: Measure the distance between the query point and every point in the dataset. Common metrics include Euclidean distance and Manhattan distance.
3. **Find Neighbors**: Identify the K nearest neighbors based on the calculated distances.
4. **Vote or Average**: For classification, let the K nearest neighbors "vote" for the class of the query point. For regression, take the average of the target values of the K nearest neighbors.
5. **Predict**: Assign the class label (for classification) or the predicted value (for regression) based on the votes or averages.
6. **Repeat**: Apply steps 2 to 5 to each new query point.
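
Step 2 names two distance metrics. As a minimal sketch in plain Python (the function names `euclidean` and `manhattan` are illustrative, not taken from any library), they differ only in how the per-feature differences are combined:

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```

Because the two metrics can rank neighbors differently, the choice of metric can change which K points are selected.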

Let's consider a simple example to illustrate the K-Nearest Neighbors (KNN) algorithm for classification:

Suppose we have a dataset of animals described by two features, weight and height, with each animal labeled as either "Cat" or "Dog".

| Weight (kg) | Height (cm) | Label  |
|-------------|-------------|--------|
| 5           | 25          | Cat    |
| 8           | 30          | Cat    |
| 7           | 40          | Dog    |
| 10          | 35          | Dog    |

Now, let's say we have a new animal with weight 9 kg and height 33 cm, and we want to classify it as either a cat or a dog using the KNN algorithm with K=3.

1. **Choose K**: Let's choose K=3.
2. **Calculate Distance**: Calculate the Euclidean distance between the new animal and each of the animals in the dataset.
   - Distance from (9, 33) to (5, 25) = sqrt((9-5)^2 + (33-25)^2) = sqrt(16 + 64) = 8.94
   - Distance from (9, 33) to (8, 30) = sqrt((9-8)^2 + (33-30)^2) = sqrt(1 + 9) = 3.16
   - Distance from (9, 33) to (7, 40) = sqrt((9-7)^2 + (33-40)^2) = sqrt(4 + 49) = 7.28
   - Distance from (9, 33) to (10, 35) = sqrt((9-10)^2 + (33-35)^2) = sqrt(1 + 4) = 2.24
3. **Find Neighbors**: Choose the 3 nearest neighbors based on the calculated distances: (10, 35), (8, 30), and (7, 40). The point (5, 25) is the farthest and is excluded.
4. **Vote**: Since K=3, we have three neighbors. Two of them are labeled "Dog" and one is labeled "Cat".
5. **Predict**: Based on the majority vote, we predict that the new animal is a "Dog".

So, according to the KNN algorithm, the new animal is classified as a "Dog".
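
The hand calculation can be checked with a short script. This is a minimal sketch in plain Python, assuming the four rows of the table as training data; the variable names (`animals`, `query`, `ranked`) are illustrative. It prints each training point with its distance to the query, nearest first, followed by the vote among the K nearest.

```python
import math
from collections import Counter

# Training data from the table above: (weight in kg, height in cm) -> label
animals = [
    ((5, 25), "Cat"),
    ((8, 30), "Cat"),
    ((7, 40), "Dog"),
    ((10, 35), "Dog"),
]
query = (9, 33)
k = 3

# Pair each training point with its Euclidean distance to the query, nearest first
ranked = sorted(
    ((math.dist(query, point), point, label) for point, label in animals),
    key=lambda item: item[0],
)
for distance, point, label in ranked:
    print(f"{point} {label}: {distance:.2f}")

# Majority vote among the k nearest neighbors
votes = Counter(label for _, _, label in ranked[:k])
predicted = votes.most_common(1)[0][0]
print("Votes:", dict(votes))
print("Prediction:", predicted)
```

`math.dist` requires Python 3.8 or later; on older versions the same distance can be computed with `math.sqrt` over the summed squared differences.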


In [2]:
# In regression, we aim to predict a continuous value based on the input features.
import numpy as np

# Step 1: Define the KNN Regression class
class KNNRegressor:
    def __init__(self, k=5):
        self.k = k

    # Step 2: Fit the model with training data
    def fit(self, X_train, y_train):
        # Convert to arrays so plain Python lists also work in predict()
        self.X_train = np.asarray(X_train)  # Training features
        self.y_train = np.asarray(y_train)  # Corresponding target values

    # Step 3: Make predictions on test data
    def predict(self, X_test):
        predictions = []
        for x in X_test:
            # Step 4: Calculate distances between test point and all training points
            distances = np.sqrt(np.sum((self.X_train - x)**2, axis=1))
            # Step 5: Find the indices of k nearest neighbors
            nearest_indices = np.argsort(distances)[:self.k]
            # Step 6: Use the average of the target values of the k nearest neighbors as prediction
            nearest_neighbors = self.y_train[nearest_indices]
            prediction = np.mean(nearest_neighbors)
            predictions.append(prediction)
        return np.array(predictions)

# Example usage for regression
X_train_regression = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y_train_regression = np.array([10, 20, 30, 40])
X_test_regression = np.array([[2, 3], [6, 7]])

# Step 7: Create an instance of KNNRegressor and fit the model
knn_regressor = KNNRegressor(k=2)
knn_regressor.fit(X_train_regression, y_train_regression)

# Step 8: Make predictions on test data
predictions_regression = knn_regressor.predict(X_test_regression)
print("Regression Predictions:", predictions_regression)


Regression Predictions: [15. 35.]
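
The printed result can be traced by hand. The sketch below repeats the same computation in plain Python, without the class, and shows which training targets are averaged for each test point. It assumes the same data as the example above; the variable names are illustrative.

```python
import math

X_train = [(1, 2), (3, 4), (5, 6), (7, 8)]
y_train = [10, 20, 30, 40]
X_test = [(2, 3), (6, 7)]
k = 2

predictions = []
for x in X_test:
    # Sort training indices by distance to x and keep the k closest
    order = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
    nearest_targets = [y_train[i] for i in order[:k]]
    # The regression prediction is the mean of the neighbors' targets
    prediction = sum(nearest_targets) / k
    predictions.append(prediction)
    print(x, "-> neighbors' targets", nearest_targets, "-> mean", prediction)
```

Each test point lies midway between two training points, so with k=2 the prediction is the midpoint of their two targets.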


In [4]:
# In classification, we aim to predict discrete class labels based on the input features.
from collections import Counter
import numpy as np

# Step 1: Define the KNN Classification class
class KNNClassifier:
    def __init__(self, k=5):
        self.k = k

    # Step 2: Fit the model with training data
    def fit(self, X_train, y_train):
        # Convert to arrays so plain Python lists also work in predict()
        self.X_train = np.asarray(X_train)  # Training features
        self.y_train = np.asarray(y_train)  # Corresponding class labels

    # Step 3: Make predictions on test data
    def predict(self, X_test):
        predictions = []
        for x in X_test:
            # Step 4: Calculate distances between test point and all training points
            distances = np.sqrt(np.sum((self.X_train - x)**2, axis=1))
            # Step 5: Find the indices of k nearest neighbors
            nearest_indices = np.argsort(distances)[:self.k]
            # Step 6: Count the occurrences of each class label among the k nearest neighbors
            nearest_labels = self.y_train[nearest_indices]
            # Step 7: Predict the class label with the most occurrences
            most_common_label = Counter(nearest_labels).most_common(1)[0][0]
            predictions.append(most_common_label)
        return np.array(predictions)

# Example usage for classification
X_train_classification = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_train_classification = np.array([0, 0, 1, 1])
X_test_classification = np.array([[1.5, 2.5], [3.5, 4.5]])

# Step 8: Create an instance of KNNClassifier and fit the model
knn_classifier = KNNClassifier(k=3)
knn_classifier.fit(X_train_classification, y_train_classification)

# Step 9: Make predictions on test data
predictions_classification = knn_classifier.predict(X_test_classification)
print("Classification Predictions:", predictions_classification)


Classification Predictions: [0 1]
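
One detail the classifier leaves implicit is what happens when the vote is tied, which can occur with an even K. `Counter.most_common` orders labels with equal counts by first occurrence, so when the labels are listed nearest-first, a tie goes to the class of the closest neighbor. The sketch below illustrates this with made-up label lists and a hypothetical helper `majority_label`; it is an observation about `Counter`, not a recommended tie-breaking policy.

```python
from collections import Counter

def majority_label(labels_nearest_first):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels_nearest_first).most_common(1)[0][0]

print(majority_label([0, 0, 1]))     # 0: clear majority with K=3
print(majority_label([1, 0, 0, 1]))  # 1: 2-2 tie with K=4, first-seen label wins
print(majority_label([0, 1, 1, 0]))  # 0: same counts, different order
```

Choosing an odd K for two-class problems avoids such ties altogether.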
