### KNN Regression:

K-Nearest Neighbors (KNN) regression is a supervised machine learning algorithm for regression tasks. Unlike its classification counterpart, which predicts a class label, KNN regression predicts a continuous value. To predict the target of a new data point, the algorithm finds its K nearest neighbors in the feature space, as measured by a chosen distance metric such as Euclidean distance, and combines their target values. In the basic form, which the code in this notebook implements, the prediction is the plain (unweighted) mean of the K neighbors' targets. A common variant uses a distance-weighted average instead, so that closer neighbors have a stronger influence on the prediction.
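
To make the difference between the plain mean and the distance-weighted mean concrete, here is a minimal sketch. The function name `knn_predict`, its `weighted` flag, and the toy data are all made up for illustration and are not part of the notebook's own code; inverse-distance weighting is just one possible weighting scheme.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, weighted=False):
    """Predict a continuous target for x_query from its k nearest training points.

    Hypothetical helper for illustration only.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Euclidean distance from the query to every training point
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))

    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]

    if not weighted:
        # Basic form: unweighted mean of the neighbors' targets
        return float(y_train[nearest].mean())

    # Assumed variant: inverse-distance weights (small epsilon avoids division by zero)
    weights = 1.0 / (dists[nearest] + 1e-12)
    return float(np.average(y_train[nearest], weights=weights))

# Toy 1-D data, invented for this example
X_toy = [[1.0], [2.0], [3.0], [10.0]]
y_toy = [1.0, 2.0, 3.0, 10.0]

print(knn_predict(X_toy, y_toy, [2.5], k=2))                 # mean of the two nearest targets
print(knn_predict(X_toy, y_toy, [2.2], k=2, weighted=True))  # pulled toward the closer neighbor
```

With `weighted=False` every neighbor counts equally, so the result depends only on which points fall among the K nearest. With `weighted=True` the prediction also depends on how far away each of them is.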

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import math

In [3]:
path = '/content/drive/MyDrive/KNN/seeds.csv'
df = pd.read_csv(path)

# Columns 0-6 are assumed to hold the input features, in this order:
# Area, Perimeter, Compactness, Kernel.Length, Kernel.Width, Asymmetry.Coeff, Kernel.Groove.
# Column 7 holds the target (Type).
# Re-read the file as plain Python lists, skipping the header row,
# because the manual KNN loop below indexes y_train by position.
with open(path, 'r') as file:
    data = file.readlines()[1:]

data = [line.strip().split(',') for line in data]

# Extract input features and target variable as floats
X = [[float(x) for x in row[0:7]] for row in data]
y = [float(row[7]) for row in data]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Set the number of neighbors (K); adjust as needed.
# KNN has no separate training step: predictions are computed directly from the training data.
k = 23
y_pred = []

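for x in X_test:
    # Squared Euclidean distance to every training point, paired with that point's target
    # (the square root is omitted because it does not change the ranking)
    distances = [(sum((a - b) ** 2 for a, b in zip(x, x_tr)), y_train[i]) for i, x_tr in enumerate(X_train)]
    # Keep the k closest neighbors
    k_nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Predict the unweighted mean of the neighbors' target values
    predicted_value = sum(pair[1] for pair in k_nearest) / k
    y_pred.append(predicted_value)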

# Combine y_test and y_pred into a DataFrame
results = pd.DataFrame({"y_test": y_test, "y_pred": y_pred})

# Print the combined DataFrame
print(results)


# Print y_pred after prediction
print("\ny_pred after prediction:")
print(y_pred)
# Calculate evaluation metrics
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = math.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("Root Mean Squared Error (RMSE):", rmse)
print("R-squared (R²):", r2)

    y_test    y_pred
0      2.0  2.000000
1      1.0  2.217391
2      2.0  2.000000
3      3.0  3.000000
4      2.0  1.695652
5      3.0  3.000000
6      2.0  1.826087
7      2.0  1.826087
8      2.0  2.000000
9      3.0  3.000000
10     3.0  3.000000
11     3.0  3.000000
12     1.0  2.217391
13     1.0  1.260870
14     2.0  2.000000
15     2.0  2.000000
16     1.0  1.000000
17     1.0  2.391304
18     2.0  2.000000
19     3.0  1.956522
20     1.0  1.086957
21     2.0  1.869565
22     2.0  1.608696
23     3.0  3.000000
24     2.0  1.956522
25     3.0  2.130435
26     3.0  2.826087
27     1.0  1.347826
28     1.0  2.652174
29     2.0  1.565217
30     3.0  1.869565
31     1.0  1.130435
32     3.0  3.000000
33     3.0  3.000000
34     2.0  2.000000
35     2.0  2.000000
36     2.0  2.000000
37     2.0  2.000000
38     2.0  1.826087
39     2.0  2.000000

y_pred after prediction:
[2.0, 2.217391304347826, 2.0, 3.0, 1.6956521739130435, 3.0, 1.826086956521739, 1.826086956521739, 2.0, 3.0, 3.0, 