## Analyze k-NN with Different k Values
- Run k-NN with k values of 3, 5, 7, and 9.
- For each k, analyze the precision and recall metrics to understand the trade-offs.
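
As a reminder of what the two metrics measure, here is a minimal sketch that computes precision and recall by hand for one class. The labels and the helper name `precision_recall` are hypothetical and only illustrate the definitions; the notebook itself relies on `classification_report`.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of everything predicted positive, how much really is positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything that really is positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels (1 = spam, 0 = not spam), not taken from spambase
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

With these labels there are 2 true positives, 1 false positive, and 2 false negatives, so precision is 2/3 and recall is 1/2: the classifier misses half of the spam, but most of what it flags is spam.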

In [2]:
# Imports
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load dataset
df = pd.read_csv('spambase.csv')

# Separate features and target
X = df.drop('class', axis=1)
y = df['class']

# Split dataset into training and testing sets

In [4]:
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

## k-NN

In [10]:
from sklearn.neighbors import KNeighborsClassifier

neighbors = [3, 5, 7, 9]

# Fit one classifier per k and report per-class precision, recall, and F1 on the test set
for n in neighbors:
    cls = KNeighborsClassifier(n_neighbors=n)
    cls.fit(X_train, y_train)
    pred = cls.predict(X_test)
    print(f"==================={n}===================")
    print(classification_report(y_test, pred))

===================3===================
              precision    recall  f1-score   support

           0       0.81      0.85      0.83       822
           1       0.76      0.72      0.74       559

    accuracy                           0.80      1381
   macro avg       0.79      0.78      0.79      1381
weighted avg       0.79      0.80      0.79      1381

===================5===================
              precision    recall  f1-score   support

           0       0.82      0.85      0.84       822
           1       0.77      0.72      0.74       559

    accuracy                           0.80      1381
   macro avg       0.79      0.79      0.79      1381
weighted avg       0.80      0.80      0.80      1381

===================7===================
              precision    recall  f1-score   support

           0       0.81      0.86      0.83       822
           1       0.77      0.71      0.74       559

    accuracy                           0.80      1381
   macro avg       0.79      0.78      0.79      1381
weighted avg       0.80      0.80      0.80      1381

===================9===================
              preci

## Cross-Validation

In [12]:
from sklearn.model_selection import cross_val_score

# 10-fold cross-validation on the training set, scored by macro-averaged F1
for n in [3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=n)
    scores = cross_val_score(knn, X_train, y_train, cv=10, scoring='f1_macro')
    print(f"k={n}: {scores.mean()}")


k=3: 0.7877645358781136
k=5: 0.7769750133256723
k=7: 0.7694562999396758
k=9: 0.7690647214134996
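
The scores above are macro F1 only. Since the task asks about precision and recall separately, one option is `cross_validate`, which accepts several scorers at once. The sketch below runs on synthetic data from `make_classification` as a stand-in for `X_train` and `y_train` (an assumption, so it can run on its own); its numbers say nothing about spambase.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X_train / y_train (assumption: not the spambase data)
X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=0)

results = {}
for k in [3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Each scorer name yields a "test_<name>" array with one score per fold
    cv = cross_validate(knn, X_demo, y_demo, cv=10,
                        scoring=["precision_macro", "recall_macro"])
    results[k] = (cv["test_precision_macro"].mean(),
                  cv["test_recall_macro"].mean())
    print(f"k={k}: precision={results[k][0]:.3f}, recall={results[k][1]:.3f}")
```

In the notebook, replacing `X_demo` and `y_demo` with `X_train` and `y_train` should give per-k precision and recall under the same 10-fold setup used above.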


## Results

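Across the four k values tested (3, 5, 7, and 9), no substantial change was observed: the metrics differ by at most about 3 percentage points. The largest value, k=9, gave the worst precision and recall, and the cross-validated macro F1 likewise falls from 0.788 at k=3 to 0.769 at k=9. One possible explanation is that a wider neighborhood takes in more points from the other class, which dilutes the vote.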
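
How a larger k can pull points from another class into the vote can be seen on a toy example. The sketch below uses hypothetical 1-D data and a hand-written majority vote (`knn_predict` is an illustrative helper, not the scikit-learn implementation); it shows the mechanism only and is not evidence about spambase.

```python
from collections import Counter

# Hypothetical 1-D training set: a small class-1 cluster next to a longer run of class 0
train = [(0.0, 1), (1.0, 1), (2.0, 1),
         (3.0, 0), (4.0, 0), (5.0, 0), (6.0, 0), (7.0, 0), (8.0, 0)]


def knn_predict(query, k):
    """Majority vote among the k training points closest to query."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


query = 1.0  # sits in the middle of the class-1 cluster
predictions = {k: knn_predict(query, k) for k in (3, 5, 7, 9)}
for k, label in predictions.items():
    print(f"k={k}: predicted class {label}")
```

For k=3 and k=5 the class-1 points still hold the majority and the query is labeled 1. At k=7 the neighborhood contains three class-1 points and four class-0 points, so the prediction flips to 0, and it stays 0 at k=9.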