# KNN – Classification and Regression
## Testing Several Values of K

**Task:**  
Test the classification and regression models using different values of K.  
Compare the results and explain the differences that appear.


In [1]:
# Imports
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import plotly.express as px
import plotly.graph_objects as go
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, confusion_matrix, mean_squared_error

In [2]:
# Download the dataset
!pip install -q gdown
!gdown "https://drive.google.com/uc?id=1WO3yoK_Fd-v3JBLBCVNvZVZHfS5Un_Em"

df = pd.read_csv('strength_training_data.csv')
df.head()

Downloading...
From: https://drive.google.com/uc?id=1WO3yoK_Fd-v3JBLBCVNvZVZHfS5Un_Em
To: /content/strength_training_data.csv


| Unnamed: 0 | Bench_Press_kg | Squat_kg | Deadlift_kg | Body_Weight_kg | Strength_Level |
|---|---|---|---|---|---|
| 0 | 83.4 | 91.6 | 90.9 | 75.0 | Newbie |
| 1 | 47.0 | 50.2 | 55.1 | 66.2 | Newbie |
| 2 | 69.7 | 70.8 | 74.7 | 60.8 | Newbie |
| 3 | 51.9 | 58.2 | 60.7 | 67.3 | Newbie |
| 4 | 55.8 | 65.3 | 71.0 | 84.5 | Newbie |


## KNN Classification – Strength Level

In [4]:
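# Features: the three lifts plus body weight; target: the strength level
X = df[['Bench_Press_kg', 'Squat_kg', 'Deadlift_kg', 'Body_Weight_kg']]
y = df['Strength_Level']

# Encode the string labels as integers
le = LabelEncoder()
y_encoded = le.fit_transform(y)

# Hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.25, random_state=42
)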

In [5]:
k_values = [1, 3, 5, 7, 9, 11]
accuracies = []

# Fit one classifier per K and record its accuracy on the test set
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k, metric='euclidean')
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    accuracies.append(acc)
    print(f"K={k} -> Accuracy={acc:.4f}")

K=1 -> Accuracy=0.7600
K=3 -> Accuracy=0.8800
K=5 -> Accuracy=0.7200
K=7 -> Accuracy=0.7200
K=9 -> Accuracy=0.6800
K=11 -> Accuracy=0.6800
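
The accuracies above are all multiples of 0.04, which is consistent with a test set of about 25 rows, so one or two predictions can change the ranking of the K values. Below is a minimal sketch of k-fold cross-validation as a more stable way to compare them, using scikit-learn's `cross_val_score`. The data is a synthetic stand-in generated with `make_classification` (an assumption, not the notebook's dataset), and the choice of 5 folds and 3 classes is only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the strength data (assumption): 100 rows, 4 features, 3 classes
X_demo, y_demo = make_classification(
    n_samples=100, n_features=4, n_informative=3, n_redundant=1,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)

k_values = [1, 3, 5, 7, 9, 11]
cv_means = {}

for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k, metric='euclidean')
    # 5-fold cross-validation: every row is used for testing exactly once
    scores = cross_val_score(knn, X_demo, y_demo, cv=5, scoring='accuracy')
    cv_means[k] = scores.mean()
    print(f"K={k} -> mean CV accuracy={cv_means[k]:.4f}")

# K with the highest mean accuracy across folds
best_k = max(cv_means, key=cv_means.get)
print(f"Best K by cross-validation: {best_k}")
```

To apply this to the notebook's data, `X_demo` and `y_demo` would be replaced by `X` and `y_encoded`.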


In [6]:
fig = px.line(
    x=k_values,
    y=accuracies,
    markers=True,
    labels={'x': 'K', 'y': 'Accuracy'},
    title='KNN Classification – Accuracy vs K'
)
fig.show()

## KNN Regression – Body Weight

In [7]:
# Features: the three lifts; target: body weight in kg
X = df[['Bench_Press_kg', 'Squat_kg', 'Deadlift_kg']]
y = df['Body_Weight_kg']

# Hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

In [8]:
k_values = [1, 3, 5, 7, 9, 11]
mse_values = []

# Fit one regressor per K and record its mean squared error on the test set
for k in k_values:
    knn = KNeighborsRegressor(n_neighbors=k, metric='euclidean')
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_values.append(mse)
    print(f"K={k} -> MSE={mse:.4f}")

K=1 -> MSE=197.6256
K=3 -> MSE=167.1348
K=5 -> MSE=130.9759
K=7 -> MSE=121.4660
K=9 -> MSE=120.3298
K=11 -> MSE=119.2395
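
MSE is expressed in squared kilograms, which is hard to compare with actual body weights. Taking the square root gives the RMSE in kilograms. The sketch below only converts the MSE values printed above; it does not refit any model, and the `mse_by_k` and `rmse_by_k` names are introduced here for illustration.

```python
import math

# MSE values copied from the output above, keyed by K
mse_by_k = {
    1: 197.6256, 3: 167.1348, 5: 130.9759,
    7: 121.4660, 9: 120.3298, 11: 119.2395,
}

# RMSE is in the same unit as the target (kg)
rmse_by_k = {k: math.sqrt(mse) for k, mse in mse_by_k.items()}

for k, rmse in rmse_by_k.items():
    print(f"K={k} -> RMSE={rmse:.2f} kg")
```

On this scale the typical error falls from about 14.1 kg at K=1 to about 10.9 kg at K=11.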


In [9]:
fig = px.line(
    x=k_values,
    y=mse_values,
    markers=True,
    labels={'x': 'K', 'y': 'MSE'},
    title='KNN Regression – MSE vs K'
)
fig.show()

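## Conclusions

- **Small K** → overfitting (sensitive to noise): K=1 gives an accuracy of 0.76 and the highest MSE (197.63).
- **Large K** → underfitting (model too general): classification accuracy drops to 0.68 at K=9 and K=11.
- **Intermediate K** gives the best classification result: K=3 reaches an accuracy of 0.88.

For regression the pattern differs within the tested range: MSE decreases steadily from 197.63 at K=1 to 119.24 at K=11, with little improvement after K=7. The best tested value is therefore the largest one, and underfitting is not yet visible. Larger values of K would need to be tested to see where the error starts to rise again.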
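
The effect of K on a noisy point can be reproduced with a few lines of plain Python. This is a minimal from-scratch sketch on a hypothetical one-dimensional dataset: the points, the labels, and the `knn_predict` helper are illustrative and not part of the notebook. One point at x=3.0 is deliberately mislabeled to act as noise.

```python
from collections import Counter

# Hypothetical training set: class 'A' lives on the left, 'B' on the right.
# The point at x=3.0 is deliberately mislabeled 'B' (noise inside the 'A' region).
train = [
    (1.0, 'A'), (2.0, 'A'), (3.0, 'B'), (4.0, 'A'),
    (6.0, 'B'), (7.0, 'B'), (8.0, 'B'), (9.0, 'B'), (10.0, 'B'),
]

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# K=1 copies the label of the single nearest point, so it follows the noise
print(knn_predict(train, 3.1, k=1))   # B
# K=3 outvotes the noisy point with its two 'A' neighbors
print(knn_predict(train, 3.1, k=3))   # A
# K=9 uses every training point, so it always returns the overall majority class
print(knn_predict(train, 1.5, k=9))   # B
```

With K equal to the size of the training set the prediction no longer depends on the query at all, which is the extreme case of underfitting.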
