# KNN (K-Nearest Neighbors)

KNN is a supervised learning algorithm used for both classification and regression tasks. It stores the training data and predicts from the `k` closest training points (neighbors) in the feature space: a majority vote of the neighbors' labels for classification, and the mean of the neighbors' target values for regression.

It’s called a lazy learner because it doesn’t build a model during training but delays computation until prediction time.
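To make the idea concrete, here is a minimal from-scratch sketch of KNN regression using NumPy. It assumes Euclidean distance and equal weighting of neighbors; the function name `knn_predict` and the toy arrays are made up for illustration and are not part of scikit-learn.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict each query target as the mean target of its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_train)
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Indices of the k smallest distances for every query point
    nearest = np.argsort(dists, axis=1)[:, :k]
    # "Training" was just storing the data; all the work happens here
    return y_train[nearest].mean(axis=1)

# Toy data (illustrative only): the target equals the single feature
X_toy = np.array([[0.0], [1.0], [2.0], [10.0]])
y_toy = np.array([0.0, 1.0, 2.0, 10.0])

toy_pred = knn_predict(X_toy, y_toy, np.array([[1.2]]), k=3)
print(toy_pred)  # the 3 nearest points are 1.0, 2.0 and 0.0, so the mean is 1.0
```

Note that nothing is computed until `knn_predict` is called, which is exactly what "lazy" means here.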

### 📈 KNeighborsRegressor – Parameters

| Parameter       | Default       | Description |
|-----------------|---------------|-------------|
| `n_neighbors`   | `5`           | Number of neighbors to use for prediction. |
| `weights`       | `'uniform'`   | `'uniform'`: all neighbors have equal weight. `'distance'`: neighbors are weighted by the inverse of their distance, so closer neighbors have greater influence. |
| `algorithm`     | `'auto'`      | Algorithm for computing nearest neighbors: `'auto'`, `'ball_tree'`, `'kd_tree'`, `'brute'`. |
| `leaf_size`     | `30`          | Leaf size passed to `BallTree` or `KDTree`. Affects the speed of tree construction and queries, and the memory needed to store the tree. |
| `p`             | `2`           | Power parameter for the Minkowski distance. `p=1`: Manhattan, `p=2`: Euclidean. |
| `metric`        | `'minkowski'` | Metric used for distance computation. With the default `'minkowski'` and `p=2`, this is the standard Euclidean distance. |
| `metric_params` | `None`        | Additional keyword arguments for the metric function. |
| `n_jobs`        | `None`        | Number of parallel jobs for the neighbors search (`-1` to use all CPUs). |
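As a small illustration of the `weights` parameter, the sketch below fits two regressors on a three-point toy dataset (invented for this example) and compares their predictions for the same query point.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy data (illustrative only)
X_w = np.array([[0.0], [2.0], [3.0]])
y_w = np.array([0.0, 20.0, 30.0])
query_w = np.array([[2.25]])  # nearest points: 2.0 (distance 0.25) and 3.0 (distance 0.75)

uniform_knn = KNeighborsRegressor(n_neighbors=2, weights="uniform").fit(X_w, y_w)
distance_knn = KNeighborsRegressor(n_neighbors=2, weights="distance").fit(X_w, y_w)

uniform_pred = uniform_knn.predict(query_w)[0]    # plain mean: (20 + 30) / 2 = 25.0
distance_pred = distance_knn.predict(query_w)[0]  # inverse-distance weighted mean: 22.5

print(uniform_pred, distance_pred)
```

With `weights='distance'` the prediction moves toward the target of the closer neighbor (20.0), because that neighbor receives a weight of 1/0.25 versus 1/0.75 for the farther one.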


In [56]:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score

In [57]:
# Load the California housing dataset from sklearn.datasets
data = fetch_california_housing()

In [58]:
# split data into features and target
x = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

In [59]:
# display features
x.head()

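|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |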


In [60]:
# Display target names
target_name = data.target_names
target_name

['MedHouseVal']

In [61]:
# Display target values
target = pd.Series(data.target, name='MedHouseVal')
target

0        4.526
1        3.585
2        3.521
3        3.413
4        3.422
         ...  
20635    0.781
20636    0.771
20637    0.923
20638    0.847
20639    0.894
Name: MedHouseVal, Length: 20640, dtype: float64

In [62]:
# train-test split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [63]:
# scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
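KNN relies on distances, so features on larger scales would otherwise dominate the neighbor search. That is why the scaler is fit on the training split only and then applied to the test split. One possible way to keep these steps together is a scikit-learn pipeline. The sketch below uses synthetic data from `make_regression` (an assumption, so it runs without downloading the housing data), and the variable names are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (illustrative only)
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# fit() scales using training statistics only; predict() reuses those statistics
knn_pipeline = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=4))
knn_pipeline.fit(Xd_train, yd_train)
pipeline_pred = knn_pipeline.predict(Xd_test)

print(pipeline_pred[:3])
```

The pipeline performs the same `fit_transform` on the training data and `transform` on the test data as the manual version, but makes it harder to accidentally fit the scaler on test data.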

In [64]:
# model training
model = KNeighborsRegressor(n_neighbors=4)
model.fit(X_train_scaled, y_train)
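The value `n_neighbors=4` above is one choice among many. A common way to compare candidate values is cross-validation. The following is a hedged sketch using `GridSearchCV` on synthetic data; the candidate list, the step names `"scale"` and `"knn"`, and the dataset are assumptions made for illustration, not settings taken from this notebook.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only)
X_cv, y_cv = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Scaling lives inside the pipeline so each CV fold is scaled with its own training statistics
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsRegressor())])

candidate_k = [1, 2, 4, 8, 16]
search = GridSearchCV(
    pipe,
    param_grid={"knn__n_neighbors": candidate_k},
    cv=5,
    scoring="neg_mean_squared_error",  # higher (closer to zero) is better
)
search.fit(X_cv, y_cv)

best_k = search.best_params_["knn__n_neighbors"]
print("Best k:", best_k)
```

Small `k` tends to follow noise in the training data, while large `k` smooths predictions toward the overall mean, so the best value depends on the dataset.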

In [65]:
# predicting
y_pred = model.predict(X_test_scaled)

y_pred

array([0.4715  , 0.685   , 4.687505, ..., 5.00001 , 0.6995  , 1.9185  ])
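To see where a prediction comes from, a fitted regressor exposes `kneighbors`, which returns the distances and training-set indices of the nearest neighbors. The sketch below, on a small invented dataset, checks that the prediction equals the mean of the neighbors' targets (the default `weights='uniform'` behavior).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy data (illustrative only)
X_small = np.array([[0.0], [1.0], [2.0], [5.0], [9.0]])
y_small = np.array([0.0, 1.0, 4.0, 25.0, 81.0])

knn_small = KNeighborsRegressor(n_neighbors=2).fit(X_small, y_small)
query_point = np.array([[1.8]])

# distances and indices each have shape (n_queries, n_neighbors)
distances, indices = knn_small.kneighbors(query_point)
neighbor_mean = y_small[indices[0]].mean()

print(indices[0], distances[0])
print(neighbor_mean, knn_small.predict(query_point)[0])
```

Here the two nearest training points are `2.0` and `1.0`, with targets `4.0` and `1.0`, so both the manual mean and `predict` give `2.5`.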

In [66]:
# evaluate predictions on the held-out test set
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse:.3f}")
print(f"R² Score: {r2:.3f}")

Mean Squared Error (MSE): 0.447
R² Score: 0.659
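For reference, both metrics can be computed by hand. MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below uses small invented arrays and compares the manual results with scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values (illustrative only)
y_true_demo = np.array([3.0, 1.0, 2.0, 4.0])
y_pred_demo = np.array([2.5, 1.0, 2.0, 5.0])

residuals = y_true_demo - y_pred_demo
mse_manual = np.mean(residuals ** 2)                       # 1.25 / 4 = 0.3125

ss_res = np.sum(residuals ** 2)                            # residual sum of squares = 1.25
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)   # total sum of squares = 5.0
r2_manual = 1 - ss_res / ss_tot                            # 1 - 0.25 = 0.75

print(mse_manual, mean_squared_error(y_true_demo, y_pred_demo))
print(r2_manual, r2_score(y_true_demo, y_pred_demo))
```

Read this way, the R² of 0.659 reported above means the model's squared error on the test set is about 34% of what a constant prediction at the test-set mean would produce.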
