The Boston data set contains 506 samples of housing prices in Boston suburbs. The CSV file used here has 15 columns: the target, `Price`, plus 14 other columns that are used as features and describe aspects of the neighborhoods, such as crime rate, average number of rooms, and proximity to employment centers. The target variable is the median value of owner-occupied homes in thousands of dollars.

Link to the Boston data set:

https://www.kaggle.com/datasets/avish5787/boston-data-set?select=boston.csv

To improve the model's performance, we tuned the hyperparameters with GridSearchCV. The best parameters found were {'algorithm': 'auto', 'n_neighbors': 3, 'p': 1, 'weights': 'distance'}.
We also scaled the features with StandardScaler() and computed a feature selection with SelectKBest(score_func=f_regression, k=5). Note that in the code below, the grid search and the final model are fit on the full scaled feature set rather than on the five selected features, so the test-set R-squared of 0.738 reflects scaling and parameter tuning only.

### Step 1: Import Required Libraries

In [12]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

import warnings
warnings.filterwarnings('ignore')

### Step 2: Load Dataset

In [13]:
# Load the Boston Housing dataset
boston = pd.read_csv("boston.csv")
X = boston.drop(columns=['Price'])
y = boston.Price

In [14]:
boston.shape

(506, 15)

### Step 3: Split Dataset

In [15]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
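
As a quick sanity check on the split sizes, the sketch below works through the arithmetic for an 80/20 split of 506 rows. It assumes the test count is rounded up and the remaining rows go to training, which is how scikit-learn documents its handling of a fractional `test_size`. The variable names are illustrative.

```python
import math

n_samples = 506   # rows in boston.csv, from boston.shape
test_size = 0.2   # fraction passed to train_test_split

# Assumption: the test count is rounded up; the remainder is used for training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

Comparing these numbers with `X_train.shape` and `X_test.shape` confirms that the split behaved as expected.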


### Step 4: Feature Scaling

In [16]:
# Feature scaling: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
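
To make the fit/transform distinction concrete, here is a minimal pure-Python sketch of standardization for a single feature. The values are hypothetical. The point is that the mean and standard deviation come from the training data only and are then reused for the test data.

```python
from statistics import fmean, pstdev

# Hypothetical values of one feature, for illustration only.
train_values = [2.0, 4.0, 6.0, 8.0]
test_values = [5.0, 10.0]

# "Fit": compute the statistics on the training data only.
mu = fmean(train_values)
sigma = pstdev(train_values)  # population standard deviation (ddof=0)

def standardize(values):
    """Apply z = (x - mu) / sigma using the training statistics."""
    return [(x - mu) / sigma for x in values]

# "Transform": the same mu and sigma are applied to both sets.
train_scaled = standardize(train_values)
test_scaled = standardize(test_values)

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and unit variance. The scaled test values generally do not, which is expected.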

### Step 5: Feature Selection

In [17]:
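# Feature selection: keep the 5 features with the highest f_regression scores
selector = SelectKBest(score_func=f_regression, k=5)
X_train_selected = selector.fit_transform(X_train_scaled, y_train)
X_test_selected = selector.transform(X_test_scaled)
# Note: the later steps fit and evaluate on X_train_scaled / X_test_scaled,
# so X_train_selected and X_test_selected are not used by the final model.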
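
For intuition about what the selector ranks, the sketch below computes a univariate F-statistic from the Pearson correlation between each feature and the target, then keeps the top k. This is a simplified illustration on hypothetical toy data, not the library implementation. The helper `f_score` and the feature names are made up for the example.

```python
def f_score(x, y):
    """Univariate F-statistic derived from the Pearson correlation of x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r_squared = sxy ** 2 / (sxx * syy)
    # n - 2 degrees of freedom: one slope and one intercept are estimated.
    return r_squared / (1.0 - r_squared) * (n - 2)

# Hypothetical toy data: three feature columns and one target.
features = {
    "strong": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "weak": [2.0, 1.0, 4.0, 3.0, 6.0, 4.0],
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0, 1.0],
}
target = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1]

scores = {name: f_score(col, target) for name, col in features.items()}

# Keep the k highest-scoring features.
k = 2
selected = sorted(scores, key=scores.get, reverse=True)[:k]
print(selected)
```

Because each feature is scored on its own, this kind of selection cannot detect features that are only useful in combination.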

### Step 6: Parameter tuning with GridSearchCV

In [18]:
# Parameter tuning with GridSearchCV (5-fold cross-validation; default scoring is R-squared)
param_grid = {
    'n_neighbors': [3, 5, 7, 9, 11],
    'weights': ['uniform', 'distance'],
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],  # neighbor-search method; mainly affects speed
    'p': [1, 2]  # p=1 for Manhattan distance, p=2 for Euclidean distance
}
grid_search = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5)
grid_search.fit(X_train_scaled, y_train)
best_params = grid_search.best_params_
best_params

{'algorithm': 'auto', 'n_neighbors': 3, 'p': 1, 'weights': 'distance'}
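
One possible variation, which the notebook above does not use, is to wrap scaling, feature selection, and the regressor in a `Pipeline`. The preprocessing steps are then refit inside each cross-validation fold, and the number of selected features can be tuned too. The sketch below runs on synthetic stand-in data from `make_regression`, so its output will not match the results reported here. The names `X_demo`, `pipe`, and `pipe_grid` are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with X and y loaded from boston.csv.
X_demo, y_demo = make_regression(n_samples=200, n_features=14, n_informative=5,
                                 noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Each step is refit on the training part of every cross-validation fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_regression)),
    ("knn", KNeighborsRegressor()),
])

# The "<step>__<parameter>" prefix routes each value to the matching pipeline step.
pipe_grid = {
    "select__k": [5, "all"],
    "knn__n_neighbors": [3, 5, 7],
    "knn__weights": ["uniform", "distance"],
    "knn__p": [1, 2],
}

search = GridSearchCV(pipe, pipe_grid, cv=5)
search.fit(X_tr, y_tr)

print(search.best_params_)
print("Test R-squared:", search.score(X_te, y_te))
```

With this layout, the fitted `search` object can be applied directly to unscaled test data, since the pipeline does the scaling and selection itself.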

### Step 7: Create k-NN model with best parameters

In [19]:
# Create the k-NN model with the best parameters
regressor = KNeighborsRegressor(**best_params)

### Step 8: Train the Model

In [20]:
# Train the model
regressor.fit(X_train_scaled, y_train)

### Step 9: Make Predictions

In [21]:
# Make predictions
y_pred = regressor.predict(X_test_scaled)
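
To illustrate what a prediction with `n_neighbors=3`, `p=1`, and `weights='distance'` amounts to, here is a minimal pure-Python sketch of distance-weighted k-NN regression. It is a simplified illustration on hypothetical data, not the scikit-learn implementation. The function `knn_predict` and the toy values are assumptions for the example.

```python
def knn_predict(train_X, train_y, query, k=3):
    """Distance-weighted k-NN regression with Manhattan (p=1) distance."""
    # Manhattan distance from the query to every training point.
    dists = [sum(abs(a - b) for a, b in zip(row, query)) for row in train_X]
    # Indices of the k nearest training points.
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]

    # Exact matches take all the weight, which also avoids dividing by zero.
    exact = [i for i in nearest if dists[i] == 0.0]
    if exact:
        return sum(train_y[i] for i in exact) / len(exact)

    # Otherwise each neighbor is weighted by the inverse of its distance.
    weights = [1.0 / dists[i] for i in nearest]
    return sum(w * train_y[i] for w, i in zip(weights, nearest)) / sum(weights)

# Hypothetical scaled feature rows and prices, for illustration only.
toy_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
toy_y = [20.0, 24.0, 30.0, 50.0]

prediction = knn_predict(toy_X, toy_y, [0.5, 0.0])
print(prediction)
```

Closer neighbors pull the prediction toward their own target values. Here the two points at distance 0.5 dominate the third neighbor at distance 2.5.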

### Step 10: Evaluate the model

In [22]:
# Evaluate the model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5  # square root of the MSE; the squared=False argument is deprecated in recent scikit-learn releases
r2 = r2_score(y_test, y_pred)

# Print evaluation metrics
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
print("R-squared:", r2)

Mean Absolute Error: 2.5416224810186936
Mean Squared Error: 19.21622453179004
Root Mean Squared Error: 4.38363143201958
R-squared: 0.7379621819076296
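
The four metrics can also be written out by hand, which makes their relationship explicit. The sketch below uses hypothetical values rather than the notebook's predictions. The helper `regression_metrics` is an assumption for the example.

```python
import math

def regression_metrics(y_true, y_hat):
    """Return MAE, MSE, RMSE, and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_hat)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R-squared compares the squared error with that of always predicting the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, rmse, r2

# Hypothetical prices in thousands of dollars, for illustration only.
actual = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 37.0]

mae_demo, mse_demo, rmse_demo, r2_demo = regression_metrics(actual, predicted)
print(mae_demo, mse_demo, rmse_demo, r2_demo)
```

The same relationship holds for the results above: the reported RMSE of about 4.384 is the square root of the reported MSE of about 19.216. Since the target is in thousands of dollars, that corresponds to a typical error of roughly $4,400.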
