## Performing Regression using SVR and KNN on the Toyota Corolla dataset

In [1]:
# Importing the required libraries
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

In [2]:
# Loading the dataset
df = pd.read_csv("E:\\Datas\\Toyoto_Corrola.csv")
df.head(10)

,Id,Model,Price,Age_08_04,KM,HP,Doors,Cylinders,Gears,Weight
0,1,TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors,13500,23,46986,90,3,4,5,1165
1,2,TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors,13750,23,72937,90,3,4,5,1165
2,3,ÊTOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors,13950,24,41711,90,3,4,5,1165
3,4,TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors,14950,26,48000,90,3,4,5,1165
4,5,TOYOTA Corolla 2.0 D4D HATCHB SOL 2/3-Doors,13750,30,38500,90,3,4,5,1170
5,6,TOYOTA Corolla 2.0 D4D HATCHB SOL 2/3-Doors,12950,32,61000,90,3,4,5,1170
6,7,ÊTOYOTA Corolla 2.0 D4D 90 3DR TERRA 2/3-Doors,16900,27,94612,90,3,4,5,1245
7,8,TOYOTA Corolla 2.0 D4D 90 3DR TERRA 2/3-Doors,18600,30,75889,90,3,4,5,1245
8,9,ÊTOYOTA Corolla 1800 T SPORT VVT I 2/3-Doors,21500,27,19700,192,3,4,5,1185
9,10,ÊTOYOTA Corolla 1.9 D HATCHB TERRA 2/3-Doors,12950,23,71138,69,3,4,5,1105


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1436 entries, 0 to 1435
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Id         1436 non-null   int64 
 1   Model      1436 non-null   object
 2   Price      1436 non-null   int64 
 3   Age_08_04  1436 non-null   int64 
 4   KM         1436 non-null   int64 
 5   HP         1436 non-null   int64 
 6   Doors      1436 non-null   int64 
 7   Cylinders  1436 non-null   int64 
 8   Gears      1436 non-null   int64 
 9   Weight     1436 non-null   int64 
dtypes: int64(9), object(1)
memory usage: 112.3+ KB


In [4]:
# Split the data into features and target
X = df.drop(['Price','Id','Model'], axis=1)
y = df['Price']

In [5]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
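
For reference, the split sizes can be worked out from the row count reported by `df.info()`. This is a small sketch, assuming scikit-learn rounds the test share up to a whole number of rows when `test_size` is a float; the variable names are illustrative and not part of the notebook.

```python
import math

n_samples = 1436      # rows reported by df.info()
test_size = 0.2       # fraction passed to train_test_split

# Assumption: a float test_size is rounded up to a whole number of rows,
# and the training set receives the remainder.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

Under that assumption, the test metrics reported later are computed on a little under 300 held-out cars, with the rest used for tuning and fitting.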

In [6]:
# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
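
To make the fit/transform distinction concrete, here is a hand-rolled sketch of what standardization does to a single column. It uses the first four `KM` values from `df.head()` as toy data; treating three of them as "training" rows and one as a "test" row is an illustrative choice, not the notebook's actual split.

```python
from statistics import fmean, pstdev

# First three KM values from df.head() act as a toy "training" column;
# the fourth is treated as unseen "test" data (illustrative choice).
km_train = [46986, 72937, 41711]
km_test = [48000]

mu = fmean(km_train)       # mean learned from the training values only
sigma = pstdev(km_train)   # population standard deviation (ddof=0)

# Both sets are shifted and scaled with the *training* statistics.
km_train_scaled = [(v - mu) / sigma for v in km_train]
km_test_scaled = [(v - mu) / sigma for v in km_test]

print(mu, round(sigma, 2))
print([round(v, 3) for v in km_train_scaled])
print([round(v, 3) for v in km_test_scaled])
```

The scaled training values have mean 0 and standard deviation 1, while the test value is simply expressed on the same scale. This is why the test set goes through `transform` rather than `fit_transform`.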

# Support Vector Regression (SVR)

In [7]:
svr_model = SVR()
svr_param_grid = {'C': [0.1, 1, 10, 100],
                  'kernel': ['linear', 'rbf', 'poly'],
                  'gamma': ['scale', 'auto']}
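
It can help to know how many models this grid asks `GridSearchCV` to fit. The sketch below counts the combinations with a hypothetical helper, `count_candidates`, and compares the notebook's grid with an alternative list-of-dicts layout (an assumption, not what the notebook uses) in which `gamma` is varied only for the kernels that use it, since the linear kernel ignores `gamma`.

```python
from itertools import product

def count_candidates(grid):
    """Count parameter combinations in a dict grid or a list of dict grids."""
    grids = grid if isinstance(grid, list) else [grid]
    return sum(len(list(product(*g.values()))) for g in grids)

# The grid used in the notebook.
full_grid = {'C': [0.1, 1, 10, 100],
             'kernel': ['linear', 'rbf', 'poly'],
             'gamma': ['scale', 'auto']}

# Alternative layout (assumption): gamma is searched only for rbf and poly.
split_grid = [{'C': [0.1, 1, 10, 100], 'kernel': ['linear']},
              {'C': [0.1, 1, 10, 100], 'kernel': ['rbf', 'poly'],
               'gamma': ['scale', 'auto']}]

cv_folds = 5  # matches cv=5 in the grid search
for name, grid in [('full', full_grid), ('split', split_grid)]:
    n = count_candidates(grid)
    print(name, n, 'candidates,', n * cv_folds, 'cross-validation fits')
```

The saving is small here, but the same layout avoids duplicated work when a grid grows.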

In [8]:
# Hyperparameter tuning
svr_grid_search = GridSearchCV(svr_model, svr_param_grid, cv=5, scoring='r2')
svr_grid_search.fit(X_train_scaled, y_train)
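
One caveat about this setup: the scaler was fitted on the whole training set before cross-validation, so each validation fold has already influenced the scaling statistics. A common way to avoid this is to put the scaler and the model in a `Pipeline` and search over that. The sketch below shows the pattern on synthetic stand-in data from `make_regression`, not on the Toyota file. The names `pipe`, `demo_grid` and `search`, and the reduced grid, are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 7 features, like X in the notebook.
X_demo, y_demo = make_regression(n_samples=120, n_features=7,
                                 noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.2, random_state=42)

# The scaler is refitted inside every cross-validation fold.
pipe = Pipeline([('scaler', StandardScaler()), ('svr', SVR())])

# Pipeline parameters are addressed as <step name>__<parameter>.
demo_grid = {'svr__C': [1, 10, 100], 'svr__kernel': ['linear', 'rbf']}

search = GridSearchCV(pipe, demo_grid, cv=5, scoring='r2')
search.fit(X_tr, y_tr)          # raw features in; scaling happens inside

test_r2 = search.score(X_te, y_te)
print(search.best_params_, round(test_r2, 4))
```

With this pattern the separate `X_train_scaled` and `X_test_scaled` arrays are no longer needed, because the fitted pipeline scales whatever it is given.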

In [9]:
# Best parameters and Best Score
print("SVR Best Parameters:", svr_grid_search.best_params_)
print("SVR Best Score:", svr_grid_search.best_score_)

SVR Best Parameters: {'C': 100, 'gamma': 'scale', 'kernel': 'linear'}
SVR Best Score: 0.8569554414376158


In [11]:
# Evaluate the SVR model
svr_predictions = svr_grid_search.predict(X_test_scaled)

# `squared=False` is deprecated since scikit-learn 1.4, so take the square root of the MSE explicitly
svr_rmse = mean_squared_error(y_test, svr_predictions) ** 0.5
svr_r2 = r2_score(y_test, svr_predictions)

print(f"SVR Root Mean Squared Error: {svr_rmse:.4f}")
print(f"SVR R-squared: {svr_r2:.4f}")

SVR Root Mean Squared Error: 1442.1477
SVR R-squared: 0.8441
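
For readers who want to see what these two metrics measure, here is a minimal from-scratch sketch. The helper names `rmse` and `r_squared` and the tiny example vectors are illustrative only; they are not taken from the dataset.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(y_true, y_pred):
    """Share of the target's variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total
    return 1 - ss_res / ss_tot

# Toy values (assumption), chosen so the arithmetic is easy to follow.
toy_true = [3, 5, 7, 9]
toy_pred = [2, 5, 8, 9]

print(round(rmse(toy_true, toy_pred), 4))
print(round(r_squared(toy_true, toy_pred), 4))
```

RMSE is in the same units as `Price`, so the value above can be read as a typical error in price units. R-squared is unitless and compares the model against always predicting the mean.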


# K-Nearest Neighbors (KNN)

In [12]:
knn_model = KNeighborsRegressor()
knn_param_grid = {'n_neighbors': [3, 5, 7, 9],
                  'weights': ['uniform', 'distance']}
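
The two `weights` options differ in how the neighbours' prices are averaged. The sketch below is a simplified, hypothetical `knn_predict` written for illustration. It is not scikit-learn's implementation, but it follows the same idea: `'uniform'` takes a plain mean, while `'distance'` weights each neighbour by the inverse of its distance.

```python
import math

def knn_predict(train_X, train_y, query, k, weights='uniform'):
    """Predict one target value from the k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = [math.dist(row, query) for row in train_X]
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]

    if weights == 'uniform':
        return sum(train_y[i] for i in nearest) / len(nearest)

    # An exact match would get an infinite weight; average exact matches.
    exact = [train_y[i] for i in nearest if dists[i] == 0]
    if exact:
        return sum(exact) / len(exact)

    inv = [1 / dists[i] for i in nearest]
    return sum(w * train_y[i] for w, i in zip(inv, nearest)) / sum(inv)

# Toy one-feature data (assumption), not taken from the dataset.
toy_X = [[0.0], [1.0], [4.0]]
toy_y = [10.0, 20.0, 50.0]

print(knn_predict(toy_X, toy_y, [3.0], k=2, weights='uniform'))
print(knn_predict(toy_X, toy_y, [3.0], k=2, weights='distance'))
```

In the toy example the closer neighbour pulls the distance-weighted prediction toward its own value, which is the behaviour the grid search compares against the plain average.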

In [13]:
# Hyperparameter tuning
knn_grid_search = GridSearchCV(knn_model, knn_param_grid, cv=5, scoring='r2') 
knn_grid_search.fit(X_train_scaled, y_train)

In [14]:
# Best parameters and Best Score
print("KNN Best Parameters:", knn_grid_search.best_params_)
print("KNN Best Score:", knn_grid_search.best_score_)

KNN Best Parameters: {'n_neighbors': 9, 'weights': 'distance'}
KNN Best Score: 0.873401539612806


In [15]:
# Evaluate the KNN model
knn_predictions = knn_grid_search.predict(X_test_scaled)

# `squared=False` is deprecated since scikit-learn 1.4, so take the square root of the MSE explicitly
knn_rmse = mean_squared_error(y_test, knn_predictions) ** 0.5
knn_r2 = r2_score(y_test, knn_predictions)

print(f"KNN Root Mean Squared Error: {knn_rmse:.4f}")
print(f"KNN R-squared: {knn_r2:.4f}")

KNN Root Mean Squared Error: 1244.1695
KNN R-squared: 0.8840


#### The Support Vector Regression (SVR) model, tuned to {'C': 100, 'gamma': 'scale', 'kernel': 'linear'}, reached an R-squared of 0.8441 and an RMSE of 1442.15 on the test set.
#### The k-Nearest Neighbors (KNN) model, tuned to {'n_neighbors': 9, 'weights': 'distance'}, did better on the same split, with an R-squared of 0.8840 and an RMSE of 1244.17, about 198 lower than SVR's.
#### On this single 80/20 split, KNN predicts Toyota Corolla prices more accurately than SVR, and the cross-validated R-squared scores (0.8734 for KNN versus 0.8570 for SVR) point the same way.