Q1. To predict house prices from several characteristics, such as location, square footage, and number of bedrooms, you are developing an SVM regression model. Which regression metric would be the best to use in this situation?

Dataset link: https://drive.google.com/file/d/1Z9oLpmt6IDRNw7IeNcHYTGeJRYypRSC0/view?usp=share_link

Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price of a house as accurately as possible?

Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate regression metric to use with your SVM model. Which metric would be the most appropriate in this scenario?

Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values are very close. Which metric should you choose to use in this case?

Q5. You are comparing the performance of different SVM regression models using different kernels (linear, polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most appropriate if your goal is to measure how well the model explains the variance in the target variable?

### Answers

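1. For an SVM regression model that predicts house prices from characteristics such as location, square footage, and number of bedrooms, Mean Squared Error (MSE) is a reasonable default metric. It averages the squared differences between predicted and actual prices, so large mispricings are penalized heavily. Taking its square root (RMSE) expresses the error in the same units as the price.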

2. If your goal is to predict the actual price of a house as accurately as possible, the more appropriate evaluation metric is Mean Squared Error (MSE). MSE measures the size of the prediction errors directly (in squared price units) and penalizes large errors heavily. R-squared, by contrast, is a relative measure of explained variance and does not tell you how far the predictions are from the true prices in absolute terms.

3. If your dataset has a significant number of outliers, the most appropriate metric is Median Absolute Error (MedAE; not to be confused with Mean Absolute Error, which is usually abbreviated MAE). MedAE is the median of the absolute differences between the predicted and actual values, so a few extreme errors barely change it. MSE and R-squared are both built on squared errors, which makes them far more sensitive to outliers.

4. If you have built an SVM regression model using a polynomial kernel and the MSE and RMSE values are very close, it is advisable to report RMSE (Root Mean Squared Error). RMSE is the square root of MSE, so the two always rank models in the same order, and their values are numerically close only when the MSE is near 1. The advantage of RMSE is interpretability: it expresses the typical magnitude of the errors in the same unit as the target variable, whereas MSE is in squared units.

5. When comparing SVM regression models with different kernels (linear, polynomial, and RBF), the most appropriate metric for measuring how well a model explains the variance in the target variable is R-squared (coefficient of determination). R-squared quantifies the proportion of the variance in the dependent variable that is predictable from the independent variables. Because it is unitless, it can be compared directly across kernels evaluated on the same test set; a value closer to 1 indicates that the model explains more of the variance.
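
To make the differences between these metrics concrete, here is a minimal sketch that computes all of them on a small set of made-up prices, once as-is and once with a single extreme value injected. The arrays and the `regression_report` helper are hypothetical and exist only for illustration; the metric functions are the standard ones from `sklearn.metrics`.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
    r2_score,
)


def regression_report(y_true, y_pred):
    """Return the regression metrics discussed in the answers as a dict."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # same units as the target
        "MAE": mean_absolute_error(y_true, y_pred),
        "MedAE": median_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }


# Hypothetical prices, made up for this illustration
y_true = np.array([40.0, 60.0, 80.0, 100.0, 120.0])
y_pred = np.array([42.0, 58.0, 83.0, 97.0, 121.0])
clean = regression_report(y_true, y_pred)

# Same predictions, but the last true price is replaced by an extreme value
y_true_outlier = y_true.copy()
y_true_outlier[-1] = 1200.0
with_outlier = regression_report(y_true_outlier, y_pred)

for name in clean:
    print(f"{name:>5}: clean={clean[name]:.3f}  with outlier={with_outlier[name]:.3f}")
```

In this toy example the single outlier inflates MSE by several orders of magnitude, while the median absolute error changes only slightly, which is the robustness property described in answer 3.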


In [1]:
import pandas as pd
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load the Bengaluru house price dataset (adjust the path to your local copy of the CSV)
data = pd.read_csv('/Users/aakanksha/My_Codes/data-science-master-course/data/Bengaluru_House_Data.csv')

# Drop features that are not used in this model
data = data.drop(['size', 'bath', 'availability', 'society', 'balcony'], axis=1)

# Fill missing 'location' values with the most frequent location
data['location'] = data['location'].fillna(data['location'].mode()[0])

# One-hot encode the categorical 'area_type' column
data = pd.get_dummies(data, columns=['area_type'])

data.head()



| | location | total_sqft | price | area_type_Built-up Area | area_type_Carpet Area | area_type_Plot Area | area_type_Super built-up Area |
|---|---|---|---|---|---|---|---|
| 0 | Electronic City Phase II | 1056 | 39.07 | False | False | False | True |
| 1 | Chikka Tirupathi | 2600 | 120.0 | False | False | True | False |
| 2 | Uttarahalli | 1440 | 62.0 | True | False | False | False |
| 3 | Lingadheeranahalli | 1521 | 95.0 | False | False | False | True |
| 4 | Kothanur | 1200 | 51.0 | False | False | False | True |
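
The rows shown by `head()` look numeric, but a column such as `total_sqft` may still be stored as text if some entries are ranges or carry unit labels; checking `data['total_sqft'].dtype` would confirm this. The sketch below assumes that situation and shows one possible way to coerce such values to floats. The `parse_sqft` helper and the sample values are hypothetical, and replacing a range with its midpoint is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd


def parse_sqft(value):
    """Convert a total_sqft entry to a float, or NaN if it cannot be parsed.

    Hypothetical helper: handles plain numbers and 'low - high' ranges
    (a range is replaced by its midpoint).
    """
    text = str(value).strip()
    parts = text.split(" - ")
    try:
        if len(parts) == 2:
            return (float(parts[0]) + float(parts[1])) / 2.0
        return float(text)
    except ValueError:
        return np.nan


# Made-up sample values for illustration, not rows taken from the dataset
sample = pd.Series(["1056", "2100 - 2850", "34.46Sq. Meter", 1200])
parsed = sample.apply(parse_sqft)
print(parsed)
```

Entries that cannot be parsed become `NaN`, so they can be dropped or imputed later instead of causing an error when the model is fitted.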


In [2]:
# Split the data into input features (X) and the target variable (price)

X = data.drop('price', axis=1)
y = data['price']

# Split the data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train.shape, X_test.shape, y_train.shape, y_test.shape


((10656, 6), (2664, 6), (10656,), (2664,))

In [None]:
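# Import the remaining metrics (r2_score lives in sklearn.metrics, not sklearn.base)
from sklearn.metrics import median_absolute_error, r2_score

# Create an instance of the SVR model (default RBF kernel)
model = SVR()

# Train the model on the training data.
# Note: SVR requires numeric features, so the text column 'location' (and any
# non-numeric 'total_sqft' entries) must be encoded or converted before fit() succeeds.
model.fit(X_train, y_train)

# Predict the house prices on the testing data
y_pred = model.predict(X_test)

# Calculate the Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("MSE:", mse)

# Calculate the Median Absolute Error
medae = median_absolute_error(y_test, y_pred)
print("MedAE:", medae)

# Calculate R-squared (coefficient of determination)
r2 = r2_score(y_test, y_pred)
print("R2:", r2)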
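
Since `SVR` only accepts numeric input, one way to handle a text column such as `location` is to wrap the encoding and the model in a single scikit-learn pipeline. The following is a minimal sketch under stated assumptions, not the notebook's actual run: the data is synthetic (made-up locations, areas, and prices), the `make_svr_pipeline` helper is hypothetical, and the one-hot `area_type` columns are left out for brevity. It also compares the three kernels from Q5 using the metrics discussed in the answers.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, median_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the notebook's data (made-up values, not the real dataset)
rng = np.random.default_rng(42)
n = 300
locations = rng.choice(["A", "B", "C"], size=n)
sqft = rng.uniform(500, 3000, size=n)
premium = pd.Series(locations).map({"A": 0.0, "B": 20.0, "C": 40.0}).to_numpy()
price = 0.05 * sqft + premium + rng.normal(0, 5, size=n)

X = pd.DataFrame({"location": locations, "total_sqft": sqft})
y = pd.Series(price, name="price")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def make_svr_pipeline(kernel):
    """One-hot encode 'location', impute and scale 'total_sqft', then apply SVR."""
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    preprocess = ColumnTransformer([
        ("loc", OneHotEncoder(handle_unknown="ignore"), ["location"]),
        ("sqft", numeric, ["total_sqft"]),
    ])
    return make_pipeline(preprocess, SVR(kernel=kernel))


results = {}
for kernel in ["linear", "poly", "rbf"]:
    model = make_svr_pipeline(kernel).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    results[kernel] = {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MedAE": median_absolute_error(y_test, y_pred),
        "R2": r2_score(y_test, y_pred),
    }
    print(kernel, {k: round(v, 3) for k, v in results[kernel].items()})
```

Keeping the preprocessing inside the pipeline means the encoder and scaler are fitted on the training split only, and `handle_unknown="ignore"` lets the model score test rows whose location never appeared in training. The numbers printed here describe the synthetic data only and say nothing about performance on the real dataset.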