# Support Vector Machines-3

#### Q1. In order to predict house price based on several characteristics, such as location, square footage, number of bedrooms, etc., you are developing an SVM regression model. Which regression metric in this situation would be the best to employ?
#### Dataset link:
https://drive.google.com/file/d/1Z9oLpmt6IDRNw7IeNcHYTGeJRYypRSC0/view?


In [21]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

df = pd.read_csv("Bengaluru_House_Data.csv")
# mode() returns a Series, so take its first value to fill the missing balcony counts
df['balcony'] = df['balcony'].fillna(df['balcony'].mode()[0])
df.tail()

| Unnamed: 0 | area_type | availability | location | size | society | total_sqft | bath | balcony | price |
|---|---|---|---|---|---|---|---|---|---|
| 918 | Plot Area | Ready To Move | Kadugondanahalli | 4 Bedroom | | 915 | 4.0 | 1.0 | 54.9 |
| 919 | Plot Area | Ready To Move | 2nd Stage Nagarbhavi | 6 Bedroom | | 3000 | 8.0 | 3.0 | 451.0 |
| 920 | Super built-up Area | Ready To Move | Seegehalli | 3 BHK | | 1150 | 2.0 | 2.0 | 42.9 |
| 921 | Built-up Area | Ready To Move | Hebbal | 4 BHK | RMudeat | 3900 | 4.0 | 2.0 | 410.0 |
| 922 | Built-up Area | Ready To Move | Yeshwanthpur | 5 Bedroom | | 850 | 4.0 | 2.0 | 90.0 |


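##### Split and train

In [None]:
# SVR expects a 2-D feature matrix, so select the column with double brackets
X_train, X_test, y_train, y_test = train_test_split(df[['balcony']], df['price'], test_size=0.50, random_state=42)
svr_model = SVR(kernel='linear')
svr_model.fit(X_train, y_train)
# Predict with the fitted estimator (svr_model, not an undefined `model`)
predictions = svr_model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("Mean squared error:", mse)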

#### Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price of a house as accurately as possible?

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

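# Load the dataset
data = pd.read_csv('Bengaluru_House_Data.csv')

# SVR needs numeric inputs without missing values, so keep two numeric
# columns (an illustrative choice) and drop incomplete rows
data = data[['bath', 'balcony', 'price']].dropna()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data[['bath', 'balcony']], data['price'], test_size=0.2, random_state=42)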

# Train an SVM regression model
model = SVR(kernel='linear')
model.fit(X_train, y_train)

# Evaluate the model on the testing set using MSE
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)

print("Mean squared error:", mse)

If the goal is to predict the actual price of a house as accurately as possible for the given dataset, then the Mean Squared Error (MSE) would be more appropriate than R-squared as the evaluation metric for the SVM regression model.

MSE measures the average of the squared differences between predicted and actual values, and it is commonly used in regression analysis to assess the performance of a predictive model. Because the errors are squared, large errors are penalized much more heavily than small ones, which matters when a large miss on a high-value asset like a house is costly. A lower MSE therefore directly indicates predictions that sit closer to the actual prices.

On the other hand, R-squared measures the proportion of the variance in the target variable (i.e., house price) that is explained by the independent variables (i.e., location, square footage, number of bedrooms, etc.). While R-squared can be useful in understanding how well the independent variables are explaining the variation in the target variable, it may not be the best metric for evaluating the model's ability to predict house prices accurately.

We can develop an SVM regression model using the features provided in the dataset and evaluate its performance using MSE as the regression metric. We can split the dataset into a training set and a testing set using an 80/20 split, train the SVM model on the training set, and then evaluate the model's performance on the testing set using MSE.
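
The difference between the two metrics can be illustrated with a small sketch. The prices below are made up for illustration (they are not taken from the dataset), and `mse` and `r_squared` are hypothetical helper functions written out by hand; scikit-learn provides `mean_squared_error` and `r2_score` for the same purpose. Rescaling the prices changes the MSE but leaves R-squared unchanged, which suggests why MSE is the more direct measure of how far predictions are from actual prices.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up prices, not taken from the Bengaluru dataset
actual = [50.0, 60.0, 70.0, 80.0]
predicted = [55.0, 55.0, 75.0, 75.0]

# The same prices with every number multiplied by 10
actual_x10 = [10 * v for v in actual]
predicted_x10 = [10 * v for v in predicted]

mse_small = mse(actual, predicted)
mse_large = mse(actual_x10, predicted_x10)
r2_small = r_squared(actual, predicted)
r2_large = r_squared(actual_x10, predicted_x10)

print("MSE:", mse_small, "R-squared:", r2_small)
print("MSE (x10 scale):", mse_large, "R-squared (x10 scale):", r2_large)
```

The MSE grows by a factor of 100 when the prices are scaled by 10, while R-squared stays the same, so only MSE reflects the size of the errors in price terms.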


#### Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate regression metric to use with your SVM model. Which metric would be the most appropriate in this scenario?

#### Ans.
If the dataset has a significant number of outliers, then using the Mean Absolute Error (MAE) as the regression metric would be the most appropriate in this scenario with an SVM model.

MAE measures the average of the absolute differences between predicted and actual values. It is less sensitive to outliers compared to the Mean Squared Error (MSE) metric, as it does not square the errors. When a dataset has outliers, it can lead to very large errors that can skew the evaluation of the model's performance. In this case, using MAE as the regression metric can provide a more robust evaluation of the model's performance by reducing the impact of outliers on the overall evaluation.

Here is some sample Python code to implement an SVM model using MAE as the regression metric:

In [None]:
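import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Load the dataset
data = pd.read_csv('Bengaluru_House_Data.csv')

# SVR needs numeric inputs without missing values, so keep two numeric
# columns (an illustrative choice) and drop incomplete rows
data = data[['bath', 'balcony', 'price']].dropna()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data[['bath', 'balcony']], data['price'], test_size=0.2, random_state=42)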

# Train an SVM regression model
model = SVR(kernel='linear')
model.fit(X_train, y_train)

# Evaluate the model on the testing set using MAE
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)

print("Mean absolute error:", mae)
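
To make the robustness argument concrete, here is a minimal sketch with made-up numbers (not taken from the dataset). The helpers `mae` and `mse` are hypothetical, hand-written versions of the two metrics; scikit-learn's `mean_absolute_error` and `mean_squared_error` compute the same quantities. The second target list differs from the first only in its last value, which plays the role of an outlier.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: the same predictions scored against two target lists
predicted = [98.0, 121.0, 137.0, 162.0]
actual_clean = [100.0, 120.0, 140.0, 160.0]
actual_outlier = [100.0, 120.0, 140.0, 260.0]  # last value is an outlier

mae_clean = mae(actual_clean, predicted)
mse_clean = mse(actual_clean, predicted)
mae_outlier = mae(actual_outlier, predicted)
mse_outlier = mse(actual_outlier, predicted)

print("Clean   -> MAE:", mae_clean, "MSE:", mse_clean)
print("Outlier -> MAE:", mae_outlier, "MSE:", mse_outlier)
print("MAE grew by a factor of", mae_outlier / mae_clean)
print("MSE grew by a factor of", mse_outlier / mse_clean)
```

In this toy case a single outlier multiplies the MAE by 13 but multiplies the MSE by several hundred, because squaring amplifies the one large residual.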

#### Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values are very close. Which metric should you choose to use in this case?

#### Ans.
If both the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) values are very close, either metric can be used to evaluate the performance of the SVM regression model using a polynomial kernel.

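MSE and RMSE are both commonly used regression metrics, and both summarize the size of the differences between the predicted and actual values. MSE averages the squared errors, so it penalizes larger errors more heavily than smaller ones. RMSE is simply the square root of MSE, so the two metrics always rank models in the same order. Their values are numerically close only when the MSE is near 1 (or both are near 0), which depends on the units of the target variable.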

In some cases, RMSE is preferred over MSE because it has the same unit as the target variable, which makes it more interpretable. However, if the difference between the two metrics is negligible, then it may not matter which one is used.

Therefore, in this case, you can choose either MSE or RMSE to evaluate the performance of the SVM regression model using a polynomial kernel, as both metrics are very close and provide a similar evaluation of the model's performance.
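
A short sketch of why the choice does not affect model selection: since RMSE is the square root of MSE and the square root is an increasing function, ordering models by either metric gives the same result. The MSE values and model names below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical test-set MSE values for three models (illustrative numbers only)
mse_by_model = {"linear": 1.21, "poly": 0.81, "rbf": 1.0}

# RMSE is the square root of MSE
rmse_by_model = {name: math.sqrt(value) for name, value in mse_by_model.items()}

# Rank the models from best (lowest error) to worst under each metric
rank_by_mse = sorted(mse_by_model, key=mse_by_model.get)
rank_by_rmse = sorted(rmse_by_model, key=rmse_by_model.get)

print("Ranking by MSE: ", rank_by_mse)
print("Ranking by RMSE:", rank_by_rmse)
```

Both rankings are identical, and the values here are close to each other only because the MSEs were chosen near 1. For reporting, RMSE is usually the easier number to read because it is in the same units as the target.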

#### Q5. You are comparing the performance of different SVM regression models using different kernels (linear, polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most appropriate if your goal is to measure how well the model explains the variance in the target variable?

#### Ans.
If your goal is to measure how well the model explains the variance in the target variable, the most appropriate evaluation metric to use would be the coefficient of determination or R-squared.

R-squared measures the proportion of variance in the target variable that is explained by the independent variables in the model. A value of 1 indicates that the model explains all of the variance in the target variable, and 0 indicates that the model does no better than always predicting the mean of the target. On held-out test data the value can also be negative, which means the model performs worse than that mean baseline.

When comparing the performance of different SVM regression models using different kernels, R-squared is a useful metric because it provides a measure of the goodness of fit of the model relative to a simple baseline model that always predicts the mean value of the target variable.

Here is some sample Python code to implement an SVM regression model using different kernels and R-squared as the evaluation metric:

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import r2_score

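# Load the dataset
data = pd.read_csv('Bengaluru_House_Data.csv')

# SVR needs numeric inputs without missing values, so keep two numeric
# columns (an illustrative choice) and drop incomplete rows
data = data[['bath', 'balcony', 'price']].dropna()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data[['bath', 'balcony']], data['price'], test_size=0.2, random_state=42)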

# Train SVM regression models with different kernels
linear_model = SVR(kernel='linear')
poly_model = SVR(kernel='poly', degree=3)
rbf_model = SVR(kernel='rbf')

linear_model.fit(X_train, y_train)
poly_model.fit(X_train, y_train)
rbf_model.fit(X_train, y_train)

# Evaluate the models on the testing set using R-squared
linear_r2 = r2_score(y_test, linear_model.predict(X_test))
poly_r2 = r2_score(y_test, poly_model.predict(X_test))
rbf_r2 = r2_score(y_test, rbf_model.predict(X_test))

print("Linear kernel R-squared:", linear_r2)
print("Polynomial kernel R-squared:", poly_r2)
print("RBF kernel R-squared:", rbf_r2)
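
The comparison with a mean-predicting baseline can be checked by hand. The sketch below uses made-up targets and a hypothetical helper `r_squared` that computes 1 - SS_res / SS_tot, the same quantity that `r2_score` reports. It scores three sets of predictions: perfect ones, the constant mean, and predictions that are worse than the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up target values, not taken from the dataset
actual = [10.0, 20.0, 30.0, 40.0]
mean_value = sum(actual) / len(actual)

perfect = list(actual)                  # every prediction is exact
baseline = [mean_value] * len(actual)   # always predict the mean
reversed_preds = actual[::-1]           # predictions that move the wrong way

r2_perfect = r_squared(actual, perfect)
r2_baseline = r_squared(actual, baseline)
r2_reversed = r_squared(actual, reversed_preds)

print("Perfect predictions:", r2_perfect)
print("Mean baseline:      ", r2_baseline)
print("Reversed predictions:", r2_reversed)
```

The baseline scores exactly 0 and the reversed predictions score below 0, so a kernel whose test R-squared is negative is doing worse than simply predicting the average price.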