In [33]:
# Q1. In order to predict house price based on several characteristics, such as location, square footage,
# number of bedrooms, etc., you are developing an SVM regression model.
# Import necessary libraries


In [34]:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score


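# Load the Bengaluru house price dataset
df = pd.read_csv('Bengaluru_House_Data.csv')


# Drop the categorical columns that are not used as features
df = df.drop(['area_type', 'availability', 'location', 'society'], axis=1)


# Strip the unit label from 'size' so only the room count (plus whitespace) remains
df["size"] = df["size"].str.replace("BHK", " ")
df["size"] = df["size"].str.replace("Bedroom", " ")
df["size"] = df["size"].str.replace("RK", " ")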


# Drop rows whose 'total_sqft' is a range or uses a unit other than square feet
non_sqft_tokens = ['-', 'Acre', 'Sq. Meter', 'Sq. Yards', 'Cents', '1Grounds', 'Guntha']
for token in non_sqft_tokens:
    df = df[~df['total_sqft'].str.contains(token)]


# Drop rows with any remaining missing values
df = df.dropna()


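# Separate the features from the target and hold out 20% of the rows for testing
X = df.drop(['price'], axis=1)
y = df['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)


# Fit an SVR with an RBF kernel and default hyperparameters
model = SVR(kernel='rbf')
model.fit(X_train, y_train)


# Evaluate on the held-out test set
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean squared error:", mse)
print("R2 score:", r2)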


Mean squared error: 11441.085799649498
R2 score: 0.33422930906948656


In [None]:
"""The best regression metric to employ here is the mean squared error (MSE).
MSE measures the average squared difference between the predicted values and the actual values. In this case, the predicted values
are the predicted house prices and the actual values are the true house prices. MSE is a good choice because it penalizes
large errors more heavily than small errors, which suits a problem where a large pricing error is likely to cost
much more than a small one.

Other useful regression metrics in this situation include mean absolute error (MAE), root mean squared error (RMSE),
and the R-squared score. RMSE carries the same information as MSE but is expressed in price units, MAE is less
sensitive to outliers, and R-squared reports the fraction of price variance explained rather than the size of the error."""
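
As a rough illustration of how the four metrics named above differ, the sketch below computes each one on a handful of made-up prices. The arrays `toy_true` and `toy_pred` are hypothetical values chosen for easy arithmetic; they are not taken from the Bengaluru dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted prices (illustrative only, not from the dataset)
toy_true = np.array([50.0, 80.0, 120.0, 200.0])
toy_pred = np.array([55.0, 70.0, 125.0, 160.0])

toy_mse = mean_squared_error(toy_true, toy_pred)   # average squared error
toy_mae = mean_absolute_error(toy_true, toy_pred)  # average absolute error
toy_rmse = np.sqrt(toy_mse)                        # MSE brought back to price units
toy_r2 = r2_score(toy_true, toy_pred)              # fraction of variance explained

print("MSE: ", toy_mse)
print("MAE: ", toy_mae)
print("RMSE:", toy_rmse)
print("R2:  ", toy_r2)
```

The single 40-unit miss accounts for most of the MSE, while it contributes to the MAE only in proportion to its size.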

In [None]:
# Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as
# your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price
# of a house as accurately as possible?
"""If the goal is to predict the actual price of a house as accurately as possible, the mean squared error (MSE) is the
more appropriate evaluation metric, rather than R-squared.

MSE measures the average squared difference between the predicted values and the actual values, so a lower MSE means
the predicted prices are, on average, closer to the true prices. Because the errors are squared, large errors are penalized
more heavily than small ones, which matters when a large pricing error is likely to cost much more than a small one.

R-squared, on the other hand, measures the proportion of variance in the dependent variable (house price) that is
explained by the independent variables (the features). It is a useful measure of how well the model fits the data, but
it is a relative, unitless score: it depends on how spread out the prices are and does not say how far off a predicted price is.

Therefore, if the primary goal is to predict the actual price of a house as accurately as possible, MSE is the more
appropriate evaluation metric."""
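
To make the contrast between the two metrics concrete, here is a minimal sketch in which the same prediction errors give an identical MSE but very different R-squared values, simply because the targets are spread differently. The arrays `narrow`, `wide`, and `errors` are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# The same prediction errors are applied to both sets of targets
errors = np.array([10.0, -10.0, 10.0, -10.0])

# Hypothetical prices: one tightly clustered set, one widely spread set
narrow = np.array([90.0, 100.0, 110.0, 100.0])
wide = np.array([20.0, 100.0, 180.0, 100.0])

mse_narrow = mean_squared_error(narrow, narrow + errors)
mse_wide = mean_squared_error(wide, wide + errors)
r2_narrow = r2_score(narrow, narrow + errors)
r2_wide = r2_score(wide, wide + errors)

print("MSE (narrow, wide):", mse_narrow, mse_wide)
print("R2  (narrow, wide):", r2_narrow, r2_wide)
```

The MSE reports the same error size in both cases, whereas R-squared changes with the variance of the targets.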






In [None]:
# Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate
# regression metric to use with your SVM model. Which metric would be the most appropriate in this
# scenario?


"""When a dataset contains a significant number of outliers, the most appropriate regression metric to use
with an SVM model is the mean absolute error (MAE).

Unlike the mean squared error (MSE), the MAE measures the average absolute difference between the predicted values and the
actual values. This makes the MAE less sensitive to outliers than the MSE, which squares each difference and so lets
a few extreme errors dominate the overall score.

Since a handful of outliers can dominate a squared-error score, a less sensitive metric like the MAE gives a more
representative picture of the model's performance on typical houses.

"""

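The effect of a single outlier on each metric can be sketched as follows. The prices are hypothetical; the only change between the two cases is that one true price is replaced by an extreme value.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical prices; every prediction is off by exactly 5
clean_true = np.array([50.0, 60.0, 70.0, 80.0, 90.0])
preds = clean_true + 5.0

# Same predictions, but the last true price is now an extreme outlier
outlier_true = clean_true.copy()
outlier_true[-1] = 900.0

mae_clean = mean_absolute_error(clean_true, preds)
mse_clean = mean_squared_error(clean_true, preds)
mae_outlier = mean_absolute_error(outlier_true, preds)
mse_outlier = mean_squared_error(outlier_true, preds)

print("MAE grows by a factor of", mae_outlier / mae_clean)
print("MSE grows by a factor of", mse_outlier / mse_clean)
```

The MSE inflates far more than the MAE, which is why the MAE is the steadier choice when outliers are common.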





In [None]:
# Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best
# metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values
# are very close. Which metric should you choose to use in this case?

"""If the mean squared error (MSE) and the root mean squared error (RMSE) are very close, either metric could be used to evaluate
the SVM regression model with a polynomial kernel. Since RMSE is the square root of MSE, the two can only be close when
the MSE is near 1 (or near 0), and they always rank models in the same order.

If forced to choose between the two, RMSE is the slightly better choice: it has the same units as the dependent
variable (price), whereas MSE is in squared units, so RMSE is easier to interpret."""
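
The relationship between the two metrics can be checked directly. In this small sketch, the MSE values are hypothetical scores for four imagined models.

```python
import numpy as np

# Hypothetical MSE values for four imagined models
mse_values = np.array([0.81, 1.0, 1.21, 400.0])
rmse_values = np.sqrt(mse_values)

# The gap between MSE and RMSE is small only when the MSE is close to 1 (or 0)
gap = np.abs(mse_values - rmse_values)

# Taking a square root never changes which model scores better
same_order = np.array_equal(np.argsort(mse_values), np.argsort(rmse_values))

print("RMSE values:", rmse_values)
print("Gap between MSE and RMSE:", gap)
print("Same ranking:", same_order)
```

Because the ranking is identical, the choice between the two comes down to interpretability, which favors RMSE.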

In [None]:
# Q5. You are comparing the performance of different SVM regression models using different kernels (linear,
# polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most
# appropriate if your goal is to measure how well the model explains the variance in the target variable?
"""If the goal is to measure how well SVM regression models with different kernels explain the variance in the target variable,
the most appropriate evaluation metric is the coefficient of determination, or R-squared score.

The R-squared score represents the proportion of variance in the target variable that is explained
by the independent variables. Its maximum is 1, which indicates that the model explains all the variance in the target
variable; a value of 0 indicates that the model explains none of it (it does no better than always predicting the mean),
and on held-out data the score can be negative if the model does worse than that.

Therefore, to compare SVM regression models that use different kernels by how well they explain the variance in the
target variable, the R-squared score is the most appropriate evaluation metric."""
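
A kernel comparison of this kind might look like the sketch below. It uses synthetic data from `make_regression` rather than the house price dataset, and it standardizes the features before fitting, which is an assumption added here and not a step in the notebook above. The names `X_demo`, `y_demo`, and `kernel_r2` are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data (assumption: stands in for a real dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Fit one SVR per kernel and record its R-squared on the held-out split
kernel_r2 = {}
for kernel in ("linear", "poly", "rbf"):
    pipeline = make_pipeline(StandardScaler(), SVR(kernel=kernel))
    pipeline.fit(X_tr, y_tr)
    kernel_r2[kernel] = r2_score(y_te, pipeline.predict(X_te))

for kernel, score in kernel_r2.items():
    print(f"{kernel}: R2 = {score:.3f}")
```

Because R-squared is unitless, the scores can be compared directly across kernels on the same test split.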