Q1. In order to predict house price based on several characteristics, such as location, square footage,
number of bedrooms, etc., you are developing an SVM regression model. Which regression metric in this
situation would be the best to employ?
Dataset link:
    https://drive.google.com/file/d/1Z9oLpmt6IDRNw7IeNcHYTGeJRYypRSC0/view?usp=share_link

Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as
your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price
of a house as accurately as possible?

Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate
regression metric to use with your SVM model. Which metric would be the most appropriate in this
scenario?

Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best
metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values
are very close. Which metric should you choose to use in this case?

Q5. You are comparing the performance of different SVM regression models using different kernels (linear,
polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most
appropriate if your goal is to measure how well the model explains the variance in the target variable?


### Q1: Best Regression Metric for Predicting House Prices

For predicting house prices from several characteristics, the most appropriate regression metric is the **Root Mean Squared Error (RMSE)**. RMSE is the square root of the average squared difference between predicted and actual values, so it is expressed in the same units as the price and, because of the squaring, gives more weight to larger errors. This is particularly useful for house prices, where a few large mispricings usually matter more than many small ones.
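
To make the "more weight to larger errors" point concrete, here is a minimal sketch using only the standard library. The prices are made-up illustrative values (in thousands), not taken from the dataset, and the `rmse` helper is a hypothetical name written for this example. Both prediction sets have the same total absolute error, yet RMSE doubles when that error is concentrated in one house.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical prices in thousands (illustrative values only).
actual = [200, 250, 300, 350]
even_errors = [210, 260, 310, 360]    # every prediction is off by 10
one_big_error = [200, 250, 300, 390]  # one prediction is off by 40

rmse_even = rmse(actual, even_errors)
rmse_big = rmse(actual, one_big_error)

print(f"Evenly spread errors: RMSE = {rmse_even:.1f}")  # 10.0
print(f"One large error:      RMSE = {rmse_big:.1f}")   # 20.0
```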

### Q2: Choosing Between MSE and R-squared for Predicting House Prices

If your goal is to predict the actual price of a house as accurately as possible, an error-based metric such as the **Mean Squared Error (MSE)**, or its square root the **Root Mean Squared Error (RMSE)**, is more appropriate than R-squared. Both measure how far predictions fall from the actual prices, whereas R-squared only reports the proportion of variance explained and says nothing about the size of the errors in price terms.

- **MSE** is the average of the squared differences between predicted and actual values. It is expressed in squared units of the target (for example, dollars squared).
- **RMSE** is the square root of MSE and has the same units as the target variable, making it more interpretable.
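
Written out, with $y_i$ the actual price, $\hat{y}_i$ the predicted price, and $n$ the number of test samples (notation chosen here for illustration), the two definitions are:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$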

### Q3: Appropriate Metric for Datasets with Outliers

When the dataset contains a significant number of outliers, **Mean Absolute Error (MAE)** is the most appropriate metric. MAE averages the absolute errors without squaring them, so each error contributes in proportion to its size, which makes it less sensitive to outliers than MSE or RMSE. The evaluation then reflects the typical error rather than being dominated by a few very large ones.
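
The difference in outlier sensitivity can be sketched with a few made-up numbers. The values and the helper names `mae` and `rmse` are assumptions for illustration only. Adding one badly predicted outlier raises both metrics, but RMSE rises far more than MAE.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the average squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical prices in thousands; the last house is an outlier.
actual = [200, 210, 220, 230, 1000]
predicted = [205, 205, 225, 225, 240]

# Metrics on the four ordinary houses only.
mae_clean = mae(actual[:4], predicted[:4])
rmse_clean = rmse(actual[:4], predicted[:4])

# Metrics with the outlier included.
mae_all = mae(actual, predicted)
rmse_all = rmse(actual, predicted)

print(f"Without outlier: MAE = {mae_clean:.1f}, RMSE = {rmse_clean:.1f}")
print(f"With outlier:    MAE = {mae_all:.1f}, RMSE = {rmse_all:.1f}")
```

Without the outlier both metrics equal 5; with it, MAE becomes 156 while RMSE climbs to roughly 340.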

### Q4: Choosing Between MSE and RMSE for Polynomial Kernel

Because RMSE is simply the square root of MSE, the two values are numerically close only when MSE is near 1 (or near 0). Their closeness therefore reflects the scale of the errors, not how the errors are distributed, and both metrics always rank models in the same order. In this case, **RMSE** is usually preferred because it is on the same scale as the target variable (house prices), making it more interpretable and easier to communicate.
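
Since RMSE is the square root of MSE, the relationship can be checked directly. The short sketch below tabulates a few arbitrary MSE values (chosen for illustration) and then recomputes the RMSE from the MSE printed by this notebook's implementation.

```python
import math

# RMSE is defined as the square root of MSE, so one determines the other.
# The two are numerically close only when MSE is near 1 (or near 0).
for mse in (0.25, 1.0, 4.0, 10_000.0):
    rmse = math.sqrt(mse)
    print(f"MSE = {mse:>9.2f}   RMSE = {rmse:>7.2f}")

# Values reported in this notebook's output.
notebook_mse = 18878.99963598166
notebook_rmse = 137.4008720350117

# Recomputing RMSE from MSE reproduces the reported value.
recomputed = math.sqrt(notebook_mse)
print(f"sqrt(MSE) = {recomputed:.4f}, reported RMSE = {notebook_rmse:.4f}")
```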

### Q5: Evaluating Model Performance with Different Kernels

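To measure how well the model explains the variance in the target variable, the most appropriate metric is **R-squared (R²)**. R-squared is the proportion of the variance in the target variable that the model explains. It is unitless, so it allows a direct comparison of explanatory power across models with different kernels. A value of 1 means a perfect fit, 0 means the model does no better than always predicting the mean, and negative values on test data mean it does worse than that.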
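
As a sketch of what R-squared measures, the helper below computes it from its usual definition, one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the numbers are assumptions for illustration; in practice `sklearn.metrics.r2_score` computes the same quantity.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1 - ss_res / ss_tot

# Hypothetical prices in thousands (illustrative values only).
actual = [100, 200, 300, 400]

r2_good = r_squared(actual, [110, 190, 310, 390])  # close predictions
r2_mean = r_squared(actual, [250, 250, 250, 250])  # always predict the mean
r2_bad = r_squared(actual, [400, 300, 200, 100])   # predictions reversed

print(f"Close predictions:  R^2 = {r2_good:.3f}")  # 0.992
print(f"Predicting mean:    R^2 = {r2_mean:.3f}")  # 0.000
print(f"Reversed ordering:  R^2 = {r2_bad:.3f}")   # -3.000
```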



### Implementation Example
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import joblib

# Load dataset from the given link
data_url = 'https://drive.google.com/uc?export=download&id=1Z9oLpmt6IDRNw7IeNcHYTGeJRYypRSC0'
data = pd.read_csv(data_url)

# Assuming dataset has 'price' as target variable and rest are features
X = data.drop('price', axis=1)
y = data['price']

# Identify numeric and categorical columns
numeric_features = X.select_dtypes(include=['int64', 'float64']).columns.tolist()
categorical_features = X.select_dtypes(include=['object']).columns.tolist()

# Preprocess the data (impute missing values, scale numeric features, and encode categorical features)
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)

# Create an SVR model pipeline
svr_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('svr', SVR(kernel='poly', degree=2, C=1.0, epsilon=0.1))
])

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
svr_pipeline.fit(X_train, y_train)

# Predict the target variable for the test set
y_pred = svr_pipeline.predict(X_test)

# Calculate and print evaluation metrics
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # square root of MSE; the squared=False argument is deprecated in newer scikit-learn releases
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"MSE: {mse}")
print(f"RMSE: {rmse}")
print(f"MAE: {mae}")
print(f"R-squared: {r2}")

# Save the trained model to a file
joblib.dump(svr_pipeline, 'svr_poly_model.pkl')


MSE: 18878.99963598166
RMSE: 137.4008720350117
MAE: 47.49359331477567
R-squared: 0.17851072584289418


['svr_poly_model.pkl']
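
The run above fits only the polynomial kernel. For the kernel comparison described in Q5, a loop like the following is one possible sketch. It uses synthetic data from `make_regression` as a stand-in (an assumption, not the housing dataset), and the hyperparameters simply mirror those used above, so the scores it prints say nothing about the real data.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption; not the housing dataset).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit one scaled SVR per kernel and record its test-set R-squared.
scores = {}
for kernel in ("linear", "poly", "rbf"):
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=1.0, epsilon=0.1))
    model.fit(X_train, y_train)
    scores[kernel] = r2_score(y_test, model.predict(X_test))

for kernel, score in scores.items():
    print(f"{kernel}: R-squared = {score:.3f}")
```

Because R-squared is unitless, the three scores can be compared directly; the kernel with the highest test-set value explains the most variance on that split.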