# Q1. In order to predict house price based on several characteristics, such as location, square footage, number of bedrooms, etc., you are developing an SVM regression model. Which regression metric in this situation would be the best to employ?
## Dataset Link: https://drive.google.com/file/d/1Z9oLpmt6IDRNw7IeNcHYTGeJRYypRSC0/view


#### When evaluating an **SVM regression model** for house price prediction, the following regression metrics are commonly used:

#### **1. Mean Absolute Error (MAE)**
- **Formula:**  
  \[
  MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
  \]
- **Interpretation:**  
  Measures the average absolute difference between actual and predicted prices.
- **Why Use It?**  
  - Easy to interpret in real-world terms (e.g., if MAE = 10,000, the average error is $10,000).
  - Less sensitive to outliers than MSE.

---

#### **2. Mean Squared Error (MSE)**
- **Formula:**  
  \[
  MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  \]
- **Interpretation:**  
  Penalizes larger errors more than smaller ones by squaring them.
- **Why Use It?**  
  - Useful when large errors should be heavily penalized.
  - Differentiable everywhere, which makes it a convenient loss for gradient-based optimization.

---

#### **3. Root Mean Squared Error (RMSE)**
- **Formula:**  
  \[
  RMSE = \sqrt{MSE}
  \]
- **Interpretation:**  
  Square root of the MSE, making it comparable to actual price values.
- **Why Use It?**  
  - More interpretable than MSE because it's in the same unit as the target variable.
  - Gives higher weight to large errors.
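The claim that RMSE gives higher weight to large errors can be checked with a small sketch. The residual values below are made up for illustration: both lists have the same MAE, but the list with one large miss has twice the RMSE.

```python
import math

def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical residuals (actual - predicted) in dollars, not from the dataset.
even_errors = [10_000, 10_000, 10_000, 10_000]  # four equal misses
uneven_errors = [0, 0, 0, 40_000]               # one large miss

print(mae(even_errors), rmse(even_errors))      # 10000.0 10000.0
print(mae(uneven_errors), rmse(uneven_errors))  # 10000.0 20000.0
```

When RMSE is much larger than MAE, a few large errors are likely dominating the total.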

---

#### **4. R-squared (R²) Score**
- **Formula:**  
  \[
  R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
  \]
- **Interpretation:**  
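  Measures the proportion of variance explained by the model. The maximum is 1; values usually fall between 0 and 1 but can be negative when the model fits worse than always predicting the mean.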
- **Why Use It?**  
  - A high R² (closer to 1) means the model explains most of the variance in house prices.
  - A low R² means the model captures little of the variation in house prices, for example because informative features are missing or the model underfits.

---

### **Best Metric for House Price Prediction**
- **RMSE** is often preferred because it balances penalizing large errors while keeping the units interpretable.
- **MAE** can also be useful for understanding the average error in real terms.
- **R² Score** provides insight into how well the model explains price variation.

#### **Final Recommendation:**
- Use **RMSE** as the primary metric.
- Report **MAE** for better interpretability.
- Include **R² Score** to measure overall model performance.
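As a minimal sketch of reporting the three recommended metrics together, the helper below computes MAE, RMSE and R² from the formulas given earlier. The function name `regression_report` and the price lists are hypothetical; in practice `y_pred` would come from the fitted SVM model on a held-out test set.

```python
import math

def regression_report(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired actual/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # sum of squared residuals
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1 - ss_res / ss_tot,
    }

# Hypothetical prices in dollars (illustrative only, not from the dataset).
y_true = [200_000, 250_000, 300_000, 350_000, 400_000]
y_pred = [210_000, 240_000, 310_000, 340_000, 380_000]

report = regression_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:,.3f}")
```

For these made-up values the MAE is 12,000 dollars, the RMSE is about 12,649 dollars, and R² is 0.968. The gap between MAE and RMSE comes from the single 20,000-dollar miss.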


# Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price of a house as accurately as possible?

## **Comparison of MSE and R-squared for House Price Prediction**

#### **1. Mean Squared Error (MSE)**
- **Formula:**
  \[
  MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  \]
- **Interpretation:**  
  - Measures the average squared difference between actual and predicted prices.
  - Penalizes larger errors more than smaller ones.
- **When to Use:**  
  - Best for models where minimizing large deviations is a priority.
  - Commonly used as the loss function when training regression models.

---

#### **2. R-squared (R²) Score**
- **Formula:**
  \[
  R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
  \]
- **Interpretation:**  
  - Measures the proportion of variance in house prices explained by the model.
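  - Has a maximum of **1** (higher values indicate a better fit); it is usually between 0 and 1 but becomes negative when the model is worse than predicting the mean.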
- **When to Use:**  
  - Useful for understanding model performance relative to a baseline.
  - Good for comparing different models but does not provide direct error values.

---

### **Best Metric for Predicting House Prices Accurately**
- **MSE** is more appropriate if the goal is to **predict actual house prices as accurately as possible** because:
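  - It is computed directly from the prediction errors, so lowering it means the predictions are, on average, closer to the actual prices.
  - It gives an absolute error value, whereas R² only expresses error relative to the variance of the prices in the dataset.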
  - It helps optimize models by reducing large errors.

#### **Final Recommendation:**
- Use **MSE** (or RMSE for interpretability) as the primary metric for accuracy.
- Use **R²** as a supplementary metric to assess the model's explanatory power.
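To see why R² alone does not tell you how close the predictions are, here is a small sketch with made-up prices (in thousands of dollars). Both sets of predictions miss by exactly the same amounts, so the MSE is identical, but R² differs sharply because the actual prices are spread differently.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical prices in thousands of dollars; the same errors are applied to both.
errors = [10, -10, 10]
narrow_true = [290, 300, 310]  # prices tightly clustered
wide_true = [100, 300, 500]    # prices widely spread
narrow_pred = [t + e for t, e in zip(narrow_true, errors)]
wide_pred = [t + e for t, e in zip(wide_true, errors)]

print(mse(narrow_true, narrow_pred), r2(narrow_true, narrow_pred))
print(mse(wide_true, wide_pred), r2(wide_true, wide_pred))
```

Both cases have an MSE of 100, yet R² is -0.5 for the narrow prices and about 0.996 for the wide ones. If the goal is accurate prices, the error-based metric is the one that answers the question.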


# Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate regression metric to use with your SVM model. Which metric would be the most appropriate in this scenario?

### **Impact of Outliers on Regression Metrics**
Outliers can heavily influence certain regression metrics, leading to misleading evaluations of model performance. Selecting a metric that is **robust to outliers** is crucial in this case.

---

### **Comparison of Common Regression Metrics**

#### **1. Mean Squared Error (MSE)**
- **Formula:**
  \[
  MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  \]
- **Impact of Outliers:**  
  - Outliers have a **quadratic effect** on MSE, making it highly sensitive to extreme values.
  - Not recommended when outliers are present.

---

#### **2. Mean Absolute Error (MAE)**
- **Formula:**
  \[
  MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
  \]
- **Impact of Outliers:**  
  - **Less sensitive** to outliers than MSE since it takes the absolute error instead of squaring it.
  - A good choice when dealing with outliers.

---

#### **3. Median Absolute Error (MedAE)**
- **Formula:**  
  \[
  MedAE = \operatorname{median}\left(|y_1 - \hat{y}_1|, \ldots, |y_n - \hat{y}_n|\right)
  \]
  The median of **all absolute errors** between actual and predicted values.
- **Impact of Outliers:**  
  - **Highly robust** to outliers because it focuses on the median, not the mean.
  - Best suited for datasets with extreme outliers.

---

#### **4. R-squared (R²) Score**
- **Formula:**
  \[
  R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
  \]
- **Impact of Outliers:**  
  - Can be affected by outliers, as it relies on squared errors.
  - Not the best choice when dealing with extreme outliers.

---

#### **Best Metric for Outlier-Prone Datasets**
- **Primary choice:** **Median Absolute Error (MedAE)** (most robust to outliers).
- **Alternative choice:** **Mean Absolute Error (MAE)** (balances robustness and interpretability).
- **Avoid:** **MSE and R²**, as they are highly sensitive to extreme values.

#### **Final Recommendation:**
Use **MedAE** when the data contains extreme outliers, and **MAE** when a balance between robustness and interpretability is needed.
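The different sensitivities can be illustrated with a short sketch. The residuals below are invented (in thousands of dollars); the second list is the same as the first except that one error is replaced by an extreme miss.

```python
import statistics

def mse(errors):
    """Mean squared error."""
    return sum(e * e for e in errors) / len(errors)

def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def medae(errors):
    """Median absolute error."""
    return statistics.median([abs(e) for e in errors])

# Hypothetical residuals in thousands of dollars (illustrative only).
clean = [5, -4, 6, -5, 4]
with_outlier = [5, -4, 6, -5, 500]  # last residual replaced by an extreme miss

for name, errors in [("clean", clean), ("with outlier", with_outlier)]:
    print(name, mse(errors), mae(errors), medae(errors))
```

With the outlier, MSE jumps from 23.6 to 50020.4 and MAE from 4.8 to 104.0, while MedAE stays at 5. That is the sense in which MedAE is the most robust of the three. The trade-off is that MedAE ignores the size of the worst errors.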


# Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values are very close. Which metric should you choose to use in this case?

### **Understanding MSE and RMSE**
#### **1. Mean Squared Error (MSE)**
- **Formula:**
  \[
  MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  \]
- **Interpretation:**
  - Measures the average squared difference between actual and predicted values.
  - **Sensitive to outliers** due to squared differences.
  - Produces a squared unit of measurement (e.g., if predicting house prices in dollars, MSE is in squared dollars).

---

#### **2. Root Mean Squared Error (RMSE)**
- **Formula:**
  \[
  RMSE = \sqrt{MSE}
  \]
- **Interpretation:**
  - Simply the square root of MSE.
  - Expresses error in the **same unit** as the target variable, making it easier to interpret.
  - Like MSE, it is **sensitive to large errors**.

---

#### **Choosing Between MSE and RMSE**
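- Because \(RMSE = \sqrt{MSE}\), the two values are **very close** only when MSE is near 1 (or both are near 0). This reflects the scale of the target variable, not the quality of the model, and both metrics always rank models in the same order.
- **RMSE is still preferred** because:
  - It provides an error estimate in the **same unit** as the target variable, making it easier to understand.
  - It is directly comparable to the actual values of the dataset.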

#### **Final Recommendation:**
Use **RMSE**, as it is more interpretable and gives a clearer picture of model performance in real-world terms.
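A quick sketch shows how MSE and RMSE can end up close. The residuals are made up and assume the target is expressed in millions of dollars, so the errors are near 1; converting the same errors to dollars changes the picture entirely.

```python
import math

def mse(errors):
    """Mean squared error."""
    return sum(e * e for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error, in the same unit as the errors."""
    return math.sqrt(mse(errors))

# Hypothetical residuals, first in millions of dollars, then in dollars.
errors_millions = [1.0, -1.1, 0.9]
errors_dollars = [e * 1_000_000 for e in errors_millions]

print(mse(errors_millions), rmse(errors_millions))  # both close to 1
print(mse(errors_dollars), rmse(errors_dollars))    # about 1e12 vs about 1e6
```

The model and its errors are identical in both lines; only the unit changed. RMSE scales with the unit (millions to dollars multiplies it by 1,000,000), while MSE scales with the unit squared, which is why RMSE is the easier number to read against actual prices.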


# Q5. You are comparing the performance of different SVM regression models using different kernels (linear, polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most appropriate if your goal is to measure how well the model explains the variance in the target variable?

### **Recommended Metric: R-squared (R²)**
#### **Why R²?**
- R² measures the proportion of variance in the target variable that is explained by the model.
- It provides a **normalized, unit-free measure of performance**, making it easier to compare models with different kernels on the same test set.
- Unlike MSE or RMSE, which give absolute error values, R² shows how well the model captures variability.

#### **R² Formula:**
\[
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
\]
where:
- \(SS_{res}\) is the sum of squared residuals (errors).
- \(SS_{tot}\) is the total sum of squares (proportional to the variance of the actual values).

#### **Interpretation:**
- **R² = 1** → Perfect model (explains all variance in the target variable).
- **R² = 0** → Model explains no variance (no better than always predicting the mean).
- **Negative R²** → Model is worse than a naive mean prediction.

#### **Final Recommendation:**
For comparing different SVM regression models and their ability to explain variance, **R² is the most appropriate metric**.
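The comparison can be sketched as follows. The prediction lists are invented stand-ins for what differently configured models (for example, SVMs with different kernels) might return on the same test set; the names `model_a`, `mean_baseline` and `model_c` are hypothetical and the numbers are not results from the dataset.

```python
def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical test-set prices in thousands of dollars.
y_true = [100, 200, 300, 400, 500]

# Made-up predictions standing in for three candidate models.
predictions = {
    "model_a": [110, 190, 310, 390, 510],        # small errors
    "mean_baseline": [300, 300, 300, 300, 300],  # always predicts the mean
    "model_c": [500, 400, 300, 200, 100],        # gets the trend backwards
}

scores = {name: r2(y_true, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: R2 = {score:.3f}")
print("best:", best)
```

Here `model_a` scores 0.995, the mean baseline scores exactly 0, and `model_c` scores -3.0, matching the three cases in the interpretation above. All candidates must be scored on the same held-out data for the comparison to be meaningful.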
