# **Training Results**

![image.png](attachment:image.png)

I used three metrics to evaluate the models during the training phase: MSE (Mean Squared Error), MAE (Mean Absolute Error), and R² (Coefficient of Determination).

1. MSE (Mean Squared Error): MSE measures the average squared difference between predicted values and actual values. Lower MSE indicates better model performance.
2. MAE (Mean Absolute Error): MAE measures the average absolute difference between predicted values and actual values. Lower MAE indicates better model performance.
3. R² (Coefficient of Determination): R² measures the proportion of variance in the dependent variable explained by the independent variables. Higher R² indicates a better fit of the model to the data.
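
The three definitions above can be sketched in plain Python. This is a minimal illustration on made-up toy values, not the notebook's actual evaluation code; in practice `sklearn.metrics` provides `mean_squared_error`, `mean_absolute_error`, and `r2_score` for the same purpose.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values chosen for illustration only (not taken from the dataset).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Because MSE squares each error, it penalizes large misses more heavily than MAE, which is why the two can rank models differently.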


# **Testing Results**

In the testing phase, I used Kaggle’s online evaluation system to assess the model’s predictive performance. After training and validating the models on the provided dataset, I generated predictions for the test set and submitted them to Kaggle, which automatically scored the results using a predefined metric. 
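
A minimal sketch of writing a submission file is shown below. The column names `Id` and `Prediction`, the helper `write_submission`, and the sample values are all hypothetical; the real header must match the sample submission file provided by the competition.

```python
import csv
import io


def write_submission(ids, predictions, out_file, id_col="Id", target_col="Prediction"):
    """Write one (id, prediction) row per test sample in CSV form.

    The default column names are placeholders, not the competition's real ones.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow([id_col, target_col])
    for row_id, pred in zip(ids, predictions):
        writer.writerow([row_id, pred])


# Illustration with an in-memory buffer; a real run would open a .csv file instead.
buffer = io.StringIO()
write_submission([1, 2], [0.5, 1.25], buffer)
print(buffer.getvalue())
```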

The public leaderboard score served as an objective measure of how well the model generalized to unseen data, allowing for direct comparison between different models and approaches. This evaluation method not only ensured consistency in scoring but also provided valuable feedback on the effectiveness of various regression techniques and hyperparameter settings. 

*(Lower score = better performance)*

![image.png](attachment:image.png)

# **Summary**

### 1. Linear Regression (Scikit-learn)

* Train (MSE/MAE/R²): Performance is stable, R² around 0.87–0.89 → the model captures basic linear relationships.
* Test (Public Score): 0.25691 → second-worst of the eight models.
* Remarks: Simple unregularized baseline. The training fit is good, but the public score trails every regularized Scikit-learn model, so generalization is only moderate and the model lacks the capacity for more complex patterns.


### 2. Linear Regression (Keras)

* Train: Significantly higher MSE and MAE, low R² (~0.63) → clearly underperforms Scikit-learn.
* Test: 0.33366 → the worst (highest) score among all models.
* Remarks: Gradient Descent with batch training likely failed to converge due to suboptimal learning rate or insufficient epochs → clear underfitting.
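
The convergence issue described above can be reproduced on a toy problem. The sketch below is not the Keras model from this notebook: it is a hypothetical one-feature, full-batch gradient descent written in plain Python, with made-up data and hyperparameters, meant only to show how a small learning rate combined with few epochs leaves the fit far from the optimum.

```python
def fit_linear_gd(xs, ys, learning_rate, epochs):
    """Fit y ≈ w*x + b by full-batch gradient descent on MSE.

    Returns (w, b, final_mse). Weights start at zero.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    final_mse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
    return w, b, final_mse


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

# Small learning rate and few epochs: training stops long before convergence.
_, _, mse_underfit = fit_linear_gd(xs, ys, learning_rate=0.001, epochs=20)

# Larger (still stable) learning rate and many epochs: the fit recovers w≈2, b≈1.
w_fit, b_fit, mse_converged = fit_linear_gd(xs, ys, learning_rate=0.05, epochs=2000)

print(mse_underfit, mse_converged)
```

Both runs use the same model and the same data; only the optimizer settings differ, which matches the pattern of an underfit Keras model next to a well-fit closed-form Scikit-learn solution.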



### 3. Ridge Regression (Scikit-learn)

* Train: Similar to Linear (Scikit-learn) with L2 regularization; MSE/MAE comparable, R² ~0.85.
* Test: 0.17050 → fairly good.
* Remarks: L2 regularization reduces overfitting and improves generalization over the basic linear model (0.17050 vs. 0.25691), though not as much as Lasso or ElasticNet.



### 4. Ridge Regression (Keras)

* Train: Low R² (~0.63, the same as Linear Keras), high MSE/MAE → poor convergence.
* Test: 0.24580 → still underperforms Scikit-learn counterpart.
* Remarks: Same issue as Linear Keras: optimization instability or poor hyperparameter choice → weak predictive performance.



### 5. Lasso Regression (Scikit-learn)

* Train: Low MSE/MAE, high R² (~0.89), selects important features via L1 regularization.
* Test: 0.13459 → second-best score overall.
* Remarks: Strong L1 regularization removes noise and irrelevant features → excellent generalization.
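
The feature-selection effect of L1 can be illustrated with the soft-thresholding operator, which is the shrinkage step associated with an L1 penalty (for example in coordinate-descent Lasso solvers). The coefficients and threshold below are made up for illustration and are not taken from the fitted models.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; values within the threshold become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


# Hypothetical coefficients: two strong features and two weak ones.
coefficients = [2.0, -0.3, 0.1, -1.5]
shrunk = [soft_threshold(c, 0.5) for c in coefficients]

print(shrunk)  # [1.5, 0.0, 0.0, -1.0]
```

The weak coefficients are set exactly to zero while the strong ones are only shrunk, which is how L1 regularization drops features rather than merely down-weighting them.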



### 6. Lasso Regression (Keras)

* Train: MSE/MAE similar to ElasticNet, high R² (~0.89).
* Test: 0.15724 → good, but slightly worse than Scikit-learn version.
* Remarks: Keras implementation performs better than Linear/Ridge Keras thanks to L1, but tuning (e.g., alpha, batch size) may still be suboptimal compared to Scikit-learn.



### 7. ElasticNet (Scikit-learn)

* Train: Performance on par with Lasso, high R² (~0.89).
* Test: 0.13396 → the best overall.
* Remarks: Combines L1 and L2 penalties → balances feature selection and coefficient stability, achieving the strongest generalization.
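
As a sketch of how the two penalties are combined, the function below follows the parameterization used by Scikit-learn's `ElasticNet` objective, where `alpha` sets the overall strength and `l1_ratio` sets the L1/L2 mix. The weights and hyperparameter values are illustrative assumptions, and the Keras models may parameterize the penalty differently.

```python
def elastic_net_penalty(weights, alpha, l1_ratio):
    """alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2)."""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)


# Hypothetical weight vector, for illustration only.
weights = [1.0, -2.0, 0.0]

pure_l1 = elastic_net_penalty(weights, alpha=0.1, l1_ratio=1.0)  # Lasso-style penalty
pure_l2 = elastic_net_penalty(weights, alpha=0.1, l1_ratio=0.0)  # Ridge-style penalty
mixed = elastic_net_penalty(weights, alpha=0.1, l1_ratio=0.5)    # halfway between

print(pure_l1, pure_l2, mixed)
```

Setting `l1_ratio` to 1 or 0 recovers the pure L1 or pure L2 term, so ElasticNet can be read as interpolating between the Lasso and Ridge penalties.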



### 8. ElasticNet (Keras)

* Train: Low MSE/MAE, high R² (~0.89).
* Test: 0.14579 → close to Lasso, better than Ridge/Linear Keras.
* Remarks: Solid performance, but still slightly behind Scikit-learn, likely due to incomplete optimization during training.



#### Key Takeaways

* Top 3 Kaggle Test Scores (lower is better): `ElasticNet (Sklearn)` (0.13396) > `Lasso (Sklearn)` (0.13459) > `ElasticNet (Keras)` (0.14579)
* Regularization Impact: L1 and ElasticNet outperform L2 by enabling feature selection and reducing overfitting.
* Overfitting/Underfitting: The Keras Linear and Ridge models show clear signs of underfitting (training R² ≈ 0.63 and the weakest public scores among the Keras models).
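
The ranking can be checked by sorting the public scores reported in the model sections above. The dictionary below simply copies those eight numbers; the variable names are illustrative.

```python
# Public leaderboard scores as reported for each model (lower is better).
public_scores = {
    "Linear (Sklearn)": 0.25691,
    "Linear (Keras)": 0.33366,
    "Ridge (Sklearn)": 0.17050,
    "Ridge (Keras)": 0.24580,
    "Lasso (Sklearn)": 0.13459,
    "Lasso (Keras)": 0.15724,
    "ElasticNet (Sklearn)": 0.13396,
    "ElasticNet (Keras)": 0.14579,
}

# Sort ascending so the best (lowest) score comes first.
ranking = sorted(public_scores.items(), key=lambda item: item[1])
top3 = [name for name, _ in ranking[:3]]

for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.5f}")
```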