The purpose of this project is to build a machine learning model that predicts the average price of a Spanish railway ticket.
Several regression models are applied, and their performance is compared using the R-squared score on both the training sample and the test sample.
📍 Programming language: Python
📍 Library: scikit-learn
📍 Applied algorithms: Decision tree, Bagging, Boosting, Random forest and XGBoost
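
The comparison described above can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes synthetic data from `make_regression` in place of the ticket dataset, default hyperparameters, and `GradientBoostingRegressor` as the "Boosting" model (the project may have used a different boosting implementation). XGBoost is included only if the `xgboost` package is installed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the ticket data (assumption: the real features differ).
X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model choices and settings are illustrative assumptions.
models = {
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Bagging": BaggingRegressor(random_state=0),
    "Boosting": GradientBoostingRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# XGBoost is optional here; it is skipped if the package is not installed.
try:
    from xgboost import XGBRegressor

    models["XGBoost"] = XGBRegressor(n_estimators=100, random_state=0)
except ImportError:
    pass

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For regressors, .score() returns the R-squared of the predictions.
    scores[name] = {
        "train": model.score(X_train, y_train),
        "test": model.score(X_test, y_test),
    }

for name, s in scores.items():
    print(f"{name:14s} train R2 = {s['train']:.3f}   test R2 = {s['test']:.3f}")
```

Reporting both scores side by side makes it easy to see which model generalizes best and which one fits the training sample much more closely than the test sample.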
- Map of the railway city connections:
- Cross validation, boosting model:
- Actuals vs predicted, boosting model:
- Application of fine tuning (after the green line):
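
The cross-validation and fine-tuning steps named in the captions above could be reproduced roughly as in the sketch below. It assumes synthetic data, `GradientBoostingRegressor` as the boosting model, 5-fold cross-validation, and a small hypothetical parameter grid. None of these settings are taken from the project itself.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for the ticket data (assumption).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=1)

# Cross-validation: R-squared on each of 5 held-out folds.
base = GradientBoostingRegressor(random_state=1)
cv_scores = cross_val_score(base, X, y, cv=5, scoring="r2")
print("Fold R2 scores:", cv_scores.round(3))
print("Mean R2:", cv_scores.mean().round(3))

# Fine tuning: exhaustive search over a small, hypothetical grid.
param_grid = {
    "n_estimators": [50, 150],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(base, param_grid, cv=5, scoring="r2")
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV R2:", round(search.best_score_, 3))
```

Scoring the search with the same metric used for the model comparison (`"r2"`) keeps the tuned results directly comparable with the untuned ones.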
According to the results table above, the Boosting model appears to be the best one: it has the highest R-squared score on the test sample, equal to 85.92.
However, on the training sample the XGBoost model has the highest R-squared score. This is consistent with the earlier conclusion that this model has an overfitting problem, or at least that it overfits the data more than the other models do.
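
One simple way to quantify this kind of overfitting is the gap between the training and test R-squared scores. The sketch below computes R-squared from its definition, 1 - SS_res / SS_tot, and then the gap. The helper names `r_squared` and `overfit_gap` and the toy numbers are purely illustrative assumptions, not results from this project.

```python
def r_squared(y_true, y_pred):
    """R-squared from its definition: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def overfit_gap(train_r2, test_r2):
    """Hypothetical helper: a larger positive gap suggests more overfitting."""
    return train_r2 - test_r2


# Toy values (made up for illustration only).
y_train = [10.0, 20.0, 30.0, 40.0]
pred_train = [10.0, 20.0, 30.0, 40.0]  # perfect fit on the training sample
y_test = [15.0, 25.0, 35.0, 45.0]
pred_test = [20.0, 20.0, 40.0, 40.0]  # each prediction is off by 5

train_r2 = r_squared(y_train, pred_train)
test_r2 = r_squared(y_test, pred_test)
gap = overfit_gap(train_r2, test_r2)

print(f"train R2 = {train_r2:.2f}, test R2 = {test_r2:.2f}, gap = {gap:.2f}")
# prints: train R2 = 1.00, test R2 = 0.80, gap = 0.20
```

A model with a near-perfect training score but a clearly lower test score, as in this toy case, is fitting the training sample more closely than it generalizes.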