Prediction of the average price of the Spanish rail tickets data

The purpose of this project is to create a Machine Learning model that can predict the average price of a Spanish railway ticket.

Different regression models are applied, and their performance is compared using the R-squared score on both the training sample and the test sample.
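The comparison described above can be sketched roughly as follows. This is only an illustration, not the code from the project notebooks: it uses synthetic data from `make_regression` in place of the Renfe dataset, the hyperparameters are arbitrary assumptions, and XGBoost is left out so that the example depends only on scikit-learn.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the ticket data (assumption: the real features
# and target come from the cleaned Kaggle dataset instead).
X, y = make_regression(n_samples=500, n_features=8, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters below are illustrative, not the ones tuned in the project.
models = {
    "Decision tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "Bagging": BaggingRegressor(n_estimators=50, random_state=0),
    "Boosting": GradientBoostingRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# For regressors, .score() returns the R-squared of the predictions.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))

for name, (r2_train, r2_test) in scores.items():
    print(f"{name:15s} train R2 = {r2_train:.4f}   test R2 = {r2_test:.4f}")
```

Reporting both scores side by side makes it possible to rank the models on the test sample while also seeing how much better each one does on the data it was trained on.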

Technical details about the project:

📍 Programming language: Python

📍 Library: scikit-learn

📍 Applied algorithms: Decision tree, Bagging, Boosting, Random forest and XGBoost

Data sources:

Kaggle: https://www.kaggle.com/thegurusteam/spanish-high-speed-rail-system-ticket-pricing

Some figures:

Map of the railway city connections:

Cross validation, boosting model:

Actuals vs predicted, boosting model:

Results:

Application of fine tuning (after the green line):

Conclusion:

As per the above results table, the Boosting model appears to be the best one: it has the highest R-squared score on the test sample, equal to 85.92.

However, on the training sample the XGBoost model has the highest R-squared score. This is consistent with the earlier conclusion that this model has an overfitting problem, or at least that it overfits the data more than the other models do.
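One simple way to look at overfitting is the gap between the training and test R-squared scores. The sketch below is a hypothetical illustration on synthetic data, not the project's results: it contrasts a fully grown decision tree with a depth-limited one, and the names `deep tree`, `shallow tree` and `gaps` are made up for this example.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption: stands in for the prepared ticket dataset).
X, y = make_regression(n_samples=500, n_features=8, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# A fully grown tree can memorise the training sample; a shallow one cannot.
candidates = {
    "deep tree": DecisionTreeRegressor(random_state=1),
    "shallow tree": DecisionTreeRegressor(max_depth=3, random_state=1),
}

train_r2 = {}
gaps = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    train_r2[name] = model.score(X_train, y_train)
    test_r2 = model.score(X_test, y_test)
    # A large positive gap suggests the model fits the training data
    # much better than it generalises to unseen data.
    gaps[name] = train_r2[name] - test_r2
    print(f"{name:12s} train R2 = {train_r2[name]:.4f}   "
          f"test R2 = {test_r2:.4f}   gap = {gaps[name]:.4f}")
```

A model with a very high training score but a noticeably lower test score, as reported for XGBoost above, would show up here as a large gap.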

Links to notebooks:

1) Before modelling:

✅ Data cleaning

✅ Data exploration

✅ Data transformation

2) Modelling:

✅ Decision trees

✅ Bagging

✅ Boosting

✅ Random forest

✅ Xgboost

lajobu/Renfe_pred_avg_price
