
Sample student project for the Data Science course I was teaching at CEU's MSc in Business Analytics https://github.com/szilard/teach-data-science-msc-analytics-ceu


szilard/student-data-science-project-1-kaggle

 
 


Sample Data Science Project

Excellent student project by Laszlo Sallo for the Data Science course I was teaching at CEU's new MSc in Business Analytics program.

Laszlo also competed in the Prudential Kaggle competition and finished in the top 10% (congratulations!). Notably, he advanced 400+ places from the public leaderboard (LB) to the private one, an excellent sign that he applied well the techniques we discussed in class for avoiding overfitting. Many Kaggle competitors, including top ones, overfit the public test set, and their private LB score ended up lower than their public one:

plot

(The top 3 Kagglers on the public LB, shown in red in the plot above, overfit and lost; the competitors in green won.)
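To make the public-vs-private gap concrete, here is a minimal simulation sketch. It is not Laszlo's method and uses no competition data: the true score, the noise level, and the function names are all illustrative assumptions. It assumes every submission has the same true quality and that the public and private leaderboards each add independent noise, then measures how much the best public score overstates the private score of that same submission.

```python
import random
import statistics


def best_public_submission(rng, n_submissions, true_score=0.65, noise_sd=0.01):
    """Return (public, private) scores of the submission with the best public score.

    All submissions share the same true score (an assumption); each leaderboard
    adds its own independent Gaussian noise because it uses a different sample.
    """
    submissions = []
    for _ in range(n_submissions):
        public = true_score + rng.gauss(0, noise_sd)
        private = true_score + rng.gauss(0, noise_sd)
        submissions.append((public, private))
    # Selecting on the public score is the step that introduces the bias.
    return max(submissions, key=lambda scores: scores[0])


def average_gap(n_submissions, n_trials=2000, seed=42):
    """Average (public - private) gap of the selected submission over many trials."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        public, private = best_public_submission(rng, n_submissions)
        gaps.append(public - private)
    return statistics.mean(gaps)


if __name__ == "__main__":
    for n in (1, 5, 50):
        print(f"{n:>3} submissions: average public-private gap = {average_gap(n):.4f}")
```

Under these assumptions the gap is close to zero for a single submission and grows with the number of submissions tried, even though no submission is actually better than another. This is one simple way to see why choosing models by public LB score alone tends to disappoint on the private LB.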

Some important points from Laszlo's report:

This analysis proved some points discussed in the class:
- Coding a data science project is completely different from other coding styles. I produced at 
least 10x more code than I present here. Coding here is really a tool to handle the data, shape 
it and play with it, then start over.
- H2O is really a handy tool for machine learning; it is easy and intuitive.
- GBM is more accurate than Deep Learning and RF in such cases
- Kaggle is good to learn, but it is not the complete picture

Click here to view the full project report (an HTML file generated reproducibly with R Markdown).

