- Teacher: Jonas Moons
- The course on learn.hu.nl
- The master on learn.hu.nl
- Data-driven design Slack
- Deadline: Monday, January 28th, before 12:00. Hand in the assignments as a zip file on learn.hu.nl.
- Cheatsheet
- Students and their Github accounts
Using tools from data science and machine learning would not make a lot of sense without some understanding of mathematics and statistics. However, the focus of the course is on the application of data science, rather than the mathematical foundation. If I use formulas, I will not focus on the technical aspects, but explain what they do conceptually. If you need to catch up on math, you can use these links to the Khan Academy:
- Basic algebra
- Equations and variables
- Squares and roots
- The coordinate plane and linear equations
- Exponents and logarithms
- Basic probability theory
For every week there is a Jupyter Notebook with examples on that week's subjects:
- Week 1: exploratory data analysis, graphs, Seaborn
- Week 2: hypothesis testing, model evaluation, linear regression, logistic regression
- Week 3: supervised machine learning
- Week 4: unsupervised machine learning
- Week 5: text mining and cross-validation
These are optional exercises you can do during the lesson to test your knowledge. You don't need to submit them with the final assignment.
These are PDF versions of the slides I present every week.
- Week 4, lesson 1
- Week 4, lesson 2
- Week 5, lesson 1
- Week 5, lesson 2
- Week 6, lesson 1
- Week 6, lesson 2
- Week 7, lesson 1
- Week 7, lesson 2
- Week 8, lesson 1
- Week 8, lesson 2
Feel free to fork this file and add more resources!
- Extensive Python cheatsheet with examples
- A more minimal cheatsheet
- Datacamp Python basics
- Datacamp Python for data science
If you are struggling with the mathematics of the course, check out:
- Algebra on Khan Academy (https://www.khanacademy.org/math/algebra)
- Statistics and probability (https://www.khanacademy.org/math/statistics-probability)
- Google, Stack Overflow and Cross Validated are your friends. There is no shame in Googling even really basic concepts.
- Your code should be properly commented (use `#`). Good commenting means you explain why you do something, not what you’re doing.
- Visualize and explore your data. Get acquainted with your data. Explore cases that deviate from the trend (outliers).
- Visualize your model predictions and residuals to look for ways to improve your model.
- Properly label your graphs and axes.
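
To make the commenting and outlier tips above a bit more concrete, here is a minimal sketch using only the Python standard library. The function name `find_outliers`, the cut-off `k=1.5`, and the `durations` data are all made up for illustration; they are not part of the course material. Note how the comments explain why a choice was made rather than restating what the line does.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return the values that fall outside the interquartile-range fences."""
    # Quartiles instead of mean and standard deviation: a single extreme
    # value inflates the standard deviation and can end up hiding itself.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    # k = 1.5 is the usual boxplot convention. It is a rule of thumb,
    # not a statistical test, so treat flagged cases as leads to inspect.
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical session durations in minutes (invented example data).
durations = [12, 15, 14, 13, 16, 15, 14, 90]
print(find_outliers(durations))  # prints [90]
```

A flagged value is a starting point for exploration, not something to delete automatically: look at the case and decide whether it is a measurement error or a genuinely interesting observation.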