A tutorial on the random forest algorithm: writing a classifier from scratch and applying it to an example problem.
The visualization above shows how a random forest model reduces overfitting and achieves better accuracy on unseen test data. The single decision tree learns oddly shaped decision boundaries and clearly overfits the training data, while the random forest learns decision boundaries much closer to what one would draw by intuition.
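To make the tree-versus-forest contrast concrete, here is a minimal sketch of the idea in plain Python. It is not the code from the notebook: the function names (`build_tree`, `build_forest`, `predict_forest`), the Gini-based split search, and the default parameters are all assumptions chosen for illustration. Each tree is grown on a bootstrap sample and considers only a random subset of features at each split, and the forest predicts by majority vote, which is what smooths out the quirks of any single tree.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_ids):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    n = len(y)
    best, best_score = None, gini(y)  # a split must strictly improve on the parent
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, rng, max_features, max_depth=8, depth=0):
    """Grow a tree; internal nodes are tuples, leaves are class labels.

    Assumes labels are not tuples (e.g. ints or strings).
    """
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # Random feature subset at every split (the "random" in random forest).
    feature_ids = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, feature_ids)
    if split is None:
        return majority
    f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (
        f, t,
        build_tree([X[i] for i in left], [y[i] for i in left],
                   rng, max_features, max_depth, depth + 1),
        build_tree([X[i] for i in right], [y[i] for i in right],
                   rng, max_features, max_depth, depth + 1),
    )

def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def build_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng, max_features))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]
```

A single tree trained with `build_tree` on the full data fits it as closely as its depth allows, whereas `build_forest` averages many slightly different trees. The exhaustive threshold search here favors readability over speed, so it is only suitable for small datasets.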
Created as part of a graduate-level data science course. The Jupyter notebook provides the full tutorial, and a Python file containing just the code from the tutorial is also included.
- Harrison Van Til