Machine learning is the study of computer algorithms that improve automatically through experience by learning from data. In other words, we can train computers to make predictions or decisions without being explicitly programmed. One of the most popular approaches to machine learning, tree-based models, provides an attractive way to express knowledge and aid decision-making. Tree-based models use decision trees either independently or as a group (an "ensemble"). These models have proven effective for many machine learning problems across diverse domains, such as credit scoring, fraud detection, and medical diagnostics.
In this course you'll learn the basics of using tree-based models in R. You will:
- Learn how tree-based machine learning algorithms work.
- Learn how to use several popular tree packages in R, such as rpart, ipred, randomForest and gbm.
- Learn how to effectively interpret and explain decisions made from a tree-based model.
- Explore different use cases like identifying risky bank loans and predicting the final grade of students in a course.
- Build and evaluate tree-based models, including classification and regression trees (CART), bagged trees, random forests, and gradient boosting machines (GBM).
- Tune model hyperparameters for optimal performance.
- Evaluate variable importance to understand what variables most strongly predict the outcome.
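As a small taste of what the course covers, here is a minimal sketch of fitting a single classification tree with the rpart package. It uses the `kyphosis` dataset that ships with rpart; the variable names and settings are illustrative, not part of the course exercises.

```r
# Minimal sketch: fit a CART classification tree with rpart,
# using the kyphosis dataset bundled with the package.
library(rpart)

# Predict Kyphosis (absent/present after surgery) from age, number
# of vertebrae involved, and the topmost vertebra operated on.
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class")

# Print the fitted tree's splits and node summaries
print(fit)

# Predict class labels for the training data and cross-tabulate
preds <- predict(fit, kyphosis, type = "class")
table(predicted = preds, actual = kyphosis$Kyphosis)
```

In practice you would evaluate such a model on held-out data rather than the training set, and tune hyperparameters such as the complexity parameter `cp`; both topics are covered later in the course.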
These powerful techniques will allow you to create high-performance regression and classification models for your data!
This is a free, open source course on machine learning with tree-based models in R. The course content was created by Erin LeDell (code exercises, scripts), with contributions by Gabriela de Queiroz (slides). Ines Montani designed the web framework that runs this course, and Florencia D'Andrea built the course website. The course logo was created using the flametree R package by Danielle Navarro.