Author: Kyle Hammerberg
Overview:
In this lab you will learn how to classify the famous iris data set with the random forest algorithm.
Fisher’s iris data set is one of the most famous data sets in the pattern recognition literature. It contains three classes (or ‘target labels’) with 50 instances each, for 150 instances in total. Each class represents one species of the iris plant (Iris Setosa, Iris Versicolour, and Iris Virginica). Each instance is described by 4 attributes: sepal length, sepal width, petal length, and petal width, all measured in cm.
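To make the shape of the data concrete, here is a minimal sketch that loads the data set and checks the counts described above. It assumes scikit-learn is installed and uses its bundled copy of the data (`load_iris`); the lab itself may load the data from a different source.

```python
# Assumption: scikit-learn is available; the lab may load the data another way.
from collections import Counter

from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # X: measurements in cm, y: integer class labels

# Number of instances (rows) and attributes (columns).
n_instances, n_attributes = X.shape

# Count how many instances belong to each species.
class_counts = Counter(str(iris.target_names[label]) for label in y)

print("instances:", n_instances, "attributes:", n_attributes)
print("attribute names:", iris.feature_names)
print("instances per class:", dict(class_counts))
```

Each row of `X` is one plant and each column is one of the four measurements; `y` holds the matching species label for each row.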
First, you will do some preliminary exploration of the data set. Next, you will preprocess the data (i.e., prepare it for analysis) and train the model. Finally, you will use the model to predict the species of unknown iris plants.
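The workflow described above can be sketched in a few lines. This is only an illustrative outline, not necessarily the exact steps used later in the lab: it assumes scikit-learn, and the 80/20 split, `n_estimators=100`, and `random_state=0` are arbitrary choices made for the example.

```python
# Assumption: scikit-learn is available; parameter values are illustrative only.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()

# Preprocessing: hold out 20% of the plants to stand in for "unknown" irises.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data,
    iris.target,
    test_size=0.2,
    stratify=iris.target,  # keep the three species balanced in both splits
    random_state=0,
)

# Training: fit a random forest on the training portion.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Prediction: classify the held-out plants and measure accuracy.
predicted = model.predict(X_test)
predicted_species = iris.target_names[predicted]
accuracy = model.score(X_test, y_test)

print("first five predictions:", list(predicted_species[:5]))
print("accuracy on held-out plants:", accuracy)
```

Holding out part of the data gives an honest estimate of how the model performs on plants it has not seen during training.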
Objectives:
(1) To become familiar with the data structures and data transformations used in a simple classification problem
(2) To be able to list the specific data structures learned in this lab
(3) To gain experience classifying data with the random forest algorithm
(4) To become more comfortable creating data visualizations
What you’ll need:
- Jupyter Notebook
- Python 3
- Basic programming knowledge