# Building a Classification Model for the Iris data set
![irish-flower-classification.png](attachment:irish-flower-classification.png)

This Jupyter notebook is a step-by-step guide to building a classification model with the Random Forest algorithm, using the Iris data set as the example.

<i>The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis.</i>

Each flower is described by two parts, the <i>petal</i> and the <i>sepal</i>, and each part has a length and a width. This gives <i>four (4) input features</i>: sepal length, sepal width, petal length, and petal width. The output of the classification answers the question: <i>which class (species) does a particular flower belong to?</i>

## 1. Importing Required Libraries

In [3]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

## 2. Loading the iris Data Set

In [4]:
iris = datasets.load_iris()

## 3. The Input features

In [6]:
print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


### 3.1. The Output classes

In [7]:
print(iris.target_names)

['setosa' 'versicolor' 'virginica']


## 4. The data as an Array (features)

In [8]:
iris.data

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

### 4.1. Output variable (the Class label)

In [9]:
iris.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
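The integers in `iris.target` are indices into `iris.target_names`. As a small sketch (not part of the original notebook), the snippet below counts the samples per label and prints each label next to its species name:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Count how many samples carry each integer label (0, 1, 2).
counts = np.bincount(iris.target)

# Show each integer label with its species name and sample count.
for label, (name, count) in enumerate(zip(iris.target_names, counts)):
    print(label, name, count)
```

The data set is balanced: each of the three classes has 50 samples.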

### 4.2. Assigning *input* and *output* variables
Let's assign the 4 input variables to `X` and the output variable (class label) to `Y`.

In [12]:
X = iris.data
Y = iris.target

### 4.3. Let's examine the data dimensions

In [13]:
X.shape

(150, 4)

In [14]:
Y.shape

(150,)

## 5. Build Classification Model using Random Forest

In [16]:
clf = RandomForestClassifier()

In [18]:
clf.fit(X, Y)

RandomForestClassifier()

## 6. Feature Importance

In [19]:
print(clf.feature_importances_)

[0.09950521 0.02715907 0.41388453 0.45945118]
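The importances are listed in the same order as `iris.feature_names`, so they are easier to read when paired with the names. The sketch below does that and sorts them. The `random_state=0` argument is an assumption added here for repeatable numbers; the notebook itself does not set it, so its values will differ slightly from run to run.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()

# random_state is an assumption for reproducibility, not used in the notebook.
clf = RandomForestClassifier(random_state=0)
clf.fit(iris.data, iris.target)

# Pair each feature name with its importance and sort from most to least important.
ranked = sorted(
    zip(iris.feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:20s} {importance:.3f}")
```

The importances sum to 1, so each value can be read as that feature's share of the total. In the output above, the two petal measurements account for most of it.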


## 7. Make Prediction

In [20]:
X[0]

array([5.1, 3.5, 1.4, 0.2])

In [21]:
print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))

[0]


In [22]:
print(clf.predict(X[[0]]))

[0]


In [23]:
print(clf.predict_proba(X[[0]]))

[[1. 0. 0.]]
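Each row returned by `predict_proba` holds one probability per class, in the order of `clf.classes_`. The following sketch (using an assumed `random_state=0` for repeatability) checks that every row sums to 1 and that picking the most probable class reproduces `predict`:

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
X, Y = iris.data, iris.target

# random_state is an assumption for reproducibility, not used in the notebook.
clf = RandomForestClassifier(random_state=0).fit(X, Y)

# One row per sample, one column per class.
proba = clf.predict_proba(X[:5])

# The predicted class is the one with the highest probability in each row.
predicted = clf.classes_[np.argmax(proba, axis=1)]

print(proba.shape)
print(proba.sum(axis=1))
print(np.array_equal(predicted, clf.predict(X[:5])))
```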


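In [24]:
# Refit using the species names as labels instead of the integers 0, 1, 2,
# so that predictions come back as strings such as 'setosa'.
clf.fit(iris.data, iris.target_names[iris.target])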

RandomForestClassifier()
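Fitting on `iris.target_names[iris.target]` replaces every integer label with its species name, so the classifier learns and returns strings. A minimal sketch of the effect, again with an assumed `random_state=0`:

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()

# Index the name array with the integer labels to get one species name per sample.
names = iris.target_names[iris.target]

# random_state is an assumption for reproducibility, not used in the notebook.
clf = RandomForestClassifier(random_state=0).fit(iris.data, names)

print(clf.classes_)                  # the class labels are now species names
print(clf.predict(iris.data[[0]]))   # the prediction is a species name, not an integer
```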

## 8. Data split (80/20 ratio)

In [25]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

In [26]:
X_train.shape, Y_train.shape

((120, 4), (120,))

In [27]:
X_test.shape, Y_test.shape

((30, 4), (30,))
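The split above is random, so the rows that land in the test set change on every run, and the classes are not guaranteed to be equally represented. One possible variation, sketched here, fixes the shuffle with `random_state` and keeps the class proportions with `stratify`. Both arguments are additions for illustration and are not used in the notebook; the value 42 is arbitrary.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, Y = iris.data, iris.target

# random_state fixes the shuffle; stratify=Y keeps the class balance in both parts.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42, stratify=Y
)

print(np.bincount(Y_train))  # [40 40 40]
print(np.bincount(Y_test))   # [10 10 10]
```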

## 9. Rebuild the Random Forest Model

In [28]:
clf.fit(X_train, Y_train)

RandomForestClassifier()

### 9.1. Predict a single sample from the data set

In [29]:
print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))

[0]


In [30]:
print(clf.predict_proba([[5.1, 3.5, 1.4, 0.2]]))

[[1. 0. 0.]]


### 9.2. Predict on the test set

#### *Predicted class labels*

In [31]:
print(clf.predict(X_test))

[2 2 1 1 0 1 1 0 0 2 2 1 1 1 2 0 2 2 0 2 1 0 1 0 0 0 2 2 2 2]


#### *Actual class labels*

In [32]:
print(Y_test)

[2 2 1 1 0 1 1 0 0 2 2 1 1 1 2 0 2 2 0 2 1 0 2 0 0 0 2 2 2 2]
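Comparing the two arrays by eye is error-prone. The sketch below copies the predicted and actual labels printed above into NumPy arrays and locates the disagreements. The use of `confusion_matrix` is an addition for illustration; the notebook does not call it.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Labels copied from the two outputs above.
predicted = np.array([2, 2, 1, 1, 0, 1, 1, 0, 0, 2, 2, 1, 1, 1, 2,
                      0, 2, 2, 0, 2, 1, 0, 1, 0, 0, 0, 2, 2, 2, 2])
actual = np.array([2, 2, 1, 1, 0, 1, 1, 0, 0, 2, 2, 1, 1, 1, 2,
                   0, 2, 2, 0, 2, 1, 0, 2, 0, 0, 0, 2, 2, 2, 2])

# Positions where the model disagrees with the true label.
mismatches = np.flatnonzero(predicted != actual)
print(mismatches)  # [22]

# Fraction of correct predictions: 29 of 30.
accuracy = (predicted == actual).mean()
print(accuracy)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(actual, predicted)
print(cm)
```

The single error is at position 22, where a virginica flower (class 2) was predicted as versicolor (class 1).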


## 10. Checking the Model Performance

In [34]:
print(clf.score(X_test, Y_test))

0.9666666666666667
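For a classifier, `score` returns the mean accuracy on the data it is given, which is why 29 correct predictions out of 30 give 0.9667. The sketch below confirms that `accuracy_score` gives the same number. Because a single 30-sample test set is a small basis for an estimate, it also runs 5-fold cross-validation as one possible complement. The `random_state=0` values and the choice of `cv=5` are assumptions, not part of the notebook.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

iris = datasets.load_iris()
X, Y = iris.data, iris.target

# random_state values are assumptions for reproducibility.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)
clf = RandomForestClassifier(random_state=0).fit(X_train, Y_train)

# clf.score is the accuracy of clf.predict on the given data.
manual = accuracy_score(Y_test, clf.predict(X_test))
print(manual == clf.score(X_test, Y_test))

# Accuracy on 5 different held-out folds, then the mean and spread.
cv_scores = cross_val_score(RandomForestClassifier(random_state=0), X, Y, cv=5)
print(cv_scores)
print(cv_scores.mean(), cv_scores.std())
```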
