## <span style='color:#DB822E'>Activity 2: Supervised Learning</span>


In this lab we will classify different species of iris flowers based on the classic Iris data set (https://en.wikipedia.org/wiki/Iris_flower_data_set). The species are setosa, versicolor, and virginica. This data set is a bit more complex and interesting than the one used in Activity 01. This lab emphasizes the importance of test data for validating the model and also gives a cool graphical representation of the decision tree model.

This lab is inspired by the YouTube video series “Machine Learning Recipes with Josh Gordon” and specifically this video: https://youtu.be/tNa99PG8hR8?si=wyU-efpPqHKYW1Wp 


***

## <font color='#DB822E'>Configure Python</font>

We load the IPython display helpers used by Jupyter Lab, along with pandas and NumPy.



In [None]:
# Import Jupyter Display module for better data output
from IPython.display import display, Markdown, Image
import pandas as pd
import numpy as np

display(Markdown('<span style="color: #14B326">Done!</span>'))



***


## <font color='#DB822E'>Load the datasets</font>
We load the scikit-learn (`sklearn`) Python module and the Iris flower data set.
We then print the feature and target names to confirm the data set is loaded.
<br> 
The `Sepal` is (usually) the green leaf of the flower<br>
The `Petal` is the non-green leaf/petal of the flower<br>
<br>
    
### This may take a few seconds to run. Check the status indicator (`*`) to see whether it is still running.

In [None]:
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()

display(Markdown("### <font color='#DB822E'>Feature Names:</font>"))
display(iris.feature_names)

display(Markdown("### <font color='#DB822E'>Target Names:</font>"))
# Show each species name on its own line
for name in iris.target_names:
    display(name)

display(Markdown('<span style="color: #14B326">Done!</span>'))


*** 

## <font color='#DB822E'>Next we'll print out the entire dataset</font>

The [Iris Flower Data Set](https://en.wikipedia.org/wiki/Iris_flower_data_set#Data_set) is also available on Wikipedia, where the data may be easier to view.
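The same data can also be viewed as a table inside the notebook. The following is a minimal sketch, assuming pandas is installed; the variable name `df` and the `label` column are illustrative choices and not part of the original lab.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# One row per flower, one column per feature (measurements are in cm)
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add the numeric species label as a final column
df["label"] = iris.target

# Show the first few rows; in Jupyter, display(df) renders the full table
print(df.head())
```

A table keeps the four measurements aligned in columns, which can make it easier to compare examples than the one-line-per-example printout.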
<br><br>

In [None]:
# iris_data = []
# for i in range(len(iris.target)):
#     iris_data.append(
#         {"Example": i, "Label": iris.target[i], "Features": iris.data[i]}
#     )

# df = pd.DataFrame(iris_data)
# display(HTML(df.to_html(index=False)))

for i in range(len(iris.target)):
    display(f"Example {i}: label {iris.target[i]}: features {iris.data[i]}")

# display(iris.data[0])

display(Markdown('<span style="color: #14B326">Done!</span>'))




### <span style='color:#DB822E'>Print out entire dataset with target names instead of index</span>
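Mapping a label index to its name is a plain array lookup. As a small sketch (the variable names `species`, `names`, and `counts` are illustrative), indexing `iris.target_names` with the whole label array translates every label at once:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Indexing the names array with the whole label array maps all labels at once
species = iris.target_names[iris.target]

# Count how many examples carry each species name
names, counts = np.unique(species, return_counts=True)
for name, count in zip(names, counts):
    print(f"{name}: {count} examples")
```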
<br>


In [None]:

for i in range(len(iris.target)):
    display(f"Example {i}: label {iris.target_names[iris.target[i]]}({iris.target[i]}): features {iris.data[i]}")
    
display(Markdown('<span style="color: #14B326">Done!</span>'))


## <span style='color:#DB822E'>Create Test Data Set by Removing One Example of Each Flower</span>

Most of this data set is used to train the model. However, we also want to hold back a known set of test data to check that our model is working. Therefore, we will remove one example of each type of flower. The data is organized in three groups of 50:

Setosa is 0-49<br>
Versicolor is 50-99<br>
Virginica is 100-149

In this code block we will delete examples 0, 50, and 100 so that one example of each species is removed from the training data and can be used later for testing.


Test data is helpful while you're training your model because it lets you check the model's predictions against examples it has never seen.

We then print the expected results (the true labels of the three test examples).
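To see what removing rows does to the arrays, here is a short sketch of `np.delete` on the Iris data. It assumes the same `test_idx` values as the lab; the name `flattened` is only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
test_idx = [0, 50, 100]

# The held-out rows carry one label per species
print(iris.target[test_idx])               # [0 1 2]

# axis=0 removes whole rows; np.delete returns a copy and leaves iris.data intact
train_data = np.delete(iris.data, test_idx, axis=0)
print(iris.data.shape, train_data.shape)   # (150, 4) (147, 4)

# Without axis, np.delete flattens the 2-D array first, which is not what we want here
flattened = np.delete(iris.data, test_idx)
print(flattened.shape)                     # (597,)
```

This is why the lab passes `axis=0` for `iris.data` but not for `iris.target`, which is already one-dimensional.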



In [None]:

# iris = load_iris()
test_idx = [0, 50, 100]

# training data
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)

# testing data
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]



# What to expect: the true labels of the three test examples
display(Markdown("These are the true labels of the test examples. They will be relevant in the next section:"))

display(test_target)

display(Markdown('<span style="color: #14B326">Done!</span>'))


### <span style='color:#DB822E'> What does the model predict?</span>

We now train the model on the training data (the data set with the three test examples removed).
We then ask the trained model to predict the species of the three test examples we set aside above.
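Three test examples make for a very small check. As a sketch of a broader one, the snippet below holds back 20% of the data and reports the fraction predicted correctly. Note that this uses scikit-learn's `train_test_split` and `accuracy_score` rather than the lab's manual `np.delete` split, and the `test_size` and `random_state` values are arbitrary assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Hold back 20% of the examples, keeping the species balanced (stratify)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=0
)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Fraction of held-out examples whose predicted species matches the true one
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Accuracy on {len(y_test)} held-out examples: {accuracy:.2f}")
```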
<br><br>


In [None]:
# create new classifier
clf = tree.DecisionTreeClassifier()
# train on training data
clf.fit(train_data, train_target)

# What the tree predicts
display(Markdown("This should match the output in the previous section:"))
display(clf.predict(test_data))

display(Markdown('<span style="color: #14B326">Done!</span>'))


***

###  <span style='color:#DB822E'> Visualize the decision tree using an image</span>

We use the Graphviz DOT language, via the `pydotplus` module, to draw a diagram of the decision tree that was trained on the Iris petal/sepal data.
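If Graphviz or `pydotplus` is not available, the tree can also be inspected as plain text. The following is a minimal sketch using scikit-learn's `export_text`; for self-containment it trains on the full data set rather than the lab's `train_data`, so the splits may differ slightly from the image.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)
clf.fit(iris.data, iris.target)

# Render the learned splits as indented text rules
rules = export_text(clf, feature_names=iris.feature_names)
print(rules)
```

Each indented line is one yes/no question about a feature, and each `class:` line is a leaf, shown by its numeric label.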
<br>

In [None]:
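# Visualize
# from scikit decision tree tutorial: http://scikit-learn.org/stable/modules/tree.html
import os
from io import StringIO  # standard library; replaces the third-party six package
import pydotplus

# Write the tree in Graphviz DOT format into an in-memory text buffer
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data,
                     feature_names=iris.feature_names,
                     class_names=iris.target_names,
                     filled=True, rounded=True,
                     impurity=False)

graph = pydotplus.graph_from_dot_data(dot_data.getvalue())

# Make sure the output folder exists, then render and show the PNG
os.makedirs("images", exist_ok=True)
graph.write_png("images/iris.png")
display(Image(filename="images/iris.png"))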

display(Markdown('<span style="color: #14B326">Done!</span>'))


## You have completed Activity 02.

[Start Activity 03](Activity03.ipynb)