# Iris Dataset: Predicting Flower Species from Their Measurement Features

This notebook predicts the species of an iris flower from its measurements. The iris dataset ships with Scikit-Learn. We use its four measurement features to train an estimator from the Scikit-Learn library, then use that estimator to predict the species of new flowers (`new_flower1` and `new_flower2`).

In [1]:
from sklearn import datasets
import pandas as pd
import numpy as np

In [18]:
# Load the built-in 'iris' dataset that ships with Scikit-Learn
iris = datasets.load_iris()
# Feature matrix: one row per flower, one column per measurement
iris_features = iris.data
# Target vector: the species of each flower, encoded as an integer (0, 1 or 2)
iris_target = iris.target
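Before training, it helps to confirm the shapes Scikit-Learn will receive. This is a minimal, self-contained check that reloads the data. It shows that the features form a 2-D array (one row per flower) and the target is a 1-D array with one label per flower.

```python
from sklearn import datasets

iris = datasets.load_iris()
iris_features = iris.data   # 2-D array: one row per flower, one column per measurement
iris_target = iris.target   # 1-D array: one integer class label per flower

print(iris_features.shape)  # (150, 4)
print(iris_target.shape)    # (150,)
print(iris.feature_names)   # the four measurement names, in column order
```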

In [20]:
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
# Add a 'target' column holding each flower's species name (the value we want to predict)
iris_df['target'] = iris.target_names[iris.target]
iris_df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
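An optional sanity check, not part of the original notebook, is to count how many rows each species has in the DataFrame. The sketch rebuilds `iris_df` the same way as the cell above and uses pandas' `value_counts()`.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target_names[iris.target]

# Number of rows per species; the iris dataset is balanced
counts = iris_df['target'].value_counts()
print(counts)
```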


In [4]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

> This is a __classification__ problem: we predict a flower's species (a category) from its measurement features.

## Estimators objects

The main API implemented by Scikit-Learn is the estimator. An estimator is an object that learns from data: its settings are chosen when it is created, it is trained with `fit()`, and a classifier then makes predictions with `predict()`.
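Because every estimator exposes the same `fit()`/`predict()` calls, models can be swapped without changing the surrounding code. In this sketch `DecisionTreeClassifier` is swapped in purely for illustration; the notebook itself only uses k-nearest neighbours.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Two different models, configured in their constructors
estimators = [KNeighborsClassifier(n_neighbors=3), DecisionTreeClassifier(random_state=0)]

predictions = {}
for est in estimators:
    est.fit(X, y)                                  # same training call for both models
    name = type(est).__name__
    predictions[name] = est.predict(X[:5])         # same prediction call for both models
    print(name, predictions[name])
```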

### Import the estimator (model)

In [6]:
from sklearn.neighbors import KNeighborsClassifier

### Create an instance of the estimator

In [7]:
flower_classifier = KNeighborsClassifier(n_neighbors=3)
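The constructor arguments are the model's hyperparameters. `get_params()` shows them, including the defaults not passed explicitly. With the defaults `metric='minkowski'` and `p=2`, the distance between flowers is the ordinary Euclidean distance, and `weights='uniform'` gives each of the 3 neighbours an equal vote.

```python
from sklearn.neighbors import KNeighborsClassifier

flower_classifier = KNeighborsClassifier(n_neighbors=3)
params = flower_classifier.get_params()

print(params["n_neighbors"])  # 3: the value passed to the constructor
print(params["weights"])      # uniform: default, every neighbour votes equally
print(params["metric"])       # minkowski: default distance family
print(params["p"])            # 2: with minkowski, p=2 is Euclidean distance
```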

### Use the data to train the estimator

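Notes:

> The features passed to a Scikit-Learn estimator such as this one must be numeric.

> The features must be a two-dimensional array of shape (n_samples, n_features), i.e. one row per flower and one column per measurement.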
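A single flower is easy to write as a 1-D array by mistake. This sketch shows how `reshape(1, -1)` turns it into the required one-row 2-D array, and that the classifier rejects the 1-D form with a `ValueError` (the behaviour of recent Scikit-Learn versions, assumed here).

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
clf = KNeighborsClassifier(n_neighbors=3).fit(iris.data, iris.target)

one_flower = np.array([5.1, 3.0, 1.1, 0.5])  # 1-D, shape (4,)
as_2d = one_flower.reshape(1, -1)            # 2-D, shape (1, 4): one sample, four features
print(one_flower.shape, as_2d.shape)         # (4,) (1, 4)

# Passing the 1-D array directly is rejected
rejected = False
try:
    clf.predict(one_flower)
except ValueError:
    rejected = True

print(clf.predict(as_2d))  # the 2-D version is accepted
```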

In [8]:
iris_features[:10,:]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])

In [9]:
iris_target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

0 ==> setosa

1 ==> versicolor

2 ==> virginica
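The code-to-name mapping listed here is stored in `iris.target_names`. Indexing that array with integer codes (NumPy integer-array indexing) performs the lookup, which is also what `iris.target_names[iris.target]` does when building the DataFrame.

```python
from sklearn import datasets

iris = datasets.load_iris()

codes = [0, 1, 2]
names = iris.target_names[codes].tolist()  # look up each code in target_names
print(names)  # ['setosa', 'versicolor', 'virginica']

# The same mapping as a dictionary
code_to_name = dict(enumerate(iris.target_names.tolist()))
print(code_to_name)  # {0: 'setosa', 1: 'versicolor', 2: 'virginica'}
```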

In [11]:
# fit() takes the 2-D feature matrix as X (capital, by Scikit-Learn convention) and the 1-D labels as y
flower_classifier.fit(X=iris_features, y=iris_target)

KNeighborsClassifier(n_neighbors=3)
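`fit()` returns the estimator itself, which is why the cell echoes `KNeighborsClassifier(n_neighbors=3)`. After fitting, the estimator exposes learned attributes ending in an underscore. This sketch reads two of them; `n_features_in_` assumes Scikit-Learn 0.24 or later.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
flower_classifier = KNeighborsClassifier(n_neighbors=3)
fitted = flower_classifier.fit(X=iris.data, y=iris.target)

print(fitted is flower_classifier)       # True: fit() returns the same object
print(flower_classifier.classes_)        # [0 1 2]: the labels seen during training
print(flower_classifier.n_features_in_)  # 4: number of measurement columns
```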

### Use the data to make 'predictions'

In [13]:
# The features must be a two-dimensional array: one row per flower.
# Each row lists the measurements in the same order as the training data:
# sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)
new_flower1 = np.array([[5.1, 3.0, 1.1, 0.5]])   # measurements of new flower 1
new_flower2 = np.array([[10.0, 4.9, 6.5, 2.1]])  # measurements of new flower 2

In [14]:
flower_classifier.predict(new_flower1)

array([0])

`new_flower1` is predicted to be a setosa.
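With uniform weights and Euclidean distance, the prediction is the majority species among the 3 training flowers closest to `new_flower1`. The sketch inspects those neighbours with `kneighbors()` and then repeats the vote by hand with NumPy. The manual version is only an illustration of the idea; Scikit-Learn may use tree-based search internally.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
flower_classifier = KNeighborsClassifier(n_neighbors=3).fit(iris.data, iris.target)
new_flower1 = np.array([[5.1, 3.0, 1.1, 0.5]])

# Distances to, and row indices of, the 3 closest training flowers
distances, indices = flower_classifier.kneighbors(new_flower1)
neighbor_labels = iris.target[indices[0]]
print(neighbor_labels)

# Manual version: Euclidean distance to every training flower, then a majority vote
dists = np.sqrt(((iris.data - new_flower1) ** 2).sum(axis=1))
nearest = np.argsort(dists)[:3]
manual_vote = np.bincount(iris.target[nearest]).argmax()
print(manual_vote, flower_classifier.predict(new_flower1)[0])
```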

In [16]:
flower_classifier.predict(new_flower2)

array([2])

`new_flower2` is predicted to be a virginica.
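`predict_proba()` reports, for each class, the fraction of the 3 nearest neighbours that belong to it. Note that `new_flower2` has a sepal length of 10.0 cm, longer than any flower in the training data. k-nearest neighbours still returns the closest class, so a prediction for such a point deserves some caution. A minimal sketch:

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
flower_classifier = KNeighborsClassifier(n_neighbors=3).fit(iris.data, iris.target)
new_flower2 = np.array([[10.0, 4.9, 6.5, 2.1]])

# One row per flower, one column per class (setosa, versicolor, virginica)
proba = flower_classifier.predict_proba(new_flower2)
print(proba)

# Longest sepal in the training data, for comparison with 10.0 cm
print(iris.data[:, 0].max())  # 7.9
```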

In [22]:
# Same as the cells above, but predicting several flowers in a single call
new_flowers = np.array([[5.1, 3.0, 1.1, 0.5], [6.0, 2.9, 4.5, 1.1], [10.0, 4.5, 3.2, 2.5]])
predictions = flower_classifier.predict(new_flowers)
predictions

array([0, 1, 2])

The three new flowers are predicted to be setosa, versicolor and virginica, respectively.
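The integer predictions can be turned back into species names in one step by indexing `iris.target_names` with the prediction array. This sketch refits the classifier so it runs on its own.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
flower_classifier = KNeighborsClassifier(n_neighbors=3).fit(iris.data, iris.target)

new_flowers = np.array([[5.1, 3.0, 1.1, 0.5],
                        [6.0, 2.9, 4.5, 1.1],
                        [10.0, 4.5, 3.2, 2.5]])
predictions = flower_classifier.predict(new_flowers)

# Map each integer code to its species name
species = iris.target_names[predictions]
for measurements, name in zip(new_flowers, species):
    print(measurements, "->", name)
```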