# Classifying a New Flower

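If a new flower were added to Fisher's famous Iris data set, which species would it be assigned to, based solely on how its sepal length and sepal width compare with those of the flowers already in the data set? The k-nearest neighbors (KNN) method can answer this: find the k existing flowers closest to the new one and assign it the class that is most common among them.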

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
import pandas as pd
import random
import math
from collections import Counter

#### Load the data, check the minimum and maximum of each measurement, and generate a random new flower within those ranges.

In [2]:
dataset = load_iris()
names = ['sepal_length','sepal_width','petal_length','petal_width']
df = pd.DataFrame(dataset.data, columns=names)
df['class'] = dataset.target
df = df[['sepal_length','sepal_width','class']]
df.describe()

| | sepal_length | sepal_width | class |
|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 1.0 |
| std | 0.828066 | 0.433594 | 0.819232 |
| min | 4.3 | 2.0 | 0.0 |
| 25% | 5.1 | 2.8 | 0.0 |
| 50% | 5.8 | 3.0 | 1.0 |
| 75% | 6.4 | 3.3 | 2.0 |
| max | 7.9 | 4.4 | 2.0 |


In [3]:
x, y = random.uniform(4.3,7.9), random.uniform(2.0,4.4)
print(x, y)

7.414754866168646 3.874488165195731
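Because `random.uniform` draws a different flower on every run, re-executing the notebook will not reproduce the outputs shown here. The following is a minimal sketch of one way to make the draw repeatable, assuming a fixed seed is acceptable. The helper name `make_flower` and the seed value 42 are illustrative choices, not part of the original notebook.

```python
import random

# Bounds taken from the min and max rows of df.describe() above.
SEPAL_LENGTH_RANGE = (4.3, 7.9)
SEPAL_WIDTH_RANGE = (2.0, 4.4)

def make_flower(seed):
    """Draw one (sepal_length, sepal_width) pair from a seeded generator."""
    # A private generator leaves the global random state untouched.
    rng = random.Random(seed)
    return rng.uniform(*SEPAL_LENGTH_RANGE), rng.uniform(*SEPAL_WIDTH_RANGE)

# The seed 42 is an arbitrary, hypothetical choice.
flower_a = make_flower(42)
flower_b = make_flower(42)
print(flower_a == flower_b)  # same seed, same flower
```

Unpacking the result as `x, y = make_flower(42)` would slot into the rest of the notebook unchanged.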


#### Find the classes of the 10 nearest neighbors, and take the most common one as the most likely class of the new flower.

In [4]:
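distance = []
# itertuples() yields (index, sepal_length, sepal_width, class) for each row
for _, length, width, label in df[['sepal_length','sepal_width','class']].itertuples():
    # Euclidean distance from the new flower (x, y) to this flower
    dist = math.hypot(length - x, width - y)
    distance.append((dist, label))
# Sort by distance only, nearest first
distance.sort(key=lambda pair: pair[0])
distance[:10]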

[(0.29481057160267066, 2),
 (0.3485160044755935, 2),
 (0.4909290444262135, 2),
 (0.707851635254321, 2),
 (0.7918054584300838, 1),
 (0.8484732507471311, 2),
 (0.893893344127582, 2),
 (0.9004716561949682, 2),
 (0.9126104483661875, 2),
 (0.9170120886126404, 2)]
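
The distances listed above are straight-line (Euclidean) distances in the sepal length and sepal width plane. As a small sketch of what `math.hypot` computes, the example below compares it with the explicit formula on a made-up pair of points. The points are hypothetical and chosen so the answer can be checked by hand (a 3-4-5 triangle); they are not values from the data set.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# Hypothetical points: the differences are 3 and 4, so the distance is 5.
p = (7.0, 3.0)
q = (4.0, 7.0)

by_formula = euclidean(p, q)
by_hypot = math.hypot(q[0] - p[0], q[1] - p[1])
print(by_formula, by_hypot)  # both 5.0
```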

In [5]:
types = [i[1] for i in distance[:10]]
top = Counter(types).most_common(1)
top

[(2, 9)]

In [6]:
print('The new flower most likely belongs to class {}, using k={}.'.format(top[0][0], 10))

The new flower most likely belongs to class 2, using k=10.
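
The distance, sort, and vote steps above can be gathered into one function, which also makes it easy to see how the result depends on the choice of k. What follows is a sketch using only the standard library; the function name `knn_classify` and the toy points are invented for illustration and are not taken from the Iris data set.

```python
import math
from collections import Counter

def knn_classify(points, labels, query, k):
    """Return (winning_label, vote_count) among the k points closest to query."""
    # Pair each distance with its label, then sort by distance only.
    ranked = sorted(
        ((math.hypot(px - query[0], py - query[1]), label)
         for (px, py), label in zip(points, labels)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0]

# Toy data, invented for illustration only.
points = [(5.0, 3.4), (4.9, 3.1), (5.1, 3.5), (6.9, 3.1), (7.2, 3.6), (7.4, 2.8)]
labels = [0, 0, 0, 2, 2, 2]

# Try several values of k for the same query point.
for k in (1, 3, 5):
    print(k, knn_classify(points, labels, (7.0, 3.3), k))
```

With the notebook's data, the equivalent call would be `knn_classify(list(zip(df['sepal_length'], df['sepal_width'])), list(df['class']), (x, y), 10)`. Note that when two classes receive the same number of votes, `Counter.most_common` returns the one it encountered first, which here is the class of the nearer neighbor; an odd k or an explicit tie-breaking rule may be preferable.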
