# Iris Dataset
The Iris dataset is a collection of 150 samples of Iris flowers

There are 3 species, with 50 samples for each one

For each sample, the dataset records the length and width of the sepal and of the petal, in centimeters

## Goal
Train the computer to recognize the species of an Iris flower based on its measurements

### Import Iris Dataset

In [None]:
from sklearn.datasets import load_iris
iris = load_iris()


### Observations (Sample data)

In [None]:
x = iris.data
print(x)
# [
#  [5.1 3.5 1.4 0.2]
#  [4.9 3.  1.4 0.2]
#  ...
# ]

`x` is a numpy.**ndarray** in which each row is one sample holding the **length/width** of the **sepal** and of the **petal**, in that order

An **ndarray** is like a list, but much more efficient and with additional methods

It is widely used in machine learning because it handles large amounts of data efficiently
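
To make the column order and the "additional methods" concrete, here is a minimal sketch. It assumes the `feature_names` attribute returned by `load_iris` and uses `mean` only as one example of a vectorized ndarray method.

```python
from sklearn.datasets import load_iris

iris = load_iris()
x = iris.data

# Column order of every row in iris.data
print(iris.feature_names)
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

# Vectorized operation: the mean of every column at once, without a Python loop
column_means = x.mean(axis=0)
print(column_means.shape)
# (4,)
```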

### Target (results)
The species of each flower

The target must be **numeric**: each species is encoded as an id (0, 1 or 2)


In [None]:
y = iris.target
print(y)
# [0 0 0 ... 1 1 1 ... 2 2 2]  (abbreviated; the samples are ordered by species)

### Target Name

We can get the species names with **.target_names**; the position of each name is its numeric id

In [None]:
print(iris.target_names)
# ['setosa' 'versicolor' 'virginica']
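
As a quick check of the "50 samples for each species" claim, the sketch below counts the ids in the target and translates ids into names. Using `np.bincount` here is just one convenient option, not something the notebook itself relies on.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Count how many samples carry each numeric id (0, 1, 2)
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(name, count)
# setosa 50
# versicolor 50
# virginica 50

# Index target_names with the whole target array to translate every id at once
species_names = iris.target_names[iris.target]
print(species_names[:3])
# ['setosa' 'setosa' 'setosa']
```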

### Observations shape
We need to check that the number of observations (data) matches the number of targets (ids)

We can get both with the **shape** attribute

In [None]:
print(iris.data.shape)
# (150, 4)

### Target shape


In [None]:
print(iris.target.shape)
# (150,)
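
Instead of comparing the two printed shapes by eye, the check can be written as an assertion. This is a small sketch; the variable names `n_samples` and `n_features` are chosen here for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# shape is (rows, columns) for the 2-D observations array
n_samples, n_features = iris.data.shape

# Every observation row needs exactly one target id
assert n_samples == iris.target.shape[0], "data and target lengths differ"
print(n_samples, n_features)
# 150 4
```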

## Statistical model
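We are going to use a statistical model to analyze the flowers and find out which species each one belongs to, using the measurements as input

The model we are going to use is **KNN** (K-Nearest Neighbors), which assigns a sample the most common species among the k training samples closest to it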
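
To give an intuition for what KNN does, here is a rough from-scratch sketch with NumPy. The helper `knn_predict` is hypothetical and written only for illustration: it assumes Euclidean distance and a simple majority vote, and it is not how scikit-learn implements its classifier internally.

```python
import numpy as np
from sklearn.datasets import load_iris


def knn_predict(x_train, y_train, sample, k=1):
    """Hypothetical helper: majority vote among the k closest training rows."""
    # Euclidean distance from the sample to every training row
    distances = np.linalg.norm(x_train - np.asarray(sample), axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most frequent label among those neighbors
    return int(np.bincount(y_train[nearest]).argmax())


iris = load_iris()
label = knn_predict(iris.data, iris.target, [5.9, 3, 5.1, 1.8], k=1)
print(iris.target_names[label])
# virginica
```

With `k=1` the sample simply takes the species of its single closest neighbor; larger values of k make the vote less sensitive to one unusual flower.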

### KNN import

In [None]:
from sklearn.neighbors import KNeighborsClassifier
# choose the initial value for k (the number of neighbors to consult)
knn = KNeighborsClassifier(n_neighbors=1)

### Train the machine
We are going to train our model using KNN: it stores the observations together with their targets so it can later predict the species from the measurements

We can train the model using the method **fit**

In [None]:
knn.fit(x,y)

### Do predictions
We can make predictions using the **predict** method

The input is an array of sample observations (measurements)

The response is an array with the target (id) of the species for each sample

We can then index the **target_names** array to get the **species name**

In [None]:
species = knn.predict([[5.9, 3, 5.1, 1.8]])
print(species)
# [2]
print(iris.target_names[species[0]])
# virginica
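
Because **predict** takes an array of samples, several flowers can be classified in one call. The sketch below assumes the model is trained on the full dataset with `n_neighbors=1`, as in the cells above; the two measurements are rows that also appear in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(iris.data, iris.target)

# Each inner list is one flower: sepal length, sepal width, petal length, petal width
samples = [
    [5.1, 3.5, 1.4, 0.2],
    [5.9, 3.0, 5.1, 1.8],
]
ids = knn.predict(samples)

# Index target_names with the whole id array to translate every prediction
names = iris.target_names[ids]
print(names)
# ['setosa' 'virginica']
```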

## Separate data into two groups
We split the data into a training group and a testing group so we can check the predictions against samples the model has not seen during training

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

The function **train_test_split** divides the data into a **training** group and a **testing** group

The division is **random**, so the split changes on every run

**test_size=0.25** means **75%** to **train** and **25%** to **test**
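
Since the split is random, results differ between runs. One way to make it repeatable is sketched below: the `random_state` and `stratify` arguments are additions for illustration and are not part of the original call, and the seed value 42 is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# random_state fixes the shuffle so the same split comes back on every run;
# stratify keeps the species proportions similar in both groups.
# Both arguments are assumptions added for this sketch.
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42, stratify=iris.target
)

print(x_train.shape, x_test.shape)
# (112, 4) (38, 4)
```

Of the 150 samples, 38 go to the test group and the remaining 112 to the training group.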


## Model performance assessment
We are going to measure the accuracy of our model

First we train the model with the training observations, then predict the species of the test observations

In [None]:
knn.fit(x_train, y_train)
predictions = knn.predict(x_test)

Now we will get the accuracy using the **metrics** module from sklearn

The first argument is the true targets, and the second the predictions

It then returns the fraction of correct predictions

In [None]:
from sklearn import metrics

correct_predictions = metrics.accuracy_score(y_test, predictions)
print(correct_predictions)
# 0.9736842105263158 (the exact value varies with the random split)
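
To see what the accuracy score measures, it can be recomputed by hand as the fraction of test samples whose prediction equals the true target. This is a sketch under assumptions: the `random_state=0` seed is arbitrary and added only so the run is repeatable.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0  # seed is an assumption
)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x_train, y_train)
predictions = knn.predict(x_test)

# Fraction of positions where the prediction and the true target agree
manual_accuracy = np.mean(predictions == y_test)
print(np.isclose(manual_accuracy, metrics.accuracy_score(y_test, predictions)))
# True
```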