<center><h1>IRIS TRAINING</h1></center>

> A model to classify flowers based on the famous Iris flower dataset.

The dataset used here is one of the simplest Machine Learning study cases:
[**Iris**](https://en.wikipedia.org/wiki/Iris_flower_data_set).

We've separated it into two files: one for the features (flower measurements) and one for the labels (flower species). The idea is to show how to handle a tabular dataset split across multiple files.
    
The original Fisher dataset contains 150 records, each described by five attributes: sepal length, sepal width, petal length, petal width and species:
      
| sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |   species |
|      :-:     |     :-:     |      :-:     |     :-:     |     :-:   |
|      5.1     |     3.5     |      1.4     |     0.2     | I. setosa |
|      4.9     |     3.0     |      1.4     |     0.2     | I. setosa |
|      4.7     |     3.2     |      1.3     |     0.2     | I. setosa |
|      4.6     |     3.1     |      1.5     |     0.2     | I. setosa |
|      5.0     |     3.6     |      1.4     |     0.3     | I. setosa |
|      5.4     |     3.9     |      1.7     |     0.4     | I. setosa |
|      4.6     |     3.4     |      1.4     |     0.3     | I. setosa |
|      5.0     |     3.4     |      1.5     |     0.2     | I. setosa |
|      4.4     |     2.9     |      1.4     |     0.2     | I. setosa |
|      4.9     |     3.1     |      1.5     |     0.1     | I. setosa |
|      ...     |     ...     |      ...     |     ...     |    ...    |
    
We split this dataset into two pieces: one for the flower measurements and the other for the corresponding species. From here on, we refer to these datasets as:
> **entries**: the flower measurements dataset

> **classes**: the species corresponding to each set of measurements
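The split itself is a simple column selection. A minimal sketch, assuming the combined table sits in memory as a NumPy array of strings (an illustrative layout only; the notebook itself reads two files that are already separated), using the first three records from the table above:

```python
import numpy as np

# First three records of the table above, held as one combined array of strings.
# This in-memory layout is an assumption made only for illustration.
table = np.array([
    ["5.1", "3.5", "1.4", "0.2", "I. setosa"],
    ["4.9", "3.0", "1.4", "0.2", "I. setosa"],
    ["4.7", "3.2", "1.3", "0.2", "I. setosa"],
])

# entries: the four numeric measurement columns, converted to floats
entries = table[:, :4].astype(float)

# classes: the species column, kept as strings
classes = table[:, 4]

print(entries.shape, classes.shape)  # (3, 4) (3,)
```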

## Application Imports 

```python
import numpy as np
import joblib
```

## Cross Validation and Classification Methods

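Another important aspect of Machine Learning is how the data is divided into "training" and "test" sets: a poor allocation can deeply affect the measured accuracy and the performance of the model. Here we use the Stratified K-Fold method, which splits the data into K folds while keeping each class in roughly the same proportion in every fold; each fold serves once as the test set while the remaining folds are used for training: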

[Stratified K Fold Cross Validation](https://www.geeksforgeeks.org/stratified-k-fold-cross-validation/)
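To make the idea concrete, here is a simplified NumPy sketch of stratified fold assignment: each class's indices are shuffled and dealt round-robin into the folds, so every fold receives nearly the same number of samples from each class. The helper `stratified_folds` is hypothetical and is not scikit-learn's exact algorithm; the 50-per-species labels mirror the Iris class balance.

```python
import numpy as np


def stratified_folds(labels, n_folds, seed=0):
    """Return a fold id (0..n_folds-1) for every sample, spreading each class evenly.

    Simplified illustration: shuffle the indices of each class, then deal them
    out round-robin. This is not scikit-learn's exact algorithm.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Round-robin dealing gives each fold floor or ceil of (class size / n_folds)
        fold_of[idx] = np.arange(len(idx)) % n_folds
    return fold_of


# 50 samples per species, as in the Iris dataset
labels = np.repeat(["setosa", "versicolor", "virginica"], 50)
folds = stratified_folds(labels, n_folds=3, seed=19)

# Count how many samples of each species land in each fold
species = np.unique(labels)
per_fold = [[int(np.sum(labels[folds == k] == s)) for s in species] for k in range(3)]
for k, counts in enumerate(per_fold):
    print(f"fold {k}: {counts}")
# fold 0: [17, 17, 17]
# fold 1: [17, 17, 17]
# fold 2: [16, 16, 16]
```

Every fold ends up with about one third of each species, so no test fold is missing a class.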

As our classification method, we use a Support Vector Machine (SVM):

[Support Vector Machine (Wikipedia)](https://en.wikipedia.org/wiki/Support_vector_machine)
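For reference, with `kernel='linear'` a binary SVM learns a weight vector $w$ and bias $b$ by solving the standard soft-margin problem below, where $C$ is the regularization strength (scikit-learn's `SVC` default is `C=1.0`, which this notebook leaves unchanged) and each label $y_i$ is $-1$ or $+1$; a new point $x$ is classified by the sign of $w^\top x + b$. With three species, `SVC` combines one such binary classifier per pair of classes (one-vs-one).

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \max\bigl(0,\ 1 - y_i\,(w^\top x_i + b)\bigr),
\qquad
\hat{y}(x) = \operatorname{sign}\bigl(w^\top x + b\bigr)
$$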

```python
from sklearn import svm
from sklearn.model_selection import StratifiedKFold, cross_val_score
```

### Defining the params

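```python
# SVM parameters
gamma = 0.1        # note: SVC ignores gamma when kernel='linear' (it only affects 'rbf', 'poly' and 'sigmoid')
kernel = 'linear'

# Stratified K-Fold parameters
n_folds = 3        # number of folds; each fold is used once as the test set
```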
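Why `gamma` has no effect in this notebook: scikit-learn documents `gamma` as the kernel coefficient for the `'rbf'`, `'poly'` and `'sigmoid'` kernels, so with `kernel='linear'` it is ignored. The NumPy sketch below writes out the linear kernel and the RBF kernel (which this notebook does not use; it is shown only for contrast) to make that visible. `k_linear` and `k_rbf` are illustrative helpers, not scikit-learn functions.

```python
import numpy as np


def k_linear(x, z):
    # Linear kernel: K(x, z) = <x, z>. There is no gamma anywhere in it.
    return float(np.dot(x, z))


def k_rbf(x, z, gamma):
    # RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2).
    # gamma controls how quickly similarity decays with distance.
    diff = np.asarray(x) - np.asarray(z)
    return float(np.exp(-gamma * np.dot(diff, diff)))


a = np.array([5.1, 3.5, 1.4, 0.2])  # first record of the table above
b = np.array([4.9, 3.0, 1.4, 0.2])  # second record

print(k_linear(a, b))                                  # same value whatever gamma is
print(k_rbf(a, b, gamma=0.1), k_rbf(a, b, gamma=1.0))  # changes with gamma
```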

```python
# Gather the hyperparameters in one place, e.g. for logging alongside the results
params = {"gamma": gamma, "kernel": kernel, "n_folds": n_folds}
```

## Loading the datasets

The Iris dataset is so popular that scikit-learn ships a copy of it, which you can load into memory directly:
```python
from sklearn.datasets import load_iris

iris = load_iris()
iris
```

This returns a `Bunch`, a dictionary-like object (a `dict` subclass) holding the data, the targets, the feature and class names, and a description of the dataset.
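A short look at what that object holds. Its keys can also be read as attributes, and the integer targets can be mapped to species names to get string labels; treating those names as comparable to the contents of `species.csv` is an assumption, since the exact label spelling in that file is not shown here.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Bunch is a dict subclass: iris["data"] and iris.data are the same array
X = iris.data          # (150, 4) float array of measurements, like "entries"
y_codes = iris.target  # (150,) integer class codes 0, 1, 2

# Map the integer codes to species names to get string labels, like "classes"
y = iris.target_names[y_codes]

print(X.shape, y.shape)
print(iris.feature_names)
```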

As mentioned before, we work with two local files, `measures.csv` and `species.csv`. Loading them from disk instead of from the scikit-learn package was a deliberate choice, to show how to handle a tabular dataset split across multiple files.

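```python
# One row of four comma-separated measurements per flower
entries = np.genfromtxt('datasets/measures.csv', delimiter=',')
# Species labels, read as strings (one per flower)
classes = np.genfromtxt('datasets/species.csv', dtype=str)
```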
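A minimal sketch of what those two calls expect, using in-memory stand-ins for the files. The layout here (no header row, comma-separated measurements, one species name per line) is an assumption, since the contents of `datasets/` are not shown. One pitfall worth knowing: without a `delimiter`, `genfromtxt` splits on whitespace, so a label containing a space (such as `I. setosa`) would be split into two columns; passing `delimiter=','` keeps it whole.

```python
import io

import numpy as np

# Hypothetical file contents; the real files in datasets/ may differ.
measures_csv = io.StringIO("5.1,3.5,1.4,0.2\n4.9,3.0,1.4,0.2\n4.7,3.2,1.3,0.2\n")
species_csv = io.StringIO("I. setosa\nI. setosa\nI. setosa\n")

entries = np.genfromtxt(measures_csv, delimiter=',')
# delimiter=',' keeps "I. setosa" as a single field instead of splitting on the space
classes = np.genfromtxt(species_csv, delimiter=',', dtype=str)

# Each row of measurements must line up with exactly one label
assert entries.shape == (3, 4)
assert len(entries) == len(classes)
print(entries.shape, classes.tolist())
```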

## Creating the estimator and folding strategy

Now we create the estimator and define how the data is split for cross-validation, using the Stratified K-Fold method:

```python
clf = svm.SVC(kernel=kernel, gamma=gamma)
# shuffle=True mixes the rows before folding; an integer seed makes the split reproducible
fold = StratifiedKFold(n_splits=n_folds,
                       shuffle=True,
                       random_state=19)
```
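One way to check what the folding strategy produces is to count the species in each test fold. This sketch uses stand-in labels with the Iris class balance (50 per species) instead of the local files, and an integer `random_state`, which the scikit-learn documentation recommends for reproducible splits across repeated calls.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Stand-in labels with the Iris class balance: 50 records per species
y = np.repeat(["setosa", "versicolor", "virginica"], 50)
# Fold assignment depends only on the labels and the number of rows, so zeros suffice for X
X = np.zeros((len(y), 4))

fold = StratifiedKFold(n_splits=3, shuffle=True, random_state=19)

per_fold = []
for train_idx, test_idx in fold.split(X, y):
    # Number of samples of each species in this test fold
    counts = [int(np.sum(y[test_idx] == s)) for s in np.unique(y)]
    per_fold.append(counts)
    print(len(train_idx), len(test_idx), counts)
```

Each test fold should hold 16 or 17 flowers of every species, mirroring the one-third share each fold gets of the whole dataset.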

---
---

## TRAINING

Let's estimate the model's accuracy with cross-validation, then train the final model on the full dataset:

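```python
# Accuracy on each held-out fold (n_jobs=-1 uses all CPU cores)
score = cross_val_score(estimator=clf,
                        X=entries,
                        y=classes,
                        cv=fold,
                        n_jobs=-1)

# Summarize the fold scores: mean and standard deviation
metrics = dict(score_avg=score.mean(),
               score_std=score.std())

# cross_val_score fits clones, so clf is still unfitted: train the final model on all the data
clf.fit(entries, classes)
```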
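For intuition about what `cross_val_score` computes, the loop below reproduces it by hand: for each split it fits a fresh clone of the estimator on the training folds and measures accuracy (the default score for classifiers) on the held-out fold. It is a sketch that uses scikit-learn's bundled Iris data in place of the local CSV files and leaves out `gamma`, which the linear kernel ignores. Because the estimator is cloned, the original `clf` is never fitted during cross-validation, which is why a separate `clf.fit` call is needed afterwards.

```python
import numpy as np
from sklearn import svm
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in data: scikit-learn's bundled copy of Iris instead of the local CSV files
X, y = load_iris(return_X_y=True)

clf = svm.SVC(kernel='linear')
fold = StratifiedKFold(n_splits=3, shuffle=True, random_state=19)

# Manual cross-validation: fit a fresh copy per split, score accuracy on the held-out part
manual = []
for train_idx, test_idx in fold.split(X, y):
    model = clone(clf).fit(X[train_idx], y[train_idx])
    manual.append(model.score(X[test_idx], y[test_idx]))
manual = np.array(manual)

# The library call over the same splits
auto = cross_val_score(clf, X, y, cv=fold)

print(manual, auto)
print(manual.mean(), manual.std())  # std() equals sqrt(var()), i.e. a standard deviation
```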

---
---

## PUBLISH 

```python
# Serialize the trained model to disk
joblib.dump(clf, 'clf.pkl')
```
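A sketch of the consuming side: reloading the published file with `joblib.load` and checking that it predicts the same labels as the in-memory model. It uses scikit-learn's bundled Iris data and a temporary directory as stand-ins. Two caveats from the joblib and scikit-learn documentation apply: loading a pickle can execute arbitrary code, so only load files from trusted sources, and a model pickled under one scikit-learn version is not guaranteed to load correctly under another.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
clf = svm.SVC(kernel='linear').fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'clf.pkl')
    joblib.dump(clf, path)        # same call as the notebook, written to a temporary folder
    restored = joblib.load(path)  # only load pickles from sources you trust
    same = np.array_equal(restored.predict(X), clf.predict(X))

print(same)  # True
```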