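## Install instructions

Running this notebook requires scikit-learn, which you can install with the following command. The notebook also imports pandas, NumPy, matplotlib, and seaborn; install them the same way if they are missing.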
```
conda install scikit-learn
```

In [1]:
%matplotlib inline
from sklearn import datasets
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
sns.set_context("poster")
sns.set_style("ticks")

## Load the data

In [3]:
iris = datasets.load_iris()

In [4]:
print(iris.DESCR)

Iris Plants Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20  0.76     0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

This is a copy of UCI ML iris d

In [5]:
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target
# Replace the integer class codes (0, 1, 2) with the species names.
df["target"] = df["target"].map(dict(enumerate(iris.target_names)))

In [6]:
df.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [7]:
df.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "target"]
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,target
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


## Fitting the model

In [8]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

In [9]:
model = LogisticRegression(multi_class="multinomial", solver="lbfgs")
#model = RandomForestClassifier()
model.fit(df[df.columns[:4]], df.target)
pd.DataFrame(model.coef_, columns=df.columns[:4], index=model.classes_)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
setosa,-0.423401,0.961714,-2.519522,-1.085926
versicolor,0.534158,-0.317977,-0.205377,-0.93968
virginica,-0.110757,-0.643737,2.7249,2.025606


In [10]:
model.predict([[
    0.5, 1.5, 1.3, 0.4
]])

array(['setosa'], dtype=object)

## Evaluating the model

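**Accuracy**: the fraction of predictions that match the true label. The scores below are computed on the same data the model was trained on, so they are optimistic estimates of performance on unseen flowers.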
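
As a minimal sketch of that definition, accuracy can be computed by hand by counting matching labels. The labels below are toy values invented for illustration, not results from the iris model, and the helper name `accuracy` is hypothetical.

```python
# Toy labels, invented for illustration (not taken from the iris results).
y_true = ["setosa", "versicolor", "virginica", "virginica", "versicolor"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica", "versicolor"]


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


toy_accuracy = accuracy(y_true, y_pred)
print("Accuracy: {:.3f}".format(toy_accuracy))
```

Four of the five toy predictions match, so the function returns 0.8. `accuracy_score` from scikit-learn computes the same quantity by default.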

In [11]:
from sklearn.metrics import accuracy_score, classification_report

In [12]:
# Both metrics take (y_true, y_pred), in that order.
y_pred = model.predict(df[df.columns[:4]])
print(classification_report(df.target, y_pred))
print("Accuracy: {:.3f}".format(accuracy_score(df.target, y_pred)))

             precision    recall  f1-score   support

     setosa       1.00      1.00      1.00        50
 versicolor       0.98      0.94      0.96        50
  virginica       0.94      0.98      0.96        50

avg / total       0.97      0.97      0.97       150

Accuracy: 0.973
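
Accuracy is symmetric in its two arguments, but precision and recall are not: which list is treated as the truth decides which is which. The sketch below computes both by hand for one class, on toy labels invented for illustration; `precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for a single class label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)  # true positives + false positives
    actual = sum(t == label for t in y_true)     # true positives + false negatives
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Toy labels, invented for illustration (not taken from the iris results).
y_true = ["versicolor"] * 4 + ["virginica"] * 2
y_pred = ["versicolor"] * 3 + ["virginica"] * 3

correct_order = precision_recall(y_true, y_pred, "virginica")
swapped_order = precision_recall(y_pred, y_true, "virginica")
print("truth first:      precision={:.2f} recall={:.2f}".format(*correct_order))
print("prediction first: precision={:.2f} recall={:.2f}".format(*swapped_order))
```

Passing the predictions where the true labels belong swaps the two values, and in a full report it also makes the support column count predicted labels rather than true ones.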


In [13]:
model.predict_proba([[
    0.5, 1.5, 1.3, 0.4
]])*100

array([[9.97614114e+01, 2.38579139e-01, 9.45872287e-06]])
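
The values above are percentages, one per class in the order of `model.classes_`. In a multinomial logistic regression, the class probabilities are the softmax of the per-class linear scores (coefficients times features, plus an intercept). The sketch below shows only the softmax step; the scores are hypothetical numbers chosen for illustration, not values taken from the fitted model.

```python
import math


def softmax(scores):
    """Map real-valued scores to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for (setosa, versicolor, virginica), not model output.
scores = [6.0, 0.0, -10.0]
probabilities = softmax(scores)
percentages = [100 * p for p in probabilities]
print(["{:.4f}".format(p) for p in percentages])
```

A gap of a few units between scores is enough to push nearly all of the probability onto one class, which is why one entry can sit close to 100 while the others are near zero.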