# importing libraries


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

# importing dataset and adding the species column as integer labels

In [2]:
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['Species'] = iris.target
data.head()

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),Species
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


Prepare the data by assigning the independent X variables (sepal length, sepal width, petal length and petal width) and the dependent y variable (Species)

In [3]:
X = data.iloc[:,[0,1,2,3]].values
y = data.iloc[:,4].values


Standardize the features (zero mean, unit variance per column) before fitting the model

In [4]:
# X already has shape (n_samples, 4), so no reshape is needed
X = preprocessing.scale(X)  # standardize each feature column
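
For readers who want to see what the scaling step does, here is a minimal NumPy sketch of standardization, which is what `preprocessing.scale` computes with its default settings. The helper name `standardize` and the toy array are illustrative assumptions, not part of the notebook. The helper can also take its mean and standard deviation from a separate reference array, in case one prefers to compute the statistics from the training rows only.

```python
import numpy as np

def standardize(values, reference=None):
    """Scale the columns of `values` to zero mean and unit variance.

    Statistics come from `reference` (defaults to `values` itself).
    Hypothetical helper, for illustration only.
    """
    reference = values if reference is None else reference
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)          # population standard deviation (ddof=0)
    std = np.where(std == 0, 1.0, std)   # leave constant columns unscaled
    return (values - mean) / std

# Toy data: 4 samples, 2 features (made-up numbers)
toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
scaled = standardize(toy)
print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```

Calling `standardize(X_test, reference=X_train)` after the split would scale the test rows with training statistics, at the cost of slightly different numbers from the ones reported in this notebook.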


Split X and y into training and test sets (75% train, 25% test)

In [5]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

Fit a model

In [6]:
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)



Make predictions on test data


In [7]:
# predict returns a 1-D array of class labels, the shape the metric functions expect
y_pred = log_reg.predict(X_test)

# Measuring Model Performance

Use the score method to get the accuracy of the model on the test set

In [8]:
score = log_reg.score(X_test, y_test)

print('Accuracy: {}'.format(score))

Accuracy: 0.9736842105263158


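Check how the model did for the different classes.
The confusion matrix (rows are true classes, columns are predicted classes) shows that the only error is a versicolor instance predicted as virginica, so versicolor is the only species with an instance the model failed to classify correctly.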

In [9]:
from sklearn.metrics import confusion_matrix

classes = list(iris.target_names)
conf_mat = confusion_matrix(y_test, y_pred)
cm_df = pd.DataFrame(conf_mat, columns=classes, index=classes)
cm_df

,setosa,versicolor,virginica
setosa,13,0,0
versicolor,0,15,1
virginica,0,0,9
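
Per-class precision and recall can be read directly off this matrix. The sketch below hard-codes the counts from the output above, takes rows as true classes and columns as predicted classes (scikit-learn's convention), and divides the diagonal by the column sums and the row sums. The variable names are illustrative.

```python
import numpy as np

classes = ['setosa', 'versicolor', 'virginica']
# Counts copied from the confusion matrix output (rows: true, columns: predicted)
conf_mat = np.array([[13, 0, 0],
                     [0, 15, 1],
                     [0, 0, 9]])

true_positives = np.diag(conf_mat)
precision = true_positives / conf_mat.sum(axis=0)  # column sums: everything predicted as the class
recall = true_positives / conf_mat.sum(axis=1)     # row sums: everything truly in the class

for name, p, r in zip(classes, precision, recall):
    print(f'{name}: precision={p:.4f}, recall={r:.4f}')
```

The single off-diagonal count lowers recall for versicolor (15 of 16 found) and precision for virginica (9 of 10 predictions correct).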


Looking at the confusion matrix is one way of inspecting performance in more detail.

Looking at the F1 score, precision and recall is another way.

The imperfect average F1 score tells us that not all instances were classified correctly, and the per-class F1 scores tell us which classes were the most problematic.

In [10]:
from sklearn.metrics import f1_score, precision_score, recall_score

Micro-averaged F1 score

In [11]:
av_f1 = f1_score(y_test, y_pred, average='micro')
print(av_f1)


0.9736842105263158
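
With `average='micro'`, true positives, false positives and false negatives are pooled over all classes before precision and recall are computed. In a single-label multiclass problem every misclassification is one false positive (for the predicted class) and one false negative (for the true class), so the micro-averaged F1 equals the accuracy, which is why this value matches the score reported earlier. A small sketch with the counts from the confusion matrix (37 correct, 1 wrong):

```python
# Pooled counts taken from the confusion matrix
tp = 13 + 15 + 9   # correctly classified instances
fp = 1             # the one error is a false positive for the predicted class
fn = 1             # ...and a false negative for the true class

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)
accuracy = tp / 38  # 38 test instances in total

print(micro_f1, accuracy)
```

To get an average that weights each class equally instead, `average='macro'` is the usual alternative.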


f1 score per class

In [15]:
f = f1_score(y_test, y_pred, average=None)
lowest_score = min(f)
hardest_class = classes[list(f).index(lowest_score)]
print('Hardest class:', hardest_class)
print(f)

Hardest class: virginica
[1.         0.96774194 0.94736842]
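
F1 is the harmonic mean of precision and recall, so a class can end up with the lowest F1 even when none of its own instances were misclassified. The sketch below recomputes the per-class values from precision and recall figures derived from the confusion matrix; the helper `f1_from` is a hypothetical name used only here.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs read off the confusion matrix
per_class = {
    'setosa': (13 / 13, 13 / 13),
    'versicolor': (15 / 15, 15 / 16),  # one versicolor was predicted as virginica
    'virginica': (9 / 10, 9 / 9),      # that prediction is a false positive here
}

f1_scores = {name: f1_from(p, r) for name, (p, r) in per_class.items()}
hardest_class = min(f1_scores, key=f1_scores.get)
print(f1_scores)
print('Hardest class:', hardest_class)
```

The same single error affects both classes, but it weighs more on virginica because it is one of only 10 virginica predictions, compared with one of 16 true versicolor instances.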


The precision and recall for that species then tell us more. With a recall of 1.0 and a precision of 0.9, the problem is not that the model was too strict about which instances could be considered virginica; rather, it mistook an instance of another class for virginica.

Precision for virginica 

In [13]:
prec = precision_score(y_test == classes.index('virginica'), y_pred == classes.index('virginica'))
print('Precision:', prec)


Precision: 0.9


Recall for virginica 

In [14]:
rec = recall_score(y_test == classes.index('virginica'), y_pred == classes.index('virginica'))
print('Recall:', rec)

Recall: 1.0


Now I am going to look at another species, versicolor.

Precision for versicolor

In [16]:
prec = precision_score(y_test == classes.index('versicolor'), y_pred == classes.index('versicolor'))
print('Precision:', prec)

Precision: 1.0


Recall for versicolor

In [17]:
rec = recall_score(y_test == classes.index('versicolor'), y_pred == classes.index('versicolor'))
print('Recall:', rec)

Recall: 0.9375


Now I am going to look at the remaining species, setosa.

Precision for setosa

In [18]:
prec = precision_score(y_test == classes.index('setosa'), y_pred == classes.index('setosa'))
print('Precision:', prec)

Precision: 1.0


Recall for setosa

In [19]:
rec = recall_score(y_test == classes.index('setosa'), y_pred == classes.index('setosa'))
print('Recall:', rec)

Recall: 1.0
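
The per-species cells above repeat the same pattern, so the figures can also be collected in one call. The following sketch uses scikit-learn's `precision_recall_fscore_support` with `average=None`. To stay self-contained it rebuilds label arrays from the confusion matrix counts rather than reusing the notebook's `y_test` and `y_pred`; the names `y_true_demo` and `y_pred_demo` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

classes = ['setosa', 'versicolor', 'virginica']
# Rebuild labels from the confusion matrix: 13 setosa, 16 versicolor, 9 virginica
y_true_demo = np.repeat([0, 1, 2], [13, 16, 9])
y_pred_demo = y_true_demo.copy()
y_pred_demo[13] = 2  # the first versicolor instance is predicted as virginica

precision, recall, f1, support = precision_recall_fscore_support(
    y_true_demo, y_pred_demo, average=None)
summary = pd.DataFrame(
    {'precision': precision, 'recall': recall, 'f1': f1, 'support': support},
    index=classes)
print(summary)
```

In the notebook itself, passing `y_test` and `y_pred` in place of the demo arrays should give the same table.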
