**Logistic regression** is a probabilistic classification model that predicts a categorical dependent variable (e.g., iris species) from one or more predictor variables (e.g., iris features).  Traditionally it is a binary classifier, but it can be extended to multiclass problems, which is the variant we use here.  Its name comes from the **[logit function](https://en.wikipedia.org/wiki/Logit)** used in statistics, which is the inverse of the logistic (sigmoid) function.

![Wikimedia](https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/Logit.svg/200px-Logit.svg.png)
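
To make the logit function concrete, here is a minimal sketch of it and its inverse in NumPy. The helper names `logit` and `logistic` are chosen for illustration and are not part of the notebook's code.

```python
import numpy as np

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return np.log(p / (1.0 - p))

def logistic(z):
    """Inverse of logit: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

p = np.array([0.1, 0.5, 0.9])
z = logit(p)
print(z)            # log-odds; a probability of 0.5 maps to 0
print(logistic(z))  # recovers the original probabilities
```

Logistic regression models the log-odds of a class as a linear function of the features, so applying `logistic` to that linear score yields a probability.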


In [None]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import numpy as np
np.set_printoptions(precision=2)  # print only two decimal places

Load the iris data


In [None]:
iris = load_iris()
X = iris.data
y = iris.target

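Normalize the data to help convergence and show the first five rows.  Note that `X.mean()` and `X.std()` are computed over the whole array, so every feature is shifted and scaled by the same global values rather than standardized per feature.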

In [None]:
X = (X-X.mean())/X.std()
print(X[:5,:])

[[ 0.83  0.02 -1.05 -1.65]
 [ 0.73 -0.24 -1.05 -1.65]
 [ 0.63 -0.13 -1.1  -1.65]
 [ 0.58 -0.18 -1.   -1.65]
 [ 0.78  0.07 -1.05 -1.65]]
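
For comparison, a common alternative is to standardize each feature separately. The sketch below is not what this notebook does; it assumes per-column scaling is wanted and uses the hypothetical names `X_raw`, `X_manual`, and `X_scaled`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_raw = load_iris().data

# Per-feature z-score: axis=0 computes one mean and one std per column.
X_manual = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# StandardScaler performs the same computation (population std, ddof=0).
X_scaled = StandardScaler().fit_transform(X_raw)

print(np.allclose(X_manual, X_scaled))
print(X_scaled.mean(axis=0).round(2), X_scaled.std(axis=0).round(2))
```

With per-feature scaling, each column ends up with mean 0 and standard deviation 1, so no single feature dominates just because of its units.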


Create a logistic regression model that works with more than two classes (multinomial)

In [None]:
clf = LogisticRegression(C=1e5, solver='lbfgs', multi_class='multinomial')
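
As a rough illustration of what "multinomial" means here, the sketch below fits a model and checks that its class probabilities are the softmax of its per-class linear scores. It assumes a recent scikit-learn in which the default behaviour with `lbfgs` is multinomial for multiclass targets, so `multi_class` is left unset; `X_demo`, `model`, and `class_scores` are illustrative names.

```python
import numpy as np
from scipy.special import softmax
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = load_iris(return_X_y=True)

# Assumption: the default multi_class setting yields a multinomial model
# for a three-class target when solver='lbfgs'.
model = LogisticRegression(solver='lbfgs', max_iter=1000).fit(X_demo, y_demo)

class_scores = model.decision_function(X_demo[:3])  # one linear score per class
probs = softmax(class_scores, axis=1)               # normalize scores to probabilities

print(probs.round(3))
print(np.allclose(probs, model.predict_proba(X_demo[:3])))
```

Each row of `probs` sums to 1, and the predicted class is the one with the largest probability.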

Run a cross validation experiment

In [None]:
clf = LogisticRegression(C=1e5, solver='lbfgs', multi_class='multinomial')
scores = cross_val_score(clf, X, y, cv=10)
print('accuracy scores:', scores)
print("Accuracy: {:0.2f} +/- {:0.2f}".format(scores.mean(), scores.std()*2))

accuracy scores: [1.   1.   1.   1.   0.93 1.   0.87 1.   1.   1.  ]
Accuracy: 0.98 +/- 0.09


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

We got a warning that the solver failed to converge.  Looking at the documentation, we see that the maximum number of iterations (`max_iter`) is a parameter with a default of 100.  Let's try increasing it.

In [None]:
clf = LogisticRegression(C=1e5, solver='lbfgs', multi_class='multinomial', max_iter=200)
scores = cross_val_score(clf, X, y, cv=10)
print('accuracy scores:', scores)
print("Accuracy: {:0.2f} +/- {:0.2f}".format(scores.mean(), scores.std()*2))

accuracy scores: [1.   1.   1.   1.   0.93 1.   0.87 1.   1.   1.  ]
Accuracy: 0.98 +/- 0.09
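
One way to see whether a given `max_iter` is enough is to fit once and inspect the fitted model's `n_iter_` attribute, while recording any `ConvergenceWarning`. The sketch below assumes the same global normalization as above; `fit_and_check` is a hypothetical helper, and `multi_class` is left at its default.

```python
import warnings
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = load_iris(return_X_y=True)
X_demo = (X_demo - X_demo.mean()) / X_demo.std()  # same global normalization

def fit_and_check(max_iter):
    """Fit once; return (iterations used, whether a ConvergenceWarning fired)."""
    model = LogisticRegression(C=1e5, solver='lbfgs', max_iter=max_iter)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # record every warning, even repeats
        model.fit(X_demo, y_demo)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return int(model.n_iter_.max()), warned

for limit in (5, 100, 200, 1000):
    print(limit, fit_and_check(limit))
```

If the iteration count equals the limit and a warning fired, the solver stopped early and a larger `max_iter` (or better-scaled data) may be needed.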


Let's also evaluate on a single held-out split, using 90% of the dataset for training and 10% for testing.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10)
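
The split above is random, so it changes on every run. A minimal sketch of a reproducible, class-balanced variant follows; it assumes a fixed seed (`random_state=0`, an arbitrary choice) and stratification on the labels, neither of which is used in this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)

# random_state fixes the shuffle; stratify keeps class proportions equal
# in the training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.10, random_state=0, stratify=y_demo)

print(X_tr.shape, X_te.shape)
print(np.bincount(y_te))  # number of test samples per class
```

With 150 samples and three equally sized classes, a stratified 10% test set holds 15 samples, five from each species.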

Fit the model to the training data

In [None]:
clf = LogisticRegression(C=1e5, solver='lbfgs', multi_class='multinomial', max_iter=200)
clf = clf.fit(X_train, y_train)

Run the model on the held-out test data and display the results as a confusion matrix, in which rows are the actual categories of the test data and columns are the predicted categories.

In [None]:
y_predict = clf.predict(X_test)

from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_predict)

array([[4, 0, 0],
       [0, 7, 0],
       [0, 0, 4]])
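
To read summary metrics off a confusion matrix, the sketch below recomputes accuracy, per-class recall, and per-class precision from the matrix printed above. The array is copied by hand into the illustrative variable `cm`; a different random split would give different counts.

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted.
cm = np.array([[4, 0, 0],
               [0, 7, 0],
               [0, 0, 4]])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)       # per actual class (row totals)
precision = np.diag(cm) / cm.sum(axis=0)    # per predicted class (column totals)

print(accuracy, recall, precision)
```

All off-diagonal entries are zero, so every one of the 15 test samples was classified correctly in this particular split.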