# Logistic Regression

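<span>Logistic regression is a widely used classifier and often serves as a baseline against which other models are compared. In the binary case it passes a linear combination of the features through a sigmoid function and classifies at a 0.5 probability threshold; for a multiclass target such as the wine data used here, it fits either one-vs-rest binary models or a single multinomial model. We will look at logistic regression implementations from a few packages.</span>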
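
To make the sigmoid-and-threshold step concrete, here is a minimal NumPy sketch of the binary case. The weights, bias, and samples are hypothetical values chosen only for illustration; they are not fitted to the wine data.

```python
import numpy as np


def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical weights, bias, and samples, chosen only for illustration
weights = np.array([1.5, -2.0])
bias = 0.25
samples = np.array([[2.0, 0.5],
                    [0.0, 1.0],
                    [1.0, 1.0]])

scores = samples @ weights + bias                 # linear combination of the features
probabilities = sigmoid(scores)                   # estimated P(class = 1)
predictions = (probabilities >= 0.5).astype(int)  # classify at the 0.5 threshold

print(probabilities)
print(predictions)
```

A score of zero maps to a probability of exactly 0.5, so thresholding the probability at 0.5 is the same as checking the sign of the linear score.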

### Import Preliminaries

In [6]:
%matplotlib inline
%config InlineBackend.figure_format='retina'

# Import modules
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd 
import seaborn as sns
import warnings

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

# Set pandas options
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 30)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Set warning options
warnings.filterwarnings('ignore')

### Import Data

In [7]:
wine = load_wine()
X, y = wine.data, wine.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

wdf = pd.DataFrame(wine.data, columns=wine['feature_names'])
wdf = pd.concat([wdf, pd.DataFrame(wine.target, columns=['target'])], axis=1)
wdf.head(5)

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 | 0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 | 0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 | 0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 | 0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 | 0 |


### Fitting the Model

In [8]:
model = LogisticRegression()
model.fit(X_train,y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [9]:
# random_state only applies when shuffle=True, so it is omitted here
crossvalidation = KFold(n_splits=20, shuffle=False)

scores = cross_val_score(model, X_train, y_train,
                         scoring='accuracy',
                         cv=crossvalidation, n_jobs=1)

print(f'Folds: {len(scores)} \
      accuracy: {np.mean(scores)} \
      std: {np.std(scores)}')

Folds: 20       accuracy: 0.9571428571428571       std: 0.07953949089757176


In [10]:
print(f'Test Set Score: {model.score(X_test, y_test)}')

Test Set Score: 0.9333333333333333


### Model Attributes

In [12]:
model.coef_

array([[ -3.30357207e-01,   4.31272739e-01,   6.48692255e-01,
         -7.44791204e-01,  -2.19631689e-02,  -4.38775497e-02,
          1.07680125e+00,   5.60401518e-02,   2.11820295e-01,
         -1.94062162e-01,  -8.39522666e-02,   5.24545237e-01,
          1.83234669e-02],
       [  7.21915432e-01,  -9.02982147e-01,  -6.06888091e-01,
          2.22689976e-01,   1.64600573e-02,   7.16571901e-01,
          7.77184289e-01,   4.15667392e-01,   1.31469888e-01,
         -1.72890430e+00,   7.91065544e-01,   4.34349475e-01,
         -1.52554227e-02],
       [ -4.83320960e-01,   7.36410761e-01,  -2.75438657e-03,
          1.67399054e-01,   2.64637230e-02,  -7.39584924e-01,
         -1.61867032e+00,  -1.01615307e-01,  -6.86914767e-01,
          1.01094814e+00,  -4.39174936e-01,  -1.08025442e+00,
          3.01898430e-04]])

In [13]:
model.intercept_

array([-0.19494128,  0.3193043 , -0.08894107])

In [14]:
model.n_iter_

array([22], dtype=int32)
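
The attributes above are enough to reproduce the model's predictions by hand: each row of `coef_` and the matching entry of `intercept_` give a linear score for one class, and the predicted class is the one with the highest score. The sketch below checks this against scikit-learn's own `decision_function` and `predict`. It refits a fresh model (named `clf` here to avoid clobbering `model`) on the full wine data, and the `max_iter` value is an assumption made so the solver has room to converge on unscaled features, not the setting used earlier in this notebook.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression

X, y = load_wine(return_X_y=True)

# max_iter is raised only to give the solver room to converge on unscaled
# features; this value is an assumption, not the notebook's setting
clf = LogisticRegression(max_iter=10000).fit(X, y)

# One row of coefficients and one intercept per class
manual_scores = X @ clf.coef_.T + clf.intercept_

# The predicted class is the one with the highest linear score
manual_predictions = clf.classes_[np.argmax(manual_scores, axis=1)]

print(np.allclose(manual_scores, clf.decision_function(X)))
print(np.array_equal(manual_predictions, clf.predict(X)))
```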

### Statsmodels

In [15]:
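import statsmodels.api as sm

# Logit expects a binary target in [0, 1]; the wine target has three
# classes (0, 1, 2), so constructing the model raises the ValueError below
model = sm.Logit(y, X)
model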

ValueError: endog must be in the unit interval.