## Logistic Regression

Scikit-learn's train-test-split / instantiate / fit / predict workflow applies to all classifiers and regressors, which scikit-learn calls 'estimators'.
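
To illustrate that the same four steps carry over from one estimator to another, here is a minimal sketch. It assumes synthetic data from `make_classification` and two arbitrarily chosen classifiers (`KNeighborsClassifier` and `DecisionTreeClassifier`). The names `X_demo`, `y_demo`, and `demo_predictions` are hypothetical and are not used elsewhere in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only (an assumption, not the dataset used below)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Step 1: train-test split
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Step 2: instantiate two different estimators
estimators = (
    KNeighborsClassifier(n_neighbors=5),
    DecisionTreeClassifier(random_state=0),
)

demo_predictions = {}
for estimator in estimators:
    # Step 3: fit on the training data
    estimator.fit(X_tr, y_tr)
    # Step 4: predict labels for the held-out data
    demo_predictions[type(estimator).__name__] = estimator.predict(X_te)

for name, preds in demo_predictions.items():
    print(name, preds.shape)
```

Only the instantiation line differs between the two estimators; the `fit` and `predict` calls are identical.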

In [2]:
# Import necessary modules
import pandas as pd
import numpy as np

from sklearn.metrics import classification_report 
from sklearn.metrics import confusion_matrix 

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split 

In [3]:
column_names = ['pregnancies', 'glucose', 'diastolic', 'triceps', 'insulin', 'bmi',
       'dpf', 'age', 'diabetes']

In [4]:
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data',
                 names = column_names)

In [6]:
# Note the use of .drop() to drop the target variable 'diabetes' from the feature array X, and of the
# .values attribute to ensure X and y are NumPy arrays. Without .values, X and y are a DataFrame and a Series
# respectively; the scikit-learn API also accepts them in that form, as long as they have the right shape.

# Build the feature array X and the target array y
X, y = df.drop('diabetes', axis=1).values, df['diabetes'].values

In [7]:
# Create training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state=42)

# Create the classifier: logreg (default hyperparameters; the default solver depends on the scikit-learn version)
logreg = LogisticRegression()

# Fit the classifier to the training data
logreg.fit(X_train, y_train)

# Predict the labels of the test set: y_pred
y_pred = logreg.predict(X_test)

# Generate the confusion matrix and classification report
print("Confusion Matrix: \n{}".format(confusion_matrix(y_test, y_pred)))
print('\nClassification_report: \n{}'.format(classification_report(y_test, y_pred)))

Confusion Matrix: 
[[175  31]
 [ 36  66]]

Classification_report: 
             precision    recall  f1-score   support

          0       0.83      0.85      0.84       206
          1       0.68      0.65      0.66       102

avg / total       0.78      0.78      0.78       308
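
The per-class figures in the report can be recovered by hand from the confusion matrix, which may help when reading the two outputs side by side. The sketch below copies the four counts from the output above and relies on scikit-learn's convention that rows are true labels and columns are predicted labels. The variable names are illustrative.

```python
# Confusion matrix copied from the output above:
# rows are true labels (0, 1), columns are predicted labels (0, 1)
cm = [[175, 31],
      [36, 66]]

tn, fp = cm[0]  # true class 0: predicted 0, predicted 1
fn, tp = cm[1]  # true class 1: predicted 0, predicted 1

# Class 1 (diabetes) metrics
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Class 0 metrics: the same formulas with the roles of the classes swapped
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

# Overall accuracy: correct predictions over all test samples
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("class 0: precision={:.2f} recall={:.2f}".format(precision_0, recall_0))
print("class 1: precision={:.2f} recall={:.2f} f1={:.2f}".format(precision_1, recall_1, f1_1))
print("accuracy={:.2f}".format(accuracy))
```

The rounded values agree with the report: 0.83 and 0.85 for class 0, and 0.68, 0.65 and 0.66 for class 1. The row sums (206 and 102) are the `support` column.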

