# Simple Logistic Regression Model

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
import pickle

import matplotlib.pyplot as plt
%matplotlib inline
sns.set(style="whitegrid", context='notebook', color_codes=True)

from sklearn import cross_validation  # removed in scikit-learn 0.20; newer versions provide train_test_split in sklearn.model_selection
from sklearn.linear_model import LogisticRegression

## Data Gathering

For simplicity, this example uses a prepared dataset, the Pima Indians Diabetes data, hosted by the University of California, Irvine (UCI) Machine Learning Repository.

In [2]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
df = pd.read_csv(url, names=names)

Take a sneak peek at what the data looks like! Use the <b>head()</b> or <b>tail()</b> method to view a few rows of your data.

In [3]:
df.head(10)

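| | preg | plas | pres | skin | test | mass | pedi | age | class |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
| 5 | 5 | 116 | 74 | 0 | 0 | 25.6 | 0.201 | 30 | 0 |
| 6 | 3 | 78 | 50 | 32 | 88 | 31.0 | 0.248 | 26 | 1 |
| 7 | 10 | 115 | 0 | 0 | 0 | 35.3 | 0.134 | 29 | 0 |
| 8 | 2 | 197 | 70 | 45 | 543 | 30.5 | 0.158 | 53 | 1 |
| 9 | 8 | 125 | 96 | 0 | 0 | 0.0 | 0.232 | 54 | 1 |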


## Data Exploration 

First, check whether there are missing values in your data.

In [4]:
df.describe()

| | preg | plas | pres | skin | test | mass | pedi | age | class |
|---|---|---|---|---|---|---|---|---|---|
| count | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 |
| mean | 3.845052 | 120.894531 | 69.105469 | 20.536458 | 79.799479 | 31.992578 | 0.471876 | 33.240885 | 0.348958 |
| std | 3.369578 | 31.972618 | 19.355807 | 15.952218 | 115.244002 | 7.88416 | 0.331329 | 11.760232 | 0.476951 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.078 | 21.0 | 0.0 |
| 25% | 1.0 | 99.0 | 62.0 | 0.0 | 0.0 | 27.3 | 0.24375 | 24.0 | 0.0 |
| 50% | 3.0 | 117.0 | 72.0 | 23.0 | 30.5 | 32.0 | 0.3725 | 29.0 | 0.0 |
| 75% | 6.0 | 140.25 | 80.0 | 32.0 | 127.25 | 36.6 | 0.62625 | 41.0 | 1.0 |
| max | 17.0 | 199.0 | 122.0 | 99.0 | 846.0 | 67.1 | 2.42 | 81.0 | 1.0 |


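The <b>describe</b> method displays the <b>count</b>, <b>mean</b>, <b>std</b>, <b>min</b>, <b>max</b> and <b>percentiles</b> of each column.
Since every column has the same count (768), no column contains NaN values. Note, however, that the minimum of `plas`, `pres`, `skin`, `test` and `mass` is 0, which is not a plausible measurement, so zeros in these columns most likely stand in for values that were never recorded.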
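
Comparing counts is an indirect check. A more explicit one, sketched here on the first five rows printed by `head()` (the small inline frame `sample` is only a stand-in for `df`), counts NaN values per column and also counts zeros in the measurement columns. Treating a zero as "not recorded" is an assumption about this dataset, not something pandas reports.

```python
import pandas as pd

# Stand-in for df: the first five rows printed by df.head() above.
names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
rows = [
    [6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32, 1],
    [1, 89, 66, 23, 94, 28.1, 0.167, 21, 0],
    [0, 137, 40, 35, 168, 43.1, 2.288, 33, 1],
]
sample = pd.DataFrame(rows, columns=names)

# Explicit NaN count per column; all zeros means nothing is NaN.
nan_counts = sample.isnull().sum()

# Assumption: a zero in these measurement columns means "not recorded".
suspect_cols = ["plas", "pres", "skin", "test", "mass"]
zero_counts = (sample[suspect_cols] == 0).sum()

print(nan_counts)
print(zero_counts)
```

On the full dataframe the same two expressions can be applied to `df` directly.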


<b><i>NOTE: In real-world applications, you will encounter a lot of missing and unsanitized data. A common rule of thumb is that around 80% of your time goes into extracting and transforming the data.</i></b>

In [5]:
df.shape

(768, 9)

`shape` returns the number of rows and columns of our dataframe. `(768, 9)` means that we have 768 samples and 9 columns: 8 features plus the `class` target.

## Model Generation

Our goal in this example is to predict the diabetes class of a patient from the patient's attributes. That means the `class` column is our target value and the rest of the columns are our features.

In [6]:
X = df.values[:, 0:8] # Let X be the matrix of our features

In [7]:
y = df.values[:, 8] # Let y be the vector of target values
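
As an alternative to positional slicing, the same split can be written with column names, which keeps working if the column order changes. This is a minimal sketch on a two-row stand-in frame (`sample`, `X_named` and `y_named` are illustrative names, not part of the notebook). It also shows a side effect of `df.values`: because the frame mixes integer and float columns, the whole array, including the target, is converted to float.

```python
import pandas as pd

names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
sample = pd.DataFrame(
    [[6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
     [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0]],
    columns=names,
)

# Select by name: every column except "class" is a feature.
X_named = sample.drop(columns="class").values
y_named = sample["class"].values          # keeps the integer dtype

# Positional slicing, as in the cells above.
X_sliced = sample.values[:, 0:8]
y_sliced = sample.values[:, 8]            # float, because .values upcasts

print(X_named.shape, y_named.dtype, y_sliced.dtype)
```

The float target produced by slicing is why a model trained on it predicts `1.0` rather than `1`.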

Split the dataset into training data and test data. We use the `train_test_split` function from the `cross_validation` module, holding out 25% of the data as test data and keeping the remaining 75% as training data.

In [8]:
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.25)
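
On scikit-learn 0.20 and later, `train_test_split` is imported from `sklearn.model_selection` instead. The sketch below uses synthetic data with the same shape as our dataset (`X_demo` and `y_demo` are made-up stand-ins). The `random_state` and `stratify` arguments are optional additions, not part of the original cell: the first makes the split reproducible, the second keeps the class ratio similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the diabetes data: 768 rows, 8 features.
rng = np.random.RandomState(0)
X_demo = rng.rand(768, 8)
y_demo = (rng.rand(768) < 0.35).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.25,      # 25% of the rows go to the test set
    random_state=42,     # fixed seed so the split is reproducible
    stratify=y_demo,     # keep the class proportions similar in both parts
)

print(X_tr.shape, X_te.shape)
```

With 768 rows and `test_size=0.25`, the test set holds 192 rows and the training set 576.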

### Fitting the LogisticRegression model to the training data

In [9]:
model = LogisticRegression() # for simplicity purposes, we will not be doing parameter tuning
model.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

## Model Validation

In [10]:
score = model.score(X_test, y_test)
print(score)

0.755208333333


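The <i>score</i> is the model's accuracy: the fraction of test samples whose `class` it predicts correctly.
In this case, the model classifies <b>75.52%</b> of the test samples correctly (145 of 192). Because the train/test split is random, the exact value changes from run to run.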
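
For a classifier, `score` returns the mean accuracy, that is, the share of predictions that match the true labels. The sketch below computes the same quantity by hand; the two label arrays are invented for illustration and are not taken from our model.

```python
import numpy as np

# Made-up labels for illustration only.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])

# Element-wise comparison gives True/False; the mean of that is the accuracy.
accuracy = np.mean(y_true == y_pred)
print(accuracy)  # 6 of 8 predictions match -> 0.75
```

Applied to our model, `np.mean(model.predict(X_test) == y_test)` should give the same number as `model.score(X_test, y_test)`.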

## Embedding Machine Learning Model into a Web API

### First, save the model into a pickle object.

In [11]:
filename = "pickle_objects/diabetes_model.pkl"
with open(filename, 'wb') as f:  # the pickle_objects directory must already exist
    pickle.dump(model, f)
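
Before building the API, it can be worth confirming that the saved file loads back into a working model. Here is a minimal round-trip sketch using a throwaway model trained on synthetic data and a temporary directory (`demo_model`, `X_demo` and the file name are placeholders, not the notebook's objects):

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Throwaway model on synthetic data, standing in for the diabetes model.
rng = np.random.RandomState(0)
X_demo = rng.rand(100, 8)
y_demo = (X_demo[:, 0] > 0.5).astype(int)
demo_model = LogisticRegression().fit(X_demo, y_demo)

# Save and reload through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")
with open(path, "wb") as f:
    pickle.dump(demo_model, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored model should predict exactly what the original predicts.
same = np.array_equal(demo_model.predict(X_demo), restored.predict(X_demo))
print(same)
```

Keep in mind that a pickle should be loaded with the same scikit-learn version that wrote it, and only from sources you trust.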

### Second, create a very simple web API using Flask.

You can copy the code snippet below and save it as `app.py`.
To start the web API, run <b>python app.py</b> in your terminal.

In [None]:
from flask import request
from flask_api import FlaskAPI

import pickle
import os
import numpy as np

app = FlaskAPI(__name__)
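curr_dir = os.path.dirname(os.path.abspath(__file__))
# Load the model saved by the notebook; assumes app.py sits next to the pickle_objects directory
with open(os.path.join(curr_dir, 'pickle_objects', 'diabetes_model.pkl'), 'rb') as f:
    clf = pickle.load(f)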

@app.route("/", methods=['GET']) 
def home():
    return {'hello': 'world'}


@app.route("/predict", methods=['POST']) # this is the endpoint we use to predict the diabetes class
def predict():
    data = [ float(request.data['preg']),
             float(request.data['plas']),
             float(request.data['pres']),
             float(request.data['skin']),
             float(request.data['test']),
             float(request.data['mass']),
             float(request.data['pedi']),
             float(request.data['age'])
            ]

    m = clf.predict([data])[0]  # predict expects a 2D array: one row per sample
    return {'prediction': float(m)}

if __name__ == "__main__":
    app.run(debug=True)
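
The endpoint above raises an unhandled error when a field is missing or not numeric. One possible way to guard against that is a small helper that validates the request data before it reaches the model. This is a sketch only: `FEATURES` and `parse_features` are hypothetical names introduced here, and the helper works on any dict-like object, so it can be tried without running Flask.

```python
# Feature names in the order the model was trained on.
FEATURES = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age"]


def parse_features(form):
    """Return a single-row 2D list of floats, or raise ValueError."""
    missing = [name for name in FEATURES if name not in form]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    try:
        row = [float(form[name]) for name in FEATURES]
    except (TypeError, ValueError):
        raise ValueError("all fields must be numeric")
    return [row]  # one sample, wrapped so the result is 2D


example = parse_features({
    "preg": "0", "plas": "137", "pres": "40", "skin": "35",
    "test": "168", "mass": "43.1", "pedi": "2.288", "age": "33",
})
print(example)
```

Inside `predict()`, the result could be passed straight to `clf.predict`, with the `ValueError` caught and turned into an error response.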


Hurrah! We are done!

### Testing your API endpoint

curl -X POST --data "preg=0&plas=137&pres=40&skin=35&test=168&mass=43.1&pedi=2.288&age=33" "http://localhost:5000/predict"

This input is row 4 of the table above, whose true class is 1, so the API should return `{"prediction": 1.0}`.
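
The same request can be sent from Python using only the standard library. In this sketch, `request_prediction` is a hypothetical helper name, and calling it assumes the Flask app is already running at the URL used in the curl command. Building the payload does not need the server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "http://localhost:5000/predict"  # same endpoint as the curl command

# Same patient attributes as in the curl example.
patient = {
    "preg": 0, "plas": 137, "pres": 40, "skin": 35,
    "test": 168, "mass": 43.1, "pedi": 2.288, "age": 33,
}
payload = urlencode(patient)  # form-encoded body, like curl --data


def request_prediction(url=API_URL, body=payload):
    """POST the form-encoded body and return the decoded JSON reply."""
    req = Request(url, data=body.encode("ascii"), method="POST")
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(payload)
```

With the server running, `request_prediction()` returns the parsed JSON response as a Python dict.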