# Online_Learning_Classification

*Author: Evan Carey*

*Copyright 2017-2019, BH Analytics, LLC*

## Overview

In this lecture set, we will go over machine learning classification methods appropriate for online learning (chunked optimization). 

## Classification

In the case where our outcome (target) variable is discrete with a limited number of possible values, we can use classification algorithms to predict the outcome. Imagine a binary outcome with values of 'Yes' and 'No'. We are interested in predicting the probability that the outcome is 'Yes' (the probability of 'No' is then one minus that value). It is also possible to predict outcomes with more than two possible values, but we will focus on the binary case here. 

## Libraries

In [1]:
## Import Modules
import os
import sys
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from patsy import dmatrices
from sklearn.metrics import confusion_matrix
import sklearn
from sklearn import datasets
import dask

In [2]:
## Set default figure size to be larger 
## this may only work in matplotlib 2.0+!
matplotlib.rcParams['figure.figsize'] = [10.0,6.0]
## Enable multiple outputs from jupyter cells
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [3]:
## Get Version information
print(sys.version)
print("Pandas version: {0}".format(pd.__version__))
print("Matplotlib version: {0}".format(matplotlib.__version__))
print("Numpy version: {0}".format(np.__version__))
print("SciKitLearn version: {0}".format(sklearn.__version__))
print("Dask version: {0}".format(dask.__version__))

3.9.7 (default, Sep 16 2021, 13:09:58) 
[GCC 7.5.0]
Pandas version: 1.3.4
Matplotlib version: 3.4.3
Numpy version: 1.20.3
SciKitLearn version: 0.24.2
Dask version: 2021.10.0


## Check your working directory

Set your working directory to make paths easier :) 

In [4]:
# Working Directory
import os
print("My working directory:\n" + os.getcwd())
# Set Working Directory 
#os.chdir(r"C:\Users\evancarey\Dropbox\Work")
#print("My new working directory:\n" + os.getcwd())

My working directory:
/home/s/teaching/zarchives/Spring2021/HDS5230/notes/week12/Uploads-week-12/Uploads-week-12


## Two different data scenarios for this lecture

We will consider two different possible data scenarios for this lecture set. 

1. The first scenario is that we are working on a large dataset that fits into RAM, but only **barely**: it is big enough that fitting models to convergence on the full dataset becomes a problem.

2. The second scenario is when the dataset is too big to fit into RAM, so we must try to work on it one piece at a time without ever having the full dataset in RAM!

The techniques we use are different in these two situations. First, we consider data that fits into RAM. 
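
A rough way to tell which scenario applies is to compare the in-memory size of the data with the RAM available. The sketch below is only an illustration: it builds a small synthetic DataFrame (the column names are made up) and measures it with pandas' `memory_usage(deep=True)`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a real dataset (hypothetical columns)
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "age": rng.normal(50, 15, size=10_000),
    "flag": rng.integers(0, 2, size=10_000),
    "group": rng.choice(["a", "b", "c"], size=10_000),
})

# deep=True also counts the Python strings held in text columns
bytes_total = int(df_demo.memory_usage(deep=True).sum())
mb_total = bytes_total / 1024**2
print("In-memory size: {0:.2f} MB".format(mb_total))
```

Keep in mind that model fitting usually needs working copies of the data on top of this number, so a dataset that already takes up most of RAM probably belongs to the first scenario rather than being "small".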

## Patient Mortality Dataset

We will use a dataset with a binary outcome of mortality as a motivating example.

This is a dataset of patient demographics and disease status, with mortality indicated. The dataset is here: 

`data\healthcare\patientAnalyticFile.csv`

In practice, you most likely would have created a dataset like this from multiple other files after cleaning, reshaping, and joining them. 

You can generalize this setup to any situation with a binary outcome, such as estimating the probability of a customer filing a warranty claim, or the probability of a transaction being fraudulent. 

We will first import this dataset and examine the potential variables to use in our classification algorithm.

In [5]:
## Set print limits
pd.options.display.max_rows = 10
## Import Data
df_patient = \
 pd.read_csv('PatientAnalyticFile.csv')
df_patient

Unnamed: 0,PatientID,DateOfBirth,Gender,Race,Myocardial_infarction,Congestive_heart_failure,Peripheral_vascular_disease,Stroke,Dementia,Pulmonary,...,Metastatic_solid_tumour,HIV,Obesity,Depression,Hypertension,Drugs,Alcohol,First_Appointment_Date,Last_Appointment_Date,DateOfDeath
0,1,1962-02-27,female,hispanic,0,0,0,0,0,0,...,0,0,0,0,0,0,0,2013-04-27,2018-06-01,
1,2,1959-08-18,male,white,0,0,0,0,0,0,...,0,0,0,0,1,0,0,2005-11-30,2008-11-02,2008-11-02
2,3,1946-02-15,female,white,0,0,0,0,0,0,...,0,1,0,0,1,0,0,2011-11-05,2015-11-13,
3,4,1979-07-27,female,white,0,0,0,0,0,1,...,0,0,0,0,0,0,0,2010-03-01,2016-01-17,2016-01-17
4,5,1983-02-19,female,hispanic,0,0,0,0,0,0,...,0,0,0,0,1,0,0,2006-09-22,2018-06-01,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,19996,1997-12-19,female,other,0,0,0,0,0,0,...,0,0,0,0,0,0,0,2008-06-14,2018-06-01,
19996,19997,1984-03-31,female,white,0,0,0,0,0,0,...,0,1,0,0,1,0,0,2007-04-24,2018-06-01,
19997,19998,1993-07-04,female,white,0,0,0,0,0,0,...,0,0,1,0,1,0,0,2010-10-16,2018-06-01,
19998,19999,1984-04-17,male,other,0,0,0,0,0,0,...,0,0,0,0,1,0,0,2015-01-04,2018-06-01,


We need to make a variable to indicate mortality. We can derive it from `DateOfDeath`: a missing date of death means the patient is alive (0), and a recorded date means the patient died (1):

In [6]:
# Create mortality variable
df_patient['mortality'] = \
    np.where(df_patient['DateOfDeath'].isnull(),
             0,1)
# Examine
df_patient['mortality']

0        0
1        1
2        0
3        1
4        0
        ..
19995    0
19996    0
19997    0
19998    0
19999    0
Name: mortality, Length: 20000, dtype: int64

In [7]:
df_patient['mortality'].describe()

count    20000.000000
mean         0.354700
std          0.478434
min          0.000000
25%          0.000000
50%          0.000000
75%          1.000000
max          1.000000
Name: mortality, dtype: float64

In [8]:
df_patient.describe()

Unnamed: 0,PatientID,Myocardial_infarction,Congestive_heart_failure,Peripheral_vascular_disease,Stroke,Dementia,Pulmonary,Rheumatic,Peptic_ulcer_disease,LiverMild,...,Cancer,LiverSevere,Metastatic_solid_tumour,HIV,Obesity,Depression,Hypertension,Drugs,Alcohol,mortality
count,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0,...,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0
mean,10000.5,0.0456,0.04345,0.02395,0.02865,0.0314,0.07265,0.0123,0.00965,0.00925,...,0.05045,0.05145,0.03315,0.00645,0.16345,0.1063,0.3029,0.04005,0.07975,0.3547
std,5773.647028,0.208621,0.203873,0.152897,0.166825,0.174401,0.259568,0.110224,0.097762,0.095733,...,0.218877,0.220919,0.179033,0.080054,0.369785,0.308229,0.459524,0.196081,0.270913,0.478434
min,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,5000.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,10000.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,15000.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
max,20000.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [9]:
df_patient.dtypes

PatientID                  int64
DateOfBirth               object
Gender                    object
Race                      object
Myocardial_infarction      int64
                           ...  
Alcohol                    int64
First_Appointment_Date    object
Last_Appointment_Date     object
DateOfDeath               object
mortality                  int64
Length: 30, dtype: object

We should change date of birth to be an actual date and calculate age if we want to include it in the model:

In [10]:
# Convert dateofBirth to date
df_patient['DateOfBirth'] = \
    pd.to_datetime(df_patient['DateOfBirth'])
# Calculate age in years as of 2015-01-01
df_patient['Age_years'] = \
    ((pd.to_datetime('2015-01-01') - df_patient['DateOfBirth']).dt.days/365.25)
df_patient['Age_years'].describe()


count    20000.000000
mean        47.247474
std         18.145086
min         15.753593
25%         31.733744
50%         47.099247
75%         62.924025
max         78.743326
Name: Age_years, dtype: float64

## Workflow into scikit-learn


* There are a number of possible ways to prepare data for modeling in scikit-learn. 
* You must end up with a numeric ndarray of inputs (X) and a numeric ndarray of the target (Y).
* I prefer the following workflow:
  * Use pandas to import and clean data
  * Use Patsy to create the X and Y ndarrays
    * Apply categorical transformations (dummy coding) as needed
    * Optionally generate non-linear terms, including splines
  * Use scikit-learn for machine learning

## Use Patsy to Create the Model Matrices

We typically start out with a pandas dataframe for manipulation purposes, then we use this dataframe as the input to the machine learning library. I created a pandas dataframe above to replicate this process. We will use the `dmatrices` function from the patsy library to generate the design matrices that represent the inputs to the machine learning algorithms. This handles the following:

* drops rows with missing data
* constructs dummy (indicator) coding for categorical variables, holding out one level as the reference when an intercept is present
* optionally adds a constant intercept
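
To make these three behaviors concrete, here is a minimal sketch on a toy dataframe. The data and the names `df_toy`, `Y_toy`, and `X_toy` are invented for illustration; only `dmatrices` itself comes from the workflow above.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

# Toy data (hypothetical): one missing value in x, a 3-level categorical g
df_toy = pd.DataFrame({
    "y": [0, 1, 0, 1, 1],
    "x": [1.0, 2.0, np.nan, 4.0, 5.0],
    "g": ["a", "b", "a", "c", "b"],
})

Y_toy, X_toy = dmatrices("y ~ x + g", df_toy)

# The row with the missing x is dropped, so 4 of the 5 rows remain
print(X_toy.shape)
# Level 'a' is the reference; the other levels get indicator columns
print(X_toy.design_info.column_names)
```

Adding `- 1` to the formula removes the intercept column, which can be handy if you would rather let the estimator fit its own intercept.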

In [11]:
df_patient.columns

Index(['PatientID', 'DateOfBirth', 'Gender', 'Race', 'Myocardial_infarction',
       'Congestive_heart_failure', 'Peripheral_vascular_disease', 'Stroke',
       'Dementia', 'Pulmonary', 'Rheumatic', 'Peptic_ulcer_disease',
       'LiverMild', 'Diabetes_without_complications',
       'Diabetes_with_complications', 'Paralysis', 'Renal', 'Cancer',
       'LiverSevere', 'Metastatic_solid_tumour', 'HIV', 'Obesity',
       'Depression', 'Hypertension', 'Drugs', 'Alcohol',
       'First_Appointment_Date', 'Last_Appointment_Date', 'DateOfDeath',
       'mortality', 'Age_years'],
      dtype='object')

In [12]:
## Create formula for all variables in model
vars_remove = ['PatientID','First_Appointment_Date','DateOfBirth',
               'Last_Appointment_Date','DateOfDeath','mortality']
vars_left = set(df_patient.columns) - set(vars_remove)
formula = "mortality ~ " + " + ".join(vars_left)
formula

'mortality ~ Drugs + Diabetes_without_complications + LiverSevere + Obesity + HIV + Hypertension + Peptic_ulcer_disease + Depression + Paralysis + Rheumatic + Myocardial_infarction + Age_years + Race + Pulmonary + Cancer + Diabetes_with_complications + Peripheral_vascular_disease + LiverMild + Stroke + Congestive_heart_failure + Renal + Gender + Dementia + Alcohol + Metastatic_solid_tumour'

In [13]:
## only use subset of data so models fit in reasonable time
df_patient_sub = \
    df_patient.sample(frac=0.1,
                     random_state=32)    
## use Patsy to create model matrices
Y,X = dmatrices(formula,
                df_patient_sub)

In [14]:
Y

DesignMatrix with shape (2000, 1)
  mortality
          0
          0
          1
          1
          0
          0
          1
          1
          0
          0
          1
          0
          1
          0
          1
          0
          1
          0
          0
          1
          0
          1
          0
          0
          0
          0
          1
          1
          0
          0
  [1970 rows omitted]
  Terms:
    'mortality' (column 0)
  (to view full data, use np.asarray(this_obj))

In [15]:
X

DesignMatrix with shape (2000, 28)
  Columns:
    ['Intercept',
     'Race[T.hispanic]',
     'Race[T.other]',
     'Race[T.white]',
     'Gender[T.male]',
     'Drugs',
     'Diabetes_without_complications',
     'LiverSevere',
     'Obesity',
     'HIV',
     'Hypertension',
     'Peptic_ulcer_disease',
     'Depression',
     'Paralysis',
     'Rheumatic',
     'Myocardial_infarction',
     'Age_years',
     'Pulmonary',
     'Cancer',
     'Diabetes_with_complications',
     'Peripheral_vascular_disease',
     'LiverMild',
     'Stroke',
     'Congestive_heart_failure',
     'Renal',
     'Dementia',
     'Alcohol',
     'Metastatic_solid_tumour']
  Terms:
    'Intercept' (column 0)
    'Race' (columns 1:4)
    'Gender' (column 4)
    'Drugs' (column 5)
    'Diabetes_without_complications' (column 6)
    'LiverSevere' (column 7)
    'Obesity' (column 8)
    'HIV' (column 9)
    'Hypertension' (column 10)
    'Peptic_ulcer_disease' (column 11)
    'Depression' (column 12)
    'Paral

## Split into Testing and Training Samples

* The first step is to set aside a test sample of data that will allow us to estimate the generalization error post-fit. This lets us detect overfitting. 
* We can use “tuple unpacking” to assign the values (very pythonic :)
* We can assign a random seed (state) and fraction to split.

For simple random splits, scikit-learn has a function `train_test_split()`:

In [16]:
## Split data into training and testing samples
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = \
    train_test_split(X,
                     np.ravel(Y), # prevents dimensionality error later!
                     test_size=0.25,
                     random_state=42)

## Confirm the Output Dimensions

* We can confirm that X and y have the same number of rows within each of the train and test splits.
* The test share of the rows should also be close to the `test_size` argument. 

In [17]:
## Confirm dimensions
X_train.shape

(1500, 28)

In [18]:
X_test.shape

(500, 28)

In [19]:
y_train.shape

(1500,)

In [20]:
y_test.shape

(500,)

## Start with the Null Model

We can consider a null model of simply predicting the most frequent class as a base model. Without any other information, I may predict based simply on the distribution of the outcome.

In [21]:
## Null information rate
1 - y_train.mean()

array(0.64666667)

Scikit-learn has a built-in dummy classifier that works similarly:

In [22]:
## Dummy classifier
from sklearn.dummy import DummyClassifier
clf = DummyClassifier(strategy='most_frequent',
                      random_state=0)
clf.fit(X_train, y_train)
clf.score(X_train, y_train)  

DummyClassifier(random_state=0, strategy='most_frequent')

0.6466666666666666
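
Note that `1 - y_train.mean()` equals the null information rate only because class 0 happens to be the majority here. A small sketch of a version that works whichever class is more frequent (the function name and the demo labels are my own, not part of the lecture data):

```python
import numpy as np

def null_information_rate(y):
    """Share of the most frequent class in a 1-D array of integer labels."""
    counts = np.bincount(np.asarray(y, dtype=int))
    return counts.max() / counts.sum()

y_demo = np.array([0, 0, 0, 1, 1])        # hypothetical labels
print(null_information_rate(y_demo))      # 0.6
print(null_information_rate(1 - y_demo))  # still 0.6 when class 1 is the majority
```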

I am going to write a small function that will print the scores from a dict so we can compare the models. I will store the model scores in the dict as well. 

In [23]:
## Create dict to store all these results:
result_scores = {}

In [24]:
## Create Function to Print Results
def get_results(x1):
    print("\n{0:20}   {1:4}    {2:4}".format('Model','Train','Test'))
    print('-------------------------------------------')
    for i in x1.keys():
        print("{0:20}   {1:<6.4}   {2:<6.4}".format(i,x1[i][0],x1[i][1]))

In [25]:
get_results(result_scores)


Model                  Train    Test
-------------------------------------------


In [26]:
## Score the Model on Training and Testing Set
result_scores['Null'] = \
            (sklearn.metrics.accuracy_score(y_train,clf.predict(X_train)),
             sklearn.metrics.accuracy_score(y_test,clf.predict(X_test)))

In [27]:
get_results(result_scores)


Model                  Train    Test
-------------------------------------------
Null                   0.6467   0.608 


## Logistic Regression with the SAG Solver

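* We will start with a logistic regression model, using the `solver='sag'` option. 
* SAG handles an L2 penalty (not L1)
* For any SAG-based approach, the optimization is sensitive to the scale of the inputs, so we must scale our features on the way in! 
* We can use the pipeline approach to preprocess the data. 
* Note that `StandardScaler` centers every column, so the constant Patsy intercept column becomes all zeros after scaling. Combined with `fit_intercept=False`, the fitted model then has no effective intercept.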

Check the docs: 

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Also, if you want to pick something other than the default accuracy for your cross validation you can:

https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

Here is a good post on differences between classification metrics:

https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9

In [28]:
## import linear model
from sklearn import linear_model
from sklearn import preprocessing
from sklearn.pipeline import Pipeline

## set our transformation
scaler = preprocessing.StandardScaler()

## Set our model
clf = linear_model.LogisticRegressionCV(fit_intercept=False, # already have the intercept
                                        solver='sag', # stochastic average gradient...
                                        Cs=20,
                                        cv=5,
                                        penalty='l2',
                                        max_iter=500, # may need to increase this from default for convergence! 
                                        random_state=10) 
## put together in pipeline
pipe1 = Pipeline([("scale", scaler),
                  ("logit", clf)])
## fit model using data with .fit
pipe1.fit(X_train,y_train)

Pipeline(steps=[('scale', StandardScaler()),
                ('logit',
                 LogisticRegressionCV(Cs=20, cv=5, fit_intercept=False,
                                      max_iter=500, random_state=10,
                                      solver='sag'))])
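
As mentioned above, the metric used to pick the penalty during cross validation does not have to be accuracy. The following sketch selects `C` by area under the ROC curve instead. It runs on synthetic data from `make_classification` as a stand-in for `X_train` and `y_train`, and because that data has no intercept column, `fit_intercept` is left at its default.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for X_train / y_train (assumption)
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     random_state=0)

pipe_auc = Pipeline([
    ("scale", StandardScaler()),
    ("logit", LogisticRegressionCV(Cs=5, cv=3,
                                   solver="sag",
                                   scoring="roc_auc",  # choose C by AUC, not accuracy
                                   max_iter=2000,
                                   random_state=10)),
])
pipe_auc.fit(X_demo, y_demo)

# C_ holds the selected inverse regularization strength
print(pipe_auc.named_steps["logit"].C_)
```

Any scorer name from the scoring-parameter page linked above can be used in place of `"roc_auc"`.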

In [29]:
## Score the Model on Training and Testing Set
result_scores['logit_SAG'] = \
            (sklearn.metrics.accuracy_score(y_train,pipe1.predict(X_train)),
             sklearn.metrics.accuracy_score(y_test,pipe1.predict(X_test)))

In [30]:
get_results(result_scores)


Model                  Train    Test
-------------------------------------------
Null                   0.6467   0.608 
logit_SAG              0.6767   0.688 


## Logistic Regression with the SAGA Solver

* Next, we fit the same logistic regression model using the `solver='saga'` option. 
* SAGA is a variant of SAG that also handles an L1 penalty (as well as L2 and elastic net).
* As with SAG, the optimization is sensitive to the scale of the inputs, so we must scale our features on the way in! 
* We can use the pipeline approach to preprocess the data. 

In [31]:
## import linear model
from sklearn import linear_model
from sklearn import preprocessing
from sklearn.pipeline import Pipeline

## set our transformation
scaler = preprocessing.StandardScaler()

## Set our model
clf = linear_model.LogisticRegressionCV(fit_intercept=False, # already have the intercept
                                        solver='saga', # SAGA: SAG variant that supports the L1 penalty
                                        Cs=20,
                                        cv=5,
                                        penalty='l1',
                                        max_iter=500, # may need to increase this from default for convergence! 
                                        random_state=10) 
## put together in pipeline
pipe1 = Pipeline([("scale", scaler),
                  ("logit", clf)])
## fit model using data with .fit
pipe1.fit(X_train,y_train)

Pipeline(steps=[('scale', StandardScaler()),
                ('logit',
                 LogisticRegressionCV(Cs=20, cv=5, fit_intercept=False,
                                      max_iter=500, penalty='l1',
                                      random_state=10, solver='saga'))])
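
The practical reason to reach for an L1 penalty is that it can push some coefficients to exactly zero. A quick way to see this is to count zero coefficients under each penalty. The sketch below does so on synthetic data (an assumption, not the patient file), with a fixed, fairly strong penalty rather than a cross-validated one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 20 features, only 4 informative
X_demo, y_demo = make_classification(n_samples=400, n_features=20,
                                     n_informative=4, n_redundant=0,
                                     random_state=1)
X_demo = StandardScaler().fit_transform(X_demo)

# Small C means a strong penalty
fit_l1 = LogisticRegression(penalty="l1", C=0.05, solver="saga",
                            max_iter=5000, random_state=10).fit(X_demo, y_demo)
fit_l2 = LogisticRegression(penalty="l2", C=0.05, solver="saga",
                            max_iter=5000, random_state=10).fit(X_demo, y_demo)

n_zero_l1 = int(np.sum(fit_l1.coef_ == 0))
n_zero_l2 = int(np.sum(fit_l2.coef_ == 0))
print("Zero coefficients with L1:", n_zero_l1)
print("Zero coefficients with L2:", n_zero_l2)
```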

In [32]:
## Score the Model on Training and Testing Set
result_scores['logit_SAGA'] = \
            (sklearn.metrics.accuracy_score(y_train,pipe1.predict(X_train)),
             sklearn.metrics.accuracy_score(y_test,pipe1.predict(X_test)))

In [33]:
get_results(result_scores)


Model                  Train    Test
-------------------------------------------
Null                   0.6467   0.608 
logit_SAG              0.6767   0.688 
logit_SAGA             0.682    0.688 


## Stochastic Gradient Descent in Scikit-learn

Scikit-learn provides `SGDClassifier`, a stochastic gradient descent routine that we can call directly while specifying the loss function and the regularizer. This allows us to specify a logistic regression with an L1, L2, or combined (elastic net) penalty. We specify logistic regression with `loss='log'` (in scikit-learn 1.1 and later this option is named `loss='log_loss'`).

In [34]:
from sklearn.linear_model import SGDClassifier
## set our transformation
scaler = preprocessing.StandardScaler()

## Set our model
clf = linear_model.SGDClassifier(fit_intercept=False, # already have the intercept
                                 loss='log',
                                 shuffle=True,
                                 verbose=1,
                                 random_state=12,
                                 max_iter=1000, # total number of potential epochs
                                 tol=1e-6)
## put together in pipeline
pipe1 = Pipeline([("scale", scaler),
                  ("sgd", clf)])
## fit model using data with .fit
pipe1.fit(X_train,y_train)

-- Epoch 1
Norm: 109.46, NNZs: 27, Bias: 0.000000, T: 1500, Avg. loss: 33.367018
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 63.37, NNZs: 27, Bias: 0.000000, T: 3000, Avg. loss: 18.324365
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 39.34, NNZs: 27, Bias: 0.000000, T: 4500, Avg. loss: 12.874834
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 26.38, NNZs: 27, Bias: 0.000000, T: 6000, Avg. loss: 9.091764
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 29.60, NNZs: 27, Bias: 0.000000, T: 7500, Avg. loss: 7.548377
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 22.33, NNZs: 27, Bias: 0.000000, T: 9000, Avg. loss: 6.438229
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 17.23, NNZs: 27, Bias: 0.000000, T: 10500, Avg. loss: 5.222644
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 17.31, NNZs: 27, Bias: 0.000000, T: 12000, Avg. loss: 5.001976
Total training time: 0.00 seconds.
-- Epoch 9
Norm: 16.78, NNZs: 27, Bias: 0.000000, T: 13500, Avg. loss: 4.228054
To

Pipeline(steps=[('scale', StandardScaler()),
                ('sgd',
                 SGDClassifier(fit_intercept=False, loss='log', random_state=12,
                               tol=1e-06, verbose=1))])
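
To see what each epoch in the log above is doing, it may help to write the core update by hand. The sketch below is a deliberately simplified version of SGD for logistic regression: it uses a constant learning rate and no penalty, whereas `SGDClassifier` uses a decaying learning rate schedule and a penalty by default. The function names and the simulated data are my own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, eta=0.1, n_epochs=20, seed=0):
    """Plain SGD for unpenalized logistic regression (illustration only)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        # visit the rows in a fresh random order each epoch, like shuffle=True
        for i in rng.permutation(X.shape[0]):
            p = sigmoid(X[i] @ w)          # predicted probability for row i
            w -= eta * (p - y[i]) * X[i]   # gradient of the log loss for one row
    return w

# Hypothetical data: intercept column plus one standardized feature
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
X_demo = np.column_stack([np.ones(500), x1])
y_demo = (rng.random(500) < sigmoid(-0.5 + 2.0 * x1)).astype(float)

w_hat = sgd_logistic(X_demo, y_demo)
print(w_hat)  # a noisy estimate of the generating values (-0.5, 2.0)
```

Each update only touches one row, which is what makes the method a natural fit for data that arrives in pieces.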

In [35]:
## Score the Model on Training and Testing Set
## (the key is 'SAG', but this is the SGDClassifier fit from above)
result_scores['SAG'] = \
            (sklearn.metrics.accuracy_score(y_train,pipe1.predict(X_train)),
             sklearn.metrics.accuracy_score(y_test,pipe1.predict(X_test)))

In [36]:
get_results(result_scores)


Model                  Train    Test
-------------------------------------------
Null                   0.6467   0.608 
logit_SAG              0.6767   0.688 
logit_SAGA             0.682    0.688 
SAG                    0.6713   0.682 


### Implementing regularization

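Perhaps we should consider regularizing this model more strongly? By default, `SGDClassifier` already applies a light L2 penalty (`penalty='l2'`, `alpha=0.0001`).

We can implement elastic net, L1, or L2 penalties through the `penalty` argument, with `alpha` setting the strength of the penalty. 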

In [37]:
from sklearn.linear_model import SGDClassifier
## set our transformation
scaler = preprocessing.StandardScaler()

## Set our model
clf = linear_model.SGDClassifier(fit_intercept=False, # already have the intercept
                                 loss='log',
                                 penalty='elasticnet',
                                 alpha=1,
                                 shuffle=True,
                                 verbose=1,
                                 random_state=12,
                                 max_iter=1000, # total number of potential epochs
                                 tol=1e-6)
## put together in pipeline
pipe1 = Pipeline([("scale", scaler),
                  ("sgd", clf)])
## fit model using data with .fit
pipe1.fit(X_train,y_train)

-- Epoch 1
Norm: 0.12, NNZs: 1, Bias: 0.000000, T: 1500, Avg. loss: 0.687885
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 0.11, NNZs: 1, Bias: 0.000000, T: 3000, Avg. loss: 0.684548
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 0.11, NNZs: 1, Bias: 0.000000, T: 4500, Avg. loss: 0.684819
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 0.11, NNZs: 1, Bias: 0.000000, T: 6000, Avg. loss: 0.684356
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 0.10, NNZs: 1, Bias: 0.000000, T: 7500, Avg. loss: 0.684601
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 0.10, NNZs: 1, Bias: 0.000000, T: 9000, Avg. loss: 0.684838
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 0.10, NNZs: 1, Bias: 0.000000, T: 10500, Avg. loss: 0.684678
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 0.10, NNZs: 1, Bias: 0.000000, T: 12000, Avg. loss: 0.684611
Total training time: 0.00 seconds.
-- Epoch 9
Norm: 0.10, NNZs: 1, Bias: 0.000000, T: 13500, Avg. loss: 0.684671
Total training time: 0.0

Pipeline(steps=[('scale', StandardScaler()),
                ('sgd',
                 SGDClassifier(alpha=1, fit_intercept=False, loss='log',
                               penalty='elasticnet', random_state=12, tol=1e-06,
                               verbose=1))])
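
With `alpha=1` almost every coefficient is shrunk to zero (`NNZs: 1` in the log above). To get a feel for how `alpha` and `l1_ratio` trade off, here is the elastic net penalty term written out as a small function. This follows the form given in the scikit-learn user guide as I read it, so treat it as a sketch rather than the library's exact code; the coefficient vector is made up.

```python
import numpy as np

def elastic_net_penalty(w, alpha, l1_ratio):
    """Penalty term: alpha * ((1 - l1_ratio)/2 * ||w||_2^2 + l1_ratio * ||w||_1)."""
    w = np.asarray(w, dtype=float)
    l2_part = 0.5 * (1.0 - l1_ratio) * np.sum(w ** 2)
    l1_part = l1_ratio * np.sum(np.abs(w))
    return alpha * (l2_part + l1_part)

w_demo = np.array([1.0, -2.0, 0.0])  # hypothetical coefficients
print(elastic_net_penalty(w_demo, alpha=1.0, l1_ratio=0.0))   # 2.5, pure L2
print(elastic_net_penalty(w_demo, alpha=1.0, l1_ratio=1.0))   # 3.0, pure L1
print(elastic_net_penalty(w_demo, alpha=1.0, l1_ratio=0.15))  # the default l1_ratio
```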

In [38]:
## Score the Model on Training and Testing Set
result_scores['SAG_en'] = \
            (sklearn.metrics.accuracy_score(y_train,pipe1.predict(X_train)),
             sklearn.metrics.accuracy_score(y_test,pipe1.predict(X_test)))

In [39]:
get_results(result_scores)


Model                  Train    Test
-------------------------------------------
Null                   0.6467   0.608 
logit_SAG              0.6767   0.688 
logit_SAGA             0.682    0.688 
SAG                    0.6713   0.682 
SAG_en                 0.6607   0.684 


We should cross-validate the penalty with SGD. We can do that using `GridSearchCV`:

In [40]:
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
## specify grid
parameters = {'alpha':(0.001,0.01,0.5,1,2,5,10),
              'l1_ratio':(0,0.1,0.25,0.5,0.75,0.9,1)}
## specify model without hyperparameters
clf = linear_model.SGDClassifier(fit_intercept=False, # already have the intercept
                                 loss='log',
                                 penalty='elasticnet',
                                 shuffle=True,
                                 #verbose=1,
                                 random_state=12,
                                 max_iter=1000, # total number of potential epochs
                                 tol=1e-6)
## wrap the model in a grid search
clf2 = GridSearchCV(clf,
                    parameters,
                    cv=5,
                    return_train_score=True)
## set our transformation
scaler = preprocessing.StandardScaler()
## put together in pipeline
pipe1 = Pipeline([("scale", scaler),
                  ("sgd", clf2)])
## fit model using data with .fit
pipe1.fit(X_train,y_train)

Pipeline(steps=[('scale', StandardScaler()),
                ('sgd',
                 GridSearchCV(cv=5,
                              estimator=SGDClassifier(fit_intercept=False,
                                                      loss='log',
                                                      penalty='elasticnet',
                                                      random_state=12,
                                                      tol=1e-06),
                              param_grid={'alpha': (0.001, 0.01, 0.5, 1, 2, 5,
                                                    10),
                                          'l1_ratio': (0, 0.1, 0.25, 0.5, 0.75,
                                                       0.9, 1)},
                              return_train_score=True))])

In [41]:
## inspect the full cross-validation results
pipe1.named_steps['sgd'].cv_results_

{'mean_fit_time': array([0.01470766, 0.00952396, 0.00788884, 0.00776582, 0.00791755,
        0.00781679, 0.00778399, 0.0339704 , 0.02443452, 0.01988082,
        0.01378212, 0.01230869, 0.01019073, 0.00759149, 0.00481939,
        0.00234766, 0.002384  , 0.00218372, 0.00222616, 0.00234599,
        0.00224209, 0.00479159, 0.00234652, 0.00214643, 0.00220008,
        0.00223441, 0.002108  , 0.00213828, 0.00377336, 0.00213628,
        0.00222478, 0.00213389, 0.00210719, 0.00199995, 0.00201235,
        0.00363708, 0.00220704, 0.00218554, 0.00206981, 0.00198359,
        0.00208111, 0.00204444, 0.00390267, 0.0021224 , 0.00201788,
        0.00199747, 0.00203338, 0.00198479, 0.00199366]),
 'std_fit_time': array([4.91245134e-03, 2.41388360e-03, 8.89243431e-04, 8.69908658e-04,
        5.03181439e-04, 9.39500468e-04, 7.75067663e-04, 6.55350001e-03,
        8.09277385e-03, 3.72866921e-03, 3.33728366e-03, 3.80132909e-03,
        3.80830236e-03, 1.52047910e-03, 1.01244779e-03, 1.51105655e-04,
        2

In [42]:
pipe1.named_steps['sgd'].best_params_

{'alpha': 0.01, 'l1_ratio': 0.75}
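
The raw `cv_results_` dict is hard to read. One option is to turn it into a DataFrame and sort by rank. The sketch below also shows an alternative arrangement, with the whole pipeline placed *inside* `GridSearchCV` so that the scaler is refit on each training fold. It uses synthetic data and leaves `loss` at its default so it does not depend on the scikit-learn version; all names here are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for X_train / y_train (assumption)
X_demo, y_demo = make_classification(n_samples=600, n_features=10,
                                     random_state=0)

pipe_demo = Pipeline([
    ("scale", StandardScaler()),
    ("sgd", SGDClassifier(penalty="elasticnet", random_state=12,
                          max_iter=1000, tol=1e-6)),
])

# Parameters of a pipeline step are addressed as <step>__<parameter>
grid = {"sgd__alpha": [0.001, 0.01, 0.1],
        "sgd__l1_ratio": [0.1, 0.5, 0.9]}

search = GridSearchCV(pipe_demo, grid, cv=3)
search.fit(X_demo, y_demo)

# One row per grid point, best first
cv_table = (pd.DataFrame(search.cv_results_)
              .sort_values("rank_test_score")
              [["param_sgd__alpha", "param_sgd__l1_ratio",
                "mean_test_score", "std_test_score"]])
print(cv_table.head())
print(search.best_params_)
```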

In [43]:
## Score the Model on Training and Testing Set
result_scores['SAG_en_cv'] = \
            (sklearn.metrics.accuracy_score(y_train,pipe1.predict(X_train)),
             sklearn.metrics.accuracy_score(y_test,pipe1.predict(X_test)))

In [44]:
get_results(result_scores)


Model                  Train    Test
-------------------------------------------
Null                   0.6467   0.608 
logit_SAG              0.6767   0.688 
logit_SAGA             0.682    0.688 
SAG                    0.6713   0.682 
SAG_en                 0.6607   0.684 
SAG_en_cv              0.6833   0.688 


## Streaming data and updating partial fits

This is a more complex scenario. What if we want to fit a model on a dataset that we cannot hold entirely in memory at once? 

The SGD classifier supports a `partial_fit` method: we stream numpy arrays to it one chunk at a time, and each call performs one pass of SGD over the chunk it receives, so the full dataset never has to be in memory. However, we must take care in preparing the datasets. Remember, the data is typically scaled and dummy coded prior to being passed into scikit-learn. We need to ensure that this is done in a uniform fashion when we stream over the dataset. 

The things that will get us are:

* Consistently scaling the data
* Ensuring the correct number of classes are specified (even if they aren't all represented in this subset of data)
* Consistently dummy coding the data. 

You must code your way around this one way or another. I will show a basic approach below. 
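
Before working through the patient file, here is the general shape such a loop can take, sketched on synthetic chunks. The generator `make_chunks` is a made-up stand-in for reading a file in pieces, and the classifier is left at its default loss so the sketch does not depend on the scikit-learn version.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

def make_chunks(n_chunks=20, chunk_size=200, seed=0):
    """Yield synthetic (X, y) chunks; stands in for reading a file in pieces."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.normal(loc=50.0, scale=10.0, size=(chunk_size, 3))
        logit = 0.1 * (X[:, 0] - 50.0)
        y = (rng.random(chunk_size) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
        yield X, y

scaler = StandardScaler()
clf_stream = SGDClassifier(random_state=12)   # default loss, see note above
all_classes = np.array([0, 1])                # must list every class up front

for X_chunk, y_chunk in make_chunks():
    scaler.partial_fit(X_chunk)               # update running mean / variance
    X_scaled = scaler.transform(X_chunk)      # scale with the statistics seen so far
    clf_stream.partial_fit(X_scaled, y_chunk, classes=all_classes)

print(scaler.n_samples_seen_)   # 4000
print(clf_stream.coef_.shape)   # (1, 3)
```

One design choice to be aware of: the first few chunks are scaled with provisional statistics. If that matters, a first pass over the data can compute the means and standard deviations, and a second pass can fit the model with a fixed scaling.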

In [45]:
## establish generator to yield data
df_patient_gen = \
 pd.read_csv('PatientAnalyticFile.csv',
             chunksize=1000)
## test
for n,chunk in enumerate(df_patient_gen):
    print(n)
    print(chunk)
    if n>1:
        break

0
     PatientID DateOfBirth  Gender      Race  Myocardial_infarction  \
0            1  1962-02-27  female  hispanic                      0   
1            2  1959-08-18    male     white                      0   
2            3  1946-02-15  female     white                      0   
3            4  1979-07-27  female     white                      0   
4            5  1983-02-19  female  hispanic                      0   
..         ...         ...     ...       ...                    ...   
995        996  1964-07-28  female     black                      0   
996        997  1980-10-07    male     white                      0   
997        998  1949-09-06    male     black                      0   
998        999  1945-07-20    male     white                      0   
999       1000  1987-04-14    male     white                      0   

     Congestive_heart_failure  Peripheral_vascular_disease  Stroke  Dementia  \
0                           0                            0       

In [46]:
df1 = \
    pd.read_csv('PatientAnalyticFile.csv',
                nrows=10)
# Create mortality variable
df1['mortality'] = \
    np.where(df1['DateOfDeath'].isnull(),
             0,1)
# Convert dateofBirth to date
df1['DateOfBirth'] = \
    pd.to_datetime(df1['DateOfBirth'])
# Calculate age in years as of 2015-01-01
df1['Age_years'] = \
    ((pd.to_datetime('2015-01-01') - df1['DateOfBirth']).dt.days/365.25)
df1

Unnamed: 0,PatientID,DateOfBirth,Gender,Race,Myocardial_infarction,Congestive_heart_failure,Peripheral_vascular_disease,Stroke,Dementia,Pulmonary,...,Obesity,Depression,Hypertension,Drugs,Alcohol,First_Appointment_Date,Last_Appointment_Date,DateOfDeath,mortality,Age_years
0,1,1962-02-27,female,hispanic,0,0,0,0,0,0,...,0,0,0,0,0,2013-04-27,2018-06-01,,0,52.843258
1,2,1959-08-18,male,white,0,0,0,0,0,0,...,0,0,1,0,0,2005-11-30,2008-11-02,2008-11-02,1,55.373032
2,3,1946-02-15,female,white,0,0,0,0,0,0,...,0,0,1,0,0,2011-11-05,2015-11-13,,0,68.876112
3,4,1979-07-27,female,white,0,0,0,0,0,1,...,0,0,0,0,0,2010-03-01,2016-01-17,2016-01-17,1,35.433265
4,5,1983-02-19,female,hispanic,0,0,0,0,0,0,...,0,0,1,0,0,2006-09-22,2018-06-01,,0,31.865845
5,6,1987-11-16,male,black,0,0,0,0,0,0,...,0,0,0,0,0,2006-10-22,2011-01-13,,0,27.126626
6,7,1958-01-11,male,white,0,0,0,0,0,0,...,0,0,0,0,0,2015-01-20,2018-06-01,,0,56.971937
7,8,1952-10-31,female,black,0,0,0,0,0,0,...,0,0,1,0,0,2013-03-25,2018-06-01,,0,62.168378
8,9,1951-10-06,female,white,0,0,0,0,0,0,...,0,1,0,0,0,2008-08-04,2010-05-23,,0,63.238877
9,10,1954-10-16,male,white,0,0,0,0,0,0,...,0,0,0,0,0,2014-07-01,2015-10-19,,0,60.210815


In [47]:
df1.columns

Index(['PatientID', 'DateOfBirth', 'Gender', 'Race', 'Myocardial_infarction',
       'Congestive_heart_failure', 'Peripheral_vascular_disease', 'Stroke',
       'Dementia', 'Pulmonary', 'Rheumatic', 'Peptic_ulcer_disease',
       'LiverMild', 'Diabetes_without_complications',
       'Diabetes_with_complications', 'Paralysis', 'Renal', 'Cancer',
       'LiverSevere', 'Metastatic_solid_tumour', 'HIV', 'Obesity',
       'Depression', 'Hypertension', 'Drugs', 'Alcohol',
       'First_Appointment_Date', 'Last_Appointment_Date', 'DateOfDeath',
       'mortality', 'Age_years'],
      dtype='object')

In [48]:
## numeric cols
num_cols = ['Age_years']
## categorical cols
cat_cols = ['Gender', 'Race', 'Myocardial_infarction',
       'Congestive_heart_failure', 'Peripheral_vascular_disease', 'Stroke',
       'Dementia', 'Pulmonary', 'Rheumatic', 'Peptic_ulcer_disease',
       'LiverMild', 'Diabetes_without_complications',
       'Diabetes_with_complications', 'Paralysis', 'Renal', 'Cancer',
       'LiverSevere', 'Metastatic_solid_tumour', 'HIV', 'Obesity',
       'Depression', 'Hypertension', 'Drugs', 'Alcohol','mortality']

In [49]:
## calculate sum of X and X**2 for numeric variables as we go
def sum2(x):
    return(np.sum(x**2))
df1.loc[:,num_cols].agg(sum2,axis=0)
df1.loc[:,num_cols].agg(np.sum,axis=0)

Age_years    28344.510245
dtype: float64

Age_years    514.108145
dtype: float64

In [50]:
## calculate unique values of categorical variables as we go
dict_unique = {}
for col in cat_cols:
    dict_unique[col] = set(df1.loc[:,col])
dict_unique
dict_unique['Gender']

{'Gender': {'female', 'male'},
 'Race': {'black', 'hispanic', 'white'},
 'Myocardial_infarction': {0},
 'Congestive_heart_failure': {0},
 'Peripheral_vascular_disease': {0},
 'Stroke': {0},
 'Dementia': {0},
 'Pulmonary': {0, 1},
 'Rheumatic': {0},
 'Peptic_ulcer_disease': {0, 1},
 'LiverMild': {0},
 'Diabetes_without_complications': {0, 1},
 'Diabetes_with_complications': {0},
 'Paralysis': {0},
 'Renal': {0},
 'Cancer': {0, 1},
 'LiverSevere': {0, 1},
 'Metastatic_solid_tumour': {0},
 'HIV': {0, 1},
 'Obesity': {0},
 'Depression': {0, 1},
 'Hypertension': {0, 1},
 'Drugs': {0},
 'Alcohol': {0},
 'mortality': {0, 1}}

{'female', 'male'}

In [51]:
## establish generator to yield data
df_patient_gen2 = \
 pd.read_csv(r'PatientAnalyticFile.csv',
             chunksize=100)
## test
for n,chunk in enumerate(df_patient_gen2):
    ## process data first
    # Create mortality variable
    chunk['mortality'] = \
        np.where(chunk['DateOfDeath'].isnull(),
                 0,1)
    # Convert dateofBirth to date
    chunk['DateOfBirth'] = \
        pd.to_datetime(chunk['DateOfBirth'])
    # Calculate age in years as of 2015-01-01
    chunk['Age_years'] = \
        ((pd.to_datetime('2015-01-01') - chunk['DateOfBirth']).dt.days/365.25)
    
    if n==0: ## for first chunk, establish new vars
        n_rows = chunk.shape[0]
        running_sum = chunk.loc[:,num_cols].agg(np.sum,axis=0)
        running_sum_2 = chunk.loc[:,num_cols].agg(sum2,axis=0)
        ## calculate unique values of categorical variables as we go
        dict_unique = {}
        for col in cat_cols:
            dict_unique[col] = set(chunk.loc[:,col])
    if n>0: ## for subsequent chunks, update these variables
        n_rows = chunk.shape[0] + n_rows
        running_sum = running_sum + chunk.loc[:,num_cols].agg(np.sum,axis=0)
        running_sum_2 = running_sum_2 + chunk.loc[:,num_cols].agg(sum2,axis=0)
        ## calculate unique values of categorical variables as we go
        for col in cat_cols:
            dict_unique[col] = set(chunk.loc[:,col]).union(set(dict_unique[col]))

In [52]:
## check out the values we calculated:
num_cols
n_rows
running_sum
## unique vars
dict_unique

['Age_years']

20000

Age_years    944949.478439
dtype: float64

{'Gender': {'female', 'male'},
 'Race': {'black', 'hispanic', 'other', 'white'},
 'Myocardial_infarction': {0, 1},
 'Congestive_heart_failure': {0, 1},
 'Peripheral_vascular_disease': {0, 1},
 'Stroke': {0, 1},
 'Dementia': {0, 1},
 'Pulmonary': {0, 1},
 'Rheumatic': {0, 1},
 'Peptic_ulcer_disease': {0, 1},
 'LiverMild': {0, 1},
 'Diabetes_without_complications': {0, 1},
 'Diabetes_with_complications': {0, 1},
 'Paralysis': {0, 1},
 'Renal': {0, 1},
 'Cancer': {0, 1},
 'LiverSevere': {0, 1},
 'Metastatic_solid_tumour': {0, 1},
 'HIV': {0, 1},
 'Obesity': {0, 1},
 'Depression': {0, 1},
 'Hypertension': {0, 1},
 'Drugs': {0, 1},
 'Alcohol': {0, 1},
 'mortality': {0, 1}}

In [53]:
## calculate mean
mean_all = running_sum / n_rows
## calculate stdev
stdev_all = np.sqrt((running_sum_2 / n_rows) - (mean_all * mean_all)) 
## final nums to use for standardization:
mean_all
stdev_all

Age_years    47.247474
dtype: float64

Age_years    18.144632
dtype: float64
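
The standard deviation above relies on the identity Var(x) = E[x^2] - (E[x])^2, which needs only the running count, sum, and sum of squares. The sketch below checks that chunked accumulation reproduces NumPy's population standard deviation (`ddof=0`). The array `ages`, the seed, and the chunk size are made up for illustration and are not taken from `PatientAnalyticFile.csv`.

```python
import numpy as np

# Synthetic "ages" (an assumption for illustration, not the patient data)
rng = np.random.default_rng(0)
ages = rng.normal(loc=47.0, scale=18.0, size=20_000)

n_rows = 0
running_sum = 0.0
running_sum_2 = 0.0
# Walk through the data 100 values at a time, as the chunked loop does
for start in range(0, ages.size, 100):
    chunk = ages[start:start + 100]
    n_rows += chunk.size
    running_sum += chunk.sum()
    running_sum_2 += np.sum(chunk ** 2)

mean_stream = running_sum / n_rows
# Population variance: E[x^2] - (E[x])^2
std_stream = np.sqrt(running_sum_2 / n_rows - mean_stream ** 2)

# Both comparisons should agree up to floating point error
print(np.isclose(mean_stream, ages.mean()))
print(np.isclose(std_stream, ages.std(ddof=0)))
```

This one-pass formula can lose precision when the mean is large relative to the spread. Welford's algorithm is a common, more stable alternative if that becomes a concern.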

Next, we use these statistics to generate the X and Y matrices that we will feed into scikit-learn's `partial_fit` method. We work through each step on a 10-row sample first, then combine the steps into a single loop over the chunks.

In [54]:
df2 = \
    pd.read_csv(r'PatientAnalyticFile.csv',
                nrows=10)
# Create mortality variable
df2['mortality'] = \
    np.where(df2['DateOfDeath'].isnull(),
             0,1)
# Convert dateofBirth to date
df2['DateOfBirth'] = \
    pd.to_datetime(df2['DateOfBirth'])
# Calculate age in years as of 2015-01-01
df2['Age_years'] = \
    ((pd.to_datetime('2015-01-01') - df2['DateOfBirth']).dt.days/365.25)
df2

Unnamed: 0,PatientID,DateOfBirth,Gender,Race,Myocardial_infarction,Congestive_heart_failure,Peripheral_vascular_disease,Stroke,Dementia,Pulmonary,...,Obesity,Depression,Hypertension,Drugs,Alcohol,First_Appointment_Date,Last_Appointment_Date,DateOfDeath,mortality,Age_years
0,1,1962-02-27,female,hispanic,0,0,0,0,0,0,...,0,0,0,0,0,2013-04-27,2018-06-01,,0,52.843258
1,2,1959-08-18,male,white,0,0,0,0,0,0,...,0,0,1,0,0,2005-11-30,2008-11-02,2008-11-02,1,55.373032
2,3,1946-02-15,female,white,0,0,0,0,0,0,...,0,0,1,0,0,2011-11-05,2015-11-13,,0,68.876112
3,4,1979-07-27,female,white,0,0,0,0,0,1,...,0,0,0,0,0,2010-03-01,2016-01-17,2016-01-17,1,35.433265
4,5,1983-02-19,female,hispanic,0,0,0,0,0,0,...,0,0,1,0,0,2006-09-22,2018-06-01,,0,31.865845
5,6,1987-11-16,male,black,0,0,0,0,0,0,...,0,0,0,0,0,2006-10-22,2011-01-13,,0,27.126626
6,7,1958-01-11,male,white,0,0,0,0,0,0,...,0,0,0,0,0,2015-01-20,2018-06-01,,0,56.971937
7,8,1952-10-31,female,black,0,0,0,0,0,0,...,0,0,1,0,0,2013-03-25,2018-06-01,,0,62.168378
8,9,1951-10-06,female,white,0,0,0,0,0,0,...,0,1,0,0,0,2008-08-04,2010-05-23,,0,63.238877
9,10,1954-10-16,male,white,0,0,0,0,0,0,...,0,0,0,0,0,2014-07-01,2015-10-19,,0,60.210815


In [55]:
## standardized numeric cols
X_num = (df2.loc[:,num_cols] - mean_all)/stdev_all
X_num

Unnamed: 0,Age_years
0,0.308399
1,0.447822
2,1.192013
3,-0.651113
4,-0.847723
5,-1.108915
6,0.535942
7,0.822332
8,0.88133
9,0.714445


In [56]:
## get dummy coded categoricals
for col in cat_cols:
    df2[col] = pd.Categorical(df2[col],
                              categories=dict_unique[col])
## use patsy to create formula of these
formula = "mortality ~ " + " + ".join(cat_cols)
formula

'mortality ~ Gender + Race + Myocardial_infarction + Congestive_heart_failure + Peripheral_vascular_disease + Stroke + Dementia + Pulmonary + Rheumatic + Peptic_ulcer_disease + LiverMild + Diabetes_without_complications + Diabetes_with_complications + Paralysis + Renal + Cancer + LiverSevere + Metastatic_solid_tumour + HIV + Obesity + Depression + Hypertension + Drugs + Alcohol + mortality'
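
Because `cat_cols` contains `'mortality'`, the formula above lists the outcome on both sides of the `~`. If that is not intended, a small variation builds the right-hand side from the predictors only. This is a sketch with a shortened column list; the names `outcome` and `predictors` are introduced here and do not appear in the notebook.

```python
# Shortened column list for illustration; the notebook's cat_cols is longer
cat_cols = ['Gender', 'Race', 'Stroke', 'mortality']
outcome = 'mortality'

# Keep the outcome off the right-hand side of the formula
predictors = [col for col in cat_cols if col != outcome]
formula = outcome + " ~ " + " + ".join(predictors)
print(formula)
```

With this change `X_cat` would no longer carry a `mortality` column, so the fitted coefficients and the training log further down would differ from those shown.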

In [57]:
## use Patsy to create model matrices
Y,X_cat = dmatrices(formula,
                    df2)

In [58]:
## join in the continuous X
X_all = np.concatenate([X_cat,X_num],axis=1)

In [59]:
Y[:,1]
classes_potential = np.array([0,1])
classes_potential

array([0., 1., 0., 1., 0., 0., 0., 0., 0., 0.])

array([0, 1])
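
The preprocessing steps in the last few cells can be collected into one helper. This is a minimal sketch, not the notebook's own code: `make_xy` is a hypothetical name, it uses `pd.get_dummies` instead of patsy, it sorts the levels so the column order is reproducible, and it leaves the outcome out of the predictors. The toy `levels` dictionary and the mean and standard deviation passed in are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def make_xy(chunk, num_cols, cat_cols, levels, mean_all, stdev_all, outcome):
    """Return (X, y) for one chunk, with identical columns for every chunk."""
    # Standardize numeric columns with the statistics from the first pass
    X_num = (chunk[num_cols] - mean_all) / stdev_all
    pieces = [pd.Series(1.0, index=chunk.index, name="Intercept"), X_num]
    for col in cat_cols:
        # Fixed, sorted levels: levels absent from this chunk still get a column
        cat = pd.Series(pd.Categorical(chunk[col], categories=sorted(levels[col])),
                        index=chunk.index)
        # drop_first=True gives reference-level (treatment) coding
        pieces.append(pd.get_dummies(cat, prefix=col, drop_first=True))
    X = pd.concat(pieces, axis=1).astype(float)
    return X, chunk[outcome].to_numpy()

# Toy chunk in which the 'black' and 'other' race levels never occur
levels = {"Gender": {"female", "male"},
          "Race": {"black", "hispanic", "other", "white"}}
chunk = pd.DataFrame({"Age_years": [52.8, 35.4, 68.9],
                      "Gender": ["female", "female", "male"],
                      "Race": ["hispanic", "white", "white"],
                      "mortality": [0, 1, 0]})
X, y = make_xy(chunk, ["Age_years"], ["Gender", "Race"], levels,
               mean_all=47.2, stdev_all=18.1, outcome="mortality")
print(list(X.columns))
print(X.shape)
```

Sorting matters because the levels are stored in Python sets, and the iteration order of a set of strings is not guaranteed to be the same from one session to the next.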

In [60]:
## combine into one big step!!
## setup the partial fit operator
classes_potential = np.array([0,1])

## Set our model
from sklearn import linear_model  # import here in case it was not loaded earlier
clf = linear_model.SGDClassifier(fit_intercept=False, # already have the intercept
                                 loss='log',
                                 penalty='elasticnet',
                                 warm_start=True,
                                 alpha=1,
                                 verbose=1,
                                 tol=1e-6)

## establish generator to yield data
df_patient_gen3 = \
 pd.read_csv(r'PatientAnalyticFile.csv',
             chunksize=500)
## test the loop!
for n,chunk in enumerate(df_patient_gen3):
    ## process data first
    # Create mortality variable
    chunk['mortality'] = \
        np.where(chunk['DateOfDeath'].isnull(),
                 0,1)
    # Convert dateofBirth to date
    chunk['DateOfBirth'] = \
        pd.to_datetime(chunk['DateOfBirth'])
    # Calculate age in years as of 2015-01-01
    chunk['Age_years'] = \
        ((pd.to_datetime('2015-01-01') - chunk['DateOfBirth']).dt.days/365.25)
    # standardized numeric cols
    X_num = (chunk.loc[:,num_cols] - mean_all)/stdev_all
    ## get dummy coded categoricals
    for col in cat_cols:
        chunk[col] = pd.Categorical(chunk[col],
                                    categories=dict_unique[col])
    ## use Patsy to create model matrices
    Y,X_cat = dmatrices(formula,
                        chunk)
    ## join in the continuous X
    X_all = np.concatenate([X_cat,X_num],axis=1)
    
    ## update the fit: each partial_fit call runs one SGD epoch over this chunk
    clf.partial_fit(X_all,Y[:,1],classes=classes_potential)

-- Epoch 1
Norm: 0.09, NNZs: 1, Bias: 0.000000, T: 500, Avg. loss: 0.689114
Total training time: 0.00 seconds.


SGDClassifier(alpha=1, fit_intercept=False, loss='log', penalty='elasticnet',
              tol=1e-06, verbose=1, warm_start=True)

(Chunks 2 through 40 print the same "-- Epoch 1" report and estimator repr as above; only their per-chunk summary lines are kept here.)

Norm: 0.09, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.683806
Norm: 0.09, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.679597
Norm: 0.09, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.677449
Norm: 0.09, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.677750
Norm: 0.09, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.675638
Norm: 0.08, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.677372
Norm: 0.08, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.677242
Norm: 0.08, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.674696
Norm: 0.08, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.675598
Norm: 0.08, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.675079
Norm: 0.08, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.676038
Norm: 0.08, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.674875
Norm: 0.08, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.674419
Norm: 0.08, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.678431
Norm: 0.08, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.675907
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.678451
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.677391
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.673917
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.675070
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.676041
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.677758
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.675875
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.677626
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.675913
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.674610
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.676823
Norm: 0.07, NNZs: 2, Bias: 0.000000, T: 500, Avg. loss: 0.677144
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.678849
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.677531
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.676035
Norm: 0.07, NNZs: 2, Bias: 0.000000, T: 500, Avg. loss: 0.674470
Norm: 0.07, NNZs: 2, Bias: 0.000000, T: 500, Avg. loss: 0.677894
Norm: 0.07, NNZs: 2, Bias: 0.000000, T: 500, Avg. loss: 0.676767
Norm: 0.07, NNZs: 2, Bias: 0.000000, T: 500, Avg. loss: 0.676555
Norm: 0.07, NNZs: 2, Bias: 0.000000, T: 500, Avg. loss: 0.676891
Norm: 0.07, NNZs: 2, Bias: 0.000000, T: 500, Avg. loss: 0.677159
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.678523
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.679098
Norm: 0.07, NNZs: 3, Bias: 0.000000, T: 500, Avg. loss: 0.674924
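
Each `partial_fit` call makes one SGD pass over the chunk it receives, so the loop above amounts to a single epoch over the file. The sketch below shows how one might run several epochs and then score new rows, using synthetic data in place of the patient file. The generator `iter_chunks`, the coefficients in `true_coef`, the smaller `alpha`, and the epoch count are all assumptions for illustration. The loss name is chosen by version because scikit-learn 1.1 renamed `'log'` to `'log_loss'`.

```python
import numpy as np
import sklearn
from sklearn.linear_model import SGDClassifier

# 'log' was renamed to 'log_loss' in scikit-learn 1.1
major, minor = (int(p) for p in sklearn.__version__.split(".")[:2])
loss_name = "log_loss" if (major, minor) >= (1, 1) else "log"

rng = np.random.default_rng(42)
true_coef = np.array([-1.0, 1.5, -0.5])  # made-up values; first is the intercept

def iter_chunks(n_chunks=40, chunk_size=500):
    """Yield synthetic (X, y) chunks; stands in for reading the CSV in chunks."""
    for _ in range(n_chunks):
        X = np.column_stack([np.ones(chunk_size),
                             rng.normal(size=(chunk_size, 2))])
        p = 1.0 / (1.0 + np.exp(-X @ true_coef))
        yield X, rng.binomial(1, p)

clf = SGDClassifier(loss=loss_name, penalty="elasticnet", alpha=1e-4,
                    fit_intercept=False, random_state=0)
classes_potential = np.array([0, 1])

# Several passes over the chunked data; each call updates the same model
for epoch in range(3):
    for X, y in iter_chunks():
        clf.partial_fit(X, y, classes=classes_potential)

# Predicted probability of class 1 for two new rows (intercept column first)
X_new = np.array([[1.0, 2.0, 0.0], [1.0, -2.0, 0.0]])
probs = clf.predict_proba(X_new)[:, 1]
print(clf.coef_)
print(probs)
```

With `alpha=1`, as in the cell above, the penalty is strong, which is consistent with the small coefficient norm and an average loss that stays close to log(2) ≈ 0.693 in the training log.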