# Logistic Regression Analysis

### Libraries

We use both external libraries, such as numpy and scikit-learn, and internally written libraries to keep the code in this notebook modular and simple. Modularizing the code base lets us run the different algorithms with as few parameters and extraneous code blocks as possible, so that we can focus on the analysis.

In [1]:
# External libraries:
import numpy as np
from sklearn.model_selection import train_test_split

# Internal libraries:
import datasets.data as data
from descent_algorithms import *
from learning_rates import *
from models import *
from util import *

Using TensorFlow backend.


### Data
We analyze our algorithms on three datasets (Wisconsin Breast Cancer, MNIST, and COD-RNA), each of which poses a binary classification problem (??? 0 or 1 ???). (DESCRIPTION OF THREE DATASETS HERE AND HOW THEY ARE PREPPED IN THE data.py FILE)

Here, we read in the feature vectors and labels using the datasets/data utility functions, and then split the provided samples 80%/20% into training and test sets. The split is done with the train_test_split function from the sklearn.model_selection package, which shuffles the samples before splitting.

In [2]:
features, labels = data.load_wisconsin_breast_cancer()
wbc_X_train, wbc_X_test, wbc_y_train, wbc_y_test = train_test_split(
    features, labels, test_size=0.2)
wbc_n = wbc_X_train.shape[0]

M_features, M_labels = data.load_MNIST_13()
mnist_X_train, mnist_X_test, mnist_y_train, mnist_y_test = train_test_split(
    M_features, M_labels, test_size = 0.2)
mnist_n = mnist_X_train.shape[0]

cod_features, cod_labels = data.load_cod_rna()
cod_X_train, cod_X_test, cod_y_train, cod_y_test = train_test_split(
    cod_features, cod_labels, test_size = 0.2)
cod_n = cod_X_train.shape[0]

### Comparative Measures
We use a relative convergence threshold of 0.000001 (a 0.0001% change in loss between iterations) to determine whether an algorithm has converged. This allows us to directly compare the convergence rates of the various descent methods and learning rates (?? and regularizations ??).

Additionally, we keep track of the final loss converged to, the resulting test accuracy, and the time per iteration in order to fully compare the relative performance of all of the algorithms.

In [3]:
# relative convergence limit
rel_conv = 0.000001
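A minimal sketch of how such a relative convergence test could be applied between iterations. The helper name has_converged and the exact form of the ratio are assumptions for illustration; the check inside our models module may differ.

```python
def has_converged(prev_loss, curr_loss, rel_conv=0.000001):
    """Return True when the relative change in loss falls below rel_conv."""
    if prev_loss == 0:
        # Avoid dividing by zero; treat a zero loss as converged.
        return True
    return abs(prev_loss - curr_loss) / abs(prev_loss) < rel_conv


# A relative change of 0.5% is far above the threshold, so descent continues.
print(has_converged(200.0, 199.0))
# A relative change of about 5e-8 is below the threshold, so we stop.
print(has_converged(200.0, 199.99999))
```

Dividing by the previous loss makes the threshold independent of the scale of the loss, which is what allows one value of rel_conv to be shared across datasets of different sizes.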

### Fixed Step Size
We begin our analysis with a look at the fixed learning rate convergence for our GD, SGD, AGD, and SVRG algorithms on our three datasets.

The default fixed learning rate is 0.01.

In [4]:
# initialize our learning rate object
lr = FixedRate()
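For readers without access to the learning_rates module, a fixed-rate object can be pictured roughly as follows. The class name FixedRateSketch and its call interface are hypothetical stand-ins; only the default of 0.01 comes from the text above.

```python
class FixedRateSketch:
    """Hypothetical stand-in for a fixed learning rate object."""

    def __init__(self, rate=0.01):
        self.rate = rate

    def __call__(self, iteration):
        # A fixed schedule ignores the iteration number entirely.
        return self.rate


lr_sketch = FixedRateSketch()
print(lr_sketch(0), lr_sketch(1000))
```

Wrapping even a constant rate in an object keeps the interface the same as for decaying schedules, so the descent code does not need to know which kind of rate it was given.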

#### Wisconsin Breast Cancer Data
Then, we set up the run for all of our descent methods on the Wisconsin Breast Cancer dataset, beginning with the initialization of each of our descent method objects.

In [5]:
# initialize our descent methods
gd = GradientDescent()
sgd_1 = GradientDescent() # the GD algorithm is used for all SGD algorithms, 
                          # with the smaller batch size specified in the model
sgd_10 = GradientDescent()
sgd_100 = GradientDescent()
agd = NesterovAcceleratedDescent()
svrg = StochasticVarianceReducedGradientDescent()
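Reusing GradientDescent for SGD works because the mini-batch gradient reduces to the full gradient when the batch covers every sample. A minimal NumPy sketch under assumed conventions (labels in {0, 1}, an averaged log loss, and a hypothetical batch_gradient helper that is not part of our descent_algorithms module):

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def batch_gradient(w, X, y, batch_size, rng):
    """Average logistic-loss gradient over a random batch of rows."""
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    X_b, y_b = X[idx], y[idx]
    # Gradient of the mean log loss: X^T (sigmoid(Xw) - y) / batch_size
    return X_b.T @ (sigmoid(X_b @ w) - y_b) / batch_size


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))           # toy data, not one of our datasets
y = (X[:, 0] > 0).astype(float)
w = np.zeros(3)

full = X.T @ (sigmoid(X @ w) - y) / X.shape[0]
# With batch_size equal to the number of samples, the batch gradient
# matches the full gradient (up to floating-point summation order).
print(np.allclose(batch_gradient(w, X, y, X.shape[0], rng), full))
```

Smaller batches give a noisier but cheaper estimate of the same quantity, which is the trade-off the batch sizes of 1, 10, and 100 explore.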

Next, we initialize all of our model objects (all logistic regression models in this case), with the appropriate parameters for each algorithm.

In [6]:
# LogisticRegression(DescentAlgorithm, LearningRate, max iterations, 
# batch size, relative convergence)
gd_log = LogisticRegression(gd, lr, 5000, wbc_n, rel_conv)
sgd_1_log = LogisticRegression(sgd_1, lr, 2000, 1, rel_conv)
sgd_10_log = LogisticRegression(sgd_10, lr, 4000, 10, rel_conv)
sgd_100_log = LogisticRegression(sgd_100, lr, 4000, 100, rel_conv)
agd_log = LogisticRegression(agd, lr, 400, wbc_n, rel_conv)
svrg_log = LogisticRegression(svrg, lr, 20, wbc_n, rel_conv)

Then, we run the fit for each model:

In [7]:
print('Fitting gradient descent:')
wbc_gd_loss = gd_log.fit(wbc_X_train, wbc_y_train)
print('\nFitting stochastic gradient descent, batch size = 1:')
wbc_sgd_1_loss = sgd_1_log.fit(wbc_X_train, wbc_y_train)
print('\nFitting stochastic gradient descent, batch size = 10:')
wbc_sgd_10_loss = sgd_10_log.fit(wbc_X_train, wbc_y_train)
print('\nFitting stochastic gradient descent, batch size = 100:')
wbc_sgd_100_loss = sgd_100_log.fit(wbc_X_train, wbc_y_train)
print('\nFitting accelerated gradient descent:')
wbc_agd_loss = agd_log.fit(wbc_X_train, wbc_y_train)
print('\nFitting stochastic variance reduced gradient descent:')
wbc_svrg_loss = svrg_log.fit(wbc_X_train, wbc_y_train)

Fitting gradient descent:
Iter:        0 train loss: 375.565
Iter:      500 train loss: 234.202
Iter:     1000 train loss: 227.359
Iter:     1500 train loss: 225.602
Iter:     2000 train loss: 224.971
Iter:     2500 train loss: 224.705
Converged in 2788 iterations.

Fitting stochastic gradient descent, batch size = 1:
Iter:        0 train loss: 392.547
Iter:      200 train loss: 256.957
Iter:      400 train loss: 238.522
Iter:      600 train loss: 232.521
Iter:      800 train loss: 325.748
Converged in 987 iterations.

Fitting stochastic gradient descent, batch size = 10:
Iter:        0 train loss: 372.393
Iter:      400 train loss: 238.371
Iter:      800 train loss: 228.928
Iter:     1200 train loss: 226.759
Converged in 1461 iterations.

Fitting stochastic gradient descent, batch size = 100:
Iter:        0 train loss: 382.632
Iter:      400 train loss: 237.869
Converged in 480 iterations.

Fitting accelerated gradient descent:
Iter:        0 train loss: 386.776
Iter:       40 train l

In [8]:
acc = check_accuracy(gd_log, wbc_X_test, wbc_y_test)
print("GD Accuracy: {0:.2f}%".format(acc * 100))
acc = check_accuracy(sgd_1_log, wbc_X_test, wbc_y_test)
print("SGD 1 Accuracy: {0:.2f}%".format(acc * 100))
acc = check_accuracy(sgd_10_log, wbc_X_test, wbc_y_test)
print("SGD 10 Accuracy: {0:.2f}%".format(acc * 100))
acc = check_accuracy(sgd_100_log, wbc_X_test, wbc_y_test)
print("SGD 100 Accuracy: {0:.2f}%".format(acc * 100))
acc = check_accuracy(agd_log, wbc_X_test, wbc_y_test)
print("AGD Accuracy: {0:.2f}%".format(acc * 100))
acc = check_accuracy(svrg_log, wbc_X_test, wbc_y_test)
print("SVRG Accuracy: {0:.2f}%".format(acc * 100))

GD Accuracy: 86.43%
SGD 1 Accuracy: 77.14%
SGD 10 Accuracy: 86.43%
SGD 100 Accuracy: 86.43%
AGD Accuracy: 87.86%
SVRG Accuracy: 87.86%
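The accuracy figures above are the fraction of test samples whose predicted label matches the true label. A rough sketch of that computation, assuming predictions are obtained by thresholding the predicted probability at 0.5 (the hypothetical accuracy_sketch below is not the check_accuracy helper from util):

```python
import numpy as np


def accuracy_sketch(probabilities, y_true, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    y_pred = (np.asarray(probabilities) >= threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))


# Toy values for illustration only.
probs = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 0, 0]
acc_sketch = accuracy_sketch(probs, labels)
print("Accuracy: {0:.2f}%".format(acc_sketch * 100))
```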


In [9]:
plot_losses(wbc_gd_loss, wbc_sgd_1_loss, wbc_sgd_10_loss, wbc_sgd_100_loss, wbc_agd_loss, wbc_svrg_loss, wbc_agd_loss)

NameError: name 'figsize' is not defined

#### MNIST Data
Next, we set up the run for all of our descent methods on the MNIST dataset, beginning with the initialization of each of our descent method objects. Because the usage is the same as above, we combine the cells here to reduce the footprint.

In [None]:
lr = FixedRate(0.000001)
# initialize our descent methods
gd = GradientDescent()
sgd_1 = GradientDescent() 
sgd_10 = GradientDescent()
sgd_100 = GradientDescent()
agd = NesterovAcceleratedDescent()
svrd = StochasticVarianceReducedGradientDescent()
# initialize the logistic regression objects
gd_log = LogisticRegression(gd, lr, 2000, mnist_n, rel_conv)
sgd_1_log = LogisticRegression(sgd_1, lr, 100, 1, rel_conv)
sgd_10_log = LogisticRegression(sgd_10, lr, 200, 10, rel_conv)
sgd_100_log = LogisticRegression(sgd_100, lr, 2000, 100, rel_conv)
agd_log = LogisticRegression(agd, lr, 200, mnist_n, rel_conv)
svrd_log = LogisticRegression(svrd, lr, 20, mnist_n, rel_conv)
# and run the fit for each of these models, this time on the MNIST data set:
print('Fitting gradient descent:')
mnist_gd_loss = gd_log.fit(mnist_X_train, mnist_y_train, non_zero_init = True)
print('\nFitting stochastic gradient descent, batch size = 1:')
mnist_sgd_1_loss = sgd_1_log.fit(mnist_X_train, mnist_y_train, non_zero_init = True)
print('\nFitting stochastic gradient descent, batch size = 10:')
mnist_sgd_10_loss = sgd_10_log.fit(mnist_X_train, mnist_y_train, non_zero_init = True)
print('\nFitting stochastic gradient descent, batch size = 100:')
mnist_sgd_100_loss = sgd_100_log.fit(mnist_X_train, mnist_y_train, non_zero_init = True)
print('\nFitting accelerated gradient descent:')
mnist_agd_loss = agd_log.fit(mnist_X_train, mnist_y_train, non_zero_init = True)
print('\nFitting stochastic variance reduced gradient descent:')
mnist_svrg_loss = svrd_log.fit(mnist_X_train, mnist_y_train, non_zero_init = True)

#### COD-RNA Data
Lastly, we set up the run for all of our descent methods on the COD-RNA dataset, again using a single condensed cell to run the fit for each method's model.

In [None]:
lr = FixedRate(0.001)
# initialize our descent methods
gd = GradientDescent()
sgd_1 = GradientDescent()
sgd_10 = GradientDescent()
sgd_100 = GradientDescent()
agd = NesterovAcceleratedDescent()
svrd = StochasticVarianceReducedGradientDescent()
# initialize the logistic regression objects
gd_log = LogisticRegression(gd, lr, 5000, cod_n, rel_conv)
sgd_1_log = LogisticRegression(sgd_1, lr, 2000, 1, rel_conv)
sgd_10_log = LogisticRegression(sgd_10, lr, 2000, 10, rel_conv)
sgd_100_log = LogisticRegression(sgd_100, lr, 2000, 100, rel_conv)
agd_log = LogisticRegression(agd, lr, 200, cod_n, rel_conv)
svrd_log = LogisticRegression(svrd, lr, 20, cod_n, rel_conv)
# and run the fit for each of these models, this time on the COD-RNA data set:
print('Fitting gradient descent:')
cod_gd_loss = gd_log.fit(cod_X_train, cod_y_train, non_zero_init = True)
print('\nFitting stochastic gradient descent, batch size = 1:')
cod_sgd_1_loss = sgd_1_log.fit(cod_X_train, cod_y_train, non_zero_init = True)
print('\nFitting stochastic gradient descent, batch size = 10:')
cod_sgd_10_loss = sgd_10_log.fit(cod_X_train, cod_y_train, non_zero_init = True)
print('\nFitting stochastic gradient descent, batch size = 100:')
cod_sgd_100_loss = sgd_100_log.fit(cod_X_train, cod_y_train, non_zero_init = True)
print('\nFitting accelerated gradient descent:')
cod_agd_loss = agd_log.fit(cod_X_train, cod_y_train, non_zero_init = True)
print('\nFitting stochastic variance reduced gradient descent:')
cod_svrg_loss = svrd_log.fit(cod_X_train, cod_y_train, non_zero_init = True)

In [None]:
import matplotlib.pyplot as plt

# check the test accuracy of the GD model and plot its training loss
acc = check_accuracy(gd_log, cod_X_test, cod_y_test)
print("GD Accuracy: {0:.2f}%".format(acc * 100))

plt.figure(1, figsize=(12, 6))
plt.title('Loss Plot')
plt.xlabel('Iteration Number')
plt.ylabel('Loss')
plt.plot(cod_gd_loss, 'b')
plt.show()

Run notes:

FIRST:
lr:
    - fixed: GD(0.01), SGD(0.01) - batched at 1,10,100 , SVRG(0.01), Nest(0.01)
    - polydecay: GD(0.01, 0.0001), SGD(0.01, 0.00001) - batched, SVRG(N/A), Nest(N/A)
    - expdecay: GD(0.1,0.001), SGD(0.1,.001) - batched, SVRG(N/A), Nest(N/A)
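
The polydecay and expdecay entries above refer to decaying learning rates. As a rough illustration only, here is one common form of each schedule; the function names, the exact formulas, and the reading of the two numbers as (initial rate, decay constant) are assumptions, not a description of our learning_rates module.

```python
import math


def poly_decay(t, eta0=0.01, gamma=0.0001, power=1.0):
    """Polynomial decay: eta0 / (1 + gamma * t) ** power."""
    return eta0 / (1.0 + gamma * t) ** power


def exp_decay(t, eta0=0.1, gamma=0.001):
    """Exponential decay: eta0 * exp(-gamma * t)."""
    return eta0 * math.exp(-gamma * t)


# Compare the two schedules at a few iteration counts.
for t in (0, 1000, 5000):
    print(t, poly_decay(t), exp_decay(t))
```

Both schedules start at eta0 and shrink monotonically, but the exponential form falls off much faster at large iteration counts.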