# Logistic Regression using `keras`

The [Keras Documentation](http://keras.io/) states that "Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano". In that sense, it's not really meant to be used to perform logistic regression, but to continue gradually building towards neural networks, we'll walk through it using `keras`.

## Computational Graphs for Logistic Regression 

As a reference, the computational graphs that we used to visualize the forward and backward propagation steps in solving our logistic regression problem with gradient descent are as follows: 

### Forward Propagation

<img src="../imgs/custom/logistic_comp_graph_condensed_forprop.png" width=300\>

### Backward Propagation

<img src="../imgs/custom/logistic_comp_graph_condensed_backprop.png" width=400\>
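
As a rough companion to the two graphs above, the following NumPy sketch spells out one forward pass and one backward pass for logistic regression. The function names (`sigmoid`, `forward`, `backward`) and the toy data are illustrative assumptions rather than code from the earlier notebooks, and the gradient shown is the one for the mean binary crossentropy loss.

```python
import numpy as np


def sigmoid(z):
    """Squash real values into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(xs, betas):
    """Forward propagation: linear combination, then sigmoid."""
    return sigmoid(xs @ betas)


def backward(xs, ys, preds):
    """Backward propagation: gradient of the mean binary crossentropy
    with respect to the coefficients."""
    return xs.T @ (preds - ys) / len(ys)


# Toy data (assumed for illustration): 5 observations, an intercept
# column of ones plus 3 features.
rng = np.random.default_rng(0)
xs = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 3))])
ys = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
betas = np.zeros(4)

preds = forward(xs, betas)        # every prediction is 0.5 when betas are zero
grads = backward(xs, ys, preds)   # one partial derivative per coefficient
print(preds.shape, grads.shape)
```

Each line of `forward` and `backward` corresponds to a group of nodes in the graphs: the matrix product and sigmoid on the way forward, and the chain rule collapsed into `xs.T @ (preds - ys)` on the way back.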

### Performing Logistic Regression with Keras

Since `keras` can be run on top of either `tensorflow` or `theano`, under the hood of our logistic regression using `keras`, a similar (if not the same) version of the code that we wrote in our `theano` or `tensorflow` implementation is being run. Which backend gets used is set in the `keras` configuration file, and by [adjusting that file](http://keras.io/backend/#switching-from-one-backend-to-another), we can easily switch between the two. The message printed after the imports below reports which backend is in use for this run.

In [1]:
import numpy as np
from keras.layers import Input
from keras.layers.core import Dense
from keras.models import Model
from keras.optimizers import SGD
from datasets.general import gen_multiple_logistic

Using TensorFlow backend.


In [2]:
def get_keras_model(): 
    learning_rate = 0.1
    # 1. Specify a placeholder for the inputs. 
    xs = Input(shape=(4,))
    # 2. Define the equation that generates predictions. 
    ys = Dense(1, activation='sigmoid', bias=False)(xs)

    # 3. Define a `Model` object that will be used to train/learn the coefficients. 
    logistic_model = Model(input=xs, output=ys)
    
    # 4. Define the optimizer and loss function used to train/learn the coefficients. 
    sgd = SGD(learning_rate)
    
    # 5. Compile the model (basically, build up the backpropagation steps)
    logistic_model.compile(loss='binary_crossentropy', optimizer=sgd)
    
    return logistic_model

Similar to the comparisons of multiple linear regression and logistic regression using `theano` and `tensorflow`, we'll see only minor differences between our `keras` implementations for multiple linear regression and logistic regression. Our `get_keras_model` function still returns what our `get_theano_graph` (notebook `3c`) and `get_tensorflow_graph` (notebook `3d`) functions returned: a set of computations that perform forward and backward propagation in order to solve a logistic regression problem using gradient descent. 

Compared to our `theano` and `tensorflow` implementations, our `get_keras_model` has a smaller code base, which makes sense given its goal to be a "minimalist, highly modular neural networks library". In particular, we see that our forward propagation is defined in 2 steps, compared to the 5 steps it took with `theano` or `tensorflow`: 

* Step `1` is simply the `keras` way of generating a placeholder variable that will later be replaced with real data. The one piece of information we have to provide is the dimensionality of one of our input observations (i.e. how many features it has). Since we specified that we would have three features and added a column of ones into our `xs` to account for the intercept (
<img src="../imgs/variables/beta0.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \>
), each observation has 4 entries, so our `shape` needs to be `(4,)`.
* Step `2` defines our logistic regression equation, <img src="../imgs/equations/logistic.png" width=125 style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" \>. To specify logistic regression in place of linear regression, all we have to do is change the **activation** to 'sigmoid'. The use of this activation here will apply the sigmoid function to the weighted sum of the inputs, which is just 
<img src="../imgs/variables/x_beta.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=30 \> . In later notebooks, we'll detail exactly what that `1` passed into `Dense` refers to, but for now trust that by changing the activation to 'sigmoid', we are performing logistic regression. 
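
To make steps `1` and `2` a little more concrete, here is a minimal NumPy sketch of what a `Dense(1, activation='sigmoid', bias=False)` layer computes on inputs of shape `(4,)`. It is an illustration under assumptions, not the `keras` internals: the names `dense_sigmoid` and `weights`, and the sample values, are made up for this example.

```python
import numpy as np


def dense_sigmoid(xs, weights):
    """Mimic a bias-free dense layer with one output unit and a sigmoid
    activation: multiply inputs by the weights, then apply the sigmoid."""
    return 1.0 / (1.0 + np.exp(-(xs @ weights)))


# Two observations, each with 4 entries: a leading 1 for the intercept
# followed by three feature values (illustrative numbers).
xs = np.array([[1.0, 0.2, -0.5, 0.1],
               [1.0, -1.0, 0.3, 0.8]])

# One weight per input column and a single output unit, so shape (4, 1).
# The first weight plays the role of the intercept, which is why the
# layer does not need a separate bias term.
weights = np.array([[0.5], [1.0], [-2.0], [0.25]])

probs = dense_sigmoid(xs, weights)
print(probs.shape)  # one predicted probability per observation
```

The weight matrix here has the same `(4, 1)` layout that the later cell reads from `get_weights()[0]`, where each row holds one coefficient.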

Backward propagation is defined in steps `4-5`: 

* Step `4` specifies exactly how to perform our gradient descent updates. Here, we'll use what we've used in all of our prior implementations - stochastic gradient descent with a learning rate of 0.1. As we'll see in later notebooks, there are a number of more complicated flavors of gradient descent that we also have the option to use.
* Step `5` tells `keras` to calculate the update rules for our coefficients, defining each of the steps necessary for doing so. Here, we have to specify a `loss` as well as an `optimizer`. As discussed above, the `optimizer` specifies how to perform our gradient descent updates. The `loss` function specifies how we calculate the error, which for logistic regression is binary crossentropy. After defining both a `loss` and `optimizer`, `keras` has all of the pieces it needs to calculate the update rules for our coefficients, and to add those update steps into the graph that it will later run through. 
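
The update that steps `4` and `5` set up can be sketched by hand. The following is a simplified NumPy illustration, assuming plain gradient descent on the mean binary crossentropy; `binary_crossentropy` and `sgd_step` are names chosen for this example and are not the `keras` functions themselves.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_crossentropy(ys, preds, eps=1e-7):
    """Mean binary crossentropy; predictions are clipped to avoid log(0)."""
    preds = np.clip(preds, eps, 1.0 - eps)
    return -np.mean(ys * np.log(preds) + (1.0 - ys) * np.log(1.0 - preds))


def sgd_step(betas, xs, ys, learning_rate=0.1):
    """One gradient descent update: move against the gradient of the loss."""
    preds = sigmoid(xs @ betas)
    grads = xs.T @ (preds - ys) / len(ys)
    return betas - learning_rate * grads


# Illustrative data: intercept column plus 3 features, binary targets.
rng = np.random.default_rng(1)
xs = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])
ys = (rng.random(20) > 0.5).astype(float)

betas = np.zeros(4)
loss_before = binary_crossentropy(ys, sigmoid(xs @ betas))
betas = sgd_step(betas, xs, ys)
loss_after = binary_crossentropy(ys, sigmoid(xs @ betas))
print(loss_before, loss_after)
```

The loss plays the role of the `loss` argument and `sgd_step` the role of the `optimizer`: the first says what to minimize, the second says how far to move the coefficients on each update.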

Step `3` builds a model object that we can later use to learn our coefficients. To instantiate it, we have to specify `input` as well as `output`. In order to finish building it for later use, we have to run `compile` on it like we did in step `5`. 

We'll now run our `keras` model and perform logistic regression...

In [12]:
# Randomly generate our betas and number of observations, used to generate 
# fake data to fit. We need a minimum of 4 obs. 
true_betas_array = np.random.randint(2, 10, size=3)
true_beta_0 = np.random.random(size=1)
true_betas_array = np.concatenate([true_beta_0, true_betas_array])
n_obs = np.random.randint(9500, 10000) 
for idx, beta in enumerate(true_betas_array): 
        print("Actual beta_{}: {}".format(idx, beta))  
print ('\n')

# Generate data that follows the logistic relationship specified 
# by true_betas_array.
xs, ys = gen_multiple_logistic(true_betas_array, n_obs)

# Generate the keras model and print out the initial weights.
logistic_model = get_keras_model()
init_weights = logistic_model.get_weights()
for idx, beta in enumerate(init_weights[0]): 
        print("Initial keras value for beta_{}: {}".format(idx, beta[0]))  
print ('\n')


# Learn the coefficients (perform iterations of forward and backward propagation)
logistic_model.fit(xs, ys, nb_epoch=100000, verbose=0, batch_size=n_obs)
final_weights = logistic_model.get_weights()
for idx, beta in enumerate(final_weights[0]): 
        print("Final keras value for beta_{}: {}".format(idx, beta[0]))  
print ('\n')

Actual beta_0: 0.1923596155415337
Actual beta_1: 5.0
Actual beta_2: 2.0
Actual beta_3: 9.0


Initial keras value for beta_0: -0.5797706842422485
Initial keras value for beta_1: 0.5497211813926697
Initial keras value for beta_2: 0.7954688668251038
Initial keras value for beta_3: -1.0786195993423462


Final keras value for beta_0: 0.1923292726278305
Final keras value for beta_1: 4.99879789352417
Final keras value for beta_2: 1.9995830059051514
Final keras value for beta_3: 8.998096466064453




Just as with our linear regression problems, running our `keras` model is fairly straightforward. We simply call `fit` on it, making sure to pass in our inputs and outputs (`xs` and `ys`) and specify how many iterations of forward and backward propagation to perform over our dataset (this is the `nb_epoch` argument, which we'll detail more later as we dive into neural networks). 
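
For intuition about what the call to `fit` is doing, the following sketch runs the same kind of loop by hand in NumPy: each epoch is one full-batch pass of forward and backward propagation. The data generator here is a hypothetical stand-in, since the source of `gen_multiple_logistic` is not shown in this notebook; it assumes standard normal features, an intercept column of ones, and targets equal to the true probabilities. The coefficients, learning rate, and epoch count are likewise chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_logistic_data(true_betas, n_obs, seed=0):
    """Hypothetical stand-in for a data generator (an assumption, not the
    notebook's `gen_multiple_logistic`)."""
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(n_obs, len(true_betas) - 1))
    xs = np.hstack([np.ones((n_obs, 1)), features])  # column of ones for beta_0
    ys = sigmoid(xs @ true_betas)                    # targets are probabilities
    return xs, ys


def fit_by_hand(xs, ys, n_epochs, learning_rate):
    """Full-batch gradient descent: one coefficient update per epoch."""
    betas = np.zeros(xs.shape[1])
    for _ in range(n_epochs):
        preds = sigmoid(xs @ betas)                  # forward propagation
        grads = xs.T @ (preds - ys) / len(ys)        # backward propagation
        betas = betas - learning_rate * grads        # gradient descent update
    return betas


true_betas = np.array([0.2, 1.0, -1.5, 0.5])
xs, ys = make_logistic_data(true_betas, n_obs=1000)
learned_betas = fit_by_hand(xs, ys, n_epochs=5000, learning_rate=1.0)
for idx, (actual, learned) in enumerate(zip(true_betas, learned_betas)):
    print("beta_{}: actual {:.3f}, learned {:.3f}".format(idx, actual, learned))
```

Because each epoch uses the whole dataset, as `batch_size=n_obs` does in the cell above, there is exactly one coefficient update per epoch.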

Upon running our `keras` model, we can see that we are also able to solve our logistic regression problem using `keras`, and that we obtain the expected coefficients. Next, we'll actually start coding up our first true neural network! 