# Deep Learning Framework Comparison

**Logistic Regression** implemented in:

* Theano
* TensorFlow
* PyTorch
* Neon
* Keras
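
For reference, every implementation below fits the same model with full-batch gradient descent. The notation here is introduced only for illustration and does not come from any of the frameworks: $X$ is the feature matrix, $N$ the number of samples, and $\eta$ the learning rate (`learning_rate` in the code). Under those assumptions, the model, the cost, its gradients, and the update step are roughly:

$$
\hat{y} = \sigma(Xw + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
J(w, b) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]
$$

$$
\nabla_w J = \frac{1}{N} X^\top (\hat{y} - y), \qquad \frac{\partial J}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)
$$

$$
w \leftarrow w - \eta \, \nabla_w J, \qquad b \leftarrow b - \eta \, \frac{\partial J}{\partial b}
$$

The frameworks differ mainly in how the gradients in the third line are obtained, not in what they are.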

In [1]:
import numpy as np

np.set_printoptions(suppress=True)

num_samples = 100
num_feats = 40
epochs = 10000
learning_rate = .01

x_train = np.random.normal(size=(num_samples, num_feats))

y_train = np.random.randint(2, size=(num_samples))
y_train_2dim = y_train.reshape(num_samples, 1)

In [2]:
x_train.shape

(100, 40)

In [3]:
y_train.shape

(100,)

In [4]:
y_train_2dim.shape

(100, 1)
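
As a framework-free baseline, the same training loop can be sketched in plain NumPy with the gradients written out by hand. This is only an illustrative sketch: `fit_logistic_regression`, the `*_demo` names, and the fixed seed are hypothetical and not part of the original notebook, and the data is regenerated so the shared `x_train` and `y_train` are left untouched.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(x, y, learning_rate=0.01, epochs=1000):
    """Full-batch gradient descent on the mean binary cross-entropy.

    Hypothetical helper for illustration; returns (w, b, costs).
    """
    num_samples, num_feats = x.shape
    w = np.zeros(num_feats)
    b = 0.0
    costs = []

    for _ in range(epochs):
        yhat = sigmoid(x @ w + b)
        cost = -np.mean(y * np.log(yhat) + (1 - y) * np.log(1 - yhat))
        costs.append(cost)

        # Hand-derived gradients of the mean cross-entropy.
        error = yhat - y
        dw = x.T @ error / num_samples
        db = error.mean()

        w -= learning_rate * dw
        b -= learning_rate * db

    return w, b, costs


# Demo data with the same shapes as x_train / y_train (seed is an assumption).
rng = np.random.RandomState(0)
x_demo = rng.normal(size=(100, 40))
y_demo = rng.randint(2, size=100)

w_demo, b_demo, costs_demo = fit_logistic_regression(x_demo, y_demo)
print('First cost', costs_demo[0])
print('Last cost', costs_demo[-1])
```

Each framework below replaces the two hand-derived gradient lines with its own differentiation machinery.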

## Static Computation Graphs

Theano and TensorFlow (through the 1.x graph API used here) rely on *static* computation graphs: the graph is defined *once*, then executed over and over with different inputs.
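
The define-once, run-many idea can be illustrated with a toy graph in pure Python. This is a deliberately simplified sketch and not how Theano or TensorFlow are implemented; `Node`, `placeholder`, `run`, and the `*_node` names are all hypothetical.

```python
import math


class Node:
    """A symbolic node: it records an operation but holds no value."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = inputs
        self.name = name


def placeholder(name):
    return Node('placeholder', name=name)


def add(a, b):
    return Node('add', (a, b))


def mul(a, b):
    return Node('mul', (a, b))


def sigmoid(a):
    return Node('sigmoid', (a,))


def run(node, feed):
    """Evaluate a node by walking the fixed graph with concrete inputs."""
    if node.op == 'placeholder':
        return feed[node.name]
    args = [run(i, feed) for i in node.inputs]
    if node.op == 'add':
        return args[0] + args[1]
    if node.op == 'mul':
        return args[0] * args[1]
    if node.op == 'sigmoid':
        return 1.0 / (1.0 + math.exp(-args[0]))
    raise ValueError('unknown op: ' + node.op)


# Define the graph once. Nothing is computed at this point.
x_node = placeholder('x')
w_node = placeholder('w')
b_node = placeholder('b')
yhat_node = sigmoid(add(mul(x_node, w_node), b_node))

# Execute the same graph repeatedly with different inputs.
for x_value in (-2.0, 0.0, 2.0):
    print(x_value, run(yhat_node, {'x': x_value, 'w': 1.5, 'b': 0.0}))
```

Because the whole graph exists before any data flows through it, a real framework can analyze it ahead of time, for example to derive gradient expressions symbolically or to compile it.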

### Theano

Relevant docs:

* [Shared variables](http://deeplearning.net/software/theano_versions/dev/tutorial/examples.html#using-shared-variables)
* [Typed constructors](http://deeplearning.net/software/theano/library/tensor/basic.html#all-fully-typed-constructors)
* [Gradients](http://deeplearning.net/software/theano/tutorial/gradients.html)
* [Functions](http://deeplearning.net/software/theano/library/compile/function.html)

In [5]:
import theano
import theano.tensor as T

x = T.dmatrix('x')
y = T.dvector('y')

w = theano.shared(np.zeros(num_feats))
b = theano.shared(0.)

yhat = T.nnet.sigmoid(T.dot(x, w) + b)
loss = T.nnet.binary_crossentropy(output=yhat, target=y)
cost = loss.mean()

dw, db = T.grad(cost, [w, b])

train = theano.function(
  inputs=[x, y],
  outputs=cost,
  updates=((w, w - learning_rate * dw), (b, b - learning_rate * db))
)

In [6]:
for epoch in range(epochs):
    mean_loss = train(x=x_train, y=y_train)
    
    if epoch % 1000 == 0:
        print('Mean Loss', mean_loss)

Mean Loss 0.6931471805599458
Mean Loss 0.5136904630014473
Mean Loss 0.49878517969608693
Mean Loss 0.49352835203491163
Mean Loss 0.491066829816844
Mean Loss 0.48975355712923296
Mean Loss 0.4889947104353848
Mean Loss 0.48853172240666787
Mean Loss 0.4882380835373335
Mean Loss 0.4880465157264737
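
The first value printed above is a useful sanity check. With `w` and `b` initialized to zero, every prediction is `sigmoid(0) = 0.5`, so the mean binary cross-entropy is `ln 2` no matter what the labels are. A small stand-alone check, using a hypothetical scalar helper `binary_crossentropy` that is not part of Theano:

```python
import math


def binary_crossentropy(yhat, y):
    """Binary cross-entropy for a single prediction and a 0/1 label."""
    return -(y * math.log(yhat) + (1 - y) * math.log(1 - yhat))


yhat_at_init = 1.0 / (1.0 + math.exp(-0.0))  # sigmoid(0) = 0.5

# Both labels give the same loss, so the mean over any label mix is ln 2.
print(binary_crossentropy(yhat_at_init, 0))
print(binary_crossentropy(yhat_at_init, 1))
print(math.log(2))
```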


### TensorFlow

Relevant docs:

* [Variables](https://www.tensorflow.org/programmers_guide/variables)
* [Placeholders](https://www.tensorflow.org/api_guides/python/reading_data#feeding)
* [Gradients](https://www.tensorflow.org/versions/r0.12/api_docs/python/train/gradient_computation)
* [Session](https://www.tensorflow.org/api_docs/python/tf/Session#run)

In [7]:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(num_samples, num_feats))
y = tf.placeholder(tf.float32, shape=(num_samples, 1))

w = tf.Variable(tf.zeros((num_feats, 1)))
b = tf.Variable(tf.zeros(1))

yhat = tf.sigmoid(tf.matmul(x, w) + b)
loss = tf.losses.log_loss(labels=y, predictions=yhat)
cost = tf.reduce_mean(loss)

dw, db = tf.gradients(cost, [w, b])

new_w = w.assign(w - learning_rate * dw)
new_b = b.assign(b - learning_rate * db)

In [8]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch in range(epochs):
        mean_loss, _, _ = sess.run([cost, new_w, new_b], feed_dict={x: x_train, y: y_train_2dim})
        
        if epoch % 1000 == 0:
            print('Mean Loss', mean_loss)

Mean Loss 0.693147
Mean Loss 0.51369
Mean Loss 0.498785
Mean Loss 0.493528
Mean Loss 0.491067
Mean Loss 0.489753
Mean Loss 0.488995
Mean Loss 0.488532
Mean Loss 0.488238
Mean Loss 0.488046


## Dynamic Computation Graphs

PyTorch and Neon offer "true" automatic differentiation. The computation graph is *dynamic*: it is constructed on the fly during each epoch as we calculate the loss, and the gradients for the weight updates are derived from it.
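
To make "constructed on the fly" concrete, here is a toy scalar reverse-mode autodiff in pure Python. It is a sketch of the general technique only, not PyTorch's or Neon's implementation; the `Value` class and the `*_val` and `toy_*` names are hypothetical.

```python
import math


class Value:
    """A scalar that remembers how it was computed."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # Values this one was computed from
        self.local_grads = local_grads  # d(self) / d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def sigmoid(self):
        s = 1.0 / (1.0 + math.exp(-self.data))
        return Value(s, (self,), (s * (1.0 - s),))

    def log(self):
        return Value(math.log(self.data), (self,), (1.0 / self.data,))

    def backward(self):
        # Order the recorded graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v.parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in zip(v.parents, v.local_grads):
                parent.grad += local * v.grad


w_val, b_val, x_val = Value(0.0), Value(0.0), Value(2.0)
toy_lr = 0.1
toy_losses = []

for step in range(3):
    # A fresh graph is recorded on every pass through this loop.
    yhat_val = (x_val * w_val + b_val).sigmoid()
    loss_val = yhat_val.log() * Value(-1.0)  # cross-entropy for a label of 1

    w_val.grad = b_val.grad = 0.0
    loss_val.backward()

    w_val.data -= toy_lr * w_val.grad
    b_val.data -= toy_lr * b_val.grad
    toy_losses.append(loss_val.data)
    print(step, loss_val.data)
```

Since the graph is just a by-product of running ordinary Python code, the forward pass is free to use loops and conditionals that change from one iteration to the next.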

### PyTorch

Relevant docs:

* [Variables and autograd](http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-variables-and-autograd)
* [Static vs dynamic graphs](http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#tensorflow-static-graphs)

In [9]:
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor

x = Variable(torch.from_numpy(x_train).type(dtype), requires_grad=False)
y = Variable(torch.from_numpy(y_train_2dim).type(dtype), requires_grad=False)

w = Variable(torch.zeros(num_feats, 1), requires_grad=True)
b = Variable(torch.zeros(1), requires_grad=True)

loss_fn = torch.nn.BCELoss()

In [10]:
for epoch in range(epochs):
    yhat = torch.sigmoid(x.mm(w) + b)
    loss = loss_fn(yhat, y)
    cost = torch.mean(loss)
    
    cost.backward()  # <-- This is neat.

    w.data -= learning_rate * w.grad.data
    b.data -= learning_rate * b.grad.data

    w.grad.data.zero_()
    b.grad.data.zero_()
    
    if epoch % 1000 == 0:
        print('Mean Loss', cost.data[0])

Mean Loss 0.6931465268135071
Mean Loss 0.5136904716491699
Mean Loss 0.49878519773483276
Mean Loss 0.4935283362865448
Mean Loss 0.49106690287590027
Mean Loss 0.48975348472595215
Mean Loss 0.4889947175979614
Mean Loss 0.48853182792663574
Mean Loss 0.48823806643486023
Mean Loss 0.48804646730422974


### Neon

Relevant docs:

* [Neon backend](https://neon.nervanasys.com/index.html/backends.html)
* [Automatic differentiation](https://neon.nervanasys.com/index.html/backends.html#automatic-differentiation)

In [11]:
from neon.backends import gen_backend, Autodiff

be = gen_backend('cpu')

w = be.zeros(num_feats)
b = be.zeros(1)

x = be.empty_like(x_train)
y = be.empty_like(y_train)
x[:] = x_train
y[:] = y_train

def loss_fn(yhat, y):
    # Binary cross entropy.
    return -y * be.log(yhat) - (1 - y) * be.log(1 - yhat)

DISPLAY:neon:mklEngine.so not found; falling back to cpu backend


In [12]:
for epoch in range(epochs):
    yhat = be.sig(be.dot(x, w) + b)
    loss = loss_fn(yhat, y)
    cost = be.mean(loss)

    ad = Autodiff(op_tree=cost, be=be)
    dw, db = ad.get_grad_tensor([w, b])
    
    w[:] = w - learning_rate * dw
    b[:] = b - learning_rate * db
    
    if epoch % 1000 == 0:
        print('Mean Loss', cost.asnumpyarray()[0, 0])

Mean Loss 0.692147
Mean Loss 0.513663
Mean Loss 0.498777
Mean Loss 0.493525
Mean Loss 0.491065
Mean Loss 0.489753
Mean Loss 0.488994
Mean Loss 0.488531
Mean Loss 0.488238
Mean Loss 0.488046


## Abstractions

### Keras

Keras is an external abstraction layer that runs on top of Theano or TensorFlow.

In [13]:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(1, activation='sigmoid', input_shape=(num_feats,)))

optimizer = SGD(lr=learning_rate, momentum=0.)
model.compile(loss='binary_crossentropy', optimizer=optimizer)

Using Theano backend.


In [14]:
model.fit(x_train, y_train, epochs=epochs, batch_size=num_samples, verbose=0)
model.predict(x_train[:5])

array([[ 0.80225438],
       [ 0.40147239],
       [ 0.85430479],
       [ 0.59398532],
       [ 0.74547029]], dtype=float32)

### TensorFlow's [Estimator](https://www.tensorflow.org/extend/estimators) API

The Estimator API is a high-level abstraction built into TensorFlow itself.

In [15]:
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.ERROR)

Estimator = tf.estimator.Estimator
EstimatorSpec = tf.estimator.EstimatorSpec
ModeKeys = tf.estimator.ModeKeys

def model_fn(features, labels, mode):
    x = features['x']
    yhat = tf.layers.dense(inputs=x, units=1, activation=tf.nn.sigmoid)
    
    if mode == ModeKeys.PREDICT:
        return EstimatorSpec(mode=mode, predictions={'probs': yhat})
    
    loss = tf.losses.log_loss(labels=labels, predictions=yhat)
    cost = tf.reduce_mean(loss)
    
    if mode == ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
        train_op = optimizer.minimize(loss=cost, global_step=tf.train.get_global_step())
        return EstimatorSpec(mode=mode, loss=cost, train_op=train_op)
    else:
        # Eval mode.
        return EstimatorSpec(mode=mode, loss=cost)
    

model = Estimator(model_fn=model_fn)

In [16]:
numpy_input_fn = tf.estimator.inputs.numpy_input_fn

train_input_fn = numpy_input_fn(x={'x': x_train}, y=y_train_2dim, num_epochs=None, shuffle=True)
model.train(input_fn=train_input_fn, steps=epochs)

<tensorflow.python.estimator.estimator.Estimator at 0x117b51518>

In [17]:
predict_input_fn = numpy_input_fn(x={'x': x_train[:5]}, num_epochs=1, shuffle=False)
list(model.predict(input_fn=predict_input_fn))

[{'probs': array([ 0.80095333])},
 {'probs': array([ 0.39785189])},
 {'probs': array([ 0.85093534])},
 {'probs': array([ 0.59857714])},
 {'probs': array([ 0.74323036])}]

### PyTorch's [nn](http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#nn-module) module

The `nn` module is PyTorch's own built-in high-level abstraction.

In [18]:
import torch
from torch.nn import Sequential, Linear, Sigmoid, BCELoss
from torch.autograd import Variable

dtype = torch.FloatTensor

x = Variable(torch.from_numpy(x_train).type(dtype))
y = Variable(torch.from_numpy(y_train_2dim).type(dtype))

model = Sequential(
    Linear(in_features=num_feats, out_features=1),
    Sigmoid()
)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.)
loss_fn = BCELoss()

In [19]:
for epoch in range(epochs):
    yhat = model(x)
    loss = loss_fn(yhat, y)
    cost = torch.mean(loss)
    
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

In [20]:
def predict(model, x):
    x = Variable(torch.from_numpy(x).type(torch.FloatTensor))
    yhat = model(x)
    return yhat.data.numpy()

In [21]:
predict(model, x_train[:5])

array([[ 0.80273807],
       [ 0.39796931],
       [ 0.85340548],
       [ 0.59930348],
       [ 0.74230564]], dtype=float32)

### Neon

Neon's own built-in high-level abstractions for layers, models, and optimizers.

In [22]:
from neon.layers import Affine, GeneralizedCost
from neon.initializers import Constant
from neon.transforms import Logistic, CrossEntropyBinary
from neon.transforms.cost import Cost
from neon.models import Model
from neon.optimizers import GradientDescentMomentum
from neon.data import ArrayIterator
from neon.backends import gen_backend

be = gen_backend('cpu', batch_size=num_samples)

model = Model([
    Affine(nout=1, init=Constant(0.), bias=Constant(0.), activation=Logistic())
])

cost = GeneralizedCost(CrossEntropyBinary())
optimizer = GradientDescentMomentum(learning_rate=learning_rate, momentum_coef=0.)

In [23]:
dataset = ArrayIterator(x_train, y_train_2dim, make_onehot=False)
model.fit(dataset, cost=cost, optimizer=optimizer, num_epochs=epochs)

In [24]:
def predict(model, x):
    # The tensor passed through fprop needs to have the 
    # same size as the batch size used during training.
    # There must be a better way...
    original_x_len = x.shape[0]
    
    padded_x = np.concatenate((x, np.zeros((be.bsz - original_x_len, num_feats))), axis=0)
    padded_x_tensor = be.empty_like(padded_x.T)
    padded_x_tensor[:] = padded_x.T
    
    output = model.fprop(padded_x_tensor)
    output = output.asnumpyarray()[0]
    
    return output[:original_x_len]

In [25]:
predict(model, x_train[:5])

array([ 0.67268467,  0.41059297,  0.80636424,  0.58491576,  0.69745988], dtype=float32)