# Deep Learning Framework Comparison

**Logistic Regression** implemented in:

* Theano
* Tensorflow
* PyTorch
* Neon

In [1]:
import numpy as np

num_samples = 100
num_feats = 40
epochs = 10000
learning_rate = .01

x_train = np.random.normal(size=(num_samples, num_feats))
y_train = np.random.randint(2, size=(num_samples))

In [2]:
x_train.shape

(100, 40)

In [3]:
y_train.shape

(100,)
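Every framework below fits the same model: full-batch gradient descent on the mean binary cross-entropy, starting from zero weights. A plain NumPy version with hand-derived gradients makes a useful baseline for reading their logs. This is a sketch. `train_numpy` is a hypothetical helper, and it draws its own seeded data instead of reusing `x_train`/`y_train`, so its numbers will not exactly match the runs below.

```python
import numpy as np

def train_numpy(x, y, epochs=10000, learning_rate=0.01, log_every=1000):
    """Full-batch gradient descent for logistic regression with hand-derived gradients."""
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    history = []
    for epoch in range(epochs):
        yhat = 1 / (1 + np.exp(-(x @ w) - b))        # shape (n,), one prediction per sample
        cost = np.mean(-y * np.log(yhat) - (1 - y) * np.log(1 - yhat))
        err = yhat - y                               # derivative of each sample's loss w.r.t. its logit
        w -= learning_rate * (x.T @ err) / n
        b -= learning_rate * err.mean()
        if epoch % log_every == 0:
            history.append(cost)
    return w, b, history

# Seeded stand-in data with the same shapes as x_train / y_train.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(100, 40))
y_demo = rng.integers(2, size=100).astype(float)

w_fit, b_fit, history = train_numpy(x_demo, y_demo)
print(history[0], history[-1])  # starts at ln 2 (all-zero weights), then decreases
```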

## Static Computational Graphs

Theano and Tensorflow require *static* computational graphs; they are defined *once* and executed over and over again.
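To make "define once, execute many times" concrete, here is a toy static graph in plain Python. The `Placeholder` and `Op` classes are hypothetical illustrations, not how Theano or Tensorflow are implemented (both also optimize the graph, and Theano compiles it, before running it). The point is only that the structure is fixed up front and each run merely feeds new values into it.

```python
import math

class Placeholder:
    """Hypothetical graph input whose value is supplied at run time."""
    def __init__(self, name):
        self.name = name

    def run(self, feed):
        return feed[self.name]


class Op:
    """Hypothetical graph node: a function applied to the outputs of other nodes."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def run(self, feed):
        return self.fn(*(node.run(feed) for node in self.inputs))


# Define the graph once: sigmoid(a * x + c).
a_ph, x_ph, c_ph = Placeholder('a'), Placeholder('x'), Placeholder('c')
logit = Op(lambda a, x, c: a * x + c, a_ph, x_ph, c_ph)
prob = Op(lambda z: 1 / (1 + math.exp(-z)), logit)

# Execute it many times; only the fed values change, never the structure.
results = [prob.run({'a': 2.0, 'x': xv, 'c': 0.0}) for xv in (-1.0, 0.0, 1.0)]
print(results)
```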

### Theano

Relevant docs:

* [Shared variables](http://deeplearning.net/software/theano_versions/dev/tutorial/examples.html#using-shared-variables)
* [Typed constructors](http://deeplearning.net/software/theano/library/tensor/basic.html#all-fully-typed-constructors)
* [Gradients](http://deeplearning.net/software/theano/tutorial/gradients.html)
* [Functions](http://deeplearning.net/software/theano/library/compile/function.html)

In [4]:
import theano
import theano.tensor as T

x = T.dmatrix('x')
y = T.dvector('y')

w = theano.shared(np.zeros(num_feats))
b = theano.shared(0.)

yhat = 1 / (1 + T.exp(-T.dot(x, w) - b))
loss = -y * T.log(yhat) - (1 - y) * T.log(1 - yhat)
cost = loss.mean()

dw, db = T.grad(cost, [w, b])

train = theano.function(
  inputs=[x, y],
  outputs=cost,
  updates=((w, w - learning_rate * dw), (b, b - learning_rate * db))
)
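`T.grad` derives the gradients symbolically. For this model they also have a short closed form, which is what every framework in this comparison should reproduce (with $n$ = `num_samples`, $X$ = `x_train`, and $\sigma$ the logistic function used in `yhat`):

$$
\hat y_i = \sigma(x_i^\top w + b), \qquad
C = \frac{1}{n}\sum_{i=1}^{n}\bigl[-y_i \log \hat y_i - (1 - y_i)\log(1 - \hat y_i)\bigr]
$$

Since $\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)$, the derivative of each sample's loss with respect to its logit $z_i = x_i^\top w + b$ reduces to $\hat y_i - y_i$, so

$$
\frac{\partial C}{\partial w} = \frac{1}{n} X^\top (\hat y - y), \qquad
\frac{\partial C}{\partial b} = \frac{1}{n}\sum_{i=1}^{n} (\hat y_i - y_i)
$$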

In [5]:
for epoch in range(epochs):
    mean_loss = train(x=x_train, y=y_train)
    
    if epoch % 1000 == 0:
        print('Mean Loss', mean_loss)

Mean Loss 0.6931471805599458
Mean Loss 0.4907696315076982
Mean Loss 0.4601996725846576
Mean Loss 0.4449403242847565
Mean Loss 0.4353546200341992
Mean Loss 0.42868588008315733
Mean Loss 0.4237745066451708
Mean Loss 0.420024527532572
Mean Loss 0.417088669765471
Mean Loss 0.4147469388599722
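The first logged loss, 0.6931..., is no coincidence. With `w` and `b` initialized to zero, every prediction is sigmoid(0) = 0.5, and the cross-entropy of a 0.5 prediction is ln 2 whatever the label. A quick check in plain Python:

```python
import math

# At initialization w = 0 and b = 0, so every logit is 0 and every prediction is sigmoid(0).
p0 = 1 / (1 + math.exp(-0.0))

# The per-sample loss is then identical for y = 1 and y = 0: -log(0.5) = ln 2.
loss_if_y1 = -math.log(p0)
loss_if_y0 = -math.log(1 - p0)
print(p0, loss_if_y1, loss_if_y0, math.log(2))
```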


### Tensorflow

Relevant docs:

* [Variables](https://www.tensorflow.org/programmers_guide/variables)
* [Placeholders](https://www.tensorflow.org/api_guides/python/reading_data#feeding)
* [Gradients](https://www.tensorflow.org/versions/r0.12/api_docs/python/train/gradient_computation)
* [Session](https://www.tensorflow.org/api_docs/python/tf/Session#run)

In [6]:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(num_samples, num_feats))
y = tf.placeholder(tf.float32, shape=(num_samples,))

w = tf.Variable(tf.zeros((num_feats, 1)))
b = tf.Variable(tf.zeros(1))

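yhat = 1 / (1 + tf.exp(-tf.reshape(tf.matmul(x, w), [-1]) - b))  # flatten (num_samples, 1) -> (num_samples,) to match y
loss = -y * tf.log(yhat) - (1 - y) * tf.log(1 - yhat)
cost = tf.reduce_mean(loss)

dw, db = tf.gradients(cost, [w, b])

new_w = w.assign(w - learning_rate * dw)
new_b = b.assign(b - learning_rate * db)

In [7]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch in range(epochs):
        mean_loss, _, _ = sess.run([cost, new_w, new_b], feed_dict={x: x_train, y: y_train})

        if epoch % 1000 == 0:
            print('Mean Loss', mean_loss)

Mean Loss 0.693142
Mean Loss 0.688257
Mean Loss 0.688163
Mean Loss 0.688143
Mean Loss 0.688145
Mean Loss 0.688138
Mean Loss 0.688141
Mean Loss 0.688138
Mean Loss 0.688141
Mean Loss 0.68814

The losses printed above plateau at ≈0.688 instead of following the Theano run because they were recorded without the `tf.reshape`: `tf.matmul(x, w)` has shape `(num_samples, 1)` while `y` has shape `(num_samples,)`, so `loss` broadcasts to a `(num_samples, num_samples)` matrix that scores every prediction against every label. The best a model can do on that objective is predict the mean label everywhere, which is why the cost stalls at the binary entropy of the labels.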
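The ≈0.688 plateau can be reproduced with NumPy alone. In the sketch below, seeded random labels stand in for `y_train`. A `(100, 1)` prediction combined with `(100,)` labels broadcasts to a `(100, 100)` loss matrix; averaged over every prediction/label pair, that objective is minimized by predicting the mean label `p` everywhere, leaving a cost equal to the binary entropy of `p`. For a label mean of 0.45 or 0.55 that entropy is about 0.68814, consistent with the logged plateau.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(2, size=100).astype(float)   # shape (100,)
preds = np.full((100, 1), 0.5)                     # shape (100, 1), like matmul(x, w)

# (100, 1) combined with (100,) broadcasts to (100, 100): every prediction meets every label.
pairwise = -labels * np.log(preds) - (1 - labels) * np.log(1 - preds)
print(pairwise.shape)  # (100, 100)

# Predicting the mean label p for every sample attains the minimum of this averaged objective.
p = labels.mean()
floor = -(p * np.log(p) + (1 - p) * np.log(1 - p))
preds_best = np.full((100, 1), p)
cost_best = np.mean(-labels * np.log(preds_best) - (1 - labels) * np.log(1 - preds_best))

# Binary entropy for a label mean of 0.45 (same value for 0.55).
h_045 = -(0.45 * np.log(0.45) + 0.55 * np.log(0.55))
print(p, floor, cost_best, h_045)
```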


## Dynamic Computational Graphs

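PyTorch and Neon differentiate a graph that is recorded as the code runs, rather than one declared up front. The computational graph is *dynamic*: it is constructed on the fly during each epoch as we calculate the loss, and gradients are taken through that freshly built graph before the weights are updated.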
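A toy version of define-by-run, reverse-mode differentiation fits in a few dozen lines of plain Python. The `Value` class below is a hypothetical illustration, not PyTorch's or Neon's implementation. Each arithmetic operation records its inputs and local derivatives as it executes, and `backward()` walks that recorded graph in reverse, applying the chain rule.

```python
import math

class Value:
    """Hypothetical scalar node for a toy define-by-run (reverse-mode) autodiff."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    @staticmethod
    def lift(v):
        return v if isinstance(v, Value) else Value(v)

    def __add__(self, other):
        other = Value.lift(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = Value.lift(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def exp(self):
        e = math.exp(self.data)
        return Value(e, (self,), (e,))

    def backward(self):
        # Order the recorded graph so each node is processed only after
        # every node that depends on it has passed its gradient back.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad  # chain rule


# The graph is recorded while the expression runs, so ordinary Python
# control flow (here a data-dependent branch) decides its shape each time.
w_toy = Value(0.5)
x_val = 2.0
out_toy = (w_toy * x_val + 1.0).exp() if x_val > 0 else w_toy * x_val
out_toy.backward()
print(w_toy.grad)  # equals x * exp(w*x + 1) = 2 * exp(2)
```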

### PyTorch

Relevant docs:

* [Variables and autograd](http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-variables-and-autograd)
* [Static vs dynamic graphs](http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#tensorflow-static-graphs)

In [8]:
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor

x = Variable(torch.from_numpy(x_train).type(dtype), requires_grad=False)
y = Variable(torch.from_numpy(y_train).type(dtype), requires_grad=False)

w = Variable(torch.zeros(num_feats, 1), requires_grad=True)
b = Variable(torch.zeros(1), requires_grad=True)

In [9]:
for epoch in range(epochs):
    yhat = 1 / (1 + torch.exp(-x.mm(w) - b))
    loss = -y * torch.log(yhat) - (1 - y) * torch.log(1 - yhat)
    cost = torch.mean(loss)
    
    cost.backward()  # <-- This is neat.

    w.data -= learning_rate * w.grad.data
    b.data -= learning_rate * b.grad.data

    w.grad.data.zero_()
    b.grad.data.zero_()
    
    if epoch % 1000 == 0:
        print('Mean Loss', cost.data[0])

Mean Loss 0.6931471824645996
Mean Loss 0.6882584095001221
Mean Loss 0.6881639361381531
Mean Loss 0.6881466507911682
Mean Loss 0.6881416440010071
Mean Loss 0.6881399154663086
Mean Loss 0.6881392598152161
Mean Loss 0.688139021396637
Mean Loss 0.6881389021873474
Mean Loss 0.6881388425827026


### Neon

Relevant docs:

* [Neon backend](https://neon.nervanasys.com/index.html/backends.html)
* [Automatic differentiation](https://neon.nervanasys.com/index.html/backends.html#automatic-differentiation)

In [20]:
from neon.backends import gen_backend, Autodiff

be = gen_backend('cpu')

w = be.zeros(num_feats)
b = be.zeros(1)

x = be.empty_like(x_train)
y = be.empty_like(y_train)
x[:] = x_train
y[:] = y_train

In [21]:
for epoch in range(epochs):
    yhat = 1 / (1 + be.exp(-be.dot(x, w) - b))
    loss = -y * be.log(yhat) - (1 - y) * be.log(1 - yhat)
    cost = be.mean(loss)

    ad = Autodiff(op_tree=cost, be=be)
    dw, db = ad.get_grad_tensor([w, b])
    
    new_w = w - learning_rate * dw
    new_b = b - learning_rate * db
    
    w.copy(new_w)
    b.copy(new_b)
    
    if epoch % 1000 == 0:
        print('Mean Loss', cost.asnumpyarray()[0, 0])

Mean Loss 0.692168
Mean Loss 0.490721
Mean Loss 0.46018
Mean Loss 0.444929
Mean Loss 0.435347
Mean Loss 0.42868
Mean Loss 0.42377
Mean Loss 0.420021
Mean Loss 0.417086
Mean Loss 0.414745
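`Autodiff.get_grad_tensor` returns `dw` and `db` from an op-tree built fresh each epoch. Whichever framework produces the gradients, a central finite-difference check is a cheap, framework-independent sanity test. The sketch below applies it to the closed-form logistic-regression gradient on seeded random data. `cost_fn`, `closed_form_grads`, and `central_diff` are hypothetical helpers; checking a framework would mean substituting its gradients, converted to NumPy, for `closed_form_grads`.

```python
import numpy as np

def cost_fn(w, b, x, y):
    yhat = 1 / (1 + np.exp(-(x @ w) - b))
    return np.mean(-y * np.log(yhat) - (1 - y) * np.log(1 - yhat))

def closed_form_grads(w, b, x, y):
    err = 1 / (1 + np.exp(-(x @ w) - b)) - y   # per-sample derivative w.r.t. the logit
    return x.T @ err / len(y), err.mean()

def central_diff(f, theta, eps=1e-6):
    """Numerically differentiate scalar-valued f at vector theta, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
x_chk = rng.normal(size=(100, 40))
y_chk = rng.integers(2, size=100).astype(float)
w_chk = rng.normal(scale=0.1, size=40)
b_chk = 0.3

dw_cf, db_cf = closed_form_grads(w_chk, b_chk, x_chk, y_chk)
dw_num = central_diff(lambda w: cost_fn(w, b_chk, x_chk, y_chk), w_chk)
db_num = central_diff(lambda b: cost_fn(w_chk, b[0], x_chk, y_chk), np.array([b_chk]))[0]

err_w = np.max(np.abs(dw_cf - dw_num))
err_b = abs(db_cf - db_num)
print(err_w, err_b)  # both should be tiny if the analytic gradients are right
```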


## Abstractions

### Keras

Keras is a separate, higher-level library that runs on top of Theano or Tensorflow as its backend.

In [22]:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1, activation='sigmoid', input_shape=(num_feats,)))

model.compile(loss='binary_crossentropy', optimizer='sgd')

In [23]:
history = model.fit(x_train, y_train, epochs=epochs, batch_size=num_samples, verbose=0)
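`loss='binary_crossentropy'` is the same per-sample loss written out by hand in the earlier cells. Library implementations typically clip predictions away from exactly 0 and 1 before taking logarithms (Keras's backend clips by a small epsilon). The sketch below shows the idea in plain Python; `bce_clipped` is a hypothetical helper and `eps=1e-7` is an assumed value, not any particular version's setting.

```python
import math

def bce_clipped(y, yhat, eps=1e-7):
    """Mean binary cross-entropy with predictions clipped to [eps, 1 - eps].

    The clipping keeps log() finite when a prediction saturates at exactly 0 or 1.
    """
    total = 0.0
    for yi, pi in zip(y, yhat):
        p = min(max(pi, eps), 1 - eps)
        total += -yi * math.log(p) - (1 - yi) * math.log(1 - p)
    return total / len(y)

print(bce_clipped([1, 0], [0.5, 0.5]))  # ln 2, as at initialization
print(bce_clipped([1, 0], [1.0, 0.0]))  # small but finite instead of nan/inf
```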

### PyTorch's `nn` module

PyTorch's own higher-level abstractions: `torch.nn` layers and `torch.optim` optimizers.

In [24]:
import torch
from torch.nn import Sequential, Linear, Sigmoid
from torch.autograd import Variable

dtype = torch.FloatTensor

x = Variable(torch.from_numpy(x_train).type(dtype), requires_grad=False)
y = Variable(torch.from_numpy(y_train).type(dtype), requires_grad=False)

model = Sequential(
    Linear(in_features=num_feats, out_features=1),
    Sigmoid()
)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)

In [25]:
for epoch in range(epochs):
    yhat = model(x).view(-1)  # (num_samples, 1) -> (num_samples,) so the loss stays per-sample
    loss = -y * torch.log(yhat) - (1 - y) * torch.log(1 - yhat)
    cost = torch.mean(loss)
    
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
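Unlike the hand-written loops above and Keras's `'sgd'` default, the PyTorch and Neon cells here use momentum 0.9. PyTorch's documentation describes its SGD momentum as `v = momentum * v + grad; p = p - lr * v`, while velocity-style formulations (which Neon's `GradientDescentMomentum` appears to follow; its docs give the exact update) write `v = momentum * v - lr * grad; p = p + v`. With a constant learning rate the two describe the same update, as this plain-Python sketch on the quadratic `(p - 3)**2` suggests. Both function names are hypothetical.

```python
def sgd_momentum_pytorch_style(grad_fn, p, lr, mu, steps):
    v = 0.0
    for _ in range(steps):
        v = mu * v + grad_fn(p)   # accumulate raw gradients
        p = p - lr * v            # scale by the learning rate when stepping
    return p

def sgd_momentum_velocity_style(grad_fn, p, lr, mu, steps):
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad_fn(p)  # velocity already includes the learning rate
        p = p + v
    return p

grad = lambda p: 2 * (p - 3.0)  # gradient of (p - 3)**2, minimized at p = 3
p_torch = sgd_momentum_pytorch_style(grad, 0.0, lr=0.01, mu=0.9, steps=500)
p_velocity = sgd_momentum_velocity_style(grad, 0.0, lr=0.01, mu=0.9, steps=500)
print(p_torch, p_velocity)  # both close to 3.0 and to each other
```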

### Neon

Neon's own higher-level abstractions: layers, costs, optimizers, and data iterators.

In [26]:
from neon.layers import Affine, GeneralizedCost
from neon.initializers import Constant
from neon.transforms import Logistic, CrossEntropyBinary
from neon.models import Model
from neon.optimizers import GradientDescentMomentum
from neon.data import ArrayIterator
from neon.backends import gen_backend

be = gen_backend('cpu', batch_size=num_samples)

model = Model([
    Affine(nout=2, init=Constant(0.), bias=Constant(0.), activation=Logistic())
])

cost = GeneralizedCost(CrossEntropyBinary())
optimizer = GradientDescentMomentum(learning_rate=learning_rate, momentum_coef=0.9)

In [27]:
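from neon.callbacks.callbacks import Callbacks

dataset = ArrayIterator(x_train, y_train, nclass=2)
# Model.fit expects a Callbacks object in addition to the data, cost, and optimizer.
model.fit(dataset, cost=cost, optimizer=optimizer, num_epochs=epochs, callbacks=Callbacks(model))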
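Because `ArrayIterator(..., nclass=2)` one-hot encodes the labels, each target has two entries, which is why the `Affine` layer uses `nout=2` with a `Logistic` activation. The model is therefore effectively two sigmoid units trained with `CrossEntropyBinary`, rather than the single unit used elsewhere. Conceptually the encoding looks like this; `one_hot` is a hypothetical plain-Python helper, and Neon's own tensors may store the result in a different layout.

```python
def one_hot(labels, nclass):
    """Encode integer labels as one-hot rows (illustrative; not Neon's internal layout)."""
    return [[1 if k == label else 0 for k in range(nclass)] for label in labels]

print(one_hot([0, 1, 1], nclass=2))  # [[1, 0], [0, 1], [0, 1]]
```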