In this version of the notebook, we use mini-batch learning rather than full-batch learning. This helps with very large datasets that would not fit in device memory.
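
As a minimal sketch of mini-batching when the data already sits in one array, the helper below yields shuffled slices of a fixed dataset. The notebook itself takes a different route and simulates a fresh batch on every training step. The names `iter_minibatches`, `X_demo`, and `y_demo` are hypothetical and used only for illustration.

```python
import numpy as np

def iter_minibatches(X, y, batch_size, rng=None):
    """Yield (X_batch, y_batch) pairs that cover the data once, in shuffled order."""
    rng = np.random.RandomState() if rng is None else rng
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

# Toy data with 1,050 rows, so the last batch is smaller than the others.
X_demo = np.arange(1050 * 4, dtype=np.float32).reshape(1050, 4)
y_demo = np.arange(1050, dtype=np.float32).reshape(1050, 1)

batch_shapes = [xb.shape for xb, yb in iter_minibatches(X_demo, y_demo, batch_size=500)]
print(batch_shapes)  # → [(500, 4), (500, 4), (50, 4)]
```

Each batch could then be passed to a training step in place of the full arrays, so only `batch_size` rows need to be on the device at a time.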

Another change is that we use the [FtrlOptimizer](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf) (*follow the regularized leader*) algorithm for optimization. FTRL is an online learning algorithm with per-coordinate learning rates, and it works better with the mini-batch version of this Poisson regression problem.
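
To make the per-coordinate update concrete, here is a rough NumPy sketch of the FTRL-Proximal update from Algorithm 1 of the linked paper, applied to the same kind of Poisson regression. It is not TensorFlow's implementation, which may differ in details such as the initial accumulator value. Using the mean gradient of a mini-batch instead of a single example is an assumption made here, and the function names and the `alpha`, `beta`, `l1`, `l2` defaults are illustrative.

```python
import numpy as np

def ftrl_weights(z, n, alpha, beta, l1, l2):
    """Closed-form FTRL-Proximal weights for the accumulated state (z, n)."""
    shrunk = z - np.sign(z) * l1
    w = -shrunk / ((beta + np.sqrt(n)) / alpha + l2)
    # Coordinates whose |z| does not exceed l1 are set exactly to zero.
    return np.where(np.abs(z) <= l1, 0.0, w)

def ftrl_poisson(X, y, alpha=1.0, beta=1.0, l1=0.0, l2=0.0,
                 batch_size=500, epochs=20, seed=0):
    """Fit a log-link Poisson regression with mini-batch FTRL-Proximal updates."""
    rng = np.random.RandomState(seed)
    n_rows, n_features = X.shape
    z = np.zeros(n_features)  # accumulated (adjusted) gradients
    n = np.zeros(n_features)  # accumulated squared gradients
    for _ in range(epochs):
        order = rng.permutation(n_rows)
        for start in range(0, n_rows, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            w = ftrl_weights(z, n, alpha, beta, l1, l2)
            # Gradient of the mean Poisson negative log-likelihood on this batch.
            g = Xb.T.dot(np.exp(Xb.dot(w)) - yb) / len(yb)
            # Per-coordinate change in the inverse learning rate.
            sigma = (np.sqrt(n + g ** 2) - np.sqrt(n)) / alpha
            z += g - sigma * w
            n += g ** 2
    return ftrl_weights(z, n, alpha, beta, l1, l2)

# Simulated data with the same coefficients as gen_data in this notebook.
rng = np.random.RandomState(0)
N = 10000
X_sim = np.column_stack([np.ones(N), rng.uniform(-1, 1, (N, 3))])
true_w = np.array([-2.0, 1.0, -0.5, 0.3])
y_sim = rng.poisson(np.exp(X_sim.dot(true_w)))

w_hat = ftrl_poisson(X_sim, y_sim)
print(w_hat)  # should land near true_w, up to sampling noise
```

With `l1 = l2 = 0`, this update reduces to gradient descent with a separate learning rate `alpha / (beta + sqrt(n_i))` for each coordinate, so coordinates that have seen large gradients take smaller steps.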

In [1]:
from __future__ import print_function
import tensorflow as tf
import numpy as np
import pandas as pd
import statsmodels.api as sm

### Generating data

In [2]:
def gen_data(N = 10000):
    data = np.random.uniform(-1, 1, (N, 3))
    data = sm.add_constant(data)
    data = pd.DataFrame(data, columns = ['intercept', 'Var1', 'Var2', 'Var3'])
    lam = np.exp(-2*data['intercept'] + data['Var1'] - 0.5*data['Var2'] + 0.3*data['Var3'] )
    resp = np.random.poisson(lam = lam)
    data['lam'] = lam
    data['resp'] = resp
    return data

In [3]:
dtrain = gen_data()

### Mini-batch learning

In [4]:
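batch_size = 500
with tf.device('/cpu:0'):
    # Placeholders for one mini-batch: 4 features (including the intercept) and the counts
    X = tf.placeholder(dtype=tf.float32, shape=(None, 4), name='X')
    y = tf.placeholder(dtype=tf.float32, shape=(None, 1), name='y')

    # Regression coefficients, initialized to zero
    w = tf.Variable(tf.zeros([4, 1]))

    # Predicted Poisson mean under the log link
    y_hat = tf.exp(tf.matmul(X, w))

    # Poisson negative log-likelihood averaged over the batch
    # (the log(y!) term does not depend on w and is omitted)
    loss_function = tf.reduce_mean(-y*tf.log(y_hat) + y_hat)

with tf.device('/cpu:0'):
    # FTRL optimizer with learning rate 1
    train_step = tf.train.FtrlOptimizer(1).minimize(loss_function)
    init = tf.global_variables_initializer()
session = tf.InteractiveSession()
session.run(init)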

for i in range(1000):
    # Draw a fresh mini-batch on every step
    dtrain = gen_data(batch_size)
    # .values replaces the deprecated .as_matrix()
    X_ = dtrain[['intercept', 'Var1', 'Var2', 'Var3']].values
    y_ = dtrain[['resp']].values
    session.run([train_step, loss_function], feed_dict={X: X_, y: y_})
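
For reference, the `loss_function` in the cell above is the Poisson negative log-likelihood averaged over the batch, with the term that does not depend on the weights dropped. Writing $B$ for the number of rows in the batch (a symbol introduced here) and $x_i$ for the $i$-th row of `X`:

$$
\hat{y}_i = \exp(x_i^\top w), \qquad
-\log p(y_i \mid x_i) = \hat{y}_i - y_i \log \hat{y}_i + \log(y_i!)
$$

$$
L(w) = \frac{1}{B} \sum_{i=1}^{B} \left( \hat{y}_i - y_i \log \hat{y}_i \right), \qquad
\nabla_w L(w) = \frac{1}{B} \sum_{i=1}^{B} \left( \hat{y}_i - y_i \right) x_i
$$

The gradient vanishes when the predicted means match the observed counts on average along every feature, which is the condition the optimizer works toward on each batch.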

### The result 

In [5]:
w.eval()

array([[-2.02333641],
       [ 0.9831714 ],
       [-0.56171674],
       [ 0.3710334 ]], dtype=float32)

### Comparison with the statsmodels package 
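To check the results, we solve the same regression problem with the statsmodels library. Note that `dtrain` now holds only the last mini-batch of 500 observations generated in the training loop, so the statsmodels fit uses far less data than the TensorFlow fit.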

In [6]:
poisson_family = sm.families.Poisson()  # the log link is the default for the Poisson family
poisson_model = sm.GLM(dtrain['resp'], dtrain[['intercept', 'Var1', 'Var2', 'Var3']], family=poisson_family)
poisson_results = poisson_model.fit()

In [7]:
poisson_results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | resp | No. Observations: | 500.0 |
| Model: | GLM | Df Residuals: | 496.0 |
| Model Family: | Poisson | Df Model: | 3.0 |
| Link Function: | log | Scale: | 1.0 |
| Method: | IRLS | Log-Likelihood: | -216.75 |
| Date: | Sun, 01 Oct 2017 | Deviance: | 285.97 |
| Time: | 21:34:43 | Pearson chi2: | 503.0 |
| No. Iterations: | 6 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | -2.0542 | 0.136 | -15.103 | 0.000 | -2.321 | -1.788 |
| Var1 | 0.9316 | 0.208 | 4.473 | 0.000 | 0.523 | 1.340 |
| Var2 | -0.5411 | 0.193 | -2.803 | 0.005 | -0.919 | -0.163 |
| Var3 | 0.2502 | 0.193 | 1.296 | 0.195 | -0.128 | 0.629 |
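
One way to read the two sets of results side by side is to check whether the TensorFlow estimates and the true coefficients from `gen_data` fall inside the 95% confidence intervals reported by statsmodels. The sketch below hard-codes the numbers printed above, so it only describes this particular run. The variable names are illustrative.

```python
import numpy as np

names = ["intercept", "Var1", "Var2", "Var3"]

# Coefficients used to simulate the data in gen_data
true_w = np.array([-2.0, 1.0, -0.5, 0.3])
# Output of w.eval() shown above
tf_w = np.array([-2.02333641, 0.9831714, -0.56171674, 0.3710334])
# 95% confidence bounds from the statsmodels summary
ci_low = np.array([-2.321, 0.523, -0.919, -0.128])
ci_high = np.array([-1.788, 1.340, -0.163, 0.629])

tf_inside = (ci_low <= tf_w) & (tf_w <= ci_high)
true_inside = (ci_low <= true_w) & (true_w <= ci_high)

for name, tf_ok, true_ok in zip(names, tf_inside, true_inside):
    print(name, "TF estimate in CI:", tf_ok, "| true value in CI:", true_ok)
```

For this run, all four TensorFlow estimates and all four true coefficients lie inside the statsmodels intervals, so the two fits agree within the uncertainty of the 500-observation sample.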
