## Poisson regression with TensorFlow
*Gorkem Ozkaya*

In this notebook we perform a Poisson regression with TensorFlow, Google's deep learning library. Although TensorFlow is not primarily designed for traditional modeling tasks such as Poisson regression, we can still benefit from its flexibility. Starting from shallow models like this one, one can later build deeper versions of Poisson regression and other GLMs that can handle nonlinearities and variable interactions.

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
import statsmodels.api as sm

### Generating data

In [2]:
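def gen_data(N=10000):
    """Simulate N Poisson observations with a log-linear mean."""
    # Three covariates drawn uniformly from [-1, 1]
    data = np.random.uniform(-1, 1, (N, 3))
    # Prepend a column of ones for the intercept
    data = sm.add_constant(data)
    data = pd.DataFrame(data, columns=['intercept', 'Var1', 'Var2', 'Var3'])
    # True coefficients: (-2, 1, -0.5, 0.3)
    lam = np.exp(-2*data['intercept'] + data['Var1'] - 0.5*data['Var2'] + 0.3*data['Var3'])
    resp = np.random.poisson(lam=lam)
    data['lam'] = lam    # true Poisson mean
    data['resp'] = resp  # observed count
    return data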

In [3]:
dtrain = gen_data()

### Stating Poisson regression as an optimization problem
Let $X$ be the design matrix, $w$ the vector of model coefficients and $y$ the observed response. Let $\hat y(w) = \exp\left(Xw\right)$, with the exponential applied elementwise, be the mean estimated by the model. We are looking for the coefficient vector $w$ that maximizes the likelihood
$$
L(w) =  \prod_{i = 1}^N \frac{\hat y_i(w)^{y_i} e^{-\hat y_i(w)}}{y_i!},
$$
where $y = (y_i)_i$ and $\hat y = (\hat y_i)_i$. Taking logarithms, dropping the term $\log(y_i!)$, which does not depend on $w$, and flipping the sign, this problem is equivalent to minimizing the loss function
$$
   \ell(w)  =  -\sum_{i=1}^N \left(y_i \log(\hat y_i(w)) - \hat y_i(w) \right).
$$
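
Differentiating the loss above with respect to $w$ gives the gradient $X^\top(\hat y(w) - y)$. The following NumPy sketch is not part of the original notebook. It evaluates the loss and this gradient on a small made-up data set and compares the analytic gradient with central finite differences. The helper names `poisson_loss` and `poisson_grad` and the demo arrays are illustrative assumptions.

```python
import numpy as np

def poisson_loss(w, X, y):
    """Negative Poisson log-likelihood without the log(y_i!) constant."""
    eta = X @ w                           # linear predictor Xw
    return np.sum(np.exp(eta) - y * eta)  # uses log(y_hat) = eta

def poisson_grad(w, X, y):
    """Analytic gradient of the loss: X^T (y_hat - y)."""
    return X.T @ (np.exp(X @ w) - y)

# Small synthetic data set (assumed, for illustration only)
rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(200), rng.uniform(-1, 1, (200, 3))])
w_demo = np.array([-2.0, 1.0, -0.5, 0.3])
y_demo = rng.poisson(np.exp(X_demo @ w_demo))

# Central finite differences, one coordinate direction at a time
eps = 1e-6
numeric = np.array([
    (poisson_loss(w_demo + eps * e, X_demo, y_demo)
     - poisson_loss(w_demo - eps * e, X_demo, y_demo)) / (2 * eps)
    for e in np.eye(4)
])
analytic = poisson_grad(w_demo, X_demo, y_demo)
max_abs_diff = np.max(np.abs(numeric - analytic))
print(max_abs_diff)
```

If the two gradients agree up to numerical error, the printed difference is very small. This is the quantity that an automatic differentiation library computes for us.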


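### Fitting the model with TensorFlow
Having defined the objective function, we can now set up the TensorFlow model. Thanks to automatic differentiation, TensorFlow computes the gradients of the loss for us and can therefore run gradient-based optimization on it. We choose *Adam* as the optimization algorithm, with a learning rate of 0.001 and 10000 full-batch steps. The code averages the loss over the observations instead of summing it, which rescales the objective but does not change its minimizer.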

In [4]:
# TensorFlow 1.x graph API
# Design matrix and response as constant tensors
X = tf.constant(dtrain[['intercept', 'Var1', 'Var2', 'Var3']].values, name='X', dtype=tf.float32)
y = tf.constant(value=list(dtrain['resp']), dtype=tf.float32, name='y', shape=(dtrain.shape[0], 1))

# Coefficient vector, initialized at zero
w = tf.Variable(tf.zeros([4, 1]))

# Model mean under the log link
y_hat = tf.exp(tf.matmul(X, w))

# Negative log-likelihood without the constant term, averaged over observations
loss_function = tf.reduce_mean(-y*tf.log(y_hat) + y_hat)

train_step = tf.train.AdamOptimizer(0.001).minimize(loss_function)
init = tf.global_variables_initializer()
session = tf.InteractiveSession()
session.run(init)

# 10000 full-batch Adam steps
for i in range(10000):
    session.run(train_step)
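
To make the optimizer less of a black box, the training loop above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the notebook's code. It draws fresh synthetic data with a fixed seed instead of reusing `dtrain`, and it codes the gradient of the mean loss by hand instead of relying on automatic differentiation. It applies the standard Adam update with the commonly used defaults `beta1 = 0.9`, `beta2 = 0.999` and `eps = 1e-8`. TensorFlow's implementation may differ in details, and all variable names are hypothetical.

```python
import numpy as np

# Synthetic data with the same structure as the notebook's generator (assumed seed)
rng = np.random.default_rng(0)
N = 10000
X_np = np.column_stack([np.ones(N), rng.uniform(-1, 1, (N, 3))])
w_true = np.array([-2.0, 1.0, -0.5, 0.3])
y_np = rng.poisson(np.exp(X_np @ w_true))

def mean_loss_grad(w):
    """Gradient of mean(-y*log(y_hat) + y_hat) with y_hat = exp(Xw)."""
    return X_np.T @ (np.exp(X_np @ w) - y_np) / N

w_adam = np.zeros(4)   # coefficients, initialized at zero
m = np.zeros(4)        # running mean of the gradients
v = np.zeros(4)        # running mean of the squared gradients
lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8

for t in range(1, 10001):
    g = mean_loss_grad(w_adam)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias correction
    v_hat = v / (1 - beta2 ** t)
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(w_adam)
```

Because the data are simulated, the estimate should land near the true coefficients (-2, 1, -0.5, 0.3), up to sampling noise.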

### The result 

In [5]:
w.eval()

array([[-1.98536253],
       [ 0.97114146],
       [-0.49905109],
       [ 0.36309692]], dtype=float32)

### Comparison with the statsmodels package 
To check the result, we solve the same regression problem with the statsmodels library:

In [6]:
# The log link is the default for the Poisson family
poisson_family = sm.families.Poisson()
poisson_model = sm.GLM(dtrain['resp'], dtrain[['intercept', 'Var1', 'Var2', 'Var3']], family=poisson_family)
poisson_results = poisson_model.fit()

In [7]:
poisson_results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | resp | No. Observations: | 10000 |
| Model: | GLM | Df Residuals: | 9996 |
| Model Family: | Poisson | Df Model: | 3 |
| Link Function: | log | Scale: | 1.0 |
| Method: | IRLS | Log-Likelihood: | -4492.6 |
| Date: | Sun, 30 Apr 2017 | Deviance: | 5841.5 |
| Time: | 21:10:01 | Pearson chi2: | 9930.0 |
| No. Iterations: | 9 | | |

| | coef | std err | z | P>\|z\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| intercept | -1.9854 | 0.029 | -67.760 | 0.000 | -2.043 | -1.928 |
| Var1 | 0.9711 | 0.046 | 21.151 | 0.000 | 0.881 | 1.061 |
| Var2 | -0.4991 | 0.044 | -11.465 | 0.000 | -0.584 | -0.414 |
| Var3 | 0.3631 | 0.043 | 8.535 | 0.000 | 0.280 | 0.446 |
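
The summary lists IRLS (iteratively reweighted least squares) as the fitting method, which explains why statsmodels needs only a handful of iterations where the gradient-based fit used thousands of steps. The NumPy sketch below shows the textbook IRLS update for a Poisson model with log link. It is not statsmodels' code, which may differ in starting values and convergence checks. The function name `irls_poisson`, its parameters and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def irls_poisson(X, y, n_iter=25, tol=1e-8):
    """Textbook IRLS for Poisson regression with log link (illustrative)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ w                  # linear predictor
        mu = np.exp(eta)             # current mean estimate
        z = eta + (y - mu) / mu      # working response
        XtW = X.T * mu               # X^T W with weights W = diag(mu)
        w_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least squares
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w

# Synthetic data with the same structure as the notebook's generator (assumed seed)
rng = np.random.default_rng(0)
N = 10000
X_demo = np.column_stack([np.ones(N), rng.uniform(-1, 1, (N, 3))])
w_true = np.array([-2.0, 1.0, -0.5, 0.3])
y_demo = rng.poisson(np.exp(X_demo @ w_true))

w_irls = irls_poisson(X_demo, y_demo)
# At the maximum likelihood estimate the score X^T (y - mu) vanishes
score = X_demo.T @ (y_demo - np.exp(X_demo @ w_irls))
print(w_irls)
print(np.max(np.abs(score)))
```

Each IRLS step solves a weighted least squares problem, which amounts to a Newton step on the same loss that Adam minimizes with first-order updates.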
