
Logistic Regression Using TensorFlow
====================================

In this notebook, we will go through the steps for performing logistic regression using TensorFlow.
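
For reference, the code below fits the standard logistic regression model. The predicted admission probability is a sigmoid of a linear function of the predictors, and training minimizes the mean binary cross-entropy over the N training rows. In the notation of the code, x is the N×3 matrix of predictors, W is a 3×1 weight vector, b is a scalar bias, and y and ŷ (y_model in the code) are N×1 column vectors:

```latex
\hat{y} = \sigma(xW + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

J(W, b) = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log\big(1 - \hat{y}_i\big) \Big]

\frac{\partial J}{\partial W} = \frac{1}{N} \, x^{\top} (\hat{y} - y), \qquad
\frac{\partial J}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)
```

The gradient takes this simple form because the derivative of each row's loss with respect to its logit z_i = x_i W + b is ŷ_i − y_i; the optimizer in the code follows these gradients.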

---
Loading Libraries and Dependencies
----------------------------------

In [14]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline
import random

---
Part 1: Simple Problem
----------------------

In this example we will use the dataset available at http://www.ats.ucla.edu/stat/data/binary.csv.

This dataset has a binary response (outcome, dependent) variable called admit. There are three predictor variables: gre, gpa and rank. We will treat the variables gre and gpa as continuous. The variable rank takes on the values 1 through 4; institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. For simplicity, the code below passes rank to the model as a single numeric column, like gre and gpa, rather than as a set of category indicators.
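
The loading line in the next cell relies on a behavior of np.genfromtxt: with its default float dtype, text it cannot parse (such as the header names) comes back as NaN, which is why the notebook slices off the first row. The sketch below shows this on a hypothetical two-row, in-memory CSV that uses the column order the notebook assumes (admit, gre, gpa, rank); the row values are made up for illustration and are not taken from the dataset. It also checks that the slice matches the skip_header=1 argument.

```python
import io

import numpy as np

# Hypothetical CSV with the assumed column order; values are made up.
csv_text = "admit,gre,gpa,rank\n0,500,3.0,2\n1,700,3.8,1\n"

# With the default float dtype, the header words cannot be parsed,
# so the first row comes back as NaNs.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(raw[0])  # all NaN

# Dropping that row, as the notebook does with [1:, :], gives the same
# array as asking genfromtxt to skip the header line.
data = raw[1:, :]
same = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
assert np.array_equal(data, same)

# Column 0 is the label; columns 1-3 are the predictors.
y_values = data[:, 0]
x_values = data[:, 1:]
print(y_values.shape, x_values.shape)  # (2,) (2, 3)
```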

In [80]:
# Load the data. genfromtxt parses the header row as NaNs, so [1:, :] drops it.
data = np.genfromtxt("http://www.ats.ucla.edu/stat/data/binary.csv", delimiter=",")[1:, :]

# Column 0 is the label (admit); columns 1-3 are the predictors (gre, gpa, rank)
y_values = data[:, 0]
x_values = data[:, 1:]

# Placeholders for the feature matrix and the labels (one row per applicant)
x = tf.placeholder("float", shape=[None, 3])
y = tf.placeholder("float", shape=[None, 1])

# Create weight and bias terms
W = tf.Variable(tf.zeros([3, 1]))
b = tf.Variable(tf.zeros([1]))

# Output of prediction
y_model = tf.sigmoid(tf.matmul(x, W) + b)

# Cost: mean binary cross-entropy (log loss) of the predictions
cost = tf.reduce_mean(-(y*tf.log(y_model) + (1-y)*tf.log(1 - y_model)))

# Optimizer
optimizer = tf.train.AdamOptimizer(0.005).minimize(cost)

# Initialize all variables
init = tf.global_variables_initializer()

# Labels reshaped once into a column vector to match the placeholder shape [None, 1]
y_train = y_values.reshape(-1, 1)

with tf.Session() as sess:
    sess.run(init)
    
    # Full-batch training: every step uses all rows
    for i in range(100000):
        _, err = sess.run([optimizer, cost], feed_dict={x: x_values, y: y_train})
        
        if i % 10000 == 0:
            print("The error after {0} rounds is {1}".format(i, err))
    W_, b_ = sess.run([W, b])
    
    print("The Weight term is {0} and bias term is {1}".format(W_, b_))

The error after 0 rounds is 0.693146586418
The error after 10000 rounds is 0.578312397003
The error after 20000 rounds is 0.576482713223
The error after 30000 rounds is 0.574327528477
The error after 40000 rounds is 0.577231287956
The error after 50000 rounds is 0.575083255768
The error after 60000 rounds is 0.57432192564
The error after 70000 rounds is 0.576453328133
The error after 80000 rounds is 0.574348032475
The error after 90000 rounds is 0.574313104153
The Weight term is [[ 0.00233621]
 [ 0.77817112]
 [-0.56119227]] and bias term is [-3.45223737]
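
The first logged error, about 0.6931, is exactly what the cost gives at the starting point: W and b are initialized to zero, so every prediction is sigmoid(0) = 0.5 and each row contributes ln 2 to the mean. The NumPy sketch below reproduces that value on random stand-in data (not the admissions data). It also compares the cost as written in the notebook with a numerically stable form computed from the logits z, log(1 + e^z) − y·z, which avoids log(0) when the sigmoid saturates; in TensorFlow 1.x the built-in op for this is tf.nn.sigmoid_cross_entropy_with_logits. The helper names naive_log_loss and stable_log_loss are hypothetical.

```python
import math

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def naive_log_loss(x, y, W, b):
    """Mean cross-entropy written the same way as the TensorFlow cost."""
    y_hat = sigmoid(x @ W + b)
    return np.mean(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))


def stable_log_loss(x, y, W, b):
    """Same loss computed from the logits z: log(1 + e^z) - y * z."""
    z = x @ W + b
    return np.mean(np.logaddexp(0.0, z) - y * z)


rng = np.random.default_rng(0)
x = rng.normal(size=(400, 3))                         # stand-in features
y = rng.integers(0, 2, size=(400, 1)).astype(float)   # stand-in 0/1 labels
b0 = np.zeros(1)

# Zero initialization, as in the notebook: every prediction is 0.5.
loss_init = naive_log_loss(x, y, np.zeros((3, 1)), b0)
print(loss_init, math.log(2))  # both approximately 0.693147

# Moderate weights: the two formulas agree.
W_mod = np.array([[0.5], [-1.0], [2.0]])
loss_naive = naive_log_loss(x, y, W_mod, b0)
loss_stable = stable_log_loss(x, y, W_mod, b0)

# Huge weights saturate the sigmoid to exactly 0 or 1: the naive formula
# then evaluates log(0) and 0 * log(0) and returns a non-finite value,
# while the stable form stays finite.
W_big = W_mod * 1000.0
with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
    big_naive = naive_log_loss(x, y, W_big, b0)
big_stable = stable_log_loss(x, y, W_big, b0)
print(big_naive, big_stable)
```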


In [44]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(x_values, y_values)

clf.coef_, clf.intercept_

(array([[ 0.00188723,  0.31928562, -0.60537482]]), array([-1.5115323]))
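
The scikit-learn coefficients differ from the TensorFlow ones most likely because LogisticRegression is regularized by default: it adds an L2 penalty on the coefficients whose strength is set by C (default 1.0; a smaller C means stronger shrinkage), and with the liblinear solver, the default before scikit-learn 0.22, the intercept is penalized as well. The TensorFlow cost above has no penalty term. The NumPy sketch below illustrates the shrinking effect on random stand-in data with plain gradient descent. The penalty form (lam / 2)·‖W‖² added to the mean loss, the values of lam, the learning rate and step count, and the helper name fit are assumptions chosen for illustration, not scikit-learn's exact objective.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit(x, y, lam=0.0, lr=0.1, steps=5000):
    """Batch gradient descent on mean log loss + (lam / 2) * ||W||^2.

    The intercept b is left unpenalized in this sketch.
    """
    n, d = x.shape
    W = np.zeros((d, 1))
    b = 0.0
    for _ in range(steps):
        y_hat = sigmoid(x @ W + b)
        # (y_hat - y) is each row's log-loss gradient w.r.t. its logit
        residual = y_hat - y
        grad_W = x.T @ residual / n + lam * W
        grad_b = residual.mean()
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b


# Stand-in data drawn from a known logistic model
rng = np.random.default_rng(1)
x = rng.normal(size=(400, 3))
true_W = np.array([[1.5], [-2.0], [0.5]])
y = (rng.random((400, 1)) < sigmoid(x @ true_W - 0.5)).astype(float)

W_plain, b_plain = fit(x, y)        # no penalty, like the TensorFlow cost
W_l2, b_l2 = fit(x, y, lam=0.5)     # L2-penalized fit

print(np.linalg.norm(W_plain), np.linalg.norm(W_l2))  # penalized norm is smaller
```

In scikit-learn itself, a very large C (for example C=1e6) makes the penalty negligible, and recent versions also accept penalty=None to switch it off; with the intercept left unpenalized as well (for example with solver="lbfgs"), the fit should approach the unregularized TensorFlow estimate, up to optimizer tolerance.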