## Tutorial on Logistic Regression using TensorFlow

In this tutorial, we use TensorFlow to implement logistic regression. First, we briefly review logistic regression itself. Logistic regression is a classification algorithm: given a data point (a feature vector), it predicts the point's class (or label). We assume that there are two classes to predict, which is called binary classification.

Suppose $X$ is a matrix of data points (one per row), and $W$ is the weight vector to be learned in logistic regression.
The hypothesis for logistic regression is:

$$ H(X) = \frac{1}{1 + e^{-XW}} $$

This combines the sigmoid (logistic) function with the hypothesis of linear regression, $XW$; the code below also adds a bias term $b$.
The cost function based on this hypothesis is:

$$cost(W) = \frac{1}{m}\sum_{i=1}^{m} \left(-y_i\log(H(x_{i})) - (1-y_i)\log(1-H(x_{i}))\right)$$
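
To make the two formulas concrete, here is a minimal sketch in plain Python (no TensorFlow) that evaluates the hypothesis and the cost on the small toy data set that appears commented out in the code below. It uses the standard logistic function $1/(1 + e^{-z})$. The helper names `sigmoid`, `hypothesis`, and `cost`, and the hand-picked weights, are assumptions made for illustration only; they are not part of the tutorial's TensorFlow code.

```python
import math

def sigmoid(z):
    """Standard logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(x, w, b=0.0):
    """H(x) = sigmoid(x . w + b) for a single data point x (a list of features)."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return sigmoid(z)

def cost(xs, ys, w, b=0.0):
    """Mean cross-entropy over the m data points, as in the cost formula."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = hypothesis(x, w, b)
        total += -y * math.log(h) - (1 - y) * math.log(1 - h)
    return total / len(xs)

# Toy data: six points with two features each, and their binary labels.
x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
y_data = [0, 0, 0, 1, 1, 1]

# With all-zero weights every H(x) is 0.5, so the cost equals log(2).
print(cost(x_data, y_data, [0.0, 0.0]))

# Hand-picked (hypothetical) weights that separate the classes on the
# first feature give a lower cost than the all-zero weights.
print(cost(x_data, y_data, [1.0, 0.0], b=-3.5))
```

Training amounts to searching for the weights that make this cost as small as possible, which is the job of the gradient descent optimizer in the TensorFlow code.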

import tensorflow as tf
import numpy as np

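# Data points: either the small toy set (commented out) or the diabetes CSV,
# where every column except the last is a feature and the last column is the label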
#x_data = [[1,2], [2,3], [3,1], [4,3], [5,3], [6,2]]
#y_data = [[0], [0], [0], [1], [1], [1]]
xy = np.loadtxt('datasets/data-03-diabetes.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

# Placeholders for the inputs (8 features) and labels; W and b are the trainable variables
X = tf.placeholder(tf.float32, shape=[None, 8])
Y = tf.placeholder(tf.float32, shape=[None, 1])
W = tf.Variable(tf.random_normal([8, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# Hypothesis: sigmoid(XW + b)
H = tf.sigmoid(tf.matmul(X, W) + b)

# Cost: mean binary cross-entropy between Y and H
cost = -tf.reduce_mean(Y * tf.log(H) + (1 - Y)*tf.log(1 - H))

# Training op: one gradient descent step that minimizes the cost
train = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

# Predict class 1 when H > 0.5, then measure accuracy against the true labels
predicted = tf.cast(H > 0.5, dtype=tf.float32)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), dtype=tf.float32))

# Run the graph in a session: initialize the variables, then train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    for step in range(10001):
        cost_val, _ = sess.run([cost, train], feed_dict={X: x_data, Y: y_data})
        if step % 1000 == 0:
            print(step, cost_val)
            
    # Evaluate on the training data; note that c holds the predicted labels, not the true Y
    h, c, a = sess.run([H, predicted, accuracy], feed_dict={X: x_data, Y: y_data})
    
    print("\nHypothesis: ", h, "\nCorrect (Y): ", c, "\nAccuracy: ", a)

0 0.875619
1000 0.709529
2000 0.633418
3000 0.585739
4000 0.555107
5000 0.534702
6000 0.520605
7000 0.510542
8000 0.503156
9000 0.497601
10000 0.493334

Hypothesis:  [[ 0.43030173]
 [ 0.91276926]
 [ 0.15013102]
 [ 0.95045185]
 [ 0.17929338]
 [ 0.68751585]
 [ 0.93711311]
 [ 0.55946165]
 [ 0.27696759]
 [ 0.5340848 ]
 [ 0.71557057]
 [ 0.16941322]
 [ 0.19292168]
 [ 0.32622164]
 [ 0.67936754]
 [ 0.51745939]
 [ 0.67202163]
 [ 0.90390569]
 [ 0.83909291]
 [ 0.62098706]
 [ 0.67001009]
 [ 0.12528408]
 [ 0.59002829]
 [ 0.66889811]
 [ 0.40450162]
 [ 0.91930544]
 [ 0.49920774]
 [ 0.63435423]
 [ 0.75712901]
 [ 0.42115414]
 [ 0.94310874]
 [ 0.78518128]
 [ 0.52696371]
 [ 0.76723599]
 [ 0.31803849]
 [ 0.62284684]
 [ 0.83485079]
 [ 0.57885134]
 [ 0.49893287]
 [ 0.38626158]
 [ 0.79735982]
 [ 0.23763229]
 [ 0.34920806]
 [ 0.05716807]
 [ 0.54189456]
 [ 0.91376632]
 [ 0.73068899]
 [ 0.66979027]
 [ 0.91514969]
 [ 0.92231125]
 [ 0.91521448]
 [ 0.26726344]
 [ 0.35598469]
 [ 0.96341008]
 [ 0.24560258]
 [ 0.5199