# Logistic Regression with TensorFlow

[**Important.** If you are not familiar with logistic regression, then you should try this tutorial _after_ tomorrow's session, where we will learn about logistic regression. If you already know what it is about, then you can go ahead with this notebook.]

Given what you have learned, can you develop logistic regression now? In logistic regression, the dependent variable `y` is binary, and its expectation (the probability that $y=1$) is
$$
\textrm{Prob}(y=1|x) = \sigma(w^\top x + b),
$$
where $\sigma(z) = \frac{1}{1+\exp(-z)}$ is the sigmoid function. The sigmoid function, which maps any real number to a value strictly between 0 and 1, is available as `tf.sigmoid` in TensorFlow, or `scipy.special.expit` in SciPy.
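As a quick check of the definition, the sigmoid can also be written directly in NumPy. This is a minimal sketch for illustration only; the helper name `sigmoid` is our own and is not part of TensorFlow or SciPy.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), applied elementwise."""
    z = np.asarray(z, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
p = sigmoid(z)

# Every output lies strictly between 0 and 1, sigmoid(0) is 0.5,
# and the function satisfies the symmetry sigmoid(-z) = 1 - sigmoid(z).
print(p)
```

In practice `scipy.special.expit` is the safer choice, because this naive form can emit overflow warnings for very negative inputs.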

**Import the packages.**

In [None]:
import tensorflow as tf
import numpy as np
from scipy.special import expit
import matplotlib.pyplot as plt
%matplotlib notebook

## Data

**Generate the data.**

In [None]:
# True coefficients
true_weights = np.array([-3.0])
true_intercept = 1.0

# Generate N 1-dimensional locations X at random
N = 40
num_dim = 1
x_data = np.random.rand(N, num_dim).astype(np.float32) # Conversion to type float required for TF

# Generate the dependent variable y
y_data_logistic = (np.random.rand(N) < expit(np.matmul(x_data, true_weights) + true_intercept)).astype(np.float32)
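The comparison `np.random.rand(N) < p` used above draws binary labels that equal 1 with probability `p`. The following sketch, with an illustrative probability of 0.3 and a sample size chosen only for this check, suggests why: a uniform number in $[0, 1)$ falls below `p` a fraction `p` of the time.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable

p = 0.3            # illustrative success probability (an assumption for this demo)
num_samples = 100_000

# A uniform draw is below p with probability p, so the comparison
# yields a Bernoulli(p) sample encoded as 0.0 or 1.0.
samples = (rng.random(num_samples) < p).astype(np.float32)
empirical = samples.mean()
print(empirical)   # should be close to p
```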

**Visualize the data.** As a sanity check, we plot the data, together with the underlying probability $\textrm{Prob}(y=1|x)$.

In [None]:
# Plot the data
plt.figure()
plt.plot(x_data[:,0], y_data_logistic, '.')
# Plot the probability P(y=1) on top
x_loc = np.expand_dims(np.linspace(0.0, 1.0, num=200), axis=1)
y_loc = expit(np.matmul(x_loc, true_weights) + true_intercept)
plt.plot(x_loc, y_loc, color='red')

## Computational Graph

These steps are analogous to the linear regression case.

**Declare the variables.** The variables remain unchanged.

In [None]:
weight = tf.Variable(tf.random_uniform([num_dim, 1], -1.0, 1.0))
intercept = tf.Variable(tf.constant(0.0, shape=[1, 1]))

**Compute the loss.** The loss function differs from the linear regression case. In logistic regression, it is the negative mean log-likelihood of the labels (the cross-entropy), given by
$$\mathcal{L} = -\frac{1}{N}\sum_{n=1}^N \left[ y_n \log\sigma(w^\top x_n + b) + (1-y_n) \log\left(1-\sigma(w^\top x_n + b)\right) \right].$$

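Fortunately, TensorFlow provides a function that computes the per-example terms of this loss, called `tf.nn.sigmoid_cross_entropy_with_logits`. It receives the *labels* $y_n$ and the *logits* $w^\top x_n + b$, so we only need to average its output with `tf.reduce_mean`. For comparison, recall that the loss in the linear regression case was the mean squared error
$$
\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N} \left( \hat{y}_n - y_n \right)^2,\qquad
\textrm{with} \quad \hat{y}_n = b+w^\top x_n.
$$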
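To see what `tf.nn.sigmoid_cross_entropy_with_logits` computes for each example, here is a NumPy sketch of the per-example cross-entropy $-\left[y \log\sigma(z) + (1-y)\log(1-\sigma(z))\right]$ as a function of the logit $z$. The stable form `max(z, 0) - z * y + log(1 + exp(-|z|))` follows the formulation described in the TensorFlow documentation; the function names and the sample values below are our own and purely illustrative.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Per-example cross-entropy computed from logits in a numerically stable way."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    return (np.maximum(logits, 0.0)
            - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

def naive_cross_entropy(logits, labels):
    """Direct transcription of the formula; fine for moderate logits only."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

logits = np.array([-3.0, -0.5, 0.0, 2.0])   # illustrative values
labels = np.array([0.0, 1.0, 1.0, 1.0])

stable = sigmoid_cross_entropy(logits, labels)
naive = naive_cross_entropy(logits, labels)
mean_loss = stable.mean()   # plays the role of tf.reduce_mean over the examples
print(stable, mean_loss)
```

Working from logits rather than probabilities avoids evaluating `log(0)` when the sigmoid saturates, which is presumably why the TensorFlow function takes logits as input.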



In [None]:
y_predicted = intercept + tf.matmul(x_data, weight)
loss_logistic = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=tf.squeeze(y_predicted), labels=y_data_logistic))

**Optimizer.** This is the same as in the linear regression case. Although we are optimizing a more complex function, we don't need to change this piece of code, because TensorFlow computes the gradients of the loss automatically. That is a main advantage of TensorFlow (and similar packages).

In [None]:
# Define an optimizer. We will use a simple gradient descent optimizer
learning_rate = 0.5
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(loss_logistic) # Pass the loss tensor that we want to minimize
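To make concrete what a gradient descent optimizer does for this loss, here is a NumPy sketch of the update rule written by hand. It is not TensorFlow's implementation. It assumes the standard gradients of the mean cross-entropy, $\frac{1}{N}\sum_n (\sigma(z_n) - y_n)\,x_n$ for the weight and $\frac{1}{N}\sum_n (\sigma(z_n) - y_n)$ for the intercept, and it uses its own synthetic data, seed, starting point, and iteration count, all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data mirroring the setup of this notebook (illustrative values)
N = 40
x = rng.random((N, 1))
true_w, true_b = np.array([-3.0]), 1.0
prob = 1.0 / (1.0 + np.exp(-(x @ true_w + true_b)))
y = (rng.random(N) < prob).astype(np.float64)

def mean_loss(w, b):
    """Mean cross-entropy, computed from logits in a stable form."""
    z = x @ w + b
    return np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z))))

w, b = np.zeros(1), 0.0
learning_rate = 0.5
initial_loss = mean_loss(w, b)

for step in range(200):
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # predicted probabilities
    grad_w = x.T @ (p - y) / N               # gradient w.r.t. the weight
    grad_b = np.mean(p - y)                  # gradient w.r.t. the intercept
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b

final_loss = mean_loss(w, b)
print(initial_loss, final_loss, w, b)
```

With TensorFlow, the two gradient lines are derived automatically from the loss, which is why the optimizer code above did not have to change.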

## Session

We can only evaluate variables and perform computations within a TensorFlow session. We will create a session that:
1. Initializes the variables (weight and intercept).
2. Runs gradient descent to minimize the loss. We use a `for` loop for that.
3. Prints the progress.

**[Task]** In the cell below, create a session to run gradient descent, with $5000$ iterations. You do *not* need to plot the results after convergence.

**[Question]** Do we recover the weight and intercept as accurately as in linear regression? Why/Why not?