# How to Implement Logistic Regression with TensorFlow
![alt text](https://raw.githubusercontent.com/lazuxd/logistic-regression-with-tensorflow/main/imgs/lr.png)

In [None]:
import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

In [None]:
df = pd.read_csv('../input/heart-disease-uci/heart.csv')
df

In [None]:
x, y = df.iloc[:, 0:-1].values, df.iloc[:, -1].values.reshape((-1, 1))
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8)

In [None]:
scaler = MinMaxScaler().fit(x_train)
x_train, x_test = scaler.transform(x_train), scaler.transform(x_test)

## TL;DR
If you are here for a quick solution that just works, then here it is in just 5 lines of code:

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='bce', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=500, verbose=0)

Now, let's evaluate our model:

In [None]:
model.evaluate(x_test, y_test)

## The long way
If you're still with me, you don't just want to copy and paste 5 lines of code; you want to see how to implement this method yourself from scratch.  
TensorFlow is a rich library with many APIs. Among them is the Keras API, which can be used to build a logistic regression model very quickly, as you can see above. And there's nothing wrong with that. If you have to implement a complex deep learning model, perhaps one you saw in a new paper, Keras saves you a lot of time; it lets you focus on what's important without worrying about each math operation that has to be done.  
But if your purpose is to learn a basic machine learning technique, like logistic regression, it is worth using the core math functions from TensorFlow and implementing it from scratch.    
Knowing TensorFlow's lower-level math APIs can also help you build a deep learning model when you need to implement a custom training loop, or a custom activation or loss function. It can also be more fun!  
So, let’s get started!

To understand better what we’re going to do next, you can read [this article about logistic regression](https://towardsdatascience.com/understanding-logistic-regression-81779525d5c6).  
What's our plan for implementing Logistic Regression with TensorFlow?  
Let's first think of the underlying math that we want to use.  
There are many ways to define a loss function and then find its optimal parameters. In our `LogisticRegression` class, we will implement the following 3 ways of learning the parameters:
- We rewrite the logistic regression equation to turn it into a least-squares linear regression problem with different labels, and then use the closed-form formula to find the weights: ![alt text](https://raw.githubusercontent.com/lazuxd/logistic-regression-with-tensorflow/main/imgs/eq1.png)  
- Like above, we turn logistic into least-squares linear regression, but instead of the closed-form formula, we use stochastic gradient descent (SGD) to minimize the following loss function: ![alt text](https://raw.githubusercontent.com/lazuxd/logistic-regression-with-tensorflow/main/imgs/eq4.png) which was obtained by substituting the y in the sum of squared errors loss ![alt text](https://raw.githubusercontent.com/lazuxd/logistic-regression-with-tensorflow/main/imgs/eq3.png) with the right-hand side of ![alt text](https://raw.githubusercontent.com/lazuxd/logistic-regression-with-tensorflow/main/imgs/eq2.png)  
- We use the maximum likelihood estimation (MLE) method, write the likelihood function, play around with it, restate it as a minimization problem, and apply SGD with the following loss function: ![alt text](https://raw.githubusercontent.com/lazuxd/logistic-regression-with-tensorflow/main/imgs/eq5.png)
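
For readers who cannot load the images, the three formulas can also be written out in LaTeX. The block below is a transcription of what the code in this article computes, not a copy of the images, so the notation is an assumption: $\tilde{y}$ stands for the transformed labels, $\mathbf{1}$ for a column vector of ones, and $\ln$ and $e^{(\cdot)}$ are applied element-wise.

```latex
% Closed-form solution on the transformed labels
\tilde{y} = -\ln\left(\frac{1}{y} - 1\right), \qquad
\mathbf{w} = (X^\top X)^{-1} X^\top \tilde{y}

% Sum of squared errors loss, minimized with SGD
f(\mathbf{w}) = (X\mathbf{w} - \tilde{y})^\top (X\mathbf{w} - \tilde{y})

% Loss for the MLE method, minimized with SGD
h(\mathbf{w}) = (\mathbf{1} - y)^\top X\mathbf{w}
              + \mathbf{1}^\top \ln\left(1 + e^{-X\mathbf{w}}\right)
```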

In the above equations, X is the input matrix that contains observations on the row axis and features on the column axis; y is a column vector that contains the classification labels (0 or 1); f is the sum of squared errors loss function; h is the loss function for the MLE method.   
If you want to find out more about how we obtained the above equations, please check out the above-linked article.   
So now, this is our goal: translate the above equations into code. And we’ll use TensorFlow for that.   
We plan to use an object-oriented approach for implementation. We'll create a `LogisticRegression` class with 3 public methods: `fit()`, `predict()`, and `accuracy()`.   
Among the parameters of `fit()`, one determines how our model learns. This parameter is named `method` (not to be confused with a method as a function of a class), and it can take the following strings as values: `'ols_solve'` (OLS stands for Ordinary Least Squares), `'ols_sgd'`, and `'mle_sgd'`.   
To keep the `fit()` method from getting too long, we split the code into 3 private methods, each one responsible for one way of finding the parameters.   
We will have the `__ols_solve()` private method for applying the closed-form formula.  
In this method and in the other methods that use the OLS approach, we will use the constant EPS to make sure the labels are not exactly 0 or 1, but something in between. That’s to avoid getting plus or minus infinity for the logarithm in the equations above.   
In `__ols_solve()` we first check if X has full column rank so that we can apply this method. Then we force y to be between EPS and 1-EPS. The `ols_y` variable holds the labels of the ordinary least-squares linear regression problem that’s equivalent to our logistic regression problem. Basically, we transform the labels that we have for logistic regression so that they are compliant with the linear regression equations. After that, we apply the closed-form formula using TensorFlow functions.   
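
To make the role of `EPS` concrete, here is a minimal sketch of the label transformation in plain Python. The helper name `to_ols_label` is hypothetical and only used for illustration; the class below does the same thing with TensorFlow functions on whole tensors.

```python
import math

EPS = 1e-5  # same value as the class constant

def to_ols_label(y, eps=EPS):
    """Clip a 0/1 label into [eps, 1 - eps], then apply the logit."""
    y = min(max(float(y), eps), 1.0 - eps)
    return -math.log(1.0 / y - 1.0)

neg = to_ols_label(0)  # large negative target, about -11.51
pos = to_ols_label(1)  # large positive target, about +11.51
print(neg, pos)
```

Without the clipping, a label of exactly 0 or 1 would lead to a division by zero or to the logarithm of zero, which is the infinity mentioned above.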

```python
EPS = 1e-5
def __ols_solve(self, x, y):
    rows, cols = x.shape
    if rows >= cols == tf.linalg.matrix_rank(x):
        y = tf.math.maximum(self.EPS,
                            tf.math.minimum(tf.cast(y, tf.float32), 1-self.EPS))
        ols_y = -tf.math.log(tf.math.divide(1, y) - 1)
        self.weights = tf.linalg.matmul(
            tf.linalg.matmul(
                tf.linalg.inv(
                    tf.linalg.matmul(x, x, transpose_a=True)
                ),
                x, transpose_b=True),
            ols_y)
    else:
        print('Error! X has not full column rank.')
```
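
Inverting the matrix explicitly works for a small dataset like ours, but it can be numerically fragile when features are nearly collinear. As a sanity check, the sketch below computes the same closed-form weights and compares them with a least-squares solver. It uses NumPy and made-up random data, both of which are assumptions for illustration and not part of the class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 50 observations, 3 features, plus a column of ones
x = rng.normal(size=(50, 3))
x = np.concatenate([x, np.ones((50, 1))], axis=1)
y = rng.integers(0, 2, size=(50, 1)).astype(float)

# Transform the 0/1 labels exactly as __ols_solve() does
eps = 1e-5
ols_y = -np.log(1.0 / np.clip(y, eps, 1 - eps) - 1.0)

# Closed-form formula with an explicit inverse
w_inv = np.linalg.inv(x.T @ x) @ x.T @ ols_y

# Same problem handed to a least-squares solver
w_lstsq, *_ = np.linalg.lstsq(x, ols_y, rcond=None)

print(np.allclose(w_inv, w_lstsq))
```

TensorFlow offers `tf.linalg.lstsq` for the same purpose if you prefer to avoid the explicit inverse.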

For the 2 SGD-based algorithms, it would be redundant to have them as 2 separate methods since they will have almost all the code the same except for the part where we compute the loss value, as we have 2 different loss functions for them.   
What we’ll do is to create a generic `__sgd()` method that does not rely on a particular loss function. Instead, it will expect as a parameter a function responsible for computing the loss value which the `__sgd()` method will use.   
In this method, we first initialize the weights to a random column vector with values drawn from a normal distribution with mean 0 and a standard deviation of 1/(# of features). The intuition for this std dev is that if we have more features, then we need smaller weights to be able to converge (and not blow up our gradients). Then we go through the whole dataset `iterations` times. At the start of each such iteration, we randomly shuffle our dataset. Then, for each batch of data, we compute the loss value using the `loss_fn` function taken as a parameter, use TensorFlow to take the gradient of this loss value with respect to (w.r.t.) `self.weights`, and update the weights.   
The loss needs to be computed inside a `with tf.GradientTape() as tape:` block. This tells TensorFlow to keep track of the operations applied so that it knows how to take the gradient.   
Then, to take the gradient of the loss w.r.t. weights we use `grads = tape.gradient(loss, self.weights)`, and to subtract the gradient multiplied with the learning rate we use `self.weights.assign_sub(learning_rate*grads)`.

```python
def __sgd(self, x, y, loss_fn, learning_rate, iterations, batch_size):
    rows, cols = x.shape
    self.weights = tf.Variable(tf.random.normal(stddev=1.0/cols, shape=(cols, 1)))
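    # shuffle() returns a new dataset, so it must be part of the pipeline;
    # placed before batch(), it reshuffles the examples on every pass
    dataset = (tf.data.Dataset.from_tensor_slices((x, y))
               .shuffle(buffer_size=rows)
               .batch(batch_size))

    for i in range(iterations):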
        for step, (xb, yb) in enumerate(dataset):
            with tf.GradientTape() as tape:
                loss = loss_fn(xb, yb)
            grads = tape.gradient(loss, self.weights)
            self.weights.assign_sub(learning_rate*grads)
```
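
To see `tf.GradientTape` and `assign_sub()` in isolation, here is a minimal sketch of a single gradient descent step on a toy one-parameter problem. The loss `w ** 2`, the starting value, and the learning rate are assumptions chosen only for illustration; they are not part of the `LogisticRegression` class.

```python
import tensorflow as tf

# Toy setup (assumption for illustration): minimize loss(w) = w^2
w = tf.Variable(3.0)
learning_rate = 0.1

with tf.GradientTape() as tape:
    # Operations on trainable variables are recorded by the tape
    loss = w ** 2

# d(w^2)/dw = 2w, which is 6.0 at w = 3.0
grad = tape.gradient(loss, w)

# One update step: w <- w - learning_rate * grad
w.assign_sub(learning_rate * grad)

print(float(grad.numpy()), float(w.numpy()))
```

After this one step, `w` has moved from 3.0 towards the minimum at 0, which is exactly what happens to `self.weights` for every batch.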

For `'ols_sgd'` and `'mle_sgd'` we'll create 2 private methods: `__sse_loss()` and `__mle_loss()` that compute and return the loss value for these 2 different techniques.   
For these 2 methods, we simply apply the formulas for f and h using TensorFlow’s math functions.   
```python
def __sse_loss(self, xb, yb):
    yb = tf.math.maximum(self.EPS, tf.math.minimum(tf.cast(yb, tf.float32), 1-self.EPS))
    ols_yb = -tf.math.log(tf.math.divide(1, yb) - 1)

    diff = tf.linalg.matmul(xb, self.weights) - ols_yb
    loss = tf.linalg.matmul(diff, diff, transpose_a=True)

    return loss

def __mle_loss(self, xb, yb):
    xw = tf.linalg.matmul(xb, self.weights)
    term1 = tf.linalg.matmul(tf.cast(1-yb, tf.float32), xw, transpose_a=True)
    term2 = tf.linalg.matmul(
        tf.ones_like(yb, tf.float32),
        tf.math.log(1+tf.math.exp(-xw)),
        transpose_a=True)
    return term1+term2
```
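
The expression in `__mle_loss()` may not look like the familiar negative log-likelihood, so here is a small numerical check that the two agree. It is a sketch in NumPy on made-up random data (both assumptions for illustration), with `h` following the same two terms as the method above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up batch: 6 observations, 3 features, random 0/1 labels
x = rng.normal(size=(6, 3))
y = rng.integers(0, 2, size=(6, 1)).astype(float)
w = rng.normal(size=(3, 1))

xw = x @ w

# Same two terms as __mle_loss(): (1 - y)^T Xw + sum(log(1 + exp(-Xw)))
h = ((1 - y).T @ xw + np.sum(np.log1p(np.exp(-xw)))).item()

# Textbook negative log-likelihood using the predicted probabilities
p = 1.0 / (1.0 + np.exp(-xw))
nll = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

print(h, nll)
```

Both numbers should match up to floating-point error, which is why minimizing `h` is the same as maximizing the likelihood.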

So, when `fit()` is called with `method='ols_solve'` we call `__ols_solve()`, when `method='ols_sgd'` we call `__sgd()` with `loss_fn=self.__sse_loss`, and when `method='mle_sgd'` we call `__sgd()` with `loss_fn=self.__mle_loss`.   
```python
def fit(self, x, y, method, learning_rate=0.001, iterations=500, batch_size=32):
    x = tf.concat([x, tf.ones_like(y, dtype=tf.float32)], axis=1)
    if method == "ols_solve":
        self.__ols_solve(x, y)
    elif method == "ols_sgd":
        self.__sgd(x, y, self.__sse_loss, learning_rate, iterations, batch_size)
    elif method == "mle_sgd":
        self.__sgd(x, y, self.__mle_loss, learning_rate, iterations, batch_size)
    else:
        print(f'Unknown method: \'{method}\'')

    return self
```
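
Notice that `fit()` appends a column of ones to `x` before doing anything else. This is a common way of folding the intercept (bias) into the weight vector: the last weight then plays the role of the bias. The sketch below illustrates the idea with NumPy and hand-picked numbers, both assumptions for illustration.

```python
import numpy as np

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Append a column of ones, as fit() and predict() do
x_aug = np.concatenate([x, np.ones((x.shape[0], 1))], axis=1)

# Hand-picked weights: two feature weights followed by the bias
w = np.array([[0.5], [-1.0], [2.0]])

with_ones = x_aug @ w              # bias handled by the extra column
explicit = x @ w[:-1] + w[-1]      # bias added explicitly

print(np.allclose(with_ones, explicit))
```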

In `predict()` we first check whether `fit()` was called previously by looking for the `weights` attribute (the `fit()` method is the only method that creates it). Then we check whether the shapes of the input matrix x and the weights vector allow multiplication. If either check fails, we print an error message and return. If everything is OK, we do the multiplication and pass the result through the logistic function.   
```python
def predict(self, x):
    if not hasattr(self, 'weights'):
        print('Cannot predict. You should call the .fit() method first.')
        return

    x = tf.concat([x, tf.ones((x.shape[0], 1), dtype=tf.float32)], axis=1)

    if x.shape[1] != self.weights.shape[0]:
        print(f'Shapes do not match. {x.shape[1]} != {self.weights.shape[0]}')
        return

    xw = tf.linalg.matmul(x, self.weights)
    return tf.math.divide(1, 1+tf.math.exp(-xw))
```

In `accuracy()` we make predictions using the above method. Then we check whether the shape of the predictions matches that of the true labels; if it doesn't, we show an error message. After that, we make sure that both the predictions and the true labels have values of either 0 or 1 by a simple rule: if the value is >= 0.5, consider it a 1, otherwise a 0.   
To compute the accuracy, we check for equality between y and y_hat, which returns a vector of Boolean values. We cast these Booleans to float (False becomes 0.0, and True becomes 1.0), and the accuracy is simply the mean of these values.   
```python
def accuracy(self, x, y):
    y_hat = self.predict(x)
    # predict() prints an error and returns None if it cannot run
    if y_hat is None:
        return

    if y.shape != y_hat.shape:
        print('Error! Predictions don\'t have the same shape as given y')
        return

    zeros, ones = tf.zeros_like(y), tf.ones_like(y)
    y = tf.where(y >= 0.5, ones, zeros)
    y_hat = tf.where(y_hat >= 0.5, ones, zeros)

    return tf.math.reduce_mean(tf.cast(y == y_hat, tf.float32))
```
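
As a quick worked example of the thresholding rule and the mean of matches, here is a sketch in NumPy with four hand-picked predictions. The numbers are made up for illustration.

```python
import numpy as np

y = np.array([[1], [0], [1], [1]])              # true labels
y_hat = np.array([[0.9], [0.4], [0.3], [0.6]])  # predicted probabilities

# Threshold at 0.5: values >= 0.5 become 1, the rest become 0
pred = (y_hat >= 0.5).astype(int)

# Fraction of predictions that match the true labels
acc = float(np.mean(pred == y))

print(pred.ravel(), acc)
```

Three of the four thresholded predictions match the labels, so the accuracy is 0.75.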

Here is the full code of the `LogisticRegression` class:

In [None]:
class LogisticRegression:
    EPS = 1e-5
    def __ols_solve(self, x, y):
        rows, cols = x.shape
        if rows >= cols == tf.linalg.matrix_rank(x):
            y = tf.math.maximum(self.EPS, tf.math.minimum(tf.cast(y, tf.float32), 1-self.EPS))
            ols_y = -tf.math.log(tf.math.divide(1, y) - 1)
            self.weights = tf.linalg.matmul(
                tf.linalg.matmul(
                    tf.linalg.inv(
                        tf.linalg.matmul(x, x, transpose_a=True)
                    ),
                    x, transpose_b=True),
                ols_y)
        else:
            print('Error! X has not full column rank.')
    
    def __sgd(self, x, y, loss_fn, learning_rate, iterations, batch_size):
        rows, cols = x.shape
        self.weights = tf.Variable(tf.random.normal(stddev=1.0/cols, shape=(cols, 1)))
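        # shuffle() returns a new dataset, so it must be part of the pipeline;
        # placed before batch(), it reshuffles the examples on every pass
        dataset = (tf.data.Dataset.from_tensor_slices((x, y))
                   .shuffle(buffer_size=rows)
                   .batch(batch_size))
        
        for i in range(iterations):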
            for step, (xb, yb) in enumerate(dataset):
                with tf.GradientTape() as tape:
                    loss = loss_fn(xb, yb)
                grads = tape.gradient(loss, self.weights)
                self.weights.assign_sub(learning_rate*grads)
    
    def __sse_loss(self, xb, yb):
        yb = tf.math.maximum(self.EPS, tf.math.minimum(tf.cast(yb, tf.float32), 1-self.EPS))
        ols_yb = -tf.math.log(tf.math.divide(1, yb) - 1)
        
        diff = tf.linalg.matmul(xb, self.weights) - ols_yb
        loss = tf.linalg.matmul(diff, diff, transpose_a=True)
        
        return loss
    
    def __mle_loss(self, xb, yb):
        xw = tf.linalg.matmul(xb, self.weights)
        term1 = tf.linalg.matmul(tf.cast(1-yb, tf.float32), xw, transpose_a=True)
        term2 = tf.linalg.matmul(
            tf.ones_like(yb, tf.float32),
            tf.math.log(1+tf.math.exp(-xw)),
            transpose_a=True)
        return term1+term2
    
    def fit(self, x, y, method, learning_rate=0.001, iterations=500, batch_size=32):
        x = tf.concat([x, tf.ones_like(y, dtype=tf.float32)], axis=1)
        if method == "ols_solve":
            self.__ols_solve(x, y)
        elif method == "ols_sgd":
            self.__sgd(x, y, self.__sse_loss, learning_rate, iterations, batch_size)
        elif method == "mle_sgd":
            self.__sgd(x, y, self.__mle_loss, learning_rate, iterations, batch_size)
        else:
            print(f'Unknown method: \'{method}\'')
        
        return self
    
    def predict(self, x):
        if not hasattr(self, 'weights'):
            print('Cannot predict. You should call the .fit() method first.')
            return
        
        x = tf.concat([x, tf.ones((x.shape[0], 1), dtype=tf.float32)], axis=1)
        
        if x.shape[1] != self.weights.shape[0]:
            print(f'Shapes do not match. {x.shape[1]} != {self.weights.shape[0]}')
            return
        
        xw = tf.linalg.matmul(x, self.weights)
        return tf.math.divide(1, 1+tf.math.exp(-xw))
    
    def accuracy(self, x, y):
        y_hat = self.predict(x)
        # predict() prints an error and returns None if it cannot run
        if y_hat is None:
            return
        
        if y.shape != y_hat.shape:
            print('Error! Predictions don\'t have the same shape as given y')
            return
        
        zeros, ones = tf.zeros_like(y), tf.ones_like(y)
        y = tf.where(y >= 0.5, ones, zeros)
        y_hat = tf.where(y_hat >= 0.5, ones, zeros)
        
        return tf.math.reduce_mean(tf.cast(y == y_hat, tf.float32))

Now we want to see how our `LogisticRegression` class performs on this heart disease dataset.

In [None]:
def print_acc(model):
    print(f'Train accuracy = {model.accuracy(x_train, y_train)} ; '+
          f'Test accuracy = {model.accuracy(x_test, y_test)}')

### Using 'ols_solve' method

In [None]:
lr_ols_solve = LogisticRegression().fit(x_train, y_train, 'ols_solve')
print_acc(lr_ols_solve)

### Using 'ols_sgd' method

In [None]:
lr_ols_sgd = LogisticRegression().fit(x_train, y_train, 'ols_sgd')
print_acc(lr_ols_sgd)

### Using 'mle_sgd' method

In [None]:
lr_mle_sgd = LogisticRegression().fit(x_train, y_train, 'mle_sgd')
print_acc(lr_mle_sgd)