**NOTE: This notebook is written for the Google Colab platform. However it can also be run (possibly with minor modifications) as a standard Jupyter notebook.** 



In [None]:
#@title -- Installation of Packages -- { display-mode: "form" }
import sys
!{sys.executable} -m pip install git+https://github.com/michalgregor/class_utils.git

In [None]:
#@title -- Import of Necessary Packages -- { display-mode: "form" }
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OrdinalEncoder, KBinsDiscretizer
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, mean_absolute_error
from torch.autograd import Variable
import matplotlib.pyplot as plt
import torch

In [None]:
#@title -- Downloading Data -- { display-mode: "form" }
from class_utils.download import download_file_maybe_extract
download_file_maybe_extract("https://www.dropbox.com/s/p5q7gzupa2ndw55/sigmoid_regression_data.csv?dl=1", directory="data")

# also create a directory for storing any outputs
import os
os.makedirs("output", exist_ok=True)

## Sigmoid Regression using PyTorch's Autodiff

Having shown how autodiff works in PyTorch, we can now apply it to sigmoid regression, a problem we have already solved using symbolic gradients. We will start by loading the corresponding dataset from a CSV file.



In [None]:
#@title -- Loading and Preprocessing the Data: X_train, Y_train, X_test, Y_test -- { display-mode: "form" }
df = pd.read_csv("data/sigmoid_regression_data.csv")

kbins = KBinsDiscretizer(6, encode='ordinal')
y_stratify = kbins.fit_transform(df[['y']])

df_train, df_test = train_test_split(
    df, stratify=y_stratify, test_size=0.3, random_state=4)

plt.scatter(df_train['x'], df_train['y'], marker='x', label="training data")
plt.scatter(df_test['x'], df_test['y'], c='r', label="testing data")
plt.xlabel('x')
plt.ylabel('y')
plt.grid(ls='--')
plt.legend()

categorical_inputs = []
numeric_inputs = ['x']
output = 'y'

input_preproc = make_column_transformer(
    (make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OrdinalEncoder()),
     categorical_inputs),
    
    (make_pipeline(
        SimpleImputer(),
        StandardScaler()),
     numeric_inputs)
)

X_train = input_preproc.fit_transform(df_train[categorical_inputs+numeric_inputs])
Y_train = df_train[[output]].values

X_test = input_preproc.transform(df_test[categorical_inputs+numeric_inputs])
Y_test = df_test[[output]].values

As we know, PyTorch operates on tensors rather than plain NumPy arrays, so we first need to convert our data.



In [None]:
X_train_t = torch.as_tensor(X_train)
Y_train_t = torch.as_tensor(Y_train)
X_test_t = torch.as_tensor(X_test)
Y_test_t = torch.as_tensor(Y_test)
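`torch.as_tensor` avoids copying when it can: for a NumPy array on the CPU it reuses the array's memory and keeps its dtype, which is why our tensors stay `float64`. The short sketch below illustrates this on a small made-up array standing in for `X_train`.

```python
import numpy as np
import torch

# A small float64 array standing in for X_train (hypothetical values).
arr = np.array([[0.5], [-1.0], [2.0]])
t = torch.as_tensor(arr)

# The dtype is preserved: NumPy's float64 becomes torch.float64.
print(t.dtype)  # torch.float64

# No copy is made, so a change to the array is visible through the tensor.
arr[0, 0] = 10.0
print(t[0, 0].item())  # 10.0
```

If an independent copy is needed instead, `torch.tensor(arr)` always copies the data.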

### The Sigmoid Function

Let us recall that the sigmoid curve is defined as follows:
\begin{equation}
\sigma(x) = \frac{1}{1 + e^{-x}}.
\end{equation}

Since we want to be able to shift the sigmoid and change its steepness, we pass the input through a linear transform before applying the sigmoid, and we learn the transform's parameters $a$ and $c$ from data. Our regression model therefore looks as follows:
\begin{align}
u &= ax + c, \\
\sigma(u) &= \frac{1}{1 + e^{-u}}.
\end{align}

Or if we fold it into a single function:
\begin{equation}
f(x, a, c) = \frac{1}{1 + e^{-ax - c}}
\end{equation}
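As a quick numerical sanity check, the sketch below confirms that `torch.sigmoid(a*x + c)` matches the folded formula $f(x, a, c)$, and that the curve passes through $0.5$ at $x = -c/a$, which is how $c$ shifts the curve for a given steepness $a$. The parameter values are hypothetical and chosen only for illustration.

```python
import torch

# Hypothetical parameter values, chosen only for illustration.
a, c = 2.0, -0.5
x = torch.linspace(-3, 3, 7)

# The model written with PyTorch's built-in sigmoid ...
f_builtin = torch.sigmoid(a * x + c)
# ... and the folded formula f(x, a, c) = 1 / (1 + exp(-a x - c)).
f_formula = 1 / (1 + torch.exp(-a * x - c))
print(torch.allclose(f_builtin, f_formula))  # True

# The sigmoid equals 0.5 where u = a x + c = 0, i.e. at x = -c / a.
midpoint = -c / a
mid_val = torch.sigmoid(torch.tensor(a * midpoint + c)).item()
print(midpoint, mid_val)  # 0.25 0.5
```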

Let us now define our regression model using PyTorch operations.



In [None]:
def sigmoid_model(X, a, c):
    return torch.sigmoid(X*a + c)

### The Loss and the Variables

Let us use the mean squared error as our loss function. We can define it as follows:



In [None]:
def compute_loss(Y, y):
    return ((y - Y)**2).mean()
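To make the loss concrete, here is a tiny hand-checkable example with two made-up targets and predictions. Each squared error is $0.5^2 = 0.25$, so their mean is $0.25$ as well. It also shows that PyTorch's built-in `torch.nn.functional.mse_loss` (which takes the prediction first and the target second) gives the same value.

```python
import torch
import torch.nn.functional as F

def compute_loss(Y, y):
    # Mean squared error between targets Y and predictions y.
    return ((y - Y)**2).mean()

# Tiny hypothetical example: two targets and two predictions.
Y = torch.tensor([1.0, 0.0])
y = torch.tensor([0.5, 0.5])

print(compute_loss(Y, y).item())  # 0.25
# The built-in loss agrees (argument order: input, target).
print(F.mse_loss(y, Y).item())  # 0.25
```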

We also need to create the parameters `a` and `c`, marking them with `requires_grad=True` so that PyTorch tracks gradients with respect to them, and to specify the learning rate.



In [None]:
# Initialize the parameters randomly; requires_grad=True makes autograd track them.
a = torch.tensor(np.random.uniform(0, 1), requires_grad=True)
c = torch.tensor(np.random.uniform(0, 1), requires_grad=True)
learning_rate = 0.1
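In current PyTorch versions the `Variable` wrapper is deprecated; tensors themselves support autograd, and what matters is the `requires_grad=True` flag. The minimal sketch below, with made-up values, shows that only a tensor created with this flag receives a gradient after `backward()`.

```python
import torch

# A parameter that PyTorch should track gradients for ...
a = torch.tensor(0.5, requires_grad=True)
# ... and a plain constant that it should not.
x = torch.tensor(3.0)

print(a.is_leaf, a.grad)  # True None

z = a * x        # the forward pass records the operation a * x
z.backward()     # dz/da = x = 3.0
print(a.grad.item())  # 3.0
print(x.grad)         # None: x does not require gradients
```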

### Optimization using Gradient Descent

We can now write a loop that will optimize our model using gradient descent. Recall that the gradient can be computed simply by doing the forward pass and then calling `backward()` on the output, which in our case is the loss.

We also need to make sure that:

* we stop tracking gradients (using `with torch.no_grad():`) while updating the parameters `a` and `c`: otherwise PyTorch would try to record the update in the computational graph, and because `a` and `c` are leaf tensors that require gradients, the in-place update would raise an error;
* we zero out the gradient of each parameter after every update: gradients accumulate, so otherwise the new gradients would simply be added to those from the previous epoch.
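Both points can be observed directly in a small sketch with a single made-up parameter: two backward passes without zeroing add their gradients together, and an in-place update of a leaf tensor that requires gradients fails unless it happens inside `torch.no_grad()`.

```python
import torch

a = torch.tensor(1.0, requires_grad=True)

# 1) Gradients accumulate: two backward passes without zeroing add up.
(2 * a).backward()
(2 * a).backward()
accumulated = a.grad.item()
print(accumulated)  # 4.0, not 2.0

a.grad.zero_()  # reset before the next step
print(a.grad.item())  # 0.0

# 2) An in-place update of a leaf that requires grad fails outside no_grad ...
raised = False
try:
    a -= 0.1
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# ... but works when gradient tracking is switched off.
with torch.no_grad():
    a -= 0.1
print(a.item(), a.requires_grad)
```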


In [None]:
for epoch in range(2500):
    y = sigmoid_model(X_train_t, a, c)
    loss = compute_loss(Y_train_t, y)
    loss.backward()
    
    with torch.no_grad():
        a -= learning_rate * a.grad
        c -= learning_rate * c.grad
        
    a.grad.zero_()
    c.grad.zero_()
    
    if epoch % 100 == 0:
        print("Epoch {}; loss {}.".format(epoch, loss.item()))
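Since we previously solved this problem with symbolic gradients, it can be reassuring to check that `backward()` produces the same numbers as the chain rule. Using $\sigma'(u) = \sigma(u)(1 - \sigma(u))$ and writing $s_i = \sigma(a x_i + c)$, the gradients of the mean squared error are $\partial L / \partial a = \frac{1}{N}\sum_i 2 (s_i - y_i) s_i (1 - s_i) x_i$ and $\partial L / \partial c = \frac{1}{N}\sum_i 2 (s_i - y_i) s_i (1 - s_i)$. The sketch below compares both on randomly generated, hypothetical data standing in for `X_train_t` and `Y_train_t`.

```python
import torch

torch.manual_seed(0)
# Hypothetical data standing in for X_train_t / Y_train_t.
X = torch.randn(20, 1, dtype=torch.float64)
Y = torch.rand(20, 1, dtype=torch.float64)
a = torch.tensor(0.3, dtype=torch.float64, requires_grad=True)
c = torch.tensor(-0.2, dtype=torch.float64, requires_grad=True)

# Autograd: forward pass, then backward on the MSE loss.
s = torch.sigmoid(X * a + c)
loss = ((s - Y)**2).mean()
loss.backward()

# Symbolic gradients from the chain rule (no graph needed here).
with torch.no_grad():
    common = 2 * (s - Y) * s * (1 - s)
    grad_a = (common * X).mean()
    grad_c = common.mean()

print(torch.allclose(a.grad, grad_a), torch.allclose(c.grad, grad_c))  # True True
```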

After training, we can inspect the learned values of `a` and `c` together with the final loss.



In [None]:
print("a = {}\nc = {}\nloss = {}".format(
    a.item(), c.item(), loss.item()))

Let us plot the fitted regression curve together with the training data (note that the inputs on the horizontal axis are standardized).



In [None]:
xx = torch.linspace(-5, 5, 100)
yy = torch.sigmoid(xx*a + c)

plt.plot(xx.detach().numpy(), yy.detach().numpy())
plt.xlabel('x')
plt.ylabel('y')
plt.grid(ls='--')

plt.scatter(X_train, Y_train)
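The test split prepared earlier can be used to check how well the fitted curve generalizes. The sketch below assumes hypothetical learned parameters and a tiny hypothetical held-out set; in the notebook these would be `a`, `c`, `X_test_t` and `Y_test_t`, whose inputs are already standardized by `input_preproc`. It uses the `mean_squared_error` and `mean_absolute_error` functions from scikit-learn, which the notebook already imports.

```python
import numpy as np
import torch
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Hypothetical learned parameters and held-out data (standardized inputs).
a = torch.tensor(1.5, requires_grad=True)
c = torch.tensor(0.2, requires_grad=True)
X_test_t = torch.as_tensor(np.array([[-1.0], [0.0], [1.0]]))
Y_test_t = torch.as_tensor(np.array([[0.2], [0.5], [0.8]]))

# Prediction needs no gradients, so tracking is switched off.
with torch.no_grad():
    Y_pred = torch.sigmoid(X_test_t * a + c)

mse = mean_squared_error(Y_test_t.numpy(), Y_pred.numpy())
mae = mean_absolute_error(Y_test_t.numpy(), Y_pred.numpy())
print("test MSE = {:.4f}; test MAE = {:.4f}".format(mse, mae))
```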

### Using a Built-in Optimizer

Luckily, when using PyTorch, we do not have to write our own optimization procedures by hand: PyTorch has several of the best-known optimizers built in. If we wanted to use the `Adam` optimizer, for instance, we would simply instantiate it with the tensors it is supposed to update and call its `step()` method in each epoch.

Naturally, gradients still need to be zeroed out in each epoch, which is now done using the optimizer's `zero_grad()` method. We do not have to define the mean squared error by hand either: PyTorch provides the most common loss functions, such as `torch.nn.functional.mse_loss`.
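To see that a built-in optimizer packages the same kind of update we wrote by hand, the sketch below compares one manual gradient-descent step with one step of plain `torch.optim.SGD` (no momentum) on a toy one-parameter loss chosen for illustration. Adam would not match exactly, because it rescales each step using running averages of past gradients.

```python
import torch

# One hand-written gradient-descent step ...
a_manual = torch.tensor(0.5, requires_grad=True)
loss = (a_manual * 3.0 - 1.0) ** 2
loss.backward()
with torch.no_grad():
    a_manual -= 0.1 * a_manual.grad

# ... and the same step taken by torch.optim.SGD.
a_opt = torch.tensor(0.5, requires_grad=True)
optimizer = torch.optim.SGD([a_opt], lr=0.1)
optimizer.zero_grad()
loss = (a_opt * 3.0 - 1.0) ** 2
loss.backward()
optimizer.step()

# Both updates move a from 0.5 by -0.1 * 3.0, i.e. to about 0.2.
print(a_manual.item(), a_opt.item())
```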



In [None]:
# Re-initialize the parameters; requires_grad=True makes autograd track them.
a = torch.tensor(np.random.uniform(0, 1), requires_grad=True)
c = torch.tensor(np.random.uniform(0, 1), requires_grad=True)
optimizer = torch.optim.Adam([a, c], lr=0.1)

In [None]:
for epoch in range(500):
    optimizer.zero_grad()
    
    y = sigmoid_model(X_train_t, a, c)
    # mse_loss expects (input, target): predictions first, then targets.
    loss = torch.nn.functional.mse_loss(y, Y_train_t)
    loss.backward()
    
    optimizer.step()
    
    if epoch % 100 == 0:
        print("Epoch {}; loss {}.".format(epoch, loss.item()))

In [None]:
print("a = {}\nc = {}\nloss = {}".format(
    a.item(), c.item(), loss.item()))

And we can again inspect our regression curve.



In [None]:
xx = torch.linspace(-5, 5, 100)
yy = torch.sigmoid(xx*a + c)

plt.plot(xx.detach().numpy(), yy.detach().numpy())
plt.xlabel('x')
plt.ylabel('y')
plt.grid(ls='--')

plt.scatter(X_train, Y_train)