
## Logistic Regression the JAX Way

In [1]:
import pandas as pd
import jax
import jax.numpy as jnp
from collections import namedtuple

The goal of this notebook is to demonstrate how to do logistic regression with the JAX library. There are other, likely better, ways to do this, but this is a good starting point. For this example, let's use the Titanic dataset found on Kaggle.

In [2]:
test_data = pd.read_csv("/kaggle/input/titanic/test.csv")
train_data = pd.read_csv("/kaggle/input/titanic/train.csv")

pclasses = train_data['Pclass']
pclasses = jnp.array(pclasses).reshape((-1, 1))

survived = train_data['Survived']
survived = jnp.array(survived).reshape((-1, 1))

Our logistic regression model requires two parameters, the weight $w$ and the bias $b$. We will regress over a single feature, the passenger class number, and predict two classes, so $w$ has shape $(1, 2)$ and $b$ has shape $(2,)$.

In [3]:
LogisticRegressionParams = namedtuple('LogisticRegressionParams', 'w b')
model_params = LogisticRegressionParams(jnp.zeros([1, 2]), jnp.zeros([2]))

First we define the `predict` function. It takes the model parameters `params` and the regressors `x` of a single passenger, and uses them to create a single prediction. The model implemented here uses the `softmax` function to determine a probability distribution over the two possible states, $0$ (dead) and $1$ (living). Specifically, the value $$z = w^\top x + b$$ is computed. In our case, this leaves us with a two-dimensional vector which is fed to the softmax function, mapping $(u,v)$ to $(\frac{\exp{u}}{\exp{u} +\exp{v}}, \frac{\exp{v}}{\exp{u} +\exp{v}})$.

Observe the `@jax.jit` decorator. This tells JAX to just-in-time compile our prediction function. Not all functions can be jitted; see [this](https://jax.readthedocs.io/en/latest/jax-101/02-jitting.html#why-can-t-we-just-jit-everything) for more.

In [4]:
@jax.jit
def predict(params: LogisticRegressionParams, x: jnp.ndarray):
    # x holds the features of one passenger, shape (1,); z holds the two logits, shape (2,)
    z = params.w.transpose() @ x + params.b
    return jax.nn.softmax(z)
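
The two-class softmax described above is equivalent to the more familiar sigmoid form of logistic regression, because the probability of class $1$ depends only on the difference between the two logits. The sketch below checks this numerically. The values of `w`, `b`, and `x` are made up for illustration and are not the trained parameters.

```python
import jax
import jax.numpy as jnp

# Hypothetical parameters, chosen only for illustration.
w = jnp.array([[0.3, -0.3]])   # shape (1, 2): one feature, two classes
b = jnp.array([-0.5, 0.5])     # shape (2,)
x = jnp.array([3.0])           # one passenger travelling in third class

z = w.T @ x + b                # two logits, shape (2,)
probs = jax.nn.softmax(z)      # distribution over (dead, living)

# With two classes, P(class 1) is the sigmoid of the logit difference.
p_living = jax.nn.sigmoid(z[1] - z[0])

print(probs, p_living)
```

Here `probs[1]` and `p_living` agree up to floating-point error, so the softmax parameterization carries one redundant set of weights compared with the sigmoid one.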

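The `vpredict` function shows how we can vectorize functions with JAX using `jax.vmap`, allowing us to compute predictions in batches. With `in_axes=(None, 0)`, the parameters are shared across the batch while `predict` is mapped over the first axis of `regressors`.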

In [5]:
@jax.jit
def vpredict(params, regressors):
    f = jax.vmap(predict, in_axes=(None, 0))
    return f(params, regressors)
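
As a rough illustration of what `jax.vmap` does here, the sketch below compares a vmapped function against an explicit Python loop over the rows of a batch. The helper `predict_one` and the numbers are hypothetical stand-ins for `predict` and the real data.

```python
import jax
import jax.numpy as jnp

def predict_one(w, b, x):
    # Same computation as `predict`, with the parameters passed separately.
    return jax.nn.softmax(w.T @ x + b)

w = jnp.array([[0.3, -0.3]])               # hypothetical weights, shape (1, 2)
b = jnp.array([-0.5, 0.5])                 # hypothetical biases, shape (2,)
batch = jnp.array([[1.0], [2.0], [3.0]])   # three passengers, one feature each

# in_axes=(None, None, 0): keep w and b fixed, map over the rows of `batch`.
batched = jax.vmap(predict_one, in_axes=(None, None, 0))(w, b, batch)

# Reference result from an explicit loop over the rows.
looped = jnp.stack([predict_one(w, b, x) for x in batch])

print(batched.shape)
```

Both approaches return one probability pair per passenger; the vmapped version simply avoids the Python-level loop.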

The `nll` function computes the negative log-likelihood of the data as a function of `params`. That is, it computes the negative logarithm of the probability that the model assigns to the observed data. The `take_along_axis` function is used to index the log-probabilities by label, retrieving the log-probability of the outcome that actually occurred for each passenger. Finally, the mean is taken across the whole batch.

In [6]:
@jax.jit
def nll(params: LogisticRegressionParams, regressors: jnp.ndarray, labels: jnp.ndarray):
    probs = vpredict(params, regressors)  # shape (n, 2)
    log_probs = jnp.log(probs)
    # labels has shape (n, 1) and selects one column per row
    return -jnp.take_along_axis(log_probs, labels, 1).mean()
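
To make the indexing step concrete, here is a small worked example of `jnp.take_along_axis` on made-up probabilities and labels. The arrays are hypothetical and only mirror the shapes used in `nll`.

```python
import jax.numpy as jnp

# Hypothetical per-class probabilities for three passengers.
probs = jnp.array([[0.9, 0.1],
                   [0.4, 0.6],
                   [0.2, 0.8]])
labels = jnp.array([[0], [1], [0]])   # observed classes, shape (3, 1)

# For each row, pick the probability assigned to the observed class.
picked = jnp.take_along_axis(probs, labels, axis=1)   # [[0.9], [0.6], [0.2]]

# Mean negative log-likelihood over the batch.
loss = -jnp.log(picked).mean()
print(picked.ravel(), loss)
```

When logits can be large, `jax.nn.log_softmax` is generally a more numerically stable way to obtain log-probabilities than taking `jnp.log` of the softmax output.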

Now it's time to train the model. This is accomplished by gradient descent. The gradient of the negative log-likelihood function `nll` with respect to the parameters is determined using `jax.grad`. The gradient over the entire training set is computed and the model parameters are updated according to the rule
$$
p \leftarrow p - \eta \nabla_p \ell,
$$
where $p$ denotes the parameters $(w, b)$, $\ell$ is the negative log-likelihood, and $\eta$ is the learning rate (set here to $0.01$).

In [7]:
learning_rate = 1e-2
loss_grad_fn = jax.grad(nll)  # differentiates with respect to the first argument, params
for i in range(1_000):
    if i % 100 == 0:
        print(nll(model_params, pclasses, survived))
    # grads has the same namedtuple structure as model_params
    grads = loss_grad_fn(model_params, pclasses, survived)
    model_params = LogisticRegressionParams(
        model_params.w - learning_rate * grads.w,
        model_params.b - learning_rate * grads.b,
    )

0.6931472
0.6381312
0.6353234
0.63285196
0.6306094
0.6285747
0.62672895
0.6250547
0.6235363
0.62215906
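
One possible variation on the training loop, sketched here under assumptions, is to wrap a single update in a jitted step function that uses `jax.value_and_grad` to get the loss and gradient together and `jax.tree_util.tree_map` to update every parameter at once. The names `Params`, `loss_fn`, and `sgd_step`, as well as the tiny dataset, are hypothetical and stand in for the notebook's own objects.

```python
from collections import namedtuple

import jax
import jax.numpy as jnp

Params = namedtuple("Params", "w b")   # stand-in for LogisticRegressionParams

def loss_fn(params, xs, ys):
    # Mean negative log-likelihood, written with log_softmax for brevity.
    logits = xs @ params.w + params.b                # shape (n, 2)
    log_probs = jax.nn.log_softmax(logits, axis=1)
    return -jnp.take_along_axis(log_probs, ys, axis=1).mean()

@jax.jit
def sgd_step(params, xs, ys, lr):
    # One call returns both the loss value and its gradient.
    loss, grads = jax.value_and_grad(loss_fn)(params, xs, ys)
    # grads mirrors the structure of params, so tree_map pairs them leaf by leaf.
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss

# Made-up data: passenger class -> survived flag.
xs = jnp.array([[1.0], [1.0], [3.0], [3.0]])
ys = jnp.array([[1], [1], [0], [0]])

params = Params(jnp.zeros((1, 2)), jnp.zeros((2,)))
losses = []
for _ in range(200):
    params, loss = sgd_step(params, xs, ys, 1e-2)
    losses.append(float(loss))

print(losses[0], losses[-1])
```

Because the update is expressed over the whole parameter tree, the same step function would keep working if more parameters were added to the namedtuple.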


Now that the model is trained, let's see how well it performs on the training data.

In [8]:
preds = vpredict(model_params, pclasses).argmax(axis=1).reshape((-1, 1))
accuracy = 1.0 - abs(preds-survived).mean()
print(f"accuracy is {100.0*accuracy:.2f}%")

accuracy is 67.90%
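
For labels that are only $0$ or $1$, the accuracy expression used above is the same as the fraction of predictions that match the labels. A minimal check on made-up arrays (the values are hypothetical, not the Titanic data):

```python
import jax.numpy as jnp

preds = jnp.array([[0], [1], [1], [0]])    # made-up predictions
labels = jnp.array([[0], [1], [0], [0]])   # made-up ground truth

acc_abs = 1.0 - jnp.abs(preds - labels).mean()   # form used in the cell above
acc_eq = (preds == labels).mean()                # direct fraction of matches

print(acc_abs, acc_eq)
```

Three of the four made-up predictions match, so both expressions evaluate to 0.75.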


Finally, we generate predictions for the test set and save them to `/kaggle/working/submission.csv`.

In [9]:
pclasses = test_data['Pclass']
pclasses = jnp.array(pclasses).reshape((-1, 1))
test_preds = vpredict(model_params, pclasses).argmax(axis=1).reshape((-1, 1))
test_data['Survived'] = test_preds
test_data['Survived'] = test_data['Survived'].apply(lambda x: int(x))
test_data[['PassengerId', 'Survived']].set_index("PassengerId").to_csv("/kaggle/working/submission.csv")
test_data

| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | | Q | 0 |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | | S | 0 |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | | Q | 0 |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | | S | 0 |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | | S | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 413 | 1305 | 3 | Spector, Mr. Woolf | male | | 0 | 0 | A.5. 3236 | 8.0500 | | S | 0 |
| 414 | 1306 | 1 | Oliva y Ocana, Dona. Fermina | female | 39.0 | 0 | 0 | PC 17758 | 108.9000 | C105 | C | 1 |
| 415 | 1307 | 3 | Saether, Mr. Simon Sivertsen | male | 38.5 | 0 | 0 | SOTON/O.Q. 3101262 | 7.2500 | | S | 0 |
| 416 | 1308 | 3 | Ware, Mr. Frederick | male | | 0 | 0 | 359309 | 8.0500 | | S | 0 |
