# Binary cross-entropy loss function

The binary cross-entropy function, which will be used throughout the analysis, is defined as follows:

$$
L(y, z) = -y \log(\sigma(z)) - (1 - y) \log(1 - \sigma(z)),
$$
where $y \in \{0, 1\}$ is the label, $z$ is the raw score, and $\sigma(z) = \frac{1}{1+e^{-z}}$ is the *sigmoid* function.
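The definition can also be evaluated without forming $\sigma(z)$ explicitly, since $-\log\sigma(z) = \log(1 + e^{-z})$ and $-\log(1 - \sigma(z)) = \log(1 + e^{z})$. The following is a minimal sketch assuming NumPy; the names `bce_naive` and `bce_stable` are illustrative and not part of the original analysis.

```python
import numpy as np

def bce_naive(y, z):
    # Direct transcription of the definition of L(y, z).
    s = 1.0 / (1.0 + np.exp(-z))
    return -y * np.log(s) - (1.0 - y) * np.log(1.0 - s)

def bce_stable(y, z):
    # Same quantity written with log(1 + e^{-z}) and log(1 + e^{z}),
    # which np.logaddexp evaluates without overflow for large |z|.
    return y * np.logaddexp(0.0, -z) + (1.0 - y) * np.logaddexp(0.0, z)

z = np.linspace(-5.0, 5.0, 11)
for y in (0.0, 1.0):
    # Both forms should agree on moderate scores.
    assert np.allclose(bce_naive(y, z), bce_stable(y, z))
```

The two forms agree on moderate scores; the second is a possible choice when $|z|$ is large enough that `np.exp` would overflow.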

We will usually treat the loss as a function of $z$, with $y$ as a parameter:
$$
g_y(z) = L(y, z).
$$

We start by computing the derivative of $g_y$:
$$
g_y'(z) = \frac{\partial}{\partial z}{L(y, z)} = -y\frac{\sigma'(z)}{\sigma(z)} - (1-y)\frac{-\sigma'(z)}{1-\sigma(z)} = -y\frac{\sigma'(z)}{\sigma(z)} + (1-y)\frac{\sigma'(z)}{1-\sigma(z)}
$$
We have:
$$\frac{\sigma'(z)}{\sigma(z)} = \frac{-\frac{-e^{-z}}{(1+e^{-z})^2}}{\frac{1}{1+e^{-z}}} = \frac{e^{-z}}{1+e^{-z}} = \frac{1}{1+e^z} = \sigma(-z)
$$
and also:
$$\frac{\sigma'(z)}{1-\sigma(z)} = \frac{-\frac{-e^{-z}}{(1+e^{-z})^2}}{1-\frac{1}{1+e^{-z}}} = \frac{\frac{e^{-z}}{(1+e^{-z})^2}}{\frac{e^{-z}}{1+e^{-z}}} = \frac{1}{1+e^{-z}} = \sigma(z)$$
Therefore we obtain the following result:
$$
g_y'(z) = \frac{\partial}{\partial z}{L(y, z)} = -y\frac{\sigma'(z)}{\sigma(z)} - (1-y)\frac{-\sigma'(z)}{1-\sigma(z)} = -y\sigma(-z) + (1-y)\sigma(z) = \sigma(z) - y(\sigma(z) + \sigma(-z))
$$
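The closed-form derivative above can be sanity-checked against a central finite difference. The sketch below assumes NumPy; the helper names `g` and `g_prime`, the grid of $z$ values, and the step size `h` are illustrative choices. It also checks the simplification that follows from $\sigma(z) + \sigma(-z) = 1$, namely $g_y'(z) = \sigma(z) - y$.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def g(y, z):
    # g_y(z) = L(y, z)
    return -y * np.log(sigma(z)) - (1.0 - y) * np.log(1.0 - sigma(z))

def g_prime(y, z):
    # Closed form derived in the text.
    return sigma(z) - y * (sigma(z) + sigma(-z))

z = np.linspace(-3.0, 3.0, 13)
h = 1e-6  # finite-difference step (an arbitrary small value)
for y in (0.0, 1.0):
    numeric = (g(y, z + h) - g(y, z - h)) / (2.0 * h)
    assert np.allclose(numeric, g_prime(y, z), atol=1e-6)
    # sigma(z) + sigma(-z) = 1, so the derivative reduces to sigma(z) - y.
    assert np.allclose(g_prime(y, z), sigma(z) - y)
```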
Our aim is to find the constant $\lambda$ that minimizes the total loss over the labels $y_1, \dots, y_n$:
$$
\lambda^* = \arg\min_{\lambda} \sum_{i=1}^{n} L(y_i, \lambda).
$$
Writing $m = \sum_{i=1}^{n}{y_i}$ for the number of positive labels and $k = n - m$ for the number of negative labels, we can write:
$$
\frac{\partial}{\partial \lambda}\sum_{i=1}^{n} L(y_i, \lambda) = \sum_{i=1}^{n}{g_{y_i}'(\lambda)} = \sum_{i=1}^{n}{\left(\sigma(\lambda) - y_i(\sigma(\lambda) + \sigma(-\lambda))\right)} = n\sigma(\lambda) - m(\sigma(\lambda) + \sigma(-\lambda)) = k\sigma(\lambda) - m\sigma(-\lambda)
$$
We therefore have to find $\lambda^*$ such that:
$$k\sigma(\lambda^*) - m\sigma(-\lambda^*) = 0$$
$$k\sigma(\lambda^*) = m\sigma(-\lambda^*)$$
$$\frac{\sigma(\lambda^*)}{\sigma(-\lambda^*)} = \frac{m}{k}$$
$$\frac{\frac{1}{1 + e^{-\lambda^*}}}{\frac{1}{1 + e^{\lambda^*}}} = \frac{m}{k}$$
$$\frac{1 + e^{\lambda^*}}{1 + e^{-\lambda^*}} = \frac{m}{k}$$
$$\frac{e^{\lambda^*} + e^{2\lambda^*}}{1 + e^{\lambda^*}} = \frac{m}{k}$$
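Setting $u = e^{\lambda^*}$, we obtain a quadratic equation:
$$u^2 + u\left(1-\frac{m}{k}\right) - \frac{m}{k} = 0$$
The left-hand side factors as $(u + 1)\left(u - \frac{m}{k}\right)$, and $u = e^{\lambda^*} > 0$ rules out the root $u = -1$. Hence $u = \frac{m}{k}$ and
$$\lambda^* = \log\frac{m}{k}.$$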
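The quadratic in $u$ can be solved numerically as a quick check. The sketch below assumes NumPy's `np.roots` and uses the counts $m = 108$ and $k = 892$ from the simulated labels printed later in this notebook; the variable names are illustrative.

```python
import numpy as np

# Counts of "1" and "0" labels, taken from the simulated data in this notebook.
m, k = 108, 892

# Coefficients of u^2 + (1 - m/k) u - m/k = 0, highest degree first.
roots = np.roots([1.0, 1.0 - m / k, -m / k])

# u = e^{lambda*} must be positive, so keep the positive root.
u = max(roots.real)
lam = np.log(u)
print(roots, lam)
```

One root is $-1$, which is not admissible because $u > 0$; the other equals $m/k$, so `lam` agrees with $\log(m/k)$.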

In [None]:
import numpy as np
from scipy.optimize import newton

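def sigma(z: np.ndarray) -> np.ndarray:
    """Sigmoid function 1 / (1 + exp(-z))."""
    return 1. / (1. + np.exp(-z))

# Note the argument order: the code takes (z, y), while the text writes L(y, z).
def L(z: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Binary cross-entropy for score z and label y."""
    return -y * np.log(sigma(z)) - (1. - y) * np.log(1. - sigma(z))

def L_prime(z: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Derivative of L with respect to z."""
    return sigma(z) - y * (sigma(z) + sigma(-z))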

In [30]:
np.random.seed(0)

n = 1000 # number of 0-1 labels
y = np.random.binomial(1, 0.1, n)
m = np.sum(y)
k = n - m
print(f'Number of "1": {m}, number of "0": {k}, total number of labels: {m+k}.')
f = np.random.normal(0, 1, n)

Number of "1": 108, number of "0": 892, total number of labels: 1000.


# Scenario A

In [40]:
# Without a derivative argument, newton() falls back to the secant method.
newton(lambda x: k * sigma(x) - m * sigma(-x), x0=0.0)

-2.1113349054557897
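As a cross-check on the root found above, the summed loss can also be minimized directly instead of solving the stationarity condition. This is a minimal sketch assuming SciPy's `minimize_scalar` with its default settings and the counts $m = 108$, $k = 892$ printed earlier; `total_loss` is an illustrative helper, not part of the original notebook.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Counts of "1" and "0" labels from the simulated data above.
m, k = 108, 892

def total_loss(lam):
    # sum_i L(y_i, lam) = m * log(1 + e^{-lam}) + k * log(1 + e^{lam})
    return m * np.logaddexp(0.0, -lam) + k * np.logaddexp(0.0, lam)

res = minimize_scalar(total_loss)
print(res.x)
```

The minimizer should agree with the root returned by `newton` up to the optimizer's tolerance, and $\sigma$ evaluated at it should match the fraction of positive labels, $m / (m + k)$.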