# Cross-entropy loss function
I'd like to implement a function that computes the cross-entropy loss for a given set of training examples and coefficients.  
The function will be defined as `def crossEntropyLoss(X_train, y_train, coeff)`, where  
- `X_train` is a 2D array (a list of rows) of input variables, i.e. the design matrix of the training set; row $i$ is $\mathbf{x}^{(i)}$.
- `y_train` is a 1D array containing the output labels $y^{(i)} \in \{0, 1\}$ of the examples.
- `coeff` is a 1D array containing the coefficients $\mathbf{w}$.
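The `X_train` used in the tests further down has a leading column of ones, which appears to let the first coefficient $w_0$ act as an intercept (bias) term. As a sketch, assuming the raw features start out without that column, a design matrix could be built with a small helper. `add_intercept_column` is a hypothetical name, not part of the original code.

```python
def add_intercept_column(features):
    """Hypothetical helper: prefix every row of `features` with a 1.

    The leading 1 is multiplied by w_0, so w_0 acts as the intercept.
    """
    return [[1] + list(row) for row in features]


raw_features = [[0.1, 0.1], [0.1, 0.2], [0.2, 0.2]]
design_matrix = add_intercept_column(raw_features)
print(design_matrix)  # [[1, 0.1, 0.1], [1, 0.1, 0.2], [1, 0.2, 0.2]]
```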

I'll start by defining a scalar product function to compute each $\mathbf{w}^T\mathbf{x}^{(i)}$.

In [1]:
def scalar_product(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [1,2,3]
b = [4,5,6]
scalar_product(a, b) # 1*4 + 2*5 + 3*6 = 32

32
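Applying `scalar_product` once per row gives every score $\mathbf{w}^T\mathbf{x}^{(i)}$ at once. A minimal sketch, using a hypothetical `linear_scores` helper and the data and weights that appear in the later tests:

```python
def scalar_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_scores(X, w):
    """Hypothetical helper: the list of w^T x^(i) for each row x^(i) of X."""
    return [scalar_product(row, w) for row in X]


X = [[1, 0.1, 0.1], [1, 0.1, 0.2], [1, 0.2, 0.2]]
w = [0.1, 0.2, 0.3]
print(linear_scores(X, w))  # approximately [0.15, 0.18, 0.2]
```

Note that `zip` stops at the shorter of its inputs, so a row whose length differs from `w` is silently truncated rather than raising an error.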

Next, a logistic regression model sets the log-odds (logit) of the positive class equal to the score $\mathbf{w}^T\mathbf{x}^{(i)}$:
$$\text{logit}(p(1|\mathbf{x}^{(i)}, \mathbf{w})) = \mathbf{w}^T\mathbf{x}^{(i)}$$
$$\iff p(1|\mathbf{x}^{(i)}, \mathbf{w}) = \frac{e^{\mathbf{w}^T\mathbf{x}^{(i)}}}{1 + e^{\mathbf{w}^T\mathbf{x}^{(i)}}} $$
$$\iff p(0|\mathbf{x}^{(i)}, \mathbf{w}) = 1 - p(1|\mathbf{x}^{(i)}, \mathbf{w}) = \frac{1}{1 + e^{\mathbf{w}^T\mathbf{x}^{(i)}}} $$
So I'll define a sigmoid function $\sigma(\theta) := \frac{e^\theta}{1 + e^\theta}$ to compute these probabilities.
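Dividing the numerator and denominator of $\sigma$ by $e^\theta$ gives an equivalent form, and the same manipulation shows that the expression for $p(0|\mathbf{x}^{(i)}, \mathbf{w})$ above is $\sigma$ evaluated at $-\mathbf{w}^T\mathbf{x}^{(i)}$:
$$\sigma(\theta) = \frac{e^\theta}{1 + e^\theta} = \frac{1}{1 + e^{-\theta}}, \qquad 1 - \sigma(\theta) = \frac{1}{1 + e^{\theta}} = \frac{e^{-\theta}}{1 + e^{-\theta}} = \sigma(-\theta)$$
so $p(0|\mathbf{x}^{(i)}, \mathbf{w}) = \sigma(-\mathbf{w}^T\mathbf{x}^{(i)})$.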


In [2]:
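from math import exp
def sigmoid_function(theta):
    # Equivalent to e**theta / (1 + e**theta), but exp() never gets a large
    # positive argument, so a large |theta| cannot raise OverflowError
    if theta >= 0:
        return 1 / (1 + exp(-theta))
    z = exp(theta)
    return z / (1 + z)

sigmoid_function(0) # = 0.5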

0.5
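As a numerical sanity check, the sketch below verifies at a few points that the sigmoid inverts the logit (the relation the model is built on) and that $1 - \sigma(\theta) = \sigma(-\theta)$. It uses only the standard library; `logit` is a helper introduced here, not part of the original code.

```python
from math import exp, log, isclose


def sigmoid(theta):
    return 1 / (1 + exp(-theta))


def logit(p):
    """Log-odds ln(p / (1 - p)); defined for 0 < p < 1."""
    return log(p / (1 - p))


for theta in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    # logit undoes sigmoid, matching logit(p(1|x, w)) = w^T x
    assert isclose(logit(sigmoid(theta)), theta, abs_tol=1e-12)
    # p(0|x, w) = 1 - p(1|x, w) = sigmoid(-w^T x)
    assert isclose(1 - sigmoid(theta), sigmoid(-theta), abs_tol=1e-12)
print("identities hold")
```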

Finally, the loss function is defined as the negative log-likelihood
$$ E(\mathbf{w}) = -\mathcal{L}(\mathbf{w}) = -\sum_{i = 1}^{N}{\ln(p_{y^{(i)}})}, $$
where $p_{y^{(i)}} = p(y^{(i)}| \mathbf{x}^{(i)}, \mathbf{w})$ is the probability the model assigns to the observed label $y^{(i)} \in \{0, 1\}$.  
Because $y^{(i)}$ is either $0$ or $1$, exactly one of the two terms below is nonzero for each example, so this is equivalent to
$$ E(\mathbf{w}) = -\sum_{i = 1}^{N}\left[ y^{(i)} \ln(p(1| \mathbf{x}^{(i)}, \mathbf{w})) + (1-y^{(i)}) \ln(1 - p(1| \mathbf{x}^{(i)}, \mathbf{w}))\right]$$
I'll implement the function using this second form.
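To see that the two forms of $E(\mathbf{w})$ agree, the sketch below computes the loss both ways from a list of predicted probabilities $p(1| \mathbf{x}^{(i)}, \mathbf{w})$. The probabilities are made-up illustrative values, not outputs of a fitted model, and both function names are hypothetical.

```python
from math import log, isclose


def loss_pick_label(p1s, ys):
    """First form: -sum of ln p_{y_i}, taking p_1 if y_i == 1 else 1 - p_1."""
    return -sum(log(p1 if y == 1 else 1 - p1) for p1, y in zip(p1s, ys))


def loss_weighted(p1s, ys):
    """Second form: -sum of [y ln p_1 + (1 - y) ln(1 - p_1)]."""
    return -sum(y * log(p1) + (1 - y) * log(1 - p1) for p1, y in zip(p1s, ys))


p1s = [0.9, 0.3, 0.6]  # illustrative values of p(1 | x_i, w)
ys = [1, 0, 1]
print(loss_pick_label(p1s, ys), loss_weighted(p1s, ys))  # both approximately 0.9729
```

One practical difference: the first form only ever takes the log of the observed label's probability, whereas the second evaluates both logs, so it raises an error if $p_1$ is exactly `0.0` or `1.0`, even when that term is multiplied by zero.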

In [3]:
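from math import log
def crossEntropyLoss(X_train, y_train, coeff):
    total = 0  # running log-likelihood
    N = len(X_train)  # number of training examples
    for i in range(N):
        # p(1 | x^(i), w) = sigmoid(w^T x^(i))
        p_1 = sigmoid_function(scalar_product(X_train[i], coeff))
        # y ln p + (1 - y) ln(1 - p) for example i
        total += y_train[i] * log(p_1) + (1 - y_train[i]) * log(1 - p_1)
    return -total  # E(w) is the negative of the summed log-likelihood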

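Given some training data, we can test this function. With weights $\mathbf{w} = \mathbf{0}$, every score $\mathbf{w}^T\mathbf{x}^{(i)}$ is $0$, so the model assigns probability $\sigma(0) = \frac{1}{2}$ to each observed label, whatever it is. Hence, for any design matrix $\mathbf{X} = (\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \mathbf{x}^{(3)})^T$,
$$ \mathbf{y} = \begin{bmatrix} 1 \\0 \\1 \end{bmatrix} \implies E(\mathbf{0}) = -3\ln\left(\frac{1}{2}\right) = 3\ln 2 \approx 2.0794 $$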


In [4]:
X_train = [[1, 0.1, 0.1], [1, 0.1, 0.2], [1, 0.2, 0.2]]
y_train = [1, 0, 1]
coeff = [0, 0, 0]
crossEntropyLoss(X_train, y_train, coeff)

2.0794415416798357

The output matches, so this function can later be used to search for a minimal value of $E(\mathbf{w})$ by varying $\mathbf{w}$. Here is a condensed version of the code, tested with weights $\mathbf{w} = \begin{bmatrix} 0.1 \\0.2 \\0.3 \end{bmatrix}$.


In [5]:
from math import e, log
def crossEntropyLoss(X_train, y_train, coeff):
    total = 0
    N = len(X_train)
    for i in range(N):
        wTx = sum(x * y for x, y in zip(X_train[i], coeff))
        p_1 = (e ** wTx)/(1+ e**wTx)
        total += y_train[i] * log(p_1) + (1 - y_train[i]) * log(1-p_1)
    return -1 * total

X_train = [[1, 0.1, 0.1], [1, 0.1, 0.2], [1, 0.2, 0.2]]
y_train = [1, 0, 1]
coeff = [0.1, 0.2, 0.3]
crossEntropyLoss(X_train, y_train, coeff)

2.006287642011906
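Both implementations above take the log of $\sigma(\mathbf{w}^T\mathbf{x}^{(i)})$ and of $1 - \sigma(\mathbf{w}^T\mathbf{x}^{(i)})$ directly, which can fail for large scores: once $\sigma$ rounds to exactly `1.0` in double precision (for scores of roughly 40 and above), `log(1 - p_1)` raises `ValueError`, even for an example whose $(1 - y^{(i)})$ factor is zero; and the condensed version's `e ** wTx` raises `OverflowError` for scores above about 709.8. One common remedy, sketched here as an alternative rather than the original approach, rewrites each term with the softplus function $\operatorname{softplus}(z) = \ln(1 + e^{z})$, using $-\ln\sigma(z) = \operatorname{softplus}(-z)$ and $-\ln(1 - \sigma(z)) = \operatorname{softplus}(z)$. The name `crossEntropyLossStable` is hypothetical, and labels are assumed to be exactly `0` or `1`.

```python
from math import exp, log1p


def softplus(z):
    """ln(1 + e^z), computed without overflow for large |z|."""
    # Identity: ln(1 + e^z) = max(z, 0) + ln(1 + e^{-|z|}); exp() never sees a positive argument
    return max(z, 0.0) + log1p(exp(-abs(z)))


def crossEntropyLossStable(X_train, y_train, coeff):
    """Cross-entropy E(w) for labels in {0, 1}, computed via softplus."""
    total = 0.0
    for x, y in zip(X_train, y_train):
        wTx = sum(xj * wj for xj, wj in zip(x, coeff))
        # y = 1: -ln sigma(wTx)     = softplus(-wTx)
        # y = 0: -ln(1 - sigma(wTx)) = softplus(wTx)
        total += softplus(-wTx) if y == 1 else softplus(wTx)
    return total


X_train = [[1, 0.1, 0.1], [1, 0.1, 0.2], [1, 0.2, 0.2]]
y_train = [1, 0, 1]
print(crossEntropyLossStable(X_train, y_train, [0, 0, 0]))        # ≈ 2.0794
print(crossEntropyLossStable(X_train, y_train, [0.1, 0.2, 0.3]))  # ≈ 2.0063
print(crossEntropyLossStable(X_train, y_train, [1000, 0, 0]))     # 1000.0, no error
```

On the two test cases above it reproduces the earlier results, and for the extreme weights it returns a finite loss where the direct versions would raise.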