### Linear Regression: Squared Error

$J = \sum_n (t_n - y_n)^2$

Squared error assumes Gaussian-distributed errors: the negative log of a Gaussian likelihood is, up to constants, the squared difference between target and output.
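
As a sketch of why this holds, assume each target is the model output plus Gaussian noise with a fixed variance $\sigma^2$, over $N$ examples (the symbols $\sigma$ and $N$ are assumptions introduced here for illustration). The likelihood of one target and the negative log-likelihood of all targets are then:

$$p(t_n \mid y_n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(t_n - y_n)^2}{2\sigma^2}\right)$$

$$-\sum_n \log p(t_n \mid y_n) = \frac{1}{2\sigma^2} \sum_n (t_n - y_n)^2 + \frac{N}{2} \log(2\pi\sigma^2)$$

Since $\sigma$ does not depend on the weights, maximizing the likelihood is the same as minimizing $J$.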

Logistic regression error can't be Gaussian, because:
- The target is only 0 or 1.
- The output is a number between 0 and 1.
- We want $0$ cost when the prediction is correct, $> 0$ cost when it is not, and a bigger cost the more wrong it is.
- "Cross-entropy assumes the same kind of distribution you'd get with a coin toss" (i.e. a Bernoulli distribution).

### Cross-Entropy error

$J = -\{t \log(y) + (1 - t) \log(1 - y)\}$ ($t$ = target, $y$ = output of the logistic)
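
This cost can be read as the negative log-likelihood of a coin toss. A short derivation, assuming $y$ is interpreted as the probability that the target is 1, so that a single target $t \in \{0, 1\}$ follows a Bernoulli distribution:

$$p(t \mid y) = y^{t} (1 - y)^{1 - t}$$

$$-\log p(t \mid y) = -\{t \log(y) + (1 - t) \log(1 - y)\} = J$$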

- Only one of the two terms contributes for a given $t$, since $t$ is either 0 or 1.
- $\log y$ is a number between $-\infty$ and 0, since $y$ is between 0 and 1, so the leading minus sign makes the cost non-negative.
- E.g.:
    - $t = 1, y = 1 \Rightarrow 0$
    - $t = 0, y = 0 \Rightarrow 0$
    - $t = 1, y = 0.9 \Rightarrow 0.11$
    - $t = 1, y = 0.5 \Rightarrow 0.69$
    - $t = 1, y = 0.1 \Rightarrow 2.3$
- But wait, the $J$ above was for a single example. We want this for all predicted examples:
$J = - \sum_n \{t_n \log(y_n) + (1 - t_n) \log(1 - y_n)\}$
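
The single-example costs listed above can be checked numerically. The following is a minimal sketch; the helper name `single_cost` is hypothetical and not part of the notebook.

```python
import numpy as np

def single_cost(t, y):
    """Cross-entropy cost for one example with target t and output y."""
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

# Only outputs strictly inside (0, 1) are used, so np.log never sees 0.
for y in (0.9, 0.5, 0.1):
    print(f"t = 1, y = {y}: cost = {single_cost(1, y):.2f}")
```

The exact cases $y = 0$ and $y = 1$ are left out on purpose: `np.log(0)` is `-inf`, and multiplying it by a zero coefficient gives `nan` rather than 0.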

In [1]:
import numpy as np

N = 100
D = 2

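# Start with N points from a Gaussian centered at 0 (std = 1)
X = np.random.randn(N, D)

# Shift the first 50 points so they are centered at (-2, -2)
X[:50, :] = X[:50, :] - 2 * np.ones((50, D))

# Shift the last 50 points so they are centered at (2, 2)
X[50:, :] = X[50:, :] + 2 * np.ones((50, D))

# Array of targets: first 50 are 0, next 50 are 1
T = np.array([0] * 50 + [1] * 50)

# Prepend a column of ones so the bias is absorbed into the weights
ones = np.array([[1] * N]).T
Xb = np.concatenate((ones, X), axis=1)

# Randomly initialize the weights (bias plus one weight per dimension)
w = np.random.randn(D + 1)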

# Calculate the model output
z = Xb.dot(w)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

Y = sigmoid(z)

In [3]:
def cross_entropy(T, Y):
    return - (T * np.log(Y) + (1 - T) * np.log(1 - Y)).sum()

In [4]:
print(cross_entropy(T, Y))

331.950104435


In [5]:
def cross_entropy2(T, Y):
    # Loop version: each example contributes only the term selected by its target
    E = 0
    for i in range(len(T)):
        if T[i] == 1:
            E -= np.log(Y[i])
        else:
            E -= np.log(1 - Y[i])
    return E

print(cross_entropy2(T, Y))

331.950104435
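
Both versions take the log of a sigmoid output, which breaks down once the sigmoid saturates to exactly 0 or 1 in floating point. One possible workaround, sketched here as an assumption rather than something the notebook does, is to compute the same total directly from the activations $z$ using the identity $-\{t \log \sigma(z) + (1 - t) \log(1 - \sigma(z))\} = \log(1 + e^{z}) - t z$. The name `cross_entropy_from_logits` and the sample values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cross_entropy(T, Y):
    return -(T * np.log(Y) + (1 - T) * np.log(1 - Y)).sum()

def cross_entropy_from_logits(T, z):
    # np.logaddexp(0, z) computes log(1 + exp(z)) without overflow.
    return (np.logaddexp(0, z) - T * z).sum()

# Hypothetical targets and activations, chosen small enough that the
# sigmoid does not saturate and the two versions can be compared.
T_demo = np.array([0, 0, 1, 1])
z_demo = np.array([-3.0, 0.5, -1.0, 2.0])

print(cross_entropy(T_demo, sigmoid(z_demo)))
print(cross_entropy_from_logits(T_demo, z_demo))
```

The two printed values should agree up to floating-point rounding; the second form also stays finite for very large positive or negative activations.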


In [6]:
# Use the closed-form solution and see how well it does.
# It applies here because both classes are Gaussian distributed with
# the same covariance (identity, i.e. variances = 1), so the weights
# depend only on the class means.

# Calculated analytically from the closed-form solution: bias 0, weights (4, 4)
w2 = np.array([0, 4, 4])

z2 = Xb.dot(w2)
Y2 = sigmoid(z2)
print(cross_entropy(T, Y2))
print(cross_entropy2(T, Y2))

0.0830137837387
0.0830137837387


In [None]:
# Notice how low the error is when using the closed form solution.
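
For reference, the values in `w2` can be reproduced from the class means. This is a minimal sketch, assuming the standard result for two Gaussian classes with a shared covariance $\Sigma$ and equal priors: the weights are $\Sigma^{-1}(\mu_1 - \mu_0)$ and the bias is $\frac{1}{2}(\mu_0^T \Sigma^{-1} \mu_0 - \mu_1^T \Sigma^{-1} \mu_1)$. The names `mu0`, `mu1`, `Sigma`, and `w_closed` are introduced here for illustration, and the values are the true means and covariance used to generate the data, not estimates from the samples.

```python
import numpy as np

mu0 = np.array([-2.0, -2.0])  # mean of class 0
mu1 = np.array([2.0, 2.0])    # mean of class 1
Sigma = np.eye(2)             # shared covariance (identity)

Sigma_inv = np.linalg.inv(Sigma)
weights = Sigma_inv.dot(mu1 - mu0)

# The log prior ratio is omitted because both classes have 50 points.
bias = 0.5 * (mu0.dot(Sigma_inv).dot(mu0) - mu1.dot(Sigma_inv).dot(mu1))

# Same layout as w2: bias first, then one weight per input dimension.
w_closed = np.concatenate(([bias], weights))
print(w_closed)
```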