## Notebook for trying out error and accuracy functions

Each cell evaluates a function given a predicted output and a target output.

Typical loss functions (also called “objective functions” or “scoring functions”) include:

- Binary cross-entropy
- Categorical cross-entropy
- Sparse categorical cross-entropy
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- Standard Hinge
- Squared Hinge
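Of the losses listed, sparse categorical cross-entropy differs from categorical cross-entropy only in how the target is stored: one integer class index per row instead of a one-hot vector. A minimal NumPy sketch (the function names and the example data are my own, for illustration) showing that both give the same value:

```python
import numpy as np

def categorical_cross_entropy(tar_one_hot, pred, eps=1e-15):
    """Mean over rows of -sum_k t_k * log(p_k); targets are one-hot rows."""
    pred = np.clip(pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(tar_one_hot * np.log(pred), axis=1)))

def sparse_categorical_cross_entropy(labels, pred, eps=1e-15):
    """Same loss, but targets are integer class indices."""
    pred = np.clip(pred, eps, 1.0)
    picked = pred[np.arange(len(labels)), labels]  # probability of the true class per row
    return float(np.mean(-np.log(picked)))

pred = np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])
labels = np.array([1, 0])
one_hot = np.eye(3)[labels]  # integer labels -> one-hot rows

cce = categorical_cross_entropy(one_hot, pred)
scce = sparse_categorical_cross_entropy(labels, pred)
print(cce, scce)
```

The sparse form skips building the one-hot matrix, which matters when there are many classes.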

```python
import numpy as np
```

#### Kullback–Leibler divergence
Reference: https://math.stackexchange.com/questions/4511868/gradient-of-kl-divergence

```python
pred = np.array([[0., 0.8], [0.3, 0.], [1., 0.], [0.5, 0.]])
tar  = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])

epsilon = 1e-15  # small constant to prevent log(0) and division by zero
# Clip both arrays into [epsilon, 1 - epsilon] so pred / tar and its log stay finite
pred = np.clip(pred, epsilon, 1 - epsilon)
tar = np.clip(tar, epsilon, 1 - epsilon)

# Elementwise gradient of D_KL(pred || tar) = sum(pred * log(pred / tar))
# with respect to pred: d/dp [p * log(p / t)] = log(p / t) + 1
error = np.log(pred / tar) + 1

error
```

```
array([[-33.53877639,  35.31563284],
       [ 34.33480359, -33.53877639],
       [ 35.53877639, -33.53877639],
       [  0.30685282,   1.        ]])
```
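The cell above computes log(pred/tar) + 1, which is the elementwise gradient of D_KL(pred‖tar) with respect to pred. When KL divergence is used as a loss, the usual convention measures how far the predictions are from the targets, D_KL(tar‖pred). A minimal sketch of that loss and its gradient with respect to pred; the function names and the eps clipping are my own choices:

```python
import numpy as np

def kl_divergence(tar, pred, eps=1e-15):
    """Row-wise D_KL(tar || pred) = sum_k t_k * log(t_k / p_k)."""
    # Clip away from 0 so the ratio and the log stay finite
    tar = np.clip(tar, eps, 1.0)
    pred = np.clip(pred, eps, 1.0)
    return np.sum(tar * np.log(tar / pred), axis=1)

def kl_divergence_grad(tar, pred, eps=1e-15):
    """Gradient of D_KL(tar || pred) with respect to pred: -t_k / p_k."""
    tar = np.clip(tar, eps, 1.0)
    pred = np.clip(pred, eps, 1.0)
    return -tar / pred

tar = np.array([[1., 0.], [0.5, 0.5]])
pred = np.array([[0.5, 0.5], [0.5, 0.5]])
kl = kl_divergence(tar, pred)          # first row ~log(2), second row 0
grad = kl_divergence_grad(tar, pred)
print(kl)
print(grad)
```

Unlike the gradient in the cell above, this loss is 0 exactly when the two distributions match and never negative.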

#### My hinge

```python
# Two-class example (overwritten by the three-class example below):
# pred = np.array([[0., 0.8], [0.3, 0.], [1., 0.], [0.5, 0.]])
# tar  = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])

pred = np.array([[0., 0.8, 0.], [0.3, 0., 0.], [0., 0., 0.5], [0.5, 0., 0.]])
tar  = np.array([[0., 1., 0.], [0., 1., 0.], [1., 0., 0.], [0., 1., 0.]])

# Map 0 to -1 so targets are in {-1, 1}; inactive outputs also become -1
pred = np.where(pred == 0, -1, pred)
tar = np.where(tar == 0, -1, tar)

# My hinge: sum of (1 - t * p) over classes, without the max(0, .) clamp
error = np.sum(1 - tar * pred, axis=1)
error
```

```
array([0.2, 3.3, 3.5, 3.5])
```

#### Hinge

"It is intended for use with binary classification where the target values are in the set {-1, 1}.

The hinge loss function encourages examples to have the correct sign, assigning more error when there is a difference in the sign between the actual and predicted class values." 
(https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/)

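```python
pred = np.array([[0., 0.8], [0.3, 0.], [1., 0.], [0.5, 0.]])
tar  = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])

# Map 0 to -1 so targets are in {-1, 1}; inactive outputs also become -1
pred = np.where(pred == 0, -1, pred)
tar = np.where(tar == 0, -1, tar)

# Put a column of zeros next to the per-class terms so the row-wise max
# includes 0, i.e. error = max(0, max_k(1 - t_k * p_k))
zeros = np.zeros_like(tar)
hinge_terms = np.concatenate((zeros, 1 - tar * pred), axis=1)

# og hinge
error = np.max(hinge_terms, axis=1)

error
```

```
array([2. , 2. , 2. , 0.5])
```
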
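❌ Activating an entirely wrong class does raise the error, but this is not reflected in its sign, and every misclassified row gets the same value (2). The 0.5 is also ignored when out = [0.5 0] and tar = [0 1], because the missed target class already contributes the maximum term of 2.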
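The cell above takes the maximum over classes. The more common per-element form averages max(0, 1 - t·y) over the outputs (as far as I know, this is how Keras defines `hinge`, with `squared_hinge` squaring each term first). A sketch of both, reusing the notebook's 0 → -1 mapping for targets and outputs; `to_plus_minus_one` is a hypothetical helper name:

```python
import numpy as np

def to_plus_minus_one(x):
    """Notebook heuristic: map 0 entries to -1 and leave everything else unchanged."""
    return np.where(x == 0, -1.0, x)

def hinge(tar, scores):
    """Row-wise mean of max(0, 1 - t * y); tar must be in {-1, 1}."""
    return np.mean(np.maximum(0.0, 1.0 - tar * scores), axis=1)

def squared_hinge(tar, scores):
    """Row-wise mean of max(0, 1 - t * y)**2; tar must be in {-1, 1}."""
    return np.mean(np.maximum(0.0, 1.0 - tar * scores) ** 2, axis=1)

pred = np.array([[0., 0.8], [0.3, 0.], [1., 0.], [0.5, 0.]])
tar = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])

t, y = to_plus_minus_one(tar), to_plus_minus_one(pred)
print(hinge(t, y))
print(squared_hinge(t, y))
```

With averaging, the second row (out = [0.3 0], tar = [0 1]) scores about 1.65 while the entirely wrong third row scores 2, so the partial wrong-class activation is no longer hidden by the max.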

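#### Mean squared error

```python
pred = np.array([[0., 0.8], [0.9, 0.], [0., 0.5], [0.5, 0.]])
tar  = np.array([[0., 1.], [0., 1.], [1., 0.], [1., 0.]])

# Note: this is the signed sum of differences per row, not the mean of squared differences
print(np.sum(tar - pred, axis=1))
```

```
[0.2 0.1 0.5 0.5]
```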


❌ The error is the same even though the entirely wrong class was activated (the third and fourth rows both give 0.5).
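Summing signed differences lets positive and negative errors cancel (in the second row, the 0.9 on the wrong class offsets most of the missed 1, giving 0.1). Squaring or taking absolute values before averaging keeps every error positive. A minimal sketch of row-wise MSE and MAE on the same arrays:

```python
import numpy as np

def mean_squared_error(tar, pred):
    """Row-wise mean of squared differences."""
    return np.mean((tar - pred) ** 2, axis=1)

def mean_absolute_error(tar, pred):
    """Row-wise mean of absolute differences."""
    return np.mean(np.abs(tar - pred), axis=1)

pred = np.array([[0., 0.8], [0.9, 0.], [0., 0.5], [0.5, 0.]])
tar = np.array([[0., 1.], [0., 1.], [1., 0.], [1., 0.]])

mse = mean_squared_error(tar, pred)
mae = mean_absolute_error(tar, pred)
print(mse)
print(mae)
```

Here the third row (wrong class at 0.5) gets an MSE of 0.625 and the fourth row (right class at 0.5) 0.125, so the two cases are no longer tied.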

#### Cross Entropy Loss

```python
# Two-class example (overwritten by the three-class example below):
# pred = np.array([[0., 0.8], [0.3, 0.], [0., 0.5], [0.5, 0.]])
# tar  = np.array([[0., 1.], [0., 1.], [1., 0.], [1., 0.]])

pred = np.array([[0., 0.8, 0.], [0.3, 0., 0.], [0., 0., 0.5], [0.5, 0., 0.]])
tar  = np.array([[0., 1., 0.], [0., 1., 0.], [1., 0., 0.], [0., 1., 0.]])

epsilon = 1e-15  # small constant to prevent log(0)

# Clip into [epsilon, 1 - epsilon] to avoid log(0); tar is clipped too, so its zeros become epsilon
pred = np.clip(pred, epsilon, 1 - epsilon)
tar = np.clip(tar, epsilon, 1 - epsilon)

num_examples = len(tar)  # each row's loss is normalized by the number of examples
losses = []
for p, t in zip(pred, tar):
    # Per-class terms of the categorical cross-entropy
    print(t * np.log(p))
    loss = -np.sum(t * np.log(p)) / num_examples
    losses.append(loss)


def cross_entropy_loss(p, t):
    # Vectorized version of the loop above: one loss per row
    return -np.sum(t * np.log(p), axis=1) / len(t)


def cross_entropy_loss_prime(p, t):
    # https://shivammehta25.github.io/posts/deriving-categorical-cross-entropy-and-softmax/
    # For softmax outputs, the gradient w.r.t. the logits is the elementwise (p - t);
    # summing over axis=1 collapses it to one number per row.
    return np.sum(p - t, axis=1) / len(t)


print(losses)
print(cross_entropy_loss(pred, tar))
print(cross_entropy_loss_prime(pred, tar))
```

```
[-3.45387764e-14 -2.23143551e-01 -3.45387764e-14]
[-1.20397280e-15 -3.45387764e+01 -3.45387764e-14]
[-3.45387764e+01 -3.45387764e-14 -6.93147181e-16]
[-6.93147181e-16 -3.45387764e+01 -3.45387764e-14]
[0.055785887828569636, 8.634694098727671, 8.634694098727671, 8.634694098727671]
[0.05578589 8.6346941  8.6346941  8.6346941 ]
[-0.05  -0.175 -0.125 -0.125]
```
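For softmax outputs, the derivative of categorical cross-entropy with respect to the logits is the elementwise difference p - t, one entry per class, so it has the same shape as the input. The sketch below assumes the logits go through a softmax first; the notebook's `pred` rows are not softmax outputs, so random logits serve as hypothetical data, and a central finite-difference check confirms the analytic gradient:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def softmax_cross_entropy(z, t):
    """Mean over rows of -sum_k t_k * log(softmax(z)_k)."""
    return -np.sum(t * np.log(softmax(z))) / len(t)

def softmax_cross_entropy_grad(z, t):
    """Gradient w.r.t. the logits z: elementwise (softmax(z) - t) / N, same shape as z."""
    return (softmax(z) - t) / len(t)

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3))    # hypothetical logits
t = np.eye(3)[[1, 1, 0, 1]]    # one-hot targets for classes 1, 1, 0, 1

# Central finite differences, one logit at a time
h = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    for k in range(z.shape[1]):
        dz = np.zeros_like(z)
        dz[i, k] = h
        numeric[i, k] = (softmax_cross_entropy(z + dz, t)
                         - softmax_cross_entropy(z - dz, t)) / (2 * h)

analytic = softmax_cross_entropy_grad(z, t)
print(np.max(np.abs(numeric - analytic)))  # should be tiny
```

Because softmax rows and one-hot rows both sum to 1, summing this gradient over classes, as `cross_entropy_loss_prime` does, gives 0 for every row.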


```python
pred = np.array([[0., 0.8, 0.], [0.3, 0., 0.], [0., 0., 0.5]])
tar  = np.array([[0., 1., 0.], [0., 0.3, 0.], [1., 0., 0.]])

epsilon = 1e-15  # small constant to prevent log(0)

# Clip into [epsilon, 1 - epsilon] so log(p) and log(1 - p) stay finite
pred = np.clip(pred, epsilon, 1 - epsilon)
tar = np.clip(tar, epsilon, 1 - epsilon)

num_examples = len(tar)
losses = []
for p, t in zip(pred, tar):
    # Binary cross-entropy summed over classes: non-target activations add a penalty
    loss = -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p)) / num_examples
    losses.append(loss)
print(losses)
```

```
[0.07438118377142738, 3.572769287470658, 11.743974525156878]
```


❌ Categorical cross-entropy does not take the 0.3 into account when out = [0.3 0] and tar = [0 1].

If the target is zero but the output is still partially activated, that activation is ignored.
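The first cross-entropy cell keeps only the t·log(p) terms, while the second one sums a binary cross-entropy over every class, whose (1 - t)·log(1 - p) term does penalize a partially active wrong class. A side-by-side sketch on two hypothetical rows (values chosen for illustration, not taken from the notebook); averaging over classes rather than dividing by the batch size is my own choice here:

```python
import numpy as np

EPS = 1e-15

def categorical_ce(tar, pred):
    """Row-wise -sum_k t_k * log(p_k): only the target class contributes."""
    pred = np.clip(pred, EPS, 1 - EPS)
    return -np.sum(tar * np.log(pred), axis=1)

def per_class_binary_ce(tar, pred):
    """Row-wise mean of -(t log p + (1 - t) log(1 - p)): every class contributes."""
    pred = np.clip(pred, EPS, 1 - EPS)
    return -np.mean(tar * np.log(pred) + (1 - tar) * np.log(1 - pred), axis=1)

tar = np.array([[0., 1.], [0., 1.]])
pred = np.array([[0.3, 0.7],    # wrong class partially active
                 [0.0, 0.7]])   # wrong class silent

cce = categorical_ce(tar, pred)       # identical for both rows
bce = per_class_binary_ce(tar, pred)  # first row is larger
print(cce)
print(bce)
```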

"Cross-entropy with one-hot encoding implies that the target vector is all 0, except for one 1. So all of the zero entries are ignored and only the entry with 1 is used for updates. You can see this directly from the loss, since 0×log(something positive)=0, implying that only the predicted probability associated with the label influences the value of the loss" (https://stats.stackexchange.com/questions/377966/cross-entropy-loss-for-one-hot-encoding)

#### Coefficient of Determination

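```python
pred = np.array([[1., 0.], [1., 0.], [0., 1.]])
tar  = np.array([[0., 1.], [0., 1.], [1., 0.]])

# Note: with 2-D inputs np.corrcoef treats every row as a separate variable, so
# corr_matrix[0, 1] is the correlation between tar[0] and tar[1], not tar vs. pred
corr_matrix = np.corrcoef(tar, pred)
corr = corr_matrix[0, 1]
R_sq = corr**2

print(R_sq)
```

```
0.9999999999999998
```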


```python
# Same computation with a perfect prediction
pred = np.array([[0., 1.], [0., 1.], [1., 0.]])
tar  = np.array([[0., 1.], [0., 1.], [1., 0.]])

corr_matrix = np.corrcoef(tar, pred)
corr = corr_matrix[0, 1]
R_sq = corr**2

print(R_sq)
```

```
0.9999999999999998
```


❌ R² comes out as ~1 for both the entirely wrong and the perfect prediction, so it does not take the one-hot encoding into account at all.
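Flattening both arrays makes `np.corrcoef` compare targets with predictions, but squaring the correlation still hides its sign: exactly inverted predictions (correlation -1) score the same as perfect ones. The coefficient of determination proper, 1 - SS_res/SS_tot, separates the two cases. A minimal sketch, assuming all entries are pooled into one vector (one choice among several):

```python
import numpy as np

def squared_correlation(tar, pred):
    """Squared Pearson correlation between all target and prediction entries."""
    return np.corrcoef(tar.ravel(), pred.ravel())[0, 1] ** 2

def r_squared(tar, pred):
    """Coefficient of determination 1 - SS_res / SS_tot over all entries."""
    ss_res = np.sum((tar - pred) ** 2)
    ss_tot = np.sum((tar - tar.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

tar = np.array([[0., 1.], [0., 1.], [1., 0.]])
wrong = np.array([[1., 0.], [1., 0.], [0., 1.]])
right = tar.copy()

print(squared_correlation(tar, wrong), squared_correlation(tar, right))  # both ~1
print(r_squared(tar, wrong), r_squared(tar, right))                      # -3.0 and 1.0
```

R² can go below 0 when the predictions fit worse than simply predicting the mean of the targets, which is what the inverted one-hot rows do here.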

#### Accuracy

```python
# First candidate prediction (overwritten by the second one below)
prediction = np.array([[3.5409444e-09, 0.0000000e+00],
                       [3.6693652e-06, 0.0000000e+00],
                       [0.0000000e+00, 9.3634579e-01],
                       [3.8932901e-09, 0.0000000e+00],
                       [4.0344894e-06, 0.0000000e+00],
                       [4.7976482e-01, 0.0000000e+00],
                       [2.8737641e-09, 0.0000000e+00],
                       [2.9779881e-06, 0.0000000e+00],
                       [3.5413006e-01, 0.0000000e+00]])

prediction = np.array([[3.5409444e-09, 0.0000000e+00],
                       [3.6693652e-09, 0.0000000e+00],
                       [0.0000000e+00, 9.9993634579e-01],
                       [3.8932901e-09, 0.0000000e+00],
                       [4.0344894e-06, 0.0000000e+00],
                       [4.7976482e-09, 0.0000000e+00],
                       [2.8737641e-09, 0.0000000e+00],
                       [2.9779881e-09, 0.0000000e+00],
                       [3.5413006e-09, 0.0000000e+00]])

target = np.array([[0., 1.],
                   [0., 1.],
                   [0., 1.],
                   [0., 1.],
                   [0., 1.],
                   [0., 1.],
                   [0., 1.],
                   [0., 1.],
                   [0., 1.]])

# Rounds outputs to 0/1 and compares entry by entry, so correctly predicted zeros
# count as hits: this is the fraction of matching entries, not of correct rows
sample_test_accuracy = target == np.round(prediction, 0)
print(sample_test_accuracy)
sample_test_accuracy = np.mean(sample_test_accuracy)
print(sample_test_accuracy)
```

```
[[ True False]
 [ True False]
 [ True  True]
 [ True False]
 [ True False]
 [ True False]
 [ True False]
 [ True False]
 [ True False]]
0.5555555555555556
```
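Comparing rounded outputs entry by entry counts every correctly predicted 0 as a hit, so the eight rows that miss class 1 entirely still contribute half of their entries to the 0.556. A row-level accuracy instead compares the index of the largest output with the index of the 1 in the target. A sketch, assuming exactly one class per sample and reusing the second prediction above:

```python
import numpy as np

def classification_accuracy(target, prediction):
    """Fraction of rows whose highest-scoring class matches the one-hot target."""
    return np.mean(np.argmax(prediction, axis=1) == np.argmax(target, axis=1))

prediction = np.array([[3.5409444e-09, 0.0],
                       [3.6693652e-09, 0.0],
                       [0.0, 9.9993634579e-01],
                       [3.8932901e-09, 0.0],
                       [4.0344894e-06, 0.0],
                       [4.7976482e-09, 0.0],
                       [2.8737641e-09, 0.0],
                       [2.9779881e-09, 0.0],
                       [3.5413006e-09, 0.0]])
target = np.tile([0., 1.], (9, 1))  # every sample belongs to class 1

acc = classification_accuracy(target, prediction)
print(acc)  # 1 of 9 rows correct
```

With `argmax`, ties (for example an all-zero row) resolve to the first class, which is worth keeping in mind for untrained outputs.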
