# Multiclass logarithmic loss

A quick look at the definition and behavior of the logloss used in the Kaggle Tabular Data competition of May 2021. The logloss uses natural logs rather than log2 (as in https://en.wikipedia.org/wiki/Cross_entropy). In addition, predicted probabilities are clipped to an epsilon value so that the loss never becomes infinite. The definition used in the competition is exactly the one from sklearn. TensorFlow's implementation is similar (with a larger epsilon), as is the one in LightGBM (https://github.com/microsoft/LightGBM/blob/master/src/metric/multiclass_metric.hpp).
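
As a minimal sketch of the definition described above (not the actual code of the competition, sklearn, TensorFlow or LightGBM), the multiclass logloss with natural logs and epsilon clipping can be written in NumPy. The function name `multiclass_logloss` and its parameters are hypothetical, chosen only for illustration; the default `eps=1e-15` is the value used in the calculations in this notebook.

```python
import numpy as np

def multiclass_logloss(y_true, probs, eps=1e-15):
    """Mean negative natural-log probability of the true class.

    y_true: integer class indices, shape (n,)
    probs:  predicted probabilities, shape (n, n_classes)
    eps:    clipping bound that keeps log() finite (assumed default)
    """
    y_true = np.asarray(y_true)
    # Clip so that a predicted probability of 0 does not give an infinite loss.
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    # Select the probability assigned to the true class of each row.
    p_true = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(p_true)))

# Toy data: two samples, three classes.
toy_loss = multiclass_logloss([0, 2], [[0.5, 0.25, 0.25], [0.0, 0.0, 1.0]])
# A single, confidently wrong prediction costs -log(eps).
wrong_loss = multiclass_logloss([1], [[1.0, 0.0]])
print(toy_loss, wrong_loss)
```

Only the probability of the true class contributes to each row's loss, so a confidently wrong row is bounded by `-log(eps)` instead of being infinite.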

# Loading data

In [53]:
import pandas as pd
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
train['target']

0        Class_2
1        Class_1
2        Class_1
3        Class_4
4        Class_2
          ...   
99995    Class_1
99996    Class_2
99997    Class_3
99998    Class_2
99999    Class_3
Name: target, Length: 100000, dtype: object

In [6]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(train['target'].unique())
le.classes_

array(['Class_1', 'Class_2', 'Class_3', 'Class_4'], dtype=object)

In [7]:
target = le.transform(train['target'])
target

array([1, 0, 0, ..., 2, 1, 2])

# Submission

In [71]:
def submit(test_pred, filename):
    submission = pd.DataFrame(test_pred, columns=le.classes_)
    submission.insert(0, 'id', test['id'])
    submission.to_csv(filename, index=False)

# Baseline model: majority class

In [72]:
from sklearn.metrics import log_loss

In [73]:
import numpy as np
labels, frequencies = np.unique(target, return_counts=True)
(labels, frequencies)

(array([0, 1, 2, 3]), array([ 8490, 57497, 21420, 12593]))

In [74]:
frequencies/len(train)

array([0.0849 , 0.57497, 0.2142 , 0.12593])

Calculated value:
* y = class 1 (57.5% of cases): logloss = -log(1) = 0
* y != class 1 (42.5% of cases): logloss = -log(10^-15) = 34.5

In [75]:
pred = [[0.0,1.0,0.0,0.0]]*len(train)

In [76]:
import math
print(f'logloss:{-math.log(10**-15)*(1-frequencies[1]/len(train))}')

logloss:14.68001613112889


Calculate with sklearn

In [77]:
log_loss(target, pred, labels=[0,1,2,3])

14.68001613112889

Calculate with TensorFlow, which clips at the Keras backend epsilon (10^-7 by default).

In [78]:
import math
print(f'logloss:{-math.log(10**-7)*(1-frequencies[1]/len(train))}')

logloss:6.850674194526815


In [79]:
from tensorflow.keras.losses import SparseCategoricalCrossentropy
scc = SparseCategoricalCrossentropy()
scc(target, pred).numpy()

6.8506813
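
The gap between the sklearn and TensorFlow results comes from the clipping value alone, since each fully wrong row costs -log(epsilon). The following sketch, which copies the class counts printed earlier and uses hypothetical variable names, reproduces both hand-calculated numbers side by side:

```python
import math

# Class counts copied from the np.unique output earlier in this notebook.
frequencies = [8490, 57497, 21420, 12593]
n = sum(frequencies)
# Share of rows whose true class is not the majority class (class 1).
wrong_fraction = 1 - frequencies[1] / n

# Mean logloss of the "always class 1" prediction for each clipping value.
losses = {eps: -math.log(eps) * wrong_fraction for eps in (1e-15, 1e-7)}

for eps, loss in losses.items():
    print(f'eps={eps:g}: per-row loss {-math.log(eps):.2f}, mean logloss {loss:.4f}')
```

The small remaining difference from the TensorFlow output is plausibly due to float32 arithmetic, though that is an assumption rather than something checked here.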

## Check with leaderboard

In [80]:
test_pred = [[0.0,1.0,0.0,0.0]]*len(test)
submit(test_pred, 'baseline_majority.csv')

The leaderboard score is 14.62209, which is close to the 14.68 computed on the training data.

# Baseline model: a priori probabilities

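Calculated value:
* y = class 0 (8.49% of cases): logloss = -log(0.0849) = 2.47
* y = class 1 (57.5% of cases): logloss = -log(0.57497) = 0.55
* etc.

Predicting the class frequencies for every row is the best constant prediction: the expected logloss is then the entropy of the class distribution.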
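
To illustrate the claim above: for a constant prediction q, the expected logloss is -sum_k p_k log(q_k), where p holds the true class frequencies, and this is smallest when q = p. The sketch below compares the class frequencies against two alternatives. The helper `constant_logloss` and the `shifted` vector are hypothetical, picked only for this comparison.

```python
import numpy as np

# Class frequencies copied from the output earlier in this notebook.
p = np.array([0.0849, 0.57497, 0.2142, 0.12593])

def constant_logloss(q, p=p):
    """Expected logloss when every row receives the same prediction q."""
    return float(-np.sum(p * np.log(q)))

uniform = np.full(4, 0.25)                    # no prior knowledge
shifted = np.array([0.10, 0.55, 0.22, 0.13])  # arbitrary nearby alternative

loss_prior = constant_logloss(p)              # entropy of p
loss_uniform = constant_logloss(uniform)      # equals log(4)
loss_shifted = constant_logloss(shifted)
print(loss_prior, loss_shifted, loss_uniform)
```

Any constant prediction other than the class frequencies gives a strictly larger expected loss, which is why this baseline is a useful lower bar for real models.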

In [81]:
pred = frequencies/len(train)

In [82]:
print(f'logloss:{-np.sum(pred * np.log(pred))}')

logloss:1.118576829482654


In [83]:
pred = np.tile(frequencies/len(train), (len(train),1))
pred

array([[0.0849 , 0.57497, 0.2142 , 0.12593],
       [0.0849 , 0.57497, 0.2142 , 0.12593],
       [0.0849 , 0.57497, 0.2142 , 0.12593],
       ...,
       [0.0849 , 0.57497, 0.2142 , 0.12593],
       [0.0849 , 0.57497, 0.2142 , 0.12593],
       [0.0849 , 0.57497, 0.2142 , 0.12593]])

In [84]:
log_loss(target, pred, labels=[0,1,2,3])

1.1185768294826535

In [85]:
from tensorflow.keras.losses import SparseCategoricalCrossentropy
scc = SparseCategoricalCrossentropy()
scc(target, pred).numpy()

1.1185764074325562

## Check with leaderboard

In [86]:
# Use the training-set class frequencies (divided by len(train)) so each row sums to 1
test_pred = np.tile(frequencies/len(train), (len(test),1))
submit(test_pred, 'baseline_apriori.csv')