### LightGBM with Focal Loss for Multiclass classification problems

Let me show how to adapt the Focal Loss implementation for binary classification to a multiclass classification problem.

The idea is to treat the problem as one binary problem per class and use the Binary Cross Entropy With Logits (borrowing the `PyTorch` name, `BCEWithLogitsLoss`).

$$
loss = -[y_{\text{true}} \cdot \log\sigma(x) + (1-y_{\text{true}}) \cdot \log(1-\sigma(x))]
$$

where $\sigma$ is the sigmoid function and $x$ is the raw score (logit).

For example, let's assume we have a problem with 10 classes and two samples/observations:

In [1]:
import numpy as np

# with 10 classes the labels must lie in 0..9, hence choice(10, ...)
y_true = np.random.choice(10, (1,2))
# from -2 to 2 to illustrate that the preds coming from LightGBM when using custom losses are NOT probs
y_pred = np.random.uniform(low=-2, high=2, size=(2, 10))

In [2]:
# labels
y_true

array([[6, 8]])

In [3]:
# predictions
y_pred

array([[ 1.29824142, -1.3932109 , -0.3560161 , -1.83911858,  1.25744599,
         1.6930721 , -0.41016591, -0.76641368,  0.91205306,  1.38041321],
       [ 1.49556617, -1.1040521 ,  0.5010648 , -1.1783269 , -0.63015764,
        -0.56899891, -0.76107954, -1.66164642, -0.23192115, -1.93675266]])

In [4]:
def sigmoid(x): return 1./(1. +  np.exp(-x))

In [5]:
# labels one-hot encoded
y_true_oh = np.eye(10)[y_true][0]

In [6]:
y_true_oh

array([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])

In [7]:
# BCEWithLogitsLoss
( -( y_true_oh * np.log(sigmoid(y_pred)) + (1-y_true_oh) * np.log(1-sigmoid(y_pred)) ) ).mean()

0.7787502349207979
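
The expression above calls `np.log` on a sigmoid output, which can overflow or return `-inf` for very large logits. As a minimal sketch (the helper name `bce_with_logits` is my own, not part of any library), the same quantity can be computed with `np.logaddexp`, using the identities $\log\sigma(x) = -\log(1+e^{-x})$ and $\log(1-\sigma(x)) = -\log(1+e^{x})$:

```python
import numpy as np

def bce_with_logits(x, t):
    # hypothetical helper: element-wise BCE computed directly from logits
    # -log(sigmoid(x))     = logaddexp(0, -x)
    # -log(1 - sigmoid(x)) = logaddexp(0,  x)
    return t * np.logaddexp(0.0, -x) + (1.0 - t) * np.logaddexp(0.0, x)

rng = np.random.default_rng(0)
x = rng.uniform(low=-2, high=2, size=(2, 10))   # raw scores, not probs
t = np.eye(10)[[6, 8]]                          # one-hot labels for classes 6 and 8

# naive version, as in the cell above
p = 1.0 / (1.0 + np.exp(-x))
naive = -(t * np.log(p) + (1 - t) * np.log(1 - p))

stable = bce_with_logits(x, t)
print(np.allclose(naive, stable))

# very large logits stay finite with the logaddexp form
extreme = bce_with_logits(np.array([-800.0, 800.0]), np.array([1.0, 0.0]))
print(np.isfinite(extreme).all())
```

For logits in a moderate range, like the ones used here, both versions agree, so the naive form is fine for illustration.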

### Multiclass Focal Loss 
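
For reference, the per-class loss computed by the function below can be written as follows, with $x$ the raw score, $t \in \{0, 1\}$ the one-hot label and $p = \sigma(x)$. This is my reading of the code, written in the notation of the original paper:

$$
FL(x, t) = -\alpha_t \cdot (1 - p_t)^{\gamma} \cdot \log(p_t)
$$

$$
p_t = t \cdot p + (1-t) \cdot (1-p), \qquad \alpha_t = \alpha \cdot t + (1-\alpha) \cdot (1-t)
$$

Since $t$ is either 0 or 1, $\log(p_t)$ is the same as $t \cdot \log(p) + (1-t) \cdot \log(1-p)$, that is, the `BCEWithLogitsLoss` term from the previous section.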

In [8]:
def focal_loss_lgb(y_pred, dtrain, alpha, gamma, num_class):
    """
    Focal Loss for lightgbm

    Parameters:
    -----------
    y_pred: numpy.ndarray
        array with the predictions
    dtrain: lightgbm.Dataset
    alpha, gamma: float
        See original paper https://arxiv.org/pdf/1708.02002.pdf
    num_class: int
        number of classes
    """
    a,g = alpha, gamma
    y_true = dtrain.label
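    # N observations x num_class arrays
    y_true = np.eye(num_class)[y_true.astype('int')]
    # LightGBM (< 4.0) passes multiclass preds as a flat array grouped by class,
    # i.e. preds[j * num_data + i], so reshape in column-major order to match
    # the column-major flatten applied to grad and hess below
    y_pred = y_pred.reshape(-1, num_class, order='F')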
    # alpha and gamma multiplicative factors with BCEWithLogitsLoss
    def fl(x,t):
        p = 1/(1+np.exp(-x))
        return -( a*t + (1-a)*(1-t) ) * (( 1 - ( t*p + (1-t)*(1-p)) )**g) * ( t*np.log(p)+(1-t)*np.log(1-p) )
    partial_fl = lambda x: fl(x, y_true)
    grad = derivative(partial_fl, y_pred, n=1, dx=1e-6)
    hess = derivative(partial_fl, y_pred, n=2, dx=1e-6)
    # flatten in column-major (Fortran-style) order
    return grad.flatten('F'), hess.flatten('F')
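
`derivative` here is `scipy.misc.derivative`, which, as far as I know, was deprecated and then removed in recent SciPy releases (1.12 onwards). If it is not available, a minimal stand-in can be sketched with plain NumPy. The name `central_derivative` is hypothetical, and the sketch only covers the two cases used above (`n=1` and `n=2`) with a three-point central difference:

```python
import numpy as np

def central_derivative(func, x0, n=1, dx=1e-6):
    """Three-point central difference of func at x0 (element-wise); n=1 or n=2 only."""
    if n == 1:
        return (func(x0 + dx) - func(x0 - dx)) / (2.0 * dx)
    if n == 2:
        return (func(x0 + dx) - 2.0 * func(x0) + func(x0 - dx)) / dx**2
    raise ValueError("only n=1 and n=2 are supported in this sketch")

# quick check on a function with known derivatives: f(v) = v**3
x = np.array([[0.5, -1.0], [2.0, 0.0]])
cube = lambda v: v**3
g = central_derivative(cube, x, n=1, dx=1e-4)   # should be close to 3 * x**2
h = central_derivative(cube, x, n=2, dx=1e-4)   # should be close to 6 * x
print(np.allclose(g, 3 * x**2, atol=1e-6), np.allclose(h, 6 * x, atol=1e-4))
```

Under that assumption, `derivative(partial_fl, y_pred, n=1, dx=1e-6)` could be replaced by `central_derivative(partial_fl, y_pred, n=1, dx=1e-6)`, and likewise for the Hessian. Note that second differences with a very small `dx` are sensitive to floating-point rounding.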

And that's it really. Now one would want/need the corresponding evaluation function.

In [10]:
def focal_loss_lgb_eval_error(y_pred, dtrain, alpha, gamma, num_class):
    """
    Focal Loss evaluation metric for lightgbm

    Parameters:
    -----------
    y_pred: numpy.ndarray
        array with the predictions
    dtrain: lightgbm.Dataset
    alpha, gamma: float
        See original paper https://arxiv.org/pdf/1708.02002.pdf
    num_class: int
        number of classes
    """
    a,g = alpha, gamma
    y_true = dtrain.label
    y_true = np.eye(num_class)[y_true.astype('int')]
    # multiclass preds arrive grouped by class (LightGBM < 4.0), so reshape column-major
    y_pred = y_pred.reshape(-1, num_class, order='F')
    p = 1/(1+np.exp(-y_pred))
    loss = -( a*y_true + (1-a)*(1-y_true) ) * (( 1 - ( y_true*p + (1-y_true)*(1-p)) )**g) * ( y_true*np.log(p)+(1-y_true)*np.log(1-p) )
    # a variant can be np.sum(loss)/num_class
    return 'focal_loss', np.mean(loss), False
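
A quick way to sanity-check the loss expression is to note that with `gamma=0` the modulating factor disappears, and with `alpha=0.5` both classes get the same weight, so the focal loss should equal half of the `BCEWithLogitsLoss`. A minimal sketch of that check (the standalone `focal_loss` helper below is only for illustration and mirrors the expression used above):

```python
import numpy as np

def focal_loss(x, t, alpha, gamma):
    # illustrative helper: element-wise focal loss on raw scores x and one-hot labels t
    p = 1.0 / (1.0 + np.exp(-x))
    p_t = t * p + (1 - t) * (1 - p)                  # probability of the true outcome
    alpha_t = alpha * t + (1 - alpha) * (1 - t)      # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

rng = np.random.default_rng(0)
x = rng.uniform(low=-2, high=2, size=(5, 3))
t = np.eye(3)[rng.integers(0, 3, size=5)]

p = 1.0 / (1.0 + np.exp(-x))
bce = -(t * np.log(p) + (1 - t) * np.log(1 - p))

fl_plain = focal_loss(x, t, alpha=0.5, gamma=0.0)
print(np.allclose(fl_plain, 0.5 * bce))

# with gamma > 0 every element is down-weighted relative to the BCE
fl_focal = focal_loss(x, t, alpha=0.25, gamma=2.0)
print(np.all(fl_focal <= bce))
```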

### EXAMPLE

In [12]:
import numpy as np
import lightgbm as lgb

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# needed by focal_loss_lgb above; requires a SciPy release that still ships scipy.misc.derivative
from scipy.misc import derivative

# not an ideal dataset since it is perfectly balanced, but enough to illustrate
iris = datasets.load_iris()
X_org = iris.data
y_org = iris.target

# shuffle...makes me feel good
x = np.hstack([X_org,y_org.reshape(-1, 1)])
np.random.shuffle(x)

X = x[:, :4]
y = x[:, 4]

In [13]:
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)
lgbtrain = lgb.Dataset(X_tr, y_tr, free_raw_data=True)
lgbeval = lgb.Dataset(X_val, y_val)

In [19]:
focal_loss = lambda x,y: focal_loss_lgb(x, y, 0.25, 2., 3)
eval_error = lambda x,y: focal_loss_lgb_eval_error(x, y, 0.25, 2., 3)
params  = {'learning_rate':0.001, 'num_boost_round':5, 'num_class':3}
# note: the `fobj` argument is accepted by lgb.train in LightGBM < 4.0
# model = lgb.train(params, lgbtrain, fobj=focal_loss)
model = lgb.train(params, lgbtrain, valid_sets=[lgbeval], fobj=focal_loss, feval=eval_error)

[1]	valid_0's focal_loss: 0.101055
[2]	valid_0's focal_loss: 0.101026
[3]	valid_0's focal_loss: 0.100997
[4]	valid_0's focal_loss: 0.100968
[5]	valid_0's focal_loss: 0.100939


In [20]:
accuracy_score(y_val, np.argmax(sigmoid(model.predict(X_val)), axis=1))

0.9333333333333333