Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Fenchel-Young losses

This package implements loss functions useful for probabilistic classification. More specifically, it provides

  • drop-in replacements for PyTorch loss functions
  • drop-in replacements for TensorFlow loss functions
  • scikit-learn compatible classifiers

The package is based on the Fenchel-Young loss framework [1,2,3].

Tsallis losses

Notice from the center plot that sparsemax and Tsallis are able to produce exactly zero (sparse) probabilities unlike the logistic (softmax) loss.

Supported Fenchel-Young losses

  • Multinomial logistic loss
  • One-vs-all logistic loss
  • Sparsemax loss (sparse probabilities!)
  • Tsallis losses (sparse probabilities!)

Sparse means that some classes have exactly zero probability, i.e., these classes are irrelevant.

Tsallis losses are a family of losses parametrized by a positive real value α. They recover the multinomial logistic loss with α=1 and the sparsemax loss with α=2. Values of α between 1 and 2 enable to interpolate between the two losses.

In all losses above, the ground-truth can either be a n_samples 1d-array of label integers (each label should be between 0 and n_classes-1) or a n_samples x n_classes 2d-array of label proportions (each row should sum to 1).


scikit-learn compatible classifier:

import numpy as np
from sklearn.datasets import make_classification
from fyl_sklearn import FYClassifier

X, y = make_classification(n_samples=10, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)
clf = FYClassifier(loss="sparsemax"), y)

Drop-in replacement for PyTorch losses:

import torch
from fyl_pytorch import SparsemaxLoss

# integers between 0 and n_classes-1, shape = n_samples
y_true = torch.tensor([0, 2])
# model scores, shapes = n_samples x n_classes
theta = torch.tensor([[-2.5, 1.2, 0.5],
                      [2.2, 0.8, -1.5]])
loss = SparsemaxLoss()
# loss value (caution: reversed convention compared to numpy and tensorflow)
print(loss(theta, y_true))
# predictions (probabilities) are stored for convenience
# can also recompute them from theta
# label proportions are also allowed
y_true = torch.tensor([[0.8, 0.2, 0],
                       [0.1, 0.2, 0.7]])
print(loss(theta, y_true))

Drop-in replacement for tensorflow losses:

import tensorflow as tf
from fyl_tensorflow import sparsemax_loss, sparsemax_predict

# integers between 0 and n_classes-1, shape = n_samples
y_true = tf.constant([0, 2])
# model scores, shapes = n_samples x n_classes
theta = tf.constant([[-2.5, 1.2, 0.5],
                     [2.2, 0.8, -1.5]])
# loss value
print(sparsemax_loss(y_true, theta))
# predictions (probabilities)
# label proportions are also allowed
y_true = tf.constant([[0.8, 0.2, 0],
                      [0.1, 0.2, 0.7]])
print(sparsemax_loss(y_true, theta))


The TensorFlow implementation requires the installation of TensorFlow-addons (<>) Simply copy relevant files to your project.


[1]SparseMAP: Differentiable Sparse Structured Inference. Vlad Niculae, André F. T. Martins, Mathieu Blondel, Claire Cardie. In Proc. of ICML 2018. [arXiv]
[2]Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms. Mathieu Blondel, André F. T. Martins, Vlad Niculae. In Proc. of AISTATS 2019. [arXiv]
[3]Learning with Fenchel-Young Losses. Mathieu Blondel, André F. T. Martins, Vlad Niculae. Preprint. [arXiv]


  • Mathieu Blondel, 2018


Probabilistic classification in PyTorch/TensorFlow/scikit-learn with Fenchel-Young losses







No releases published


No packages published