Commit

DOC Expand multilabel in decision function in glossary (scikit-learn#…
lucyleeow authored and glemaitre committed Sep 12, 2022
1 parent 8925933 commit 6558e24
Showing 1 changed file with 13 additions and 12 deletions.
25 changes: 13 additions & 12 deletions doc/glossary.rst
@@ -1177,18 +1177,19 @@ Methods
 predicted class. Columns are ordered according to
 :term:`classes_`.
 multilabel classification
-Scikit-learn is inconsistent in its representation of multilabel
-decision functions. Some estimators represent it like multiclass
-multioutput, i.e. a list of 2d arrays, each with two columns. Others
-represent it with a single 2d array, whose columns correspond to
-the individual binary classification decisions. The latter
-representation is ambiguously identical to the multiclass
-classification format, though its semantics differ: it should be
-interpreted, like in the binary case, by thresholding at 0.
-
-TODO: `This gist
-<https://gist.github.com/jnothman/4807b1b0266613c20ba4d1f88d0f8cf5>`_
-highlights the use of the different formats for multilabel.
+Scikit-learn is inconsistent in its representation of :term:`multilabel`
+decision functions. It may be represented one of two ways:
+
+- List of 2d arrays, each array of shape: (`n_samples`, 2), like in
+multiclass multioutput. List is of length `n_labels`.
+
+- Single 2d array of shape (`n_samples`, `n_labels`), with each
+'column' in the array corresponding to the individual binary
+classification decisions. This is identical to the
+multiclass classification format, though its semantics differ: it
+should be interpreted, like in the binary case, by thresholding at
+0.
+
 multioutput classification
 A list of 2d arrays, corresponding to each multiclass decision
 function.
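
The two multilabel formats described in the glossary entry can be illustrated with a minimal NumPy sketch. It is not taken from the commit: the helper names and the score values are hypothetical, and it assumes that in the list format column 1 of each `(n_samples, 2)` array holds the score for the label being present.

```python
import numpy as np


def predict_from_single_array(scores):
    """Threshold an (n_samples, n_labels) score array at 0, as in the binary case."""
    scores = np.asarray(scores)
    return (scores > 0).astype(int)


def predict_from_list_of_arrays(per_label_scores):
    """Take the higher-scoring column of each (n_samples, 2) array.

    Assumption: column 0 means "label absent", column 1 means "label present".
    """
    columns = [np.argmax(np.asarray(s), axis=1) for s in per_label_scores]
    return np.column_stack(columns)


# Hypothetical decision scores for 3 samples and 2 labels.
single = np.array([[1.5, -0.3],
                   [-2.0, 0.7],
                   [0.4, 0.9]])

# The same decisions expressed as a list of length n_labels,
# each entry of shape (n_samples, 2).
as_list = [np.column_stack([-single[:, j], single[:, j]])
           for j in range(single.shape[1])]

indicator_a = predict_from_single_array(single)
indicator_b = predict_from_list_of_arrays(as_list)
```

Under these assumptions both helpers produce the same `(n_samples, n_labels)` indicator matrix. Applying a row-wise `argmax` to `single`, as one would for a multiclass decision function of the same shape, would instead return a single class per sample, which is the difference in semantics the entry points out.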