Add ignore_class to sparse crossentropy and IoU #16712
Conversation
Thanks for the PR. Wouldn't `ignore_class` be a more explicit / precise name for this argument?
I named it after torch's, hoping that people would make the association transparently. Furthermore, I think caffe called it
Should I use the same parameter order in the modules (losses, metrics)?
I fixed it. Let me know if it still needs improvement.
Thanks for the update!
Some pixels in the segmentation map were ignored, which means cross-entropy values weren't computed for them (only the valid pixels are passed to the low-level op). Here is an example with a batch of one sample, that is, a segmentation map of size (2, 2) containing up to three classes:

```py
y_true = [
    [[ 0,  2],
     [-1, -1]]]
y_pred = [
    [[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],
     [[0.2, 0.5, 0.3], [0.0, 1.0, 0.0]]]]

# If we were to ignore the class -1:
valid_mask = [[                  # L5621
    [True, True],
    [False, False],
]]
target = [0, 2]                  # L5622 (select valid pixel labels)

output = tf.math.log(y_pred)     # L5585, not from_logits
output = [
    [ 0.    ,    -inf,    -inf],
    [   -inf, -0.6931, -0.6931]]  # L5623 (select probabilities associated with valid pixels)

res = [0., 0.6931]               # L5631 (tf.nn.softmax_crossentropy...)
res = [
    [[ 0., 0.6931],
     [ 0., 0.    ]]]             # L5639 (reconstruct samples with tf.scatter_nd)
res = [0.3466]                   # L5647 (average amongst valid pixels -- the two in the top row)
```
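The arithmetic of the steps above can be checked with a small sketch in plain NumPy (standing in for the TF ops referenced by the line numbers; not the actual Keras implementation):

```python
import numpy as np

y_true = np.array([[[0, 2],
                    [-1, -1]]])
y_pred = np.array([[[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],
                    [[0.2, 0.5, 0.3], [0.0, 1.0, 0.0]]]])

ignore_class = -1
valid_mask = y_true != ignore_class   # [[[True, True], [False, False]]]
target = y_true[valid_mask]           # [0, 2] -- labels of the valid pixels
output = y_pred[valid_mask]           # probability vectors of the valid pixels

# Cross-entropy of each valid pixel: -log(p[label])
res = -np.log(output[np.arange(target.size), target])
loss = res.sum() / valid_mask.sum()   # average amongst valid pixels only
print(res)   # ~[0., 0.6931]
print(loss)  # ~0.3466
```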
For reference, PyTorch works similarly (implementation at aten/src/ATen/native/LossNLL.cpp). This "issue" isn't as apparent there because the low-level loss function is also responsible for doing the reduction:

```py
>>> y_true = torch.tensor([
...     [[ 0,  2],
...      [-1, -1]]])
>>> y_pred = torch.from_numpy(np.asarray([
...     [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],
...     [[0.2, 0.5, 0.3], [0.0, 1.0, 0.0]]]).transpose((2, 0, 1))[np.newaxis, ...])
>>>
>>> loss = torch.nn.CrossEntropyLoss(ignore_index=-1)
>>> loss(torch.log(y_pred), y_true)
tensor(0.3466, dtype=torch.float64)
>>> loss = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='none')
>>> loss(torch.log(y_pred), y_true)
tensor([[[-0.0000, 0.6931],
         [ 0.0000, 0.0000]]], dtype=torch.float64)
```
I understand that the batch (first axis) should not be reduced in distributed environments. However, considering that we don't split a single sample across multiple replicas, and that the default reduction is applied anyway, I implemented it as you asked, reusing existing code where possible. We probably need to write an example in the docstring of how to handle these values in custom training loops, right?
Thanks for the updates. The changes all look good to me!
Thanks for the answers!
Excellent work, thank you for the contribution. LGTM
Imported from GitHub PR keras-team/keras#16712

Properly fixes #6118 and #5911.

#### Summary

* Add the `ignore_index: Optional[int]` parameter to the following functions/constructors:
  - `backend.sparse_categorical_crossentropy`
  - `losses.sparse_categorical_crossentropy`
  - `metrics.SparseCategoricalCrossentropy`
  - `metrics._IoUBase`
  - `metrics.IoU`
  - `metrics.MeanIoU`
  - `metrics.OneHotIoU`
  - `metrics.OneHotMeanIoU`
* Add `sparse_labels: bool` and `sparse_preds: bool` parameters to the `_IoUBase`, `IoU`, and `MeanIoU` metric classes.
* Add `sparse_preds: bool` to the `OneHotIoU` and `OneHotMeanIoU` metric classes.
* Refactor a replicated code section shared among `backend.categorical_crossentropy`, `backend.sparse_categorical_crossentropy`, and `backend.binary_crossentropy` into a single function named `_get_logits`.

#### Goals

1. **ignore_index**: In segmentation problems, some pixels in segmentation maps might not represent valid categorical labels. Examples:
   - object boundaries are marked with the *void* category, as annotators disagree on which label to attribute;
   - small maps are padded with the *void* class to conform with the sizes of larger ones after `Dataset#padded_batch`;
   - specific objects are out of the context of the problem, such as the hood of a car being captured by a static camera;
   - pseudo-labels (originating from weakly supervised strategies) might contain pixels/regions where the label is uncertain.

   It's common to attribute the label `-1` or `255` to these pixels and ignore them during training. This PR implements this feature by masking the target and output signals, only computing the metrics over the valid pixels. Moreover, it mirrors PyTorch's [CrossEntropyLoss(ignore_index=-100)](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html).

2. **sparse_preds**: `IoU` and `MeanIoU` assume both `target` and `output` are sparse signals, where categories are represented as natural integers. Conversely, `OneHotIoU` and `OneHotMeanIoU` assume both are probability distribution vectors. This is far from what I believe to be the most obvious case: sparse segmentation labels and dense output vectors:

   ```py
   >>> classes = 20
   >>> model = Sequential([
   >>>     ResNet50V2(input_shape=[512, 512, 3], include_top=False, pooling=None, weights=None),
   >>>     Conv2D(classes, kernel_size=1, activation='softmax', name='predictions')
   >>> ])
   >>> print(model.output.shape)
   (None, 16, 16, 20)
   ```

   So now IoU can easily be used like this:

   ```py
   model.compile(opt='sgd', loss='sparse_categorical_crossentropy',
                 metrics=[MeanIoU(classes, sparse_preds=False, ignore_index=-1)])
   ```

#### Limitations

Currently, `backend.sparse_categorical_crossentropy` only reduces the dimension containing the logits, and the result is reshaped into the original output shape (except for the last axis) if that information is available. However, when a pixel is not valid, its associated cross-entropy value is not available, and the reshape cannot occur without creating a ragged tensor. Therefore, when `ignore_index is not None` (and only then), I opted to sum all cross-entropy values over the axes `range(1, output_rank - 1)` and divide by the number of valid pixels (similar to what PyTorch does). In this case, the output tensor will have `shape=[output_shape[0]]=[batch_size]`. An alternative would be to return a flattened array containing only the valid entries, though the batch information would be lost and users would have difficulties if they had per-sample operations applied to these loss values.
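The reduction described in the limitations above can be pictured with a small NumPy sketch (shapes and variable names are illustrative, not the actual backend code):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, h, w, classes = 2, 4, 4, 3
ignore_index = -1

probs = rng.dirichlet(np.ones(classes), size=(batch, h, w))  # softmax-like outputs
labels = rng.integers(-1, classes, size=(batch, h, w))       # -1 marks ignored pixels

valid = labels != ignore_index
safe = np.where(valid, labels, 0)  # placeholder label for ignored pixels
ce = -np.log(np.take_along_axis(probs, safe[..., None], axis=-1))[..., 0]
ce = np.where(valid, ce, 0.0)      # zero-out cross-entropy of ignored pixels

# Sum over the inner axes range(1, rank - 1) and divide by the valid-pixel
# count, yielding one loss value per sample: shape == (batch,)
per_sample = ce.sum(axis=(1, 2)) / np.maximum(valid.sum(axis=(1, 2)), 1)
print(per_sample.shape)  # (2,)
```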
Copybara import of the project:

-- b7f02816b5320855ae528971766fdcaad7134a9b by lucasdavid <lucasolivdavid@gmail.com>: Add ignore_index crossentropy and IoU
-- 1589a843bac4390c8377db05cbd6ae650b6210cc by lucasdavid <lucasolivdavid@gmail.com>: Remove duplicate convert_to_tensor
-- 70f7fb6789e1a7e030737a46847b24b892965e4e by lucasdavid <lucasolivdavid@gmail.com>: Rename ignore_index to ignore_label, update docs
-- db9f76ac8d1945630061582b03381939349bb59a by lucasdavid <lucasolivdavid@gmail.com>: Implement masked loss reduction
-- 4f1308112f4188c4e14fdf3a59af8fe5f30db61f by lucasdavid <lucasolivdavid@gmail.com>: Update docs

Merging this change closes #16712

PiperOrigin-RevId: 461661376
PiperOrigin-RevId: 463204427
@fchollet is the PR being rolled back? Did I break something?
@lucasdavid @fchollet Keras and TF are not in sync with respect to this new feature: although the documentation mentions the parameter, TensorFlow doesn't support it yet, while Keras does.
@visionscaper I thought these two were synchronized automatically... Maybe this has something to do with PR #16851? A suspicious thing happened during this PR:
I don't have the necessary access permissions to see what failed in Copybara's logs, but all test cases pass on my machine and in the GPU and CPU CIs.
@visionscaper, I believe it's now working in tf-nightly:

```py
import numpy as np
import tensorflow as tf

print(tf.__version__)
print(tf.keras.losses.sparse_categorical_crossentropy(
    np.random.randint(-1, 10, size=[40, 1]),
    np.random.randn(40, 10),
    ignore_class=-1,
))
print(tf.keras.losses.SparseCategoricalCrossentropy(ignore_class=-1)(
    np.random.randint(-1, 10, size=[40, 1]),
    np.random.randn(40, 10),
))
```

Output:

```
2.11.0-dev20221011
<tf.Tensor: shape=(40,), dtype=float64, numpy=
array([17.46086028,  1.64064914, 16.93660172,  2.25964914,  3.02902916,
        2.38092942, 16.97059782, 16.5225091 ,  0.        , 17.21726741,
        1.47735013, 16.65561627, 16.58555793,  0.        ,  1.94407448,
       17.10451514,  2.04504283, 16.77500514,  2.10482156, 16.75909515,
        1.45206898,  1.36361875, 16.52289682, 16.88043939, 17.54349066,
       17.05594301,  2.36618914,  1.87394029, 17.44958865,  1.42225717,
       17.27105659,  0.        , 17.47401625, 17.47470669, 17.42965694,
       17.33150152,  1.03149344,  1.56646177, 16.54211899, 17.32527626])>
<tf.Tensor: shape=(), dtype=float64, numpy=10.997069327410369>
```
Thanks for letting me know @lucasdavid!
Well, the ignore labels work, but it is extremely slow.
I used a very similar version of this one to train over Pascal VOC 2012 and I didn't see any performance issues.
I noticed that Accuracy metrics don't have an arg like this one, so I used a wrapper.
@miticollo I don't think there's a reason for it. Maybe people just haven't hit this use-case enough times to come around and implement it. In any case, a wrapper should be fine.
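For reference, such a wrapper could look like the following NumPy sketch (illustrative only; `masked_accuracy` is a hypothetical helper, not part of Keras):

```python
import numpy as np

def masked_accuracy(y_true, y_pred, ignore_class=-1):
    """Accuracy computed only over entries whose label is not `ignore_class`."""
    y_true = np.asarray(y_true)
    labels_pred = np.argmax(y_pred, axis=-1)  # dense probabilities -> sparse labels
    valid = y_true != ignore_class
    return (labels_pred[valid] == y_true[valid]).mean()

y_true = np.array([[0, 2], [-1, 1]])  # -1 marks an ignored entry
y_pred = np.array([[[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]],
                   [[0.3, 0.3, 0.4], [0.2, 0.7, 0.1]]])
print(masked_accuracy(y_true, y_pred))  # 1.0 -- the ignored entry doesn't count
```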
Fix #6118
Fix #5911
Relates to keras-team/tf-keras#617

#### Summary

* Add the `ignore_class: Optional[int]` parameter to the following functions/constructors:
  - `backend.sparse_categorical_crossentropy`
  - `losses.sparse_categorical_crossentropy`
  - `metrics.SparseCategoricalCrossentropy`
  - `metrics._IoUBase`
  - `metrics.IoU`
  - `metrics.MeanIoU`
  - `metrics.OneHotIoU`
  - `metrics.OneHotMeanIoU`
* Add `sparse_y_true: bool` and `sparse_y_pred: bool` parameters to the `_IoUBase`, `IoU`, and `MeanIoU` metric classes.
* Add `sparse_y_pred: bool` to the `OneHotIoU` and `OneHotMeanIoU` metric classes and refactor these classes to reuse more of the base class.
* Refactor a replicated code section shared among `backend.categorical_crossentropy`, `backend.sparse_categorical_crossentropy`, and `backend.binary_crossentropy` into a single function named `_get_logits`.

#### Goals

1. **ignore_class**: In segmentation problems, some pixels in segmentation maps might not represent valid categorical labels, such as labels padded by `Dataset#padded_batch`. It's common to attribute the label `-1` or `255` to these pixels and ignore them during training. This PR implements this feature by masking the target and output signals, only computing the metrics over the valid pixels. Moreover, it mirrors PyTorch's CrossEntropyLoss(ignore_index=-100).

2. **sparse_y_pred**: `IoU` and `MeanIoU` assume both `target` and `output` are sparse signals, where categories are represented as natural integers. Conversely, `OneHotIoU` and `OneHotMeanIoU` assume both are probability distribution vectors. This is far from what I believe to be the most obvious case: sparse segmentation labels and dense output vectors. So now IoU can easily be used by compiling the model with a metric such as `MeanIoU(classes, sparse_y_pred=False, ignore_class=-1)`.

#### Limitations

Previously, `backend.sparse_categorical_crossentropy` only reduced the dimension containing the logits, and the result was reshaped into the original output shape (except for the last axis) when that information was available. However, when a pixel was not valid, its associated cross-entropy value was not available, and the reshape could not occur without creating a ragged tensor. Therefore, when `ignore_index is not None` (and only then), all cross-entropy values were summed over the axes `range(1, output_rank - 1)` and divided by the number of valid pixels, producing an output of `shape=[output_shape[0]]=[batch_size]`. An alternative would have been to return a flattened array containing only the valid entries, though the batch information would be lost and users would have difficulties applying per-sample operations to these loss values.

No visible limitations now: `backend.sparse_categorical_crossentropy` will set the `_keras_mask` property on the loss Tensor, which will be used during the reduction procedure to mask out invalid pixels.
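The effect of such a mask on the reduction can be sketched as follows (hypothetical values; the actual `_keras_mask` handling lives inside Keras' loss-reduction utilities):

```python
import numpy as np

# Per-pixel cross-entropy values and a boolean mask flagging valid pixels,
# standing in for the `_keras_mask` attribute mentioned above.
losses = np.array([0.0, 0.6931, 0.5, 1.2])
mask = np.array([True, True, False, False])

# The reduction weights each value by its mask and normalizes by the number
# of valid entries, so ignored pixels do not dilute the mean.
masked_mean = (losses * mask).sum() / mask.sum()
print(masked_mean)  # 0.34655, vs. the unmasked losses.mean()
```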