
Add support for PolyLoss (ICLR 2022) #467

Merged: 5 commits merged into google-deepmind:master on Jun 14, 2023

Conversation

@acforvs (Contributor) commented on Dec 9, 2022:

Closes #466

I focused on implementing the Poly-1 cross entropy loss with alpha label smoothing since, as shown in the paper, the first term plays the most important role. I also added a small test showing that the behavior matches that of softmax_cross_entropy when \epsilon = 0.

I didn't add it to the optax/__init__.py and docs/api.rst files yet (would it be OK to do that in a separate PR?).

Please let me know what you think!
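For reference, the Poly-1 formulation from the paper is L_{Poly-1} = -log(P_t) + epsilon * (1 - P_t), where P_t is the probability the model assigns to the target class. The snippet below is a minimal, illustrative sketch of that idea in JAX; the function name poly1_cross_entropy and its signature are placeholders and do not reflect the API merged in this PR.

    import jax
    import jax.numpy as jnp

    def poly1_cross_entropy(logits, labels, epsilon=2.0):
      # Standard softmax cross entropy computed from log-probabilities.
      log_probs = jax.nn.log_softmax(logits, axis=-1)
      cross_entropy = -jnp.sum(labels * log_probs, axis=-1)
      # P_t: probability mass assigned to the target distribution.
      pt = jnp.sum(labels * jnp.exp(log_probs), axis=-1)
      # Poly-1 perturbs the coefficient of the first polynomial term by epsilon.
      return cross_entropy + epsilon * (1.0 - pt)

With epsilon = 0 the extra term vanishes and the result reduces to plain softmax cross entropy, which is what the test mentioned above checks.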

@mkunesch (Member) left a comment:

Thanks a lot for this contribution - this looks great! Thanks especially for adding the maths and recommended parameters.

I've added a few comments. I'll do another pass to compare the maths to the paper later.

logits: Unnormalized log probabilities, with shape `[..., num_classes]`.
labels: Valid probability distributions (non-negative, sum to 1), e.g a
one hot encoding specifying the correct class for each input;
must have a shape broadcastable to `[..., num_classes]``
Review comment (Member):

Typo: ` instead of . at the end.
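i.e. the corrected line would end with a period rather than the stray backtick:

    must have a shape broadcastable to `[..., num_classes]`.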

labels: Valid probability distributions (non-negative, sum to 1), e.g a
one hot encoding specifying the correct class for each input;
must have a shape broadcastable to `[..., num_classes]``
epsilon: The coefficient of the first polynomial term (default = 2.0).
Review comment (Member):

Nit: Let's remove the default value from here as it is already visible to the user in the function signature.

- For the ImageNet 2d image classification, epsilon = 2.0
- For the 2d Instance Segmentation and object detection, epsilon = -1.0
- It is also recommended to adjust this value
based on the task and dataset at hand. For example, one can use
Review comment (Member):

Nit: let's make this slightly more concise, e.g.:
"... adjust this value based on the task, e.g. using grid search."

must have a shape broadcastable to `[..., num_classes]``
epsilon: The coefficient of the first polynomial term (default = 2.0).
According to the paper, the following values are recommended:
- For the ImageNet 2d image classification, epsilon = 2.0
Review comment (Member):

Nit: Let's add a full stop at the end of each bullet.

- It is also recommended to adjust this value
based on the task and dataset at hand. For example, one can use
simple grid search to achieve it.
alpha: The smoothing factor, the greedy category with be assigned
Review comment (Member):

I don't understand this part "the greedy category with be assigned" ... is there a typo and it should be "will be"?

@acforvs (Contributor, Author) replied on Dec 19, 2022:

My bad, I copy-pasted the description from here without going through it.

Since I removed the smoothing itself, this issue is now fixed in this PR. Should I submit a separate PR to fix this line for the smooth_labels as well?

It looks like everything except for the "alpha: The smoothing factor" can be removed

Reply (Member):

> Since I removed the smoothing itself, this issue is now fixed in this PR. Should I submit a separate PR to fix this line for the smooth_labels as well?

Yes, please, that would be great - thanks a lot! I also saw some other style guide violations in the docstrings above (e.g. capitalization) - if you have time to fix these too in the same or a different PR that would be great but no worries if not of course!

probability `(1-alpha) + alpha / num_categories` (default = 0.0)

Returns:
poly loss between each prediction and the corresponding target
Review comment (Member):

Please capitalize the first letter: Poly loss.

optax/_src/loss.py (outdated comment, resolved)
  )
  def test_equals_to_cross_entropy_when_eps0(self, logits, labels):
    np.testing.assert_allclose(
        self.variant(loss.poly_loss_cross_entropy)(logits, labels, 0., 0.),
Review comment (Member):

Nit: please use a keyword argument for epsilon here so that it is immediately obvious to the reader that epsilon is being set to zero (you can then just leave alpha at the default or set it using a kwarg too - or we might remove it anyway).
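For example, the call could read as follows (a sketch; the keyword names follow the docstring above, and alpha may be dropped entirely):

    self.variant(loss.poly_loss_cross_entropy)(logits, labels, epsilon=0., alpha=0.),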

optax/_src/loss.py (comment resolved)

  @chex.all_variants
  @parameterized.parameters(
      dict(eps=2, alpha=0, expected=4.531657285679147),
Review comment (Member):

Since we are testing to atol 1e-4 we can remove some of the figures from the expected results.

@acforvs (Contributor, Author) commented on Dec 19, 2022:

Thanks for the review, I updated the code!

@acforvs (Contributor, Author) commented on May 17, 2023:

Hey @mkunesch @hbq1, are there any updates on this PR?

@mkunesch (Member) left a comment:

Hi! Sorry for dropping the ball on this! I've added a few small nits, but we can also address those during merging, so I've also approved.

Thanks a lot!

dict(eps=0, expected=2.8990),
dict(eps=-0.5, expected=2.4908),
dict(eps=1.15, expected=3.8378),
dict(eps=2, expected=4.5317),
Review comment (Member):

I think this is now a duplicate of line 514.

dict(eps=0, expected=np.array([0.1698, 0.8247])),
dict(eps=-0.5, expected=np.array([0.0917, 0.7168])),
dict(eps=1.15, expected=np.array([0.3495, 1.0731])),
dict(eps=2, expected=np.array([0.4823, 1.2567])),
Review comment (Member):

(as above, this is now a duplicate of the first parameter combination)

\frac{1}{N + 1} \cdot (1 - P_t)^{N + 1} + \ldots = \\
- \log(P_t) + \sum_{j = 1}^N \epsilon_j \cdot (1 - P_t)^j

This function provides a simplified version of the :math:`L_{Poly-N}`
Review comment (Member):

typo/nit: I'd personally remove the "the" before :math:`L_{Poly-N}`, i.e.

This function provides a simplified version of :math:`L_{Poly-N}`


Args:
logits: Unnormalized log probabilities, with shape `[..., num_classes]`.
labels: Valid probability distributions (non-negative, sum to 1), e.g a
Review comment (Member):

typo: e.g -> e.g.. This is also incorrect in the softmax_cross_entropy - I can correct it there or you can include it in this PR too.

According to the paper, the following values are recommended:
- For the ImageNet 2d image classification, epsilon = 2.0.
- For the 2d Instance Segmentation and object detection, epsilon = -1.0.
- It is also recommended to adjust this value
Review comment (Member):

Nit: this line break looks early to me, let's break as close to 80 characters as possible.


optax/_src/loss.py (outdated comment, resolved)
cross_entropy = softmax_cross_entropy(logits=logits, labels=labels)
poly_loss = cross_entropy + epsilon * one_minus_pt

return poly_loss
Review comment (Member):

Nit: let's return the result of the calculation directly in the line above - in this case there is no readability reason to assign to a named variable since the function name and docstring already document what is being returned.
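Roughly (a sketch of the suggested simplification, using the variable names from the excerpt above):

    cross_entropy = softmax_cross_entropy(logits=logits, labels=labels)
    return cross_entropy + epsilon * one_minus_pt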

@acforvs (Contributor, Author) commented on May 17, 2023:

@mkunesch, I addressed the comments. The tests seem to be failing locally, but it looks like the problem is in ctc_loss_with_forward_probs.

I'll do another pass in a separate PR to fix the style in the other docstrings, if that's OK.

@mkunesch (Member) commented on Jun 7, 2023:

Hi! Thanks a lot for the changes and sorry for the delay!

> The tests seem to be failing locally, but it looks like the problem is in ctc_loss_with_forward_probs.

Yes, that was an unrelated problem. We've fixed that error now - could you sync with master? That should make the checks pass so that we can merge.

> I'll do another pass in a separate PR to fix the style in the other docstrings, if that's OK.

Sounds good, thanks a lot!

@acforvs (Contributor, Author) commented on Jun 7, 2023:

Hi, no worries!
I synced my local branch with master, but it didn't trigger the tests for some reason - could you re-run them please?

@copybara-service bot merged commit f527be8 into google-deepmind:master on Jun 14, 2023. 4 checks passed.
@acforvs (Contributor, Author) commented on Jun 14, 2023:

@mkunesch (Member) replied:

Ah, yes please - I forgot to check that in my review. Thanks a lot!

@acforvs (Contributor, Author) commented on Jun 14, 2023:

Added here: #537

@acforvs deleted the poly-loss branch on June 14, 2023 at 19:31.