
Add support for PolyLoss (ICLR 2022) #467

Merged: 5 commits merged into google-deepmind:master on Jun 14, 2023

Conversation

@acforvs (Contributor) commented on Dec 9, 2022:

Closes #466

I focused on implementing the Poly-1 cross entropy loss with alpha label smoothing since, as shown in the paper, the first term plays the most important role. I also added a small test showing that the behavior matches that of softmax_cross_entropy when \epsilon = 0.

I didn't add it to the optax/__init__.py and docs/api.rst files yet (would it be OK to do that in a separate PR?).

Please let me know what you think!
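For reference, the Poly-1 formulation from the paper is L_{Poly-1} = -log(P_t) + epsilon * (1 - P_t), where P_t is the probability the model assigns to the target class. The snippet below is a minimal, illustrative sketch of that idea in JAX; the function name poly1_cross_entropy and its signature are placeholders and do not reflect the API merged in this PR.

    import jax
    import jax.numpy as jnp

    def poly1_cross_entropy(logits, labels, epsilon=2.0):
      # Standard softmax cross entropy computed from log-probabilities.
      log_probs = jax.nn.log_softmax(logits, axis=-1)
      cross_entropy = -jnp.sum(labels * log_probs, axis=-1)
      # P_t: probability mass assigned to the target distribution.
      pt = jnp.sum(labels * jnp.exp(log_probs), axis=-1)
      # Poly-1 perturbs the coefficient of the first polynomial term by epsilon.
      return cross_entropy + epsilon * (1.0 - pt)

With epsilon = 0 the extra term vanishes and the result reduces to plain softmax cross entropy, which is what the test mentioned above checks.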

@mkunesch (Member) left a comment:

Thanks a lot for this contribution - this looks great! Thanks especially for adding the maths and recommended parameters.

I've added a few comments. I'll do another pass to compare the maths to the paper later.

logits: Unnormalized log probabilities, with shape `[..., num_classes]`.
labels: Valid probability distributions (non-negative, sum to 1), e.g a
one hot encoding specifying the correct class for each input;
must have a shape broadcastable to `[..., num_classes]``
Review comment (Member):

Typo: ` instead of . at the end.
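i.e. the corrected line would end with a period rather than the stray backtick:

    must have a shape broadcastable to `[..., num_classes]`.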

labels: Valid probability distributions (non-negative, sum to 1), e.g a
one hot encoding specifying the correct class for each input;
must have a shape broadcastable to `[..., num_classes]``
epsilon: The coefficient of the first polynomial term (default = 2.0).
Review comment (Member):

Nit: Let's remove the default value from here as it is already visible to the user in the function signature.

- For the ImageNet 2d image classification, epsilon = 2.0
- For the 2d Instance Segmentation and object detection, epsilon = -1.0
- It is also recommended to adjust this value
based on the task and dataset at hand. For example, one can use
Review comment (Member):

Nit: let's make this slightly more concise, e.g.:
"... adjust this value based on the task, e.g. using grid search."

must have a shape broadcastable to `[..., num_classes]``
epsilon: The coefficient of the first polynomial term (default = 2.0).
According to the paper, the following values are recommended:
- For the ImageNet 2d image classification, epsilon = 2.0
Review comment (Member):

Nit: Let's add a full stop at the end of each bullet.

- It is also recommended to adjust this value
based on the task and dataset at hand. For example, one can use
simple grid search to achieve it.
alpha: The smoothing factor, the greedy category with be assigned
Review comment (Member):

I don't understand this part "the greedy category with be assigned" ... is there a typo and it should be "will be"?

@acforvs (Contributor, Author) replied on Dec 19, 2022:

My bad, I copy-pasted the description from here without going through it.

Since I removed the smoothing itself, this issue is now fixed in this PR. Should I submit a separate PR to fix this line for the smooth_labels as well?

It looks like everything except for the "alpha: The smoothing factor" can be removed

Reply (Member):

> Since I removed the smoothing itself, this issue is now fixed in this PR. Should I submit a separate PR to fix this line for the smooth_labels as well?

Yes, please, that would be great - thanks a lot! I also saw some other style guide violations in the docstrings above (e.g. capitalization) - if you have time to fix these too in the same or a different PR that would be great but no worries if not of course!

probability `(1-alpha) + alpha / num_categories` (default = 0.0)

Returns:
poly loss between each prediction and the corresponding target
Review comment (Member):

Please capitalize the first letter: Poly loss.

optax/_src/loss.py (outdated comment, resolved)
  )
  def test_equals_to_cross_entropy_when_eps0(self, logits, labels):
    np.testing.assert_allclose(
        self.variant(loss.poly_loss_cross_entropy)(logits, labels, 0., 0.),
Review comment (Member):

Nit: please use a keyword argument for epsilon here so that it is immediately obvious to the reader that epsilon is being set to zero (you can then just leave alpha at the default or set it using a kwarg too - or we might remove it anyway).
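For example, the call could read as follows (a sketch; the keyword names follow the docstring above, and alpha may be dropped entirely):

    self.variant(loss.poly_loss_cross_entropy)(logits, labels, epsilon=0., alpha=0.),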

optax/_src/loss.py (comment resolved)

  @chex.all_variants
  @parameterized.parameters(
      dict(eps=2, alpha=0, expected=4.531657285679147),
Review comment (Member):

Since we are testing to atol 1e-4 we can remove some of the figures from the expected results.

@acforvs (Contributor, Author) commented on Dec 19, 2022:

Thanks for the review, I updated the code!

@acforvs (Contributor, Author) commented on May 17, 2023:

Hey @mkunesch @hbq1, are there any updates on this PR?

@mkunesch (Member) left a comment:

Hi! Sorry for dropping the ball on this! I've added a few small nits, but we can also address those during merging, so I've also approved.

Thanks a lot!

dict(eps=0, expected=2.8990),
dict(eps=-0.5, expected=2.4908),
dict(eps=1.15, expected=3.8378),
dict(eps=2, expected=4.5317),
Review comment (Member):

I think this is now a duplicate of line 514.

dict(eps=0, expected=np.array([0.1698, 0.8247])),
dict(eps=-0.5, expected=np.array([0.0917, 0.7168])),
dict(eps=1.15, expected=np.array([0.3495, 1.0731])),
dict(eps=2, expected=np.array([0.4823, 1.2567])),
Review comment (Member):

(as above, this is now a duplicate of the first parameter combination)

\frac{1}{N + 1} \cdot (1 - P_t)^{N + 1} + \ldots = \\
- \log(P_t) + \sum_{j = 1}^N \epsilon_j \cdot (1 - P_t)^j

This function provides a simplified version of the :math:`L_{Poly-N}`
Review comment (Member):

typo/nit: I'd personally remove the "the" before :math:`L_{Poly-N}`, i.e.

This function provides a simplified version of :math:`L_{Poly-N}`


Args:
logits: Unnormalized log probabilities, with shape `[..., num_classes]`.
labels: Valid probability distributions (non-negative, sum to 1), e.g a
Review comment (Member):

typo: e.g -> e.g.. This is also incorrect in the softmax_cross_entropy - I can correct it there or you can include it in this PR too.

According to the paper, the following values are recommended:
- For the ImageNet 2d image classification, epsilon = 2.0.
- For the 2d Instance Segmentation and object detection, epsilon = -1.0.
- It is also recommended to adjust this value
Review comment (Member):

Nit: this line break looks early to me, let's break as close to 80 characters as possible.


optax/_src/loss.py (outdated comment, resolved)
cross_entropy = softmax_cross_entropy(logits=logits, labels=labels)
poly_loss = cross_entropy + epsilon * one_minus_pt

return poly_loss
Review comment (Member):

Nit: let's return the result of the calculation directly in the line above - in this case there is no readability reason to assign to a named variable since the function name and docstring already document what is being returned.
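Roughly (a sketch of the suggested simplification, using the variable names from the excerpt above):

    cross_entropy = softmax_cross_entropy(logits=logits, labels=labels)
    return cross_entropy + epsilon * one_minus_pt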

@acforvs (Contributor, Author) commented on May 17, 2023:

@mkunesch, I addressed the comments. The tests seem to be failing locally, but it looks like the problem is in ctc_loss_with_forward_probs.

I'll do another pass in a separate PR to fix the style in the other docstrings, if that's OK.

@mkunesch (Member) commented on Jun 7, 2023:

Hi! Thanks a lot for the changes and sorry for the delay!

> The tests seem to be failing locally, but it looks like the problem is in ctc_loss_with_forward_probs.

Yes, that was an unrelated problem. We've fixed that error now - could you sync with master? That should make the checks pass so that we can merge.

> I'll do another pass in a separate PR to fix the style in the other docstrings, if that's OK.

Sounds good, thanks a lot!

@acforvs (Contributor, Author) commented on Jun 7, 2023:

Hi, no worries!
I synced my local branch with master, but it didn't trigger the tests for some reason - could you re-run them please?

@copybara-service bot merged commit f527be8 into google-deepmind:master on Jun 14, 2023. 4 checks passed.
@acforvs (Contributor, Author) commented on Jun 14, 2023:

@mkunesch (Member) replied:

Ah, yes please - I forgot to check that in my review. Thanks a lot!

@acforvs (Contributor, Author) commented on Jun 14, 2023:

Added here: #537

@acforvs deleted the poly-loss branch on June 14, 2023 at 19:31.