Conversation

@ksalama ksalama (Contributor) commented Dec 2, 2020

No description provided.

@ksalama ksalama (Contributor, Author) commented Dec 2, 2020

@fchollet - Thank you so much for merging my previous PR. I am not sure why the introduction section was not added to the MD file. I created this PR to add the introduction.

Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com>
@8bitmp3 8bitmp3 (Contributor) left a comment

Hey @ksalama I have a few proposals to improve this intro, if you don't mind.

I think here you're describing the results of an experiment (by saying it "outperforms"). Maybe it'd be more useful for the readers to first learn about the gist of this supervised contrastive learning (SCL) and how it works in 1-2 sentences.

Then, you could finish off this small introductory paragraph with the "outperforms" statement, while also being explicit about how SCL outperforms traditional vanilla cross-entropy supervised learning (judging by Tables 2 and 3 on page 7 of https://arxiv.org/pdf/2004.11362.pdf, SCL outperforms it in terms of accuracy by a clear margin).

I think the cool thing about SCL is that it extends the previous (?) self-supervised approach to supervised learning - you should definitely highlight it here ("the self-supervised batch contrastive approach to the fully-supervised" - page 1, https://arxiv.org/pdf/2004.11362.pdf). SCL "contrasts the set of all samples from the same class as positives against the negatives from the remainder of the batch" (page 2, figure 3).

On page 4 under "Method", the paper actually summarizes what SCL does:

"Given an input batch of data, we first apply data augmentation twice to obtain two copies of the batch. Both copies are forward propagated through the encoder network to obtain a 2048-dimensional normalized embedding. During training, this representation is further propagated through a projection network that is discarded at inference time. The supervised contrastive loss is computed on the outputs of the projection network. To use the trained model for classification, we train a linear classifier on top of the frozen representations using a cross-entropy loss."

I recommend you summarize it in a human-friendly way to appeal to non-academics.

I can assist you with that, if you need help.

Anyway, these are just suggestions.

Comment on lines 13 to 15
[Supervised Contrastive Learning](https://arxiv.org/abs/2004.11362)
(Prannay Khosla et al.) is a training methodology that outperforms
plain crossentropy-supervised training on classification tasks.

ksalama (Contributor, Author) replied:

@8bitmp3 Thanks a lot for the suggestion. Please feel free to propose an intro text that you think would be simple and useful, and I will be happy to commit your suggestion.

8bitmp3 (Contributor) replied:

@ksalama Would it be fair to say that the supervised contrastive learning paper introduces a way of training that includes the supervised contrastive loss? Also, we could say that the method offers a two-stage framework that enhances image classification performance (borrowed from: https://github.com/sayakpaul/Supervised-Contrastive-Learning-in-TensorFlow-2). I also like how they worded it here: "Learn how to map the normalized encoding of samples belonging to the same category closer and the samples belonging to the other classes farther." (https://wandb.ai/authors/scl/reports/Improving-Image-Classifiers-With-Supervised-Contrastive-Learning--VmlldzoxMzQwNzE). We could rephrase it with attribution. cc @fchollet
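That "map same-class samples closer, other classes farther" wording is essentially what the loss does on the L2-normalized projections. A rough sketch of such a loss (a simplified take on the paper's formulation, not necessarily what the example implements):

```python
import tensorflow as tf

def supervised_contrastive_loss(labels, projections, temperature=0.1):
    # Normalize so that dot products between projections are cosine similarities.
    projections = tf.math.l2_normalize(projections, axis=1)
    logits = tf.matmul(projections, projections, transpose_b=True) / temperature

    batch_size = tf.shape(projections)[0]
    diag = tf.eye(batch_size)

    # Positives: other samples in the batch that share the anchor's label.
    labels = tf.reshape(labels, [-1, 1])
    positives = tf.cast(tf.equal(labels, tf.transpose(labels)), tf.float32) - diag

    # Exclude each anchor's similarity with itself from the softmax denominator.
    logits = logits - 1e9 * diag
    log_prob = logits - tf.reduce_logsumexp(logits, axis=1, keepdims=True)

    # Average the log-probability over each anchor's positives, then negate.
    num_positives = tf.maximum(tf.reduce_sum(positives, axis=1), 1.0)
    mean_log_prob_pos = tf.reduce_sum(positives * log_prob, axis=1) / num_positives
    return -tf.reduce_mean(mean_log_prob_pos)
```

Minimizing this pulls projections of same-class samples together and pushes the rest of the batch away.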

@8bitmp3 8bitmp3 (Contributor) replied on Dec 6, 2020

Bear with me, there's a talk by one of the sponsors at the NeurIPS conference today - and probably another next week by the paper's authors - that covers contrastive and supervised contrastive learning. I'll take some notes and revise the intro to make it more useful for the readers. cc @fchollet

ksalama (Contributor, Author) replied:

@8bitmp3 Sounds good. I would merge this basic introduction into the .md file so that the example page on the website has an introduction (as it currently doesn't have one!). Then we can update the introduction as you suggest.

Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com>

  [Supervised Contrastive Learning](https://arxiv.org/abs/2004.11362)
  (Prannay Khosla et al.) is a training methodology that outperforms
- plain crossentropy-supervised training on classification tasks.
+ supervised training on classification tasks with cross-entropy.
fchollet (Contributor) replied:

In the Keras API, "crossentropy" is a single word
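For example, the built-in loss classes spell it that way:

```python
from tensorflow import keras

# One word in the Keras API:
loss = keras.losses.SparseCategoricalCrossentropy()
```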

8bitmp3 (Contributor) replied:

Got it @fchollet

@fchollet fchollet (Contributor) left a comment

Thanks for the PR. Any changes should be applied first to the .py file, then replicated in the .md and .ipynb files.

@fchollet fchollet (Contributor) commented Dec 4, 2020

Also note that I have fixed the issue with the intro not showing up. It happened because the intro was part of the same block of text as the header. I've added a test that makes sure we catch this sort of issue in the future.
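In other words, in the example's .py source the intro has to live in its own text block, separate from the header block at the top. Roughly (field values elided, assuming the usual keras.io example layout):

```python
"""
Title: Supervised Contrastive Learning
Author: ...
Date created: ...
Last modified: ...
Description: ...
"""

"""
## Introduction

[Supervised Contrastive Learning](https://arxiv.org/abs/2004.11362)
(Prannay Khosla et al.) is a training methodology that outperforms ...
"""
```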

@ksalama ksalama (Contributor, Author) commented Dec 6, 2020

@fchollet - I have updated the introduction in the .py, .md, and .ipynb files.

@fchollet fchollet (Contributor) left a comment

LGTM otherwise

@ksalama ksalama (Contributor, Author) commented Dec 8, 2020

@fchollet - I have updated the intro in the three files.

@fchollet fchollet (Contributor) left a comment

LGTM, thank you

@fchollet fchollet merged commit de7ea52 into keras-team:master Dec 8, 2020
ksalama added a commit to ksalama/keras-io that referenced this pull request Dec 19, 2020
Add an introduction section to the MD file (keras-team#321)