New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add example of self-supervised SimCLR training - V2 #50
Conversation
Hi @DevinCheung , sorry for my late reply and thank you very much for your contribution. The current version is still an unstable beta version and we are updating the API for better usability. Could you integrate your example with the latest API? I will notify you when the new API is merged into main branch. :) |
Meanwhile, it will be good to rebase the commits into one commit so that the git log will look cleaner. Since this PR is newer, I will close #42 |
Hi, @FrankLeeeee Sure, please notify me when the updated API is ready :) |
Hi, @FrankLeeeee I notice the API is updated. Are all updates finished already? |
@DevinCheung Hi Devin. Yes, the API is merged. The documentation and example code will be updated by today. You may refer to the new doc later. |
Hi, Frank @FrankLeeeee I have integrated my example into the latest API. Thanks for your evaluation. Please feel free to contact me if anything is needed. :) |
Hi @DevinCheung , thanks for your update! The SimCLR looks like a fantastic example for us. However, can you refer to the new documentation and the examples folder to sync with latest code writing style? We hope to introduce minimum interference to the code writing process of a normal PyTorch user, so we are no longer having model, optimizer, loss etc. in the config file unless someone really needs it (e.g. for pipeline model partitioning) and knows how to build it from config. It will be great if you can refactor your code so that things look consistent :) Meanwhile, rebasing the commits into only one commit will do us a great favour as the git log will look cleaner and more meaningful! Feel free to reach me if you need any sort of help! |
Hi @FrankLeeeee , sorry, I did not quite get it. Did you mean I need to modify my config file for consistency? Specifically, try not to contain the definitions of "model, optimizer, loss, etc"? In my case, I do define my own models and a loss function. And I arrange the config file in a similar style as the "vit-b16" example. It would be appreciated if you could help make it more clear of which to modify :) |
Hi @DevinCheung , sorry for my late reply. You may take a look at the vit example which uses the latest api. You can follow this style. Perhaps I wasn't clear last, definitions of "model, optimizer, loss, etc" meant configuration in my reply. We would like to keep the configuration file light-weight so as to not intervene the user habit of writing in PyTorch. Do let me know if you have any issue :) |
Hi, @FrankLeeeee The example has been synced to the latest code writing style. Please take time to have a look. @me if further modification is needed :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DevinCheung , this is an awesome example and I have viewed all the file changes! There are some areas I think is worth our attention.
- untypical path should be avoided, for example
../../../../../datasets
in your code. Try to use./dataset
oros.environ['DATA']
. Do talk about where the data will be downloaded in yourREADME.md
as well. setup.py
should not be changed usually. I understand that you comment it out probably because you encounter some issue during installation. You might want to create an issue in the github and we will look into that issue.- The readme file can be more detailed as many people are not from the self-supervised learning background and may not understand what is going on in this example. Perhaps more details (e.g. short introduction or a hyperlink to some web page) will help such as the definition of linear evaluation, PreAct, t-SNE, etc.
- To keep a clean git log for future back-track, you should first squash your commits into one commit so that it looks cleaner. Then, you can rebase with your upstream main branch.
Hope you will find these suggestions useful :)
Hi, @FrankLeeeee , README has already been detailed. |
@DevinCheung Thanks for your contribution, I will close this PR. |
The previous version uses Nvidia DALI to create a dataloader. I found that data augmentations in DALI are different from those of torchvision. As a result, the desired performance could not be achieved. In this version, dataloader is implemented with colossalai.nn.data and torchvision. The final linear evaluation accuracy could be up to 85.4%.