New classification datasets support for FLAVA #5108

Closed
14 tasks done
NicolasHug opened this issue Dec 17, 2021 · 24 comments

Comments

@NicolasHug
Member

NicolasHug commented Dec 17, 2021

To support our colleagues' work on the FLAVA paper, and to foster collaborations in the multi-modal space, we would like to implement a few new datasets. Almost all of them are classification datasets but some also support other tasks like segmentation.

CC-ing @pmeier and @jdsgomes as previously discussed. We're on a fairly short timeline for this work, and ideally we would get all these in by end of January 2022.
I'm also wondering whether this is something that our open source contributors @oke-aditya @frgfm @zhiqwang could be interested in 🚀 ?

Implementing a new dataset

Implementing a dataset consists of 2 main things:

  • The dataset class with root, split, transform and target_transform parameters. When available, we should also support a download parameter (from what I checked, most of these are downloadable, apart from maybe FER2013). See e.g. the MNIST class
  • A test class which will generate automatic tests, e.g. this one for MNIST.
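
To make the two points above concrete, here is a minimal sketch of the interface being described. It is not the torchvision implementation: `ToyClassification`, the `root/toy/<split>/<class>/<file>` layout and the `make_fake_data` helper are all hypothetical names made up for illustration, and the class uses only the standard library so it runs anywhere. A real contribution would typically build on `torchvision.datasets.VisionDataset` and decode images rather than return raw bytes.

```python
import pathlib
import tempfile
from typing import Any, Callable, Optional, Tuple


class ToyClassification:
    """Hypothetical dataset stored as root/toy/<split>/<class_name>/<file>."""

    _SPLITS = ("train", "test")

    def __init__(
        self,
        root: str,
        split: str = "train",
        transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
        download: bool = False,
    ) -> None:
        if split not in self._SPLITS:
            raise ValueError(f"split must be one of {self._SPLITS}, got {split!r}")
        self.root = pathlib.Path(root)
        self.split = split
        self.transform = transform
        self.target_transform = target_transform
        self._base = self.root / "toy" / split

        if download:
            self._download()
        if not self._base.is_dir():
            raise RuntimeError(f"Dataset not found in {self._base}")

        # Class names are the sub-folder names; labels are their sorted index.
        self.classes = sorted(p.name for p in self._base.iterdir() if p.is_dir())
        self.class_to_idx = {name: idx for idx, name in enumerate(self.classes)}
        self._samples = [
            (path, self.class_to_idx[name])
            for name in self.classes
            for path in sorted((self._base / name).iterdir())
        ]

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        path, target = self._samples[idx]
        sample = path.read_bytes()  # assumption: a real dataset would decode an image here
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target

    def _download(self) -> None:
        # Placeholder: this toy dataset has no online source to fetch from.
        raise NotImplementedError("download is not supported for this toy dataset")


def make_fake_data(root: str, split: str, num_per_class: int = 2) -> int:
    """Write tiny placeholder files and return the number of samples created."""
    class_names = ("cat", "dog")
    for name in class_names:
        folder = pathlib.Path(root) / "toy" / split / name
        folder.mkdir(parents=True)
        for i in range(num_per_class):
            (folder / f"{i}.bin").write_bytes(bytes([i]))
    return len(class_names) * num_per_class


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        expected = make_fake_data(tmp, "train")
        dataset = ToyClassification(tmp, split="train", target_transform=str)
        assert len(dataset) == expected
        print(dataset.classes, dataset[0])
```

The `make_fake_data` helper hints at what a test class needs: it writes a small fake copy of the on-disk layout into a temporary directory and reports how many samples the dataset should then expose, so nothing has to be downloaded to exercise the class.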

If there's some ambiguity in the choices to make, the reference to follow is VISSL, where most of these datasets are already supported.

For contributors

If you're interested in taking one of the datasets above, please comment below with "I'm working on dataset X" so that others don't pick the same! :)

cc @pmeier

@NicolasHug NicolasHug changed the title from "New classification datasets support for FLAV" to "New classification datasets support for FLAVA" Dec 17, 2021
@pmeier
Collaborator

pmeier commented Dec 17, 2021

I'm going to take DTD and Oxford Pets.

> FER2013 This is a Kaggle dataset, so I'm not sure we'll be able to support download (but maybe)

Nope. Downloads from Kaggle are currently not supported, since they require login. For now I would simply not add a download flag. Later in the new style datasets, we can provide them as

class ManualDownloadResource(OnlineResource):

@abhi-glitchhg
Contributor

Can I try the Stanford cars dataset?

@jdsgomes
Contributor

I am taking Food 101 now.

@fibbonnaci

I was planning on taking the Stanford Cars dataset. @abhi-glitchhg if you're taking it, then I'll try the Food101 dataset

@fibbonnaci

> I am taking Food 101 now.

Dang, I'm a few seconds late. I'll try PCAM then.

@zhiqwang
Contributor

I was planning on taking the Flowers-102.

@oke-aditya
Contributor

I am planning to take the SUN dataset.
I'm unsure of my time and bandwidth, as I will be working from the office starting next month.
Any contributor can supersede me 😄

@sumukhaithal6
Contributor

I am planning to work on the GTSRB dataset.

@frgfm
Contributor

frgfm commented Dec 17, 2021

Coming late to the party, but I'd be keen to take care of EuroSAT 👍
Glad to hear the dataset zoo is extending 😄

@pmeier
Collaborator

pmeier commented Dec 17, 2021

@ everyone who volunteered to take a dataset: thanks a lot! @NicolasHug will be out until next year, so feel free to ping me on PRs.

@yiwen-song
Contributor

I'll take FGVC-Aircraft :)

@puhuk
Contributor

puhuk commented Dec 21, 2021

I'll take Country211 :)

jdsgomes added a commit to jdsgomes/vision that referenced this issue Dec 21, 2021
jdsgomes added a commit that referenced this issue Dec 22, 2021
* Adding multiweight support for shufflenetv2 prototype models

* Revert "Adding multiweight support for shufflenetv2 prototype models"

This reverts commit 31fadbe.

* Adding multiweight support for shufflenetv2 prototype models

* Revert "Adding multiweight support for shufflenetv2 prototype models"

This reverts commit 4e3d900.

* Add Food101 Dataset

Addresses #5108.
cc @pmeier @NicolasHug

* Remove unecessary Path contructor calls

* Remove unecessary Path contructor callsi and fix types

* Fix tests

* Address PR comments from @pmeier

* Fix bug in tests and in food101 dataset

* Fix bug in tests and in food101 dataset

* Update torchvision/datasets/food101.py

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
@saswatpp
Contributor

@oke-aditya Mind if I take the SUN dataset task, please?

@oke-aditya
Contributor

oke-aditya commented Dec 23, 2021

Sure. Go ahead

@pmeier
Collaborator

pmeier commented Dec 28, 2021

I would be grateful if someone is also up for adding their dataset in the upcoming new dataset style. I've just added #5133, which details how this should be done. So far no one besides the core team has worked on that, so we are actively looking for feedback on the contributor experience.

facebook-github-bot pushed a commit that referenced this issue Dec 29, 2021
Summary:
* Adding multiweight support for shufflenetv2 prototype models

* Revert "Adding multiweight support for shufflenetv2 prototype models"

This reverts commit 31fadbe.

* Adding multiweight support for shufflenetv2 prototype models

* Revert "Adding multiweight support for shufflenetv2 prototype models"

This reverts commit 4e3d900.

* Add Food101 Dataset

Addresses #5108.
cc pmeier NicolasHug

* Remove unecessary Path contructor calls

* Remove unecessary Path contructor callsi and fix types

* Fix tests

* Address PR comments from pmeier

* Fix bug in tests and in food101 dataset

* Fix bug in tests and in food101 dataset

* Update torchvision/datasets/food101.py

Reviewed By: prabhat00155

Differential Revision: D33351107

fbshipit-source-id: de2a0df07397be82605ee5b700c96297ec3394d5

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
puhuk added a commit to puhuk/vision that referenced this issue Dec 30, 2021
puhuk added a commit to puhuk/vision that referenced this issue Dec 30, 2021
@frgfm
Contributor

frgfm commented Jan 6, 2022

> I would be grateful if someone is also up for adding their dataset in the upcoming new dataset style. I've just added #5133, which details how this should be done. So far no one besides the core team has worked on that, so we are actively looking for feedback on the contributor experience.

Oh nice, I read about those prototypes and was curious to play around with them 😁
Just to make sure I understand this: do you mean adding a second implementation of one of those datasets using the prototypes? Or do you mean changing already implemented ones to use the prototypes?

@pmeier
Collaborator

pmeier commented Jan 6, 2022

> Just to make sure I understand this: do you mean adding a second implementation of one of those datasets using the prototypes?

Exactly. Let me know if you hit any roadblocks as I'm eager to get feedback.

@jdsgomes
Contributor

jdsgomes commented Jan 6, 2022

Hello @zhiqwang 👋
Are you still planning to work on the Flowers-102?
If you are no longer interested or don't have time, that's obviously OK, but we can put it up for grabs since we are aiming to finish this month.

@pmeier
Collaborator

pmeier commented Jan 6, 2022

Same for @fibbonnaci and the PCAM dataset.

@zhiqwang
Contributor

zhiqwang commented Jan 6, 2022

Hi @jdsgomes , I'm working on this now, and hope to submit the PR today.

@NicolasHug
Member Author

Thanks a lot for offering to help with the prototypes @frgfm . Let me know which one(s) you're trying to implement so we don't overlap :) . On my side I'll give GTSRB a try.

@pmeier
Collaborator

pmeier commented Jan 9, 2022

Hey @fibbonnaci, PCAM is the last dataset that does not have a PR up yet. Are you working on that? If yes, please push a PR even if you are not done, so we can help out and accelerate this. Otherwise, I'll send one myself.

NicolasHug added a commit that referenced this issue Jan 10, 2022
* Add Country211 dataset

To addresses issue #5108.

* Add Country211 dataset

To addresses issue #5108.

* Update country211.py

* Update country211.py

* Code review reflected

Reflect code review

* Update test_datasets.py

* Update with review

Update with review

* inherit from ImageFolder

* Update test/test_datasets.py

* Docstring + minor test update

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Nicolas Hug <nicolashug@fb.com>
@NicolasHug
Member Author

As discussed with @fibbonnaci offline, I'll take over the PCAM dataset.

facebook-github-bot pushed a commit that referenced this issue Jan 17, 2022
Summary:
* Add Country211 dataset

To addresses issue #5108.

* Add Country211 dataset

To addresses issue #5108.

* Update country211.py

* Update country211.py

* Code review reflected

Reflect code review

* Update test_datasets.py

* Update with review

Update with review

* inherit from ImageFolder

* Update test/test_datasets.py

* Docstring + minor test update

Reviewed By: NicolasHug

Differential Revision: D33618167

fbshipit-source-id: 04de3c5290b966ff97f21ea32b2f678079aa2a6c

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Nicolas Hug <nicolashug@fb.com>
@NicolasHug
Member Author

Looks like we're all done

Thank you so much everyone who submitted a dataset, your help is much appreciated!

Tons of thanks to @pmeier in particular for all your help with submissions and the reviews!!
