New classification datasets support for FLAVA

To support our colleagues' work on the [FLAVA](https://arxiv.org/abs/2112.04482) paper, and to foster collaborations in the multi-modal space, we would like to implement a few new datasets. Almost all of them are classification datasets, but some also support other tasks like segmentation.

- [x] [Food 101](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/) @jdsgomes #5119
- [x] [Stanford Cars](http://ai.stanford.edu/~jkrause/cars/car_dataset.html) @abhi-glitchhg #5166
- [x] [FGVC Aircraft](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/) @sallysyw #5178
- [x] [DTD](https://www.robots.ox.ac.uk/~vgg/data/dtd/). A good starting point is [this PR](https://github.com/pytorch/vision/pull/743) from @pmeier #5115 
- [x] [Oxford Pets](https://www.robots.ox.ac.uk/~vgg/data/pets/). This one also comes with ROIs and segmentation masks, which would be nice to support. We could do something similar to [CelebA](https://github.com/pytorch/vision/blob/main/torchvision/datasets/celeba.py) with a `target_type` parameter. @pmeier #5116 
- [x] [Flowers-102](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html). @zhiqwang #5177 
- [x] [EuroSAT](https://github.com/phelber/eurosat) @frgfm #5114
- [x] [GTSRB](https://paperswithcode.com/dataset/gtsrb). The homepage is timing out for me, but download links can be found [here](https://github.com/facebookresearch/vissl/blob/484cdecd1a71cb457d8ea74942603b907a23d39d/extra_scripts/datasets/create_gtsrb_data_files.py#L43-L53) @sumukhaithal6 #5117
- [x] [PCAM](https://github.com/basveeling/pcam) @NicolasHug #5203
- [x] [Clevr Counts](https://cs.stanford.edu/people/jcjohns/clevr/). See also [here](https://github.com/facebookresearch/vissl/blob/484cdecd1a71cb457d8ea74942603b907a23d39d/extra_scripts/datasets/create_clevr_count_data_files.py#L53-L54) for exactly what we need @pmeier #5130
- [x] [FER2013](https://paperswithcode.com/dataset/fer2013). This is a Kaggle dataset, so I'm not sure we'll be able to support download ~(but maybe)~ @pmeier #5120
- [x] [Sun397](https://vision.princeton.edu/projects/2010/SUN/) @saswatpp #5132
- [x] [Country211](https://paperswithcode.com/dataset/country211). Apparently the download link is [here](https://github.com/openai/CLIP/blob/main/data/country211.md) @puhuk #5138
- [x] [Rendered SST2](https://github.com/openai/CLIP/commit/efe8cbbdf3e594999706558de021de55e04cab5f) @jdsgomes #5220
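For Oxford Pets, a CelebA-style `target_type` parameter could be handled roughly as in the sketch below. It is a minimal, dependency-free illustration, not the torchvision API: the allowed values `"category"` and `"segmentation"`, the helper names `normalize_target_types` and `build_target`, and the shape of the per-sample annotation dict are all hypothetical. The idea is to accept either a single string or a list of strings, validate each entry, and return a bare target when only one type is requested.

```python
from typing import Any, Dict, List, Sequence, Union

# Hypothetical set of target types a pets-like dataset might support.
VALID_TARGET_TYPES = ("category", "segmentation")


def normalize_target_types(target_type: Union[str, Sequence[str]]) -> List[str]:
    """Accept a single string or a sequence of strings and validate each entry."""
    types = [target_type] if isinstance(target_type, str) else list(target_type)
    for t in types:
        if t not in VALID_TARGET_TYPES:
            raise ValueError(f"Unknown target_type {t!r}, expected one of {VALID_TARGET_TYPES}")
    return types


def build_target(annotation: Dict[str, Any], types: List[str]) -> Any:
    """Return None for no types, the bare target for one type, and a tuple otherwise."""
    targets = [annotation[t] for t in types]
    if not targets:
        return None
    return targets[0] if len(targets) == 1 else tuple(targets)


if __name__ == "__main__":
    # Hypothetical annotation for one image: a class index and a mask path.
    ann = {"category": 3, "segmentation": "masks/img_0001.png"}
    print(build_target(ann, normalize_target_types("category")))
    print(build_target(ann, normalize_target_types(["category", "segmentation"])))
```

Returning a bare value for a single type keeps the common classification case as simple as a plain `(image, label)` pair, while still allowing multiple targets for users who need masks.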

CC-ing @pmeier and @jdsgomes as previously discussed. We're on a **fairly short timeline** for this work, and ideally we would get all these in by **end of January 2022.**
I'm also wondering whether our open source contributors @oke-aditya, @frgfm and @zhiqwang would be interested in this 🚀?

### Implementing a new dataset

Implementing a dataset consists of two main parts:

- The dataset class, with `root`, `split`, `transform` and `target_transform` parameters. When available, we should also support a `download` parameter (from what I checked, most of these datasets are downloadable, except possibly FER2013). See e.g. the [MNIST](https://github.com/pytorch/vision/blob/eac3dc7bab436725b0ba65e556d3a6ffd43c24e1/torchvision/datasets/mnist.py#L19:L19) class.
- A test class that generates automatic tests, e.g. [this one for MNIST](https://github.com/pytorch/vision/blob/eac3dc7bab436725b0ba65e556d3a6ffd43c24e1/test/test_datasets.py#L1393:L1393).
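The shape of such a dataset class can be sketched without any dependencies. In torchvision the class would typically subclass `torchvision.datasets.VisionDataset` and use the library's download helpers; here everything is plain Python so the structure is easy to see. The class name `ToyFolderDataset`, the on-disk layout `root/<split>/<class_name>/<file>`, and the `make_fake_data` helper are hypothetical. `make_fake_data` loosely imitates how the automatic tests generate fake data on disk instead of downloading the real dataset.

```python
import os
import tempfile
from typing import Any, Callable, List, Optional, Tuple


class ToyFolderDataset:
    """Hypothetical dataset with root, split, transform, target_transform and download.

    Assumes a hypothetical layout: root/<split>/<class_name>/<file>.
    """

    _SPLITS = ("train", "val", "test")

    def __init__(self, root: str, split: str = "train", transform: Optional[Callable] = None,
                 target_transform: Optional[Callable] = None, download: bool = False) -> None:
        if split not in self._SPLITS:
            raise ValueError(f"split must be one of {self._SPLITS}, got {split!r}")
        self.root = root
        self.split = split
        self.transform = transform
        self.target_transform = target_transform
        self._split_dir = os.path.join(root, split)

        if download:
            self._download()
        if not self._check_exists():
            raise RuntimeError("Dataset not found. You can use download=True to download it")

        # Class names are the sorted sub-directories of the split folder.
        self.classes = sorted(
            d for d in os.listdir(self._split_dir) if os.path.isdir(os.path.join(self._split_dir, d))
        )
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}
        self._samples: List[Tuple[str, int]] = [
            (os.path.join(self._split_dir, c, f), self.class_to_idx[c])
            for c in self.classes
            for f in sorted(os.listdir(os.path.join(self._split_dir, c)))
        ]

    def _check_exists(self) -> bool:
        return os.path.isdir(self._split_dir)

    def _download(self) -> None:
        if self._check_exists():
            return
        # A real dataset would fetch, verify (e.g. MD5) and extract its archive here.
        raise NotImplementedError("download is dataset-specific and not sketched here")

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        path, target = self._samples[idx]
        sample: Any = path  # a real image dataset would decode the file here, e.g. with PIL
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target


def make_fake_data(root: str, split: str, classes: List[str], per_class: int) -> int:
    """Write empty placeholder files for one split and return the number of samples."""
    for c in classes:
        os.makedirs(os.path.join(root, split, c), exist_ok=True)
        for i in range(per_class):
            open(os.path.join(root, split, c, f"{i}.jpg"), "wb").close()
    return len(classes) * per_class


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        n = make_fake_data(tmp, "train", ["cat", "dog"], per_class=3)
        ds = ToyFolderDataset(tmp, split="train")
        print(n, len(ds), ds.classes)
```

Validating `split` eagerly and failing with a clear error when the data is missing (unless `download=True`) keeps misconfigurations from surfacing later as confusing indexing errors.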


If there's any ambiguity in the choices to make, the reference to follow is [VISSL](https://github.com/facebookresearch/vissl/tree/main/extra_scripts/datasets), where most of these datasets are already supported.

### For contributors

If you're interested in taking one of the datasets above, please comment below with "I'm working on dataset X" so that others don't pick the same one! :)

cc @pmeier
