Contributor

@detkov detkov commented Apr 13, 2022

The FGVCAircraft dataset documentation contains a mistake. If you load the dataset and sum the `trainval` and `test` splits, there are 10,000 images, not 10,200. The class counts are also wrong: with `annotation_level` set to `variant` there are 100 unique classes, not 102, and with `manufacturer` there are 30, not 41.

This was initially explored in issue #5809.
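
The check described above could be reproduced roughly as follows. This is a minimal sketch, not the code used in the PR: the helper names `total_images` and `report_counts` and the `root` argument are hypothetical, and it assumes `torchvision.datasets.FGVCAircraft` exposes `len()` and a `classes` attribute as in current torchvision releases.

```python
from typing import Callable, Iterable, Sized


def total_images(
    make_dataset: Callable[[str], Sized],
    splits: Iterable[str] = ("trainval", "test"),
) -> int:
    """Sum the dataset length over the given splits."""
    return sum(len(make_dataset(split)) for split in splits)


def report_counts(root: str) -> dict:
    """Count images and classes per annotation level (hypothetical helper)."""
    # Imported lazily so total_images stays usable without torchvision.
    from torchvision.datasets import FGVCAircraft

    def make(split: str) -> Sized:
        # download=True fetches the archive only if it is not already in root.
        return FGVCAircraft(root, split=split, download=True)

    counts = {"images": total_images(make)}
    for level in ("variant", "family", "manufacturer"):
        dataset = FGVCAircraft(
            root, split="trainval", annotation_level=level, download=True
        )
        counts[level] = len(dataset.classes)
    return counts
```

Calling `report_counts("data")` (where `"data"` is an arbitrary download directory) fetches the dataset on first use. According to this PR, the image total should be 10,000, with 100 classes at the `variant` level and 30 at the `manufacturer` level.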

"""`FGVC Aircraft <https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/>`_ Dataset.
- The dataset contains 10,200 images of aircraft, with 100 images for each of 102
+ The dataset contains 10,000 images of aircraft, with 100 images for each of 100
Collaborator

Oh boy, they even got the number of samples wrong on their website? 🙄 Thanks for checking and fixing!

Member

If the website is reporting the original values, are we doing something wrong in the code?

Collaborator

I don't think so, no:

$ cat variants.txt | wc -l
100
$ cat families.txt | wc -l
70
$ cat manufacturers.txt | wc -l
30
$ ls images | wc -l
10000
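
For readers who prefer Python to the shell, the same file-level check can be sketched as below. The function names and the `data_dir` argument are illustrative assumptions; `data_dir` should point at the extracted directory that holds `variants.txt`, `families.txt`, `manufacturers.txt`, and `images/`.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines in a text file (matches `wc -l` when the file ends with a newline)."""
    with path.open("r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def summarize(data_dir: Path) -> dict:
    """Count class names per annotation file and entries in the images directory."""
    counts = {
        name: count_lines(data_dir / f"{name}.txt")
        for name in ("variants", "families", "manufacturers")
    }
    # Skip dotfiles so the count mirrors plain `ls images | wc -l`.
    counts["images"] = sum(
        1 for entry in (data_dir / "images").iterdir()
        if not entry.name.startswith(".")
    )
    return counts
```

On the real dataset, `summarize` should report the same four numbers as the shell session above.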

Collaborator

@pmeier pmeier left a comment

Thanks a ton @detkov. LGTM!

@pmeier pmeier linked an issue Apr 13, 2022 that may be closed by this pull request
@pmeier pmeier requested a review from NicolasHug April 13, 2022 10:10
@pmeier pmeier mentioned this pull request Apr 14, 2022
Member

@NicolasHug NicolasHug left a comment

Thanks @detkov and @pmeier

@NicolasHug NicolasHug merged commit 10bb5b1 into pytorch:main Apr 14, 2022
@github-actions

Hey @NicolasHug!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request May 5, 2022

Reviewed By: jdsgomes, NicolasHug

Differential Revision: D36095715

fbshipit-source-id: b96d8f1b9abbfff8091380f056707af1255bbf22

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Successfully merging this pull request may close these issues.

Number of classes in FGVCAircraft differs from documentation