Fix FGVCAircraft dataset documentation #5814
There is a mistake in the FGVCAircraft dataset documentation. If you load the dataset and sum the 'trainval' and 'test' splits, there are 10,000 images, not 10,200. The class counts are also wrong: with `annotation_level` set to `variant`, the number of unique classes is 100, not 102, and with `manufacturer` it is 30, not 41.
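The check described above can be sketched roughly as follows. This is a minimal sketch, not the code used for this PR: `summarize_fgvc` is a hypothetical helper, and it only assumes that the dataset factory accepts `split` and `annotation_level` keyword arguments and that the returned object supports `len()` and exposes a `classes` list, as `torchvision.datasets.FGVCAircraft` does.

```python
from typing import Callable, Dict, Sequence, Tuple


def summarize_fgvc(
    make_dataset: Callable,
    splits: Sequence[str] = ("trainval", "test"),
    levels: Sequence[str] = ("variant", "family", "manufacturer"),
) -> Tuple[int, Dict[str, int]]:
    """Return (total image count, unique class count per annotation level).

    `make_dataset` is assumed to be callable as
    make_dataset(split=..., annotation_level=...) and to return an object
    with __len__ and a `classes` attribute.
    """
    # The number of images does not depend on the annotation level,
    # so any level can be used to count them.
    total_images = sum(
        len(make_dataset(split=split, annotation_level=levels[0]))
        for split in splits
    )

    classes_per_level: Dict[str, int] = {}
    for level in levels:
        names = set()
        for split in splits:
            names.update(make_dataset(split=split, annotation_level=level).classes)
        classes_per_level[level] = len(names)

    return total_images, classes_per_level


# Possible usage with torchvision (assumes the data is available under
# the hypothetical directory "data"):
#
#   from functools import partial
#   from torchvision.datasets import FGVCAircraft
#   total, per_level = summarize_fgvc(partial(FGVCAircraft, root="data", download=True))
```

If the corrected documentation is right, the total should come out as 10,000, with 100 classes at the `variant` level and 30 at the `manufacturer` level.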
"""`FGVC Aircraft <https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/>`_ Dataset. | ||
-    The dataset contains 10,200 images of aircraft, with 100 images for each of 102
+    The dataset contains 10,000 images of aircraft, with 100 images for each of 100
Oh boy, they even got the number of samples wrong on their website? 🙄 Thanks for checking and fixing!
If the website is reporting the original values, are we doing something wrong in the code?
I don't think so, no:
$ cat variants.txt | wc -l
100
$ cat families.txt | wc -l
70
$ cat manufacturers.txt | wc -l
30
$ ls images | wc -l
10000
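For anyone who prefers to run the same check from Python, here is a rough equivalent of the shell commands above. It is only a sketch: `count_annotation_files` is a hypothetical helper, and `data_dir` is assumed to be the directory that contains `variants.txt`, `families.txt`, `manufacturers.txt`, and the `images` folder.

```python
from pathlib import Path
from typing import Dict, Union


def count_lines(path: Path) -> int:
    """Count lines in a text file, similar to `wc -l`.

    Unlike `wc -l`, a final line without a trailing newline is also counted.
    """
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def count_annotation_files(data_dir: Union[str, Path]) -> Dict[str, int]:
    """Count class names per annotation file and the entries in images/."""
    data_dir = Path(data_dir)
    counts = {
        name: count_lines(data_dir / f"{name}.txt")
        for name in ("variants", "families", "manufacturers")
    }
    # Skip hidden entries so the result matches a plain `ls images | wc -l`.
    counts["images"] = sum(
        1 for entry in (data_dir / "images").iterdir()
        if not entry.name.startswith(".")
    )
    return counts
```

On the extracted dataset this should report the same numbers as the shell output: 100 variants, 70 families, 30 manufacturers, and 10000 images.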
Thanks a ton @detkov. LGTM!
Hey @NicolasHug! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py
Summary: There is a mistake in FGVCAircraft dataset documentation. If you load dataset and sum up 'trainval' and 'test' splits, there are not 10 200, but 10 000 images. Also mistake is in the number of classes: if we choose `annotation_level` to be `variant`, then unique number of classes is going to be not 102, but 100, and with `manufacturer` level it is not 41, but 30.
Reviewed By: jdsgomes, NicolasHug
Differential Revision: D36095715
fbshipit-source-id: b96d8f1b9abbfff8091380f056707af1255bbf22
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
The discrepancy in the FGVCAircraft documentation was initially explored in issue #5809.