[bugfix] Make CUB200 labels 0-indexed. #6702
The CUB200 dataset in the `torchvision.prototype.datasets` module formed labels from file paths. This resulted in labels being 1-indexed (1-200) instead of 0-indexed (0-199). A similar issue occurred with Flowers102 (`torchvision.datasets` module, #5766).
Thanks @kdexd!
The pleasure is mine! On a related note, my ongoing project requires evaluating large vision models on a suite of ~25 image classification datasets, specifically a big subset of datasets used to evaluate CLIP models (see Figure below). Some datasets are not yet implemented in

class TorchvisionWrapIterDataPipe(dpi.IterDataPipe):
    def __init__(self, name: str, root: str | Path, split: str, **kwargs):
        # Get the dataset from `torchvision.datasets` module.
        DatasetClass = getattr(torchvision.datasets, name)
        self._inner = DatasetClass(str(root), split, download=True, **kwargs)

        # Wrap the dataset as a MapDataPipe, then convert to iterable.
        _dp = MapToIterConverter(SequenceWrapper(self._inner))
        self._dp = hint_sharding(hint_shuffling(_dp))

    def __len__(self):
        return len(self._inner)

    def __iter__(self):
        for image, label in self._dp:
            yield {"image": image, "label": label}


FGVCAircraft = partial(TorchvisionWrapIterDataPipe, "FGVCAircraft")
Flowers102 = partial(TorchvisionWrapIterDataPipe, "Flowers102")
STL10 = partial(TorchvisionWrapIterDataPipe, "STL10", folds=0)
RenderedSST2 = partial(TorchvisionWrapIterDataPipe, "RenderedSST2")

Some other datasets like
Thanks a lot for the interest in contributing. Could you open an issue so this won't get lost? As of now, our new API is not finalized, so it makes little sense to port more datasets yet. However, we are working hard to get this done and will notify you when we can start adding new datasets.
Summary: CUB200 dataset in `torchvision.prototype.datasets` module formed labels using file paths. This resulted in labels being 1-indexed (1-200) instead of 0-indexed (0-199). Similar issue occurred with Flowers102 (`torchvision.datasets` module, #5766).
Reviewed By: datumbox
Differential Revision: D40138731
fbshipit-source-id: ce42adf4a3ae8e25110db06f2421b24c5169cfc4
Bug description:
CUB200 dataset in `torchvision.prototype.datasets` module formed labels using file paths. This resulted in labels being 1-indexed (1-200) instead of 0-indexed (0-199). A related issue from the past was with Flowers102 (`torchvision.datasets` module, #5766).
This PR simply shifts the labels and makes them zero-indexed for consistency with other datasets.
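To illustrate the kind of shift described above, here is a minimal sketch. It assumes image paths follow the CUB-200-2011 layout, where each class directory is named with a numeric ID prefix followed by a dot and the class name (for example `001.Black_footed_Albatross`). `label_from_path` is a hypothetical helper written for this example; it is not the function changed in torchvision.

```python
from pathlib import PurePosixPath


def label_from_path(path: str) -> int:
    """Return a 0-indexed class label parsed from a CUB200-style image path."""
    # The class directory is the parent of the image file,
    # e.g. "001.Black_footed_Albatross" (assumed layout).
    category_dir = PurePosixPath(path).parent.name
    # The numeric prefix before the first dot is the 1-indexed class ID (1-200).
    class_id = int(category_dir.split(".", 1)[0])
    # Shift by one so labels fall in 0-199.
    return class_id - 1


first = label_from_path("images/001.Black_footed_Albatross/image_0001.jpg")
last = label_from_path("images/200.Common_Yellowthroat/image_0055.jpg")
print(first, last)
```

The shift matters in practice because classification losses such as `torch.nn.CrossEntropyLoss` expect class indices in the range 0 to C-1, so a label of 200 with 200 classes would be out of range.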