add missing CelebA in docs #2107

Merged
fmassa merged 1 commit into pytorch:master on Apr 15, 2020

edgarriba
Contributor

This PR adds a reference to the CelebA dataset to the torchvision docs.
The code is here: https://github.com/pytorch/vision/blob/master/torchvision/datasets/celeba.py

but for some reason the docs do not reference it.

Member

@fmassa fmassa left a comment

Thanks Edgar!

@fmassa fmassa merged commit 1b9f251 into pytorch:master Apr 15, 2020
@edgarriba
Contributor Author

@fmassa I observed a couple of things when trying to use this dataset. Hosting the data on Google Drive completely blocks use of the dataset once the download quota has been exceeded. This comes from the authors' website and is not a torchvision problem. However, to make it more functional from the torchvision side, it would be great to provide an alternative, or at least to mention this in the docs and offer alternative ways to use the dataset. In my case, after spending some time trying to understand why I couldn't use it out of the box, I ended up downloading it from another source and using ImageFolder.
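
The ImageFolder workaround mentioned above could look roughly like the sketch below. It assumes the images were downloaded manually and arranged as `root/<class_name>/<image files>`, which is the layout `ImageFolder` expects. `has_class_subfolders` and `load_manual_copy` are hypothetical helper names for illustration, not part of torchvision.

```python
from pathlib import Path


def has_class_subfolders(root):
    """Return True if root is a directory containing at least one subdirectory.

    ImageFolder reads images from root/<class_name>/..., so a flat folder of
    images has to be moved into at least one subfolder first.
    """
    root = Path(root)
    return root.is_dir() and any(p.is_dir() for p in root.iterdir())


def load_manual_copy(root):
    """Wrap a manually downloaded copy of the images with ImageFolder."""
    # Imported lazily so the layout check above works without torchvision.
    from torchvision import datasets, transforms

    if not has_class_subfolders(root):
        raise FileNotFoundError(
            f"{root} has no class subfolders; expected root/<class_name>/<images>"
        )
    return datasets.ImageFolder(str(root), transform=transforms.ToTensor())
```

Note that this route only yields images with folder-derived labels; the CelebA attribute, identity, and landmark annotations are not loaded this way.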

@fmassa
Member

fmassa commented Apr 15, 2020

We won't be hosting the dataset ourselves (it's outside of the scope of torchvision). If downloading from Google Drive is the only official way, then I think this is the way we should move forward.

If there are other official sources with the dataset, then it would be worth considering.

@edgarriba
Contributor Author

Gotcha. I'm just making the point that providing good interfaces to datasets in a specific format, without ensuring that the data is accessible, can make the API kind of useless: I want to use it, but I cannot access the data. I'm speaking after facing this issue with this specific dataset; I don't have an opinion on the others :)

We had similar discussions in kornia, e.g. about hosting weights and other assets to ensure that the user always has a fully functional API.

@fmassa
Member

fmassa commented Apr 15, 2020

I understand. The download functionality is meant to be a helper to get started, but if the user can't access the data because the original dataset disappeared (as was the case with ImageNet, for example; see #1457), we remove the download functionality and let users download it themselves (if at all possible).
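
For the manual route described above, a minimal sketch of opening CelebA from files the user has already placed on disk, with `download=False` so nothing is fetched from Google Drive. The entry names in `EXPECTED_ENTRIES` and the `celeba` subfolder are assumptions based on the linked `celeba.py`; please verify them against that file. `missing_celeba_entries` and `open_local_celeba` are hypothetical helper names.

```python
import os

# Assumption: these names follow what torchvision/datasets/celeba.py looks for
# under <root>/celeba; check the linked source before relying on them.
EXPECTED_ENTRIES = [
    "img_align_celeba",  # folder with the extracted images
    "list_attr_celeba.txt",
    "identity_CelebA.txt",
    "list_bbox_celeba.txt",
    "list_landmarks_align_celeba.txt",
    "list_eval_partition.txt",
]


def missing_celeba_entries(root):
    """List the expected entries that are absent from <root>/celeba."""
    base = os.path.join(root, "celeba")
    return [n for n in EXPECTED_ENTRIES if not os.path.exists(os.path.join(base, n))]


def open_local_celeba(root, split="train"):
    """Open a manually downloaded CelebA copy without touching the network."""
    # Imported lazily so the file check above works without torchvision.
    from torchvision.datasets import CelebA

    missing = missing_celeba_entries(root)
    if missing:
        raise FileNotFoundError(f"missing under {root}/celeba: {missing}")
    return CelebA(root, split=split, download=False)
```

The pre-check only reports which files are absent. The dataset class may still reject the copy, for example if its own integrity check does not match files obtained from a non-official source.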

From the README of torchvision (https://github.com/pytorch/vision#disclaimer-on-datasets):

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!
