✨ Add PyTorch image classification example #13134

Merged (18 commits) on Sep 2, 2021

@nateraw nateraw commented Aug 15, 2021

What does this PR do?

Adds PyTorch image classification example. For now, it uses torchvision.datasets.ImageFolder to load local image folders (just like the flax image classification example). In the future, we will switch to using the datasets package's image folder (once it exists).

Marking as draft for now while I finish cleaning up the changes I adapted from an earlier example I wrote, which uses datasets instead.
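
For readers unfamiliar with the folder layout that torchvision.datasets.ImageFolder expects, here is a minimal standard-library sketch of the convention: one subdirectory per class, with class names sorted and mapped to integer labels. The helper names `find_classes` and `list_samples` and the file names are illustrative assumptions, not code from this PR; the example script itself relies on `torchvision.datasets.ImageFolder(root)`.

```python
import tempfile
from pathlib import Path


def find_classes(root):
    """Map each class subdirectory of `root` to an integer label (sorted by name)."""
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


def list_samples(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (path, label) pairs for every image file under each class folder."""
    samples = []
    for name, idx in find_classes(root).items():
        for path in sorted((Path(root) / name).rglob("*")):
            if path.is_file() and path.suffix.lower() in extensions:
                samples.append((str(path), idx))
    return samples


# Build a tiny throwaway tree: root/<class_name>/<image file> (empty placeholder files).
with tempfile.TemporaryDirectory() as root:
    for cls, files in {"dog": ["a.jpg"], "cat": ["b.jpg", "c.png"]}.items():
        (Path(root) / cls).mkdir()
        for name in files:
            (Path(root) / cls / name).touch()
    class_to_idx = find_classes(root)  # labels follow sorted class names
    samples = list_samples(root)       # one (path, label) pair per image file
```

The sketch only walks the directory tree; it does not decode images, which ImageFolder does through its loader.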

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@nateraw nateraw changed the title Add PyTorch image classification example [WIP] Add PyTorch image classification example Aug 15, 2021
@nateraw nateraw marked this pull request as ready for review August 16, 2021 04:53
@nateraw nateraw changed the title [WIP] Add PyTorch image classification example Add PyTorch image classification example Aug 16, 2021
@NielsRogge (Contributor)

Nice!! Relevant for #13080

@sgugger (Collaborator) left a comment

Thanks a lot for adding this! I left a few comments.

Resolved (now outdated) review comments: one on examples/pytorch/vision/requirements.txt and six on examples/pytorch/vision/run_image_classification.py.
@nateraw nateraw force-pushed the torch-image-classification-ex branch from 459e24e to b61ab72 Compare August 31, 2021 06:30
@NielsRogge (Contributor)

I'll review this PR in detail (thanks for working on this!). Regarding the fixtures for the tests, I've recently moved these files to the hf-internal-testing organization on the hub. This keeps the repository lighter, since otherwise the fixture files are also downloaded whenever someone does a git clone of the library.

run_image_classification.py
--output_dir {tmp_dir}
--model_name_or_path google/vit-base-patch16-224-in21k
--train_dir tests/fixtures/tests_samples/cats_and_dogs/
Contributor

Can we add the cats and dogs as a dataset to the hub under the hf-internal-testing organization?

@nateraw (Contributor Author)

We can, but this is actually testing the fact that the script works on local image folders. I can issue another PR that directly pulls them down into an imagefolder-like cache dir to test. But I'll leave this as-is for now if it's not a big deal.

"value if set."
},
)
image_size: Optional[int] = field(default=224, metadata={"help": " The size (resolution) of each image."})
@NielsRogge (Contributor), Sep 1, 2021

Will images be squared? Or is it just the smaller edge of the image that will be matched to this number (which torchvision's Resize does if you only provide an integer)?

@nateraw (Contributor Author)

You're correct, this value is passed directly to torchvision's Resize.
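
To make the behavior discussed above concrete, the following sketch computes the output shape when a single integer is given as the resize target: the smaller edge is matched to that number and the aspect ratio is preserved, as torchvision's Resize documentation describes. The function `resized_shape` is a hypothetical pure-Python illustration, not torchvision code.

```python
def resized_shape(height, width, size):
    """Shape (h, w) after matching the smaller edge to `size`, keeping aspect ratio."""
    if height <= width:
        # Height is the smaller edge: it becomes `size`, width scales proportionally.
        return size, int(size * width / height)
    # Width is the smaller edge: it becomes `size`, height scales proportionally.
    return int(size * height / width), size


landscape = resized_shape(480, 640, 224)  # (224, 298)
portrait = resized_shape(640, 480, 224)   # (298, 224)
square = resized_shape(300, 300, 224)     # (224, 224)
```

So with an integer image_size, non-square inputs stay non-square; getting square images would need an explicit (height, width) pair or a subsequent crop.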

Comment on lines 86 to 90
train_dir: Optional[str] = field(default=None, metadata={"help": "A folder containing the training data."})
validation_dir: Optional[str] = field(default=None, metadata={"help": "A folder containing the validation data."})
train_val_split: Optional[float] = field(
default=0.15, metadata={"help": "Percent to split off of train for validation."}
)
@NielsRogge (Contributor), Sep 1, 2021

Hmm so you have a training dataset, a validation dataset and a test set. The validation_dir is actually the test set? And it only makes sense to add a train_val_split if you don't provide a validation dataset yourself?

@nateraw (Contributor Author)

There's no test set here, just train + validation. If the 'validation' key is not found in the dataset, we split a portion off of train and assign it to the 'validation' key of the dataset dict.
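
The fallback described in this reply can be sketched as follows. This is a minimal illustration using plain lists and a dict; the function name `ensure_validation_split`, the fixed seed, and the list-based "dataset" are assumptions made for the example, and the actual script may implement the split differently.

```python
import random


def ensure_validation_split(splits, fraction=0.15, seed=0):
    """Return `splits` with a 'validation' key, carving it out of 'train' if absent."""
    if "validation" in splits:
        # A validation set was provided explicitly, so the fraction is ignored.
        return splits
    items = list(splits["train"])
    random.Random(seed).shuffle(items)  # shuffle a copy, reproducibly
    n_val = round(len(items) * fraction)
    return {**splits, "train": items[n_val:], "validation": items[:n_val]}


with_split = ensure_validation_split({"train": list(range(100))})
untouched = ensure_validation_split({"train": [1, 2, 3], "validation": [4]})
```

Note that the split fraction only matters when no validation data is supplied, which is the point raised in the question above.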

@NielsRogge (Contributor) left a comment

Overall LGTM! Just some small comments.

Most importantly, it would be great to remove the fixture files from this PR in favor of a HuggingFace dataset.

@sgugger (Collaborator) commented Sep 2, 2021

Last nit on my side: can we rename the vision folder to image-classification? We will have other kinds of vision examples in the future.

@nateraw nateraw force-pushed the torch-image-classification-ex branch from 21231cb to d979eeb Compare September 2, 2021 19:08
@nateraw nateraw changed the title Add PyTorch image classification example ✨Add PyTorch image classification example Sep 2, 2021
@nateraw nateraw changed the title ✨Add PyTorch image classification example ✨ Add PyTorch image classification example Sep 2, 2021
@nateraw (Contributor Author) commented Sep 2, 2021

Ok, addressed most of the comments. Merging as-is for now.

@NielsRogge I did not address these two items; however, I can do so in future PRs if need be:

  • Adding test data to datasets library.
  • Adjusting train/validation/test split logic.

@nateraw nateraw merged commit 76c4d8b into huggingface:master Sep 2, 2021
@nateraw nateraw deleted the torch-image-classification-ex branch September 2, 2021 21:16
@Hecim1984

13134

@nateraw nateraw mentioned this pull request Sep 7, 2021
5 participants