✨ Add PyTorch image classification example #13134

nateraw · 2021-08-15T20:22:40Z

What does this PR do?

Adds PyTorch image classification example. For now, it uses torchvision.datasets.ImageFolder to load local image folders (just like the flax image classification example). In the future, we will switch to using the datasets package's image folder (once it exists).
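For readers unfamiliar with the folder convention mentioned above: `ImageFolder` expects one subdirectory per class under the root and assigns integer labels by sorted class name. The snippet below is only a stdlib sketch of that convention, not the torchvision implementation; `scan_image_folder`, `demo`, and the shortened extension list are hypothetical names introduced for illustration.

```python
import os
import tempfile

# Assumption: a shortened extension list, for illustration only.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png")


def scan_image_folder(root):
    """Hypothetical helper mimicking the root/<class_name>/<image> layout."""
    # Class names are the immediate subdirectories, sorted alphabetically.
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}

    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for dirpath, _, filenames in sorted(os.walk(class_dir)):
            for filename in sorted(filenames):
                # Non-image files are skipped.
                if filename.lower().endswith(IMG_EXTENSIONS):
                    samples.append((os.path.join(dirpath, filename), class_to_idx[name]))
    return classes, class_to_idx, samples


def demo():
    """Build a tiny throwaway folder tree and scan it."""
    layout = {"cats": ["a.jpg", "b.png"], "dogs": ["c.jpg", "notes.txt"]}
    with tempfile.TemporaryDirectory() as root:
        for label, files in layout.items():
            os.makedirs(os.path.join(root, label))
            for filename in files:
                open(os.path.join(root, label, filename), "wb").close()
        classes, class_to_idx, samples = scan_image_folder(root)
        labels = [label for _, label in samples]
    return classes, class_to_idx, labels
```

With the actual library, the equivalent call would be `torchvision.datasets.ImageFolder(root)`, which exposes `classes`, `class_to_idx`, and `samples` attributes.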

Marking as draft for now, as I'm still cleaning up the changes I carried over from this example I wrote earlier, which uses datasets instead.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

NielsRogge · 2021-08-16T07:16:48Z

Nice!! Relevant for #13080

sgugger

Thanks a lot for adding this! I left a few comments.

examples/pytorch/vision/requirements.txt

examples/pytorch/vision/run_image_classification.py

NielsRogge · 2021-08-31T09:49:10Z

I'll review this PR in detail (thanks for working on this!). Regarding the fixtures for the tests, I've recently moved these files to the hf-internal-testing organization on the hub. This makes it more clear, as otherwise these fixture files are also downloaded when people do a git clone of the library.

NielsRogge · 2021-09-01T15:23:33Z

examples/pytorch/test_examples.py

+            run_image_classification.py
+            --output_dir {tmp_dir}
+            --model_name_or_path google/vit-base-patch16-224-in21k
+            --train_dir tests/fixtures/tests_samples/cats_and_dogs/


Can we add the cats and dogs as a dataset to the hub under the hf-internal-testing organization?

We can, but this is actually testing the fact that the script works on local image folders. I can issue another PR that directly pulls them down into an imagefolder-like cache dir to test. But I'll leave this as-is for now if it's not a big deal.

examples/pytorch/vision/README.md

examples/pytorch/vision/run_image_classification.py

NielsRogge · 2021-09-01T15:38:35Z

examples/pytorch/vision/run_image_classification.py

+            "value if set."
+        },
+    )
+    image_size: Optional[int] = field(default=224, metadata={"help": " The size (resolution) of each image."})


Will images be squared? Or is it just the smaller edge of the image that will be matched to this number (which torchvision's Resize does if you only provide an integer)?

You're correct, this value is passed directly to torchvision's Resize
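To make the distinction in this exchange concrete: when torchvision's `Resize` receives a single integer, the smaller edge is matched to that number and the aspect ratio is preserved, so non-square inputs stay non-square. The function below is a dependency-free sketch of that size computation only; `resized_size_for_int` is a hypothetical name, and the truncating `int(...)` follows the documented `size * height / width` rule as an assumption about rounding.

```python
def resized_size_for_int(height, width, size):
    """Return the (height, width) produced when the smaller edge is matched to `size`.

    Sketch of the integer-`size` behaviour of torchvision's Resize; it does not
    touch pixel data, it only computes the target shape.
    """
    if width <= height:
        # Width is the short edge: it becomes `size`, height scales with it.
        return int(size * height / width), size
    # Height is the short edge: it becomes `size`, width scales with it.
    return size, int(size * width / height)


landscape = resized_size_for_int(480, 640, 224)
portrait = resized_size_for_int(640, 480, 224)
square = resized_size_for_int(500, 500, 224)
```

A square output requires passing an explicit `(height, width)` pair, for example `Resize((224, 224))`, or following the resize with a crop such as `CenterCrop(224)`. Which of these the script ends up using is not stated in this thread.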

NielsRogge · 2021-09-01T15:42:50Z

examples/pytorch/vision/run_image_classification.py

+    train_dir: Optional[str] = field(default=None, metadata={"help": "A folder containing the training data."})
+    validation_dir: Optional[str] = field(default=None, metadata={"help": "A folder containing the validation data."})
+    train_val_split: Optional[float] = field(
+        default=0.15, metadata={"help": "Percent to split off of train for validation."}
+    )


Hmm so you have a training dataset, a validation dataset and a test set. The validation_dir is actually the test set? And it only makes sense to add a train_val_split if you don't provide a validation dataset yourself?

There's no test set here, just train + validation. If the 'validation' key is not found in the dataset, we split a portion off of train and assign it to the 'validation' key of the dataset dict.
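The fallback described in this reply can be sketched roughly as follows. This is an illustration using plain Python lists and a dict, not the script's actual code; `ensure_validation_split`, the `seed` argument, and the shuffle-then-slice strategy are assumptions made for the example. Only the `train_val_split` name and its 0.15 default come from the diff above.

```python
import random


def ensure_validation_split(dataset, train_val_split=0.15, seed=0):
    """Hypothetical helper: carve a validation split off of train when none exists.

    `dataset` is a dict of split name -> list of examples (a stand-in for a
    dataset dict). `train_val_split` is treated as a fraction between 0 and 1.
    """
    # A user-supplied validation split always wins; nothing is split off.
    if "validation" in dataset or not train_val_split:
        return dataset

    items = list(dataset["train"])
    random.Random(seed).shuffle(items)  # assumption: shuffle before splitting
    n_val = round(len(items) * train_val_split)

    # Return a new dict so the caller's original dataset is left untouched.
    return {**dataset, "train": items[n_val:], "validation": items[:n_val]}


only_train = {"train": list(range(100))}
split = ensure_validation_split(only_train)

with_val = {"train": list(range(10)), "validation": [100, 101]}
unchanged = ensure_validation_split(with_val)
```

Under this sketch, `train_val_split` has no effect once a validation folder is provided, which matches the second half of the question above.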

NielsRogge

Overall LGTM! Just some small comments.

Most importantly, it would be great to remove the fixtures files from this PR in favor of a HuggingFace dataset.

sgugger · 2021-09-02T15:33:26Z

Last nit on my side: can we move the vision folder to be image-classification? We will have other kinds of vision examples in the future.

examples/pytorch/vision/README.md

nateraw · 2021-09-02T19:29:21Z

Ok, addressed most of the comments. Merging as-is for now.

@NielsRogge I did not address these two items, but I can in future PRs if need be:

Adding test data to datasets library.
Adjusting train/validation/test split logic.

Hecim1984 · 2021-09-04T00:47:16Z

13134

nateraw changed the title ~~Add PyTorch image classification example~~ [WIP] Add PyTorch image classification example Aug 15, 2021

nateraw marked this pull request as ready for review August 16, 2021 04:53

nateraw changed the title ~~[WIP] Add PyTorch image classification example~~ Add PyTorch image classification example Aug 16, 2021

LysandreJik requested review from sgugger and NielsRogge August 16, 2021 07:13

sgugger approved these changes Aug 30, 2021


nateraw force-pushed the torch-image-classification-ex branch from 459e24e to b61ab72 August 31, 2021 06:30