add prototypes for Caltech(101|256) datasets #4510

Conversation
```diff
 packages=find_packages(exclude=('test',)),
 package_data={
-    package_name: ['*.dll', '*.dylib', '*.so']
+    package_name: ['*.dll', '*.dylib', '*.so', '*.categories']
```
We need to add a file type here so that these files are packaged with everything else. They are plain text files, but I've opted to give them a "custom" extension to avoid accidentally including anything else.
LGTM, thanks!

The only minor thing I would change is the `np` import; otherwise good to merge. Do you prefer to change that in a follow-up PR or directly here?
```python
from typing import Any, Callable, Dict, List, Optional, Tuple, Union
import re

import numpy as np
```
Do we need numpy here? You only use it to cast the array to `int64`, which you can also do in numpy with a string dtype (e.g. `.astype("int64")`), or by passing a `dtype` directly to `torch.as_tensor` / `torch.tensor`.
Unfortunately, we do. The `box_coord` array has dtype `uint16`, which `torch.as_tensor` cannot handle. So either we use numpy to convert to a common dtype first, or we could also convert the values to a list and build a tensor from that. What would you prefer? In any case I'll add a comment explaining what is going on.
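For illustration, a minimal sketch of the two options being weighed here (the `box_coord` values below are made up; only the `uint16` dtype matters):

```python
import numpy as np
import torch

# Stand-in for the box_coord array read from a Caltech-101 annotation file.
box_coord = np.array([[10, 20, 100, 120]], dtype=np.uint16)

# torch.as_tensor(box_coord) raises, since PyTorch has no uint16 dtype.
# Option 1: cast to a common dtype in numpy first.
bounding_box = torch.as_tensor(box_coord.astype(np.int64))

# Option 2: go through a plain Python list and set the dtype on the tensor.
bounding_box_alt = torch.tensor(box_coord.tolist(), dtype=torch.int64)
```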
```python
image = decoder(image_buffer) if decoder else image_buffer

ann = read_mat(ann_buffer)
```
Does `ann` contain anything else, or just `box_coord` and `obj_contour`?
Just `box_coord` and `obj_contour`.
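For context, a sketch of what loading one of these annotation files looks like with scipy, assuming `read_mat` wraps `scipy.io.loadmat` (the file name is hypothetical):

```python
from scipy.io import loadmat

# Hypothetical file name; each Caltech-101 annotation is a small .mat file.
ann = loadmat("annotation_0001.mat")

box_coord = ann["box_coord"]      # bounding box, dtype uint16 per the thread
obj_contour = ann["obj_contour"]  # contour points of the object outline
```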
```python
anns_dp = TarArchiveReader(anns_dp)
anns_dp = Filter(anns_dp, self._is_ann)

dp = KeyZipper(
```
For my understanding: is this efficient compared to our current Caltech implementation based on the old-style datasets?
I'm not sure how to answer whether this is efficient or not.

Without the `Shuffler` we could use `Zipper` here, and that is basically what we are doing now. If both datapipes, i.e. `images_dp` and `anns_dp`, are already aligned, `KeyZipper` only adds the overhead of evaluating `key_fn` and `ref_key_fn` for every sample.

~~As an alternative we could put the `Shuffler` after we use a `Zipper` and remove the `KeyZipper` altogether.~~ I was wrong: the two archives are not perfectly aligned, so we need the `KeyZipper` anyway.
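To make the overhead argument concrete, here is a conceptual sketch of the key matching that `KeyZipper` performs; this is an illustration of the idea, not the actual datapipe implementation, and `key_zip` is a hypothetical name mirroring the comment above:

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")
S = TypeVar("S")

def key_zip(
    dp: Iterable[T],
    ref_dp: Iterable[S],
    key_fn: Callable[[T], Any],
    ref_key_fn: Callable[[S], Any],
) -> Iterator[Tuple[T, S]]:
    # Buffer reference samples whose partner has not been seen yet.
    buffer: Dict[Any, S] = {}
    ref_iter = iter(ref_dp)
    for item in dp:
        key = key_fn(item)  # evaluated once per sample
        while key not in buffer:
            # Pull from the reference stream until the matching key shows up.
            # A real implementation would error out gracefully on missing keys.
            ref_item = next(ref_iter)
            buffer[ref_key_fn(ref_item)] = ref_item
        yield item, buffer.pop(key)
```

If the two streams are perfectly aligned, the buffer never holds more than one entry, so the only extra cost over a plain `Zipper` is the two key functions per sample; misalignment merely grows the buffer.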
```python
create_categories_file(HERE, self.name, categories)


if __name__ == "__main__":
```
Question: do we want to keep this for all datasets?
The idea behind this is that we should keep track of how we generated the category files. With this it is easy to regenerate them if we need to: just call `python -m torchvision.prototype.datasets._builtin.caltech`. Maybe we could also add an aggregation to be able to generate "everything", but I would keep that for the future.
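As a sketch of the pattern being discussed, assuming a hypothetical `generate_categories` helper (the real module derives the names from the downloaded archives):

```python
import pathlib
from typing import List

HERE = pathlib.Path(__file__).parent

def generate_categories(root: pathlib.Path) -> List[str]:
    # Hypothetical helper: one category per top-level directory of the
    # extracted "101_ObjectCategories" archive.
    return sorted(path.name for path in root.iterdir() if path.is_dir())

if __name__ == "__main__":
    # Regenerating the file is then just `python -m <module>`, as described above.
    categories = generate_categories(pathlib.Path("101_ObjectCategories"))
    (HERE / "caltech101.categories").write_text("\n".join(categories) + "\n")
```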
Title changed: add prototype for `Caltech256` dataset → add prototypes for `Caltech(101|256)` datasets
Summary:
* add prototype for `Caltech256` dataset
* silence mypy

Reviewed By: prabhat00155, NicolasHug
Differential Revision: D31309545
fbshipit-source-id: b1b597c2ac152a77fa5c80f361359d993212ce47
cc @pmeier @mthrok @bjuncek