Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add tests for (Dataset|Image)Folder #3477

Merged
merged 16 commits into from
Mar 30, 2021
Merged

Conversation

pmeier
Copy link
Collaborator

@pmeier pmeier commented Mar 1, 2021

The failing test is valid. We first _find_classes()

classes, class_to_idx = self._find_classes(self.root)

by scanning the root directory for subdirectories

classes = [d.name for d in os.scandir(dir) if d.is_dir()]

and afterwards collect all samples

samples = self.make_dataset(self.root, class_to_idx, extensions, is_valid_file)

In the case that a subdirectory contains no or no matching files, it is still listed in the classes attribute.


A brute force fix would be to re-iterate over the samples remove all classes that are not present. A more efficient way would be flag all classes that contain at least one sample during make_dataset and simply remove empty ones. But this would require to change the output which is BC breaking. Thoughts?

@fmassa
Copy link
Member

fmassa commented Mar 2, 2021

@pmeier I'm not sure I get this. Is this a bug in the dataset implementation or in the test?

In principle, having an empty folder in the root dir shouldn't be a problem, so I would say I'd change the test to take this into account.

@pmeier
Copy link
Collaborator Author

pmeier commented Mar 2, 2021

Depends. Do we allow entries in the classes attribute that do not have a single example? If yes, my assumptions were wrong and I can simply adapt the test. If no, this is a bug.

@pmeier pmeier mentioned this pull request Mar 2, 2021
@pmeier pmeier requested a review from fmassa March 2, 2021 11:58
@fmassa
Copy link
Member

fmassa commented Mar 3, 2021

Following the discussion we had for make_dataset in the other issue, we can move the assert of empty classes to make_dataset, and fix the test so that no class should be empty

@pmeier
Copy link
Collaborator Author

pmeier commented Mar 3, 2021

Blocked by #3496.

empty_classes = available_classes - set(class_to_idx.keys())
empty_classes = set(class_to_idx.keys()) - available_classes
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was missed in #3496.

@pmeier
Copy link
Collaborator Author

pmeier commented Mar 25, 2021

This will close #963 since this adds tests for DatasetFolder which is currently our only untested dataset 🎉

@pmeier pmeier linked an issue Mar 25, 2021 that may be closed by this pull request
Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@fmassa
Copy link
Member

fmassa commented Mar 25, 2021

There is a conflict, can you look into it?

@codecov
Copy link

codecov bot commented Mar 30, 2021

Codecov Report

Merging #3477 (c39c240) into master (7cc941f) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3477      +/-   ##
==========================================
- Coverage   79.48%   79.47%   -0.02%     
==========================================
  Files         105      105              
  Lines        9822     9822              
  Branches     1582     1582              
==========================================
- Hits         7807     7806       -1     
  Misses       1527     1527              
- Partials      488      489       +1     
Impacted Files Coverage Δ
torchvision/datasets/folder.py 77.55% <100.00%> (-1.03%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7cc941f...c39c240. Read the comment docs.

@fmassa fmassa merged commit 20a771e into pytorch:master Mar 30, 2021
facebook-github-bot pushed a commit that referenced this pull request Apr 2, 2021
Summary:
* add tests for (Dataset|Image)Folder

* lint

* remove old tests

* cleanup

* more cleanup

* adapt tests

* fix make_dataset

* remove powerset

* readd import

Reviewed By: fmassa

Differential Revision: D27433923

fbshipit-source-id: 6ea3fb79f41e255045a642dcadedd8fa813e9dcc
@pmeier pmeier deleted the test-dataset-folder branch October 18, 2021 06:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tests for datasets
4 participants