Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to assign classes for ImageFolder and DatasetFolder #2094

Closed
wants to merge 1 commit into from
Closed

Conversation

dwSun
Copy link
Contributor

@dwSun dwSun commented Apr 11, 2020

as mentioned in #1674, there should be a convenient way to assign classes and the order of them.

@fmassa
Copy link
Member

fmassa commented Apr 15, 2020

Thanks for the PR!

I second my point in #1674 (comment) -- you can have arbitrary mappings for the classes with target_transform, and I don't see a reason why it would be more confusing using it in target_transform than in the class itself.

As such, I'm closing the PR but let me know if you disagree

@fmassa fmassa closed this Apr 15, 2020
@dwSun
Copy link
Contributor Author

dwSun commented Apr 15, 2020

@fmassa
Lets make a comparison:
This is code with target_transform:

classes = ["cA1", "ca2", "cB1", "cb2", "cC1", "cc2"]
classes_to_idx = {v: k for k, v in enumerate(classes)}

# pth_classes will be ['cA1', 'cB1', 'cC1', 'ca2', 'cb2', 'cc2']
# These 3 lines of codes assume users already known
# how pytorch organize their classes, what if they don't?
# The document of ImageFolder doesn't explain how it works.
pth_classes = classes[:]
pth_classes.sort()
pth_classes_to_idx = {v: k for k, v in enumerate(pth_classes)}

def target_transform(pth_idx):
    return classes_to_idx(pth_classes[pth_idx])

dataset = ImageFolder("/path/to/root", target_transform=target_transform)

And this is my version with classes as a optional parameter:

classes = ["cA1", "ca2", "cB1", "cb2", "cC1", "cc2"]
dataset = ImageFolder("/path/to/root", classes=classes)

Actually, it took me about 10Mins to track down ImageFolder source code to figure out how it works, and another 20~30Mins to figure out how I can use target_transform to make my model training works as I expected.

@dwSun dwSun changed the title Add ability to assign classes for ImageFolder and DatasetFolde Add ability to assign classes for ImageFolder and DatasetFolder Apr 15, 2020
@fmassa
Copy link
Member

fmassa commented Apr 15, 2020

@dwSun we can improve the documentation of DatasetFolder to clarify that the classes are sorted alphabetically. Could you send a PR?

Then, your target_transform becomes:

class TargetMapper:
    def __init__(self, classes):
        self.classes_to_idx = {v: k for k, v in enumerate(classes)}
        self.pth_classes_to_idx = {v: k for k, v in enumerate(sorted(classes))}

    def __call__(self, target):
        return self.classes_to_idx[self.pth_classes[target]]

target_transform = TargetMapper(['c', 'a', 'b'])

And if you want to change the ordering for some reason, you have full flexibility.

@dwSun
Copy link
Contributor Author

dwSun commented Apr 16, 2020

@fmassa surely we should improve the documentation of DatasetFolder and ImageFolder.
But code with target_transform still involve some skill, it is not aprarently appliable for every user, especially some users who are new to pytorch and python. An optional more convenient way should be helpful for them.

just thinking usage should be more straightforward and simple, I personally prefer this way.

@fmassa
Copy link
Member

fmassa commented Apr 16, 2020

@dwSun I understand your point, but there are several corner cases with passing classes the way you did that IMO don't have a "preferred" solution:

  1. what if classes is a subset of the folders available? Should we pick the subset or raise an error?
  2. what if classes is a superset of the folders available? Should we skip the inexistent folders or raise an error during construction?

In the current implementation, the error of type 2. is only raised during runtime, and it can take a long time before the dataloader gets there (while it would have been preferable to alert the user right during dataset construction).

Plus, the current implementation adds a new (implicit?) behavior to FolderDataset: you can now select a subset of the folders (classes) in it. Is this desired?

For all of those reasons, I think that if the user wants to re-map the classes of the dataset, he is responsible of doing that, as those decisions are pushed to the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants