Description
🚀 The feature
(First issue/feature request, tried my best to follow the guidelines, apologies if I missed something).
In torchvision.datasets.folder.make_dataset, we are given the option to use is_valid_file (or extensions).
My feature request is to allow is_valid_file to get the whole path to the file rather than just the filename.
Motivation, pitch
Currently, if we wish to use is_valid_file, it can only act on the filename, without access to the whole path to the file. This makes it tricky to open the file, verify whether it meets certain criteria, and decide whether it is valid.
Perhaps this was not the intended use initially, but there seems to be an opportunity to extend what is_valid_file can do by passing it the whole path rather than just the filename.
My exact implementation idea would be the following:
Replace the following snippet:
for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)):
    for fname in sorted(fnames):
        if is_valid_file(fname):
            path = os.path.join(root, fname)
            item = path, class_index
            instances.append(item)
with:
for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)):
    for fname in sorted(fnames):
        path = os.path.join(root, fname)
        if is_valid_file(path):
            item = path, class_index
            instances.append(item)
This change might break backward compatibility for users who already use is_valid_file. However, their fix would be simple: they could add the following line to their is_valid_file function:
root, fname = os.path.split(path)
where path is the argument passed to is_valid_file. They could then continue using fname (or whatever they have named it) in their function as before.
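As a rough illustration of that migration, existing filename-based checks could also be wrapped rather than edited. The sketch below assumes the proposed behaviour (is_valid_file receives a full path); adapt_filename_check and has_png_extension are hypothetical names used only for this example and are not part of torchvision.

```python
import os
from typing import Callable


def adapt_filename_check(check_fname: Callable[[str], bool]) -> Callable[[str], bool]:
    """Wrap a filename-based check so that it accepts a full path instead."""

    def is_valid_file(path: str) -> bool:
        # Drop the directory part and forward only the filename,
        # which is what the wrapped check was written to expect.
        _root, fname = os.path.split(path)
        return check_fname(fname)

    return is_valid_file


# Hypothetical pre-existing check that only looks at the filename.
def has_png_extension(fname: str) -> bool:
    return fname.lower().endswith(".png")


# Path-based callable that behaves like the old filename-based one.
is_valid_file = adapt_filename_check(has_png_extension)
```

With a wrapper like this, the body of the original check stays untouched, and only the callable handed to is_valid_file changes.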
Alternatives
No response
Additional context
One example of how this could be useful is when one wants to use only pictures that meet certain resolution, channel, or other criteria, in a configurable way. The criteria could be coded in the is_valid_file function, without having to delete or move files.
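A minimal sketch of such a configurable check is shown below, assuming is_valid_file is given the full path as proposed. To stay dependency-free it only handles PNG files, reading the width and height from the file header with the standard library; in practice an image library would likely be used instead. make_min_resolution_check is a hypothetical helper name, not a torchvision function.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_min_resolution_check(min_width: int, min_height: int):
    """Build an is_valid_file callable that expects the full path to a file."""

    def is_valid_file(path: str) -> bool:
        try:
            with open(path, "rb") as f:
                header = f.read(24)
        except OSError:
            # Unreadable or missing files are treated as invalid.
            return False
        # After the signature comes the IHDR chunk: a 4-byte length,
        # the 4-byte type b"IHDR", then big-endian width and height.
        if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
            return False
        width, height = struct.unpack(">II", header[16:24])
        return width >= min_width and height >= min_height

    return is_valid_file
```

The returned callable could then be passed as the is_valid_file argument, for example make_min_resolution_check(224, 224), so the resolution threshold is configured in code and no files need to be moved or deleted.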