-
Notifications
You must be signed in to change notification settings - Fork 7.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to preserve filenames in download_images
#2983
Option to preserve filenames in download_images
#2983
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
I'd be happy to accept this PR, but it needs to check if the file already exists, and if so, should add an incrementing number to the end of the filename (before the suffix of course), until it finds a filename that doesn't exist. Otherwise we'll end up overwriting files. If you're not interested in doing this, no problem, just let me know and I'll close the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add an incrementing number to the end of the filename
@jph00 sure, I will go ahead and add the suggested change |
Many thanks! :) |
download_images
NameError: name 'download_images' is not defined |
I apologize in advance if I missed a requirement for PR creation. I did my best to make sure I did not miss anything.
The motivation for this change is that I use this utility and the
ImageClassifierCleaner
widget as well and the following situation happened to me.I have a list of URLs for class A and another list of URLs for class B. So I first call
download_images
for class A url list and then for class B url list to download the images.download_images
set numbers as names for the files it downloads. So we end up a 0001.jpg image labeled as A and a 0001.jpg image labeled as B (and so on). The problem that occurred to me is that if when inspecting the images with the cleaner widget, I ended up confirming that 0001.jpg on class A was really of class B, then moving the file to the directory B as it is done in chapter 2 of the book (https://github.com/fastai/fastbook/blob/master/02_production.ipynb) would throw an error because 0001.jpg already exists on directory B. The first solution that I thought was to preserve the filenames when downloading the images, which is what this PR enables. In summary, this PR adds apreserve_filename
option todownload_images
, which is disabled by default and preserves the image file names.I later realized that it would have been muuuuuuch easier to just rename the file when moving it from directory A to directory B... but I thought of that after I made this change locally.
As I described above, this change was not really necessary to solve my original problem, but since I already have done it I went ahead and created this PR. I also found someone that needed this feature in the past, so that also motivated me to go ahead https://forums.fast.ai/t/using-download-images-retaining-source-file-name/39463.
This is a change that I had locally and converting it into a PR did not required much effort. Still, if this does not sound like a good idea, do not hesitate to reject my PR. The learning experience was worth it :)