Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to preserve filenames in download_images #2983

Merged

Conversation

mess-lelouch
Copy link
Contributor

I apologize in advance if I missed a requirement for PR creation. I did my best to make sure I did not miss anything.

The motivation for this change is that I use this utility and the ImageClassifierCleaner widget as well and the following situation happened to me.

I have a list of URLs for class A and another list of URLs for class B. So I first call download_images for class A url list and then for class B url list to download the images.

download_images set numbers as names for the files it downloads. So we end up a 0001.jpg image labeled as A and a 0001.jpg image labeled as B (and so on). The problem that occurred to me is that if when inspecting the images with the cleaner widget, I ended up confirming that 0001.jpg on class A was really of class B, then moving the file to the directory B as it is done in chapter 2 of the book (https://github.com/fastai/fastbook/blob/master/02_production.ipynb) would throw an error because 0001.jpg already exists on directory B. The first solution that I thought was to preserve the filenames when downloading the images, which is what this PR enables. In summary, this PR adds a preserve_filename option to download_images, which is disabled by default and preserves the image file names.

I later realized that it would have been muuuuuuch easier to just rename the file when moving it from directory A to directory B... but I thought of that after I made this change locally.

As I described above, this change was not really necessary to solve my original problem, but since I already have done it I went ahead and created this PR. I also found someone that needed this feature in the past, so that also motivated me to go ahead https://forums.fast.ai/t/using-download-images-retaining-source-file-name/39463.
This is a change that I had locally and converting it into a PR did not required much effort. Still, if this does not sound like a good idea, do not hesitate to reject my PR. The learning experience was worth it :)

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@jph00
Copy link
Member

jph00 commented Nov 14, 2020

I'd be happy to accept this PR, but it needs to check if the file already exists, and if so, should add an incrementing number to the end of the filename (before the suffix of course), until it finds a filename that doesn't exist. Otherwise we'll end up overwriting files.

If you're not interested in doing this, no problem, just let me know and I'll close the PR.

Copy link
Member

@jph00 jph00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an incrementing number to the end of the filename

@mess-lelouch
Copy link
Contributor Author

@jph00 sure, I will go ahead and add the suggested change

@jph00
Copy link
Member

jph00 commented Nov 17, 2020

Many thanks! :)

@jph00 jph00 merged commit d0e894b into fastai:master Nov 17, 2020
@jph00 jph00 changed the title Preserving filename downloads in download_images Option to preserve filenames in download_images Nov 17, 2020
@AshutoshRudraksh
Copy link

AshutoshRudraksh commented Oct 5, 2022

NameError: name 'download_images' is not defined
https://colab.research.google.com/drive/1batgL9t9THRTyo03S8qMphnNtLLkG63x#scrollTo=GMQ99cMe8j37&line=2&uniqifier=1
did I do something wrong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants