Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check disk_usage before download #1551

Closed
Wauplin opened this issue Jul 11, 2023 · 3 comments
Closed

Check disk_usage before download #1551

Wauplin opened this issue Jul 11, 2023 · 3 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@Wauplin
Copy link
Contributor

Wauplin commented Jul 11, 2023

First mentioned by @julien-c in #1498 (comment).

huggingface_hub allows to download huge weight files from the Hub using snapshot_download and hf_hub_download. In some cases, the user might ran out of space on their machine. This will cause a OSError: [Errno 28] No space left on device while downloading. In order to prevent users from wasting time downloading a file they cannot save, we can check the disk usage beforehand. Python's shutil.disk_usage is able to check free space on disk. It is from the standard library and seems to be cross-platform (Unix + Windows) so it's convenient enough to trust it 😄 IMO, what we should do is to check free space before downloading a file and trigger a warning in case it's not enough. I wouldn't raise an exception as I'd prefer not to block downloads in case of weird machine setup.

There are 2 paths to check:

  • free space for the temporary file path where bytes are downloaded
  • free space for the folder where file is stored once downloaded

Both locations must be checked before download happens in hf_hub_download (~L1361).

I think checking at individual file level is good enough for now. We might think of having it in snapshot_download as well but that would require to implement some dry-run option first. Also on some Windows machines, symlinks are disabled and files can end up being duplicated. Let's not care about it here as it should remain a corner case.

@martinbrose
Copy link
Contributor

Hi @Wauplin,

I believe I have addressed this with PR #1590.

Please review and let me know if I'm checking at the right spot and if you prefer a different approach / structure.

Cheers

@Wauplin
Copy link
Contributor Author

Wauplin commented Aug 13, 2023

Thank for your PR @martinbrose! 🎉 I'm currently off but I'll review it next week.

@Wauplin
Copy link
Contributor Author

Wauplin commented Aug 28, 2023

Closed by @martinbrose in #1590 👍

@Wauplin Wauplin closed this as completed Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants