Fix validate url #2957

Merged
freddyaboulton merged 7 commits into main from fix-validate-url
Jan 18, 2023


freddyaboulton
Collaborator

@freddyaboulton freddyaboulton commented Jan 8, 2023

Description

ImgSerializable.serialize would fail on an image URL because Path(load_dir) / x collapses the double slash in the URL (https://... becomes https:/...), which caused the URL to be treated as a file.

Path(load_dir) / x assumes that x is a file path rather than a URL, but that has not been determined by the time that line is reached. I refactored the logic so that Path(load_dir) / x is only called if x is not a URL.

This comes at the cost of two GET requests to the external URL, but the benefit is that we don't have to modify any of the public APIs of our util functions.
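
As a rough illustration of the failure mode described above (not the code from this PR), the sketch below shows how joining a URL onto a path collapses the double slash after the scheme, and how checking for a URL first avoids the join. The helper names `looks_like_url` and `resolve_source` are hypothetical, and the scheme check is an offline stand-in for the request-based validate_url check; `PurePosixPath` is used only so that the result is the same on every platform.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def looks_like_url(x: str) -> bool:
    """Hypothetical offline check: an http(s) scheme followed by a host."""
    parsed = urlparse(x)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def resolve_source(x: str, load_dir: str = "") -> str:
    """Join x onto load_dir only when x is not a URL."""
    if looks_like_url(x):
        # Leave URLs untouched; a path join would mangle the "//".
        return x
    return str(PurePosixPath(load_dir) / x)


# The path join drops the empty component between the two slashes.
mangled = str(PurePosixPath("data") / "https://example.com/a.png")
print(mangled)  # data/https:/example.com/a.png

print(resolve_source("https://example.com/a.png", "data"))
print(resolve_source("a.png", "data"))
```

The design point is the ordering: decide whether the input is a URL before any path arithmetic touches it.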

Checklist:

  • I have performed a self-review of my own code
  • I have added a short summary of my change to the CHANGELOG.md
  • My code follows the style guidelines of this project
  • I have commented my code in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

A note about the CHANGELOG

Hello 👋 and thank you for contributing to Gradio!

All pull requests must update the change log located in CHANGELOG.md, unless the pull request is labeled with the "no-changelog-update" label.

Please add a brief summary of the change to the Upcoming Release > Full Changelog section of the CHANGELOG.md file and include
a link to the PR (formatted in markdown) and a link to your github profile (if you like). For example, "* Added a cool new feature by [@myusername](link-to-your-github-profile) in [PR 11111](https://github.com/gradio-app/gradio/pull/11111)".

If you would like to elaborate on your change further, feel free to include a longer explanation in the other sections.
If you would like an image/gif/video showcasing your feature, it may be best to edit the CHANGELOG file using the
GitHub web UI since that lets you upload files directly via drag-and-drop.

@gradio-pr-bot
Collaborator

All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-2957-all-demos

@freddyaboulton freddyaboulton force-pushed the fix-validate-url branch 2 times, most recently from e344cfd to 756a7e9 January 16, 2023 17:57
@freddyaboulton freddyaboulton marked this pull request as ready for review January 16, 2023 18:09
@abidlabs
Member

This works great @freddyaboulton, nice catch!

Regarding the speed issue, we could change the validate_url method to use requests.head() instead of requests.get() so that it does not download the whole file when checking if the file exists:

import requests


def validate_url(possible_url: str) -> bool:
    headers = {"User-Agent": "gradio (https://gradio.app/; team@gradio.app)"}
    try:
        # HEAD returns only the response headers, so the file body is not downloaded.
        return requests.head(possible_url, headers=headers).ok
    except Exception:
        return False

Thoughts? A quick benchmark shows that the time to run validate_url on a 5 MB file drops from 1s to 0.3s if we use head():

import requests
import time

t = time.time()
headers = {"User-Agent": "gradio (https://gradio.app/; team@gradio.app)"}
requests.get("https://edmullen.net/test/rc.jpg", headers=headers).ok
print(time.time() - t)

t = time.time()
requests.head("https://edmullen.net/test/rc.jpg", headers=headers).ok
print(time.time() - t)

@freddyaboulton
Collaborator Author

Thanks for the suggestion, @abidlabs!

The one snag is that the gradio file route does not support HEAD requests. I added that support to the routes.py file, but it means that we will first try HEAD and, if that's not supported, fall back to GET.

Let me know what you think before I merge!
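
For readers following along, the HEAD-then-GET fallback described above could be sketched roughly as follows. This is a minimal sketch and not necessarily the merged code: treating a 405 (Method Not Allowed) response as "HEAD not supported" is an assumption, and the `client` and `timeout` parameters are hypothetical additions that make the function easy to exercise without network access.

```python
from typing import Any, Optional


def validate_url(
    possible_url: str, client: Optional[Any] = None, timeout: float = 10.0
) -> bool:
    """Return True if the URL responds successfully, preferring a cheap HEAD."""
    if client is None:
        # Imported lazily so that a stub client can be injected in tests.
        import requests

        client = requests
    headers = {"User-Agent": "gradio (https://gradio.app/; team@gradio.app)"}
    try:
        head_response = client.head(possible_url, headers=headers, timeout=timeout)
        # Assumption: a server without HEAD support answers 405 Method Not Allowed.
        if head_response.status_code == 405:
            return client.get(possible_url, headers=headers, timeout=timeout).ok
        return head_response.ok
    except Exception:
        # Malformed URLs, connection errors, and timeouts all count as invalid.
        return False
```

With this shape, servers that support HEAD cost a single lightweight request, and only servers that reject HEAD pay for the second, full GET.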

@abidlabs
Member

Great catch @freddyaboulton, LGTM

@freddyaboulton
Collaborator Author

Thank you for the re-review!

@freddyaboulton freddyaboulton merged commit cab8d88 into main Jan 18, 2023
@freddyaboulton freddyaboulton deleted the fix-validate-url branch January 18, 2023 17:47
3 participants