New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Control which files get moved to cache with gr.set_static_paths #7618
Conversation
🪼 branch checks and previews
Install Gradio from this PR pip install https://gradio-builds.s3.amazonaws.com/d0b054cf9aabcb1d4ad53d8670d2146ad6e8ce2a/gradio-4.20.1-py3-none-any.whl Install Gradio Python Client from this PR pip install "gradio-client @ git+https://github.com/gradio-app/gradio@d0b054cf9aabcb1d4ad53d8670d2146ad6e8ce2a#subdirectory=client/python" |
🦄 change detectedThis Pull Request includes changes to the following packages.
With the following changelog entry.
Maintainers or the PR author can modify the PR title to modify this entry.
|
Hmm instead of introducing additional API could we try the approach of determining this automatically based on whether the files are ever used as an input? I think exposing an API for file management is pretty confusing, and could lead to unintended side effects if done incorrectly. Whereas this is something we should be able to take care for the developer |
2dd88d1
to
5262a4a
Compare
I don't think this will cover the case of gr.Examples or the case where you don't want the default value of the component to be copied over but the component is still used in as input (example code in the PR body) . Also in order to determine if a component is ever used as an input we'd have to wait for all of the events to be defined. At that point the files are already copied to the cache and it's possible you OOM beforehand (rare but possible).
Can you elaborate? |
It would allow people to overwrite files that they shouldn't be overwriting. For example, say you have this demo, and you don't want to copy over import gradio as gr
gr.StaticFiles(paths=["video.mp4"])
def trim_video(input_file):
command = [
'ffmpeg',
'-i', input_file,
'-ss', "00:00:10",
'-t', "00:00:20",
'-c', 'copy',
'video.mp4'
]
subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
demo = gr.Interface(
trim_video,
gr.Video()
gr.Video(),
examples=["video.mp4"],
)
demo.launch() When this demo is used, because
Good point, hmm |
My other hesitation with this API is that it requires you to get the filenames exactly right in different places. If there's a typo in the filepaths in |
What if we instead added a parameter, For the OP in #5350, I think it would be easier than specifying 115 files or even doing a glob |
I tried the ffmpeg example but it wouldn't work because ffmpeg can't edit files in place. You're right that it's possible to overwrite an input file but I think you can already do that outside this PR. You can manually save an output file to the same location as the input file. The next time that file is used as input it won't be re-written to the cache so the next run of the function will use the output from the previous run. We haven't heard of this happening in the wild so I think we're ok to not worry about it.
You can pass in directory names to Interesting point about copy_files. My one hesitation is that it will always not copy over the data to the cache. I'm thinking of an example where you give lots of default data to a About to log off - let's synch tomorrow? |
That's true. Another idea, now that you mention (Sorry to keep suggesting different things, but I think this is more inline with our existing API) |
All good @abidlabs ! Yea I thought about that too and was what I was originally going to implement. But two points related to that approach that made me not implement it:
|
After sleeping on it, I agree @freddyaboulton with your proposed approach. All things considered, it'll probably be the most flexible approach and let us manage static files on an app-level as opposed to a component level like the Nit: I would consider renaming |
e3e6468
to
3c72e8d
Compare
118d652
to
437891b
Compare
Thanks for the feedback and discussion @abidlabs ! I renamed the helper to |
gradio/processing_utils.py
Outdated
@@ -262,6 +262,8 @@ def _move_to_cache(d: dict): | |||
# This makes it so that the URL is not downloaded and speeds up event processing | |||
if payload.url and postprocess and client_utils.is_http_url_like(payload.url): | |||
payload.path = payload.url | |||
elif utils.is_static_file(payload): | |||
payload.path = payload.path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
payload.path = payload.path | |
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
gradio/data_classes.py
Outdated
|
||
def __init__(self, paths: list[str | pathlib.Path]) -> None: | ||
self.paths = paths | ||
self.all_paths.extend([pathlib.Path(p).resolve() for p in paths]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this .extend()
behavior a little unintuitive: I would have expected:
gr.set_static_paths([])
to clear the static paths. Especially since we call the method set...
, I think it should replace the previous paths, not extend the previous paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea good point! The reason I decided to do it this way is in the case you're nesting various blocks to make a single demo. I think we can have it replace the previous paths and add an api to extend previous paths later based on user request.
@document() | ||
def set_static_paths(paths: list[str | Path]) -> None: | ||
""" | ||
Set the static paths to be served by the gradio app. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand (and testing the code), this method sets the static_paths for all Gradio apps defined until that Python session ends (or set_static_paths
is called again(. For example, if you have multiple Gradio apps in one jupyter notebook, this will affect all of them. We should explicitly describe this behavior here imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea good call!
Very nice and clean implementation @freddyaboulton! I left some suggestions above and we should also add static paths as a 4th kind of files that are served to the user here:
Otherwise lgtm! |
Thanks @abidlabs ! Went ahead and made those changes |
""" | ||
Returns True if the file is a static file (i.e. is is in the static files list). | ||
""" | ||
if not isinstance(file_path, (str, Path, FileData)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I missed something, file_path
is always a FileData
in all invocations of _is_static_file()
so the other cases are not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🔥
Description
Closes: #5350
This PR adds
gr.set_static_paths
, which takes a list of directory or filenames represented as strings orpathlib.Path
objects. Files ingr.set_static_paths
are not moved to the cache if they are encountered during pre/post-process or examples.Example Usage
Set the GRADIO_TEMP_DIR env variable to manually check nothings gets saved during the cache as the interface launches. In this particular example, output files will be saved to the cache.
Alternative APIs I considered before choosing
gr.StaticFiles
gr.StaticFile
. Similar to (Files should now be supplied asfile(...)
in the Client, and some fixes togr.load()
as well #7575). This adds another possible typehint to the init method which is already complicated enough. Also need to make sure to appropriate wrap/unwrap this type everywhere which is not ideal.static_files
parameter togr.Blocks
. This was what I originally thought of doing but won't handle the case of components being instantiated with a value outside of a blocks scope, like ingr.Interface
.🎯 PRs Should Target Issues
Before your create a PR, please check to see if there is an existing issue for this change. If not, please create an issue before you create this PR, unless the fix is very small.
Not adhering to this guideline will result in the PR being closed.
Tests
PRs will only be merged if tests pass on CI. To run the tests locally, please set up your Gradio environment locally and run the tests:
bash scripts/run_all_tests.sh
You may need to run the linters:
bash scripts/format_backend.sh
andbash scripts/format_frontend.sh