Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Control which files get moved to cache with gr.set_static_paths #7618

Merged
merged 10 commits into from Mar 7, 2024

Conversation

freddyaboulton
Copy link
Collaborator

@freddyaboulton freddyaboulton commented Mar 6, 2024

Description

Closes: #5350

This PR adds gr.set_static_paths, which takes a list of directory or filenames represented as strings or pathlib.Path objects. Files in gr.set_static_paths are not moved to the cache if they are encountered during pre/post-process or examples.

Example Usage

Set the GRADIO_TEMP_DIR env variable to manually check nothings gets saved during the cache as the interface launches. In this particular example, output files will be saved to the cache.

import gradio as gr

gr.StaticFiles(paths=["test/test_files/cheetah1.jpg", "test/test_files/bus.png"])

demo = gr.Interface(
    lambda s: s.rotate(45),
    gr.Image(value="test/test_files/cheetah1.jpg", type="pil"),
    gr.Image(),
    examples=["test/test_files/bus.png"],
)

demo.launch()

Alternative APIs I considered before choosing gr.StaticFiles

  • Wrap the value you don't want to move into the cache with a class like gr.StaticFile. Similar to (Files should now be supplied as file(...) in the Client, and some fixes to gr.load() as well #7575). This adds another possible typehint to the init method which is already complicated enough. Also need to make sure to appropriate wrap/unwrap this type everywhere which is not ideal.
  • Add a static_files parameter to gr.Blocks. This was what I originally thought of doing but won't handle the case of components being instantiated with a value outside of a blocks scope, like in gr.Interface.

🎯 PRs Should Target Issues

Before your create a PR, please check to see if there is an existing issue for this change. If not, please create an issue before you create this PR, unless the fix is very small.

Not adhering to this guideline will result in the PR being closed.

Tests

  1. PRs will only be merged if tests pass on CI. To run the tests locally, please set up your Gradio environment locally and run the tests: bash scripts/run_all_tests.sh

  2. You may need to run the linters: bash scripts/format_backend.sh and bash scripts/format_frontend.sh

@gradio-pr-bot
Copy link
Contributor

gradio-pr-bot commented Mar 6, 2024

🪼 branch checks and previews

Name Status URL
Spaces ready! Spaces preview
Website ready! Website preview
🦄 Changes detected! Details

Install Gradio from this PR

pip install https://gradio-builds.s3.amazonaws.com/d0b054cf9aabcb1d4ad53d8670d2146ad6e8ce2a/gradio-4.20.1-py3-none-any.whl

Install Gradio Python Client from this PR

pip install "gradio-client @ git+https://github.com/gradio-app/gradio@d0b054cf9aabcb1d4ad53d8670d2146ad6e8ce2a#subdirectory=client/python"

@gradio-pr-bot
Copy link
Contributor

gradio-pr-bot commented Mar 6, 2024

🦄 change detected

This Pull Request includes changes to the following packages.

Package Version
gradio patch
gradio_client patch
  • Maintainers can select this checkbox to manually select packages to update.

With the following changelog entry.

Control which files get moved to cache with gr.set_static_paths

Maintainers or the PR author can modify the PR title to modify this entry.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

@abidlabs
Copy link
Member

abidlabs commented Mar 6, 2024

Hmm instead of introducing additional API could we try the approach of determining this automatically based on whether the files are ever used as an input?

I think exposing an API for file management is pretty confusing, and could lead to unintended side effects if done incorrectly. Whereas this is something we should be able to take care for the developer

@freddyaboulton
Copy link
Collaborator Author

automatically based on whether the files are ever used as an input

I don't think this will cover the case of gr.Examples or the case where you don't want the default value of the component to be copied over but the component is still used in as input (example code in the PR body) . Also in order to determine if a component is ever used as an input we'd have to wait for all of the events to be defined. At that point the files are already copied to the cache and it's possible you OOM beforehand (rare but possible).

I think exposing an API for file management is pretty confusing, and could lead to unintended side effects if done incorrectly.

Can you elaborate?

@abidlabs
Copy link
Member

abidlabs commented Mar 6, 2024

Can you elaborate?

It would allow people to overwrite files that they shouldn't be overwriting. For example, say you have this demo, and you don't want to copy over video.mp4, so you make it a static file:

import gradio as gr

gr.StaticFiles(paths=["video.mp4"])

def trim_video(input_file):
    command = [
        'ffmpeg',
        '-i', input_file,     
        '-ss', "00:00:10",    
        '-t', "00:00:20",       
        '-c', 'copy',         
        'video.mp4'           
    ]
    subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    

demo = gr.Interface(
    trim_video,
    gr.Video()
    gr.Video(),
    examples=["video.mp4"],
)

demo.launch()

When this demo is used, because video.mp4 is also the name of the output file of your function, it wouldn't get moved to cache, and IIUC it would overwrite the example video. Lmk if I'm misunderstanding!

I don't think this will cover the case of gr.Examples or the case where you don't want the default value of the component to be copied over but the component is still used in as input (example code in the PR body)

Good point, hmm

@abidlabs
Copy link
Member

abidlabs commented Mar 6, 2024

My other hesitation with this API is that it requires you to get the filenames exactly right in different places. If there's a typo in the filepaths in gr.StaticFiles(), or if you update the filepath in gr.Examples but not in gr.StaticFiles(), you'll have unexpected behavior

@abidlabs
Copy link
Member

abidlabs commented Mar 6, 2024

What if we instead added a parameter, copy_files=True to gr.Examples and to every Component? Basically it would control whether files generated by that component were copied over to the cache or whether they remained as is (with a warning telling users that reusing files can be problematic if the same file is being modified by the function across runs)

For the OP in #5350, I think it would be easier than specifying 115 files or even doing a glob

@freddyaboulton
Copy link
Collaborator Author

I tried the ffmpeg example but it wouldn't work because ffmpeg can't edit files in place. You're right that it's possible to overwrite an input file but I think you can already do that outside this PR. You can manually save an output file to the same location as the input file. The next time that file is used as input it won't be re-written to the cache so the next run of the function will use the output from the previous run. We haven't heard of this happening in the wild so I think we're ok to not worry about it.

For the OP in #5350, I think it would be easier than specifying 115 files or even doing a glob
My other hesitation with this API is that it requires you to get the filenames exactly right in different places

You can pass in directory names to StaticFiles so you don't have to specify file names exactly. I don't think the typo risk is greater than with allowed_paths, blocked_paths.

Interesting point about copy_files. My one hesitation is that it will always not copy over the data to the cache. I'm thinking of an example where you give lots of default data to a gr.Gallery. It may be fine for the default value to not be copied over to the cache but for user-submitted input/output you may want to copy over to the cache.

About to log off - let's synch tomorrow?

@abidlabs
Copy link
Member

abidlabs commented Mar 6, 2024

Interesting point about copy_files. My one hesitation is that it will always not copy over the data to the cache. I'm thinking of an example where you give lots of default data to a gr.Gallery. It may be fine for the default value to not be copied over to the cache but for user-submitted input/output you may want to copy over to the cache.

That's true. Another idea, now that you mention allowed_paths and blocked_paths, what about adding static_paths as a parameter in launch()? These paths would not get copied over to the cache.

(Sorry to keep suggesting different things, but I think this is more inline with our existing API)

@freddyaboulton
Copy link
Collaborator Author

All good @abidlabs ! Yea I thought about that too and was what I was originally going to implement. But two points related to that approach that made me not implement it:

  • The parameter should be in the init as opposed to launch because launch happens after all the components are defined and they write to the cache.
  • it won't handle the case of components defined outside of a block scope, like the inputs parameter to a gr.Interface.

@abidlabs
Copy link
Member

abidlabs commented Mar 6, 2024

After sleeping on it, I agree @freddyaboulton with your proposed approach. All things considered, it'll probably be the most flexible approach and let us manage static files on an app-level as opposed to a component level like the copy_files parameter, which seems more appropriate.

Nit: I would consider renaming gr.StaticFiles to gr.set_static_paths(). The former seems like a component to me, whereas the latter indicates to me that we are setting a property that will apply for the rest of the session.

@freddyaboulton freddyaboulton marked this pull request as ready for review March 6, 2024 23:46
@freddyaboulton freddyaboulton changed the title Control which files get moved to cache with gr.StaticFiles Control which files get moved to cache with gr.set_static_paths Mar 6, 2024
@freddyaboulton
Copy link
Collaborator Author

Thanks for the feedback and discussion @abidlabs ! I renamed the helper to set_static_paths. Should be good for a formal review.

@@ -262,6 +262,8 @@ def _move_to_cache(d: dict):
# This makes it so that the URL is not downloaded and speeds up event processing
if payload.url and postprocess and client_utils.is_http_url_like(payload.url):
payload.path = payload.url
elif utils.is_static_file(payload):
payload.path = payload.path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
payload.path = payload.path
pass

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


def __init__(self, paths: list[str | pathlib.Path]) -> None:
self.paths = paths
self.all_paths.extend([pathlib.Path(p).resolve() for p in paths])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this .extend() behavior a little unintuitive: I would have expected:

gr.set_static_paths([])

to clear the static paths. Especially since we call the method set..., I think it should replace the previous paths, not extend the previous paths.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea good point! The reason I decided to do it this way is in the case you're nesting various blocks to make a single demo. I think we can have it replace the previous paths and add an api to extend previous paths later based on user request.

@document()
def set_static_paths(paths: list[str | Path]) -> None:
"""
Set the static paths to be served by the gradio app.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand (and testing the code), this method sets the static_paths for all Gradio apps defined until that Python session ends (or set_static_paths is called again(. For example, if you have multiple Gradio apps in one jupyter notebook, this will affect all of them. We should explicitly describe this behavior here imo.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea good call!

@abidlabs
Copy link
Member

abidlabs commented Mar 7, 2024

Very nice and clean implementation @freddyaboulton! I left some suggestions above and we should also add static paths as a 4th kind of files that are served to the user here:

In particular, Gradio apps ALLOW users to access to three kinds of files:

Otherwise lgtm!

@freddyaboulton
Copy link
Collaborator Author

Thanks @abidlabs ! Went ahead and made those changes

"""
Returns True if the file is a static file (i.e. is is in the static files list).
"""
if not isinstance(file_path, (str, Path, FileData)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I missed something, file_path is always a FileData in all invocations of _is_static_file() so the other cases are not necessary.

Copy link
Member

@abidlabs abidlabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🔥

@freddyaboulton freddyaboulton merged commit 0ae1e44 into main Mar 7, 2024
7 checks passed
@freddyaboulton freddyaboulton deleted the 5350-allow-not-saving-cache branch March 7, 2024 20:39
@pngwn pngwn mentioned this pull request Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Examples consisting of Video components fill up /tmp
3 participants