Feature/file uploader abstractions #7025
* Uploaded file manager protocol
* Implement MemoryUploadFileManager
* Fix failing tests, not related to file uploader
* Add some tests for UploadFileRequestHandlerTest
* formatter fixes
* fix all tests in uploaded_file_request_handler.py
* Tweak some comments and add some TODOs
* Add another TODO
---------
Co-authored-by: Karen Javadyan <kajarenc@gmail.com>
Co-authored-by: Vincent Donato <vincent@streamlit.io>

…and FileUploadClient (#6754)
* Start supporting configurable file upload URLs in StreamlitEndpoints and FileUploadClient
* Clarify some comments and fix formatting
* Remove widgetId from file upload related methods
* Have csrfRequest assume it always gets an absolute URL

* Add new proto definitions for file upload URL requests/responses
* Add server websocket handler for file_urls_request BackMsgs
* Rename FileUrls* -> FileURLs*
* Install uuid types
* Mark resolver fields as readonly
* Allow file URLs to be requested via websocket
* Add temporary # type: ignore
* Remove HTTP endpoints for fetching file URLs
* Remove unused variable to appease eslint

* add `file_delete_url` to `UploadedFileInfo` proto
* add `get_upload_urls` to `UploadedFileManager`
* `UploadedFileManager` now works with file_id to get_files.
* Add todos
* remove abstractmethod from get_upload_urls

* Have UploadedFileManager react client fetch and use upload URLs
* Fix issues from rebase

* reimplement camera_input to work with new uploaded file manager
* call delete endpoint when clear camera_input

* Change some types from Promise<number> to Promise<void>
* Remove serverFileId and related fields from FileUploader and friends
* Remove serverFileId from CameraInput
* Remove more newly unused fields from CameraInput and FileUploader

* extract FileUrls to separate proto
* use copy from
* use FileURLS also in camera_input
* fixes after review

* Run autoformatter on e2e/specs/st_file_uploader.spec.js
* Fix some more small type errors
* Fix watched cypress route in st_file_uploader.spec.js

docs and tests

* Fix a bunch of FileUploader js unit tests
* Fix a bunch of type errors in tests
* Wait until the file is uploaded instead of a fixed amount of time
* fix tests for CameraInput
* fix tests for DefaultStreamlitEndpoints
* fix eslint errors
* add uuid to NOTICES
@@ -179,6 +179,8 @@ def set_widget_metadata(self, widget_meta: WidgetMetadata[Any]) -> None:

    def remove_stale_widgets(self, active_widget_ids: set[str]) -> None:
        """Remove widget state for widgets whose ids aren't in `active_widget_ids`."""
        # TODO(vdonato / kajarenc): Remove files corresponding to an inactive file
        # uploader.
Not critical, we keep this for future improvement
Some small comments, but overall this LGTM
I won't officially hit the +1 button given so much of this code is mine that I'm pretty sure that I'm not qualified to approve it 😆
    def _on_files_updated(self, session_id: str) -> None:
        """Event handler for UploadedFileManager.on_file_added.
        Ensures that uploaded files from stale sessions get deleted.

        Notes
        -----
        Threading: SAFE. May be called on any thread.
        """
        if not self._session_mgr.is_active_session(session_id):
            # If an uploaded file doesn't belong to an active session,
            # remove it so it doesn't stick around forever.
            self._uploaded_file_mgr.remove_session_files(session_id)
Hm, I feel like this might be something that we want to keep even in the new world. We still have the mechanism that calls remove_session_files when a session is shut down, but I'm slightly worried about the possibility that a file corresponding to a now-nonexistent session is somehow uploaded and so is never cleaned up without this.
Would it be possible to revive this callback function but change it to be called when no corresponding session (active or inactive) exists for a given session_id? This happens when self._session_mgr.get_session_info(session_id) is None.
Hey!
We check that the session is active after the file is fully uploaded and just before storing it:
https://github.com/streamlit/streamlit/pull/7025/files#diff-3e1711ee513b43ba25f0f9118bfe0135a503ed40da6474b7b2c5ddcef9e940f5R101
So it should be very unlikely for a file to be uploaded that doesn't correspond to an active session.
I don't want to make this part of the protocol, but I will be happy to return to it during the refactoring that removes session_id from URLs, and to rethink the file deletion mechanism for the open source implementation.
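The cleanup rule proposed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `FakeSessionManager`, `FakeUploadedFileManager` and `on_files_updated` are hypothetical stand-ins, and only the method names `get_session_info` and `remove_session_files` are taken from the discussion above.

```python
from typing import Dict, List, Optional


class FakeSessionManager:
    """Hypothetical stand-in for the session manager discussed above."""

    def __init__(self) -> None:
        self._sessions: Dict[str, str] = {}

    def add_session(self, session_id: str) -> None:
        self._sessions[session_id] = "session-info"

    def get_session_info(self, session_id: str) -> Optional[str]:
        # Returns None when no session (active or inactive) is known.
        return self._sessions.get(session_id)


class FakeUploadedFileManager:
    """Hypothetical in-memory file manager keyed by session id."""

    def __init__(self) -> None:
        self.files_by_session: Dict[str, List[bytes]] = {}

    def add_file(self, session_id: str, data: bytes) -> None:
        self.files_by_session.setdefault(session_id, []).append(data)

    def remove_session_files(self, session_id: str) -> None:
        self.files_by_session.pop(session_id, None)


def on_files_updated(
    session_mgr: FakeSessionManager,
    file_mgr: FakeUploadedFileManager,
    session_id: str,
) -> None:
    """Drop files whose session no longer exists at all."""
    if session_mgr.get_session_info(session_id) is None:
        file_mgr.remove_session_files(session_id)


session_mgr = FakeSessionManager()
file_mgr = FakeUploadedFileManager()
session_mgr.add_session("live")
file_mgr.add_file("live", b"kept")
file_mgr.add_file("gone", b"orphaned")

on_files_updated(session_mgr, file_mgr, "live")
on_files_updated(session_mgr, file_mgr, "gone")
```

After both calls, only the files for the session that still exists remain in the manager.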
    @patch("streamlit.elements.widgets.file_uploader._get_upload_files")
    def test_deleted_file_omitted(self, get_upload_files_patch):
        """We should omit DeletedFile objects for final user value."""

        uploaded_files = [DeletedFile(file_id="A")]
        get_upload_files_patch.return_value = uploaded_files

        st.file_uploader("foo", accept_multiple_files=True)
        result_1: UploadedFile = st.file_uploader("a", accept_multiple_files=False)
        result_2: UploadedFile = st.file_uploader("b", accept_multiple_files=True)

        self.assertEqual(result_1, None)
        self.assertEqual(result_2, [])

    @patch("streamlit.elements.widgets.file_uploader._get_upload_files")
    def test_deleted_files_filtered_out(self, get_upload_files_patch):
Aren't these two tests essentially identical? I think we can probably remove the first one
Good catch, the second test is a superset of the first test, I removed the first one!
def _get_file_recs_for_camera_input_widget(
    widget_id: str, widget_value: Optional[FileUploaderStateProto]
) -> List[UploadedFileRec]:
def _get_upload_files(
Can we factor this out into a helper function shared by camera_input and file_uploader? This function seems to be identical for both of the widgets.
Yes. It is difficult to find a new home for _get_upload_files: it is used only in file_uploader and camera_input, so utils.py is probably not the best place. For now I just import _get_upload_files from file_uploader in camera_input. I think that is a good idea, and it reflects the fact that camera_input is essentially a derivative of file_uploader.
SomeUploadedFiles = Optional[
    Union[UploadedFile, DeletedFile, List[Union[UploadedFile, DeletedFile]]]
]
nit: can add None to the union instead of having an Optional[Union[...]], to be consistent with camera input
done
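For reference, the two spellings discussed in this nit describe the same type, so the change is purely stylistic. A small sketch, assuming simplified stand-ins for `UploadedFile` and `DeletedFile` (the real classes in the PR are richer; these NamedTuples are hypothetical):

```python
from typing import List, NamedTuple, Optional, Union


class UploadedFile(NamedTuple):
    """Hypothetical stand-in for the real UploadedFile class."""

    name: str


class DeletedFile(NamedTuple):
    """Hypothetical stand-in for the DeletedFile marker type."""

    file_id: str


# Spelling used in the snippet under review.
OptionalStyle = Optional[
    Union[UploadedFile, DeletedFile, List[Union[UploadedFile, DeletedFile]]]
]

# Spelling suggested by the reviewer: None as a member of the union.
NoneInUnionStyle = Union[
    UploadedFile, DeletedFile, List[Union[UploadedFile, DeletedFile]], None
]

# typing flattens nested unions, so both aliases compare equal.
SAME_TYPE = OptionalStyle == NoneInUnionStyle
```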

    filtered_value: Union[UploadedFile, List[UploadedFile], None]

    if isinstance(widget_state.value, DeletedFile):
        filtered_value = None
    elif isinstance(widget_state.value, list):
        filtered_value = [
            f for f in widget_state.value if not isinstance(f, DeletedFile)
        ]
    else:
        filtered_value = widget_state.value

    return filtered_value
nit: I think it'd be fine to return the value directly from each branch of the if/elif/else statement rather than save it to an intermediate variable that's returned immediately
done
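The shape the reviewer asks for, returning from each branch instead of assigning to an intermediate variable, might look like the sketch below. `filter_deleted` is a hypothetical free function and the two NamedTuples are simplified stand-ins; the actual code in the PR operates on `widget_state.value`.

```python
from typing import List, NamedTuple, Union


class UploadedFile(NamedTuple):
    """Hypothetical stand-in for the real UploadedFile class."""

    name: str


class DeletedFile(NamedTuple):
    """Hypothetical stand-in for the DeletedFile marker type."""

    file_id: str


FileOrDeleted = Union[UploadedFile, DeletedFile]


def filter_deleted(
    value: Union[FileOrDeleted, List[FileOrDeleted], None]
) -> Union[UploadedFile, List[UploadedFile], None]:
    """Hide DeletedFile markers from the value returned to the user."""
    if isinstance(value, DeletedFile):
        return None
    if isinstance(value, list):
        return [f for f in value if not isinstance(f, DeletedFile)]
    return value


single = filter_deleted(DeletedFile(file_id="A"))
several = filter_deleted([UploadedFile("a.txt"), DeletedFile(file_id="B")])
```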
@@ -225,7 +229,7 @@ def file_uploader(
        *,  # keyword-only arguments:
        disabled: bool = False,
        label_visibility: LabelVisibility = "visible",
    ) -> SomeUploadedFiles:
    ) -> Optional[Union[UploadedFile, List[UploadedFile]]]:
nit: same comment about adding None to the union instead of having this be Optional (same thing again with L389)
done
class DeletedFile(NamedTuple):
    """Represents a deleted file in deserialized values for st.file_uploader and
    st.camera_input
    Return this from st.file_uploader and st.camera_input deserialize (so they can
nit: should have a newline between the docstring summary and the rest of it
done
# Conflicts:
#   lib/streamlit/elements/widgets/camera_input.py
#   lib/streamlit/elements/widgets/file_uploader.py
LGTM 👍 I just have some nits and questions, but no major concern.
@@ -34,6 +34,14 @@ export interface StreamlitEndpoints {
   */
  buildMediaURL(url: string): string

  /**
   * Construct a URL for uploading a file.
   * @param url a relative or absolute URL. If `url` is absolute, it will be
nit: I think in the JSDoc docstring syntax the parameter name is separated from its description via a -, e.g.:
-   * @param url a relative or absolute URL. If `url` is absolute, it will be
+   * @param url - a relative or absolute URL. If `url` is absolute, it will be
There are also a few other instances in the PR that could be corrected, but it seems that we haven't been following this anyway in the current implementation.
}

message FileUploaderState {
  // DEPRECATED
  sint64 max_file_id = 1;
Same as the comment above, couldn't we just fully remove this via:
-  sint64 max_file_id = 1;
+  reserved 1;
+  reserved "max_file_id";
I am not sure we are okay to remove / change the type of any proto message field, because of dependencies from other teams.
This is a good question to clarify, to have a solid understanding of the contract we keep for proto messages.
For now, I think the safest thing is just to keep the field as is, with a comment that it is now deprecated.
oki, let's keep it as is 👍 But maybe we bring it up in a standup soon to figure out if it would break other dependencies.
// Information on a file uploaded via the file_uploader widget.
message UploadedFileInfo {
  // DEPRECATED.
  sint64 id = 1;
Is this field still used in any way? If not, could we just remove this and add:
-  sint64 id = 1;
+  reserved 1;
+  reserved "id";
        all_files: List[UploadedFileRec] = []
        # Make copy of self.file_storage for thread safety, to be sure
        # that main storage won't be changed from another thread
        file_storage_copy = self.file_storage.copy()
nit: does that mean that all files are also duplicated in memory? So calling this (e.g. via the metrics endpoint), might crash the app because of a sudden memory increase. Not a big deal right now since it isn't used a lot, but if we do more with those stats we might want to revisit this.
Yes, we now create a duplicate to collect statistics.
The good news is that this duplicate should be short-lived, and the Python garbage collector should collect it very quickly because we don't keep any references to file_storage_copy. But yes, this could increase memory usage when a lot of uploaded files are stored.
Maybe you can add a TODO comment or something there, so that we double check this once we do more with the stats.
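A side note on the memory question in this thread, under the assumption (not confirmed by the snippet shown here) that self.file_storage is a plain dict of per-session dicts: dict.copy() is a shallow copy, so only the top-level mapping is duplicated and the stored values are shared rather than copied. A minimal sketch with made-up data:

```python
from typing import Dict

# Hypothetical layout: session id -> file id -> file bytes.
file_storage: Dict[str, Dict[str, bytes]] = {
    "session-1": {"file-a": b"x" * 1024},
}

# Same call as in the snippet under review.
file_storage_copy = file_storage.copy()

# The outer dict is a new object...
copy_is_new_dict = file_storage_copy is not file_storage

# ...but the per-session dict and the file bytes are the very same objects.
inner_is_shared = file_storage_copy["session-1"] is file_storage["session-1"]
bytes_are_shared = (
    file_storage_copy["session-1"]["file-a"] is file_storage["session-1"]["file-a"]
)
```

If that assumption holds, the extra memory is proportional to the number of sessions, not to the size of the files. The flip side is that the nested per-session dicts are not protected by the copy and could still change while the statistics are collected.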
Co-authored-by: Lukas Masuch <Lukas.Masuch@gmail.com>
…ient (#7092) We want to avoid having the notebooks team have to make any code changes on their end due to the interface / constructor changes we're making to StreamlitEndpoints and FileUploadClient. In order to do this, we make the newly added methods/args optional.
This PR reworks the process of uploading files for st.file_uploader and st.camera_input. It introduces an abstraction for UploadedFileManager; the default implementation continues to store files in memory. The upload process now consists of two steps: (1) issuing upload file URLs (this happens via websocket communication, in the file_urls_request handler), and (2) uploading the file to the URL issued in step 1.
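The two-step flow described above can be sketched roughly as follows. Everything in this block is an illustrative assumption: `UploadURLInfo`, `InMemoryUploadedFileManager`, the method signatures and the URL layout are hypothetical, and only the names `get_upload_urls`, `get_files` and the idea of a delete URL come from the commit messages in this PR.

```python
import uuid
from typing import Dict, List, NamedTuple


class UploadURLInfo(NamedTuple):
    """Hypothetical record returned for each file in step 1."""

    file_id: str
    upload_url: str
    delete_url: str


class InMemoryUploadedFileManager:
    """Hypothetical manager that keeps uploaded bytes in memory."""

    def __init__(self) -> None:
        # session id -> file id -> file bytes
        self.file_storage: Dict[str, Dict[str, bytes]] = {}

    def get_upload_urls(
        self, session_id: str, file_names: List[str]
    ) -> List[UploadURLInfo]:
        """Step 1: issue one upload/delete URL pair per requested file."""
        infos = []
        for _ in file_names:
            file_id = str(uuid.uuid4())
            url = f"/upload/{session_id}/{file_id}"  # made-up URL layout
            infos.append(UploadURLInfo(file_id, url, url))
        return infos

    def add_file(self, session_id: str, file_id: str, data: bytes) -> None:
        """Step 2: store the bytes sent to the issued URL."""
        self.file_storage.setdefault(session_id, {})[file_id] = data

    def get_files(self, session_id: str, file_ids: List[str]) -> List[bytes]:
        session_files = self.file_storage.get(session_id, {})
        return [session_files[f] for f in file_ids if f in session_files]


manager = InMemoryUploadedFileManager()
# Step 1: the client asks for URLs (over the websocket in the real flow).
(info,) = manager.get_upload_urls("session-1", ["photo.png"])
# Step 2: the client uploads to the issued URL; a direct call stands in for HTTP.
manager.add_file("session-1", info.file_id, b"png-bytes")
stored = manager.get_files("session-1", [info.file_id])
```

Keeping URL issuance behind the manager is what would let a different implementation hand out URLs that point at other storage, while this in-memory sketch only mimics the default behaviour.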
Describe your changes
This PR reworks the process of uploading files for st.file_uploader and st.camera_input.
It introduces an abstraction for UploadedFileManager; the default implementation continues to store files in memory.
The upload process now consists of two steps:
1. Issuing upload file URLs (this happens via websocket communication, in the file_urls_request handler).
2. Uploading the file to the URL issued in step 1.
GitHub Issue Link (if applicable)
Testing Plan
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.