Offloading of objects during registration is a difficult to debug trap for inexperienced users #4743
fg91 started this conversation in RFC Incubator
Let us consider an example where 1) the type engine transports an object by offloading it to blob storage (in this case as a pickle file) and 2) this object is instantiated not inside a task but in the workflow body, when the task is called:
This workflow fails with:
The reason is that during registration, `pyflyte` does not realize that the object needs to be uploaded to blob storage. The user would have to proactively configure the raw data prefix.

I would argue that this example is very difficult to understand and debug for users who don't have a clear understanding of Flyte's data model, and that the scenario is too "simple" for us to let users fall into this trap.
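To make the failure mode concrete, here is a minimal stdlib-only sketch of what offloading amounts to. It does not use flytekit: `offload_as_pickle` is a hypothetical stand-in, the remote schemes listed are only examples, and the temporary directory is an assumption standing in for wherever data lands when no raw data prefix is configured.

```python
import pickle
import tempfile
import uuid
from pathlib import Path


def offload_as_pickle(obj: object, raw_data_prefix: str) -> str:
    """Hypothetical stand-in for offloading: write a pickle, return its URI."""
    target = Path(raw_data_prefix) / f"{uuid.uuid4().hex}.pkl"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(pickle.dumps(obj))
    return str(target)


# An object created in the workflow body, i.e. at registration time.
model_state = {"weights": [0.1, 0.2]}

# No remote prefix is known at this point, so a local directory is used.
local_prefix = tempfile.mkdtemp()
uri = offload_as_pickle(model_state, local_prefix)

# Only the URI travels with the registered workflow. Here it points at the
# registering machine's disk, which a task running in the cluster cannot read.
is_remote = uri.startswith(("s3://", "gs://", "abfs://"))
```

The object itself is never part of what gets registered, only the reference to it, which is why nothing looks wrong on the machine that performed the registration.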
As a user, I would want 1) flytekit to realize that files need to be offloaded to blob storage during registration and 2) the backend to specify a default raw data prefix during registration unless I configured one explicitly in my Flyte config file.
How could this be fixed?
- In the `FileAccessProvider`, we need to prevent `put_raw_data` from storing offloaded objects locally during registration.
- Could `pyflyte` request the `raw_data_prefix` from `flyteadmin` if the user didn't set it explicitly and "make the file access provider aware of it"?
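The two ideas could be combined roughly as follows. This is only a sketch under stated assumptions, not flytekit code: `PrefixResolver`, `RegistrationError`, `fetch_default`, and the list of remote schemes are all hypothetical, and the callable merely stands in for whatever request to the backend would return its default prefix.

```python
from typing import Callable, Optional

# Assumption: these schemes count as "remote" for the purpose of this sketch.
REMOTE_SCHEMES = ("s3://", "gs://", "abfs://")


class RegistrationError(RuntimeError):
    """Raised instead of silently writing offloaded data to local disk."""


class PrefixResolver:
    """Hypothetical helper that picks the raw data prefix at registration."""

    def __init__(
        self,
        configured_prefix: Optional[str],
        fetch_default: Callable[[], Optional[str]],
    ) -> None:
        self.configured_prefix = configured_prefix  # from the user's config
        self.fetch_default = fetch_default  # stand-in for asking the backend

    def resolve(self) -> str:
        # An explicit user setting wins; otherwise ask the backend.
        prefix = self.configured_prefix or self.fetch_default()
        if not prefix or not prefix.startswith(REMOTE_SCHEMES):
            # Fail loudly rather than fall back to a local path.
            raise RegistrationError(
                "no remote raw data prefix available during registration"
            )
        return prefix


# Example: nothing configured locally, backend supplies a default.
resolver = PrefixResolver(None, lambda: "s3://my-bucket/raw-data")
resolved = resolver.resolve()
```

Raising an error when no remote prefix can be determined is a design choice of this sketch: it turns the hard-to-debug runtime failure into an immediate, explainable registration failure.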