-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend flytekit version hash calculation to be pluggable #2428
base: master
Are you sure you want to change the base?
Extend flytekit version hash calculation to be pluggable #2428
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2428 +/- ##
===========================================
+ Coverage 42.77% 72.09% +29.31%
===========================================
Files 185 181 -4
Lines 18677 18397 -280
Branches 2665 3601 +936
===========================================
+ Hits 7990 13264 +5274
+ Misses 10599 4508 -6091
- Partials 88 625 +537 ☔ View full report in Codecov by Sentry. |
922e239
to
8d0645c
Compare
8d0645c
to
2951085
Compare
@@ -1028,11 +1028,14 @@ def _get_image_names(entity: typing.Union[PythonAutoContainerTask, WorkflowBase] | |||
if isinstance(entity, WorkflowBase): | |||
default_inputs = entity.python_interface.default_inputs_as_kwargs | |||
|
|||
from flytekit.configuration.plugin import get_plugin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not the ideal place to have an import.
It also seems like the additional_context
(which could probably be named a bit more descriptively?) value should be passed into the register_script
function, shouldn't it? The pattern of pulling in special state like this makes this code a bit harder to test / reason about I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use lazy import here to solve the circular import issue (remote imports plugin, plugin imports remote)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've used this tactic to reduce startup time in Ruby -- but:
- It looks like there's already a
lazy_module
helper in Flytekit - if this is a tactic for dealing with a potential circular dependency, that can be the sign of a flawed design
- Remote now takes a dependency on this plugin helper method -- it's cleaner if you can invert the behavior so that this function doesn't need to know anything about finding special mutation functions in plugins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could rename additional_context
to version_hash_additional_context
? I wouldn't rename it to task_configs
because it does not necessarily contain the task configs, that only happens for a particular use case.
Moving it to a parameter of register_script
seems sensible, but for the moment I'm thinking of keeping the PR as very easy to review/not changing too many methods. Will give it a try at a later point.
edit: will rename to version_hash_additional_context
flytekit/remote/remote.py
Outdated
# The md5 version that we send to S3/GCS has to match the file contents exactly, | ||
# but we don't have to use it when registering with the Flyte backend. | ||
# For that add the hash of the compilation settings to hash of file | ||
version = self._version_from_hash( | ||
md5_bytes, serialization_settings, default_inputs, *_get_image_names(entity) | ||
md5_bytes, serialization_settings, default_inputs, *_get_image_names(entity), *additional_context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to have pluggable versioning instead? i.e. the plugin defines a custom version function that gets passed md5_bytes, serialization_settings, default_inputs, *_get_image_names(entity)
if it exists? Would such a function need a dual?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My feeling is that core logic i.e. the default version hash logic, should remain in its section, rather than moved to the plugin methods. This mostly follows the existing pattern I noticed in FlytekitPlugin, where core logic is not often moved to its methods – however there are some exceptions, as I suppose with FlytekitPlugin.get_remote
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
Tracking issue
Closes flyteorg/flyte#5364
Why are the changes needed?
Extends https://github.com/flyteorg/flytekit/pull/2039/files – that PR gives the minimum API surface for configuring the API via
FlytekitPlugin
.This PR increases how
pyflyte
is pluggable by external libraries, specifically for controlling the additional context used to generate the version string hash.What changes were proposed in this pull request?
This can be used with a plugin like:
This will add the task config to the calculation of the version hash.
How was this patch tested?
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link