Python hook #95
Added the keys `pre-build-hook` and `pre-run-hook` to the valid `.calibanconfig.json` configuration (by modifying `caliban/config/__init__.py`), and added code to `caliban/docker/build.py` that looks for a pre-build hook, runs it if found, and only proceeds with the build if the hook exits with code 0.
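As a rough illustration of the gating described above, a pre-build hook run as an external command can be checked with `subprocess.run`, proceeding only on exit code 0. This is a minimal sketch, not the code in `caliban/docker/build.py`; the function name `run_prebuild_hook` and passing the hook as an argument list are assumptions.

```python
import subprocess
import sys
from typing import List


def run_prebuild_hook(cmd: List[str]) -> bool:
  """Runs a pre-build hook command; returns True only if it exits with code 0."""
  result = subprocess.run(cmd, capture_output=True, text=True)
  if result.returncode != 0:
    # Show the hook's stderr so the user can see why the build was skipped.
    print(f"Pre-build hook failed with exit code {result.returncode}:\n"
          f"{result.stderr}", file=sys.stderr)
    return False
  return True


# Stand-in hook: a tiny Python command that always succeeds.
if run_prebuild_hook([sys.executable, "-c", "print('checks passed')"]):
  print("Hook passed; proceeding with the Docker build.")
else:
  sys.exit(1)
```

With this shape, any non-zero exit status (from a shell script, a Python script, etc.) blocks the build, which is why the hook's output does not need to be parsed at all.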
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master      #95      +/-   ##
==========================================
- Coverage   55.72%   54.82%   -0.90%
==========================================
  Files          31       33       +2
  Lines        3180     3283     +103
==========================================
+ Hits         1772     1800      +28
- Misses       1408     1483      +75
```
Continue to review full report at Codecov.
I think the general strategy here is good. Some comments:
- Python hooks:
We originally considered having Python hooks and decided against it.
The pro of having them is that the output is much cleaner to parse.
The main con is that it restricts the user to writing their hooks in Python (though of course Python could just wrap a subprocess call, so it's not really a restriction). I don't have strong feelings about which way is better. One thing to keep in mind in either case is that the hooks will be run from whatever environment Caliban is installed in. In particular, we have to trust the user to maintain an environment where their hooks work.
- Results as flags:
I like this approach in principle, but I have two main issues with it. 1) We now require the main run script to accept the hook outputs as flags. This forces the user to keep their hooks and their script in sync. Avoiding this would be ideal (we could hide it from the user with something like `hooks.initialize_flags`). 2) It also means we trust the user to log this information. In summary, what I don't like is that using hooks in this version requires not just adding the hooks folder and changing the config, but also changing the main run script in a hook-dependent way.
- It seems there should be a default way that Caliban handles all tags, and it should be uniform across run modes (i.e. `cloud`, `run`, etc.). If we want to decide that this is handled by passing everything as a flag to the main script, that is fine in principle, but that is not currently the case. @sritchie and @ajslone, what do you think about this?
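One way the `hooks.initialize_flags` idea could hide the flag plumbing from the user is sketched below, assuming the script uses `argparse`. The helper is hypothetical (not part of Caliban): it parses the script's own flags with `parse_known_args` and collects any leftover `--key value` pairs, such as those injected from hook outputs, into a dict, so the script does not have to declare each hook key.

```python
import argparse
from typing import Dict, List, Optional, Tuple


def initialize_flags(
    parser: argparse.ArgumentParser,
    argv: Optional[List[str]] = None,
) -> Tuple[argparse.Namespace, Dict[str, str]]:
  """Hypothetical helper: parses the script's own flags, plus hook flags.

  Any `--key value` pair the parser does not recognize (e.g. one injected
  from a hook's output) is collected into a dict instead of raising an error.
  """
  args, unknown = parser.parse_known_args(argv)
  hook_values: Dict[str, str] = {}
  i = 0
  while i < len(unknown):
    token = unknown[i]
    if token.startswith("--") and i + 1 < len(unknown):
      hook_values[token[2:]] = unknown[i + 1]
      i += 2
    else:
      i += 1  # Skip tokens that do not form a `--key value` pair.
  return args, hook_values


if __name__ == "__main__":
  parser = argparse.ArgumentParser()
  parser.add_argument("--learning_rate", type=float, default=0.1)
  args, hook_values = initialize_flags(
      parser, ["--learning_rate", "0.01", "--commit", "sdf9q32rfjhsp9ruw3"])
  print(args.learning_rate, hook_values)
```

The script still receives the hook values (and could log them), but its own flag definitions no longer change when hooks are added or removed.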
```python
import caliban.util.fs as ufs


def perform_prebuild_hooks(caliban_config: Dict[str, Any]) -> Dict[str, str]:
```
Thanks for the clear docstring! I feel this is structured a little redundantly and not very Pythonically. We are running a hook that returns, on success, a bool saying it succeeded plus a dict, and on failure, a bool saying it failed plus a string. Why not just have the hook raise an exception if it doesn't pass, and return a dict when it passes?
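A minimal sketch of the exception-based alternative suggested above, assuming each pre-build hook is a zero-argument function: a hook returns a dict on success and raises on failure, so no success flag or separate error string is needed. `HookError`, `run_hooks`, `git_commit`, and `check_clean_tree` are hypothetical names.

```python
from typing import Callable, Dict, List

# A pre-build hook (as described in this PR) takes no arguments.
Hook = Callable[[], Dict[str, str]]


class HookError(Exception):
  """Hypothetical exception a hook raises to abort the build or run."""


def run_hooks(hooks: List[Hook]) -> Dict[str, str]:
  """Runs hooks in order and merges the dicts they return.

  If a hook raises HookError, it propagates to the caller and the
  remaining hooks are not run.
  """
  outputs: Dict[str, str] = {}
  for hook in hooks:
    outputs.update(hook())
  return outputs


def git_commit() -> Dict[str, str]:
  return {"commit": "sdf9q32rfjhsp9ruw3"}  # Placeholder value.


def check_clean_tree() -> Dict[str, str]:
  raise HookError("Uncommitted changes; refusing to build.")


try:
  run_hooks([git_commit, check_clean_tree])
except HookError as e:
  print(f"Aborting build: {e}")
```

One consequence of this design is that the first failing hook stops the remaining ones, and the caller alone decides how to report the error.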
```python
all_outputs = {}

for hook_name in caliban_config.get('pre_build_hooks', []):
  hook = getattr(all_hooks, f'hook_{hook_name}')
```
Now it looks like we are imposing a convention where the hook is named `my_hook` in the caliban config, while we assume it is defined as `hook_my_hook`. This seems potentially confusing. Why not just use the function names without the extra `hook_` prefix? If you want to adopt this as a soft convention, you can simply name things that way (but then the `hook_` prefix would appear in the config file as well).
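If hooks are looked up by their exact function names instead, the lookup and the callability check could be combined as in this sketch. `resolve_hooks` is a hypothetical helper, and the project's `hooks` module is stood in for by a `types.ModuleType` in the usage lines.

```python
import types
from typing import Callable, List


def resolve_hooks(module: types.ModuleType, names: List[str]) -> List[Callable]:
  """Looks up each configured hook by its exact function name (no 'hook_' prefix)."""
  resolved = []
  for name in names:
    fn = getattr(module, name, None)
    if not callable(fn):
      raise ValueError(
          f"Hook {name!r} is not a callable in module {module.__name__!r}")
    resolved.append(fn)
  return resolved


# Stand-in for the project's `hooks` module.
hooks = types.ModuleType("hooks")
hooks.git_commit = lambda: {"commit": "sdf9q32rfjhsp9ruw3"}
print(resolve_hooks(hooks, ["git_commit"])[0]())
```

The name written in `.calibanconfig.json` is then exactly the name of the function, with no hidden prefix.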
```python
the pre_build_hooks or pre_run_hooks entry in the .calibanconfig.json file

Notes:
  1. Hook names must begin with 'hook_'
```
As mentioned elsewhere, do we really want to enforce the `hook_` convention as a hard constraint?
```python
  3) Live in the 'hooks' module in the project's root directory
"""
import sys
sys.path.append('.')
```
I agree this is a little precarious. Again, it would be nice to hear @sritchie's and @ajslone's thoughts on the cleanest way to structure this.
Some options:
- Have the function take a path; add an optional hook folder path to `.calibanconfig.json`; fall back to `os.getcwd()` from within the build and run calls if no path is given.
- Have a required `hook_folder` path in `.calibanconfig.json`. This could be all that's specified, and we could move the specific hook specification to a `hook_config.json` within that directory.
- Have setup involve installing the hooks in the environment.
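The first option could avoid mutating `sys.path` by loading the module from an explicit path with `importlib.util.spec_from_file_location`, falling back to `os.getcwd()`. This is a sketch under assumptions: the hooks live in either a `hooks/` package or a `hooks.py` file in the given directory, and `load_hooks_module` is a hypothetical name. It registers the module in `sys.modules` so that relative imports inside a `hooks/` package resolve.

```python
import importlib.util
import os
import sys
import types
from typing import Optional


def load_hooks_module(hook_dir: Optional[str] = None) -> types.ModuleType:
  """Loads a `hooks` module from `hook_dir` without modifying sys.path.

  Looks for a `hooks/` package first, then a `hooks.py` file. Falls back to
  the current working directory when no directory is given.
  """
  hook_dir = hook_dir or os.getcwd()
  package_init = os.path.join(hook_dir, "hooks", "__init__.py")
  single_file = os.path.join(hook_dir, "hooks.py")

  if os.path.isfile(package_init):
    # Marking it as a package lets `from . import foo` work inside hooks/.
    spec = importlib.util.spec_from_file_location(
        "hooks", package_init,
        submodule_search_locations=[os.path.dirname(package_init)])
  elif os.path.isfile(single_file):
    spec = importlib.util.spec_from_file_location("hooks", single_file)
  else:
    raise FileNotFoundError(f"No hooks module found in {hook_dir!r}")

  if spec is None or spec.loader is None:
    raise ImportError(f"Could not build an import spec for hooks in {hook_dir!r}")

  module = importlib.util.module_from_spec(spec)
  # Register before executing so relative imports can find the parent package.
  sys.modules[spec.name] = module
  spec.loader.exec_module(module)
  return module
```

Passing the directory explicitly (from the config, or from the build and run calls) keeps the lookup independent of where `caliban` happens to be invoked.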
This PR is an updated version of @ethansdyer's and my initial hook PR, taking into account comments from both of you, @sritchie and @ajslone. I've tried to give as much detail as needed here, but please let me know if more is required.
Status: apart from unit tests, this should be good to go. I've done basic testing, and if you both approve the design, I'll add the tests promptly.
Changes to the design:
1. Rather than running hooks as scripts via `subprocess`, we now require that the user specify hooks as Python functions, which Caliban runs by importing a `hooks` module (from the project's root directory). Since Caliban can run these as Python functions, we do not have to worry about collecting `stdout`, and the hooks can instead just return dictionaries as normal.
2. Hook outputs are key/value pairs (e.g. `{'commit': 'sdf9q32rfjhsp9ruw3'}`). Previously, we had added these key/value pairs as labels to the job, so in GCP, for example, they would show up next to the job. While this was fine for `cloud`, it didn't work so well for `run`, so in this PR Caliban instead submits the key/value pairs as `script_args`; e.g. for the dictionary above it will pass `--commit sdf9q32rfjhsp9ruw3`.
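The conversion from hook outputs to `script_args` described above amounts to flattening the dict into `--key value` pairs. A minimal sketch, where `outputs_to_script_args` is a hypothetical name and values are stringified with `str`:

```python
from typing import Any, Dict, List


def outputs_to_script_args(outputs: Dict[str, Any]) -> List[str]:
  """Flattens hook outputs into `--key value` flags for the user's script."""
  args: List[str] = []
  for key, value in outputs.items():
    args.extend([f"--{key}", str(value)])
  return args


print(outputs_to_script_args({"commit": "sdf9q32rfjhsp9ruw3"}))
```

For the dictionary above this yields `['--commit', 'sdf9q32rfjhsp9ruw3']`, which works the same way for `run`, `cloud`, and `gke` because it only touches the script's arguments.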
Overview of how hooks are implemented:
1. The user specifies `pre_run_hooks` and `pre_build_hooks` in their `.calibanconfig.json` file. These hooks must be (lists of) Python functions that are part of a module called, appropriately, `hooks`.
2. When the config is validated, Caliban checks that the specified functions exist in the `hooks` module and are callable.
3. At build time (either via `caliban build` or as part of `caliban run`, etc.), Caliban runs all the hooks specified as `pre_build_hooks`. These hooks take no arguments. If any of these hooks fails, meaning it returns a dictionary with the pair `{'Succeeded': False}`, Caliban shows the error message to the user and aborts the build. If all of the hooks succeed, it collects the hook outputs and labels the container with them. For example, if one of the hooks outputs the commit hash of the latest git commit, the container built by Docker gets labeled with `'commit': 'commit value'`.
4. Before running a job (`run`, `cloud`, `gke`), Caliban runs all the specified `pre_run_hooks`. As with the `pre_build_hooks`, if any of these fail, Caliban aborts the run and shows the error message to the user. If they all succeed, again returning a dictionary of outputs, Caliban passes these key/value pairs as `script_args` to the script it's about to run.

Things that are not done as well as I'd like:
1. To import the `hooks` module, Caliban currently calls `sys.path.append('.')` (the `hooks` folder will live within the project's root directory, which will likely not be on the path). There must be a better way to do this.
2. Validation currently checks that the functions specified in `pre_build_hooks` and `pre_run_hooks` are callable attributes of the `hooks` module, but it does not check, for example, that the `pre_build_hooks` take no arguments or that the `pre_run_hooks` take a single argument (the container id).
3. Because the hooks are now run as Python functions, whatever dependencies they rely on (say, for example, the `docker` module) must be installed in the user's base environment. I'm not sure this is the best approach.
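The missing arity check from item 2 could be done with `inspect.signature(...).bind`, which raises `TypeError` when the given arguments cannot be bound. A sketch assuming, as stated above, that pre-build hooks take no arguments and pre-run hooks take one (the container id); `check_arity` and `validate_hooks` are hypothetical names.

```python
import inspect
from typing import Callable, List


def check_arity(fn: Callable, n_args: int, kind: str) -> None:
  """Raises ValueError unless `fn` can be called with `n_args` positional args."""
  try:
    inspect.signature(fn).bind(*([None] * n_args))
  except TypeError as e:
    raise ValueError(
        f"{kind} hook {fn.__name__!r} must accept {n_args} positional "
        f"argument(s): {e}") from None


def validate_hooks(pre_build_hooks: List[Callable],
                   pre_run_hooks: List[Callable]) -> None:
  for fn in pre_build_hooks:
    check_arity(fn, 0, "pre_build")  # Pre-build hooks take no arguments.
  for fn in pre_run_hooks:
    check_arity(fn, 1, "pre_run")  # Pre-run hooks take the container id.
```

Because `bind` honors default values and `*args`, this checks that a hook *can* be called with the expected arguments rather than that it declares exactly that many parameters.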