Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python hook #95

Closed
wants to merge 30 commits into from
Closed

Python hook #95

wants to merge 30 commits into from

Conversation

ramasesh
Copy link
Collaborator

@ramasesh ramasesh commented Oct 23, 2020

This PR is an updated version of @ethansdyer and my initial hook PR taking into account both of your comments @sritchie @ajslone . I've tried to give as much detail as required in this document, but please let me know if more is needed.

Status: besides unit tests, this should be good to go. I've done basic tests, and if you both approve the design I'll get the tests in pronto.

Changes to the design:

  • The major change to the design from the previous PR is that instead of the user specifying hooks as scripts which Caliban then executes using subprocess, we now require that the user specify hooks as python functions which Caliban will run by importing a hooks module (from the project's root directory). Since Caliban can run these as python functions, we do not have to worry about collecting stdout, and can instead have the hooks just return dictionaries as normal.
  • A second change is what the pre-run hooks do with their outputs (which are dictionaries, e.g. {'commit': 'sdf9q32rfjhsp9ruw3'}). Previously, we had added these key value pairs as labels to the job, so in GCP, for example, they would show up next to the job. While this was fine for cloud, it didn't work so well for run, so in this PR Caliban instead submits the key value pairs as script_args, e.g. for the dictionary above it will do --commit sdf9q32rfjhsp9ruw3.

Overview of how hooks are implemented:

  1. User specification: User specifies pre_run_hooks and pre_build_hooks in their .calibanconfig.json file. These hooks must be (lists of) python functions, which are part of a module called, appropriately, hooks.
  2. Validation: Using Schema, Caliban validates that the specified functions exist within the hooks module and are callable.
  3. Pre_build_hook execution: Before building a container---i.e. with explicitly called with caliban build or as part of caliban run, etc.---Caliban will run all the hooks specified as pre_build_hooks. These hooks take no argument. If any of these hooks fails, meaning it returns a dictionary with the pair {'Succeeded': False}, it gives the error message to the user and aborts the build. If all of the hooks succeed, then it collects the hook outputs and labels the container with these outputs. For example, if one of the hooks outputs the commit hash of the latest git commit, the container built by docker will get labeled with 'commit': 'commit value'.
  4. Pre_run_hook execution: Before running (either run, cloud, gke), Caliban will run all the specified pre_run_hooks. As in the pre_build_hooks, if any of these fail, Caliban will abort the run and give the error message to the user. If they all succeed, again returning a dictionary of output, Caliban will pass these key/value pairs as script_args to the script it's about to run.

Things that are not done as well as I'd like:

  1. Importing the hooks is currently done by the following (since the hooks folder will live within the project's root directory, which will likely not be on the path):
import importlib
sys.path.append('.')
hooks = importlib.import_module('hooks')
sys.path.remove('.')

There must be a better way to do this.
2. Validation currently validates that the functions specified in pre_build_hooks and pre_run_hooks are Callable attributes of the hooks module, but they do not validate, for example, that the pre_build_hooks take no arguments or that the pre_run_hooks take a single argument (the container id).
3. Because the hooks are now run by python, they require that whatever dependencies the hooks rely on (say, for example, the docker module) are installed in the the user's base environment. Not sure this is the best.

ramasesh and others added 29 commits October 10, 2020 14:51
Added keys pre-build-hook and pre-run-hook to valid calibanconfig
configurations (by modifying caliban/config/__init__.py), and added code
to caliban/docker/build.py to look for a pre-build hook, run it if
found, and only proceed with the build if the hook returns w/ exit code
0.
@ramasesh ramasesh requested a review from ajslone October 23, 2020 00:32
@google-cla
Copy link

google-cla bot commented Oct 23, 2020

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@codecov
Copy link

codecov bot commented Oct 23, 2020

Codecov Report

Merging #95 into master will decrease coverage by 0.89%.
The diff coverage is 26.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #95      +/-   ##
==========================================
- Coverage   55.72%   54.82%   -0.90%     
==========================================
  Files          31       33       +2     
  Lines        3180     3283     +103     
==========================================
+ Hits         1772     1800      +28     
- Misses       1408     1483      +75     
Impacted Files Coverage Δ
caliban/config/__init__.py 100.00% <ø> (ø)
caliban/platform/cloud/core.py 23.00% <14.28%> (-0.30%) ⬇️
caliban/platform/gke/cli.py 21.37% <14.28%> (-0.19%) ⬇️
caliban/platform/run.py 27.86% <16.66%> (-0.58%) ⬇️
caliban/util/schema.py 65.30% <25.00%> (-13.08%) ⬇️
caliban/hooks/util.py 27.50% <27.50%> (ø)
caliban/docker/build.py 32.74% <30.76%> (+0.03%) ⬆️
caliban/hooks/examples/git_hooks.py 36.84% <36.84%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56f96e7...092977f. Read the comment docs.

@google-cla
Copy link

google-cla bot commented Oct 23, 2020

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

Copy link
Collaborator

@ethansdyer ethansdyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general strategy here is good. Some comments

  • Python hooks:

We considered having python hooks and decided not to originally.
The pro of having them is that the output is much cleaner to parse.
The main con is that it restricts the user to write their hooks in python (though of course python could just wrap a subprocess call, so not really a restriction). I don't have strong feelings about which way is optimal. One thing to keep in mind in either case, is the hooks will be run from whatever env caliban is installed in. In particular we have to trust the user to maintain an env where their hooks work.

  • Results as flags:

I like this approach in principle. I have two main issues with it 1) We now are requiring the main run script to take the hook flags as arguments. This requires the user to synchronize their hooks with their script. Avoiding this would be ideal (could have a way to hide this from the user with something like hooks.initialize_flags or something.). It also means we trust the user to log this information. In summary, what I don't like is that using hooks in this version requires not just having the hooks folder and changing the config, but changing the main run script in a hook dependent way.

  1. It seems that there should be a default way that all tags are handled for caliban. This should be uniform across run modes (ie cloud run etc). If we want to make the choice that this is handled by passing everything as a flag to the main script, that is in principle fine, but that is not the case at the moment. @sritchie and @ajslone, what do you guys think about this?

import caliban.util.fs as ufs


def perform_prebuild_hooks(caliban_config: Dict[str, Any]) -> Dict[str, str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clear docstring! I feel this is structured a little redundantly / not pythonically. We are running a hook and it returns: in the case of success, a bool saying it succeeded and a dict; in the case of failure, a bool saying failure and a string. Why not just have the hook raise an exception if it doesn't pass and have it return a dict when it passes.

all_outputs = {}

for hook_name in caliban_config.get('pre_build_hooks', []):
hook = getattr(all_hooks, f'hook_{hook_name}')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it looks like we are imposing a convention where in the caliban config, the hook is named as my_hook, while we assume it is defined as hook_my_hook. This seems potentially confusing. Why not just use the function names without the extra hook_ prefix? If you want to adopt this as a soft convention, can just choose to name things that way (but then the hook_ would appear in the config file as well).

the pre_build_hooks or pre_run_hooks entry in the .calibanconfig.json file

Notes:
1. Hook names must begin with 'hook_'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere do we really want to implement the hook_ convention as a hard constraint?

3) Live in the 'hooks' module in the project's root directory
"""
import sys
sys.path.append('.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is a little precarious. Again would be nice to hear @sritchie and @ajslone thoughts about the nicest way to structure this.

Some options

  • Have function take a path; have an optional hook folder path in the .calibanconfig.json; use os.os.getcwd from within the build and run calls if no path is given.
  • Have a required hook_folder path in .calibanconfig.json. This could be all that's specified and we could move the specific hook specification to a hook_config.json within that directory.
  • Have setup involve installing hooks in env.

@ramasesh ramasesh closed this Jun 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants