Python hook #95

ramasesh · 2020-10-23T00:32:20Z

This PR is an updated version of @ethansdyer and my initial hook PR taking into account both of your comments @sritchie @ajslone . I've tried to give as much detail as required in this document, but please let me know if more is needed.

Status: besides unit tests, this should be good to go. I've done basic tests, and if you both approve the design I'll get the tests in pronto.

Changes to the design:

The major change to the design from the previous PR is that instead of the user specifying hooks as scripts which Caliban then executes using subprocess, we now require that the user specify hooks as python functions which Caliban will run by importing a hooks module (from the project's root directory). Since Caliban can run these as python functions, we do not have to worry about collecting stdout, and can instead have the hooks just return dictionaries as normal.
A second change is what the pre-run hooks do with their outputs (which are dictionaries, e.g. {'commit': 'sdf9q32rfjhsp9ruw3'}). Previously, we had added these key value pairs as labels to the job, so in GCP, for example, they would show up next to the job. While this was fine for cloud, it didn't work so well for run, so in this PR Caliban instead submits the key value pairs as script_args, e.g. for the dictionary above it will do --commit sdf9q32rfjhsp9ruw3.

Overview of how hooks are implemented:

User specification: User specifies pre_run_hooks and pre_build_hooks in their .calibanconfig.json file. These hooks must be (lists of) python functions, which are part of a module called, appropriately, hooks.
Validation: Using Schema, Caliban validates that the specified functions exist within the hooks module and are callable.
Pre_build_hook execution: Before building a container---i.e. with explicitly called with caliban build or as part of caliban run, etc.---Caliban will run all the hooks specified as pre_build_hooks. These hooks take no argument. If any of these hooks fails, meaning it returns a dictionary with the pair {'Succeeded': False}, it gives the error message to the user and aborts the build. If all of the hooks succeed, then it collects the hook outputs and labels the container with these outputs. For example, if one of the hooks outputs the commit hash of the latest git commit, the container built by docker will get labeled with 'commit': 'commit value'.
Pre_run_hook execution: Before running (either run, cloud, gke), Caliban will run all the specified pre_run_hooks. As in the pre_build_hooks, if any of these fail, Caliban will abort the run and give the error message to the user. If they all succeed, again returning a dictionary of output, Caliban will pass these key/value pairs as script_args to the script it's about to run.

Things that are not done as well as I'd like:

Importing the hooks is currently done by the following (since the hooks folder will live within the project's root directory, which will likely not be on the path):

import importlib
sys.path.append('.')
hooks = importlib.import_module('hooks')
sys.path.remove('.')

There must be a better way to do this.
2. Validation currently validates that the functions specified in pre_build_hooks and pre_run_hooks are Callable attributes of the hooks module, but they do not validate, for example, that the pre_build_hooks take no arguments or that the pre_run_hooks take a single argument (the container id).
3. Because the hooks are now run by python, they require that whatever dependencies the hooks rely on (say, for example, the docker module) are installed in the the user's base environment. Not sure this is the best.

Added keys pre-build-hook and pre-run-hook to valid calibanconfig configurations (by modifying caliban/config/__init__.py), and added code to caliban/docker/build.py to look for a pre-build hook, run it if found, and only proceed with the build if the hook returns w/ exit code 0.

…ception).

…for cloud)

google-cla · 2020-10-23T00:32:36Z

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

codecov · 2020-10-23T00:33:56Z

Codecov Report

Merging #95 into master will decrease coverage by 0.89%.
The diff coverage is 26.92%.

@@            Coverage Diff             @@
##           master      #95      +/-   ##
==========================================
- Coverage   55.72%   54.82%   -0.90%     
==========================================
  Files          31       33       +2     
  Lines        3180     3283     +103     
==========================================
+ Hits         1772     1800      +28     
- Misses       1408     1483      +75

Impacted Files	Coverage Δ
caliban/config/__init__.py	`100.00% <ø> (ø)`
caliban/platform/cloud/core.py	`23.00% <14.28%> (-0.30%)`	⬇️
caliban/platform/gke/cli.py	`21.37% <14.28%> (-0.19%)`	⬇️
caliban/platform/run.py	`27.86% <16.66%> (-0.58%)`	⬇️
caliban/util/schema.py	`65.30% <25.00%> (-13.08%)`	⬇️
caliban/hooks/util.py	`27.50% <27.50%> (ø)`
caliban/docker/build.py	`32.74% <30.76%> (+0.03%)`	⬆️
caliban/hooks/examples/git_hooks.py	`36.84% <36.84%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56f96e7...092977f. Read the comment docs.

google-cla · 2020-10-23T00:39:34Z

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

ethansdyer

I think the general strategy here is good. Some comments

Python hooks:

We considered having python hooks and decided not to originally.
The pro of having them is that the output is much cleaner to parse.
The main con is that it restricts the user to write their hooks in python (though of course python could just wrap a subprocess call, so not really a restriction). I don't have strong feelings about which way is optimal. One thing to keep in mind in either case, is the hooks will be run from whatever env caliban is installed in. In particular we have to trust the user to maintain an env where their hooks work.

Results as flags:

I like this approach in principle. I have two main issues with it 1) We now are requiring the main run script to take the hook flags as arguments. This requires the user to synchronize their hooks with their script. Avoiding this would be ideal (could have a way to hide this from the user with something like hooks.initialize_flags or something.). It also means we trust the user to log this information. In summary, what I don't like is that using hooks in this version requires not just having the hooks folder and changing the config, but changing the main run script in a hook dependent way.

It seems that there should be a default way that all tags are handled for caliban. This should be uniform across run modes (ie cloud run etc). If we want to make the choice that this is handled by passing everything as a flag to the main script, that is in principle fine, but that is not the case at the moment. @sritchie and @ajslone, what do you guys think about this?

ethansdyer · 2020-10-26T19:37:48Z

caliban/hooks/util.py

+import caliban.util.fs as ufs
+
+
+def perform_prebuild_hooks(caliban_config: Dict[str, Any]) -> Dict[str, str]:


Thanks for the clear docstring! I feel this is structured a little redundantly / not pythonically. We are running a hook and it returns: in the case of success, a bool saying it succeeded and a dict; in the case of failure, a bool saying failure and a string. Why not just have the hook raise an exception if it doesn't pass and have it return a dict when it passes.

ethansdyer · 2020-10-26T19:43:00Z

caliban/hooks/util.py

+  all_outputs = {}
+
+  for hook_name in caliban_config.get('pre_build_hooks', []):
+    hook = getattr(all_hooks, f'hook_{hook_name}')


Now it looks like we are imposing a convention where in the caliban config, the hook is named as my_hook, while we assume it is defined as hook_my_hook. This seems potentially confusing. Why not just use the function names without the extra hook_ prefix? If you want to adopt this as a soft convention, can just choose to name things that way (but then the hook_ would appear in the config file as well).

ethansdyer · 2020-10-26T20:47:12Z

caliban/hooks/examples/git_hooks.py

+the pre_build_hooks or pre_run_hooks entry in the .calibanconfig.json file
+
+Notes:
+  1.  Hook names must begin with 'hook_'


As mentioned elsewhere do we really want to implement the hook_ convention as a hard constraint?

ethansdyer · 2020-10-26T20:55:31Z

caliban/hooks/util.py

+    3) Live in the 'hooks' module in the project's root directory
+  """
+  import sys
+  sys.path.append('.')


I agree this is a little precarious. Again would be nice to hear @sritchie and @ajslone thoughts about the nicest way to structure this.

Some options

Have function take a path; have an optional hook folder path in the .calibanconfig.json; use os.os.getcwd from within the build and run calls if no path is given.

Have a required hook_folder path in .calibanconfig.json. This could be all that's specified and we could move the specific hook specification to a hook_config.json within that directory.

Have setup involve installing hooks in env.

ramasesh and others added 29 commits October 10, 2020 14:51

refactor perform_prebuild_hooks

2bcc731

add list of hooks, call using subprocess

fdeaa55

initial working demo of git commit hook

7993b7f

Cleaned up hook error handeling (caught stderr and raised a single ex…

391397e

…ception).

Changed commit hash storing from file to docker image label.

82a87ca

Cleaned up logging.

8c69892

Added git commit hooks.

0bc1e41

Added docker requirement.

0ae32fa

add gitpython and docker to requirements.txt

b20e687

removed docker and git-python from requirements-dev.txt

98ac780

removed container-labels from config

8207b8b

added schema to ensure hooks are valid files

fcf7a56

added comment to config/__init__.py describing hooks

74c159e

refactored logic for extracting image ID

2a65267

added typing to git_commit_hooks.py

49fabe2

pretend things are immutable

5761271

Add better documentation to git hook examples

7203533

change hook name to underscores

6f7f718

add docstring to get_docker_label_args(), define new types

d7eee0a

fix unused imports in caliban/hooks/examples

2ad2b70

combine error messages in git cleanliness hook to be more readable

771c0ac

added fix to stop prerun hooks from being performed in dry_run mode (…

7a0c6e6

…for cloud)

initial working configuration of python hooks

e8cad0b

change hook output logging from .info() to .debug()

0904dde

replace hook examples to be up-to-date

73ae5a9

add pre_run hook output as script_args

00c5a5a

add validation for Hooks

8e09455

remove docker dependency from caliban

fc5504c

ramasesh requested a review from sritchie October 23, 2020 00:32

ramasesh requested a review from ajslone October 23, 2020 00:32

fixes formatting (pre-commit)

092977f

ramasesh requested a review from ethansdyer October 23, 2020 00:47

ethansdyer reviewed Oct 26, 2020

View reviewed changes

ramasesh closed this Jun 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python hook #95

Python hook #95

ramasesh commented Oct 23, 2020 •

edited

Loading

google-cla bot commented Oct 23, 2020

codecov bot commented Oct 23, 2020 •

edited

Loading

google-cla bot commented Oct 23, 2020

ethansdyer left a comment

ethansdyer Oct 26, 2020

ethansdyer Oct 26, 2020

ethansdyer Oct 26, 2020

ethansdyer Oct 26, 2020

		import caliban.util.fs as ufs


		def perform_prebuild_hooks(caliban_config: Dict[str, Any]) -> Dict[str, str]:

Python hook #95

Python hook #95

Conversation

ramasesh commented Oct 23, 2020 • edited Loading

google-cla bot commented Oct 23, 2020

codecov bot commented Oct 23, 2020 • edited Loading

Codecov Report

google-cla bot commented Oct 23, 2020

ethansdyer left a comment

Choose a reason for hiding this comment

ethansdyer Oct 26, 2020

Choose a reason for hiding this comment

ethansdyer Oct 26, 2020

Choose a reason for hiding this comment

ethansdyer Oct 26, 2020

Choose a reason for hiding this comment

ethansdyer Oct 26, 2020

Choose a reason for hiding this comment

ramasesh commented Oct 23, 2020 •

edited

Loading

codecov bot commented Oct 23, 2020 •

edited

Loading