Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement execution hooks framework #189

Closed
wants to merge 46 commits into from
Closed

Implement execution hooks framework #189

wants to merge 46 commits into from

Conversation

csadorf
Copy link
Contributor

@csadorf csadorf commented Dec 6, 2019

Description

This pull request implements an execution hooks framework that enables users to execute certain functions on start, on finish, on success, and on fail of the execution of an operation.

Resolves #28, resolves #14.

Motivation and Context

The motivation for this framework is to enable users to automatically execute certain functions with operations for example for logging and tracking purposes. For example, to log when a specific operation is executed, a user can provide the following function:

@Project.operation
@Project.hooks.on_start(lambda op: logger.info(f"Executing {op}..."))
def foo(job):
    pass

The framework comes with a selection of pre-implemented hook systems to cover all the currently expected use cases, roughly in order of expensiveness:

  1. LogOperation - Log basic information about the execution of operations to a log file.
  2. TrackOperations - Keep detailed metadata records about the state of the project root directory and the operation, including directives, to a log file, optionally in conjunction with git.
  3. SnapshotProject - Create a snapshot of the project root directory to keep track of the code used for the execution of an operation.
  4. TrackWorkspaceWithGit - The workspace is treated as a git repository and automatically committed to before and after the execution of an operation. The can be done either on a per-job basis or workspace global.

A hook system is meant to describe a collection of hook functions that together achieve a specific purpose. The shipped hook systems are implemented as classes that can be installed project-wide via the respective install_hooks function.

High-level API

Hooks can be installed either on a per-operation or a per-project basis. In this way users have the option to execute hook functions either with specific operations or with all operations of a project.

Furthermore, there are two ways that hooks can be installed. One way is directly in Python, for example within the __main__ clause of the project.py file:

# ...

if __name__ == '__main__':
    from flow.hooks import LogOperations
    LogOperations().install_hooks(Project()).main()

Alternatively, hooks can also be installed through the (project) configuration file:

project = my_project
[flow]
hooks=flow.hooks.LogOperations,flow.hooks.SnapshotProject(compress=True)

It is assumed that the entities provided here, are callable, i.e., are either a function or a functor. The first argument must be the instance of FlowProject.

Low-level API

The FlowProject class has a hooks attribute with four lists: on_start, on_finish, on_success, and on_fail, which can be appended to. Hook functions installed in this way are executed for all operations.

Furthermore, hook functions can be installed for individual operations either with a decorator:

@Project.operation
@Project.hook.on_start(my_hook_function)
def op(job):
    pass

or by passing a dictionary to the add_operation() function: project.add_operation(..., hooks=dict(on_start=my_hook_function)).

Custom hook systems

Users can very easily implement their own hook systems and install them in similar manner.
For example:

# myhooks.py
def log_op(operation):
    with open(operation.job.fn('operations.log'), 'a') as logfile:
        logfile.write(f"Executed operation '{operation}'.")

def install(project):
    project.hooks.on_success.append(log_op)
    return project

This could then be installed either in the project module:

# ...

if __name__ == '__main__':
    import myhooks
    myhooks.install(Project()).main()

or via the configuration file:

project = myproject
[flow]
hooks=myhooks.install

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change1

1The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • I have updated the API documentation as part of the package doc-strings.
  • I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • I have updated the changelog.

@csadorf csadorf requested a review from a team as a code owner December 6, 2019 12:08
@csadorf csadorf requested review from a team and vyasr and removed request for a team December 6, 2019 12:08
@csadorf csadorf changed the title Execution hooks Implement execution hooks framework Dec 6, 2019
@ghost ghost requested review from cbkerr and removed request for a team December 6, 2019 12:10
@csadorf
Copy link
Contributor Author

csadorf commented Dec 6, 2019

@bdice You were not assigned as reviewer, but might still be interested in this feature.

@csadorf
Copy link
Contributor Author

csadorf commented Dec 6, 2019

Note: This feature won't work with "exec operations", but since these are going away anyways, I think it's safe to introduce this feature regardless.

@csadorf csadorf added the enhancement New feature or request label Dec 6, 2019
@csadorf
Copy link
Contributor Author

csadorf commented Mar 25, 2021

@cbkerr Is this ready for another review?

@csadorf csadorf removed their assignment Mar 25, 2021
flow/project.py Show resolved Hide resolved
@bdice
Copy link
Member

bdice commented Apr 8, 2021

I read through some of this PR with @klywang. We're scheduling a time to dig deeper on this and consider the architecture of the code. I have two broad considerations:

  • It seems like this PR's current design has prioritized generality at the expense of readability and document-ability. I think there are some clear tradeoffs where I would prefer a less-general and more-specific way. Two examples:

    1. The supported hooks are given as a class-level list ["on_start", ..., "on_error"] and class attributes are assigned dynamically with setattr. I think it would be more readable to avoid dynamic attributes and just use self.on_start = .... The supported hook types seem more like pre and post (a short list of known elements; code is duplicated for each) rather than an arbitrary list that would require dynamic assignments.
    2. Some classes use **kwargs and define alternate constructors, rather than having a single path of entry with specifically-named parameters (and calling code that uses those names). This has a slight cost in generality but offers much-more readable code in my opinion.
  • I think we may be able to simplify the "registration" side of this PR after we work through the follow-up implications of Remove FlowProject.add_operation #487 and Fix group name check #382 and begin to reconcile class-level and instance-level data members and try to simplify FlowProject instantiation. I would like to think through those problems first, before diving into this PR further, because I think the simplifications will pay off.

@vyasr
Copy link
Contributor

vyasr commented Apr 15, 2021

@bdice @cbkerr @klywang most of my comments are still relevant after #482 and addressing them should make the merge easier later. I'd prioritize docs/addressing open discussions over further refactoring work on this branch while other work happens on the main branch. I definitely agree that we should aim to simplify before finalizing this feature, but I think writing docs should come first so that @cbkerr and @klywang can clearly articulate the goals of this PR first since understanding how it should work is critical to writing the best version of the code.

@codecov
Copy link

codecov bot commented Apr 30, 2021

Codecov Report

Merging #189 (1d31725) into master (620b34f) will decrease coverage by 3.79%.
The diff coverage is 33.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #189      +/-   ##
==========================================
- Coverage   73.40%   69.61%   -3.80%     
==========================================
  Files          30       37       +7     
  Lines        2967     3268     +301     
  Branches      563      591      +28     
==========================================
+ Hits         2178     2275      +97     
- Misses        644      845     +201     
- Partials      145      148       +3     
Impacted Files Coverage Δ
flow/hooks/git_util.py 0.00% <0.00%> (ø)
flow/hooks/snapshots.py 0.00% <0.00%> (ø)
flow/hooks/git_workspace_tracking.py 7.69% <7.69%> (ø)
flow/hooks/log_operations.py 26.08% <26.08%> (ø)
flow/hooks/track_operations.py 41.37% <41.37%> (ø)
flow/hooks/__init__.py 66.66% <66.66%> (ø)
flow/hooks/util.py 66.66% <66.66%> (ø)
flow/project.py 75.77% <69.11%> (-0.37%) ⬇️
flow/__init__.py 91.66% <100.00%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 620b34f...1d31725. Read the comment docs.

setup.py Outdated Show resolved Hide resolved
@bdice bdice mentioned this pull request Apr 30, 2021
16 tasks
@bdice
Copy link
Member

bdice commented Dec 29, 2021

Now that #508 has been merged, we need to decide whether to implement the suggested hooks from this PR. I think the best way to do that would be to close this PR and open a new issue listing the proposed hooks and linking to this PR for reference implementations that could be adapted in separate PRs.

The framework comes with a selection of pre-implemented hook systems to cover all the currently expected use cases, roughly in order of expensiveness:

  1. LogOperation - Log basic information about the execution of operations to a log file.
  2. TrackOperations - Keep detailed metadata records about the state of the project root directory and the operation, including directives, to a log file, optionally in conjunction with git.
  3. SnapshotProject - Create a snapshot of the project root directory to keep track of the code used for the execution of an operation.
  4. TrackWorkspaceWithGit - The workspace is treated as a git repository and automatically committed to before and after the execution of an operation. The can be done either on a per-job basis or workspace global.

@csadorf
Copy link
Contributor Author

csadorf commented Feb 14, 2022

@bdice This can be closed, right?

@bdice
Copy link
Member

bdice commented Feb 14, 2022

@bdice This can be closed, right?

The only reason we left it open was because this PR also implements the specific hook classes that were not included in the general framework that was merged.

@b-butler
Copy link
Member

Closed see #637 for further discussion.

@b-butler b-butler closed this Apr 18, 2022
@klywang klywang mentioned this pull request Apr 19, 2023
6 tasks
@joaander joaander deleted the execution-hooks branch February 2, 2024 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal for the improved management of provenance and reproducibility Proposal for operation hooks
7 participants