Commit
Implementing a grouping feature to organize flow operations (#114)
* Enables the creation and registering of groups.
* The FlowGroup class was created to represent the group concept and store the necessary variables and methods.
* The classmethod make_group was created, which returns a decorator that adds a _flow_group label to functions for later registration. In addition, make_group also adds an entry to the new _GROUPS attribute in the _FlowProject class.
* A function _register_groups was added that registers groups by using the _GROUPS and _OPERATIONS attributes of the current and parent classes.
* The __init__ method of FlowGroup was changed to automatically register groups and add operations at initialization.
* Adding function to create group CL arg. Adds the group command-line argument to submit, run, and script.
* First functioning group submit.
* _main_submit modified to have group path
* submit_group function added to submit groups
* changes do not directly update status
* no checks on whether directives work have been made
* Adding support for groups in run command.
* changed the way that operation names are gathered for run condition to aggregate those chosen from -o and -g options.
* Adding script support for group option.
* Raise ValueError if -o/--operation and -g/--group are set together
* Add logic to include JobGroups as script operations if -o/--operations is set.
* Adding support for command options.
* Adding an options parameter for groups that appends options to the output of FlowGroup's __call__ method. This allows you to set options like --num-passes for a group. These options currently apply only during submit and script operations.
* Adding new methods to FlowGroup.
* FlowGroups now have a complete and eligible method
* FlowGroups now have two cmds: run_cmd and exec_cmd
* FlowGroups.__call__ now takes a flag on cmd mode
* make_group FlowProject method takes correct args and stores them in operation function dict correctly
* FlowGroups store operations as a dict now
* _register_groups FlowProject method correctly registers operations
* Temporary changes made to JobGroup to test FlowGroup function
* Adding bundling, exec_mode, op group conversion.
* JobGroups now correctly create unique id.
* JobOperations are removed as they are redundant.
* All _main_script and _main_submit always use group pathway now.
* Exec_mode runs and raises error for groups with more than one job.
* Bundling works with groups of all kinds.
* Changed JobOperation implementation.
* Changed functions that gather operations.
* Changed the structure of JobOperations
* Add method to FlowGroup to create JobOperations.
* Integration of operation and group execution path.
* Submit, run, and script all handle groups and operations through -o
* FlowGroups are always created with knowledge of path now; this is required because there is not an easy way to find where an object is instantiated. The current method uses the stack when using the classmethod make_group, and the class attribute _OPERATION_FUNCTIONS when creating FlowGroups directly from a FlowOperation.
* Appropriate errors are raised when undefined behavior might occur.
* Adding group support for run command.
* Adding support for groups in submit function.
* Adding requested revisions
* Changing FlowGroups compatible method
* Moving submit and JobOperation back to original location
* Added a groups property to FlowProject class
* Moved all call logic for FlowGroup to __call__ method
* Moved exec_mode command-line argument conversion to string to _main functions.
* Changed logic in _verify_group_compatibility using Python's set type
* renamed submit_operations group argument to operations
* Changing _verify_group_compatibility logic.
* Also doing linting changes that flake8 requested.
* The changes to the `_setup_template_environment` were required to commit with the flake8 commit hook.
* Adding after condition for groups.
* Working on fixing bugs unit tests reveal.
* Also adding exec mode to relevant tests.
* Changing run behavior always to exec mode.
* Since operations are always expanded from groups beforehand, exec mode is the correct mode to be in when calling a run command.
* Changing groups behavior to pass tests.
* Changing the `get_id` function in `FlowGroup` class to match previous id in the case of single operation groups.
* Changing `_dumps_op` and `_loads_op` functions to include `JobOperation` id since it is now an attribute.
* Created a function `_gather_FlowGroups` that takes a list of group/operation names and returns a list of instances of `FlowGroups` with removed duplication and error checking.
* Add argument `mode` to submit function. Previously a kwarg check was performed. This changes behavior to use the string, which is the behavior elsewhere.
* _main_submit was changed to pass the string to submit as well.
* Added small changes in tests to match new implementations.
* All references to `JobOperation.get_id()` are changed to use the attribute `id`.
* Calls to `script` and `submit` that check output have 'exec' mode explicitly set.
* Fixing error in retrieving function path.
* Including inheritance hierarchy in gathering operation functions.
* Adding '--exec' option to `_main_script` test.
* Changing yield from to yield with a for loop for py2.
* Raising error if operation is added to group twice.
* Changing way group operations are joined for getting id.
* Adding regex support for selecting groups and operations
* had to refactor operation selection to include fullmatch.
* Making groups resubmit safe
* Also adding necessary tests for group behavior in submit, run, and script
* Added `operation_ids` as attribute of JobOperation to hold individual operation ids for resubmitting checks.
* Added `eligible_for_submission` method for JobOperation to check that none of the operations are currently submitted.
* Test main_submit and change dynamic tests
* For dynamic tests a workflow with multiple steps is tested. For the groups test, it is more important to show that resubmission is not possible. The multistep workflow is shown to work in the previous tests.
* Remove redundant expansion of operation names in _main_run
* test_main_submit for groups with multiple calls. This is necessary since one assertIn command cannot be used to ensure all operations are in the command, as the ordering in Python version 3.5 is not consistent with Python > 3.5.
* Change `FlowGroup.get_id` to `FlowGroup._generate_id`
* Singleton `FlowGroup`s and move directives to `FlowGroup`s
* Currently the singleton groups must have the same name as the operation name. As this is not changeable by the user, that should be okay.
* I had to set the directives 'executable' and 'override_path' for singleton groups with the `@cmd` decorator. This results in some ugly logic that should work, but should change if there is an easier way.
* Remove group specific status.
* Removed group status from `set_status` command.
* Redefined `JobOperation.get_status` to always return a list of operation statuses.
* Logic has yet to be moved to `FlowProject`. I did not see a clear way to query whether a particular job-operation pair was currently submitted other than directly searching the project.
* The test change was just to correct for the fact that now `get_status` returns a list.
* Move schedule status checks to group level.
* Also show operation status through querying group status and expanding to composing operations.
* Fix cmd-operations.
* Initial check for prior group submission
* Prevents unnecessary checking if the same group is previously submitted
* Create two paths for gathering JobOperations
* One path for submission
* One path for running
* Remove obsolete doc-string entry. There is actually no need to further specify what the id *should* look like. It must be unique, that is the only requirement.
* Fix JobOperation.__repr__().
* Flip the FlowOperation constructor calls.
* Reorganize the FlowGroup exec mode exception flow. 1. Raise error at the point of assumption. 2. Use dedicated exception class to propagate error. 3. Raise SubmitError only during submission. 4. Raise ScriptError during script generation.
* Refactor run-cmd generation and fix one bug.
* Simplified logic.
* Use the group name as argument to -o in all cases.
* Misc. improvements and optimizations.
* Change finding of FlowGroup path to use stack.
* may need to change again but fixed a bug on my computer.
* Keep track of module path during FlowProject object instantiation.
* Change test strings to match new output and new pathfinding method
* New pathfinding method solves some problems but doesn't resolve all of them. Namely, for our tests the 'path' is always the test file.
* Removing test_group files from branch.
* Refactor run logic in FlowGroups.
* Remove six.PY2 testing
* Fix rebase errors
* Remove old function to find operation path
* Adding path and entrypoint attributes to FlowProject.
* Tests work again with minor refactoring to account for this new API.
* Addressing new path/entrypoint motif with test_templates
* `FlowGroup` and `JobOperation` __repr__ methods fix/add
* Update flow/errors.py Co-Authored-By: Carl Simon Adorf <csadorf@umich.edu>
* Update docstring JobOperation.
* Setup FlowProject._entrypoint as a dict
* Using yield from to iterate over operations FlowGroups
* Move cmd evaluation to when it is asked for.
* Change add_operation of FlowProject to make group as well
* Adding entrypoint argument to derived classes for tests
* Reverting make_bundles
* Fixing docstring for next_operation
* Fixing store_bundled bug
* str check in JobOperation cmd property
* Callable directive support for entrypoints.
* Using _collect_operations for _register_groups
* Order statuses after grabbing all operation statuses
* Change operations to OrderedDict in _gather_flow_groups
* Add documentation
* Add error catching back to test
* Make eligible_for_submission not user facing
* Add fork whitelisting from directives
* Rebasing missed this
* Fix errors introduced by merge. Involved FlowGroups respecting ignore conditions
* Fix typos and style errors
* Cleanup ``FlowGroup`` some
* Change ``FlowGroup.intersects`` to ``FlowGroup.isdisjoint``. Matches set API better
* Fix typo in function name
* Move checking of conflicting group names earlier. Now at the time of group functor name conflict is checked.
* Remove ability to specify exec and run command generation
* Remove exec mode. Templates still broken.
* Remove cmd_prefix and get_prefix from templates
* All submitted jobs to ignore_conditions
* Use `FlowGroup.create_run_job_operations` for run
* Add operation specific directives
* Add operations to `FlowProject._GROUP_NAMES`. Prevents users from having a group named the same as an operation, since singleton groups with the operation name are automatically defined and shadowing should not be allowed.
* Change group registering logic. No longer recompute the operation list from `_collect_operations`. Also remove ExecCommandError
* Submission resource aggregation and group default directives
* Change Stampede2Environment for group parallel jobs. Will currently break with bundling.
* Fix bug in correctly handling directives with Groups
* Fix group decorator to work with directives
* Make function for resolving FlowGroup directives
* Update flow/project.py Co-Authored-By: Bradley Dice <bdice@bradleydice.com>
* Fix submission resource aggregation bug
* Create FlowGroupEntry class. Has two decorators, ``__call__`` and ``with_directives``.
* Remove directives from FlowGroups
* Add return type documentation.
* Make default_directives required for create_run_job_operations
* Update creation of submission JobOperations. Remove directives from _submit_cmd. Reduce noise in submission directives when 'ngpu', 'nranks', or 'omp_num_threads' is set to 0
* Add tests for directives handling
* fix FlowGroupEntry bug and add documentation
* Fix bug in FlowProject._resolve_directives
* Fix bug on submission command generation
* Apply suggestions from code review Co-Authored-By: Carl Simon Adorf <csadorf@umich.edu> Co-Authored-By: vyasr <vramasub@umich.edu>
* Remove ``ScriptError`` (wasn't used).
* Add get_id back to ``JobOperation`` and id property
* Update *Operation docstrings and ``FlowGroup`` docstrings. Also edit other comments
* Change _flow_group to _flow_groups
* Remove redundant calls to .keys() for dict
* the builtin __contains__ or __iter__ is faster.
* Modify `~FlowGroupEntry~.with_directives` function
* Change default parameter values ``FlowGroup.__init__``
* Change entrypoint priority
* Move to string logic into IgnoreConditions
* Style changes
* Remove redundant deprecated call
* Change group test project class names
* Fix bug in submission cmd string generation
* Apply suggestions from code review Co-Authored-By: Carl Simon Adorf <csadorf@umich.edu>
* Return function after FlowGroupEntry decorator. Before, we did not return the function back at the end
* Add FlowGroup to docs
* Remove get calls in get_status functions
* Edit docstrings and change argument name. Use ignore_conditions_on_execution instead of ignore_conditions_on_submit
* Fix bug with get_status
* Used define_test_project for groups too
* Update tests for using groups. I had to add operations to the project classes, so some tests needed to be updated
* Fix entrypoint for group tests
* Don't evaluate entrypoint in run_cmd until necessary
* Change tests to use mock_project more fully
* Fix next_operations tests for new project class
* Remove op5 from test project definition
* Fix error from inconsistent naming
* Fix errors with testing
* Fix error on CI. Entrypoints require we first cd into the test directory
* Revert "Fix error on CI." This reverts commit aee0fe0.
* Use __file__ to determine entrypoint for tests
* Remove group specific test project
* Apply suggestions from code review Co-Authored-By: Bradley Dice <bdice@bradleydice.com> Co-Authored-By: Carl Simon Adorf <csadorf@umich.edu>
* Update FlowGroup and FlowProject API docs
* Rename methods for creating JobOperation from FlowGroup
* Revert previous changes __init__.py and errors.py
* make FlowGroupEntry.with_directives take a single dict
* Change cmd error checking to init : JobOperation
* Update docstrings, help strings, and comments. Also one style change
* Only add --ignore-conditions when necessary. Add logic to FlowGroup._submit_cmd
* Fix error where groups were not run in parallel
* Fix test to work with Python 3.5. Statuses with groups are sorted by operation name (this is due to operation status having to be grabbed through groups). We sort the output. This just makes the test do the same
* Change test_main_status to not be order sensitive for operations
* Update flow/project.py Co-Authored-By: Bradley Dice <bdice@bradleydice.com>
* Fix FlowGroup.options and __repr__. Add test for __repr__
* Add comment explaining use of lazy cmd in JobOperation
* Make TestGroupProject.test_flowgroup_repr not run in Python 3.5
* Make test_flowgroup_repr not order dependent
* Update flow/project.py Co-Authored-By: vyasr <vramasub@umich.edu>
* Update FlowGroup API docs
* Update template tests
* final changes
* change changelog

Co-authored-by: Carl Simon Adorf <carl.simon.adorf@gmail.com>
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Co-authored-by: vyasr <vyas.ramasubramani@gmail.com>
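
The commit message describes a decorator that labels functions with `_flow_groups`, a later registration pass, singleton groups named after each operation, a name-conflict check, and regex selection via fullmatch. The following is a minimal sketch of that pattern only, not the code from this commit: `GroupEntry`, `MiniProject`, `register_groups`, and `select_groups` are hypothetical names, and the real `FlowProject` and `FlowGroupEntry` carry directives, options, and entrypoints that are left out here.

```python
import re
from collections import OrderedDict


class GroupEntry:
    """Hypothetical group decorator; a simplified stand-in, not FlowGroupEntry."""

    def __init__(self, name):
        self.name = name

    def __call__(self, func):
        # Label the function so a later registration pass can find it.
        labels = getattr(func, "_flow_groups", [])
        if self.name in labels:
            raise ValueError(f"{func.__name__!r} is already in group {self.name!r}")
        func._flow_groups = labels + [self.name]
        return func  # hand the function back so the decorated name stays usable


class MiniProject:
    """Hypothetical project class that only tracks operations and group names."""

    _OPERATIONS = []
    _GROUP_NAMES = set()

    @classmethod
    def operation(cls, func):
        # Operation names share a namespace with groups (singleton groups).
        if func.__name__ in cls._GROUP_NAMES:
            raise ValueError(f"name {func.__name__!r} is already taken")
        cls._GROUP_NAMES.add(func.__name__)
        cls._OPERATIONS.append(func)
        return func

    @classmethod
    def make_group(cls, name):
        # Check for conflicts when the group is created, not at registration.
        if name in cls._GROUP_NAMES:
            raise ValueError(f"name {name!r} is already taken")
        cls._GROUP_NAMES.add(name)
        return GroupEntry(name)

    @classmethod
    def register_groups(cls):
        groups = OrderedDict()
        # Every operation gets a singleton group with the operation's name.
        for func in cls._OPERATIONS:
            groups[func.__name__] = [func.__name__]
        # Then collect the labels attached by the group decorators.
        for func in cls._OPERATIONS:
            for name in getattr(func, "_flow_groups", []):
                groups.setdefault(name, []).append(func.__name__)
        return groups


def select_groups(groups, patterns):
    """Select groups whose full name matches any of the given regex patterns."""
    selected = OrderedDict()  # keeps order and removes duplicates
    for pattern in patterns:
        for name, ops in groups.items():
            if re.fullmatch(pattern, name):
                selected[name] = ops
    return selected


analysis = MiniProject.make_group("analysis")


@analysis
@MiniProject.operation
def compute(job):
    pass


@analysis
@MiniProject.operation
def plot(job):
    pass


groups = MiniProject.register_groups()
```

Using `re.fullmatch` means a pattern such as `comp` selects nothing, while `comp.*` selects `compute`; a partial match on a prefix cannot pull in unintended operations.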