add job update service and new job-update(1) command #5409

Merged 19 commits on Aug 31, 2023

Commits on Aug 25, 2023

  1. testsuite: fix recursive jobspec-update in job-list test plugin

    Problem: In the jobspec-update-job-list jobtap plugin, a test
    jobspec-update event is posted from the job.state.sched callback.
    However, in the future jobspec-update events will cause a job in SCHED
    state to transition back to PRIORITY, and when the job then goes back
    to SCHED state, the jobspec-update event will be emitted again, the
    job will transition back to PRIORITY again, creating an infinite loop.
    
    Ensure the test plugin only emits a jobspec-update once during the
    test to avoid infinite recursion. This will require the plugin to be
    reloaded to continue working after one job, but at this point that
    is not necessary.
    grondo committed Aug 25, 2023
    ab3e189
  2. job-manager: move jobs from SCHED->PRIORITY on jobspec-update

    Problem: When the jobspec for a job is modified by a jobspec-update
    event the job may need to be reprioritized, possibly held back from
    the scheduler, or submitted to a different queue.
    
    As with the urgency event, kick jobs in SCHED state back to PRIORITY
    on a jobspec-update event.
    grondo committed Aug 25, 2023
    8f07faa
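The interaction between the two commits above can be illustrated with a toy model. This is a minimal sketch, not job-manager code: the `ToyJob` class, its `post_event()` method, and the `make_once_only_callback()` helper are hypothetical names invented for illustration. Only the SCHED and PRIORITY state names and the jobspec-update event come from the commit messages.

```python
# Toy model (hypothetical, not flux-core code) of a job that is kicked
# from SCHED back to PRIORITY whenever a jobspec-update event is posted.

class ToyJob:
    def __init__(self, sched_callback):
        self.state = "PRIORITY"
        self.events = []
        self.sched_callback = sched_callback

    def post_event(self, name):
        self.events.append(name)
        if name == "jobspec-update" and self.state == "SCHED":
            # As with the urgency event, SCHED jobs return to PRIORITY.
            self.state = "PRIORITY"
        elif name == "priority" and self.state == "PRIORITY":
            self.state = "SCHED"
            # Entering SCHED invokes the plugin's job.state.sched callback.
            self.sched_callback(self)


def make_once_only_callback():
    """Return a callback that posts jobspec-update only the first time."""
    posted = False

    def callback(job):
        nonlocal posted
        if not posted:
            posted = True
            job.post_event("jobspec-update")

    return callback


job = ToyJob(make_once_only_callback())
job.post_event("priority")  # PRIORITY -> SCHED -> (update) -> PRIORITY
job.post_event("priority")  # PRIORITY -> SCHED, no second update
```

Without the `posted` guard, every entry into SCHED would post another jobspec-update and the job would bounce between the two states indefinitely, which is the recursion the test plugin fix avoids.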

Commits on Aug 31, 2023

  1. job-manager: add jobspec update convenience functions

    Problem: Jobspec updates will need to be manipulated and applied
    in multiple modules within the job manager, but currently the
    functionality to validate and apply jobspec updates is within static
    functions in event.c and jobtap.c.
    
    Locate some jobspec update functions centrally in job.c so they may
    easily be accessed from other job manager modules.
    grondo committed Aug 31, 2023
    5557cd9
  2. job-manager: use job_apply_jobspec_updates() in event.c

    Problem: The code to apply jobspec updates from the jobspec-update
    event in the job manager duplicates the job_apply_jobspec_updates()
    function exported from job-manager/job.c.
    
    Use job_apply_jobspec_updates() to apply jobspec updates instead
    of the duplicated code.
    grondo committed Aug 31, 2023
    ae52e3a
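As a rough illustration of what applying jobspec updates involves, the sketch below applies a set of updates to a nested dictionary. It assumes an update payload maps period-delimited keys to new values, as the dotted keys used elsewhere in this PR suggest. The function `apply_jobspec_updates()` is a hypothetical Python stand-in, not the C helper in job-manager/job.c.

```python
import copy


def apply_jobspec_updates(jobspec, updates):
    """Return a copy of jobspec with each dotted-key update applied.

    Hypothetical illustration only; the real helper is written in C.
    """
    result = copy.deepcopy(jobspec)
    for key, value in updates.items():
        node = result
        *parents, leaf = key.split(".")
        for name in parents:
            # Create intermediate dictionaries as needed (an assumption).
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


jobspec = {"attributes": {"system": {"duration": 60.0}}}
updated = apply_jobspec_updates(jobspec, {"attributes.system.duration": 600.0})
```

Working on a copy leaves the original jobspec untouched, so a caller can discard the result if later checks fail.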
  3. job-manager: add support functions for job updates

    Problem: The job update service will require assistance of the jobtap
    plugin stack to validate requested updates.
    
    Add a couple of jobtap support functions for this purpose:
    
     - jobtap_job_update(): Call job.update.KEY callback to allow a
       single update for KEY.
     - jobtap_validate_updates(): Apply updates to jobspec and call
       the job.validate stack on the modified jobspec.
    grondo committed Aug 31, 2023
    63fa4b4
  4. python: allow Jobspec.setattr() key to start with attributes.

    Problem: The Jobspec setattr() method always prepends 'attributes.'
    to the key argument, but this can be inconvenient when the key
    already contains the 'attributes.' prefix, since that prefix must
    then be removed before calling jobspec.setattr().
    
    Only prepend 'attributes.' to the key argument of setattr() if it
    doesn't already contain that prefix.
    grondo committed Aug 31, 2023
    cc045e5
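The prefix handling described in this commit can be sketched as follows. `ToyJobspec` is a hypothetical stand-in written for this example and is not the flux Python `Jobspec` class; only the rule "prepend 'attributes.' unless already present" is taken from the commit message.

```python
class ToyJobspec:
    """Hypothetical stand-in for the Jobspec class; not the flux API."""

    def __init__(self):
        self.jobspec = {"attributes": {"system": {}}}

    def setattr(self, key, val):
        # Only prepend the prefix when the caller did not supply it.
        if not key.startswith("attributes."):
            key = "attributes." + key
        node = self.jobspec
        *parents, leaf = key.split(".")
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = val


js = ToyJobspec()
js.setattr("system.duration", 60.0)             # prefix added
js.setattr("attributes.system.queue", "batch")  # prefix already present
```

Both calls end up under the same `attributes.system` dictionary, so callers no longer need to strip the prefix first.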
  5. python: add Jobspec.getattr() method

    Problem: There exists a Jobspec setattr() method which sets an
    attribute based on "dotted key" notation, but no equivalent getattr()
    method to get dotted keys.
    
    Add a Jobspec.getattr() method.
    grondo committed Aug 31, 2023
    d2b4d7b
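A dotted-key lookup of the kind this commit describes might look like the sketch below. The function name `get_dotted()` and the choice to raise `KeyError` for a missing key are assumptions made for illustration; the behavior of the real `Jobspec.getattr()` may differ.

```python
def get_dotted(jobspec, key):
    """Look up a dotted key such as 'attributes.system.duration'.

    Hypothetical helper; raises KeyError if any path component is missing.
    """
    value = jobspec
    for name in key.split("."):
        if not isinstance(value, dict) or name not in value:
            raise KeyError(key)
        value = value[name]
    return value


jobspec = {"attributes": {"system": {"duration": 60.0}}}
duration = get_dotted(jobspec, "attributes.system.duration")
```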
  6. job-manager: validate all job states in limit-* plugins

    Problem: The limit-duration and limit-job-size plugins do not validate
    jobs unless they are in the NEW state, ostensibly because job.validate
    may be called after a plugin reload or job manager restart. However, it
    is no longer the case that job.validate is called in these situations,
    and it may be necessary to call job.validate for jobs beyond the NEW
    state when processing job updates.
    
    Drop the checks for FLUX_JOB_STATE_NEW in the limit-duration and
    limit-job-size plugins.
    grondo committed Aug 31, 2023
    996d306
  7. job-manager: add job update service

    Problem: There is no service in the job manager for requesting the
    update of jobspec or other job parameters.
    
    Add a new update service to the job manager. Job updates can now
    be requested via a job-manager.update RPC, the payload of which
    includes the target jobid and an "updates" object which follows the
    jobspec-update specification in RFC 21.
    
    Updates for a key are only allowed if a plugin callback exists for the
    jobtap topic string "job.update.KEY", and the callback returns success.
    If multiple keys are updated in the same request, they all must be
    allowed or none will be applied.
    
    Once all updates have been allowed, the proposed modified jobspec
    is sent through the job.validate plugin call stack. If the new jobspec
    fails to be successfully validated, then the updates are rejected and
    an error is returned to the requestor.
    
    Individual plugins may request that job.validate be skipped for a
    given key by setting a 'validated' flag in the plugin OUT arguments.
    However, the job.validate call will still be made if multiple keys
    are being updated and not all of them set a validated flag.
    
    Once the update is allowed and validated, then a jobspec-update event
    is posted for the job and an empty success response is issued.
    grondo committed Aug 31, 2023
    bf56891
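The decision flow described in this commit can be sketched in a few lines. This is a simplified model under stated assumptions, not the job manager implementation: `process_update()`, the `update_callbacks` mapping, and the `validate` function are hypothetical, and the jobspec is kept as a flat dictionary of dotted keys to keep the sketch short.

```python
def process_update(jobspec, updates, update_callbacks, validate):
    """Return the updates to post, or raise ValueError to reject them.

    update_callbacks maps a key to a function (key, value) -> (ok, validated),
    standing in for a job.update.KEY plugin callback (hypothetical shape).
    validate takes a proposed jobspec and returns True or False, standing
    in for the job.validate plugin stack.
    """
    all_validated = True
    for key, value in updates.items():
        callback = update_callbacks.get(key)
        if callback is None:
            # No job.update.KEY callback exists: the whole request fails.
            raise ValueError(f"update of {key} not supported")
        ok, validated = callback(key, value)
        if not ok:
            raise ValueError(f"update of {key} rejected")
        all_validated = all_validated and validated

    # job.validate is skipped only if every key set the validated flag.
    if not all_validated:
        proposed = dict(jobspec)
        proposed.update(updates)
        if not validate(proposed):
            raise ValueError("updated jobspec failed validation")
    return updates


callbacks = {"attributes.system.duration": lambda key, value: (value >= 0, False)}
limit = lambda js: js["attributes.system.duration"] <= 3600
posted = process_update(
    {"attributes.system.duration": 60},
    {"attributes.system.duration": 600},
    callbacks,
    limit,
)
```

Because nothing is applied until every key has passed, a request that mixes an allowed key with a disallowed one leaves the job unchanged.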
  8. job-manager: add update-duration builtin plugin

    Problem: Once a job is submitted the duration cannot be updated.
    
    Add an update-duration plugin that adds a
    
     job.update.attributes.system.duration
    
    callback so that jobspec duration updates are supported for pending
    jobs. By default, users can update the duration of their own jobs
    up to the currently configured limit, and instance owners can update
    duration to any value.
    
    The ability of the instance owner to bypass limits can be disabled
    by reloading the plugin with the config parameter owner-allow-any=0.
    grondo committed Aug 31, 2023
    bd923ea
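The policy stated in this commit can be summarized as a small decision function. This is a sketch only: `duration_update_allowed()` and its parameters are hypothetical, and checking the limit directly in this function is an assumption made to keep the example self-contained (the real plugin may leave limit enforcement to the job.validate stack).

```python
def duration_update_allowed(new_duration, requester_is_owner, limit,
                            owner_allow_any=True):
    """Return (allowed, validated) for a requested duration in seconds.

    Hypothetical model: limit=None means no limit is configured.
    """
    if requester_is_owner and owner_allow_any:
        # The instance owner bypasses the limit, so further validation
        # of the duration is flagged as unnecessary.
        return True, True
    if limit is not None and new_duration > limit:
        return False, False
    return True, False
```

With `owner_allow_any=False` the instance owner is held to the same limit as any other user, matching the owner-allow-any=0 configuration mentioned above.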
  9. cmd: add flux-update(1)

    Problem: There is no command line interface to request job updates.
    
    Add the flux-update(1) command, which takes a jobid and one or more
    KEY=VALUE pairs on the command line, and sends an update request to
    the job manager.
    
    Special handling for specific keys is supported for a more convenient
    user interface. Currently, any key which doesn't start with
    `attributes.`, `resources.` or `tasks.` is assumed to be prefixed
    with `attributes.system.`, so the key in `duration=10m` is translated
    to `attributes.system.duration`, for example.
    
    Key values may also get special handling through the existence of
    an `update_{keystr}` method in the JobspecUpdates class, where
    `keystr` is the key with dots replaced by underscores. For now, an
    `update_attributes_system_duration()` method is provided which accepts
    'duration' values given either as FSD (absolute) or as +/-FSD
    (relative adjustment). When adjusting duration, the current jobspec
    is fetched with any updates applied to get the most up-to-date
    duration.
    grondo committed Aug 31, 2023
    7d3718b
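The key expansion and relative duration handling described in this commit might be sketched as below. All three functions are hypothetical and written for this example; they are not the flux-update implementation. In particular, `parse_fsd()` handles only a simplified subset of Flux Standard Duration (a number with an optional s, m, h, or d suffix).

```python
import re

_FSD_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_fsd(text):
    """Convert a simple duration such as '90', '10m' or '1.5h' to seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smhd]?)", text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    number, unit = match.groups()
    return float(number) * _FSD_UNITS[unit or "s"]


def expand_key(key):
    """Prefix bare keys with 'attributes.system.'."""
    if key.startswith(("attributes.", "resources.", "tasks.")):
        return key
    return "attributes.system." + key


def update_duration(current, value):
    """Apply an absolute ('10m') or relative ('+10m', '-30s') duration."""
    if value.startswith(("+", "-")):
        delta = parse_fsd(value[1:])
        return current + delta if value[0] == "+" else current - delta
    return parse_fsd(value)
```

A relative value needs the job's current duration as its starting point, which is why the command fetches the jobspec with prior updates applied before computing the new value.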
  10. completions: add bash completions for flux-update(1)

    Problem: There are no tab completions for the flux-update(1) command.
    
    Add a completion handler for flux-update(1) to
    etc/completions/flux.pre.
    grondo committed Aug 31, 2023
    d9544f9
  11. job-manager: call job.update plugin stack after jobspec-update

    Problem: There is no way for a jobtap plugin to get notified of a
    jobspec update after the jobspec updates have been applied. Jobs only
    transition back to PRIORITY state from SCHED, so the job.state.priority
    callback will not always be sufficient, and subscribing directly to
    the jobspec-update event would require the plugin to manually apply
    updates, and may not capture other ways a jobspec or job might be
    updated in the future.
    
    Introduce a 'job.update' callback topic which is called after
    any jobspec update has been applied. If the job is transitioning
    back to the PRIORITY state, this callback will be called before
    the job.state.priority topic so that plugins may adjust internal
    state that would normally be established prior to the first call
    to job.state.priority.
    grondo committed Aug 31, 2023
    37d48ff
  12. testsuite: add job update tests

    Problem: There are no tests of the job update support in flux.
    
    Add a new test, t2290-job-update.t, and helper jobtap plugin,
    job-manager/plugins/update-test.c, and add some basic testing of
    the job update support using `flux update`.
    grondo committed Aug 31, 2023
    4fccb67
  13. doc: add flux-update(1)

    Problem: The flux-update(1) command is not documented.
    
    Add a short manual page for flux-update(1).
    
    Update spelling dictionary as necessary.
    grondo committed Aug 31, 2023
    33892fd
  14. job-manager: support immutable job flag

    Problem: It would be useful to disable updates for individual jobs,
    but there is currently no way to do this.
    
    Add an 'immutable' flag to the job manager job structure. Support
    adding this flag via the `set-flags` event.
    grondo committed Aug 31, 2023
    61f1e77
  15. job-manager: deny updates for jobs that bypass update validation

    Problem: When the instance owner updates a guest job in order to bypass
    validation (e.g. to update duration of a job beyond current limits),
    a future job update of a different attribute may fail because the job
    will be revalidated. This causes a confusing error that is unrelated
    to the user's request.
    
    When a job update bypasses validation, the update request was made by
    the instance owner, and the job user is not the instance owner, mark
    the job as immutable to prevent future updates by the job owner. This
    not only results in a less confusing error
    
     "job is immutable due to previous instance owner update"
    
    but also avoids the need to track which attribute updates have
    bypassed validation in past updates, which could be complex and could
    introduce unintended consequences.
    grondo committed Aug 31, 2023
    518a726
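The three conditions in this commit, and the resulting check on later updates, can be sketched as follows. Both functions are hypothetical and written for illustration. The assumption that the instance owner may still update an immutable job is inferred from the phrase "prevent future updates by the job owner" and is not confirmed by the commit message.

```python
IMMUTABLE_ERROR = "job is immutable due to previous instance owner update"


def should_mark_immutable(bypassed_validation, requester, instance_owner,
                          job_user):
    """Return True when all three conditions from the commit hold."""
    return (
        bypassed_validation
        and requester == instance_owner
        and job_user != instance_owner
    )


def check_update_allowed(job_flags, requester, instance_owner):
    """Raise PermissionError if a non-owner updates an immutable job.

    Assumption: the instance owner is exempt from the immutable flag.
    """
    if "immutable" in job_flags and requester != instance_owner:
        raise PermissionError(IMMUTABLE_ERROR)


flags = set()
if should_mark_immutable(True, requester=0, instance_owner=0, job_user=1000):
    flags.add("immutable")
```

Using a single per-job flag means later requests need only one check, rather than a record of which attributes bypassed validation in the past.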
  16. testsuite: test update immutability of jobs

    Problem: There are no tests of jobs which have been updated by the
    instance owner and therefore have the immutable flag set.
    
    Test update of guest jobs in t2290-job-update.t before and after an
    instance owner update. Ensure an immutable job cannot be updated by
    the user.
    grondo committed Aug 31, 2023
    15fa70f
  17. doc: add note about immutable jobs to flux-update(1)

    Problem: The flux-update(1) man page does not mention that jobs
    updated by the instance owner may become immutable.
    
    Add an explanation of how jobs updated by the instance owner can
    bypass validation, and why this makes the jobs immutable.
    grondo committed Aug 31, 2023
    c41d143