job-manager: jobs in prolog/epilog cannot be canceled #3994

@jameshcorbett

If a jobtap plugin starts a prolog but never removes (finishes) it, the job becomes a zombie: it cannot be canceled or killed (trying to kill it gives the error "job is not running"), and it is stuck perpetually in the "RUN" state.

I'm guessing a similar issue exists for epilogs but I didn't test. Here is a prolog reproducer:

/************************************************************\
 * Copyright 2021 Lawrence Livermore National Security, LLC
 * (c.f. AUTHORS, NOTICE.LLNS, COPYING)
 *
 * This file is part of the Flux resource manager framework.
 * For details, see https://github.com/flux-framework.
 *
 * SPDX-License-Identifier: LGPL-3.0
\************************************************************/

#if HAVE_CONFIG_H
#include "config.h"
#endif

#include <flux/core.h>
#include <flux/jobtap.h>

#define CREATE_DEP_NAME "dws-create"
#define SETUP_PROLOG_NAME "dws-setup"


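/* Called when the job enters RUN state. Starts a prolog action and,
 * deliberately for this reproducer, never finishes it.
 */
static int run_cb (flux_plugin_t *p,
                   const char *topic,
                   flux_plugin_arg_t *args,
                   void *arg)
{
    flux_t *h = flux_jobtap_get_flux (p);

    if (flux_jobtap_prolog_start (p, SETUP_PROLOG_NAME) < 0) {
        flux_log_error (h, "Failed to start jobtap prolog for dws");
        return -1;
    }
    /* No matching flux_jobtap_prolog_finish () is ever issued */
    return 0;
}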

static const struct flux_plugin_handler tab[] = {
    { "job.state.run", run_cb, NULL },
    { 0 },
};

int flux_plugin_init (flux_plugin_t *p)
{
    if (flux_plugin_register (p, "dws-test", tab) < 0)
        return -1;
    return 0;
}

I'm not sure how much of an issue this is for the prolog/epilog mechanism, since the undesirable behavior should only happen when a developer makes a mistake in their jobtap plugin.
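
For illustration, here is a toy model of why the job never leaves "RUN" in the reproducer. It is only a sketch under assumptions, not the job-manager's actual code: the `toy_job` struct, the `perilog_active` counter name, and all `toy_*` functions are hypothetical stand-ins. The assumption is that each prolog start adds one outstanding action, each finish (what `flux_jobtap_prolog_finish ()` would signal in a real plugin) removes one, and the job is only handed off for execution once nothing is outstanding. It does not attempt to model the cancel/kill behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a job; NOT the job-manager's real struct. */
struct toy_job {
    int perilog_active;  /* outstanding prolog/epilog actions (hypothetical) */
    bool start_sent;     /* true once the job would be handed off to run */
};

/* Models a prolog start: one more outstanding action blocks the job. */
static void toy_prolog_start (struct toy_job *job)
{
    assert (job != NULL);
    job->perilog_active++;
}

/* Models a prolog finish: returns -1 if no action is outstanding. */
static int toy_prolog_finish (struct toy_job *job)
{
    assert (job != NULL);
    if (job->perilog_active == 0)
        return -1;
    job->perilog_active--;
    return 0;
}

/* Models the RUN-state action: proceed only when nothing is outstanding. */
static bool toy_try_start (struct toy_job *job)
{
    assert (job != NULL);
    if (job->perilog_active > 0)
        return false;
    job->start_sent = true;
    return true;
}
```

In this model the reproducer corresponds to calling `toy_prolog_start ()` once and never calling `toy_prolog_finish ()`, so `toy_try_start ()` keeps returning false and the job has nothing running that could be killed.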
