Closed
Description
If a jobtap plugin starts a prolog but never finishes (removes) it, the job becomes a zombie: it cannot be killed (trying to kill it gives the error "job is not running") or canceled, and it is stuck perpetually in the "RUN" state.
I'm guessing a similar issue exists for epilogs, but I didn't test that. Here is a prolog reproducer:
/************************************************************\
* Copyright 2021 Lawrence Livermore National Security, LLC
* (c.f. AUTHORS, NOTICE.LLNS, COPYING)
*
* This file is part of the Flux resource manager framework.
* For details, see https://github.com/flux-framework.
*
* SPDX-License-Identifier: LGPL-3.0
\************************************************************/
#if HAVE_CONFIG_H
#include "config.h"
#endif
#include <flux/core.h>
#include <flux/jobtap.h>
#define CREATE_DEP_NAME "dws-create"
#define SETUP_PROLOG_NAME "dws-setup"
static int run_cb (flux_plugin_t *p,
                   const char *topic,
                   flux_plugin_arg_t *args,
                   void *arg)
{
    flux_t *h = flux_jobtap_get_flux (p);

    /* Start a prolog action when the job enters RUN. Nothing in this
     * reproducer ever finishes it, which is what triggers the problem.
     */
    if (flux_jobtap_prolog_start (p, SETUP_PROLOG_NAME) < 0) {
        flux_log_error (h, "Failed to start jobtap prolog for dws");
        return -1;
    }
    return 0;
}

static const struct flux_plugin_handler tab[] = {
    { "job.state.run", run_cb, NULL },
    { 0 },
};

int flux_plugin_init (flux_plugin_t *p)
{
    if (flux_plugin_register (p, "dws-test", tab) < 0)
        return -1;
    return 0;
}

I'm not sure how much of an issue this is for the prolog/epilog mechanism, since the undesirable behavior should only happen when a developer makes a mistake in their jobtap plugin.
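
To make the failure mode concrete, here is a toy model in plain C of why an unfinished prolog could leave a job parked. It is not flux-core code: `toy_job`, `toy_prolog_start`, `toy_prolog_finish`, and `toy_try_start` are hypothetical names, and the model simply assumes that each started prolog places a hold on the job and that the job only moves on once every hold has been released. In a real plugin the release would be a call to `flux_jobtap_prolog_finish` for the same prolog name, which the reproducer above never makes.

```c
#include <stdbool.h>

/* Toy model only: none of these names come from flux-core. */
struct toy_job {
    int prologs_active;  /* prolog actions started but not yet finished */
    bool started;        /* true once the job has been allowed to start */
};

/* Assumed effect of starting a prolog: place one hold on the job. */
static void toy_prolog_start (struct toy_job *job)
{
    job->prologs_active++;
}

/* Assumed effect of finishing a prolog: release one hold.
 * Returns -1 if there is no outstanding prolog to finish.
 */
static int toy_prolog_finish (struct toy_job *job)
{
    if (job->prologs_active == 0)
        return -1;
    job->prologs_active--;
    return 0;
}

/* The job may start only when no holds remain. Returns the job's
 * started flag after the attempt.
 */
static bool toy_try_start (struct toy_job *job)
{
    if (job->prologs_active == 0)
        job->started = true;
    return job->started;
}
```

In this model, calling `toy_prolog_start` and then `toy_try_start` leaves `started` false no matter how often the start is retried, which mirrors the stuck "RUN" state described above. Only a matching `toy_prolog_finish` lets the job proceed.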