From 8ff7b1c460d3279ed9ba072e904a7f9a24b0a841 Mon Sep 17 00:00:00 2001
From: Artem Polyakov
Date: Fri, 14 Apr 2017 12:25:20 +0700
Subject: [PATCH] orte/pmix: Do not set orted exit status to one from proc
 abort

The fact that an application proc called Abort (read: failed) doesn't
mean that the ORTE subsystem has failed. On the contrary, ORTE is doing
its work to gracefully exit the whole application.

An orted exiting with non-zero status creates a problem for at least
plm/slurm environments, where orteds are launched via `srun` with the
"--kill-on-bad-exit" flag. If one of the orteds exits with non-zero
status, slurm will immediately kill all other orteds. As a result we
see a lot of leftover files in the `/tmp` directory.

(ported from 4af7a0827fa0bfc2d7d22016edd2ed3347fdd5ab)

Signed-off-by: Artem Polyakov
---
 orte/orted/pmix/pmix_server_gen.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/orte/orted/pmix/pmix_server_gen.c b/orte/orted/pmix/pmix_server_gen.c
index c9b6b1ed980..57b94fad6db 100644
--- a/orte/orted/pmix/pmix_server_gen.c
+++ b/orte/orted/pmix/pmix_server_gen.c
@@ -14,7 +14,7 @@
  * Copyright (c) 2009      Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2011      Oak Ridge National Labs.  All rights reserved.
  * Copyright (c) 2013-2016 Intel, Inc. All rights reserved.
- * Copyright (c) 2014      Mellanox Technologies, Inc.
+ * Copyright (c) 2014-2017 Mellanox Technologies, Inc.
  *                         All rights reserved.
  * Copyright (c) 2014      Research Organization for Information Science
  *                         and Technology (RIST). All rights reserved.
@@ -102,7 +102,6 @@ int pmix_server_abort_fn(opal_process_name_t *proc, void *server_object,
         p->exit_code = status;
     }
 
-    ORTE_UPDATE_EXIT_STATUS(status);
     ORTE_ACTIVATE_PROC_STATE(proc, ORTE_PROC_STATE_CALLED_ABORT);
 
     /* release the caller */