Multiple commits #1931

Merged
merged 8 commits into from
Feb 26, 2024
65 changes: 65 additions & 0 deletions .github/workflows/close-stale-issues.yaml
@@ -0,0 +1,65 @@
# The idea behind this Action is to prevent the situation where a user
# files a GitHub Issue, someone asks for clarification / more
# information, but the original poster never provides the information.
# The issue then becomes forgotten and abandoned.
#
# Instead of that scenario, PMIx community members can assign a
# label to GitHub Issues indicating that we're waiting for the user to
# reply. If too much time elapses with no reply, mark the Issue as
# stale and emit a warning that we'll close the issue if we continue
# to receive no reply. If we timeout again with no reply after the
# warning, close the Issue and emit a comment explaining why.
#
# If the user *does* reply, the label is removed, and this bot won't
# touch the Issue. Specifically: this bot will never mark stale /
# close an Issue that doesn't have the specified label.
#
# Additionally, we are *only* marking stale / auto-closing GitHub
# Issues -- not Pull Requests.
#
# This is a cron-based Action that runs a few times a day, just so
# that we don't mark stale / close a bunch of issues all at once.
#
# While the actions/stale bot Action used here is capable of removing
# the label when a user replies to the Issue, we actually use a 2nd
# Action (remove-awaiting-user-info-label.yaml) to remove the label.
# We do this because that 2nd Action runs whenever a comment is
# created -- not via cron. Hence, the 2nd Action will remove the
# label effectively immediately when the user replies (vs. up to
# several hours later).

name: Close stale issues
on:
  schedule:
    # Run it a few times a day so as not to necessarily mark stale /
    # close a bunch of issues at once.
    - cron: '0 1,5,9,13,17,21 * * *'

jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      # https://github.com/marketplace/actions/close-stale-issues
      - uses: actions/stale@v9
        with:
          # If there are no replies for 14 days, mark the issue as
          # "stale" (and emit a warning).
          days-before-stale: 14
          # If there are no replies for 14 days after becoming stale,
          # then close the issue (and emit a message explaining why).
          days-before-close: 14

          # Never close PRs
          days-before-pr-close: -1

          # We only close issues with this label
          only-labels: "Awaiting response"
          close-issue-label: Closed due to no reply

          # Messages that we put in comments on issues
          stale-issue-message: |
            It looks like this issue is expecting a response, but hasn't gotten one yet. If there are no responses in the next 2 weeks, we'll assume that the issue has been abandoned and will close it.
          close-issue-message: |
            Per the above comment, it has been a month with no reply on this issue. It looks like this issue has been abandoned.

            I'm going to close this issue. If I'm wrong and this issue is *not* abandoned, please feel free to re-open it. Thank you!
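As a reading aid, the policy this workflow configures can be sketched as a small decision function in C. This is only a model of the settings in close-stale-issues.yaml (14 idle days to become stale, 14 more to be closed, gated on the "Awaiting response" label); it is not how actions/stale is implemented. The names `issue_t`, `action_t` and `next_action` are hypothetical. The sketch follows the intent stated in the file's header comment (Issues only); the configuration itself only sets `days-before-pr-close: -1`, so whether a labeled PR can still be marked stale depends on actions/stale defaults and is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one issue or pull request as a cron run sees it. */
typedef struct {
    bool is_pull_request;    /* header comment: PRs are left alone */
    bool has_awaiting_label; /* carries the "Awaiting response" label */
    bool is_stale;           /* already marked stale by an earlier run */
    int  idle_days;          /* days since the last activity on the item */
} issue_t;

typedef enum { ACTION_NONE, ACTION_MARK_STALE, ACTION_CLOSE } action_t;

/* Values copied from days-before-stale and days-before-close. */
enum { DAYS_BEFORE_STALE = 14, DAYS_BEFORE_CLOSE = 14 };

/* Decide what a single run would do with one item. */
action_t next_action(const issue_t *it)
{
    assert(NULL != it);

    /* Without the label (or for a PR) the bot never touches the item. */
    if (it->is_pull_request || !it->has_awaiting_label) {
        return ACTION_NONE;
    }
    /* Second timeout: the stale warning was posted and nobody replied. */
    if (it->is_stale) {
        return (it->idle_days >= DAYS_BEFORE_CLOSE) ? ACTION_CLOSE : ACTION_NONE;
    }
    /* First timeout: warn and mark the issue stale. */
    return (it->idle_days >= DAYS_BEFORE_STALE) ? ACTION_MARK_STALE : ACTION_NONE;
}
```

In this model a labeled issue is closed no earlier than 28 days after the last reply, which is the "month" that the close message refers to.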
30 changes: 30 additions & 0 deletions .github/workflows/remove-awaiting-user-info-label.yaml
@@ -0,0 +1,30 @@
# This Action is run in conjunction with close-stale-issues.yaml. See
# that file for a more complete description of how they work together.

name: 'Remove "Awaiting response" label when there has been a reply'
on:
  issue_comment:
    types:
      - created

jobs:
  build:
    runs-on: ubuntu-latest
    # From
    # https://github.com/marketplace/actions/close-issues-after-no-reply:
    # only remove the label if someone replies to an issue who is not
    # an owner or collaborator on the repo.
    if: |
      github.event.comment.author_association != 'OWNER' &&
      github.event.comment.author_association != 'COLLABORATOR'
    steps:
      - name: 'Remove "Awaiting response" label'
        uses: octokit/request-action@v2.x
        continue-on-error: true
        with:
          route: DELETE /repos/:repository/issues/:issue/labels/:label
          repository: ${{ github.repository }}
          issue: ${{ github.event.issue.number }}
          label: "Awaiting response"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
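The job-level `if:` in remove-awaiting-user-info-label.yaml is a plain string filter on the commenter's `author_association`. A minimal sketch of the same predicate in C may make its reach easier to see; `should_remove_label` is a hypothetical name, and `strcmp` is case-sensitive whereas GitHub expression comparisons ignore case, so this is an approximation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Mirrors the workflow condition: the label is cleared unless the
 * commenter is reported as OWNER or COLLABORATOR.
 */
bool should_remove_label(const char *author_association)
{
    assert(NULL != author_association);
    return 0 != strcmp(author_association, "OWNER")
        && 0 != strcmp(author_association, "COLLABORATOR");
}
```

Under this condition any other association value that GitHub reports, for example `MEMBER`, `CONTRIBUTOR` or `NONE`, also clears the label, so a reply from an organization member counts as a user reply.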
2 changes: 1 addition & 1 deletion config/oac
Submodule oac updated 1 files
+51 −1 oac_setup_sphinx.m4
5 changes: 3 additions & 2 deletions configure.ac
@@ -27,7 +27,7 @@
 # All Rights reserved.
 # Copyright (c) 2021-2023 Nanook Consulting. All rights reserved.
 # Copyright (c) 2021 FUJITSU LIMITED. All rights reserved.
-# Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved.
+# Copyright (c) 2023-2024 Jeffrey M. Squyres. All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -720,7 +720,8 @@ AC_INCLUDES_DEFAULT
 # Setup Sphinx processing
 #

-OAC_SETUP_SPHINX([$srcdir/docs/_build/html/index.html], [])
+OAC_SETUP_SPHINX([$srcdir/docs/_build/html/index.html], [],
+                 [$srcdir/docs/requirements.txt])
 AC_CHECK_PROGS(PYTHON, [python3 python python2])

 AS_IF([test -n "$OAC_MAKEDIST_DISABLE"],
5 changes: 1 addition & 4 deletions src/docs/show-help-files/help-prte.rst
@@ -1,6 +1,6 @@
 .. -*- rst -*-

-   Copyright (c) 2021-2023 Nanook Consulting. All rights reserved.
+   Copyright (c) 2021-2024 Nanook Consulting. All rights reserved.
    Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved.

    $COPYRIGHT$
@@ -79,9 +79,6 @@ option to the help request as ``--help <option>``.
    * - ``--leave-session-attached``
      - Do not discard stdout/stderr of remote PRRTE daemons

-   * - ``--test-suicide <arg0>``
-     - Direct that the specified daemon suicide after delay
-
    * - ``--display <arg0>``
      - Comma-delimited list of options for displaying information

5 changes: 1 addition & 4 deletions src/docs/show-help-files/help-prterun.rst
@@ -1,6 +1,6 @@
 .. -*- rst -*-

-   Copyright (c) 2021-2023 Nanook Consulting. All rights reserved.
+   Copyright (c) 2021-2024 Nanook Consulting. All rights reserved.
    Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved.

    $COPYRIGHT$
@@ -107,9 +107,6 @@ option to the help request as ``--help <option>``.
      - Direct the specified processes to stop at an
        application-controlled location

-   * - ``--test-suicide <arg0>``
-     - Direct that the specified daemon suicide after delay
-
    * - ``--do-not-launch``
      - Perform all necessary operations to prepare to launch the
        application, but do not actually launch it (usually used to
47 changes: 1 addition & 46 deletions src/mca/errmgr/base/errmgr_base_fns.c
@@ -17,7 +17,7 @@
  * Copyright (c) 2014 Research Organization for Information Science
  * and Technology (RIST). All rights reserved.
  * Copyright (c) 2020 IBM Corporation. All rights reserved.
- * Copyright (c) 2021-2022 Nanook Consulting. All rights reserved.
+ * Copyright (c) 2021-2024 Nanook Consulting. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -92,48 +92,3 @@ void prte_errmgr_base_log(int error_code, char *filename, int line)
     pmix_output(0, "%s PRTE_ERROR_LOG: %s in file %s at line %d",
                 PRTE_NAME_PRINT(PRTE_PROC_MY_NAME), errstring, filename, line);
 }
-
-void prte_errmgr_base_abort(int error_code, char *fmt, ...)
-{
-    va_list arglist;
-
-    /* If there was a message, output it */
-    va_start(arglist, fmt);
-    if (NULL != fmt) {
-        char *buffer = NULL;
-        pmix_vasprintf(&buffer, fmt, arglist);
-        pmix_output(0, "%s", buffer);
-        free(buffer);
-    }
-    va_end(arglist);
-
-    /* if I am a daemon or the HNP... */
-    if (PRTE_PROC_IS_MASTER || PRTE_PROC_IS_DAEMON) {
-        /* whack my local procs */
-        if (NULL != prte_odls.kill_local_procs) {
-            prte_odls.kill_local_procs(NULL);
-        }
-        /* whack any session directories */
-        prte_session_dir_cleanup(PRTE_JOBID_WILDCARD);
-    }
-
-    /* if a critical connection failed, or a sensor limit was exceeded, exit without dropping a core
-     */
-    if (PRTE_ERR_CONNECTION_FAILED == error_code || PRTE_ERR_SENSOR_LIMIT_EXCEEDED == error_code) {
-        prte_ess.abort(error_code, false);
-    } else {
-        prte_ess.abort(error_code, true);
-    }
-
-    /*
-     * We must exit in prte_ess.abort; all implementations of prte_ess.abort
-     * contain __prte_attribute_noreturn__
-     */
-    /* No way to reach here */
-}
-
-int prte_errmgr_base_abort_peers(pmix_proc_t *procs, int32_t num_procs, int error_code)
-{
-    PRTE_HIDE_UNUSED_PARAMS(procs, num_procs, error_code);
-    return PRTE_ERR_NOT_IMPLEMENTED;
-}
30 changes: 9 additions & 21 deletions src/mca/errmgr/base/errmgr_base_frame.c
@@ -16,7 +16,7 @@
  * Copyright (c) 2014-2019 Research Organization for Information Science
  * and Technology (RIST). All rights reserved.
  * Copyright (c) 2020 Cisco Systems, Inc. All rights reserved
- * Copyright (c) 2021-2022 Nanook Consulting. All rights reserved.
+ * Copyright (c) 2021-2024 Nanook Consulting. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -48,27 +48,21 @@

 #include "src/mca/errmgr/base/static-components.h"

-/*
- * Globals
- */
-prte_errmgr_base_t prte_errmgr_base = {
-    .error_cbacks = PMIX_LIST_STATIC_INIT
-};
-
 /* Public module provides a wrapper around previous functions */
-prte_errmgr_base_module_t prte_errmgr_default_fns = {.init = NULL, /* init */
-                                                     .finalize = NULL, /* finalize */
-                                                     .logfn = prte_errmgr_base_log,
-                                                     .abort = prte_errmgr_base_abort,
-                                                     .abort_peers = prte_errmgr_base_abort_peers,
-                                                     .enable_detector = NULL};
+prte_errmgr_base_module_t prte_errmgr_default_fns = {
+    .init = NULL, /* init */
+    .finalize = NULL, /* finalize */
+    .logfn = prte_errmgr_base_log
+};

 /* NOTE: ABSOLUTELY MUST initialize this
  * struct to include the log function as it
  * gets called even if the errmgr hasn't been
  * opened yet due to error
  */
-prte_errmgr_base_module_t prte_errmgr = {.logfn = prte_errmgr_base_log};
+prte_errmgr_base_module_t prte_errmgr = {
+    .logfn = prte_errmgr_base_log
+};

 static int prte_errmgr_base_close(void)
 {
@@ -80,9 +74,6 @@ static int prte_errmgr_base_close(void)
     /* always leave a default set of fn pointers */
     prte_errmgr = prte_errmgr_default_fns;

-    /* destruct the callback list */
-    PMIX_LIST_DESTRUCT(&prte_errmgr_base.error_cbacks);
-
     return pmix_mca_base_framework_components_close(&prte_errmgr_base_framework, NULL);
 }

@@ -95,9 +86,6 @@ static int prte_errmgr_base_open(pmix_mca_base_open_flag_t flags)
     /* load the default fns */
     prte_errmgr = prte_errmgr_default_fns;

-    /* initialize the error callback list */
-    PMIX_CONSTRUCT(&prte_errmgr_base.error_cbacks, pmix_list_t);
-
     /* Open up all available components */
     return pmix_mca_base_framework_components_open(&prte_errmgr_base_framework, flags);
 }
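The `prte_errmgr` initializer in errmgr_base_frame.c names only `.logfn`, which relies on a C99 rule: members omitted from a designated initializer are zero-initialized, so the remaining function pointers are NULL. A minimal, self-contained sketch of that pattern follows. The types and names (`module_t`, `base_log`, `active_module`, `module_init`) are simplified stand-ins chosen for illustration, not the PRRTE definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the three entry points the module keeps. */
typedef int  (*init_fn_t)(void);
typedef int  (*finalize_fn_t)(void);
typedef void (*log_fn_t)(int error_code, const char *filename, int line);

typedef struct {
    init_fn_t     init;
    finalize_fn_t finalize;
    log_fn_t      logfn;
} module_t;

/* Records the last code passed to the logger so the effect is observable. */
static int last_logged_code = 0;

static void base_log(int error_code, const char *filename, int line)
{
    (void) filename;
    (void) line;
    last_logged_code = error_code;
}

/* Only logfn is named; the omitted init and finalize members are NULL. */
module_t active_module = {
    .logfn = base_log
};

/* Optional entry points must be checked before they are called. */
int module_init(const module_t *m)
{
    assert(NULL != m);
    return (NULL != m->init) ? m->init() : 0;
}
```

With this layout, logging works before any component has been selected, while a caller that wants `init` or `finalize` has to test the pointer first.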
13 changes: 1 addition & 12 deletions src/mca/errmgr/base/errmgr_private.h
@@ -14,7 +14,7 @@
  * All rights reserved.
  * Copyright (c) 2017-2020 Intel, Inc. All rights reserved.
  * Copyright (c) 2020 Cisco Systems, Inc. All rights reserved
- * Copyright (c) 2021-2022 Nanook Consulting. All rights reserved.
+ * Copyright (c) 2021-2024 Nanook Consulting. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -48,13 +48,6 @@
  */
 BEGIN_C_DECLS

-/* define a struct to hold framework-global values */
-typedef struct {
-    pmix_list_t error_cbacks;
-} prte_errmgr_base_t;
-
-PRTE_EXPORT extern prte_errmgr_base_t prte_errmgr_base;
-
 /* declare the base default module */
 PRTE_EXPORT extern prte_errmgr_base_module_t prte_errmgr_default_fns;

@@ -63,9 +56,5 @@ PRTE_EXPORT extern prte_errmgr_base_module_t prte_errmgr_default_fns;
  */
 PRTE_EXPORT void prte_errmgr_base_log(int error_code, char *filename, int line);

-PRTE_EXPORT void prte_errmgr_base_abort(int error_code, char *fmt, ...)
-    __prte_attribute_format__(__printf__, 2, 3);
-PRTE_EXPORT int prte_errmgr_base_abort_peers(pmix_proc_t *procs, int32_t num_procs, int error_code);
-
 END_C_DECLS
 #endif
8 changes: 3 additions & 5 deletions src/mca/errmgr/dvm/errmgr_dvm.c
@@ -11,7 +11,7 @@
  * All rights reserved.
  * Copyright (c) 2014-2020 Intel, Inc. All rights reserved.
  * Copyright (c) 2017 IBM Corporation. All rights reserved.
- * Copyright (c) 2021-2023 Nanook Consulting. All rights reserved.
+ * Copyright (c) 2021-2024 Nanook Consulting. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -71,9 +71,7 @@ static int finalize(void);
 prte_errmgr_base_module_t prte_errmgr_dvm_module = {
     .init = init,
     .finalize = finalize,
-    .logfn = prte_errmgr_base_log,
-    .abort = prte_errmgr_base_abort,
-    .abort_peers = prte_errmgr_base_abort_peers
+    .logfn = prte_errmgr_base_log
 };

 /*
@@ -488,14 +486,14 @@ static void proc_errors(int fd, short args, void *cbdata)
             PRTE_FLAG_SET(jdata, PRTE_JOB_FLAG_ABORTED);
             /* kill the job */
             _terminate_job(jdata->nspace);
-            PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_FAILED_TO_START);
         }
         /* if this was a daemon, report it */
         if (PMIX_CHECK_NSPACE(jdata->nspace, PRTE_PROC_MY_NAME->nspace)) {
             /* output a message indicating we failed to launch a daemon */
             pmix_show_help("help-errmgr-base.txt", "failed-daemon-launch",
                            true, prte_tool_basename);
         }
+        PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_FAILED_TO_START);
         break;

     case PRTE_PROC_STATE_CALLED_ABORT:
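The `proc_errors` hunk in errmgr_dvm.c moves `PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_FAILED_TO_START)` out of the first braced block and places it just before `break`. A minimal sketch of the effect on control flow, assuming that first block is conditional: `guard_taken` stands in for its condition, which this hunk does not show, and `toy_job_t` and both functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the job object; not the real PRRTE type. */
typedef struct {
    bool guard_taken;   /* models the first block's condition (not shown in the hunk) */
    bool is_daemon_job; /* models the PMIX_CHECK_NSPACE(...) test */
    int  terminations;  /* times the job was told to terminate */
    int  help_messages; /* times the failed-daemon-launch text was shown */
    int  activations;   /* times the FAILED_TO_START state was activated */
} toy_job_t;

/* Old flow: the state is activated only inside the first block. */
void failed_to_start_before(toy_job_t *job)
{
    assert(NULL != job);
    if (job->guard_taken) {
        job->terminations++;
        job->activations++;
    }
    if (job->is_daemon_job) {
        job->help_messages++;
    }
}

/* New flow: the state is activated once on every path, after the report. */
void failed_to_start_after(toy_job_t *job)
{
    assert(NULL != job);
    if (job->guard_taken) {
        job->terminations++;
    }
    if (job->is_daemon_job) {
        job->help_messages++;
    }
    job->activations++;
}
```

In this model the only behavioral differences are that the activation now also happens when the first block is skipped, and that for a daemon job the help message is emitted before the activation rather than after it.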
28 changes: 1 addition & 27 deletions src/mca/errmgr/errmgr.h
@@ -16,7 +16,7 @@
  * reserved.
  * Copyright (c) 2013-2020 Intel, Inc. All rights reserved.
  * Copyright (c) 2014 NVIDIA Corporation. All rights reserved.
- * Copyright (c) 2021-2022 Nanook Consulting. All rights reserved.
+ * Copyright (c) 2021-2024 Nanook Consulting. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -99,27 +99,6 @@ typedef int (*prte_errmgr_base_module_finalize_fn_t)(void);
  */
 typedef void (*prte_errmgr_base_module_log_fn_t)(int error_code, char *filename, int line);

-/**
- * Alert - self aborting
- * This function is called when a process is aborting due to some internal error.
- * It will finalize the process
- * itself, and then exit - it takes no other actions. The intent here is to provide
- * a last-ditch exit procedure that attempts to clean up a little.
- */
-typedef void (*prte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...)
-    __prte_attribute_format_funcptr__(__printf__, 2, 3);
-
-/**
- * Alert - abort peers
- * This function is called when a process wants to abort one or more peer processes.
- * For example, MPI_Abort(comm) will use this function to terminate peers in the
- * communicator group before aborting itself.
- */
-typedef int (*prte_errmgr_base_module_abort_peers_fn_t)(pmix_proc_t *procs, int32_t num_procs,
-                                                        int error_code);
-
-typedef void (*prte_errmgr_base_module_enable_detector_fn_t)(bool flag);
-
 /*
  * Module Structure
  */
@@ -130,11 +109,6 @@ struct prte_errmgr_base_module_2_3_0_t {
     prte_errmgr_base_module_finalize_fn_t finalize;

     prte_errmgr_base_module_log_fn_t logfn;
-    prte_errmgr_base_module_abort_fn_t abort;
-    prte_errmgr_base_module_abort_peers_fn_t abort_peers;
-
-    /* start error detector and propagator */
-    prte_errmgr_base_module_enable_detector_fn_t enable_detector;
 };
 typedef struct prte_errmgr_base_module_2_3_0_t prte_errmgr_base_module_2_3_0_t;
 typedef prte_errmgr_base_module_2_3_0_t prte_errmgr_base_module_t;