Add possibility of resuming multinode jobs #1955
Merged
@@ -492,6 +492,8 @@ struct job {
int ji_stderr; /* socket for stderr */
int ji_ports[2]; /* ports for stdout/err */
enum bg_hook_request ji_hook_running_bg_on; /* set when hook starts in the background*/
int ji_msconnected; /* 0 - not connected, 1 - connected */
pbs_list_head ji_multinodejobs; /* links to recovered multinode jobs */
#else /* END Mom ONLY - start Server ONLY */
struct batch_request *ji_pmt_preq; /* outstanding preempt job request for deleting jobs */
int ji_discarding; /* discarding job */

@@ -592,6 +594,8 @@ struct job {
#ifdef PBS_MOM
tm_host_id ji_nodeidx; /* my node id */
tm_task_id ji_taskidx; /* generate task id's for job */
int ji_stdout;
int ji_stderr;
Review comment: Initialize these in server/job_func.c:job_alloc() under the #ifdef PBS_MOM section.
#if MOM_ALPS
long ji_reservation;
/* ALPS reservation identifier */

@@ -745,6 +749,8 @@ typedef struct infoent {
#define IM_EXEC_PROLOGUE 24
#define IM_CRED 25
#define IM_PMIX 26
#define IM_RECONNECT_TO_MS 27
#define IM_JOIN_RECOV_JOB 28

#define IM_ERROR 99
#define IM_ERROR2 100
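
The two new codes, IM_RECONNECT_TO_MS and IM_JOIN_RECOV_JOB, extend the existing run of inter-MOM command numbers. As a rough illustration of how a receiver might make such codes readable in log output, here is a minimal sketch. The helper im_command_name() is hypothetical and is not part of this PR; only the #define values are copied from the hunk above.

```c
/* Command codes copied from the hunk above. */
#define IM_EXEC_PROLOGUE 24
#define IM_CRED 25
#define IM_PMIX 26
#define IM_RECONNECT_TO_MS 27
#define IM_JOIN_RECOV_JOB 28

#define IM_ERROR 99
#define IM_ERROR2 100

/*
 * Hypothetical helper (an assumption, not in the PR): map a command
 * code to a printable name, e.g. for a log message.
 */
const char *
im_command_name(int command)
{
	switch (command) {
	case IM_EXEC_PROLOGUE:
		return "IM_EXEC_PROLOGUE";
	case IM_CRED:
		return "IM_CRED";
	case IM_PMIX:
		return "IM_PMIX";
	case IM_RECONNECT_TO_MS:
		return "IM_RECONNECT_TO_MS";
	case IM_JOIN_RECOV_JOB:
		return "IM_JOIN_RECOV_JOB";
	case IM_ERROR:
		return "IM_ERROR";
	case IM_ERROR2:
		return "IM_ERROR2";
	default:
		/* Any code not listed above is reported generically. */
		return "unknown";
	}
}
```

A switch keeps every code in one place, so a later addition between 28 and 99 only needs one more case.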

Please initialize ji_msconnected in server/job_func.c:job_alloc() under the #ifdef PBS_MOM section.
Also initialize ji_multinodejobs in server/job_func.c:job_alloc() by calling CLEAR_HEAD(ji_multinodejobs), and free the memory allocated to ji_multinodejobs in server/job_func.c:job_free() under the #ifdef PBS_MOM section.

I believe I'm already initializing these variables at job_func.c:355,356. Should I do anything more than that?
Since ji_multinodejobs only contains pointers to jobs that are already managed, will CLEAR_HEAD(ji_multinodejobs) suffice in job_free()?

I missed that you were already initializing them in job_alloc() in job_func.c, so that's good. For job_free(), yes, just do the CLEAR_HEAD.
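
The outcome of this thread can be sketched as follows: zero ji_msconnected and clear the list head when the job is allocated, and only reset the head (without freeing the linked jobs) when the job is freed. This is a self-contained illustration, not the code from job_func.c. The list type and CLEAR_HEAD macro are simplified stand-ins assumed to resemble the PBS ones, the job struct is trimmed to the two new fields, and job_alloc_sketch(), job_free_sketch() and list_is_empty() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the PBS list types (assumed shape). */
typedef struct pbs_list_link {
	struct pbs_list_link *ll_prior;
	struct pbs_list_link *ll_next;
	void *ll_struct;
} pbs_list_link;
typedef pbs_list_link pbs_list_head;

/* An empty list is a head whose links point back at itself. */
#define CLEAR_HEAD(e) \
	((e).ll_next = &(e), (e).ll_prior = &(e), (e).ll_struct = NULL)

/* Trimmed job struct holding only the two fields added by the PBS_MOM hunk. */
typedef struct job {
	int ji_msconnected;             /* 0 - not connected, 1 - connected */
	pbs_list_head ji_multinodejobs; /* links to recovered multinode jobs */
} job;

/* Hypothetical helper: true when the head links only to itself. */
int
list_is_empty(const pbs_list_head *head)
{
	return head->ll_next == head;
}

/* Sketch of the allocation side: both new fields start in a known state. */
job *
job_alloc_sketch(void)
{
	job *pj = calloc(1, sizeof(*pj));

	if (pj == NULL)
		return NULL;
	pj->ji_msconnected = 0;
	CLEAR_HEAD(pj->ji_multinodejobs);
	return pj;
}

/*
 * Sketch of the free side: the list only links jobs that are managed
 * elsewhere, so the head is reset and the entries are left alone.
 */
void
job_free_sketch(job *pj)
{
	if (pj == NULL)
		return;
	CLEAR_HEAD(pj->ji_multinodejobs);
	assert(list_is_empty(&pj->ji_multinodejobs));
	free(pj);
}
```

Note that the macro takes the field itself, so the call is written against the job pointer, for example CLEAR_HEAD(pj->ji_multinodejobs).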