-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue337 #362
Issue337 #362
Conversation
@tpatki: great start! It seems a few minor changes can help me review your PR. You may want to:
Also for commit messages. The convention has been:
|
@tpatki: Sorry I just updated my comments above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tpatki! Generally LGTM, just minor nits on the tests.
t/t1008-runtime-sched-params.t
Outdated
test_expect_success 'sched-params: getting sched params at runtime works' ' | ||
adjust_session_info 4 && | ||
flux hwloc reload ${excl_4N4B} && | ||
flux module load sched sched-once=true && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend explicitly setting the delay-sched and queue-depth arguments on load so that if the defaults ever change, that doesn't affect this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SteVwonder: Will do.
t/t1008-runtime-sched-params.t
Outdated
flux wreck sched-params set queue-depth=4 && | ||
timed_sync_wait_job 10 && | ||
verify_1N_nproc_sleep_jobs ${excl_4N4B_nc} | ||
' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend adding a flux wreck sched-params get queue-depth
here to ensure that the value was actually set correctly. Right now, I believe this test just ensures that Flux doesn't crash when you call flux wreck sched-params get
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can avoid writing to a file too by just using test flux wreck sched-params get queue-depth -eq 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I only added support for flux wreck sched-params get
, which returns all the scheduler parameters.
I didn't add support for flux wreck sched-params get queue-depth
or flux wreck sched-params get queue-depth,delay-sched
. This would require me to introduce redundant code (similar to sched_params_args, but one that "gets" instead of "sets"), and I didn't want to do that. I think this is fine as a first cut, let me know if you think otherwise.
To support something like flux wreck sched-params get queue-depth,delay-sched
, we will
(1) Either have to change sched_params_args
to have a flag for set/get and have the latter return the sched_params_t
object, which can then be packed and sent back to the requester, or,
(2)Add similiar parsing code in either the subcommand in wreck or the callback in sched.c
I was trying to avoid both of those things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would require me to introduce redundant code (similar to sched_params_args, but one that "gets" instead of "sets"), and I didn't want to do that. I think this is fine as a first cut, let me know if you think otherwise.
This makes sense. Thanks @tpatki!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use flux wreck sched-params get | grep queue-depth
to simulate a "get" of a single value? It would be nice to verify the set worked in the tests.
sched/sched.c
Outdated
|
||
flux_log (h, LOG_DEBUG, "return values of sched_params "); | ||
|
||
msg2 = flux_event_encode ("sched.params.returned", NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, it seems like the only client that cares about a parameter query being answered is the client that made the original query. And that client is about to get an RPC response, so this event seems redundant.
Can you elaborate on the use-cases that you had in mind for this event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit confused about this myself, and was following the convention used by the existing include/exclude/cancel callbacks.
This event (msg2) is used in the flux_send
call that goes out to the broker, which is different than the flux_respond_pack
which contains the data requested by the client. I wasn't sure why the other callbacks had it, and if the response to the client was different than the information sent out to the broker. Maybe @dongahn or @grondo have some clarity here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of exclude and include request, an extra event needs to be generated so that the scheduler will get an resource event and invoke a scheduler loop. For this particular case, we probably won' t need an event. Sorry if it wasn't obvious in the code. (Should have had a comment.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense. Thanks Dong! Yes it wasn't clear to me, and didn't want to break anything or cause a race condition. I'll remove the event/flux_send from both the callbacks I added (and add a comment to capture the above discussion as well).
@dongahn: Shall fix the PR per your suggestions soon (need to head out for a bit but will revisit later today evening). |
Thanks @tpatki |
@dongahn, @SteVwonder: I submitted a separate PR to flux-core for the new |
Codecov Report
@@ Coverage Diff @@
## master #362 +/- ##
==========================================
- Coverage 73.96% 73.96% -0.01%
==========================================
Files 61 61
Lines 10211 10263 +52
==========================================
+ Hits 7553 7591 +38
- Misses 2658 2672 +14
Continue to review full report at Codecov.
|
@dongahn: updated as per your suggestion. Let me know if I missed anything. Thanks! |
sched/sched.c
Outdated
|
||
if (sprms) | ||
rc = sched_params_args (sprms, &(ctx->arg.s_params)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think calling sched_params_args ()
alone will lead to scheduling behavior changes for all parameters. queue-depth
would be okay. But delay-sched
would not. If the sched load option was delay-sched=false
, and the user executes flux wreck sched-param set delay-sched=true
to change it. This alone will not change the scheduler to start delay scheduling.
Generally speaking, as we will add more parameters, we can't expect only changing a field in the sched context will effect the requested behavior.
I think this can be addressed if we refactor adjust_for_sched_params ()
into effect_sched_params ()
Here, you can have a logic to dynamically adjust the behavior based on the given scheduling parameters.
I have to believe you can make it so that the function will work for both load time options as well as dynamically requested options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted, I will work on fixing this and also on testing if the change took effect. I was thinking about this as well, thanks for the pointers!
sched/sched.c
Outdated
if (flux_msg_get_userid (msg, &userid) < 0) | ||
goto error; | ||
|
||
flux_log (h, LOG_INFO, "sched_params change requested by user at runtime (%u).", userid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: Maybe (%u)
can be next to user?
sched/sched.c
Outdated
goto error; | ||
|
||
flux_log (h, LOG_INFO, "sched_param values requested by user (%u).", userid); | ||
if (flux_request_unpack (msg, NULL, "{}") < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want to have bulk get AND individual get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only bulk get for now, lets discuss individual get in person. Take a look at the comment I added to Stephen's question above regarding that. I'm not sure how to avoid duplicate code there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. OK. Yeah I'm ok with bulk get only since this will have be refactored later anyway within flux sched command. Thanks @tpatki.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also think about how to refactor sched_params_args
so we don't write duplicate code for the individual get. I'll add an issue corresponding to this once this PR is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongahn, @SteVwonder: PRs now support individual get as well. Please see test cases.
t/t1008-runtime-sched-params.t
Outdated
#set -x | ||
|
||
test_description='Test dynamic updation of queue depth. | ||
First cut. Ensure jobs are correctly scheduled under different values of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove "First cut."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor nit. I don't think you need "Frist cut" here.
t/t1008-runtime-sched-params.t
Outdated
timed_sync_wait_job 10 && | ||
verify_1N_nproc_sleep_jobs ${excl_4N4B_nc} | ||
' | ||
test_expect_success 'sched-params: unloaded sched module' ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great you have added many tests! You think it will be possible to test whether "set" has led to scheduler behavior changes as well?
@tpatki: Thank you for putting this together so quickly. I have just added some comments inline. |
@dongahn, @SteVwonder, @grondo: Please see the modified PR that addresses the changes requested. I have added a few test cases, and have manually verified each of the cases for |
LGTM. It just needs a rebase to master. Thanks, @tpatki! This was more work than I thought it would be, but this should come in handy. |
@SteVwonder: rebased. |
@tpatki and I looked at the code together a bit after our meeting this morning. There will be some minor changes. |
@dongahn: updated as per our discussion. |
sched/sched.c
Outdated
|
||
/* If one of the watchers is not available, send error */ | ||
if ((!ctx->before && ctx->after) || (!ctx->after && ctx->before)) { | ||
rc = EINVAL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for following up. I would actually go with:
rc = -1;
errno = EINVAL;
sched/sched.c
Outdated
} | ||
/* If one of the watchers is not available, send error */ | ||
if ((!ctx->before && ctx->after) || (!ctx->after && ctx->before)) { | ||
rc = EINVAL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for following up. I would actually go with:
rc = -1;
errno = EINVAL;
sched/sched.c
Outdated
if (flux_msg_get_userid (msg, &userid) < 0) | ||
goto error; | ||
|
||
flux_log (h, LOG_INFO, "sched_params change requested by user (%u) at runtime.", userid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line seems long. Does it fit 80 characters? If not, I'd suggest to break the line into two.
sched/sched.c
Outdated
if (flux_msg_get_userid (msg, &userid) < 0) | ||
goto error; | ||
|
||
flux_log (h, LOG_INFO, "sched_param values requested by user (%u).", userid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
@dongahn, @SteVwonder: Looks like my testcase fails with certain gcc/clang versions, not sure why. Will debug further in the evening/tmrw. Any thoughts on why this may be happening? |
t/t1008-runtime-sched-params.t
Outdated
#set -x | ||
|
||
test_description='Test dynamic updation of queue depth. | ||
First cut. Ensure jobs are correctly scheduled under different values of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor nit. I don't think you need "Frist cut" here.
t/t1008-runtime-sched-params.t
Outdated
echo ${excl_4N4B_nc} | ||
' | ||
|
||
test_expect_success 'sched-params: getting sched params (no args) at runtime works' ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way we can try to shorten the test messages here and below to fit into 80 characters. I don't think it is required but it would be generally good for readability.
Sounds good. I also added some minor comments above. I believe we are close. Thanks! |
We use `errno` for that purpose. Hopefully it answers your question. http://man7.org/linux/man-pages/man3/errno.3.html
Dong
…________________________________
From: Tapasya Patki <notifications@github.com>
Sent: Tuesday, July 17, 2018 12:08:19 PM
To: flux-framework/flux-sched
Cc: Ahn, Dong H.; Mention
Subject: Re: [flux-framework/flux-sched] Issue337 (#362)
@dongahn<https://github.com/dongahn>: I think some of the tests might be failing because rc isn't explicitly set to -1.
Regarding, error=EINVAL, should I be setting that to a global error variable somewhere? I don't see one in sched (setting it locally doesn't make sense as it won't propagate through the callstack). Thoughts?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#362 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AA0nqxaU4E2edrdSZptEQyllNdU0eiEyks5uHjYjgaJpZM4VPlQh>.
|
@dongahn, @SteVwonder, @grondo: I think this is ready to go, please let me know if I missed anything. It also fixes #364, I had to add "-libevent-dev" to .travis.yml in sched after the pmix update in flux-core. |
@tpatki: This looks great. Merging. |
Added request/reponse functionality to update scheduler parameters at runtime. Added a test case for the same. Needs the
flux wreck sched-params get/set
subcommand to be present in flux-core (separate PR).Also fixes #364.