Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wreck incremental improvements #1399

Merged
merged 26 commits into from Mar 30, 2018

Conversation

Projects
None yet
6 participants
@grondo
Copy link
Contributor

grondo commented Mar 29, 2018

This is a set of changes accumulated in wreck to support the splash app issue #1358. It still needs work and tests added, but features are functional, including:

  • Do not arbitrarily set nnodes in the kvs if the user didn't provide it, and preserve other user provided options in an .opts key in the kvs
  • Fix flux-wreck purge sort order, as well as removal of empty lwj.x.y.z dirs after a purge.
  • Add new job.submit-nocreate RPC to "submit" a job with an existing KVS entry, in support of:
  • Schedule flux wreckrun jobs if a scheduler is present and the new -I, --immediate option is not provided
  • Fix crash in job.kvspath #1394
  • Distribute the lwj.ntasks global ntasks properly across all cores in wrexecd, instead of assuming rank.X.cores is the number of tasks to run on each rank X.
  • Add support for R lite format in lwj.R_lite. This code expects R_lite to be an array of JSON objects with rank: keys (to be fixed soon if we need to switch to HostName keys)
  • flux-wreckrun now emits its manual "resource assignment" in R_lite format

Also thrown in (these should be pulled out probably):

  • resource-hwloc: Don't complain about "all ranks not aggregated" since the module may not be loaded everywhere
  • flux-hwloc Use FLUX_NODEID_ANY as target of get topo RPCs.

The re-worked code to parse R_lite falls back to rank.X.cores if there is no R_lite key, so it should work with the current scheduler and do a much better job of distributing tasks across the assigned resources, assuming that rank.N.cores is a count of assigned cores on each rank.

It may be that this approach which uses the current method of scheduler resource assignment will work "good enough" to satisfy #1358, but we'll need to have @trws comment. The only thing that a full R_lite implementation would gain is that we'd have the exact core list for each job, and wreck could set the default cpu binding for each job.

Lastly, I have to apologize for the quality of work here. This is definitely a rush job. Given more time it would have been better to refactor wrexecd.c first, and possibly the job module is due for a rewrite as well to remove all synchronous RPCs.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 29, 2018

Coverage Status

Coverage decreased (-0.05%) to 78.783% when pulling 4471f15 on grondo:wreck-experimental into f7bde37 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Mar 29, 2018

Oops, #1400 reminded me that flux wreck ls breaks on this branch (at least the RANKS reporting does).

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Mar 29, 2018

This isn't ready for merge, something is wrong with the new submit-only RPC.

@grondo grondo force-pushed the grondo:wreck-experimental branch from 0c9dccf to 1836602 Mar 29, 2018

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Mar 29, 2018

Codecov Report

Merging #1399 into master will decrease coverage by 0.04%.
The diff coverage is 81.45%.

@@            Coverage Diff             @@
##           master    #1399      +/-   ##
==========================================
- Coverage   78.52%   78.47%   -0.05%     
==========================================
  Files         162      163       +1     
  Lines       29768    29979     +211     
==========================================
+ Hits        23374    23526     +152     
- Misses       6394     6453      +59
Impacted Files Coverage Δ
src/cmd/builtin/hwloc.c 80.36% <100%> (ø) ⬆️
src/modules/resource-hwloc/resource.c 68.38% <50%> (+0.82%) ⬆️
src/modules/wreck/job.c 72.47% <62.79%> (-0.35%) ⬇️
src/common/libjsc/jstatctl.c 74.89% <75%> (+0.03%) ⬆️
src/modules/wreck/wrexecd.c 74.55% <84.93%> (-1.43%) ⬇️
src/modules/wreck/rcalc.c 85.71% <85.71%> (ø)
src/modules/connector-local/local.c 74.18% <0%> (-1.64%) ⬇️
src/common/libutil/dirwalk.c 93.57% <0%> (-0.72%) ⬇️
src/broker/overlay.c 74.14% <0%> (-0.32%) ⬇️
... and 6 more
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Mar 29, 2018

Ok, submit-only fixed and tested against flux-framework/flux-sched#295.

@garlick
Copy link
Member

garlick left a comment

This seems like a lot of great cleanup!

I added a couple of very minor review comments. Looks great.

@@ -315,10 +315,11 @@ end

This comment has been minimized.

@garlick

garlick Mar 29, 2018

Member

Duplicate "the" in commit message (the the minimum of the number of...)

@@ -89,12 +90,16 @@ local function alloc_tasks (f, wreck, lwj)
end
end
for i, ntasks in pairs (counts) do
local key = "rank."..i..".cores"
lwj [key] = ntasks
local corelist = "0"

This comment has been minimized.

@garlick

garlick Mar 29, 2018

Member

Would it make sense here to keep the old rank.N.cores as an option for testing?

This comment has been minimized.

@grondo

grondo Mar 29, 2018

Author Contributor

we could I didn't want to add yet another option so in tests I just use --pre-launch-hook to do it. Easy to add an option though and that would be more straightforward.

This comment has been minimized.

@garlick

garlick Mar 29, 2018

Member

Oh as long as there is a way to drive it from tests.

grondo added some commits Mar 17, 2018

cmd/flux-submit: indicate to sched when -N not set
When only -n is used with flux-submit, the wreck implementation
sets a default value for nnodes to the minimum of the number
of tasks requested and the total number of nodes in the session.
This unfortunately loses the context that the user didn't explicitly
request a number of nodes.

For flux-submit only, change this behavior and set the nnodes value
in the request to 0 if -N, --nnodes was not set by the user.

Fixes #1368
wreck: save -n,-N,-t options in kvs for submission
For posterity and possibly scheduler use, save the raw value
of the -N, --nodes; -n, --ntasks, and -t, --tasks-per-node options
used in flux-submit and flux-wreckrun to the LWJ kvs directory
under an `opts.` key. If an option wasn't used at all, then
there will no corresponding entry in opts. for that option.
wreck: preserve user -n, --ntasks option
Do not directly modify self.opts.n within wreck:parse_cmdline() since
this is now propagated to the kvs for the scheduler and posterity.
Instead directly set self.ntasks in the same manner as before, but
without self.opts.n as intermediary.
wreck: print exit string for complete jobs in ls
In flux-wreck ls, print state of complete jobs as "exited" for
normal exit status, "failed" for jobs with non-zero exit status,
and "killed" for jobs terminated by a signal.
wreck: purge: fix sort order during purge
purge was iterating lwj-complete.<hb> directories in strcmp()
order instead of numeric order so that jobs were not being purged
oldest-first as expected.

Switch purge to collect kvs directories in a table and sort the
table numerically.
wreck: purge: also remove empty lwj.x.y.z dirs
When purging lwj directories with flux-wreck purge, also remove
any now empty lwj.x.y.z directories in the hierarchy.

Fixes #1387
wreck: add job.submit-nocreate RPC
Add an RPC to the wreck/job module to submit a job without
creating it, i.e. to submit an existing job entry created
with job.create.
wreck: schedule jobs by default with wreckrun
Use new job.submit-nocreate on jobs for flux-wreckrun by default.
If no scheduler is loaded, or a new -I, --immediate flag is used,
then fall back to issuing the runrequest manually.

Fixes #1392
doc: document -I, --immediate flag in flux-wreckrun
Add documentation on the -I, --immediate flag for
flux-wreckrun.
modules/wreck: fix crash in job.kvspath
Fix a crash in job_kvspath_cb() when the json `ids`
input doesn't point to an array. Found by @trws.

Fixes #1394
cmd/flux-hwloc: send RPCs to FLUX_NODEID_ANY
RPC issued by the flux-hwloc command to gather topology
was always directed at rank 0. This is unnecessary, and makes
it impossible to query the service when resource-hwloc is not
loaded on rank 0.

Send the RPC to FLUX_NODEID_ANY instead.
resource-hwloc: remove error when all ranks not aggregated
resource-hwloc refuses to respond to topo query requests when
all ranks haven't been aggregated in the KVS. This assumption
that resource-hwloc will be loaded everywhere is erroneous in
some cases. Relax the error and just return whatever was found.
t/wreck: add test code for wreck/rcalc
Add a test program for use in tests for the wreck resource
calculation code.
wreck: cores_per_node should be tasks_per_node
Rename internal wreck.cores_per_node to tasks_per_node to avoid
continuing confusion.
wreck: check for R_lite in job module
In the wreck/job module, check for existence of an R_lite key,
and if present, see if that key has information for the current
rank to determine if a job targets the current rank.

If not, then fall back to looking for `rank.N` directory for rank N.
wreck: switch to rcalc for count of tasks to run
Problem: wrexecd assumes that the rank.N.cores count assigned by
the scheduler is the same as the number of local tasks to run
on the local rank.

Solution: Switch wrexecd to using the `rcalc` code to distribute
the total ntasks requested for the job across the global resources
assigned. The code first checks for an "R_lite" key in the Rlite
format, and if that is not avilable, falls back to the rank.N.cores.

Fixes #1378
cmd/flux-wreckrun: emit resources in R lite format
Switch from using rank.N.cores format to emitting R lightweight
format into the R_lite key.
t/t2000-wreck: fix affinity assignment test
Fix the affinity assignment test to use better quoting of the
kvs path.
t/issues/0900-wreck-invalid-cores.sh: remove
Remove the test for wreck invalid cores in KVS rank.N.cores,
since this format is no longer used by default, and there is a
different error behavior with invalid counts in these keys
anyway.
cmd/flux-wreck: get rank information from R_lite if available
If self.lwj.R_lite is available, pull rank information from there
instead of trying the lwj.rank.X KVS dirs. If R_lite is not present
then fall back to the old style lwj.rank dirs.

grondo and others added some commits Mar 29, 2018

t/t2000-wreck: test fallback to rank.N.cores
Test flux-wreckrun ability to fall back to old style rank.N.cores
style of resource assignment.
wreck: add ncores to job requests
Add ncores to job requests.
Computed with assistance of new --cores-per-task option to submit/wreckrun.
wreck: add ncores to job create events
Publish ncores in addition to other job data during job creation
events like reserved and submitted.

@grondo grondo force-pushed the grondo:wreck-experimental branch from 1836602 to 9577af3 Mar 29, 2018

@grondo grondo referenced this pull request Mar 29, 2018

Closed

[WIP] Splash #1396

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Mar 29, 2018

Ok, fixed up the typo in commit message. I've also added @trws changes to add an ncores member to job requests, and published the ncores value in the submitted event message.

We'll need @dongahn to fix up his pending PR in flux-sched to also pull ncores directly from the event, and perhaps point me in the right direction to add the ncores field to jsc.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Mar 29, 2018

perhaps point me in the right direction to add the ncores field to jsc.

Ok, I'm going through jstatctl now adding NCORES everywhere there is also an NTASKS

grondo added some commits Mar 29, 2018

libjsc: Add ncores attribute for job requests
Support the `ncores` attribute for job submission in all the same
places `ntasks` is also used.
t/t2001-jsc.t: Add ncores to jstatctl tests
Add ncores in JSON input for the jsc tests that process other
job submission parameters.
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Mar 29, 2018

Ok, did my best to extend libjsc for ncores attribute of jobs. @dongahn should probably review.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Mar 30, 2018

Ok, did my best to extend libjsc for ncores attribute of jobs. @dongahn should probably review.

Looks great! This shouldn't break the scheduler PR flux-framework/flux-sched#295 as it still emits ntasks. So once this is merged, I will change sched code to use scores instead of ntasks.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Mar 30, 2018

I like it!

@garlick garlick merged commit 25f7d40 into flux-framework:master Mar 30, 2018

4 checks passed

codecov/patch 81.45% of diff hit (target 78.52%)
Details
codecov/project Absolute coverage decreased by -0.04% but relative coverage increased by +2.93% compared to f7bde37
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.05%) to 78.783%
Details
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Mar 30, 2018

Thanks @garlick! Whew!

@grondo grondo deleted the grondo:wreck-experimental branch Apr 26, 2018

@grondo grondo referenced this pull request May 10, 2018

Closed

0.9.0 Release #1479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.