Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop rank.N support in favor of R_lite #339

Merged
merged 3 commits into from May 8, 2018

Conversation

Projects
None yet
5 participants
@dongahn
Copy link
Contributor

dongahn commented May 5, 2018

This PR drops rank.N and introduces GPU support into R_lite. If GPUs are involved, R_lite will look like:

[ { "node": "cab1234", "children": { "core": "0-15", "gpu": "0-1" }, "rank": 0 } ]

@trws and @grondo: will this representation work? I think wreck is okay with the introduction of the gpu field.

Now, a bunch of scheduler tests will fail with these changes because of this sharness script code.

I will have to adjust this script code to make use of R_lite instead of rank.N.cores. I can write a simple C wrapper that makes use of jansson and idset for this purpose, but I'm wondering if flux-core already has some utilities that I can use for this purpose. @grondo?

@dongahn dongahn requested review from grondo and trws May 5, 2018

@coveralls

This comment has been minimized.

Copy link

coveralls commented May 5, 2018

Coverage Status

Coverage decreased (-0.09%) to 75.15% when pulling 7ab10d7 on dongahn:drop_rank_N into ce8c530 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 6, 2018

Thanks @dongahn! You are correct currently wreck ignores other fields in R_lite besides "core".

The tests in flux-core are mainly focused on verifying correct distribution of N tasks across the cores present in R_lite, so there is nothing specific that will help with your sharness script.

I have found jq to be a nice cli for manipulating, filtering, and querying json objects. It would also be trivial to check fields of R_lite via the Lua bindings, since the bindings will translate the json object to a Lua table for you.

Here's a short script that will print info from R_lite for any job given jobid and broker rank (or "all" for all ranks), e.g.

grondo:~/flux-sched.git (drop_rank_N)(0) $ flux submit -N2 -n4 hostname                                                                                                                   
submit: Submitted jobid 12
grondo:~/flux-sched.git (drop_rank_N)(0) $ ./R_lite 12 0
rank0: core = 0-1
grondo:~/flux-sched.git (drop_rank_N)(0) $ ./R_lite 12 4
No info in R_lite for rank 4
grondo:~/flux-sched.git (drop_rank_N)(0) $ ./R_lite 12 all
rank0: core = 0-1
rank1: core = 0-1

Maybe this isn't much quicker than a C program but at least it doesn't need to be compiled.
Let me know if you'd need slightly different output.

#!/usr/bin/env lua
local id_to_kvs_path = require 'wreck'.id_to_path
local f = assert (require 'flux'.new())
local jobid = tonumber (arg[1])
local rank = arg[2]

local function die (...)
    io.stderr:write (string.format (...))
    os.exit (1)
end

if not jobid or not rank then
    die ("Usage: %s JOBID RANK\n", arg[0])
end
local key = id_to_kvs_path{ flux = f, jobid = jobid }..".R_lite"
local R_lite = assert (f:kvs_get (key))
for _,r in pairs (R_lite) do
    if rank == "all" or r.rank == tonumber (rank) then
        for k,v in pairs (r.children) do
            print ("rank"..r.rank..": "..k.." = "..v)
            if rank ~= "all" then os.exit (0) end
        end
    end
end
if rank ~= "all" then
    die ("No info in R_lite for rank %d\n", rank)
end
@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 7, 2018

@dongahn, fyi, we do not have Lua bindings for nodeset yet, but there are bindings for cpu_set_t as part of flux-core you can use to parse and count these cpu sets, e.g.:

#!/usr/bin/env lua
local cpuset = require 'flux.cpuset'
local cset = cpuset.new (arg[1])
print ("Cpuset has "..#cset.." members")
$ lua t.lua 0-1,10
Cpuset has 3 members
$ lua t.lua 0-1   
Cpuset has 2 members
@trws

This comment has been minimized.

Copy link
Member

trws commented May 7, 2018

As long as it works for wreck I don't see any issue with it.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented May 8, 2018

Codecov Report

Merging #339 into master will decrease coverage by 0.08%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #339      +/-   ##
=========================================
- Coverage   73.69%   73.6%   -0.09%     
=========================================
  Files          56      56              
  Lines        9843    9791      -52     
=========================================
- Hits         7254    7207      -47     
+ Misses       2589    2584       -5
Impacted Files Coverage Δ
resrc/test/tresrc.c 96.38% <100%> (+0.01%) ⬆️
sched/sched.c 73.26% <100%> (-0.79%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce8c530...7ab10d7. Read the comment docs.

@dongahn dongahn changed the title [WIP] Drop rank.N support in favor of R_lite Drop rank.N support in favor of R_lite May 8, 2018

@grondo

grondo approved these changes May 8, 2018

Copy link
Contributor

grondo left a comment

Looks good to me!

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented May 8, 2018

Thanks @grondo. I just saw two minor knits. I will force a push at which point this can go in. Also as part of this, I will drop some keys in JSC. So, I will post a PR on flux-core as well.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented May 8, 2018

BTW, Lua is pretty easy to learn and simple! It seems having some hands-on for a simple C binding and module and OO package/module development at some point will be helpful for me.

@dongahn dongahn force-pushed the dongahn:drop_rank_N branch from 50b541e to 13dfc0d May 8, 2018

dongahn added some commits May 5, 2018

test: Add R_lite.lua script to fetch R_lite
Modify @grondo's R_lite.lua and introduce it into t/scripts

Convert a function in sched-sharness.sh to use this lua script
instead of the old `rank.N.cores` field.

@dongahn dongahn force-pushed the dongahn:drop_rank_N branch from 13dfc0d to 7ab10d7 May 8, 2018

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented May 8, 2018

Ok. Forced a push. If Travis turns green, this can go in.

@grondo grondo merged commit 0d9a9ff into flux-framework:master May 8, 2018

4 checks passed

codecov/patch 100% of diff hit (target 73.69%)
Details
codecov/project Absolute coverage decreased by -0.08% but relative coverage increased by +26.3% compared to ce8c530
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.09%) to 75.15%
Details
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented May 8, 2018

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.