TOOLS/PERFTEST: ucc perftest #166
tools/perf/ucc_pt_benchmark.cc
Outdated
UCCCHECK_GOTO(ucc_collective_post(req), free_req, st);
do {
    UCCCHECK_GOTO(ucc_context_progress(ctx), free_req, st);
} while (ucc_collective_test(req) == UCC_INPROGRESS);
Should this also check for NOT_SUPPORTED or other error codes?
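A minimal sketch of what such a check could look like, written against stand-in types so it compiles without the UCC headers. `pt_status`, `pt_wait`, and the two callables are hypothetical names, not part of the PR; the only assumption taken from UCC is that success is 0, in-progress is positive, and error codes are negative.

```cpp
#include <functional>

// Stand-in for ucc_status_t (hypothetical; values assumed to follow the
// UCC convention: 0 = OK, positive = in progress, negative = error).
enum pt_status {
    PT_OK                = 0,
    PT_INPROGRESS        = 1,
    PT_ERR_NOT_SUPPORTED = -1,
};

// Drives progress until the request leaves the in-progress state.
// `progress` plays the role of ucc_context_progress(ctx) and `test`
// plays the role of ucc_collective_test(req).
inline pt_status pt_wait(const std::function<pt_status()> &progress,
                         const std::function<pt_status()> &test)
{
    pt_status st;
    do {
        st = progress();
        if (st < 0) {
            return st; // progress itself reported an error
        }
        st = test();
    } while (st == PT_INPROGRESS);
    return st; // PT_OK on completion, negative value on error
}
```

With this shape the caller sees the final status instead of silently leaving the loop when the test call returns an error.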
tools/perf/ucc_pt_coll.cc
Outdated
    coll_args.dst.info.mem_type = mt;
}

ucc_status_t ucc_pt_coll_allreduce::get_coll(size_t count,
IMHO "get_coll" is a confusing name, since you are not "getting" anything. I would call it "init_args".
tools/perf/ucc_pt_comm.cc
Outdated
ucc_context_params_t context_params;
ucc_team_params_t team_params;
ucc_status_t st;
st = ucc_lib_config_read("TORCH", nullptr, &lib_config);
TORCH -> PERF_TEST ?
tools/perf/ucc_pt_comm.cc
Outdated
std::memset(&lib_params, 0, sizeof(ucc_lib_params_t));
lib_params.mask = UCC_LIB_PARAM_FIELD_THREAD_MODE;
lib_params.thread_mode = UCC_THREAD_SINGLE;
st = ucc_init(&lib_params, lib_config, &lib);
UCCCHECK_GOTO(st)?
tools/perf/ucc_pt_comm.cc
Outdated
UCC_CONTEXT_PARAM_FIELD_OOB;
context_params.type = UCC_CONTEXT_SHARED;
context_params.oob = bootstrap->get_context_oob();
ucc_context_create(lib, &context_params, context_config, &context);
st = ucc_context_create
tools/perf/ucc_pt_config.cc
Outdated
bench.mt = UCC_MEMORY_TYPE_HOST;
bench.op = UCC_OP_SUM;
bench.inplace = false;
bench.n_iter = 10;
I think it is very useful to have an iter/warmup pair for at least "small" and "large" msg sizes. For small (say up to 64K) we want warmup 100, iter 1000, I believe. 10/10 will have a lot of noise on high core counts.
config(cfg),
comm(communcator)
{
    coll = new ucc_pt_coll_allreduce(cfg.dt, cfg.mt, cfg.op, cfg.inplace);
How will you run a test with multiple collectives? Will you create multiple ucc_pt_benchmark objects in main? Right now it is all tied to one benchmark: one arg parse, which creates one config, with one coll. It looks like that would have to change somehow. Or is the use case one coll_type at a time? That is also fine IMHO, just making sure I understand.
tools/perf/ucc_pt_config.cc
Outdated
bench.mt = UCC_MEMORY_TYPE_HOST;
bench.op = UCC_OP_SUM;
bench.inplace = false;
bench.n_iter_large = 1000;
These are swapped: large should be 20/200 and small 100/1000.
oops 😀, fixed
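For reference, the small/large split discussed in this thread could be sketched roughly as follows. `pt_iter_cfg`, `pt_pick_iters`, and `PT_SMALL_MSG_MAX` are hypothetical names, not taken from the PR. The sketch assumes the "64K" threshold is in bytes and inclusive, and that the pairs read as warmup/iterations (100/1000 for small, 20/200 for large).

```cpp
#include <cstddef>

// Warmup and measured iteration counts for one message size.
struct pt_iter_cfg {
    int warmup;
    int iters;
};

// Assumed small-message threshold: 64K, interpreted as bytes, inclusive.
constexpr std::size_t PT_SMALL_MSG_MAX = 64 * 1024;

// Picks the warmup/iteration pair from the message size in bytes.
inline pt_iter_cfg pt_pick_iters(std::size_t msg_bytes)
{
    if (msg_bytes <= PT_SMALL_MSG_MAX) {
        return {100, 1000}; // small: more iterations to average out noise
    }
    return {20, 200};       // large: each iteration is already long
}
```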
Can we convert these numbers to names?
yep, fixed. Header looks like this now
split warmup and iterations for small and large tests
fix error checking
fix config read prefix
fix datatype and reduction print
What
Adding UCC internal performance tests
Why ?
UCC perftest is not a replacement for other well-known benchmarks such as OSU. Instead, the goal is to cover usage scenarios specific to UCC, such as persistent collectives, asymmetric memory types, multithreading, and measuring collective bandwidth.
How ?
ucc_perftest is compiled as a separate binary file in tools/perftest. Different backends might be used for OOB, but right now only MPI is available.
ucc_pt_benchmark - common logic for starting benchmark
ucc_pt_coll - collective abstraction for benchmarking
ucc_pt_bootstrap - OOB backend abstraction
ucc_pt_comm - ucc_lib + ucc_context + ucc_team
Running allreduce on 8 ranks with ucc_perftest and OSU with ucc coll component (using Val's MPI driver):
Collective:     Allreduce
Memory type:    host
Data type:      11
Operation type: 2
Warmup:         200; Iterations: 100

     Count        Size                 Time, us
                                avg         min         max
       128         512         6.48        5.76        7.18
       256        1024         7.65        7.19        8.10
       512        2048         9.89        9.56       10.14
      1024        4096        16.18       16.02       16.41
      2048        8192        25.03       24.55       25.46
      4096       16384        41.91       41.30       43.14
      8192       32768        56.47       55.63       57.13
     16384       65536       101.32      100.09      103.56
     32768      131072       182.24      179.16      185.13
     65536      262144       352.69      320.45      377.47
    131072      524288       682.40      624.88      727.74
    262144     1048576      1304.06     1286.65     1320.36
    524288     2097152      2641.49     2627.99     2674.91
   1048576     4194304      6433.45     6370.56     6472.88
   2097152     8388608     15113.78    14921.75    15312.76
   4194304    16777216     44856.27    44334.32    45363.89