
TOOLS/PERFTEST: ucc perftest #166

Merged: 2 commits into openucx:master from topic/perftest on Apr 28, 2021

Conversation

@Sergei-Lebedev (Contributor) commented Apr 26, 2021

What

Adds UCC internal performance tests.

Why?

UCC perftest is not a replacement for well-known benchmarks such as OSU; instead, the goal is to cover usage scenarios specific to UCC, such as persistent collectives, asymmetric memory types, multithreading, and measuring collective bandwidth.

How?

ucc_perftest is compiled as a separate binary in tools/perftest. Different backends can be used for OOB, but right now only MPI is available. The main components are (see the measurement-loop sketch after this list):
ucc_pt_benchmark - common logic for starting benchmark
ucc_pt_coll - collective abstraction for benchmarking
ucc_pt_bootstrap - OOB backend abstraction
ucc_pt_comm - ucc_lib + ucc_context + ucc_team
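
As a rough illustration of what the per-size measurement loop for a persistent collective can look like, here is a minimal sketch: the ucc_collective_* and ucc_context_progress calls are the public UCC API, while the helper name, parameters, and timing choices are assumptions, not the PR's exact code (error checking omitted for brevity).

```cpp
// Minimal sketch of one timed measurement for a persistent collective.
// The ucc_* calls are the public UCC API; everything else is illustrative.
#include <chrono>
#include <ucc/api/ucc.h>

double measure_avg_us(ucc_coll_args_t *args, ucc_team_h team,
                      ucc_context_h ctx, int n_warmup, int n_iter)
{
    ucc_coll_req_h req;

    // Persistent usage: init the collective once, post it many times.
    ucc_collective_init(args, &req, team);
    auto run_once = [&]() {
        ucc_collective_post(req);
        while (ucc_collective_test(req) == UCC_INPROGRESS) {
            ucc_context_progress(ctx);
        }
    };
    for (int i = 0; i < n_warmup; i++) {
        run_once();
    }
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < n_iter; i++) {
        run_once();
    }
    auto stop = std::chrono::high_resolution_clock::now();
    ucc_collective_finalize(req);
    return std::chrono::duration<double, std::micro>(stop - start).count()
           / n_iter;
}
```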

Running allreduce on 8 ranks with ucc_perftest and OSU with ucc coll component (using Val's MPI driver):

  • UCC Perftest
Collective: Allreduce
Memory type: host
Data type: 11
Operation type: 2
Warmup: 200; Iterations: 100
       Count        Size                Time, us
                                 avg         min         max
         128         512        6.48        5.76        7.18
         256        1024        7.65        7.19        8.10
         512        2048        9.89        9.56       10.14
        1024        4096       16.18       16.02       16.41
        2048        8192       25.03       24.55       25.46
        4096       16384       41.91       41.30       43.14
        8192       32768       56.47       55.63       57.13
       16384       65536      101.32      100.09      103.56
       32768      131072      182.24      179.16      185.13
       65536      262144      352.69      320.45      377.47
      131072      524288      682.40      624.88      727.74
      262144     1048576     1304.06     1286.65     1320.36
      524288     2097152     2641.49     2627.99     2674.91
     1048576     4194304     6433.45     6370.56     6472.88
     2097152     8388608    15113.78    14921.75    15312.76
     4194304    16777216    44856.27    44334.32    45363.89
  • OSU Allreduce
# OSU MPI Allreduce Latency Test v5.6.2
# Size       Avg Latency(us)   Min Latency(us)   Max Latency(us)  Iterations
512                     7.24              6.86              7.52         100
1024                    8.29              8.01              8.46         100
2048                   10.53              9.93             11.18         100
4096                   15.03             14.50             15.64         100
8192                   25.11             24.40             25.80         100
16384                  37.26             36.14             38.33         100
32768                  60.45             58.65             62.75         100
65536                 109.76            107.78            112.19         100
131072                190.86            180.98            196.24         100
262144                355.12            349.44            363.02         100
524288                708.33            678.93            734.56         100
1048576              1356.70           1325.77           1399.61         100
2097152              2690.97           2642.03           2774.06         100
4194304              6500.48           6298.73           6777.99         100
8388608             15228.75          14709.94          15732.21         100
16777216            45990.79          45844.17          46254.13         100

UCCCHECK_GOTO(ucc_collective_post(req), free_req, st);
do {
    UCCCHECK_GOTO(ucc_context_progress(ctx), free_req, st);
} while (ucc_collective_test(req) == UCC_INPROGRESS);
Collaborator:

check for NOT_SUPPORTED or other error codes?
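
One way the snippet above could surface UCC_ERR_NOT_SUPPORTED and other failures instead of treating every non-OK status as "still in progress" (a sketch reusing the snippet's names, not the PR's final code):

```cpp
// Sketch: capture the test status so error codes (e.g. UCC_ERR_NOT_SUPPORTED)
// break the loop and propagate, rather than spinning forever.
ucc_status_t status;

UCCCHECK_GOTO(ucc_collective_post(req), free_req, st);
do {
    UCCCHECK_GOTO(ucc_context_progress(ctx), free_req, st);
    status = ucc_collective_test(req);
} while (status == UCC_INPROGRESS);
if (status != UCC_OK) {
    st = status;
    goto free_req;
}
```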

coll_args.dst.info.mem_type = mt;
}

ucc_status_t ucc_pt_coll_allreduce::get_coll(size_t count,
Collaborator:

imho "get_coll" confusing name, since you not "getting" anything. i would call it "init_args"

ucc_context_params_t context_params;
ucc_team_params_t team_params;
ucc_status_t st;
st = ucc_lib_config_read("TORCH", nullptr, &lib_config);
Collaborator:

TORCH -> PERF_TEST ?

std::memset(&lib_params, 0, sizeof(ucc_lib_params_t));
lib_params.mask = UCC_LIB_PARAM_FIELD_THREAD_MODE;
lib_params.thread_mode = UCC_THREAD_SINGLE;
st = ucc_init(&lib_params, lib_config, &lib);
Collaborator:

UCC_CHECKGOTO(st)?

UCC_CONTEXT_PARAM_FIELD_OOB;
context_params.type = UCC_CONTEXT_SHARED;
context_params.oob = bootstrap->get_context_oob();
ucc_context_create(lib, &context_params, context_config, &context);
Collaborator:

st = ucc_context_create(...)?
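
A sketch of the status handling these two comments ask for, reusing the UCCCHECK_GOTO macro from the progress-loop snippet above; the cleanup label names are hypothetical:

```cpp
// Check the ucc_init and ucc_context_create return codes instead of
// ignoring them; "free_lib_config" and "free_lib" are illustrative labels.
UCCCHECK_GOTO(ucc_init(&lib_params, lib_config, &lib),
              free_lib_config, st);
UCCCHECK_GOTO(ucc_context_create(lib, &context_params, context_config,
                                 &context), free_lib, st);
```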

bench.mt = UCC_MEMORY_TYPE_HOST;
bench.op = UCC_OP_SUM;
bench.inplace = false;
bench.n_iter = 10;
Collaborator:

I think it is very useful to have an iter/warmup pair for at least "small" and "large" message sizes. For small (say, up to 64K) we want warmup 100, iter 1000, I believe. 10/10 will have a lot of noise at high core counts.
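
For reference, a hypothetical helper implementing the size-based split this thread converges on (the 64K threshold comes from the comment above; all names are illustrative, not the PR's code):

```cpp
// Hypothetical helper: pick warmup/iteration counts by message size.
// Small messages need more iterations to average out noise; large ones
// fewer, to keep runtime bounded. Counts follow this review thread.
#include <cstddef>

static void get_iters(size_t msg_size, int &n_warmup, int &n_iter)
{
    if (msg_size <= 65536) {   // "small" messages, up to 64K
        n_warmup = 100;
        n_iter   = 1000;
    } else {                   // "large" messages
        n_warmup = 20;
        n_iter   = 200;
    }
}
```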

config(cfg),
comm(communicator)
{
coll = new ucc_pt_coll_allreduce(cfg.dt, cfg.mt, cfg.op, cfg.inplace);
Collaborator:

How will you run a test with multiple collectives? Will you create multiple ucc_pt_benchmark objects in main? Right now everything is tied to a single benchmark: there is one arg parse, which creates one config, with one coll. It looks like that would have to change somehow. Or is the use case one coll_type at a time? That would also be fine imho, just making sure I understand.

bench.mt = UCC_MEMORY_TYPE_HOST;
bench.op = UCC_OP_SUM;
bench.inplace = false;
bench.n_iter_large = 1000;
Collaborator:

large should be 20/200 and small 100/1000 - vice versa

Contributor (Author):

oops 😀, fixed

@bureddy (Collaborator) commented Apr 28, 2021

> Data type: 11
> Operation type: 2

Can we convert these numbers to names?

@Sergei-Lebedev (Author):

> Data type: 11
> Operation type: 2
>
> Can we convert these numbers to names?

Yep, fixed. The header looks like this now:

Collective:             Allreduce
Memory type:            host
Data type:              float32
Operation type:         sum
Warmup:
  small                 100
  large                 20
Iterations:
  small                 1000
  large                 200

split warmup and iterations for small and large tests

fix error checking

fix config read prefix

fix datatype and reduction print
@Sergei-Lebedev merged commit d13e395 into openucx:master on Apr 28, 2021
@Sergei-Lebedev deleted the topic/perftest branch on April 28, 2021 19:38
@vspetrov mentioned this pull request on May 11, 2021