TEST: sync up upstream #1

kingchc · 2022-07-20T22:54:05Z

sync up upstream UCC

MC: Fix build with profiling

* UTIL: host hash avoid using gethostid()
* TL/UCP: fix ep hash for 64 bit node_hash
* CORE: check for proc info uniqueness
* TL/UCP: array ep storage

Signed-off-by: artemry-nv <artemry@nvidia.com>

Co-authored-by: Lior Paz <liorpa@mellanox.com>

TL/UCP: Fix bug in ep close

Co-authored-by: Lior Paz <liorpa@mellanox.com>

Signed-off-by: artemry-nv <artemry@nvidia.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>

* TL/NCCL: add reduce scatter and reduce
* TL/NCCL: fix reduce scatter count

* CORE: basic topo/subgrouping
* TEST: sbgp tests

Signed-off-by: artemry-nv <artemry@nvidia.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>

* UCP: Implementing Reduce Knomial
* CORE: Fixed check coll type for reduce case
* REVIEW: Fixes to first code review
* REVIEW: Fixes to second code review
* REVIEW: Fixes to third code review

Co-authored-by: valentin petrov <valentinp@nvidia.com>

Co-authored-by: Lior Paz <liorpa@mellanox.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>

Re-implement internally to keep backward bin-compat.

* CORE: optimize team create
  Don't perform service_team creation or team_id allocation if it is not required by CL/TLs
* UTIL: service colls
  - Adds service allgather tl iface
  - Adds ucc_service_coll convenience layer in core
  - Adds option UCC_INTERNAL_OOB
* TEST: service coll tests
* CODESTYLE: apply clang
* CORE: fixes service coll map usage
* CORE: ucc_team_subset_t to common place
* CORE: service coll progress fix
* TL/UCP: service ag fix
  Fix after rebase on top of "count" definition change
* API: oob_ep field to oob

* API: Clarifying semantics for coll args flags
* API: Clarifying persistent flag semantic

Change coll args rules for allreduce, reduce_scatter and reduce

* SCHEDULE: pipelined schedule iface
* TL/UCP: pipelined sra
* TEST: gtest for sra pipelined
* TL/UCP: fix linter warn

* UTIL: new ucs profiler config
* UTIL: code review fixes

Co-authored-by: Edgar Gabriel <edgar.gabriel@amd.com>
Co-authored-by: Min Si <msi@fb.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>

Co-authored-by: valentin petrov <valentinp@nvidia.com>

* API: add float128 and float32(64,128)_complex dt
* TEST: update mpi_tests with new dt
* TEST: update Gtest with new dt
* BUILD: check dt size during preprocessing

Explicitly disabling tl/rccl was broken due to misplaced parentheses in the rccl.m4 file.

Co-authored-by: Wes Bland <wbland@fb.com>

This commit resyncs the ROCm components with the UCC code base. Specifically, it adds support (or disqualifies itself) for various float(32,64,128)_complex datatypes.

Co-authored-by: Sergey Lebedev <sergeyle@nvidia.com>

Move the invocation of event_destroy() from the coll_finalize() function to the free_task() function to avoid leaving events in the mpool if a test is skipped because of an unsupported datatype.

The attr flags are not initialized. Context for these TLs is not created since the core context assumes a service team is required.

Co-authored-by: valentin petrov <valentinp@nvidia.com>

* TL/CUDA: allgather(v) linear alg
* REVIEW: fix review comments
* added algorithm description
* fixed alignment
* fixed copyright
* REVIEW: apply clang format

Co-authored-by: valentin petrov <valentinp@nvidia.com>

* MC/CUDA: add uint16(32,64) support in reduce
* TEST: add CUDA reduce gtest with uint16(32,64) dt
* TEST: add reduce mpi tests with uint16(32,64) dt

Co-authored-by: valentin petrov <valentinp@nvidia.com>

bureddy and others added 30 commits July 24, 2021 18:27

MC: Fix build with profiling

1164813

Merge pull request #275 from bureddy/fix-build

1ccf7c3

MC: Fix build with profiling

TEST: Enabled UCC unit tests (#277)

b08d3e7

Topic/node hash fix (#279)

af1e11a

* UTIL: host hash avoid using gethostid()
* TL/UCP: fix ep hash for 64 bit node_hash
* CORE: check for proc info uniqueness
* TL/UCP: array ep storage

CORE: coll_init log (#267)

b6eb878

Update README.md

1a2da2a

TEST: W/A for DLRM installation (#283)

fc9ec0d

Signed-off-by: artemry-nv <artemry@nvidia.com>

TL/UCP: Adjust allgather, alltoall count (#259)

1af7491

Co-authored-by: Lior Paz <liorpa@mellanox.com>

TL/UCP: Fix bug in ep close

14ecd79

Merge pull request #287 from lappazos/Fix_ep_bug

73d827a

TL/UCP: Fix bug in ep close

UTIL: Add ucs machine id (#286)

040adc4

Co-authored-by: Lior Paz <liorpa@mellanox.com>

TEST: Removed W/A for ONNX installation (#285)

d62b98b

Signed-off-by: artemry-nv <artemry@nvidia.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>

DOCS: Adding UCF legal frontmatter (#289)

4827b29

TL/NCCL: add reduce scatter and reduce (#221)

7ab82ef

* TL/NCCL: add reduce scatter and reduce
* TL/NCCL: fix reduce scatter count

BUILD: fix nvcc configure (#292)

3be7e99

Topic/topo subgrouping (#266)

d6eb20d

* CORE: basic topo/subgrouping
* TEST: sbgp tests

TEST: Updated pytorch + W/A for DLRM installation (#282)

6de7a74

Signed-off-by: artemry-nv <artemry@nvidia.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>

UTIL: ucs_config_names_search signature (#300)

f252ee4

API: Add Average reduce operation (#295)

a2727d2

Co-authored-by: Lior Paz <liorpa@mellanox.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>

UTIL: proper fix for config_names_search (#301)

d027c9f

Re-implement internally to keep backward bin-compat.

API: Clarifying semantics for coll args flags and adding documentation (#273)

ee78a82

* API: Clarifying semantics for coll args flags
* API: Clarifying persistent flag semantic

CORE: change coll args rules

b27cc49

DOCS: update coll args table

b4c967b

Merge pull request #293 from Sergei-Lebedev/topic/coll_args_change

6be2727

Change coll args rules for allreduce, reduce_scatter and reduce

Topic/schedule pipelined (#276)

1ae0848

* SCHEDULE: pipelined schedule iface
* TL/UCP: pipelined sra
* TEST: gtest for sra pipelined
* TL/UCP: fix linter warn

TEST: add ompi/coll/ucc to github actions (#306)

a9951ce

API: coll timeout option (#307)

8f85d2a

TL/UCP: fix eps array storage (#304)

79f3f8f

valentin petrov and others added 27 commits June 14, 2022 22:14

TL/UCP: service bcast (#516)

82c0919

UTIL: new ucs profiler config (#535)

932462c

* UTIL: new ucs profiler config
* UTIL: code review fixes

UTIL: decrease debug level for proc_info (#537)

cbe7b03

UTIL: rcache_get_arg (#518)

a43fad1

BUILD: verbs/dv detection (#517)

17d0864

EC/ROCM: add support for rocm memory in tl/ucp (#528)

a541736

Co-authored-by: Edgar Gabriel <edgar.gabriel@amd.com>
Co-authored-by: Min Si <msi@fb.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>

CL/BASIC: fix score_map cleanup (#539)

6af46c1

CORE: ctx service team flag (#519)

a9f6022

BUILD: exit on cuda compile error (#543)

acd0f61

Co-authored-by: valentin petrov <valentinp@nvidia.com>

API: add float128 and float32(64,128)_complex dt (#492)

8afd34a

* API: add float128 and float32(64,128)_complex dt
* TEST: update mpi_tests with new dt
* TEST: update Gtest with new dt
* BUILD: check dt size during preprocessing

TL/RCCL: fix configure logic (#547)

ac13e5c

Explicitly disabling tl/rccl was broken due to misplaced parentheses in the rccl.m4 file.

TL/RCCL: Add active sets support to bcast (#550)

c70aff5

Co-authored-by: Wes Bland <wbland@fb.com>

TL/CUDA: fix exec task finalize (#548)

bf5e36b

TL/CUDA: fix uninit alltoall stage (#549)

4e58ed7

MC/ROCM: resync component adding complex datatypes (#552)

a98964c

This commit resyncs the ROCm components with the UCC code base. Specifically, it adds support (or disqualifies itself) for various float(32,64,128)_complex datatypes.

Co-authored-by: Sergey Lebedev <sergeyle@nvidia.com>

TL/UCP: fix executor status (#557)

dfcf806

TL/SHARP: fix sharp oob

3c00b91

DOCS: fix ucc.h inline doc

f0911c1

TL/RCCL: move event_destroy call (#551)

95f6773

Move the invocation of event_destroy() from the coll_finalize() function to the free_task() function to avoid leaving events in the mpool if a test is skipped because of an unsupported datatype.
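The reasoning behind this move can be illustrated with a small model. The sketch below is not the TL/RCCL code: it is a minimal, language-neutral illustration in Python, and the names `EventPool`, `Task`, and `run` are hypothetical. It assumes that a skipped collective releases its task through `free_task()` without ever reaching `coll_finalize()`, which is the situation the commit message describes.

```python
class EventPool:
    """Toy stand-in for a memory pool of events (hypothetical, not the UCC mpool)."""

    def __init__(self):
        self.outstanding = 0  # events handed out and not yet returned

    def get(self):
        self.outstanding += 1
        return object()

    def put(self, event):
        self.outstanding -= 1


class Task:
    """A collective task that owns one event for its whole lifetime."""

    def __init__(self, pool):
        self.pool = pool
        self.event = pool.get()


def free_task(task):
    # Every path that releases a task goes through here, so this is the
    # one place where destroying the event covers the skipped case too.
    if task.event is not None:
        task.pool.put(task.event)
        task.event = None


def coll_finalize(task):
    # Normal completion path: no event cleanup of its own, it only
    # hands the task back.
    free_task(task)


def run(pool, dt_supported):
    """Create a task, then either skip it or complete it."""
    task = Task(pool)
    if not dt_supported:
        # Skipped because of an unsupported datatype: coll_finalize()
        # is never called, only free_task().
        free_task(task)
        return "skipped"
    coll_finalize(task)
    return "done"
```

In this model, destroying the event inside `coll_finalize()` instead would leave one outstanding event in the pool for every skipped task, which matches the leak the commit message reports.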

TL/SELF: fix lib attr flags (#564)

a9129dc

The attr flags are not initialized. Context for these TLs is not created since the core context assumes a service team is required.

TL/SELF: fix linter warnings (#562)

506c5af

Co-authored-by: valentin petrov <valentinp@nvidia.com>

TL/CUDA: check cuda device in coll init (#546)

8b4462f

Co-authored-by: valentin petrov <valentinp@nvidia.com>

CORE: fix spelling (#563)

8875a10

Co-authored-by: valentin petrov <valentinp@nvidia.com>

TL/CUDA: multicopy allgatherv (#544)

a7bda27

BUILD: return error for not implemented functions (#555)

b374808

Co-authored-by: valentin petrov <valentinp@nvidia.com>

TL/CUDA: allgather(v) linear alg (#561)

786275c

* TL/CUDA: allgather(v) linear alg
* REVIEW: fix review comments
* added algorithm description
* fixed alignment
* fixed copyright
* REVIEW: apply clang format

Co-authored-by: valentin petrov <valentinp@nvidia.com>

MC/CUDA: add uint16(32,64) support in reduce (#565)

5cf5815

* MC/CUDA: add uint16(32,64) support in reduce
* TEST: add CUDA reduce gtest with uint16(32,64) dt
* TEST: add reduce mpi tests with uint16(32,64) dt

Co-authored-by: valentin petrov <valentinp@nvidia.com>

facebook-github-bot added cla signed module: rocm labels Jul 20, 2022

kingchc merged commit 531a02b into facebookresearch:master Jul 20, 2022

This pull request was closed.
