forked from openucx/ucc
TEST: sync up upstream #1
Merged
MC: Fix build with profiling
* UTIL: host hash avoid using gethostid()
* TL/UCP: fix ep hash for 64 bit node_hash
* CORE: check for proc info uniqueness
* TL/UCP: array ep storage
Signed-off-by: artemry-nv <artemry@nvidia.com>
Co-authored-by: Lior Paz <liorpa@mellanox.com>
TL/UCP: Fix bug in ep close
Co-authored-by: Lior Paz <liorpa@mellanox.com>
Signed-off-by: artemry-nv <artemry@nvidia.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>
* TL/NCCL: add reduce scatter and reduce
* TL/NCCL: fix reduce scatter count
* CORE: basic topo/subgrouping
* TEST: sbgp tests
Signed-off-by: artemry-nv <artemry@nvidia.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>
* UCP: Implementing Reduce Knomial
* CORE: Fixed check coll type for reduce case
* REVIEW: Fixes to first code review
* REVIEW: Fixes to second code review
* REVIEW: Fixes to third code review

Co-authored-by: valentin petrov <valentinp@nvidia.com>
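For readers unfamiliar with the term, a k-nomial reduce combines values up a tree in which each surviving rank absorbs up to `radix - 1` peers per round. The following is a minimal single-process simulation of that general pattern, not the TL/UCP implementation from this commit; the function name, the fixed root at rank 0, and the list-based "ranks" are all assumptions made for illustration.

```python
def knomial_reduce(values, radix=2, op=lambda a, b: a + b):
    """Simulate a k-nomial tree reduce to rank 0 (hypothetical helper).

    values[i] stands in for the local contribution of rank i.
    Returns (result_at_root, number_of_rounds).
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    acc = list(values)
    n = len(acc)
    if n == 0:
        raise ValueError("need at least one rank")

    rounds = 0
    dist = 1  # distance between a parent and its first child in this round
    while dist < n:
        group = dist * radix
        # Ranks that are multiples of `group` stay active and receive.
        for parent in range(0, n, group):
            # Children are visited in rank order, so a non-commutative
            # (but associative) op still sees operands in rank order.
            for j in range(1, radix):
                child = parent + j * dist
                if child < n:
                    acc[parent] = op(acc[parent], acc[child])
        dist = group
        rounds += 1
    return acc[0], rounds
```

With 7 ranks and radix 3 the simulation finishes in two rounds, whereas radix 2 would need three; a larger radix trades fewer rounds for more messages received per parent in each round.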
Co-authored-by: Lior Paz <liorpa@mellanox.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>
Re-implement internally to keep backward bin-compat.
* CORE: optimize team create
  Don't perform service_team creation or team_id allocation if it is not required by CL/TLs
* UTIL: service colls
  - Adds service allgather tl iface
  - Adds ucc_service_coll convenience layer in core
  - Adds option UCC_INTERNAL_OOB
* TEST: service coll tests
* CODESTYLE: apply clang
* CORE: fixes service coll map usage
* CORE: ucc_team_subset_t to common place
* CORE: service coll progress fix
* TL/UCP: service ag fix
  Fix after rebase on top of "count" definition change
* API: oob_ep field to oob
#273)
* API: Clarifying semantics for coll args flags
* API: Clarifying persistent flag semantic
Change coll args rules for allreduce, reduce_scatter and reduce
* SCHEDULE: pipelined schedule iface
* TL/UCP: pipelined sra
* TEST: gtest for sra pipelined
* TL/UCP: fix linter warn
* UTIL: new ucs profiler config
* UTIL: code review fixes
Co-authored-by: Edgar Gabriel <edgar.gabriel@amd.com>
Co-authored-by: Min Si <msi@fb.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>
* API: add float128 and float32(64,128)_complex dt
* TEST: update mpi_tests with new dt
* TEST: update Gtest with new dt
* BUILD: check dt size during preprocessing
Explicitly disabling tl/rccl was broken due to misplaced parentheses in the rccl.m4 file.
Co-authored-by: Wes Bland <wbland@fb.com>
This commit resyncs the ROCm components with the UCC code base. Specifically, it adds support (or disqualifies itself) for the various float(32,64,128)_complex datatypes.

Co-authored-by: Sergey Lebedev <sergeyle@nvidia.com>
Move the invocation of event_destroy() from the coll_finalize() function to the free_task() function, so that events are not left in the mpool when a test is skipped because of an unsupported datatype.
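The reasoning behind this change can be modelled with a small sketch. It assumes, as the message implies, that a skipped test never reaches the finalize step but always frees its task; the `EventPool`, `Task`, and `run_test` names are hypothetical stand-ins, and only `coll_finalize` and `free_task` echo names from the commit message. This is an illustration of the ownership pattern, not the actual test code.

```python
class EventPool:
    """Hypothetical stand-in for the mpool that hands out events."""

    def __init__(self):
        self.outstanding = 0

    def get(self):
        self.outstanding += 1
        return object()

    def put(self, event):
        self.outstanding -= 1


class Task:
    """Hypothetical task that acquires an event when it is created."""

    def __init__(self, pool):
        self.pool = pool
        self.event = pool.get()


def coll_finalize(task):
    # After the change, finalize no longer releases the event.
    pass


def free_task(task):
    # The event is released here, on the path every task goes through.
    if task.event is not None:
        task.pool.put(task.event)
        task.event = None


def run_test(pool, dtype_supported):
    task = Task(pool)
    try:
        if not dtype_supported:
            return "skipped"  # finalize is never reached on this path
        coll_finalize(task)
        return "ran"
    finally:
        free_task(task)
```

Tying the release to the function that every task passes through means the pool's outstanding count returns to zero on both the skipped and the completed path.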
attr flags are not initialized. Context for these TLs is not created since core context assumes service team is required.
Co-authored-by: valentin petrov <valentinp@nvidia.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>
Co-authored-by: valentin petrov <valentinp@nvidia.com>
* TL/CUDA: allgather(v) linear alg
* REVIEW: fix review comments
* added algorithm description
* fixed alignment
* fixed copyright
* REVIEW: apply clang format

Co-authored-by: valentin petrov <valentinp@nvidia.com>
* MC/CUDA: add uint16(32,64) support in reduce
* TEST: add CUDA reduce gtest with uint16(32,64) dt
* TEST: add reduce mpi tests with uint16(32,64) dt

Co-authored-by: valentin petrov <valentinp@nvidia.com>
sync up upstream UCC