Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for fp16 in CAGRA and IVF-PQ #2085

Merged
merged 22 commits into from
Jan 19, 2024

Conversation

achirkin
Copy link
Contributor

@achirkin achirkin commented Jan 10, 2024

Add fp16 (CUDA half) support to CAGRA and its dependencies.

@achirkin achirkin added feature request New feature or request non-breaking Non-breaking change 2 - In Progress Currenty a work in progress labels Jan 10, 2024
@achirkin achirkin self-assigned this Jan 10, 2024
@achirkin
Copy link
Contributor Author

NB: this does not add the fp16 capabilities to the ann-bench executable. I suggest we add that in a follow-on PR.

@achirkin achirkin marked this pull request as ready for review January 11, 2024 17:02
@achirkin achirkin requested review from a team as code owners January 11, 2024 17:02
@achirkin achirkin added 3 - Ready for Review and removed 2 - In Progress Currenty a work in progress labels Jan 11, 2024
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @achirkin for this PR!

We have here a large amount of boilerplate code. Fortunately the nontrivial changes are relatively small, and are confined to

  • mdspan_numpy_serializer.hpp
  • device_load_stores.cuh
  • test/neighbors/ann_cagra.cuh

The PR looks good to me!

cpp/test/neighbors/ann_cagra.cuh Outdated Show resolved Hide resolved
cpp/include/raft/util/device_loads_stores.cuh Show resolved Hide resolved
@@ -148,6 +149,26 @@ DI void sts(int32_t* addr, const int32_t (&x)[4])
: "l"(s4), "r"(x[0]), "r"(x[1]), "r"(x[2]), "r"(x[3]));
}

DI void sts(half* addr, const half& x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @mdoijade to have a look at the changes in this file, since the load and store ops here are mostly used by IVF-Flat and contractions.cuh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additions to device_loads_stores.cuh looks good to me, I agree it is good to have matching sts function call for lds for larger fp16 vector sizes.

cpp/test/neighbors/ann_cagra.cuh Outdated Show resolved Hide resolved
cpp/test/neighbors/ann_cagra.cuh Outdated Show resolved Hide resolved
cpp/test/neighbors/ann_cagra.cuh Outdated Show resolved Hide resolved
achirkin and others added 6 commits January 18, 2024 09:52
Co-authored-by: tsuki <12711693+enp1s0@users.noreply.github.com>
Co-authored-by: tsuki <12711693+enp1s0@users.noreply.github.com>
Co-authored-by: tsuki <12711693+enp1s0@users.noreply.github.com>
@achirkin
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 72f48ae into rapidsai:branch-24.02 Jan 19, 2024
61 checks passed
cjnolet added a commit to cjnolet/raft that referenced this pull request Feb 8, 2024
raydouglass pushed a commit that referenced this pull request Feb 9, 2024
RAFT C++ tests were not running for a portion of the 24.02 development cycle, until the merger of rapidsai/rapids-cmake#533.

This PR fixes some failing tests and reverts PRs that caused test failures that were silent until now, specifically #2097 and #2085. These features will be revisited in a subsequent release.

Authors:
   - Malte Förster (https://github.com/mfoerste4)
   - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
   - Ben Frederickson (https://github.com/benfred)
   - Bradley Dice (https://github.com/bdice)
achirkin added a commit to achirkin/raft that referenced this pull request Feb 11, 2024
Add fp16 (CUDA half) support to CAGRA and its dependencies.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - tsuki (https://github.com/enp1s0)

URL: rapidsai#2085
rapids-bot bot pushed a commit that referenced this pull request Feb 13, 2024
1. Add fp16 (CUDA half) support to CAGRA and its dependencies (#2085).
2. Fix the shared memory size error in the ivf-flat that got exposed by new tests in #2085.

Regarding the point (2):
Warp-sort top-k queue uses shared memory; the module provides the required shmem size calculation function decoupled from the queue object itself. As a result, it's easy to plug-in wrong types and get the calculation incorrectly.
IVF-Flat scan kernel always kept the distances in the queue as floats, but we calculated the shmem size as if it used `AccT` (IVF-Flat's internal accumulation type). Hence, with adding the tests with fp16 inputs (and `AccT`), the allocated shmem became too small, which resulted in memory access violation errors.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Ben Frederickson (https://github.com/benfred)

URL: #2172
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

4 participants