Branch 23.02 merge 22.12 #1051

benfred · 2022-11-28T21:32:05Z

PR to unblock forward merge #1018 -

This change adds a pre-commit hook to run on each commit, that will automatically run code linters and not allow you to commit changes if the linters fail. This is similar to how cudf uses pre-commit, as described in rapidsai/cudf#9309 . The pre-commit checks will also be run in CI as part of the check style script. This change also adds several new linters that weren't being run in CI before: * pydocstyle * mypy * black * isort * cmake-format * cmake-lint Of these new linters - mypy caught an issue where the test_comms.py would always be skipped (`python/raft-dask/raft_dask/test/test_comms.py:23: error: Module "raft_dask" has no attribute "Comms"` ), and Pydocstyle caught some minor formatting inconsistencies in the python docstrings. black/isort ensure consistent code formatting for the python raft code Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) - Ray Douglass (https://github.com/raydouglass) - Bradley Dice (https://github.com/bdice) URL: rapidsai#965

Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#1006

This prevents requiring full specification for all template parameters in downstream `make_mdspan` usage. A more complete discussion of how we want to use `memory_type` in the public API can be deferred to a later PR. Authors: - William Hicks (https://github.com/wphicks) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1005

Similar to rapidsai/cudf#12097, this adds codespell as a linter to raft - and fixes the spelling mistakes it found Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Mark Sadang (https://github.com/msadang) URL: rapidsai#1007

Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#995

Re-enable copy_prs ops-bot plugin

…i#1008) This allows the `pylibraft` APIs to process outputs in place or allocate and return new device memory for outputs in a way which benefits from `__cuda_array_interface__` compliance. Ultimatley, it allows us to stay decoupled from libraries like cupy while still affording the user the benefits of not having to explicitly allocate outputs up front. Closes rapidsai#1002 Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) - Divye Gala (https://github.com/divyegala) URL: rapidsai#1008

This PR adds python wrappers to IVF-PQ. Authors: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#970

Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1020

This PR pins `dask` and `distributed` to `2022.11.0` for `22.12` release. xref: rapidsai/cudf#12165 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#1022

Add an indexing parameter `adaptive_centers` (false by default) to control whether `index.centers()` should be kept up-to-date with the cluster data or remain immutable. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1019

…r higher (rapidsai#939) -- 3xTF32 cutlass based L2 exp/cosine kernel provides 3.5x speedup for fp32 inputs compared to existing pairwise distance kernel for ampere or higher. -- DMMA cutlass based implementation for L2 exp/cosine provides 2.6x speedup instead of existing double precision FMA pipeline based kernel. -- add cutlass as header only dependency to RAFT. Authors: - Mahesh Doijade (https://github.com/mdoijade) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Robert Maynard (https://github.com/robertmaynard) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#939

Reverts rapidsai#1022 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#1023

…rim and improve performance for thin matrices (small `n_cols`) (rapidsai#979) This follows up on a discussion at rapidsai#652 (comment). The main goal of this PR is to make this helper accessible as a raft primitive. I also used the opportunity to look at the performance of this primitive, and have improved it for: - Thin matrices: less than 32 threads per row with shuffle-based reductions. - Thick matrices: cub-based reduction doing one row per block. Here is an overview of the before/after performance on A100: ![2022-11-11_normalize_perf_float_int32](https://user-images.githubusercontent.com/17441062/201403965-bf68d368-b64b-4a1f-92f0-a5de03b9d1a8.png) Authors: - Louis Sugy (https://github.com/Nyrio) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) - Artem M. Chirkin (https://github.com/achirkin) URL: rapidsai#979

A few optimizations to the `ivfpq_compute_similarity_kernel`: - Overhauled the way shmem/L1 carveout is selected - Introduced the block size selection logic based on the shmem/L1 split, occupancy, and the estimated cluster probes co-residency - Ported a new warp-sort module (`warp_sort_distributed`) - Transposed `pq_centers` to make loads coalesced - Changed layout of `pq_dataset` to make loads coalesced and vectorized - Optimized the loops to minimize ALU load Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#926

Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) URL: rapidsai#1027

Similar to rapidsai/cuml#4985, this PR changes the docs theme for `raft` to be in-line with rest of the rapids docs theme. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#1026

PR rapidsai#939 introduced CUTLASS dependency. When compiled in debug mode, this leads to the following error: ``` ptxas error : Stack size for entry function '_ZN12raft_cutlass6KernelINS_...' cannot be statically determined ``` This would be normally just a warning, but we treat warnings as errors. This PR disables the warning in Debug mode. Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1033

…apidsai#1029) Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#1029

Fix some of the easier deprecated headers, leftovers from past refactorings. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1034

Use raft handle's lazy-loading helper `get_device_properties` instead of explicitly calling `cudaGetDeviceProperties` on every kernel launch, which is a costly operation. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#1035

…#1032) This PR removes the dlopen logic for libucp in ucp_helper.hpp in favor of calling the relevant APIs directly. It also adds a new CMake component `raft::distributed` that can be used by dependent libraries to indicate the dependency on parts of raft that require UCX. While it does not change any public APIs, I have marked this PR as breaking since it does mean that any C++ code linking to UCX must now ensure that UCX is available at link time. It is no longer sufficient to make the library available at runtime. Resolves rapidsai#1031. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1032

Add cython bindings for the cluster_cost function, to allow computing inertia from python. Closes rapidsai#972 Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1028

…ion` performance (rapidsai#1011) `dots_along_rows` in `ann_utils.cuh` was in some cases more performant than the corresponding raft primitive `rowNorm`, so I have improved that primitive in order to replace `dots_along_rows` without performance regressions. `rowNorm` for a row-major matrix calls `coalescedReduction`, which I have modified to conditionally select one of the following code paths based on the input dimensions: - Thin: for matrices with many small rows, one block processes multiple rows, with 2 to 32 threads collaborating on each row using a shuffle-based reduction. - Medium: the existing cub-based implementation with one block per row (I have only changed the reduction algorithm to raking which is more performant provided that the workload is big enough) - Thick: two-step implementation. In the first step, multiple blocks per row reducing to an intermediate buffer (`main_op` is applied but not `final_op`). In the second step, reduces the intermediate buffer using the thin kernel (this time `final_op` is applied but not `main_op`). Other changes included in this PR: - In order to properly support shuffle-based reductions, I have added generic shuffle helpers that support arbitrary types by cutting them into chunks (based on size/alignment). This was adapted from similar helpers in CUB. - I have added a helper for "logical" warp reduction, i.e sub-warps of 2, 4, 8, 16 or 32 threads, and added support for arbitrary reduction operations in the warp reduction. - I have consolidated tests with support for arbitrary types and operations and tested some operations that in particular use the index argument of `main_op` such as an argmax, and only for the coalesced reduction I have added test cases with `raft::KeyValuePair` Authors: - Louis Sugy (https://github.com/Nyrio) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#1011

… x and y (rapidsai#1040) Solves rapidsai#1036 Even when computing a sum of squares, the distance from a point to itself can apparently be `-0.0` in which case the square root is `nan` and comparisons are broken. Authors: - Louis Sugy (https://github.com/Nyrio) Approvers: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1040

README tweaks: * Add a resources section with links to the generated HTML documentation * Add a build status badge * Add a section about installing with the new experimental pip packages Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1042

This PR enables building wheels for pylibraft and raft-dask. Authors: - Vyas Ramasubramani (https://github.com/vyasr) - Sevag H (https://github.com/sevagh) - Paul Taylor (https://github.com/trxcllnt) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#1013

This PR implements refinement for approximate nearest neighbor search. Refinement is a post processing step for ANN search, it follows an ANN search that returned `k0` neighbor candidates, and select `k` out of these candidates. The selection by calculating exact distances from the original dataset. Refinement can increase accuracy. It is useful for ANN methods that quantize the dataset and therefore loose accuracy during distance calculation (e.g. IVF-PQ). Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1038

Add an extra check for the alignment of the input matrices to avoid misaligned address errors. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1045

Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#1030

vyasr · 2022-11-29T18:19:21Z

This PR is massive. Have the 22.12->23.02 merges been off for a long time, or were they blocked by the pre-commit linters run causing merge conflicts (I see that's the first unmerged commit)?

benfred · 2022-11-29T18:49:06Z

This PR is massive. Have the 22.12->23.02 merges been off for a long time, or were they blocked by the pre-commit linters run causing merge conflicts (I see that's the first unmerged commit)?

Yeah - there was a merge conflict because of the pre-commit linters. That commit modified the cpp/CMakeLists.txt file (with auto formatting cmake), which conflicted with the change to bump the version number for 23.02 =(.

All I did in this PR is follow the steps here https://docs.rapids.ai/maintainers/gpuci/#forward-mergers , and resolve the merge conflict

vyasr · 2022-11-29T21:26:07Z

Yup I'm sure you did the right thing with the forward merger, I was just a little concerned since the size of the PR makes it almost impossible to verify the correctness of the merge (at least for me, but I'm also no raft expert).

msadang

LGTM

benfred and others added 30 commits November 10, 2022 19:04

Moving contributing and developer guides to main docs (rapidsai#1006)

ee8ab81

Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#1006

Changing Overloads for GCC 11/12 bug (rapidsai#995)

3d9f4b7

Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#995

Re-enable copy_prs ops-bot plugin

24adae1

Merge pull request rapidsai#1014 from sevagh/feat/enable-copy-prs

1d75bfd

Re-enable copy_prs ops-bot plugin

IVF-PQ Python wrappers (rapidsai#970)

355f693

This PR adds python wrappers to IVF-PQ. Authors: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#970

Fix for gemmi deprecation (rapidsai#1020)

e9d0944

Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1020

Revert " Pin dask and distributed for release" (rapidsai#1023)

f755fd8

Reverts rapidsai#1022 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#1023

Adding lightweight cai_wrapper to reduce boilerplate (rapidsai#1027)

f6e5226

Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) URL: rapidsai#1027

Fix trivial deprecated header includes (rapidsai#1034)

f2b8ae0

Fix some of the easier deprecated headers, leftovers from past refactorings. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1034

fusedL2NN: Add input alignment checks (rapidsai#1045)

8162565

Add an extra check for the alignment of the input matrices to avoid misaligned address errors. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1045

cjnolet and others added 2 commits November 24, 2022 01:38

Fixing broken doc functions and improving coverage (rapidsai#1030)

11c5105

Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#1030

Merge branch-22.12 into branch-23.02

abf256d

benfred requested review from a team as code owners November 28, 2022 21:32

github-actions bot added CMake cpp gpuCI python labels Nov 28, 2022

benfred added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Nov 28, 2022

benfred mentioned this pull request Nov 28, 2022

Add python bindings for kmeans fit #1016

Merged

msadang approved these changes Nov 29, 2022

View reviewed changes

cjnolet approved these changes Nov 29, 2022

View reviewed changes

raydouglass merged commit 4eb0a80 into rapidsai:branch-23.02 Nov 29, 2022

benfred deleted the branch-23.02-merge-22.12 branch November 29, 2022 21:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Branch 23.02 merge 22.12 #1051

Branch 23.02 merge 22.12 #1051

benfred commented Nov 28, 2022

vyasr commented Nov 29, 2022

benfred commented Nov 29, 2022

vyasr commented Nov 29, 2022

msadang left a comment

Branch 23.02 merge 22.12 #1051

Branch 23.02 merge 22.12 #1051

Conversation

benfred commented Nov 28, 2022

vyasr commented Nov 29, 2022

benfred commented Nov 29, 2022

vyasr commented Nov 29, 2022

msadang left a comment

Choose a reason for hiding this comment