Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Branch 23.02 merge 22.12 #1051

Merged
merged 32 commits into from
Nov 29, 2022

Conversation

benfred
Copy link
Member

@benfred benfred commented Nov 28, 2022

PR to unblock forward merge #1018 -

benfred and others added 30 commits November 10, 2022 19:04
This change adds a pre-commit hook to run on each commit, that will automatically run code linters and not allow you 
to commit changes if the linters fail. This is similar to how cudf uses pre-commit, as described in rapidsai/cudf#9309 . The pre-commit checks will also be run in CI as part of the check style script.

This change also adds several new linters that weren't being run in CI before:
* pydocstyle
* mypy
* black
* isort
* cmake-format
* cmake-lint

Of these new linters - mypy caught an issue where the test_comms.py would always be skipped (`python/raft-dask/raft_dask/test/test_comms.py:23: error: Module "raft_dask" has no attribute "Comms"` ), and Pydocstyle caught some minor formatting inconsistencies in the python docstrings. black/isort ensure consistent code formatting for the python raft code

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Ray Douglass (https://github.com/raydouglass)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#965
This prevents requiring full specification for all template parameters in downstream `make_mdspan` usage. A more complete discussion of how we want to use `memory_type` in the public API can be deferred to a later PR.

Authors:
  - William Hicks (https://github.com/wphicks)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1005
Similar to rapidsai/cudf#12097, this adds codespell as a linter to raft - and fixes the spelling mistakes it found

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Mark Sadang (https://github.com/msadang)

URL: rapidsai#1007
…i#1008)

This allows the `pylibraft` APIs to process outputs in place or allocate and return new device memory for outputs in a way which benefits from `__cuda_array_interface__` compliance. Ultimatley, it allows us to stay decoupled from libraries like cupy while still affording the user the benefits of not having to explicitly allocate outputs up front. 

Closes rapidsai#1002

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Ben Frederickson (https://github.com/benfred)
  - Divye Gala (https://github.com/divyegala)

URL: rapidsai#1008
This PR adds python wrappers to IVF-PQ.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#970
This PR pins `dask` and `distributed` to `2022.11.0` for `22.12` release.
xref: rapidsai/cudf#12165

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#1022
Add an indexing parameter `adaptive_centers` (false by default) to control whether `index.centers()` should be kept up-to-date with the cluster data or remain immutable.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1019
…r higher (rapidsai#939)

-- 3xTF32 cutlass based L2 exp/cosine kernel provides 3.5x speedup for fp32 inputs compared to existing pairwise distance kernel for ampere or higher.
-- DMMA cutlass based implementation for L2 exp/cosine provides 2.6x speedup instead of existing double precision FMA pipeline based kernel.
-- add cutlass as header only dependency to RAFT.

Authors:
  - Mahesh Doijade (https://github.com/mdoijade)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Robert Maynard (https://github.com/robertmaynard)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#939
…rim and improve performance for thin matrices (small `n_cols`) (rapidsai#979)

This follows up on a discussion at rapidsai#652 (comment). The main goal of this PR is to make this helper accessible as a raft primitive.

I also used the opportunity to look at the performance of this primitive, and have improved it for:

- Thin matrices: less than 32 threads per row with shuffle-based reductions.
- Thick matrices: cub-based reduction doing one row per block.

Here is an overview of the before/after performance on A100:

![2022-11-11_normalize_perf_float_int32](https://user-images.githubusercontent.com/17441062/201403965-bf68d368-b64b-4a1f-92f0-a5de03b9d1a8.png)

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Artem M. Chirkin (https://github.com/achirkin)

URL: rapidsai#979
A few optimizations to the `ivfpq_compute_similarity_kernel`:

  - Overhauled the way shmem/L1 carveout is selected
  - Introduced the block size selection logic based on the shmem/L1 split, occupancy, and the estimated cluster probes co-residency
  - Ported a new warp-sort module (`warp_sort_distributed`)
  - Transposed `pq_centers` to make loads coalesced
  - Changed layout of `pq_dataset` to make loads coalesced and vectorized
  - Optimized the loops to minimize ALU load

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#926
Similar to rapidsai/cuml#4985, this PR changes the docs theme for `raft` to be in-line with rest of the rapids docs theme.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#1026
PR rapidsai#939 introduced CUTLASS dependency. When compiled in debug mode, this leads to the following error:

```
ptxas error   : Stack size for entry function '_ZN12raft_cutlass6KernelINS_...' cannot be statically determined
```

This would be normally just a warning, but we treat warnings as errors. This PR disables the warning in Debug mode.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1033
…apidsai#1029)

Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#1029
Fix some of the easier deprecated headers, leftovers from past refactorings.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1034
Use raft handle's lazy-loading helper `get_device_properties` instead of explicitly calling `cudaGetDeviceProperties` on every kernel launch, which is a costly operation.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#1035
…#1032)

This PR removes the dlopen logic for libucp in ucp_helper.hpp in favor of calling the relevant APIs directly. It also adds a new CMake component `raft::distributed` that can be used by dependent libraries to indicate the dependency on parts of raft that require UCX.

While it does not change any public APIs, I have marked this PR as breaking since it does mean that any C++ code linking to UCX must now ensure that UCX is available at link time. It is no longer sufficient to make the library available at runtime.

Resolves rapidsai#1031.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1032
Add cython bindings for the cluster_cost function, to allow computing inertia from python.

Closes rapidsai#972

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1028
…ion` performance (rapidsai#1011)

`dots_along_rows` in `ann_utils.cuh` was in some cases more performant than the corresponding raft primitive `rowNorm`, so I have improved that primitive in order to replace `dots_along_rows` without performance regressions. `rowNorm` for a row-major matrix calls `coalescedReduction`, which I have modified to conditionally select one of the following code paths based on the input dimensions:

- Thin: for matrices with many small rows, one block processes multiple rows, with 2 to 32 threads collaborating on each row using a shuffle-based reduction.
- Medium: the existing cub-based implementation with one block per row (I have only changed the reduction algorithm to raking which is more performant provided that the workload is big enough)
- Thick: two-step implementation. In the first step, multiple blocks per row reducing to an intermediate buffer (`main_op` is applied but not `final_op`). In the second step, reduces the intermediate buffer using the thin kernel (this time `final_op` is applied but not `main_op`).

Other changes included in this PR:

- In order to properly support shuffle-based reductions, I have added generic shuffle helpers that support arbitrary types by cutting them into chunks (based on size/alignment). This was adapted from similar helpers in CUB.
- I have added a helper for "logical" warp reduction, i.e sub-warps of 2, 4, 8, 16 or 32 threads, and added support for arbitrary reduction operations in the warp reduction.
- I have consolidated tests with support for arbitrary types and operations and tested some operations that in particular use the index argument of `main_op` such as an argmax, and only for the coalesced reduction I have added test cases with `raft::KeyValuePair`

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#1011
… x and y (rapidsai#1040)

Solves rapidsai#1036 

Even when computing a sum of squares, the distance from a point to itself can apparently be `-0.0` in which case the square root is `nan` and comparisons are broken.

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Ben Frederickson (https://github.com/benfred)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1040
README tweaks:

* Add a resources section with links to the generated HTML documentation
* Add a build status badge
* Add a section about installing with the new experimental pip packages

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1042
This PR enables building wheels for pylibraft and raft-dask.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Sevag H (https://github.com/sevagh)
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#1013
This PR implements refinement for approximate nearest neighbor search.

Refinement is a post processing step for ANN search, it follows an ANN search that returned `k0` neighbor candidates,
and select `k` out of these candidates. The selection by calculating exact distances from the original dataset.

Refinement can increase accuracy. It is useful for ANN methods that quantize the dataset and therefore loose accuracy during distance calculation (e.g. IVF-PQ).

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Artem M. Chirkin (https://github.com/achirkin)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1038
Add an extra check for the alignment of the input matrices to avoid misaligned address errors.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1045
@benfred benfred requested review from a team as code owners November 28, 2022 21:32
@benfred benfred added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Nov 28, 2022
@vyasr
Copy link
Contributor

vyasr commented Nov 29, 2022

This PR is massive. Have the 22.12->23.02 merges been off for a long time, or were they blocked by the pre-commit linters run causing merge conflicts (I see that's the first unmerged commit)?

@benfred
Copy link
Member Author

benfred commented Nov 29, 2022

This PR is massive. Have the 22.12->23.02 merges been off for a long time, or were they blocked by the pre-commit linters run causing merge conflicts (I see that's the first unmerged commit)?

Yeah - there was a merge conflict because of the pre-commit linters. That commit modified the cpp/CMakeLists.txt file (with auto formatting cmake), which conflicted with the change to bump the version number for 23.02 =(.

All I did in this PR is follow the steps here https://docs.rapids.ai/maintainers/gpuci/#forward-mergers , and resolve the merge conflict

@vyasr
Copy link
Contributor

vyasr commented Nov 29, 2022

Yup I'm sure you did the right thing with the forward merger, I was just a little concerned since the size of the PR makes it almost impossible to verify the correctness of the merge (at least for me, but I'm also no raft expert).

Copy link
Contributor

@msadang msadang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@raydouglass raydouglass merged commit 4eb0a80 into rapidsai:branch-23.02 Nov 29, 2022
@benfred benfred deleted the branch-23.02-merge-22.12 branch November 29, 2022 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp gpuCI improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
Development

Successfully merging this pull request may close these issues.

None yet