Support numeric, boolean, and string keyword arguments to class methods during CPU dispatching #5223

beckernick · 2023-02-09T21:58:02Z

This PR:

- Updates the CPU dispatching logic to support keyword arguments that are numeric, boolean, or string values (i.e., things that can't be coerced to cuml arrays). It doesn't support passing sequences, as I believe this use case isn't necessary.
- Makes a minor update to the tests to exercise this dispatching in principle. We may want to expand the testing of keyword arguments in general, but as that is likely worth a broader discussion and test expansion, I thought it might be out of scope for this small PR.
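The rule described above can be sketched in plain Python. This is only an illustration, not cuML's actual dispatch code: `prepare_kwargs_for_cpu`, `FakeDeviceArray`, and the `to_host` callback are hypothetical names, and recognizing device arrays through `__cuda_array_interface__` is an assumption made for the example.

```python
import numbers


class FakeDeviceArray:
    """Hypothetical stand-in for a GPU array (not a cuML type)."""

    def __init__(self, values):
        self.values = list(values)
        # The sketch recognizes device arrays by the presence of this attribute.
        self.__cuda_array_interface__ = {"shape": (len(self.values),)}


def prepare_kwargs_for_cpu(kwargs, to_host):
    """Return a copy of kwargs that is safe to hand to a CPU method.

    to_host is a caller-supplied function that copies a device array to host
    memory (hypothetical; it stands in for whatever conversion is used).
    """
    converted = {}
    for name, value in kwargs.items():
        if isinstance(value, (bool, numbers.Number, str)):
            # Scalars and strings can't be coerced to arrays: pass them through.
            converted[name] = value
        elif hasattr(value, "__cuda_array_interface__"):
            # Array-like device data is moved to the host.
            converted[name] = to_host(value)
        else:
            # Sequences and other types are deliberately unsupported.
            raise ValueError(
                f"Unsupported type for keyword argument {name!r}: "
                f"{type(value).__name__}"
            )
    return converted


if __name__ == "__main__":
    kwargs = {"alpha": 0.5, "verbose": True, "metric": "l2",
              "weights": FakeDeviceArray([1.0, 2.0])}
    print(prepare_kwargs_for_cpu(kwargs, to_host=lambda arr: arr.values))
```

Raising on unknown types, rather than passing them through silently, mirrors the "add else valueerror" commit in this PR; the exact error message here is invented.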

Fix static storage error:

```
/usr/bin/ld: bench/CMakeFiles/sg_benchmark.dir/sg/arima_loglikelihood.cu.o: in function `ML::Bench::Fixture::SetUp(benchmark::State const&)':
tmpxft_0000bc8b_00000000-6_arima_loglikelihood.cudafe1.cpp:(.text._ZN2ML5Bench7Fixture5SetUpERKN9benchmark5StateE[_ZN2ML5Bench7Fixture5SetUpERKN9benchmark5StateE]+0x2d): undefined reference to `ML::Bench::Fixture::NumStreams'
```

Authors: - Jiaming Yuan (https://github.com/trivialfis) Approvers: - Victor Lafargue (https://github.com/viclafargue) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4766

RAPIDS recently bumped `xgboost` from `1.5.2` to `1.6.0` in rapidsai/integration#487; this PR adapts to that change. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4777

This PR updates outdated raft pinnings in the dev yml files. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Thejaswi. N. S (https://github.com/teju85) - Ray Douglass (https://github.com/raydouglass) - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4778

Changes to be in line with: rapidsai/cudf#11058 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4771

…#4770) Resolves rapidsai#4442 This PR fixes the issue with using mixed data types in regression errors like `mean_squared_error`, `mean_absolute_error` and `mean_squared_log_error`. Authors: - Shaswat Anand (https://github.com/shaswat-indian) Approvers: - William Hicks (https://github.com/wphicks) URL: rapidsai#4770

…th a ColumnTransformer step (rapidsai#4774) This PR fixes a subtle bug in `check_array` of `cuml.thirdparty_adapters.adapters`, which is the primary cause of the issue. Fix rapidsai#4368. Authors: - https://github.com/VamsiTallam95 - Ray Douglass (https://github.com/raydouglass) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4774

## Description

This PR cleans up some `#include`s for Thrust. This is meant to help ease the transition to Thrust 1.17 when that is updated in rapids-cmake.

## Context

I opened a PR rapidsai/cudf#10489 that updates cuDF to Thrust 1.16. Notably, Thrust reduced the number of internal header inclusions:

> [rapidsai#1572](NVIDIA/thrust#1572) Removed several unnecessary header includes. Downstream projects may need to update their includes if they were relying on this behavior.

I spoke with @robertmaynard and he recommended making similar changes to clean up includes ("include what we use," in essence) to make sure we have compatibility with future versions of Thrust across all RAPIDS libraries. This changeset also removes dependence on `thrust/detail` headers.

Authors: - Bradley Dice (https://github.com/bdice) Approvers: - William Hicks (https://github.com/wphicks) URL: rapidsai#4675

closes rapidsai#4210 Added cosine distance metric for computing epsilon neighborhood in DBSCAN. The cosine distance is computed as the L2 norm of L2-normalized vectors, and the epsilon value is adjusted accordingly. Authors: - Tarang Jain (https://github.com/tarang-jain) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4776
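The relationship this commit relies on can be checked numerically. For unit-length vectors, the squared L2 distance equals twice the cosine distance, so a cosine epsilon corresponds to an L2 epsilon of `sqrt(2 * eps)`. The sketch below uses only the standard library. The helper names are made up for illustration, and whether cuML adjusts epsilon with exactly this formula is an assumption, not something the commit message states.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    """1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_eps_to_l2_eps(eps_cosine):
    """Hypothetical helper: L2 radius on normalized data matching a cosine radius."""
    return math.sqrt(2.0 * eps_cosine)


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [-2.0, 0.5, 4.0]
    d_cos = cosine_distance(a, b)
    d_l2 = l2_distance(l2_normalize(a), l2_normalize(b))
    # ||a_hat - b_hat||^2 == 2 * (1 - cos(a, b)) for unit vectors.
    print(math.isclose(d_l2 ** 2, 2.0 * d_cos))
```

Because the mapping from cosine distance to L2 distance is monotonic, a neighborhood test `d_cos <= eps` selects the same points as `d_l2 <= cosine_eps_to_l2_eps(eps)` on the normalized data.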

This PR resolves rapidsai#802 by adding a Python API for `v_measure_score`. Also came across an [issue](rapidsai#4784) while working on this. Authors: - Shaswat Anand (https://github.com/shaswat-indian) Approvers: - Micka (https://github.com/lowener) - William Hicks (https://github.com/wphicks) URL: rapidsai#4785

Fixes issue rapidsai#2387.

For large data sizes, the batch size of the DBSCAN algorithm is small in order to fit the distance matrix in memory. This results in a matrix that has dimensions num_points x batch_size, both for the distance and adjacency matrix. The conversion of the boolean adjacency matrix to CSR format is performed in the 'adjgraph' step. This step was slow when the batch size was small, as described in issue rapidsai#2387.

In this commit, the adjgraph step is sped up. This is done in two ways:

1. The adjacency matrix is now stored in row-major batch_size x num_points format; it was transposed before. This required changes in the vertexdeg step.
2. The csr_row_op kernel has been replaced by the adj_to_csr kernel. This kernel can divide the work over multiple blocks even when the number of rows (batch size) is small. It makes optimal use of memory bandwidth because rows of the matrix are laid out contiguously in memory.

Authors: - Allard Hendriksen (https://github.com/ahendriksen) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#4803
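To make the data layout concrete, here is a small CPU reference for converting a row-major boolean adjacency matrix (batch_size x num_points) to CSR. It is a plain-Python sketch of what the conversion produces, not the `adj_to_csr` CUDA kernel itself; the function name and the use of nested lists are assumptions for illustration.

```python
def adj_to_csr_reference(adj):
    """Convert a row-major boolean adjacency matrix to CSR index arrays.

    adj is a list of rows (batch_size rows, num_points columns).
    Returns (row_ind, col_ind), where row_ind has batch_size + 1 entries.
    """
    # Per-row neighbor counts, i.e. what a vertex-degree step would compute.
    degrees = [sum(1 for is_neighbor in row if is_neighbor) for row in adj]

    # An exclusive prefix sum of the degrees gives each row's start offset.
    row_ind = [0]
    for degree in degrees:
        row_ind.append(row_ind[-1] + degree)

    # Each row is scanned contiguously, emitting the column index of every
    # True entry in order.
    col_ind = [
        col for row in adj for col, is_neighbor in enumerate(row) if is_neighbor
    ]
    return row_ind, col_ind


if __name__ == "__main__":
    adj = [
        [True, False, True, False],   # point 0 of the batch: neighbors 0 and 2
        [False, False, False, True],  # point 1 of the batch: neighbor 3
    ]
    print(adj_to_csr_reference(adj))
```

With the row-major layout, every row is one contiguous run of num_points entries, which is why a long row can be split across several workers even when there are only a few rows.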

…4804) This PR removes the naive versions of the DBSCAN algorithms. They were not used anymore and were largely incorrect, as described in rapidsai#3414. This fixes issue rapidsai#3414. Authors: - Allard Hendriksen (https://github.com/ahendriksen) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4804

Pass the `NVTX` option to raft in a way more similar to the other arguments, and make sure the `RAFT_NVTX` option is present in the installed `raft-config.cmake`. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Robert Maynard (https://github.com/robertmaynard) URL: rapidsai#4825

The conda recipe was updated to UCX 1.13.0 in rapidsai#4809, but the conda environment files were not updated there. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Jordan Jacobelli (https://github.com/Ethyling) URL: rapidsai#4813

Resolves rapidsai#3403 This PR adds support for using `pandas.Series` as an input to `TfidfVectorizer`, `HashingVectorizer` and `CountVectorizer`. Authors: - Shaswat Anand (https://github.com/shaswat-indian) - Ray Douglass (https://github.com/raydouglass) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4811

Removed the slow modulo operator with a minor change in index arithmetic. This gave me the following performance improvement for a test case:

| | branch-23.02 | kernel-shap-improvments | Gain |
|-------------------------|------------------|-------------------------|------|
| sampled_rows_kernel | 663 | 193 | 3.4x |
| exact_rows_kernel | 363 | 236 | 1.5x |

All times in microseconds. Code used for benchmarking:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor as rf
from cuml.explainer import KernelExplainer
import numpy as np

data, labels = make_classification(n_samples=1000, n_features=20, n_informative=20, random_state=42, n_redundant=0, n_repeated=0)

X_train, X_test, y_train, y_test = train_test_split(data, labels, train_size=998, random_state=42)  # sklearn train_test_split
y_train = np.ravel(y_train)
y_test = np.ravel(y_test)

model = rf(random_state=42).fit(X_train, y_train)
cu_explainer = KernelExplainer(model=model.predict, data=X_train, is_gpu_model=False, random_state=42, nsamples=100)
cu_shap_values = cu_explainer.shap_values(X_test)
print('cu_shap:', cu_shap_values)
```

Authors: - Vinay Deshpande (https://github.com/vinaydes) - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5187

review-notebook-app · 2023-02-15T17:31:32Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

beckernick · 2023-02-15T18:03:17Z

Closing in favor of #5236

…ds during CPU dispatching (#5236)

This PR:

- Updates the CPU dispatching logic to support keyword arguments that are numeric, boolean, and strings (i.e., things that can't be coerced to cuml arrays). It doesn't support passing sequences, as I believe this use case isn't necessary
- Makes a minor update to the tests to test this dispatching in principle. We may want to expand the testing of keyword arguments in general, but as this is likely worth a broader discussion/test expansion I thought it might be out of scope for this small PR

Closes #5218

This is a replacement for #5223

Authors: - Nick Becker (https://github.com/beckernick) - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: #5236

viclafargue and others added 30 commits May 31, 2022 15:04

Fix KBinsDiscretizer bin_edges_ (rapidsai#4735)

c1b4fbe

Fixes rapidsai#4617 Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4735

Merge branch branch-22.06 into branch-22.08

ac1965a

update changelog

59a1241

Update ops-bot.yaml

6f51c1f

Fix KNN error message. (rapidsai#4782)

b26fe7e

Authors: - Jiaming Yuan (https://github.com/trivialfis) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4782

Fix forward-merge branch-22.06 to branch-22.08 (rapidsai#4789)

8dc2b08

Authors: - Divye Gala (https://github.com/divyegala) - Ray Douglass (https://github.com/raydouglass) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4789

Pin max version of cuda-python to 11.7.0 (rapidsai#4793)

d1bc755

Pin max version of `cuda-python` to `11.7.0` Authors: - Jordan Jacobelli (https://github.com/Ethyling) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#4793

Pin max version of cuda-python to 11.7.0 (rapidsai#4801)

4dfcf3f

Pin max version of `cuda-python` to `11.7.0` This is a back port of rapidsai#4793. Authors: - Jordan Jacobelli (https://github.com/Ethyling) Approvers:

Update conda recipes to UCX 1.13.0 (rapidsai#4809)

1629320

Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Ray Douglass (https://github.com/raydouglass) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4809

Add ComplementNB to the documentation (rapidsai#4805)

b5a48db

Authors: - Micka (https://github.com/lowener) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4805

DOC

7adfccc

Remove duplicate adj_to_csr implementation (rapidsai#4829)

092c4de

This functionality has been moved to RAFT. Authors: - Allard Hendriksen (https://github.com/ahendriksen) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4829

Merge pull request rapidsai#4833 from rapidsai/branch-22.08

628f4c7

[gpuCI] Forward-merge branch-22.08 to branch-22.10 [skip gpuci]

Merge pull request rapidsai#4834 from rapidsai/branch-22.08

2864632

[gpuCI] Forward-merge branch-22.08 to branch-22.10 [skip gpuci]

Allow CuPy 11 (rapidsai#4837)

dc77d6b

Allows cuML to be installed with CuPy 11. xref: rapidsai/integration#508 Authors: - https://github.com/jakirkham Approvers: - Sevag H (https://github.com/sevagh) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4837

Merge branch-22.06 into branch-22.08

794815c

wphicks and others added 5 commits February 7, 2023 14:57

Do not blindly push all locals into kwargs (rapidsai#5210)

d444099

Authors: - William Hicks (https://github.com/wphicks) Approvers: - Victor Lafargue (https://github.com/viclafargue) - Dante Gama Dessavre (https://github.com/dantegd)

Merge pull request rapidsai#5213 from rapidsai/branch-23.02

c42ae2a

Forward-merge branch-23.02 to branch-23.04

update changelog

3bc1de0

Merge pull request rapidsai#5220 from rapidsai/branch-23.02

17e0ec9

Forward-merge branch-23.02 to branch-23.04

beckernick added the bug and non-breaking labels Feb 9, 2023

beckernick self-assigned this Feb 9, 2023

github-actions bot added the Cython / Python label Feb 9, 2023

beckernick marked this pull request as ready for review February 9, 2023 22:07

beckernick requested a review from a team as a code owner February 9, 2023 22:07

ajschmidt8 force-pushed the branch-23.04 branch from 6f2fda7 to 20d2690 Compare February 13, 2023 18:57

ajschmidt8 requested review from a team as code owners February 13, 2023 18:57

cjnolet and others added 4 commits February 15, 2023 12:31

Updating single gpu test to be a little more robust

8866468

support transfering number, bool, and string keyword arguments to cpu…

cdf1dde

…s during cpu dispatch

additional tests

9023750

add else valueerror

ec5dd05

beckernick force-pushed the bugfix/cpu-args-non-arrays branch from f8762ef to ec5dd05 Compare February 15, 2023 17:31

github-actions bot added the ci, CMake, conda, and CUDA/C++ labels Feb 15, 2023

beckernick added a commit to beckernick/cuml that referenced this pull request Feb 15, 2023

Updates the CPU dispatching logic to support keyword arguments that a…

1245494

…re numeric, boolean, and strings (i.e., things that can't be coerced to cuml arrays). Continuation of rapidsai#5223

beckernick mentioned this pull request Feb 15, 2023

Support numeric, boolean, and string keyword arguments to class methods during CPU dispatching #5236

Merged

beckernick closed this Feb 15, 2023
