Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RELEASE] cudf v22.04 #10512

Merged
merged 262 commits into from
Apr 6, 2022
Merged

[RELEASE] cudf v22.04 #10512

merged 262 commits into from
Apr 6, 2022

Conversation

ajschmidt8
Copy link
Member

❄️ Code freeze for branch-22.04 and v22.04 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-22.04 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-22.04 into main for the release

GPUtester and others added 30 commits January 20, 2022 18:07
[gpuCI] Forward-merge branch-22.02 to branch-22.04 [skip gpuci]
[gpuCI] Forward-merge branch-22.02 to branch-22.04 [skip gpuci]
Signed-off-by: Peixin Li <pxli@nyu.edu>

Update cudfjni version to 22.04.0-SNAPSHOT

Authors:
  - Peixin (https://github.com/pxLi)

Approvers:
  - Jason Lowe (https://github.com/jlowe)
  - Nghia Truong (https://github.com/ttnghia)

URL: #10084
[gpuCI] Forward-merge branch-22.02 to branch-22.04 [skip gpuci]
This PR relates to #7960 and indicates that new benchmarks should be written using [NVBench](https://github.com/NVIDIA/nvbench). (This PR does not resolve that issue because it does not transition any existing benchmarks to nvbench, it is only an update in the developer guide.)

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Mark Harris (https://github.com/harrism)

URL: #10093
Bug: This is currently throwing exceptions due to cython failing to encode the responses from the python callback automatically.

Authors:
  - Jeremy Dyer (https://github.com/jdye64)

Approvers:
  - https://github.com/brandon-b-miller
  - https://github.com/jsmaupin
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10103
This PR removes docker-related files that are no longer necessary.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #10069
The test `test_dataframe.py::test_memory_usage_multi` is currently flaky. In theory it can fail for any value of the `rows` parameter, but in practice we only observe failures for the smaller value of 10. The reason for this is that the data for the `MultiIndex` is being constructed by randomly sampling from an array of size 3, and for a sufficiently small sample (e.g. 10) the probability that selection will not actually include all three values (e.g. a sample of `[0, 1, 1, 1, 0, 1, 1, 0, 0, 1]`) is not vanishingly small and occurs with observable frequency. The resulting `MultiIndex` will encode the levels for that column as a column with only two values, and as a result the column will occupy 8 fewer bytes (one 64 bit integer or float) less of space than expected. This PR changes that by always sampling without replacement from an array of the same length as the number of rows. I could also have fixed this problem by fixing a random seed that ensures that all the values are always sampled, but I made this change instead because 1) it more clearly conveys the intent, and 2) fixing a seed is a change that we should discuss and apply globally across all our tests.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ashwin Srinath (https://github.com/shwina)

URL: #10114
This PR is a follow-up to #9917 and should be merged after that PR. This resolves #9695 and resolves #5401. The implementation here is only a first pass, but in the interest of prioritizing a working feature for the upcoming release I'm postponing making various additional changes (including some breaking ones).

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Jason Lowe (https://github.com/jlowe)
  - David Wendt (https://github.com/davidwendt)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #10037
To address part of #5773
This allows to run benchmarks for only specific iterations using environment variable `CUDF_BENCHMARK_ITERATIONS`.
except when benchmark definition  itself specifies iteration count.

Also, makes pool as static to allocate pool memory resource only once per binary.

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Conor Hoekstra (https://github.com/codereport)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Nghia Truong (https://github.com/ttnghia)

URL: #10060
This depends on #9941 and #10037.  Adds Java bindings for mixed left semi join and mixed left anti join.

Authors:
  - Jason Lowe (https://github.com/jlowe)

Approvers:
  - Alessandro Bellina (https://github.com/abellina)
  - Thomas Graves (https://github.com/tgravescs)
  - Jim Brennan (https://github.com/jbrennan333)

URL: #10040
This PR avoid make `nan_as_null` a no-op if `nan_count` is 0, i.e., there are no nulls present.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - https://github.com/brandon-b-miller

URL: #10082
This PR addresses suggestions from @robertmaynard, @karthikeyann, and @harrism in #10093 to remove the suffix `_benchmark` from the name of the benchmark files. The developer guide has been updated to reflect this change.

I left a few shared files with the word "benchmark" in them, such as [cpp/benchmarks/fixture/benchmark_fixture.hpp](https://github.com/rapidsai/cudf/blob/cfb6cbe6a8e4b01d7ee28fbe3212b5f2eb275167/cpp/benchmarks/fixture/benchmark_fixture.hpp) and [cpp/benchmarks/fixture/templated_benchmark_fixture.hpp](https://github.com/rapidsai/cudf/blob/cfb6cbe6a8e4b01d7ee28fbe3212b5f2eb275167/cpp/benchmarks/fixture/templated_benchmark_fixture.hpp).

I built the benchmarks locally to ensure that nothing was broken (I'm noting this because I recall that our CI does not build the benchmarks).

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Karthikeyan (https://github.com/karthikeyann)
  - Nghia Truong (https://github.com/ttnghia)

URL: #10112
This commit cleans up some of the redundancy in CUDF JNI native code, and modifies some of the `unique_ptr` handling:

### Improvements
1. Added utility function to convert from `native_jpointerArray<column_view>` to `vector<column_view>`. Removed redundant code in `stringConcat` functions.
2. Added utility to convert from `unique_ptr<column>` to `jlong`. This eliminates variables from nearly all the JNI functions.
3. Added utility functions for extracting `std::vector` from `native_jArray` classes. This reduces raw loops in `TableJni.cpp`.
4. Eliminated unnecessary use of `unique_ptr` in locations where they are immediately `release()`d. (E.g. `ColumnVectorJni::getNativeColumnView()`.)
5. Removed unnecessary `std::move()` calls.
6. Removed unnecessary `reinterpret_cast` (E.g. in `copyColumnViewToCV()`.)
7. `typedef` -> `using`.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Jason Lowe (https://github.com/jlowe)

URL: #10019
This PR removes the deprecated method `Series.set_index`. Resolves #9541, follows up on #9529.

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Christopher Harris (https://github.com/cwharris)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #9945
[gpuCI] Forward-merge branch-22.02 to branch-22.04 [skip gpuci]
Closes #10043 

A metadata singleton was allocated from the current rmm memory-resource inside the nvtext normalizer functions. If the memory resource is later changed, the metadata pointer may become invalidated. This PR removes the singleton pattern. 

The normalizer is used by the subword-tokenizer which is passed a vocabulary structure that is built only once and is maintained by the caller. The metadata has been added to this structure so it's lifetime can share the same scope.

The normalizer can also be called directly through the `nvtext::normalize_characters` API. Here the metadata table (size about 1MB) is now created on each call. This showed only significant performance impact on benchmarks testing a small number (<50K) of shorter strings.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #10090
This PR fixes an issue in caching `apply` UDFs where we were incorrectly getting a cache miss on the same function/dataframe. This was happening because `dict_values` objects don't compare as equal.

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Charles Blackmon-Luca (https://github.com/charlesbluca)
  - Ashwin Srinath (https://github.com/shwina)

URL: #10133
Updates cmake-format script to branch 22.04. (@robertmaynard Is this supposed to be updated whenever we bump the release version?)

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #10132
This PR resolves #10076. It improves `gpu_utils.py` by removing code for handling CUDA < 11.0, which we no longer support.

This is marked as "breaking" because of minor Python API changes. I changed the name of an error class from `UnSupportedCUDAError` to `UnsupportedCUDAError` and removed an unused error class named `UnSupportedGPUError`. It appears the `UnSupportedGPUError` class was introduced in #4692 but has never been used in the code.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10113
… reader (#10127)

The option was put in place to prevent Python users of the read_orc API to read decimal columns as 128bit, since this type was not supported in Python at the time.
Now that decimal128 in supported in Python, this option can be removed.

The change in technically breaking, but it is very unlikely that anyone is using the API.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Conor Hoekstra (https://github.com/codereport)
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10127
Fixes: #10117 

This PR adds a duplicate column validation check in `ColumnAccessor.rename_levels`

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #10120
This PR removes the `drop_nan` parameter from the internal API `IndexedFrame._drop_na_rows`. Its behavior was unused internally in cudf (always set to `True` in the public API `IndexedFrame.dropna`). The behavior of `drop_nan=False` was untested until 22.02 hotfix #10123, when an issue was found in gpu-bdb. However, that code can use the public API `df.dropna(axis=0)` instead. See rapidsai/gpu-bdb#228.

This is marked as a non-breaking change because it only affects internal APIs.

Resolves #10125, follows up on #10123.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #10140
Pariatlly resolves: #10129

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #10142
Currently, Java binding for `strings::split` calls the API with the default max split value. This PR adds a parameter for it, allowing to specify max split from Java.

This is non-breaking because the Java code implicitly uses the default max split value.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Jason Lowe (https://github.com/jlowe)

URL: #10137
Reference #9856 specifically #9856 (comment)

Adds `cudf::strings::findall_record` which was initially implemented in nvstrings but not ported over since LIST column types did not exist at the time and returning a vector of small columns was very inefficient. This API should also allow using the current python function `cudf.str.findall()` with the `expand=False` parameter more effectively. A follow-on PR will address these python changes.

This PR reorganizes the libcudf strings _find_ source files into the `cpp/src/strings/search` subdirectory as well. Also, `findall()` has only a regex version so the `_re` suffix is dropped from the name in the libcudf implementation. The python changes in this PR address only the name change and the addition of the new API in the cython interface.

Depends on #9909 -- shares the `cudf::strings::detail::count_matches()` utility function.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #9911
[gpuCI] Forward-merge branch-22.02 to branch-22.04 [skip gpuci]
benchmark fixture - fix sharing static object pointer, replace with static shared pointer.

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #10145
`cudf::jni::convert_table_for_return()` is usually used on tables returned from a libcudf API call.
It currently requires an l-value reference for its table argument. This necessitates parking the result of libcudf call in an avoidable temp variable.
This commit adds the option to use an r-value reference. This allows table expressions to be used directly, reducing clutter.

Note:
  1. The previous signature is retained, because not all call sites can use the r-value 
     interface cleanly. (E.g. when the libcudf call is complex.)
  2. The third argument (vector<unique_ptr<column>>) has been converted from l-ref to r-ref,
     so that an empty default can be introduced.

This commit also includes minor code cleanup in the periphery of calls to `convert_table_for_return()`.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Jason Lowe (https://github.com/jlowe)

URL: #10131
Closes #10006 

Fixes a use case where the regex pattern creates a set of instructions that can cause the regex evaluation process to go into an infinite loop. For example, the pattern `(x?)+` creates the following instructions:

```
Instructions:
  0:    CHAR c='x', next=2
  1:      OR right=0, left=2, next=2
  2:    RBRA id=1, next=4
  3:    LBRA id=1, next=1
  4:      OR right=3, left=5, next=5
  5:     END
startinst_id=3
startinst_ids: [ 3 -1]
```

This causes in an infinite loop at instruction 4 where the path may go like: 4->3->1->2->4 ... forever.
Supporting this pattern does not look possible. The `+` quantifier is applied to capture group symbol `)` inside of which `x?` means 0 or more repeating the character `x`. This means it could match `x` or nothing and so applying the `+` to nothing would be invalid. That said, the pattern `x?+` currently already throws an error because of the invalid usage of `+` quantifier.

Therefore, the fix here adds a checking step after the instruction set is created to check for a possible infinite-loop case. If one is detected, an exception is thrown indicating the pattern is not supported.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Devavret Makkar (https://github.com/devavret)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #10095
@ajschmidt8 ajschmidt8 requested a review from a team as a code owner March 24, 2022 20:36
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added CMake CMake build issue conda Java Affects Java cuDF API. Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Mar 24, 2022
Adds JNI bindings for t-digest reductions (in commit 756ed20)

Authors:
   - https://github.com/nvdbaranec
   - Andy Grove (https://github.com/andygrove)

Approvers:
   - Jason Lowe (https://github.com/jlowe)
shwina and others added 9 commits March 29, 2022 14:50
This PR closes #10527, an issue with outdated code in our "10 minutes" notebook. The material changes are on lines 65 and 66.

Authors:
   - Ashwin Srinath (https://github.com/shwina)

Approvers:
   - GALI PREM SAGAR (https://github.com/galipremsagar)
   - Bradley Dice (https://github.com/bdice)
This PR pins Click to a version supported by the version of black that we use. #10523 is too disruptive to backport to 22.04 in the middle of a release, so this is a minimum change to unblock linter runs.

Authors:
   - Vyas Ramasubramani (https://github.com/vyasr)
   - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
   - GALI PREM SAGAR (https://github.com/galipremsagar)
…seen in NDS q72 in Spark (#10534)

The following change addresses a performance degradation we noticed in the `mixed_join` and `compute_mixed_join_output_size` that looks to be tied to the theoretical occupancy of these kernels, as limited by the number of registers used.

The regression is triggered by this patch: #9727, which improves handling of unreachable code paths. That said, somehow, this change is altering the number of registers these kernels need. Both `mixed_join` and `compute_mixed_join_output_size` are very sensitive to the register count, per NSight compute. With the patch, the register required changed from 92 to 102, and 118 to 141 respectively. 

The fix here hints the compiler what our block size is (128 threads). This, from our testing, allows the compiler to reduce the number of registers required to 128 for `compute_mixed_join_output_size` and 96 for `mixed_join`. This lead to better occupancy (I think @nvdbaranec measured it going from 30% to 50%) and I saw the wall clock time of q72 (which started all this) to go from 133s to 121s, which is within the ballpark I'd expect.

Authors:
   - Alessandro Bellina (https://github.com/abellina)

Approvers:
   - Mike Wilson (https://github.com/hyperbolic2346)
CMake 3.23 appears to have a bug relating to CUDA support that breaks our builds.

Authors:
   - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
   - AJ Schmidt (https://github.com/ajschmidt8)
@codecov
Copy link

codecov bot commented Mar 31, 2022

Codecov Report

Merging #10512 (4c84184) into main (53a31d1) will increase coverage by 75.73%.
The diff coverage is n/a.

❗ Current head 4c84184 differs from pull request most recent head b50ae82. Consider uploading reports for the commit b50ae82 to get more accurate results

@@             Coverage Diff             @@
##             main   #10512       +/-   ##
===========================================
+ Coverage   10.42%   86.15%   +75.73%     
===========================================
  Files         119      141       +22     
  Lines       20607    22510     +1903     
===========================================
+ Hits         2148    19394    +17246     
+ Misses      18459     3116    -15343     
Impacted Files Coverage Δ
...ython/custreamz/custreamz/tests/test_dataframes.py 99.39% <0.00%> (-0.01%) ⬇️
python/custreamz/custreamz/_version.py 0.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/__init__.py 82.35% <0.00%> (ø)
python/dask_cudf/dask_cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/_fuzz_testing/fuzzer.py 0.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/tests/test_orc.py 100.00% <0.00%> (ø)
...ython/dask_cudf/dask_cudf/io/tests/test_parquet.py 100.00% <0.00%> (ø)
python/cudf/cudf/core/udf/pipeline.py
python/dask_cudf/dask_cudf/tests/test_struct.py 100.00% <0.00%> (ø)
python/cudf/cudf/core/udf/templates.py 100.00% <0.00%> (ø)
... and 111 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 774d859...b50ae82. Read the comment docs.

This commit introduces JNI bindings to retrieve the major and minor CUDA compute capability versions for the current CUDA device.

This feature enables introspection from `spark-rapids` to detect the GPU architecture, for model-specific behaviour.
This is required from NVIDIA/spark-rapids/pull/5122, to work around the erroneous behaviour of JNI `fixed_width_convert_to_rows()` on Pascal GPUs (#10569), (which in turn produces failures like NVIDIA/spark-rapids/issues/4980).

Authors:
   - MithunR (https://github.com/mythrocks)

Approvers:
   - https://github.com/nvdbaranec
   - Jason Lowe (https://github.com/jlowe)
   - Nghia Truong (https://github.com/ttnghia)
galipremsagar and others added 3 commits April 5, 2022 11:15
@bdice spotted some copyright failures in #10512

This is because CI copyright checks were introduced in between of `22.04` release and some files that were modified prior to that didn't have their copyrights updated, so this PR updates those missing files.

Authors:
   - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
   - AJ Schmidt (https://github.com/ajschmidt8)
Replaces the deprecated use of `as_gpu_matrix` with `to_cupy`, and also updates some deprecated calls to `cupy.fromDlpack`.

Closes #10582

Authors:
   - Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
   - GALI PREM SAGAR (https://github.com/galipremsagar)
   - AJ Schmidt (https://github.com/ajschmidt8)
@raydouglass raydouglass merged commit 803c42a into main Apr 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet