[RELEASE] cudf v22.08 #11444
Merged
[gpuCI] Forward-merge branch-22.06 to branch-22.08 [skip gpuci]
Signed-off-by: Peixin Li <pxli@nyu.edu> update build version of cudfjni to 22.08.0-SNAPSHOT Authors: - Peixin (https://github.com/pxLi) Approvers: - Jason Lowe (https://github.com/jlowe) URL: #10910
This PR makes `Buffer.ptr` read-only and introduces `Buffer.from_buffer`:

```python
@classmethod
def from_buffer(cls, buffer: Buffer, size: int = None, offset: int = 0):
    """
    Create a buffer from another buffer

    Parameters
    ----------
    buffer : Buffer
        The base buffer, which will also be set as the owner of
        the memory allocation.
    size : int, optional
        Size of the memory allocation (default: `buffer.size`).
    offset : int, optional
        Start offset relative to `buffer.ptr`.
    """
```

This is mainly motivated by my work on [spilling](#10746), making it a bit easier to reason about the relationship between buffers.

Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Bradley Dice (https://github.com/bdice) - Ashwin Srinath (https://github.com/shwina) URL: #10872
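To make the ownership relationship concrete, here is a minimal sketch of a buffer with a read-only `ptr` and a `from_buffer` constructor. The class below is a toy stand-in written for illustration, not cudf's actual `Buffer`; the `owner` attribute and the integer "pointer" are assumptions.

```python
from __future__ import annotations

from typing import Optional


class Buffer:
    """Toy stand-in for a device buffer (hypothetical, not cudf's class)."""

    def __init__(self, ptr: int, size: int, owner: object = None):
        self._ptr = ptr
        self._size = size
        # Holding a reference to the owner keeps the base allocation alive.
        self._owner = owner

    @property
    def ptr(self) -> int:
        # No setter is defined, so `buf.ptr = ...` raises AttributeError.
        return self._ptr

    @property
    def size(self) -> int:
        return self._size

    @property
    def owner(self) -> object:
        return self._owner

    @classmethod
    def from_buffer(
        cls, buffer: Buffer, size: Optional[int] = None, offset: int = 0
    ) -> Buffer:
        # Default size mirrors the docstring above: `buffer.size`.
        if size is None:
            size = buffer.size
        # The new buffer points into the base buffer and records it as owner.
        return cls(ptr=buffer.ptr + offset, size=size, owner=buffer)


base = Buffer(ptr=1000, size=64)
view = Buffer.from_buffer(base, size=16, offset=8)
```

Because every derived buffer records its base as the owner, the chain from a view back to the original allocation can be followed explicitly, which is the kind of reasoning the PR description says it wants to simplify.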
…h` (#10838) This PR uses the (presumably shortly merged) dask `Grouper` dispatch to register `cudf.core.groupby.Grouper` objects to `cudf.DataFrame` objects. This should allow our own Grouper objects to be used in critical places in dask rather than pandas objects. IMO this solution is preferable to changing cuDF to handle pandas grouper objects directly. Xref dask/dask#9074 Authors: - https://github.com/brandon-b-miller Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #10838
Fixes parts of #9373. Adds missing documentation to fix doxygen warnings in multiple files; fixes 93 warnings. Authors: - Karthikeyan (https://github.com/karthikeyann) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Vyas Ramasubramani (https://github.com/vyasr) URL: #10913
Cleans up the `regcomp.cpp` source to fix class names, comments, and simplify logic around processing operators and operands returned by the parser. Several class member variables used for state are moved or eliminated. Some member functions and variables are renamed. Cleanup of the parser logic will be in a follow-on PR. Reference #3582 Follow on to #10843 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Yunsong Wang (https://github.com/PointKernel) URL: #10879
Files in the groupby benchmark do not need the `.cu` extension because they do not contain any device code. This PR changes them to the `.cpp` extension. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Bradley Dice (https://github.com/bdice) - Conor Hoekstra (https://github.com/codereport) URL: #10985
Adds duration columns to benchmarks of the formats that support these types (everything except ORC). Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #10933
This PR changes the Python build system for cudf to use scikit-build and leverage CMake under the hood. This PR depends on rapidsai/rapids-cmake#198. Once that PR is merged, I can update the pull of rapids-cmake into the cudf Python CMake build. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Robert Maynard (https://github.com/robertmaynard) - AJ Schmidt (https://github.com/ajschmidt8) - Ashwin Srinath (https://github.com/shwina) URL: #10919
Closes #10909 This PR was intended to fix a bug in the `distinct` implementation where the stream parameter was not passed when invoking `static_map::contains`. During the work, @ttnghia pointed out that the `contains` + `thrust::copy_if` logic can be simplified by using `static_map::retrieve_all`. Finally, the PR fetches a newer version of `cuco` to utilize `retrieve_all` and fixes a bug in unit tests where results should be sorted before comparison. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Vyas Ramasubramani (https://github.com/vyasr) URL: #10916
Issue #10755 Fixes an issue in the protobuf writer where the length of the row index entry was being written into a single byte. This would cause errors when the size is larger than 127. The issue was uncovered when row group statistics were added. String statistics contain copies of the min/max strings, so the size is unbounded. This PR changes the protobuf writer to write the entry size as a generic uint, allowing larger entries. Also fixed `start_row` in the row group info array in the reader (unrelated). Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu) - David Wendt (https://github.com/davidwendt) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #10989
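The 127 limit follows from protobuf's base-128 varint encoding, in which each byte carries 7 payload bits and the high bit marks continuation. The sketch below is a plain-Python illustration of that encoding, not the libcudf writer code; the function names are made up for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("varint sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)  # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of `data`."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


# Lengths up to 127 fit in one byte; anything larger needs at least two.
one_byte = encode_varint(127)
two_bytes = encode_varint(128)
```

A writer that always emits exactly one byte for the length therefore produces a corrupt stream as soon as an entry exceeds 127 bytes, which matches the failure described above.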
…uble. (#10891) This PR changes a requirement to ensure that both value inputs to a sort-groupby covariance computation are convertible to double. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Nghia Truong (https://github.com/ttnghia) - https://github.com/nvdbaranec - David Wendt (https://github.com/davidwendt) URL: #10891
… files (#10912) Fixes parts of #9373. Adds missing documentation to fix doxygen warnings in multiple files:
- cpp/include/cudf/io/avro.hpp
- cpp/include/cudf/io/csv.hpp
- cpp/include/cudf/io/json.hpp
- cpp/include/cudf/io/orc.hpp
- cpp/include/cudf/io/orc_metadata.hpp
- cpp/include/cudf/io/parquet.hpp

Fixes 194 warnings. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Vyas Ramasubramani (https://github.com/vyasr) - Nghia Truong (https://github.com/ttnghia) URL: #10912
Fixes parts of #9373. Adds missing documentation in aggregation.hpp to fix doxygen warnings; fixes 108 warnings. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) URL: #10887
Fixes parts of #9373. Adds missing documentation to fix doxygen warnings in multiple `cpp/include/cudf/*.hpp` files; fixes 40 warnings. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) URL: #10896
This PR adds the missing `#pragma once` in a few header files in libcudf. It also includes a minor include cleanup, changing `""` to `<>`. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #11004
Arrow version pinnings were relaxed in commit d740c3c. This PR makes the same change in the dev env. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #11418
Fixes: rapidsai/docs#284

This PR fixes the day (light) and night (dark) mode color schemes, which make text and many HTML elements hard to read.

In dark mode:
Before: <img width="1489" alt="Screen Shot 2022-07-28 at 10 01 38 PM" src="https://user-images.githubusercontent.com/11664259/181674172-48e9dd8f-9fb9-447c-a63b-4a6f359b2c4f.png">
After: <img width="1569" alt="Screen Shot 2022-07-29 at 3 31 54 PM" src="https://user-images.githubusercontent.com/11664259/181838929-f27d664a-eb4c-4a72-8ad9-cf54246b3098.png">

In light mode:
Before: <img width="1545" alt="Screen Shot 2022-07-28 at 10 03 36 PM" src="https://user-images.githubusercontent.com/11664259/181674247-2307b7a4-0dd5-410a-9cb2-ca18d641d89d.png">
After: <img width="1506" alt="Screen Shot 2022-07-29 at 3 31 07 PM" src="https://user-images.githubusercontent.com/11664259/181838856-fd0abb85-cc56-4392-8cef-182bb790fff4.png">

Introduced darker color schemes so that code text highlighting is properly visible in dark mode:
Before: <img width="741" alt="Screen Shot 2022-07-28 at 10 06 08 PM" src="https://user-images.githubusercontent.com/11664259/181674530-aa78290f-b011-437e-a955-4e85bbbee5e9.png">
After: <img width="704" alt="Screen Shot 2022-07-28 at 10 06 15 PM" src="https://user-images.githubusercontent.com/11664259/181674545-3d0ba553-8b35-49b1-972a-a5fb1e33b0b9.png">

Introduced a custom JavaScript method that adds hover text to the "**Theme switcher**" button:
<img width="552" alt="Screen Shot 2022-07-28 at 9 59 40 PM" src="https://user-images.githubusercontent.com/11664259/181674649-091c4b27-aa4b-4752-a8c8-45b7e71e3417.png">
<img width="622" alt="Screen Shot 2022-07-28 at 9 59 28 PM" src="https://user-images.githubusercontent.com/11664259/181674651-88bf5388-bf81-4633-9360-88ca5df88b85.png">

Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: #11400
This PR resolves the following error showing up in the latest `distributed`:

```python
python/dask_cudf/dask_cudf/tests/test_distributed.py EE                                        [100%]

=========================================== ERRORS ===========================================
____________________________ ERROR at setup of test_basic[True] _____________________________
file /nvme/0/pgali/cudf/python/dask_cudf/dask_cudf/tests/test_distributed.py, line 24
  @pytest.mark.parametrize("delayed", [True, False])
  def test_basic(loop, delayed):  # noqa: F811
file /nvme/0/pgali/envs/cudfdev/lib/python3.9/site-packages/distributed/utils_test.py, line 145
  @pytest.fixture
  def loop(loop_in_thread):
E       fixture 'loop_in_thread' not found
>       available fixtures: benchmark, benchmark_weave, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, cleanup, current_cases, doctest_namespace, loop, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, testrun_uid, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory, worker_id
>       use 'pytest --fixtures [testpath]' for help on them.

/nvme/0/pgali/envs/cudfdev/lib/python3.9/site-packages/distributed/utils_test.py:145
____________________________ ERROR at setup of test_basic[False] ____________________________
file /nvme/0/pgali/cudf/python/dask_cudf/dask_cudf/tests/test_distributed.py, line 24
  @pytest.mark.parametrize("delayed", [True, False])
  def test_basic(loop, delayed):  # noqa: F811
file /nvme/0/pgali/envs/cudfdev/lib/python3.9/site-packages/distributed/utils_test.py, line 145
  @pytest.fixture
  def loop(loop_in_thread):
E       fixture 'loop_in_thread' not found
>       available fixtures: benchmark, benchmark_weave, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, cleanup, current_cases, doctest_namespace, loop, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, testrun_uid, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory, worker_id
>       use 'pytest --fixtures [testpath]' for help on them.
```

Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Charles Blackmon-Luca (https://github.com/charlesbluca) URL: #11428
…11429) Fixes #11425 Changes the `CUDF_VERSION_Arrow` cmake variable to be a cache entry so that a user can override it by providing `-DCUDF_VERSION_Arrow` at configure time. Happy to repeat this pattern for other dependencies if desired. Authors: - Keith Kraus (https://github.com/kkraus14) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Bradley Dice (https://github.com/bdice) URL: #11429
This PR uses a documented template auto-generated by `doxygen` and inserts our custom js & css links to it. The process is documented here: https://doxygen.nl/manual/customize.html#minor_tweaks_header_css Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - David Wendt (https://github.com/davidwendt) URL: #11430
This PR pins `dask` & `distributed` to `2022.7.1` for `22.08` release. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Charles Blackmon-Luca (https://github.com/charlesbluca) - AJ Schmidt (https://github.com/ajschmidt8) - https://github.com/jakirkham URL: #11433
ajschmidt8 requested review from galipremsagar, isVoid, PointKernel and jrhemstad on August 2, 2022 20:02
github-actions bot added the following labels on Aug 2, 2022:
- CMake: CMake build issue
- conda
- Java: Affects Java cuDF API.
- Python: Affects Python cuDF API.
- libcudf: Affects libcudf (C++/CUDA) code.
Codecov Report
Coverage Diff

| | main | #11444 | +/- |
|---|---|---|---|
| Coverage | 10.56% | 86.47% | +75.90% |
| Files | 116 | 144 | +28 |
| Lines | 18677 | 22856 | +4179 |
| Hits | 1974 | 19765 | +17791 |
| Misses | 16703 | 3091 | -13612 |
Generic atomic operations are currently implemented using an `atomicCAS` in a loop that determines when the current thread's result is the one that was actually saved to the address. The final check is performed by directly comparing the values, which can lead to infinite loops when the value being set is a `NaN` because all comparisons involving `NaN`s return false. This PR fixes that issue by casting the data to an integral type and comparing those, bypassing `NaN` comparison. This error was discovered in the hash-based aggregates `min` and `max`, and fixing it is a blocker for NVIDIA/spark-rapids#5989. Authors: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr)
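The failure mode can be reproduced on the host with a simulated compare-and-swap cell. The sketch below is a single-threaded Python model, not the libcudf device code: `Cell`, `atomic_update`, and the iteration cap (used so the demo terminates instead of spinning forever) are all assumptions made for illustration.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Reinterpret a double as its 64-bit integer pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_float(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]


class Cell:
    """Simulated memory word; CAS compares raw bits, as hardware does."""

    def __init__(self, value: float):
        self.bits = float_bits(value)

    def load(self) -> float:
        return bits_float(self.bits)

    def compare_and_swap(self, expected: float, desired: float) -> float:
        old_bits = self.bits
        if old_bits == float_bits(expected):
            self.bits = float_bits(desired)
        return bits_float(old_bits)


def nan_preserving_max(a: float, b: float) -> float:
    return a if (math.isnan(a) or a >= b) else b


def atomic_update(cell, value, op, same, max_iters=100):
    """CAS loop; returns the iteration count, or None if it never converged."""
    old = cell.load()
    for iteration in range(1, max_iters + 1):
        assumed = old
        old = cell.compare_and_swap(assumed, op(assumed, value))
        if same(old, assumed):  # "did my CAS win?" check
            return iteration
    return None


def value_equal(a, b):
    return a == b  # always False when either side is NaN


def bitwise_equal(a, b):
    return float_bits(a) == float_bits(b)


stuck = atomic_update(Cell(float("nan")), 1.0, nan_preserving_max, value_equal)
fixed = atomic_update(Cell(float("nan")), 1.0, nan_preserving_max, bitwise_equal)
```

With the value comparison the loop exhausts its iteration cap even though every CAS succeeds, while the bitwise comparison exits after the first iteration.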
This PR adds Java bindings for adding the binary option to `ParquetOptions`. Authors: Approvers: - Jim Brennan (https://github.com/jbrennan333) - Mike Wilson (https://github.com/hyperbolic2346)
This PR fixes a flaky test introduced by #11272. By default, cudf joins do not guarantee the order of the returned rows, which can cause occasional test failures. This PR adds the `sort` argument to make sure the result is deterministic. Note that `index.union` and `index.intersection` may also produce arbitrary output ordering, but by default these methods sort the result before returning, so their `sort` argument does not need to be modified. Authors: - Michael Wang (https://github.com/isVoid) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - https://github.com/brandon-b-miller - Nghia Truong (https://github.com/ttnghia)
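The same testing pattern can be shown without a GPU. In the sketch below, `inner_join_keys` is a hypothetical helper (not a cudf API) whose unsorted output order is an implementation detail, standing in for a join with unspecified row order.

```python
from typing import Iterable, List


def inner_join_keys(
    left: Iterable[int], right: Iterable[int], sort: bool = False
) -> List[int]:
    """Return the keys present in both inputs (hypothetical helper)."""
    # Set iteration order is not part of the contract, much like the
    # row order of a hash-based join.
    matches = list(set(left) & set(right))
    return sorted(matches) if sort else matches


# A robust test either requests sorted output or sorts before comparing.
deterministic = inner_join_keys([3, 1, 2], [2, 3, 4], sort=True)
```

Asserting on the unsorted result directly would encode an ordering that the operation never promised, which is how such a test becomes flaky.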
This PR switches the loading of `custom.js` to `defer` because the entire page needs to be loaded before the methods in this script can execute correctly. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - AJ Schmidt (https://github.com/ajschmidt8)
In `cudf::detail::label_segments`, when the input lists column has empty or null lists at the end of the column, its `offsets` column contains out-of-bounds indices, which leads to an invalid memory access. The bug is elusive and does not show up consistently. Test failures reported in NVIDIA/spark-rapids#6249 are due to this. The existing unit tests already cover this corner case, but the bug did not show up until they were run on certain systems, and even then it was very difficult to reproduce. Closes #11495. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Tobias Ribizel (https://github.com/upsj) - Bradley Dice (https://github.com/bdice) - Jim Brennan (https://github.com/jbrennan333) - Alessandro Bellina (https://github.com/abellina) - Karthikeyan (https://github.com/karthikeyann)
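To see why trailing empty lists are a problem, consider a scatter-then-scan way of labeling segments. The sketch below is a host-side Python illustration and is not libcudf's implementation; the `guard` parameter is an assumption added to contrast the two behaviors.

```python
from itertools import accumulate
from typing import List, Sequence


def label_segments(offsets: Sequence[int], guard: bool = True) -> List[int]:
    """Label each child element with the index of the list it belongs to."""
    num_elements = offsets[-1] - offsets[0]
    marks = [0] * num_elements
    # Mark the start of every list except the first one.
    for off in offsets[1:-1]:
        idx = off - offsets[0]
        if guard and idx >= num_elements:
            # A trailing empty/null list starts at the end of the data;
            # there is no element to mark, so skip it.
            continue
        marks[idx] += 1  # without the guard this can index past the end
    # An inclusive prefix sum turns boundary marks into segment labels.
    return list(accumulate(marks))


# Lists of sizes 2, 3, 0, 0: the last two offsets equal the element count.
labels = label_segments([0, 2, 5, 5, 5])
```

In Python the unguarded write raises `IndexError` right away. On a GPU the equivalent write lands just past the allocation and may or may not fault, which is consistent with the intermittent failures described above.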
❄️ Code freeze for `branch-22.08` and v22.08 release

What does this mean?
Only critical/hotfix level issues should be merged into `branch-22.08` until release (merging of this PR).

What is the purpose of this PR?
To merge `branch-22.08` into `main` for the release