Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forward-merge branch-23.12 to branch-24.02 #14422

Merged
merged 18 commits into from
Nov 16, 2023

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Nov 16, 2023

Manual forward merge from 23.12 to 24.02. This PR should not be squashed. Closes #14406.

vuule and others added 17 commits November 10, 2023 00:00
Update the nvCOMP version used for cuIO compression/decompression to 3.0.4.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#13815
All of these wrappers have now been upstreamed into Cython as of Cython 3.0.3.

Contributes to rapidsai#14023

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Bradley Dice (https://github.com/bdice)
  - Jake Awe (https://github.com/AyodeAwe)

URL: rapidsai#14382
Creates a normalizing offsets iterator that returns an int64 value given either a int32 or int64 column data.
Depends on rapidsai#14206

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#14234
…rapidsai#14364)

* Update dependency lists

* Update wheel building to stop needing manual installations

* Update wheel dependency with alpha spec

* Rename the package

* Update update-version.sh

* Update conda/recipes/dask-cudf/meta.yaml

Co-authored-by: GALI PREM SAGAR <sagarprem75@gmail.com>

* Make pip/conda dependencies consistent and fix recipe

* dfg

* Apply suggestions from code review

---------

Co-authored-by: GALI PREM SAGAR <sagarprem75@gmail.com>
…sai#14399)

Corrects failures seen in C++ CI where libnvbench.so can't be found

Authors:
  - Robert Maynard (https://github.com/robertmaynard)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#14399
…14388)

Closes rapidsai#14384. `x.startswith(y)` is not a good enough check for if `x` is a subdirectory of `y`. It causes `pandasai` to be reported as a sub-package of `pandas`.

Authors:
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - https://github.com/brandon-b-miller

URL: rapidsai#14388
Refactor the currently outdated cudf_kafka build setup to use skbuild instead.

Authors:
  - Jeremy Dyer (https://github.com/jdye64)
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#14292
Adds a new BytePairEncoding class to cuDF
```
>>> import cudf
>>> from cudf.core.byte_pair_encoding import BytePairEncoder
>>> mps = cudf.read_text('merges.txt', delimiter='\n', strip_delimiters=True)
>>> bpe = BytePairEncoder(mps)
>>> str_series = cudf.Series(['This is a sentence', 'thisisit'])
>>> bpe(str_series)
0    This is a sent ence
1             this is it
dtype: object
```
This class wraps the existing `nvtext::byte_pair_encoding` APIs to load the merge-pairs data and encode a column of strings.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#13891
…4393)

Fixes a bug introduced in rapidsai#14336 when trying to simplify the token-counting logic as per this discussion rapidsai#14336 (comment)
The simplification caused an error which was found when running the nvtext benchmarks.
The appropriate gtest has been updated to cover this case now.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#14393
This PR switches remaining usages of `dask` dependencies to use `rapids-dask-dependency`

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Jake Awe (https://github.com/AyodeAwe)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#14407
This PR contributes to rapidsai#13744.
-Added stream parameters to public APIs
`cudf::io::read_csv`
`cudf::io::write_csv`
-Added stream gtests

Authors:
  - https://github.com/shrshi
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#14340
…ai#14411)

Port NVIDIA/nvbench#148 to cudf so that nvbench benchmarks work now that we always use a static version of nvbench.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#14411
…apidsai#14390)

Noticed this while trying to clean up `as_column`

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#14390
…idsai#14367)

Issue rapidsai#14325

Use uint when reading/writing nano stats because nanoseconds have int32 encoding (different from both unit32 and sint32, _obviously_), which does not use zigzag. 
sint32 uses zigzag, and unit32 does not allow negative numbers, so we can use uint since we'll never have negative nanoseconds.

Also disabled the nanoseconds because it should only be written after ORC-135; we don't write the version so readers get confused if nanoseconds are there. Planning to re-enable once we start writing the version.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#14367
Fixes: rapidsai#14398 
This PR raises an error in `reindex` API when reindexing is performed on a non-unique index column.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#14400
rapidsai#14407 added a dask dependency to custreamz, but it added too tight of a pinning by requiring the exact same version. This is not valid because rapids-dask-dependency won't release a new version corresponding to each new cudf release, so pinning to the exact same version up to the alpha creates an unsatisfiable constraint.

Authors:
   - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
   - Ray Douglass (https://github.com/raydouglass)
   - Bradley Dice (https://github.com/bdice)
   - GALI PREM SAGAR (https://github.com/galipremsagar)
@bdice bdice requested review from a team as code owners November 16, 2023 00:21
@bdice bdice added the improvement Improvement / enhancement to an existing function label Nov 16, 2023
@bdice bdice requested a review from a team as a code owner November 16, 2023 00:21
@bdice bdice added the non-breaking Non-breaking change label Nov 16, 2023
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. labels Nov 16, 2023
@github-actions github-actions bot added CMake CMake build issue conda labels Nov 16, 2023
@raydouglass raydouglass added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Nov 16, 2023
@raydouglass
Copy link
Member

Adding DO NOT MERGE to prevent accidental auto merges

@raydouglass raydouglass merged commit 6b4deeb into rapidsai:branch-24.02 Nov 16, 2023
62 of 65 checks passed
@raydouglass raydouglass removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.