[RELEASE] cudf v22.10 #11858

Merged
merged 272 commits into from
Oct 12, 2022
Changes from 250 commits
272 commits
a36c363
Merge pull request #11414 from rapidsai/branch-22.08
GPUtester Aug 1, 2022
1d4aa4a
Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383)
bdice Aug 1, 2022
7e48e78
Merge pull request #11415 from rapidsai/branch-22.08
GPUtester Aug 1, 2022
8709e29
Merge pull request #11416 from rapidsai/branch-22.08
GPUtester Aug 1, 2022
35a7c81
Add Spark list hashing Java tests (#11379)
bdice Aug 1, 2022
f0e3607
Merge pull request #11422 from rapidsai/branch-22.08
GPUtester Aug 1, 2022
df02263
libcudf c++ example updated to CPM version 0.35.3 (#11417)
robertmaynard Aug 1, 2022
f92ba2b
Merge pull request #11423 from rapidsai/branch-22.08
GPUtester Aug 1, 2022
71a5292
Return schema info from JSON reader (#11419)
vuule Aug 1, 2022
797215b
Move cmake to the build section. (#11376)
vyasr Aug 1, 2022
2dc5c3f
Fix read_text when byte_range is aligned with field (#11371)
upsj Aug 2, 2022
0408484
Merge pull request #11432 from rapidsai/branch-22.08
GPUtester Aug 2, 2022
6ddd47c
Merge pull request #11436 from rapidsai/branch-22.08
GPUtester Aug 2, 2022
e099e01
Merge pull request #11442 from rapidsai/branch-22.08
GPUtester Aug 2, 2022
039622f
Quickly error out when trying to build with unsupported nvcc versions…
robertmaynard Aug 2, 2022
276b996
Add in JNI for parsing JSON data and getting the metadata back too. (…
revans2 Aug 3, 2022
b281fdf
Python API for the future experimental JSON reader (#11426)
vuule Aug 3, 2022
f31e5bd
Deprecate unflatten_nested_columns (#11421)
SrikarVanavasam Aug 3, 2022
651c1f9
Merge branch-22.08 into branch-22.10
ajschmidt8 Aug 3, 2022
554fb10
Convert byte_array_view to use std::byte (#11424)
hyperbolic2346 Aug 3, 2022
d86bb39
Add column constructor from device_uvector&& (#11356)
SrikarVanavasam Aug 3, 2022
9429099
Return empty dataframe when reading an ORC file using empty `columns`…
vuule Aug 4, 2022
8b8594c
Convert thrust::optional usages to std::optional (#11455)
robertmaynard Aug 4, 2022
d3244ab
Make CMake hooks verbose (#11456)
vyasr Aug 4, 2022
53a2f15
Fix regex quantifier check to include capture groups (#11373)
davidwendt Aug 4, 2022
217243c
Add missing Thrust #includes. (#11457)
bdice Aug 4, 2022
95e2206
Add groupby `max` aggregation benchmark (#11464)
ttnghia Aug 4, 2022
5700552
Merge pull request #11471 from rapidsai/branch-22.08
GPUtester Aug 4, 2022
d8c25a1
Extract Dremel encoding code from Parquet (#11461)
vyasr Aug 4, 2022
acadcf2
Create main developer guide for Python (#11235)
vyasr Aug 4, 2022
493d96b
Add developer documentation for benchmarking (#11122)
vyasr Aug 5, 2022
2e13e5f
Drop support for `skiprows` and `num_rows` in `cudf.read_parquet` (#1…
galipremsagar Aug 5, 2022
5681656
Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key f…
davidwendt Aug 5, 2022
6fa49c7
column: calculate null_count before release()ing the cudf::column (#1…
wence- Aug 5, 2022
d695129
cuDF error handling document (#7917)
isVoid Aug 6, 2022
e1a4e03
Adds JSON tokenizer (#11264)
elstehle Aug 6, 2022
6b20f2a
Add groupby `nunique` aggregation benchmark (#11472)
ttnghia Aug 8, 2022
099e83c
Unpin `dask` and `distributed` for development (#11492)
galipremsagar Aug 8, 2022
36b5b46
Disable Arrow S3 support by default. (#11470)
bdice Aug 8, 2022
0e29353
Fix a misalignment in `cudf.get_dummies` docstring (#11443)
galipremsagar Aug 8, 2022
6221539
Remove unused is_struct trait. (#11450)
bdice Aug 8, 2022
fea0bda
Move SparkMurmurHash3_32 functor. (#11489)
bdice Aug 9, 2022
4a5531c
Merge pull request #11502 from rapidsai/branch-22.08
GPUtester Aug 9, 2022
11d40a0
Add reduction `distinct_count` benchmark (#11473)
ttnghia Aug 10, 2022
80a2f2b
Update parquet fuzz tests to drop support for `skiprows` & `num_rows`…
galipremsagar Aug 10, 2022
9257549
Upgrade to `arrow-9.x` (#11507)
galipremsagar Aug 10, 2022
0df6178
Update to Thrust 1.17.0 (#11437)
bdice Aug 11, 2022
5628f57
copy_range ballot_syncs to have no execution dependency (#11508)
robertmaynard Aug 11, 2022
95935db
Refactor the `Buffer` class (#11447)
madsbk Aug 11, 2022
de06ed9
Fix cmake error after upgrading to Arrow 9 (#11513)
ttnghia Aug 11, 2022
80bce29
Fix reverse binary operators acting on a host value and cudf.Scalar (…
bdice Aug 11, 2022
e66ed15
Remove deprecated expand parameter from str.findall. (#11030)
bdice Aug 11, 2022
a67b718
Sanitize percentile_approx() output for empty input (#11498)
SrikarVanavasam Aug 11, 2022
87a5e6a
Fix Feather test warning. (#11511)
bdice Aug 11, 2022
d39b957
Remove support for skip_rows / num_rows options in the parquet reader…
nvdbaranec Aug 11, 2022
42b3bb0
Added 'crosstab' and 'pivot_table' features (#11314)
shaswat-indian Aug 11, 2022
2be93fe
Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516)
dependabot[bot] Aug 12, 2022
6035cc2
Adds the end-to-end JSON parser implementation (#11388)
elstehle Aug 12, 2022
819dc2a
Fix invalid results from conditional-left-anti-join in debug build (#…
davidwendt Aug 12, 2022
9c22da5
Add fluent API builder to `data_profile` (#11479)
vuule Aug 12, 2022
e5b92df
Add regex ASCII flag support for matching builtin character classes (…
davidwendt Aug 15, 2022
a221d47
Deprecate `skiprows` and `num_rows` in `read_orc` (#11522)
galipremsagar Aug 15, 2022
dd0ff30
Fixing crash when writing binary nested data in parquet (#11526)
hyperbolic2346 Aug 15, 2022
2e72db1
find_package(cudf) + arrow9 usable with cudf build directory (#11535)
robertmaynard Aug 15, 2022
c19c8c9
Struct support for `NULL_EQUALS` binary operation (#11520)
rwlee Aug 15, 2022
5e31073
Refactor pad_side and strip_type enums into side_type enum (#11438)
davidwendt Aug 16, 2022
8e20721
Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493)
robertmaynard Aug 16, 2022
0c4b319
Control Parquet page size through Python API (#11454)
etseidl Aug 16, 2022
63a47d9
Add `create_random_column` function to the data generator (#11490)
vuule Aug 16, 2022
4178a51
Adding optional parquet reader schema (#11524)
hyperbolic2346 Aug 16, 2022
46c5e90
Add hexadecimal value separators (#11527)
bdice Aug 16, 2022
65a7821
Removing unnecessary asserts in parquet tests (#11544)
hyperbolic2346 Aug 16, 2022
abd4302
Use the new JSON parser when the experimental reader is selected (#11…
vuule Aug 17, 2022
e6191da
Reuse MurmurHash3_32 in Parquet page data. (#11528)
bdice Aug 17, 2022
89fa003
JNI support for writing binary columns in parquet (#11556)
revans2 Aug 17, 2022
127d574
Fully support nested types in `cudf::contains` (#10656)
ttnghia Aug 17, 2022
288c81f
Support additional dictionary bit widths in Parquet writer (#11547)
etseidl Aug 18, 2022
be57c5e
Remove unused cpp/img folder (#11554)
davidwendt Aug 18, 2022
7ad1a8b
xfail custreamz display test for now (#11567)
shwina Aug 18, 2022
5ffee5c
Fix JNI for TableWithMeta to use schema_info instead of column_names …
jlowe Aug 19, 2022
f42d117
Fix groupby failures in dask_cudf CI (#11561)
rjzamora Aug 19, 2022
7322070
Fix for: error when assigning a value to an empty series (#11523)
shaswat-indian Aug 19, 2022
be78351
Fix for pivot: error when 'values' is a multicharacter string (#11538)
shaswat-indian Aug 22, 2022
dd1c27a
Add byte_range to multibyte_split benchmark + NVBench refactor (#11562)
upsj Aug 22, 2022
1d48809
Truncate parquet column indexes (#11403)
etseidl Aug 22, 2022
a5dcb32
Handle hyphen as literal for regex cclass when incomplete range (#11557)
davidwendt Aug 22, 2022
8f3cc74
Support nested types in `lists::contains` (#10548)
ttnghia Aug 22, 2022
2c06e51
Add ability to write `list(struct)` columns as `map` type in orc writ…
galipremsagar Aug 23, 2022
e431440
Clean up ORC reader benchmarks with NVBench (#11543)
PointKernel Aug 23, 2022
da6b3ed
Enable more Pydocstyle rules (#11582)
bdice Aug 23, 2022
8b9f203
Correct distribution data type in `quantiles` benchmark (#11584)
vuule Aug 24, 2022
c666d7c
Fix an issue with `to_arrow` when column name type is not a string (#…
galipremsagar Aug 24, 2022
8410fc1
Fix warnings due to compiler regression with `if constexpr` (#11581)
ttnghia Aug 25, 2022
5ee4b3b
Refactor string/numeric conversion utilities (#11545)
davidwendt Aug 25, 2022
38616fe
Enable using upstream jitify2 (#11287)
shwina Aug 25, 2022
c8c8025
Adds support for json lines format to the nested JSON reader (#11534)
elstehle Aug 25, 2022
ae8e1df
Use stream in Java API. (#11601)
bdice Aug 25, 2022
096bbc4
Add casting operators to masked UDFs (#11578)
brandon-b-miller Aug 25, 2022
78692b9
Fix multibyte_split benchmark for host buffers (#11583)
upsj Aug 25, 2022
2e142cb
Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#…
bdice Aug 26, 2022
5f15ed4
Fix compile warning in nested_json_gpu.cu (#11607)
davidwendt Aug 26, 2022
ccd72f2
Add strings 'like' function (#11558)
davidwendt Aug 26, 2022
0af0c59
Merge branch 'branch-22.08' into branch-22.10-merge-22.08
bdice Aug 26, 2022
c562f8f
Remove duplicate header.
bdice Aug 26, 2022
a932c07
Merge pull request #11608 from bdice/branch-22.10-merge-22.08
ajschmidt8 Aug 26, 2022
05a553b
Change default value of `ordered` to `False` in `CategoricalDtype` (#…
galipremsagar Aug 26, 2022
48dc168
Remove deprecated Series.applymap. (#11031)
bdice Aug 26, 2022
4b4f6c8
Fix exception in segmented-reduce benchmark (#11588)
davidwendt Aug 29, 2022
d241458
Add is_timestamp test for leap second (60) (#11594)
davidwendt Aug 29, 2022
ecf4662
Move cudf::strings::findall_record to cudf::strings::findall (#11575)
davidwendt Aug 29, 2022
009e244
Fix incorrect memory resource used in rolling temp columns (#11618)
mythrocks Aug 29, 2022
03f3543
Improve ORC writer benchmark with nvbench (#11598)
PointKernel Aug 29, 2022
c1768bd
changing version of cmake to 3.23.3 (#11619)
hyperbolic2346 Aug 30, 2022
a7b2e0c
Remove use of CUDA driver API calls from libcudf (#11370)
shwina Aug 30, 2022
b4dd2d5
Single-pass `multibyte_split` (#11500)
upsj Aug 30, 2022
2ea96de
Move split_utils.cuh to strings/detail (#11585)
davidwendt Aug 30, 2022
0328e5d
Rework contains_scalar to check nulls at runtime (#11622)
davidwendt Aug 31, 2022
9b3fdaf
Tune multibyte_split kernel (#11587)
upsj Aug 31, 2022
cc15765
Move type-dispatcher calls from traits.hpp to traits.cpp (#11616)
davidwendt Aug 31, 2022
4e45256
Removed converted type for INT32 and INT64 since they do not convert …
hyperbolic2346 Aug 31, 2022
e5c8776
Adds Nested Json benchmark (#11466)
karthikeyann Sep 1, 2022
8ad0290
Fix compile error in benchmark nested_json.cpp (#11637)
davidwendt Sep 1, 2022
7857a30
Fix host scalars construction of nested types (#11612)
galipremsagar Sep 1, 2022
c273da4
Add control of Parquet column index creation to python (#11453)
etseidl Sep 1, 2022
f382403
Generic type casting to support the new nested JSON reader (#11613)
elstehle Sep 2, 2022
dc0d8d1
Update zfill to match Python output (#11634)
davidwendt Sep 2, 2022
488c7ad
Preserve order if necessary when deduping categoricals internally (#1…
brandon-b-miller Sep 2, 2022
a2783ec
Refactor parquet reader benchmarks with nvbench (#11611)
PointKernel Sep 2, 2022
a660060
Fix incorrect `nullCount` in `get_json_object` (#11633)
trxcllnt Sep 2, 2022
2e4d880
Generate unique keys table in java JNI `contiguousSplitGroups` (#11614)
res-life Sep 5, 2022
a444c17
Simplify `hostdevice_vector` (#11631)
upsj Sep 5, 2022
23b8345
fixes overflows in benchmarks (#11649)
elstehle Sep 6, 2022
fb5af4b
Cache cudf.Scalar (#11246)
shwina Sep 6, 2022
1742a4d
Refactor dask_cudf groupby to use apply_concat_apply (#11571)
rjzamora Sep 6, 2022
ba4f715
Fix some libcudf detail calls not passing the stream variable (#11642)
davidwendt Sep 6, 2022
bc7109e
Handle some zero-sized corner cases in dlpack interop (#11449)
wence- Sep 6, 2022
f34ad53
Call set_null_count on a returning column if null-count is known (#11…
davidwendt Sep 7, 2022
c439647
Fix pandoc pinning. (#11658)
bdice Sep 7, 2022
66b5a0c
Refactor strings strip functor to details header (#11635)
davidwendt Sep 7, 2022
e7f04cd
Update git metadata (#11647)
bdice Sep 7, 2022
0684ee1
Fix bug in `device_write()`: it uses an incorrect size (#11651)
madsbk Sep 8, 2022
d3e8f6d
Add support for `group_keys` in `groupby` (#11659)
galipremsagar Sep 8, 2022
37612ee
Revert removal of skip_rows / num_rows options from the Parquet reade…
nvdbaranec Sep 8, 2022
6a97858
Fix regex negated classes to not automatically include new-lines (#11…
davidwendt Sep 8, 2022
d6d8d92
Fix invalid regex quantifier check to not include alternation (#11654)
davidwendt Sep 9, 2022
c8f57dd
Add `gdb` pretty-printers for simple types (#11499)
upsj Sep 9, 2022
6cace8e
Fix multi-file remote datasource bug (#11655)
rjzamora Sep 9, 2022
f485667
Update to mypy 0.971 (#11640)
wence- Sep 9, 2022
9f8db66
Maintain the index name after `.loc` (#11677)
shwina Sep 10, 2022
9f9a55d
List lexicographic comparator (#11129)
devavret Sep 12, 2022
44d4e31
Ignore protobuf generated files in `mypy` checks (#11685)
galipremsagar Sep 12, 2022
866434f
Fix issue with extracting nested column data & dtype preservation (#1…
galipremsagar Sep 12, 2022
dca285b
Check conda recipe headers with pre-commit (#11669)
bdice Sep 12, 2022
39ad65f
Remove redundant style check for clang-format. (#11668)
bdice Sep 12, 2022
578e65f
Enable ZSTD compression in ORC and Parquet writers (#11551)
vuule Sep 12, 2022
d6952ba
Publish C++ developer docs (#11475)
vyasr Sep 12, 2022
4681bdc
Ensure that all cudf tests and benchmarks are conda env aware (#11666)
robertmaynard Sep 12, 2022
e99e069
Fix an issue related to `Multindex` when `group_keys=True` (#11689)
galipremsagar Sep 12, 2022
7b0d597
Fix encode/decode of negative timestamps in ORC reader/writer (#11586)
vuule Sep 13, 2022
7e86a1b
Default to Snappy compression in `to_orc` when using cuDF or Dask (#1…
vuule Sep 13, 2022
69cb31d
Support DECIMAL order-by for RANGE window functions (#11645)
mythrocks Sep 13, 2022
0032a7c
Fix compile error due to missing header (#11697)
ttnghia Sep 13, 2022
18bfbe7
Fix `DataFrame.from_arrow` to preserve type metadata (#11698)
galipremsagar Sep 14, 2022
d1d879e
Drop split_out=None test from groupby.agg (#11704)
wence- Sep 14, 2022
66f6960
Modify ORC reader timestamp parsing to match the apache reader behavi…
vuule Sep 14, 2022
9c3afc3
Special-case multibyte_split for single-byte delimiter (#11681)
upsj Sep 14, 2022
75d126a
Add generic type inference for cuIO (#11121)
PointKernel Sep 15, 2022
972708a
Include decimal in supported types for range window order-by columns …
mythrocks Sep 19, 2022
cccf191
Add missing copyright headers. (#11712)
bdice Sep 19, 2022
68746ae
Fix get_thrust.cmake format at patch command (#11715)
davidwendt Sep 19, 2022
482d8ed
Remove isort exclusions (#11680)
bdice Sep 19, 2022
b2ffea7
Transfer correct dtype to exploded column (#11687)
wence- Sep 19, 2022
87f56e8
Update to Thrust 1.17.2 to fix cub ODR issues (#11665)
robertmaynard Sep 19, 2022
bf2c751
Adds GPU implementation of JSON-token-stream to JSON-tree (#11518)
karthikeyann Sep 19, 2022
0ba4675
Adds type inference and type conversion for leaf-columns to the neste…
elstehle Sep 20, 2022
d10406f
Add regex capture-group parameter to auto convert to non-capture grou…
davidwendt Sep 20, 2022
0528b38
Add read-only functions on string dtypes to `DataFrame.apply` and `Se…
brandon-b-miller Sep 20, 2022
387c5ff
Upgrade `pandas` to `1.5` (#11617)
galipremsagar Sep 21, 2022
5c91739
Don't assume stream is a compile-time constant expression (#11725)
vyasr Sep 21, 2022
02d5e83
Add a `__dataframe__` method to the protocol dataframe object (#11692)
rgommers Sep 21, 2022
a91853d
Adds option to take explicit nested schema for nested JSON reader (#1…
elstehle Sep 22, 2022
d4f46fc
Refactor CSV reader benchmarks with nvbench (#11678)
PointKernel Sep 22, 2022
46bd87a
Refactor parquet writer benchmarks with nvbench (#11623)
PointKernel Sep 22, 2022
f227d7d
Fix ORC string sum statistics (#11740)
vuule Sep 22, 2022
5430fbd
Update libcudf documentation build command in DOCUMENTATION.md (#11735)
davidwendt Sep 22, 2022
9363095
Disable very large column gtest for contiguous-split (#11706)
davidwendt Sep 22, 2022
25e5e17
Add `strings_udf` package for python 3.9 (#11730)
brandon-b-miller Sep 22, 2022
204a09c
Reduces memory requirements in JSON parser and adds bytes/s and peak …
elstehle Sep 22, 2022
451c837
Add ability to construct `ListColumn` when size is `None` (#11745)
galipremsagar Sep 23, 2022
d9f2c2b
Resolve dask_cudf failures caused by upstream groupby changes (#11755)
rjzamora Sep 23, 2022
9a5f39a
Ensure that all tests launch kernels on cudf's default stream (#11726)
vyasr Sep 23, 2022
006b254
JSON tree traversal (#11610)
karthikeyann Sep 24, 2022
6131bd6
Add hasNull statistic reading ability to ORC (#11747)
devavret Sep 26, 2022
fb703b1
Pass `dtype` param to avoid `pd.Series` warnings (#11761)
galipremsagar Sep 26, 2022
cd60462
Ensure all libcudf APIs run on cudf's default stream (#11759)
vyasr Sep 26, 2022
2952a04
Fix cudf::lists::sort_lists for NaN and Infinity values (#11703)
davidwendt Sep 26, 2022
afbff54
Add full 24-bit dictionary support to Parquet writer (#11580)
etseidl Sep 26, 2022
7a35ed9
Update strings udf version updater script (#11772)
galipremsagar Sep 26, 2022
a945377
Add doc section for `list` & `struct` handling (#11770)
galipremsagar Sep 26, 2022
11156cc
Fix issue with set-item incase of `list` and `struct` types (#11760)
galipremsagar Sep 26, 2022
e64c2da
Fix return type of `Index.isna` & `Index.notna` (#11769)
galipremsagar Sep 26, 2022
d24bce5
Remove `kwargs` in `read_csv` & `to_csv` (#11762)
galipremsagar Sep 27, 2022
0a430fa
Fix `cudf::partition*` APIs that do not return offsets for empty outp…
ttnghia Sep 27, 2022
c5d555a
JSON Column creation in GPU (#11714)
karthikeyann Sep 27, 2022
831ef04
Add BGZIP `data_chunk_reader` (#11652)
upsj Sep 27, 2022
35b0a52
Enable `schema_element` & `keep_quotes` support in json reader (#11746)
galipremsagar Sep 27, 2022
a270ae6
Add `istitle` to string UDFs (#11738)
brandon-b-miller Sep 27, 2022
466a90d
Document that minimum required CMake version is now 3.23.1 (#11751)
robertmaynard Sep 27, 2022
d8feede
Reduce code duplication for `dask` & `distributed` nightly/stable ins…
galipremsagar Sep 27, 2022
4005a7f
Build `cudf` locally before building `strings_udf` conda packages in …
brandon-b-miller Sep 27, 2022
bcf361f
Expose "explicit-comms" option in shuffle-based dask_cudf functions (…
rjzamora Sep 27, 2022
1003e33
Add docs for use of string data to `DataFrame.apply` and `Series.appl…
brandon-b-miller Sep 27, 2022
5a416a0
Fix an issue in cudf::row_bit_count involving structs and lists at mu…
nvdbaranec Sep 27, 2022
da04725
Fix regex out-of-bounds write in strided rows logic (#11797)
davidwendt Sep 28, 2022
9e9ba6e
Revert problematic shuffle=explicit-comms changes (#11803)
rjzamora Sep 28, 2022
2e4acbb
move strings_udf package build to the same place as where other pytho…
brandon-b-miller Sep 28, 2022
5a4afec
Support shuffle-based groupby aggregations in dask_cudf (#11800)
rjzamora Sep 28, 2022
ddfd07f
Fix operator `NotImplemented` issue with `numpy` (#11816)
galipremsagar Sep 29, 2022
3731b4c
Fix `is_valid` checks in `Scalar._binaryop` (#11818)
wence- Sep 29, 2022
4023b65
Add examples for Nested JSON reader (#11814)
Sep 29, 2022
628c857
Disable nvCOMP DEFLATE integration (#11811)
vuule Sep 29, 2022
5fad289
Fix copyright check issues in pre-commit (#11711)
bdice Sep 29, 2022
2041caa
Use CubinLinker for CUDA Minor Version Compatibility (#11701)
gmarkall Sep 29, 2022
920b58f
Update docstring for cudf.read_text (#11799)
Sep 29, 2022
3f9b3fe
Fix bug in new shuffle-based groupby implementation (#11836)
rjzamora Sep 30, 2022
597e325
solve multiple issues
brandon-b-miller Oct 3, 2022
299ac31
cleanup
brandon-b-miller Oct 3, 2022
1f58e17
style
brandon-b-miller Oct 3, 2022
9896770
missed change
brandon-b-miller Oct 3, 2022
d9ddd83
Merge pull request #11846 from brandon-b-miller/reset-strings-udf-cec
jolorunyomi Oct 3, 2022
dfd3d89
Pin `dask` and `distributed` for release (#11822)
galipremsagar Oct 3, 2022
78aa210
update notebook
brandon-b-miller Oct 4, 2022
ff7dfff
address reviews
brandon-b-miller Oct 4, 2022
cd11a00
handle ptx file paths
galipremsagar Oct 4, 2022
37a6f1a
fix parser
galipremsagar Oct 4, 2022
0efcff6
Apply suggestions from code review
galipremsagar Oct 4, 2022
17281d8
Apply suggestions from code review
galipremsagar Oct 4, 2022
72d44b1
Update python/strings_udf/strings_udf/__init__.py
galipremsagar Oct 4, 2022
8d0fef1
change logic to address reviews
galipremsagar Oct 5, 2022
483c3ad
pick suffix_a only when it matches with device compute capability
galipremsagar Oct 5, 2022
9eef5fa
change from list to var
galipremsagar Oct 5, 2022
d84824a
change from list to var
galipremsagar Oct 5, 2022
3011d96
fix cmake
galipremsagar Oct 5, 2022
a9e7471
Merge pull request #11861 from brandon-b-miller/doc-string-udf-notebooks
jolorunyomi Oct 5, 2022
59a3152
Merge pull request #11862 from galipremsagar/fix_ptx_file_path
jolorunyomi Oct 5, 2022
4c4bce9
Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856)
vuule Oct 7, 2022
22beba9
Fixes bug in temporary decompression space estimation before calling …
abellina Oct 7, 2022
61e2499
Apply codestyle changes
abellina Oct 7, 2022
17868b7
Merge pull request #11879 from abellina/fix_read_parquet_ztd_temp_size
jolorunyomi Oct 8, 2022
f817d96
update changelog
raydouglass Oct 12, 2022
4 changes: 2 additions & 2 deletions cpp/.clang-format → .clang-format
@@ -15,7 +15,7 @@ AlignTrailingComments: true
AllowAllArgumentsOnNextLine: true
AllowAllConstructorInitializersOnNextLine: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: true
AllowShortBlocksOnASingleLine: true
AllowShortCaseLabelsOnASingleLine: true
AllowShortEnumsOnASingleLine: true
AllowShortFunctionsOnASingleLine: All
@@ -27,7 +27,7 @@ AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: Yes
BinPackArguments: false
BinPackArguments: false
BinPackParameters: false
BraceWrapping:
AfterClass: false
5 changes: 4 additions & 1 deletion .gitattributes
@@ -1,2 +1,5 @@
python/cudf/cudf/_version.py export-subst
CHANGELOG.md merge=union
python/strings_udf/strings_udf/_version.py export-subst
python/cudf_kafka/cudf_kafka/_version.py export-subst
python/custreamz/custreamz/_version.py export-subst
python/dask_cudf/dask_cudf/_version.py export-subst
10 changes: 6 additions & 4 deletions .gitignore
@@ -24,17 +24,19 @@ cudf.egg-info/
python/build
python/*/build
python/cudf/cudf-coverage.xml
python/cudf/*/_lib/**/*\.cpp
python/cudf/*/_lib/**/*.cpp
python/cudf/*/_lib/**/*.h
python/cudf/*/_lib/.nfs*
python/cudf/*/_cuda/*\.cpp
python/cudf/*/_cuda/*.cpp
python/cudf/*.ipynb
python/cudf/.ipynb_checkpoints
python/*/record.txt
python/cudf_kafka/*/_lib/**/*\.cpp
python/cudf_kafka/*/_lib/**/*.cpp
python/cudf_kafka/*/_lib/**/*.h
python/custreamz/*/_lib/**/*\.cpp
python/custreamz/*/_lib/**/*.cpp
python/custreamz/*/_lib/**/*.h
python/strings_udf/strings_udf/_lib/*.cpp
python/strings_udf/strings_udf/*.ptx
.Python
env/
develop-eggs/
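The `.gitignore` hunk above drops the backslash from patterns such as `*\.cpp`. In glob syntax a backslash before `.` only makes the dot literal, which it already is, so both spellings should select the same files. The sketch below uses POSIX shell `case` globs as a stand-in for gitignore matching (an analogy, not Git's own matcher); the function names and sample file names are hypothetical.

```shell
#!/bin/sh
# Stand-in for gitignore matching: shell `case` patterns follow similar
# glob rules, where `\.` and `.` both match a literal dot.

# Old spelling: escaped dot.
matches_escaped() {
    case "$1" in
        *\.cpp) return 0 ;;
        *) return 1 ;;
    esac
}

# New spelling: plain dot.
matches_plain() {
    case "$1" in
        *.cpp) return 0 ;;
        *) return 1 ;;
    esac
}

# Compare both spellings on a few sample names.
for f in column.cpp column.h columncpp; do
    if matches_escaped "$f"; then old=yes; else old=no; fi
    if matches_plain "$f"; then new=yes; else new=no; fi
    echo "$f escaped=$old plain=$new"
done
```

Both functions agree on every sample name, which is why the change reads as a cleanup rather than a change in what gets ignored.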
38 changes: 28 additions & 10 deletions .pre-commit-config.yaml
@@ -9,7 +9,6 @@ repos:
# project can specify its own first/third-party packages.
args: ["--config-root=python/", "--resolve-all-configs"]
files: python/.*
exclude: (__init__.py|setup.py)$
types_or: [python, cython, pyi]
- repo: https://github.com/psf/black
rev: 22.3.0
@@ -26,10 +25,15 @@ repos:
files: python/.*\.(py|pyx|pxd)$
types: [file]
- repo: https://github.com/pre-commit/mirrors-mypy
rev: 'v0.782'
rev: 'v0.971'
hooks:
- id: mypy
args: ["--config-file=setup.cfg", "python/cudf/cudf", "python/dask_cudf/dask_cudf", "python/custreamz/custreamz", "python/cudf_kafka/cudf_kafka"]
additional_dependencies: [types-cachetools]
args: ["--config-file=setup.cfg",
"python/cudf/cudf",
"python/custreamz/custreamz",
"python/cudf_kafka/cudf_kafka",
"python/dask_cudf/dask_cudf"]
pass_filenames: false
- repo: https://github.com/PyCQA/pydocstyle
rev: 6.1.1
@@ -40,9 +44,8 @@ repos:
rev: v11.1.0
hooks:
- id: clang-format
files: \.(cu|cuh|h|hpp|cpp|inl)$
types_or: [file]
args: ['-fallback-style=none', '-style=file', '-i']
types_or: [c, c++, cuda]
args: ["-fallback-style=none", "-style=file", "-i"]
- repo: local
hooks:
- id: no-deprecationwarning
@@ -60,6 +63,8 @@ repos:
# of dependencies, so we'll have to update this manually.
additional_dependencies:
- cmakelang==0.6.13
verbose: true
require_serial: true
- id: cmake-lint
name: cmake-lint
entry: ./cpp/scripts/run-cmake-format.sh cmake-lint
@@ -69,13 +74,14 @@ repos:
# of dependencies, so we'll have to update this manually.
additional_dependencies:
- cmakelang==0.6.13
verbose: true
require_serial: true
- id: copyright-check
name: copyright-check
# This hook's use of Git tools appears to conflict with
# existing CI invocations so we don't invoke it during CI runs.
stages: [commit]
entry: python ./ci/checks/copyright.py --git-modified-only
entry: python ./ci/checks/copyright.py --git-modified-only --update-current-year
language: python
pass_filenames: false
additional_dependencies: [gitpython]
- id: doxygen-check
name: doxygen-check
entry: ./ci/checks/doxygen.sh
@@ -84,6 +90,18 @@ repos:
language: system
pass_filenames: false
verbose: true
- id: headers-recipe-check
name: headers-recipe-check
entry: ./ci/checks/headers_test.sh
files: |
(?x)^(
^cpp/include/|
^conda/.*/meta.yaml
)
types_or: [file]
language: system
pass_filenames: false
verbose: false

default_language_version:
python: python3
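The `headers-recipe-check` hook added above only runs when a changed path matches its `files` pattern. One rough way to see which paths would trigger it is to translate the verbose pattern into an extended regular expression and try it with `grep -E`. The translation, the `triggers_hook` helper, and the sample paths are assumptions made for illustration; pre-commit evaluates the original pattern as a Python regular expression.

```shell
#!/bin/sh
# Approximate ERE translation of the hook's verbose `files` pattern.
pattern='^(cpp/include/|conda/.*/meta.yaml)'

# Succeeds when the given repo-relative path would trigger the hook.
triggers_hook() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Sample paths (illustrative only).
for path in \
    cpp/include/cudf/column/column.hpp \
    conda/recipes/libcudf/meta.yaml \
    python/cudf/setup.py
do
    if triggers_hook "$path"; then
        echo "check runs for: $path"
    else
        echo "check skipped for: $path"
    fi
done
```

Because the hook sets `pass_filenames: false`, the pattern decides only whether the script runs, not which files it receives.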
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
# cuDF 22.10.00 (Date TBD)

Please see https://github.com/rapidsai/cudf/releases/tag/v22.10.00a for the latest changes to this development branch.

# cuDF 22.08.00 (17 Aug 2022)

## 🚨 Breaking Changes
9 changes: 8 additions & 1 deletion CONTRIBUTING.md
@@ -73,7 +73,7 @@ Compilers:

* `gcc` version 9.3+
* `nvcc` version 11.5+
* `cmake` version 3.20.1+
* `cmake` version 3.23.1+

CUDA/GPU:

@@ -380,6 +380,13 @@ Now code linters and formatters will be run each time you commit changes.

You can skip these checks with `git commit --no-verify` or with the short version `git commit -n`.

## Developer Guidelines

The [C++ Developer Guide](cpp/docs/DEVELOPER_GUIDE.md) includes details on contributing to libcudf C++ code.

The [Python Developer Guide](https://docs.rapids.ai/api/cudf/stable/developer_guide/index.html) includes details on contributing to cuDF Python code.


## Attribution

Portions adopted from https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md
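The CONTRIBUTING.md change raises the minimum CMake version from 3.20.1 to 3.23.1. A minimal sketch of a pre-build check against that minimum follows; `version_at_least` is a hypothetical helper, and it relies on `sort -V` (version sort), which GNU coreutils provides but which is an assumption about the environment.

```shell
#!/bin/sh
MIN_CMAKE_VERSION="3.23.1"

# Succeeds when version $1 is greater than or equal to version $2.
# Version sort puts the smaller version first; if that is $2, $1 is new enough.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

for candidate in 3.20.1 3.23.1 3.24.0; do
    if version_at_least "$candidate" "$MIN_CMAKE_VERSION"; then
        echo "cmake $candidate: ok"
    else
        echo "cmake $candidate: too old"
    fi
done
```

In practice the candidate string would come from parsing `cmake --version`; that step is left out here to keep the comparison itself in focus.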
13 changes: 11 additions & 2 deletions build.sh
@@ -17,7 +17,7 @@ ARGS=$*
# script, and that this script resides in the repo dir!
REPODIR=$(cd $(dirname $0); pwd)

VALIDARGS="clean libcudf cudf cudfjar dask_cudf benchmarks tests libcudf_kafka cudf_kafka custreamz -v -g -n -l --allgpuarch --disable_nvtx --opensource_nvcomp --show_depr_warn --ptds -h --build_metrics --incl_cache_stats"
VALIDARGS="clean libcudf cudf cudfjar dask_cudf benchmarks tests libcudf_kafka cudf_kafka custreamz strings_udf -v -g -n -l --allgpuarch --disable_nvtx --opensource_nvcomp --show_depr_warn --ptds -h --build_metrics --incl_cache_stats"
HELP="$0 [clean] [libcudf] [cudf] [cudfjar] [dask_cudf] [benchmarks] [tests] [libcudf_kafka] [cudf_kafka] [custreamz] [-v] [-g] [-n] [-h] [--cmake-args=\\\"<args>\\\"]
clean - remove all existing build artifacts and configuration (start
over)
@@ -329,7 +329,16 @@ fi
if buildAll || hasArg cudf; then

cd ${REPODIR}/python/cudf
python setup.py build_ext --inplace -- -DCMAKE_PREFIX_PATH=${INSTALL_PREFIX} -DCMAKE_LIBRARY_PATH=${LIBCUDF_BUILD_DIR} ${EXTRA_CMAKE_ARGS} -- -j${PARALLEL_LEVEL:-1}
python setup.py build_ext --inplace -- -DCMAKE_PREFIX_PATH=${INSTALL_PREFIX} -DCMAKE_LIBRARY_PATH=${LIBCUDF_BUILD_DIR} -DCMAKE_CUDA_ARCHITECTURES=${CUDF_CMAKE_CUDA_ARCHITECTURES} ${EXTRA_CMAKE_ARGS} -- -j${PARALLEL_LEVEL:-1}
if [[ ${INSTALL_TARGET} != "" ]]; then
python setup.py install --single-version-externally-managed --record=record.txt -- -DCMAKE_PREFIX_PATH=${INSTALL_PREFIX} -DCMAKE_LIBRARY_PATH=${LIBCUDF_BUILD_DIR} ${EXTRA_CMAKE_ARGS} -- -j${PARALLEL_LEVEL:-1}
fi
fi

if buildAll || hasArg strings_udf; then

cd ${REPODIR}/python/strings_udf
python setup.py build_ext --inplace -- -DCMAKE_PREFIX_PATH=${INSTALL_PREFIX} -DCMAKE_LIBRARY_PATH=${LIBCUDF_BUILD_DIR} -DCMAKE_CUDA_ARCHITECTURES=${CUDF_CMAKE_CUDA_ARCHITECTURES} ${EXTRA_CMAKE_ARGS} -- -j${PARALLEL_LEVEL:-1}
if [[ ${INSTALL_TARGET} != "" ]]; then
python setup.py install --single-version-externally-managed --record=record.txt -- -DCMAKE_PREFIX_PATH=${INSTALL_PREFIX} -DCMAKE_LIBRARY_PATH=${LIBCUDF_BUILD_DIR} ${EXTRA_CMAKE_ARGS} -- -j${PARALLEL_LEVEL:-1}
fi
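The build.sh hunks above add `strings_udf` to `VALIDARGS` and guard its build with `buildAll || hasArg strings_udf`. The bodies of `hasArg` and `buildAll` are not part of this diff, so the sketch below uses simplified, hypothetical stand-ins to illustrate the dispatch pattern: a target builds when it is named explicitly, or when no target is named at all.

```shell
#!/bin/sh
# Hypothetical, simplified stand-ins for the helpers used by build.sh.
TARGETS="libcudf cudf dask_cudf strings_udf"

# Succeeds when word $1 appears in the space-separated list ARGS.
hasArg() {
    case " $ARGS " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when ARGS names no build target (flags alone do not count).
buildAll() {
    for t in $TARGETS; do
        if hasArg "$t"; then return 1; fi
    done
    return 0
}

ARGS="strings_udf -v"
if buildAll || hasArg strings_udf; then
    echo "building strings_udf"
fi
if buildAll || hasArg cudf; then
    echo "building cudf"
fi
```

With `ARGS="strings_udf -v"` only the `strings_udf` branch runs; with flags alone (for example `ARGS="-v"`) every guarded branch runs.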
7 changes: 5 additions & 2 deletions ci/benchmark/build.sh
@@ -39,6 +39,9 @@ export LIBCUDF_KERNEL_CACHE_PATH="$HOME/.jitify-cache"
# Dask & Distributed option to install main(nightly) or `conda-forge` packages.
export INSTALL_DASK_MAIN=0

# Dask version to install when `INSTALL_DASK_MAIN=0`
export DASK_STABLE_VERSION="2022.9.2"

function remove_libcudf_kernel_cache_dir {
EXITCODE=$?
logger "removing kernel cache dir: $LIBCUDF_KERNEL_CACHE_PATH"
@@ -82,8 +85,8 @@ if [[ "${INSTALL_DASK_MAIN}" == 1 ]]; then
gpuci_logger "gpuci_mamba_retry update dask"
gpuci_mamba_retry update dask
else
gpuci_logger "gpuci_mamba_retry install conda-forge::dask==2022.7.1 conda-forge::distributed==2022.7.1 conda-forge::dask-core==2022.7.1 --force-reinstall"
gpuci_mamba_retry install conda-forge::dask==2022.7.1 conda-forge::distributed==2022.7.1 conda-forge::dask-core==2022.7.1 --force-reinstall
gpuci_logger "gpuci_mamba_retry install conda-forge::dask==${DASK_STABLE_VERSION} conda-forge::distributed==${DASK_STABLE_VERSION} conda-forge::dask-core==${DASK_STABLE_VERSION} --force-reinstall"
gpuci_mamba_retry install conda-forge::dask==${DASK_STABLE_VERSION} conda-forge::distributed==${DASK_STABLE_VERSION} conda-forge::dask-core==${DASK_STABLE_VERSION} --force-reinstall
fi

# Install the master version of streamz
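The CI script builds its conda package specs from `DASK_STABLE_VERSION`, so brace placement matters: `${VAR}` expands to the value alone, while `{$VAR}` keeps literal braces around it. A minimal sketch of the difference, reusing the version string from the diff; `spec_for` is a hypothetical helper.

```shell
#!/bin/sh
DASK_STABLE_VERSION="2022.9.2"

# Build a conda-forge spec pinned to the stable version.
spec_for() {
    echo "conda-forge::$1==${DASK_STABLE_VERSION}"
}

good=$(spec_for dask)
# Braces placed outside the dollar sign are kept as literal characters.
bad="conda-forge::dask=={$DASK_STABLE_VERSION}"

echo "$good"   # conda-forge::dask==2022.9.2
echo "$bad"    # conda-forge::dask=={2022.9.2}
```

A spec with literal braces is not a valid version pin, so the `${...}` form is the one to use when interpolating the version.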