[RELEASE] cudf v0.17 #6935

Merged on Dec 10, 2020 (344 commits)

Commits
0fa8e29
Use correct stream in hash_join.
jrhemstad Oct 27, 2020
ecc8193
changelog.
jrhemstad Oct 27, 2020
3644dbb
Merge remote-tracking branch 'upstream/branch-0.17' into mwilson/int96
hyperbolic2346 Oct 27, 2020
e620a73
Fix memory usage calculation (#6596)
galipremsagar Oct 27, 2020
e94ed01
Add function to create hashed vocabulary file from raw vocabulary (#6…
VibhuJawa Oct 27, 2020
b0b389e
Merge remote-tracking branch 'upstream/branch-0.17' into mwilson/int96
hyperbolic2346 Oct 27, 2020
c40e27a
Merge pull request #6603 from jrhemstad/fix-hash-join-stream
jrhemstad Oct 27, 2020
1ddd81d
Merge remote-tracking branch 'upstream/branch-0.17' into mwilson/int96
hyperbolic2346 Oct 28, 2020
bdba041
Update JNI to new RMM cuda_stream_view API (#6612)
jlowe Oct 28, 2020
13b06a0
Merge remote-tracking branch 'upstream/branch-0.17' into mwilson/int96
hyperbolic2346 Oct 28, 2020
d3f0fb6
int96 changes
hyperbolic2346 Oct 28, 2020
8b7730e
Fix JNI native dependency load order (#6617)
jlowe Oct 28, 2020
5f35809
Improve subword tokenizer docs (#6608)
VibhuJawa Oct 28, 2020
6b6f0cc
Add dictionary support to cudf::unary_operation (#6540)
davidwendt Oct 29, 2020
64bab8f
Add missing device_scalar stream parameters. (#6582)
harrism Oct 29, 2020
a81d07b
Add in java column to row conversion (#6578)
revans2 Oct 29, 2020
35f9b23
Add strings::contains API with target column parameter (#6598)
davidwendt Oct 29, 2020
851c881
Add support for conversion to Pandas nullable dtypes and fix related …
galipremsagar Oct 29, 2020
ea0b5d2
Fix integer overflow in ORC encoder (#6607)
vuule Oct 29, 2020
b0b3ad6
Updating to write INT96 type instead of INTERVAL
hyperbolic2346 Oct 30, 2020
4c6cbb1
linting
hyperbolic2346 Oct 30, 2020
b504d7a
Adding some documentation
hyperbolic2346 Oct 30, 2020
b04eb98
Adding changelog
hyperbolic2346 Oct 30, 2020
321c896
Merge branch 'branch-0.17' into mwilson/int96
vuule Oct 30, 2020
06cb559
Revert bad CMake changes for JNI (#6629)
revans2 Oct 30, 2020
8aef966
Add operator overloading to column and clean up error messages (#6623)
galipremsagar Oct 30, 2020
275d462
Merge remote-tracking branch 'upstream/branch-0.17' into mwilson/int96
hyperbolic2346 Oct 30, 2020
b2f875d
Add AVRO fuzz tests with varying function parameters (#6489)
galipremsagar Oct 30, 2020
2452776
Fix timezone offset when reading ORC files (#6601)
vuule Oct 30, 2020
f7e6270
support `cudf.to_numeric` (#6592)
isVoid Oct 30, 2020
1d72e0f
Fix Java HostColumnVector unnecessarily loading native dependencies (…
jlowe Oct 30, 2020
54f3c0e
Update scatter APIs to use reference wrapper / const scalar (#6579)
brandon-b-miller Oct 30, 2020
679a074
Fix ORC boolean column corruption issue (#6636)
rgsl888prabhu Nov 2, 2020
2df2f3a
Fix subword tokenizer metadata for token count equal to max_sequence_…
davidwendt Nov 2, 2020
2942e6f
Add error message for unsupported `axis` parameter in DataFrame APIs …
galipremsagar Nov 2, 2020
e01ab96
Update `to_pandas` api docs (#6622)
galipremsagar Nov 2, 2020
6dfc6a3
small changes
rgsl888prabhu Nov 2, 2020
cdde9e4
Merge branch 'branch-0.17' of https://github.com/rapidsai/cudf into 6…
rgsl888prabhu Nov 2, 2020
ac351d2
Add support for `pipe` API (#6638)
galipremsagar Nov 2, 2020
b678dc2
review changes
rgsl888prabhu Nov 2, 2020
1912637
Add dictionary support to libcudf join APIs (#6556)
davidwendt Nov 2, 2020
ff022e6
review changes
rgsl888prabhu Nov 3, 2020
7bea264
Implement `cudf::round` floating point and integer types (`HALF_UP`) …
codereport Nov 3, 2020
86cdb58
Add ability to set scalar values in `cudf.DataFrame` (#6610)
galipremsagar Nov 3, 2020
094c325
Pin cmake policies to cmake 3.17 version (#6545)
Nov 3, 2020
06e74ef
review changes
rgsl888prabhu Nov 3, 2020
21a0a33
Replaced SHFL_XOR calls with cub::WarpReduce (#6653)
kaatish Nov 4, 2020
5cf7106
Add cudf::test::dictionary_column_wrapper class (#6635)
davidwendt Nov 4, 2020
494b8aa
Fix the bug where if a PTX instruction contains "st.param" the whole …
hummingtree Nov 4, 2020
c643818
Add the python test for the originally failing python lambda.
hummingtree Nov 4, 2020
b769bfd
Changelog.
hummingtree Nov 4, 2020
3f5e3d9
Fix the added pytest so it actually tests what we want to test ...
hummingtree Nov 4, 2020
13a38c7
enable decimal type in HostColumnVector
sperlingxx Oct 27, 2020
ed9425b
fix typo
sperlingxx Oct 28, 2020
31551c8
add changelog line
sperlingxx Oct 28, 2020
e57b3d7
enable decimalType within nestedTypes
sperlingxx Oct 28, 2020
eae0a10
Update java/src/test/java/ai/rapids/cudf/DecimalColumnVectorTest.java
sperlingxx Oct 29, 2020
c79c3d1
address comments
sperlingxx Oct 29, 2020
83a0d89
address comments
sperlingxx Oct 29, 2020
0413e17
Update java/src/test/java/ai/rapids/cudf/DecimalColumnVectorTest.java
sperlingxx Oct 30, 2020
6452679
a lot of refinement
sperlingxx Oct 30, 2020
f56c4d9
some refinement
sperlingxx Oct 30, 2020
a8e9c83
addressed comments
sperlingxx Nov 2, 2020
69786ed
add decimalFromDoubles
sperlingxx Nov 2, 2020
5951de8
refine doc for cv.fromDecimals
sperlingxx Nov 2, 2020
5cdda6f
refine
sperlingxx Nov 3, 2020
06aba54
fix doc
sperlingxx Nov 3, 2020
29cf5d6
refine
sperlingxx Nov 4, 2020
f6b8e02
refine
sperlingxx Nov 5, 2020
8015692
INT96 changes for Parquet writer
razajafri Nov 5, 2020
a98eac8
updated changelog
razajafri Nov 5, 2020
ada1a0c
Support fixed-point decimal for ColumnVector
sperlingxx Nov 5, 2020
b4e651e
[REVIEW] Fix csv writer handling embedded comma delimiter (#6643)
davidwendt Nov 5, 2020
2cf893c
[REVIEW] Implement `cudf::round` floating point and integer types (`H…
codereport Nov 5, 2020
1556a49
[REVIEW] Disallow `fixed_point` `cudf::concatenate` with different sc…
codereport Nov 5, 2020
65cb685
Make the explanation for st.param.*** clearer.
hummingtree Nov 5, 2020
c1baf12
review changes
rgsl888prabhu Nov 5, 2020
a9d9329
Reading ORC statistics (#6142)
calebwin Nov 5, 2020
77351ef
changes
rgsl888prabhu Nov 5, 2020
1f8f262
Add cudf::dictionary::make_dictionary_pair_iterator (#6651)
davidwendt Nov 5, 2020
f962a5d
Fix issue where index name of caller object is being modified in csv …
galipremsagar Nov 6, 2020
390c34f
[REVIEW] Fix integer parsing in CSV and JSON for values outside of in…
kaatish Nov 6, 2020
7f9578e
Updating based on review comments
hyperbolic2346 Nov 6, 2020
2fb54db
Merge remote-tracking branch 'upstream/branch-0.17' into mwilson/int96
hyperbolic2346 Nov 6, 2020
a268a9c
Updating comment and fixing an issue spotted in diff
hyperbolic2346 Nov 6, 2020
45fe6c9
Correct use of CUDA_ARCH.
jrhemstad Nov 6, 2020
1e1c323
Add tests for release_assert.
jrhemstad Nov 6, 2020
be9e335
Cover different CSV reader/writer options in benchmarks (#6644)
vuule Nov 6, 2020
faada81
Parameterize avro and json benchmark (#6673)
rgsl888prabhu Nov 6, 2020
8159c99
Update to use custom main() to avoid RMM interfering with death test.
jrhemstad Nov 6, 2020
5fba8b9
Update death test logic.
jrhemstad Nov 6, 2020
24f725d
format.
jrhemstad Nov 6, 2020
a4a6ef7
Doc.
jrhemstad Nov 6, 2020
762b2df
changelog.
jrhemstad Nov 6, 2020
b1a3de2
Update error_handling_test.cu
jrhemstad Nov 6, 2020
d6d83af
Inadvertent change revert
hyperbolic2346 Nov 6, 2020
bfb1b88
Fix issue related to `na_values` input in `read_csv` (#6693)
galipremsagar Nov 6, 2020
2ef56ed
Merge branch 'branch-0.17' into fix-release-assert
jrhemstad Nov 6, 2020
bc43878
Fix handling of empty column name in csv writer (#6692)
galipremsagar Nov 7, 2020
aac682b
Explicitly set legacy or per-thread default stream in JNI (#6690)
rongou Nov 7, 2020
1c31f0e
Add DecimalDtype to cuDF (#6675)
codereport Nov 7, 2020
3d44ed5
Implement `cudf::round` `decimal32` & `decimal64` (`HALF_UP` and `HAL…
codereport Nov 8, 2020
7223835
Merge branch 'branch-0.17' into hotfix/python_lambda_fail
harrism Nov 8, 2020
ce848ee
Merge pull request #6670 from hummingtree/hotfix/python_lambda_fail
hummingtree Nov 9, 2020
24dcc8e
Fix leak warnings in JNI unit tests (#6704)
jlowe Nov 9, 2020
ca8f998
Project Flash script changes
raydouglass Nov 9, 2020
0485085
Raise informative error while converting a pandas dataframe with dupl…
galipremsagar Nov 9, 2020
8dd9323
Fix issue when `numpy.str_` is given as input to string parameters in…
galipremsagar Nov 9, 2020
940f1a4
Add call to cudaStreamSynchronize() in ::get_value()
nvdbaranec Nov 9, 2020
73fb0e2
Changelog for 6713
nvdbaranec Nov 9, 2020
84bf578
format.
jrhemstad Nov 9, 2020
4db8ed9
Merge branch 'branch-0.17' into fix-release-assert
jrhemstad Nov 9, 2020
3bd593d
Apply `na_rep` to column names in csv writer (#6708)
galipremsagar Nov 9, 2020
e4404e4
[REVIEW] Fix an out-of-bounds indexing error in gather() for LIST typ…
nvdbaranec Nov 9, 2020
6042739
Merge branch 'branch-0.17' into get_value_fix
harrism Nov 10, 2020
96ee051
Add nested type support to Java table serialization (#6705)
jlowe Nov 10, 2020
88821fb
Handle index=False in dask_cudf.read_parquet (#6722)
rjzamora Nov 10, 2020
75b1880
Add a comment to get_value indicating that it synchronizes the stream.
nvdbaranec Nov 10, 2020
8be124d
Merge branch 'branch-0.17' into get_value_fix
nvdbaranec Nov 10, 2020
b85c52b
Merge branch 'get_value_fix' of github.com:nvdbaranec/cudf into get_v…
nvdbaranec Nov 10, 2020
08254af
FIX Use artifact conda channel
raydouglass Nov 10, 2020
7a493b9
Implement `cudf::cast` for `decimal32/64` to/from integer and floatin…
codereport Nov 10, 2020
52bb044
Updating per review comments to computer julian calendar epoch differ…
hyperbolic2346 Nov 10, 2020
e7b69d1
Add serialization methods for ListColumn (#6721)
rjzamora Nov 10, 2020
e89bb41
review changes
rgsl888prabhu Nov 10, 2020
ef61bef
remove headers
rgsl888prabhu Nov 10, 2020
a7e975b
Merge pull request #6696 from jrhemstad/fix-release-assert
jrhemstad Nov 10, 2020
d356040
Ensure CONDA_PREFIX is on the LD_LIBRARY_PATH
raydouglass Nov 10, 2020
f6cf409
Fix cuDF benchmarks build with static Arrow lib and fix rapids-compos…
harrism Nov 11, 2020
e734d5b
Remove reinterpret_cast conversions between pointer types in Avro (#6…
vuule Nov 11, 2020
b2d281b
Add dictionary support to cudf::quantile (#6676)
davidwendt Nov 11, 2020
aaba250
Remove 2nd type-dispatcher call from cudf::reduce for simple operatio…
davidwendt Nov 11, 2020
52979d1
Ensure CONDA_PREFIX is on the LD_LIBRARY_PATH
raydouglass Nov 11, 2020
8502ccf
CLN Remove specific gpu arch spec
raydouglass Nov 11, 2020
e82294c
Grammar change in documentation
nvdbaranec Nov 11, 2020
41eeff7
Adding map method for series (#6459)
marlenezw Nov 11, 2020
fae4769
DOC Update changelog
raydouglass Nov 11, 2020
00e073b
Fix implementation of `dtype` parameter in `cudf.read_csv` (#6720)
galipremsagar Nov 11, 2020
666bfae
Add Java bindings for is_timestamp (#6739)
andygrove Nov 12, 2020
73c3e1d
Fix concat bug in dask_cudf Series/Index creation (#6742)
rjzamora Nov 12, 2020
95b7edb
Merge remote-tracking branch 'origin/branch-0.17' into project-flash
raydouglass Nov 12, 2020
899ca0e
Add Java API to concatenate serialized tables to ContiguousTable (#6748)
jlowe Nov 12, 2020
974c4c4
Update nested JNI builder so we can do it incrementally. (#6749)
revans2 Nov 12, 2020
89c6cdd
Fix cudf python docs and associated build warnings (#6728)
galipremsagar Nov 12, 2020
fbf12f3
Add ORC fuzz tests with varying function parameters (#6571)
galipremsagar Nov 12, 2020
627940d
[REVIEW] Fix orc read corruption on boolean column (#6702)
rgsl888prabhu Nov 12, 2020
f323721
`RangeIndex` support for step parameter (#6662)
isVoid Nov 12, 2020
3fc8142
Remove macros from ORC reader and writer (#6698)
kaatish Nov 12, 2020
dea44d9
Replace raw streams with rmm::cuda_stream_view (part 1) (#6646)
harrism Nov 13, 2020
400d9a7
Refactoring cooperative loading with single thread loading (#6559)
vuule Nov 13, 2020
fb789bf
Merge pull request #6713 from nvdbaranec/get_value_fix
jrhemstad Nov 13, 2020
1d5eec6
cuDF python scalars (#6297)
brandon-b-miller Nov 13, 2020
bd564a0
Fix DataFrame initialization from list of dicts (#6632)
brandon-b-miller Nov 13, 2020
4fa2391
Binary operations support for decimal type in cudf Java (#6734)
nartal1 Nov 13, 2020
158cb6b
Add Java/JNI bindings for round (#6761)
nartal1 Nov 13, 2020
fcf2eee
Fix sort order of parameters in `test_scalar_invalid_implicit_convers…
galipremsagar Nov 14, 2020
0683df9
Filtering ORC (#6116)
calebwin Nov 15, 2020
21b28ba
Struct column support for cudf::concatenate (#6652)
nvdbaranec Nov 16, 2020
aebb4dd
Merge branch 'branch-0.17' into project-flash
raydouglass Nov 16, 2020
7220598
Implement `cudf::cast` for `decimal32/64` to/from different `type_id`…
codereport Nov 16, 2020
bee3229
Fix hash join hash values mapping to reserved empty value (#6735)
jrhemstad Nov 17, 2020
42fe218
Support creating decimal vectors from scalar (#6723)
sperlingxx Nov 17, 2020
cf4b838
Merge remote-tracking branch 'upstream/branch-0.17' into mwilson/int96
hyperbolic2346 Nov 17, 2020
a7b4f1e
Move `cudf::cast` tests to separate test file (#6780)
codereport Nov 17, 2020
69203f1
Merge pull request #6737 from raydouglass/project-flash
raydouglass Nov 17, 2020
8ae857c
Update java reduction APIs to reflect C++ changes [skip ci] (#6787)
revans2 Nov 17, 2020
01b8b5c
Add nested type support to ColumnVector#getDeviceMemorySize (#6786)
jlowe Nov 17, 2020
68dafdd
Adding docstring in to ioutils.py
hyperbolic2346 Nov 17, 2020
546b9c3
Support building decimal columns with Table.TestBuilder (#6770)
sperlingxx Nov 18, 2020
3827052
Cover different ORC and Parquet reader/writer options in benchmarks (…
vuule Nov 18, 2020
3bf5606
Use `void` return type for kernel wrapper functions instead of return…
vuule Nov 18, 2020
f093627
Implement `cudf::unary_operation` for `decimal32` & `decimal64` (#6777)
codereport Nov 18, 2020
a7fec22
Rework ColumnViewAccess and its usage (#6751)
Nov 18, 2020
4c1ed29
Fix race conditions in parquet (#6766)
rgsl888prabhu Nov 19, 2020
d834777
Fix output size for orc read for skip_rows option (#6686)
kaatish Nov 19, 2020
1a80df9
[REVIEW] Rename `unary_op` to `unary_operator` (#6789)
codereport Nov 19, 2020
953c24d
Implement `cudf::clamp` for `decimal32` and `decimal64` (#6792)
codereport Nov 19, 2020
8c420ae
Fix AVRO reader issues with empty input (#6794)
vuule Nov 19, 2020
99cee1c
Cupy fallback for __array_function__ and __array_ufunc__ for cudf.Se…
VibhuJawa Nov 19, 2020
0f0e748
Parquet writer list statistics (#6703)
devavret Nov 20, 2020
f3ccf1c
Merge branch 'branch-0.17' into mwilson/int96
hyperbolic2346 Nov 20, 2020
263ec65
Add support for create_metadata_file in dask_cudf (#6796)
rjzamora Nov 20, 2020
7e51022
Replace raw streams with rmm::cuda_stream_view (part 2) (#6648)
harrism Nov 20, 2020
de5577c
[REVIEW] Add dictionary support to cudf::minmax (#6764)
davidwendt Nov 20, 2020
71d4c34
Merge pull request #6625 from hyperbolic2346/mwilson/int96
hyperbolic2346 Nov 20, 2020
5380744
Fix JNI build (#6824)
jlowe Nov 20, 2020
9b656ef
Fix resource management in Java ColumnBuilder (#6826)
jlowe Nov 20, 2020
062bf85
Fix `read_avro` docs (#6798)
galipremsagar Nov 21, 2020
bc154af
Add support for join parameter in cudf concat (#6336)
marlenezw Nov 21, 2020
b391cb7
Support scatter() for list columns (#6768)
mythrocks Nov 21, 2020
b5f2e3c
Use CMake 3.19 for RMM when building cuDF jar (#6819)
GaryShen2008 Nov 23, 2020
dbeac89
Use settings.xml if existing for internal build (#6833)
GaryShen2008 Nov 23, 2020
5cabd73
Enable copy_if for fixed-point decimal columns (#6805)
sperlingxx Nov 23, 2020
591bead
[REVIEW] Optimization and nested type support for contiguous_split. (…
nvdbaranec Nov 23, 2020
db066de
Enable workaround to write categorical columns in csv (#6829)
galipremsagar Nov 23, 2020
cdd72c9
Fix result representation in groupby.apply (#6790)
galipremsagar Nov 24, 2020
fd72e5f
Fix categorical scalar insertion (#6830)
VibhuJawa Nov 24, 2020
8cc23bd
Replace raw streams with rmm::cuda_stream_view (part 3) (#6744)
harrism Nov 24, 2020
17666c4
Enable `expand=False` in `.str.split` and `.str.rsplit` (#6813)
galipremsagar Nov 24, 2020
0e7ffcf
First class support for unbounded window function bounds (#6811)
mythrocks Nov 24, 2020
632ac54
Add LogicalType to Parquet reader (#6511)
karthikeyann Nov 24, 2020
6d9b139
fix uint32_t undefined errors
rongou Nov 24, 2020
490f01a
add to changelog
rongou Nov 24, 2020
e9aedb2
Implement `cudf::copy_range` for `decimal32` and `decimal64` (#6843)
codereport Nov 25, 2020
e1e3047
Split out cudf::distinct_count from drop_duplicates.cu (#6822)
davidwendt Nov 25, 2020
4eff46f
INT96 changes for Parquet writer
razajafri Nov 5, 2020
2bb8480
updated changelog
razajafri Nov 5, 2020
6e276be
addressed review comments
razajafri Nov 25, 2020
38dc99d
Merge branch 'parquet_writer_int96' of github.com:razajafri/cudf into…
razajafri Nov 25, 2020
d91ddaf
reverted CMake changes
razajafri Nov 25, 2020
6f198b4
Implement `cudf::copy_if_else` for `decimal32` and `decimal64` (#6845)
codereport Nov 25, 2020
5d45e03
updated changelog
razajafri Nov 25, 2020
250e405
Merge pull request #6848 from razajafri/parquet_writer_int96
razajafri Nov 25, 2020
f3b0e06
Avoid gather when copying strings view from start of strings column (…
jlowe Nov 26, 2020
c34d9bf
Add support for scatter() on lists-of-struct columns (#6817)
mythrocks Nov 26, 2020
a6331bf
Correct the param order of writeParquetBufferBegin
GaryShen2008 Nov 27, 2020
ff4f6f0
update Changelog
GaryShen2008 Nov 27, 2020
0b58244
reduce HtoD copies in `cudf::concatenate` #6605
karthikeyann Nov 28, 2020
45bd967
Merge pull request #6854 from GaryShen2008/fix-writeParquetBufferBegin
razajafri Nov 28, 2020
f0f53c7
Fix `.str.replace_with_backrefs` docs examples (#6855)
galipremsagar Nov 29, 2020
1771a8f
Replace cuio macros with constexpr and inline functions (#6782)
kaatish Nov 29, 2020
0492519
Merge remote-tracking branch 'upstream/branch-0.17' into fix-cstdint
rongou Nov 30, 2020
76799a1
Move template param to member var to improve compile of hash/groupby.…
davidwendt Nov 30, 2020
f3c9322
Fix contiguous split of null string columns (#6853)
sperlingxx Nov 30, 2020
b8e1ca6
Merge pull request #6844 from rongou/fix-cstdint
rongou Nov 30, 2020
cdc53b7
Move align_ptr_for_type() from cuda.cuh to alignment.hpp(#6859)
davidwendt Dec 1, 2020
83d2146
Fix compile error in type_dispatch_benchmark.cu (#6861)
davidwendt Dec 1, 2020
1c81827
Add dictionary support to cudf::reduce(#6666)
davidwendt Dec 1, 2020
bd537b6
Push DeviceScalar to cython-only (#6800)
brandon-b-miller Dec 1, 2020
0ddba3d
Verify that concatenating columns does not overflow size_type(#6809)
nvdbaranec Dec 2, 2020
ff66c5e
Refactor `std::array` usage in row group index writing in ORC(#6807)
rgsl888prabhu Dec 2, 2020
42644cc
Specify git branches to avoid pip unresolvable issues(#6869)
jdye64 Dec 2, 2020
edd1af1
Fix index handling in parquet reader and writer(#6771)
galipremsagar Dec 2, 2020
a4bdf24
add groupby hash mean aggregation, 2-pass method of hash groupby(#6392)
karthikeyann Dec 2, 2020
f854938
Force local artifact install(#6806)
raydouglass Dec 2, 2020
220c988
Improve Dockerfile(#6619)
igormp Dec 2, 2020
a2d2726
Remove bounds check for `cudf::gather`(#6875)
isVoid Dec 2, 2020
f137ed1
Support selecting different hash functions in hash_partition(#6726)
gaohao95 Dec 3, 2020
5336301
Fix typo and `0-d` numpy array handling in binary operation(#6887)
rgsl888prabhu Dec 3, 2020
70ebbee
Handle index when dispatching __array_function__ and __array_ufunc__ …
VibhuJawa Dec 3, 2020
73cca47
Serial murmur3 hash with configurable seed(#6781)
rwlee Dec 3, 2020
9fb69a6
Add parquet chunked writing ability for list columns(#6831)
devavret Dec 3, 2020
b9ef96c
Adding `decimal32` and `decimal64` support to parquet reading(#6808)
hyperbolic2346 Dec 4, 2020
f5e76fb
Support read_parquet with paths resolving to multiple files(#6815)
ayushdg Dec 4, 2020
1af9bc0
Update JNI to new gather boundary check API [skip ci] (#6899)
jlowe Dec 4, 2020
e22c3ae
Fix missing clone overrides on derived aggregations(#6898)
jlowe Dec 4, 2020
cd7a0ad
Parquet option for strictly decimal reading (#6908)
sperlingxx Dec 4, 2020
30bbb39
Create agg() function for dataframes(#6483)
skirui-source Dec 4, 2020
bd321d1
Enable groupby `list` aggregation for strings(#6914)
shwina Dec 4, 2020
00ca246
Update CHANGELOG.md
raydouglass Dec 10, 2020
3 changes: 3 additions & 0 deletions .gitignore
@@ -156,3 +156,6 @@ ENV/

# Dask
dask-worker-space/

# protobuf
**/*_pb2.py
212 changes: 211 additions & 1 deletion CHANGELOG.md
@@ -1,3 +1,201 @@
# cuDF 0.17.0 (10 Dec 2020)

## New Features

- PR #6116 Add `filters` parameter to Python `read_orc` function for filtering
- PR #6848 Added Java bindings for writing parquet files with INT96 timestamps
- PR #6460 Add is_timestamp format check API
- PR #6647 Implement `cudf::round` floating point and integer types (`HALF_EVEN`)
- PR #6562 Implement `cudf::round` floating point and integer types (`HALF_UP`)
- PR #6685 Implement `cudf::round` `decimal32` & `decimal64` (`HALF_UP` and `HALF_EVEN`)
- PR #6711 Implement `cudf::cast` for `decimal32/64` to/from integer and floating point
- PR #6777 Implement `cudf::unary_operation` for `decimal32` & `decimal64`
- PR #6729 Implement `cudf::cast` for `decimal32/64` to/from different `type_id`
- PR #6792 Implement `cudf::clamp` for `decimal32` and `decimal64`
- PR #6845 Implement `cudf::copy_if_else` for `decimal32` and `decimal64`
- PR #6805 Implement `cudf::detail::copy_if` for `decimal32` and `decimal64`
- PR #6843 Implement `cudf::copy_range` for `decimal32` and `decimal64`
- PR #6528 Enable `fixed_point` binary operations
- PR #6460 Add is_timestamp format check API

Review comment (Member): Duplicates line 7.
Suggested change: delete the duplicate line `- PR #6460 Add is_timestamp format check API`.

- PR #6568 Add function to create hashed vocabulary file from raw vocabulary
- PR #6142 Add Python `read_orc_statistics` function for reading file- and stripe-level statistics
- PR #6581 Add JNI API to check if PTDS is enabled
- PR #6615 Add support for list and struct types to contiguous_split
- PR #6625 Add INT96 timestamp writing option to parquet writer
- PR #6592 Add `cudf.to_numeric` function
- PR #6598 Add strings::contains API with target column parameter
- PR #6638 Add support for `pipe` API
- PR #6737 New build process (Project Flash)
- PR #6652 Add support for struct columns in concatenate
- PR #6675 Add DecimalDtype to cuDF
- PR #6739 Add Java bindings for is_timestamp
- PR #6808 Add support for reading decimal32 and decimal64 from parquet
- PR #6781 Add serial murmur3 hashing
- PR #6811 First class support for unbounded window function bounds
- PR #6768 Add support for scatter() on list columns
- PR #6796 Add create_metadata_file in dask_cudf
- PR #6765 Cupy fallback for __array_function__ and __array_ufunc__ for cudf.Series
- PR #6817 Add support for scatter() on lists-of-struct columns
- PR #6805 Implement `cudf::detail::copy_if` for `decimal32` and `decimal64`
- PR #6483 Add `agg` function to aggregate dataframe using one or more operations
- PR #6726 Support selecting different hash functions in hash_partition
- PR #6619 Improve Dockerfile
- PR #6831 Added parquet chunked writing ability for list columns
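
The `HALF_UP` and `HALF_EVEN` modes named in the `cudf::round` entries above differ only in how they treat exact ties. The following is a minimal sketch using Python's standard `decimal` module rather than cuDF itself; the helper name `round_decimal` is hypothetical and is not part of any cuDF API, and only non-negative inputs are shown.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_decimal(value: str, places: int, mode: str) -> Decimal:
    """Round a decimal string to `places` fractional digits with the given mode.

    Illustrative helper only; it uses Python's decimal module, not cuDF.
    """
    # Build the quantum, e.g. places=2 gives Decimal('0.01').
    quantum = Decimal(1).scaleb(-places)
    return Decimal(value).quantize(quantum, rounding=mode)


# A tie (exactly halfway) is where the two modes disagree.
half_up = round_decimal("2.5", 0, ROUND_HALF_UP)      # tie goes up
half_even = round_decimal("2.5", 0, ROUND_HALF_EVEN)  # tie goes to the even neighbour

# Away from a tie, both modes give the same answer.
same_a = round_decimal("2.6", 0, ROUND_HALF_UP)
same_b = round_decimal("2.6", 0, ROUND_HALF_EVEN)
```

Ties such as `2.5` or `0.125` (at two places) are the only inputs on which the modes diverge, which is why the changelog tracks them as separate features.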

## Improvements

- PR #6430 Add struct type support to `to_arrow` and `from_arrow`
- PR #6384 Add CSV fuzz tests with varying function parameters
- PR #6385 Add JSON fuzz tests with varying function parameters
- PR #6398 Remove function constructor macros in parquet reader
- PR #6432 Add dictionary support to `cudf::upper_bound` and `cudf::lower_bound`
- PR #6461 Replace index type-dispatch call with indexalator in cudf::scatter
- PR #6415 Support `datetime64` in row-wise op
- PR #6457 Replace index type-dispatch call with indexalator in `cudf::gather`
- PR #6413 Replace Python NVTX package with conda-forge source
- PR #6442 Remove deprecated `DataFrame.from_gpu_matrix`, `DataFrame.to_gpu_matrix`, `DataFrame.add_column` APIs and method parameters
- PR #6502 Add dictionary support to `cudf::merge`
- PR #6471 Replace index type-dispatch call with indexalator in cudf::strings::substring
- PR #6485 Add File IO to cuIO benchmarks
- PR #6504 Update Java bindings version to 0.17-SNAPSHOT
- PR #6875 Remove bounds check for `cudf::gather`
- PR #6489 Add `AVRO` fuzz tests with varying function parameters
- PR #6540 Add dictionary support to `cudf::unary_operation`
- PR #6537 Refactor ORC timezone
- PR #6527 Refactor DeviceColumnViewAccess to avoid JNI returning an array
- PR #6690 Explicitly set legacy or per-thread default stream in JNI
- PR #6545 Pin cmake policies to cmake 3.17 version
- PR #6556 Add dictionary support to `cudf::inner_join`, `cudf::left_join` and `cudf::full_join`
- PR #6557 Support nullable timestamp columns in time range window functions
- PR #6566 Remove `reinterpret_cast` conversions between pointer types in ORC
- PR #6544 Remove `fixed_point` precise round
- PR #6552 Use `assert_exceptions_equal` to assert exceptions in pytests
- PR #6555 Adapt JNI build to libcudf composition of multiple libraries
- PR #6559 Refactoring cooperative loading with single thread loading.
- PR #6564 Load JNI library dependencies with a thread pool
- PR #6571 Add ORC fuzz tests with varying function parameters
- PR #6578 Add in java column to row conversion
- PR #6573 Create `cudf::detail::byte_cast` for `cudf::byte_cast`
- PR #6597 Use thread-local to track CUDA device in JNI
- PR #6599 Replace `size()==0` with `empty()`, `is_empty()`
- PR #6514 Initial work for decimal type in Java/JNI
- PR #6605 Reduce HtoD copies in `cudf::concatenate` of string columns
- PR #6608 Improve subword tokenizer docs
- PR #6610 Add ability to set scalar values in `cudf.DataFrame`
- PR #6612 Update JNI to new RMM cuda_stream_view API
- PR #6646 Replace `cudaStream_t` with `rmm::cuda_stream_view` (part 1)
- PR #6648 Replace `cudaStream_t` with `rmm::cuda_stream_view` (part 2)
- PR #6744 Replace `cudaStream_t` with `rmm::cuda_stream_view` (part 3)
- PR #6579 Update scatter APIs to use reference wrapper / const scalar
- PR #6614 Add support for conversion to Pandas nullable dtypes and fix related issue in `cudf.to_json`
- PR #6622 Update `to_pandas` api docs
- PR #6623 Add operator overloading to column and clean up error messages
- PR #6644 Cover different CSV reader/writer options in benchmarks
- PR #6741 Cover different ORC and Parquet reader/writer options in benchmarks
- PR #6651 Add cudf::dictionary::make_dictionary_pair_iterator
- PR #6666 Add dictionary support to `cudf::reduce`
- PR #6635 Add cudf::test::dictionary_column_wrapper class
- PR #6702 Fix orc read corruption on boolean column
- PR #6676 Add dictionary support to `cudf::quantile`
- PR #6673 Parameterize avro and json benchmark
- PR #6609 Support fixed-point decimal for HostColumnVector
- PR #6703 Add list column statistics writing to Parquet writer
- PR #6662 `RangeIndex` supports `step` parameter
- PR #6712 Remove `reinterpret_cast` conversions between pointer types in Avro
- PR #6705 Add nested type support to Java table serialization
- PR #6709 Raise informative error while converting a pandas dataframe with duplicate columns
- PR #6727 Remove 2nd type-dispatcher call from cudf::reduce
- PR #6749 Update nested JNI builder so we can do it incrementally
- PR #6748 Add Java API to concatenate serialized tables to ContiguousTable
- PR #6764 Add dictionary support to `cudf::minmax`
- PR #6734 Binary operations support for decimal type in cudf Java
- PR #6761 Add Java/JNI bindings for round
- PR #6776 Use `void` return type for kernel wrapper functions instead of returning `cudaError_t`
- PR #6786 Add nested type support to ColumnVector#getDeviceMemorySize
- PR #6780 Move `cudf::cast` tests to separate test file
- PR #6809 size_type overflow checking when concatenating columns
- PR #6789 Rename `unary_op` to `unary_operator`
- PR #6770 Support building decimal columns with Table.TestBuilder
- PR #6815 Add wildcard path support to `read_parquet`
- PR #6800 Push DeviceScalar to cython-only
- PR #6822 Split out `cudf::distinct_count` from `drop_duplicates.cu`
- PR #6813 Enable `expand=False` in `.str.split` and `.str.rsplit`
- PR #6829 Enable workaround to write categorical columns in csv
- PR #6819 Use CMake 3.19 for RMM when building cuDF jar
- PR #6833 Use settings.xml if existing for internal build
- PR #6839 Handle index when dispatching __array_function__ and __array_ufunc__ to cupy for cudf.Series
- PR #6835 Move template param to member var to improve compile of hash/groupby.cu
- PR #6837 Avoid gather when copying strings view from start of strings column
- PR #6859 Move align_ptr_for_type() from cuda.cuh to alignment.hpp
- PR #6807 Refactor `std::array` usage in row group index writing in ORC
- PR #6914 Enable groupby `list` aggregation for strings
- PR #6908 Parquet option for strictly decimal reading
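
PR #6809 above adds `size_type` overflow checking when columns are concatenated. The idea can be sketched in plain Python, assuming the row-count type is a signed 32-bit integer; `SIZE_TYPE_MAX` and `checked_total_rows` are illustrative names, not libcudf identifiers, and the real check may be structured differently.

```python
from typing import Iterable

# Assumption: row counts are stored in a signed 32-bit integer.
SIZE_TYPE_MAX = 2**31 - 1


def checked_total_rows(column_sizes: Iterable[int]) -> int:
    """Sum per-column row counts, failing before the total exceeds the limit."""
    total = 0
    for size in column_sizes:
        if size < 0:
            raise ValueError("row counts must be non-negative")
        total += size
        # Python integers do not wrap, so the limit is compared explicitly.
        if total > SIZE_TYPE_MAX:
            raise OverflowError(
                f"concatenated row count {total} exceeds {SIZE_TYPE_MAX}"
            )
    return total
```

Checking before any output is allocated turns a silent wrap-around into an explicit error that the caller can handle, for example by splitting the inputs into smaller batches.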

## Bug Fixes

- PR #6446 Fix integer parsing in CSV and JSON for values outside of int64 range
- PR #6506 Fix DateTime type value truncation while writing to csv
- PR #6509 Disable JITIFY log printing
- PR #6517 Handle index equality in `Series` and `DataFrame` equality checks
- PR #6519 Fix end-of-string marking boundary condition in subword-tokenizer
- PR #6543 Handle `np.nan` values in `isna`/`isnull`/`notna`/`notnull`
- PR #6549 Fix memory_usage calls for list columns
- PR #6575 Fix JNI RMM initialize with no pool allocator limit
- PR #6636 Fix orc boolean column corruption issue
- PR #6582 Add missing `device_scalar` stream parameters
- PR #6596 Fix memory usage calculation
- PR #6595 Fix JNI build, broken by to_arrow() signature change
- PR #6601 Fix timezone offset when reading ORC files
- PR #6603 Use correct stream in hash_join.
- PR #6616 Block `fixed_point` `cudf::concatenate` with different scales
- PR #6607 Fix integer overflow in ORC encoder
- PR #6617 Fix JNI native dependency load order
- PR #6621 Fix subword tokenizer metadata for token count equal to max_sequence_length
- PR #6629 Fix JNI CMake
- PR #6633 Fix Java HostColumnVector unnecessarily loading native dependencies
- PR #6643 Fix csv writer handling embedded comma delimiter
- PR #6640 Add error message for unsupported `axis` parameter in DataFrame APIs
- PR #6686 Fix output size for orc read for skip_rows option
- PR #6710 Fix an out-of-bounds indexing error in gather() for lists
- PR #6670 Fix a bug where PTX parser fails to correctly parse a python lambda generated UDF
- PR #6687 Fix issue where index name of caller object is being modified in csv writer
- PR #6735 Fix hash join where row hash values would end up equal to the reserved empty key value
- PR #6696 Fix release_assert.
- PR #6692 Fix handling of empty column name in csv writer
- PR #6693 Fix issue related to `na_values` input in `read_csv`
- PR #6701 Fix issue when `numpy.str_` is given as input to string parameters in io APIs
- PR #6704 Fix leak warnings in JNI unit tests
- PR #6713 Fix missing call to cudaStreamSynchronize in get_value
- PR #6708 Apply `na_rep` to column names in csv writer
- PR #6720 Fix implementation of `dtype` parameter in `cudf.read_csv`
- PR #6721 Add missing serialization methods for ListColumn
- PR #6722 Fix index=False bug in dask_cudf.read_parquet
- PR #6766 Fix race conditions in parquet
- PR #6728 Fix cudf python docs and associated build warnings
- PR #6732 Fix cuDF benchmarks build with static Arrow lib and fix rapids-compose cuDF JNI build
- PR #6742 Fix concat bug in dask_cudf Series/Index creation
- PR #6632 Fix DataFrame initialization from list of dicts
- PR #6767 Fix sort order of parameters in `test_scalar_invalid_implicit_conversion` pytest
- PR #6771 Fix index handling in parquet reader and writer
- PR #6787 Update java reduction APIs to reflect C++ changes
- PR #6790 Fix result representation in groupby.apply
- PR #6794 Fix AVRO reader issues with empty input
- PR #6798 Fix `read_avro` docs
- PR #6824 Fix JNI build
- PR #6826 Fix resource management in Java ColumnBuilder
- PR #6830 Fix categorical scalar insertion
- PR #6844 Fix uint32_t undefined errors
- PR #6854 Fix the parameter order of writeParquetBufferBegin
- PR #6855 Fix `.str.replace_with_backrefs` docs examples
- PR #6853 Fix contiguous split of null string columns
- PR #6861 Fix compile error in type_dispatch_benchmark.cu
- PR #6869 Avoid dependency resolution failure in latest version of pip by explicitly specifying versions for dask and distributed
- PR #6806 Force install of local conda artifacts
- PR #6887 Fix typo and `0-d` numpy array handling in binary operation
- PR #6898 Fix missing clone overrides on derived aggregations
- PR #6899 Update JNI to new gather boundary check API


# cuDF 0.16.0 (21 Oct 2020)

## New Features
@@ -29,6 +227,10 @@
- PR #6301 Add JNI bindings to nvcomp
- PR #6328 Java and JNI bindings for getMapValue/map_lookup
- PR #6371 Use ColumnViewAccess on Host side
- PR #6392 Add hash-based groupby mean aggregation
- PR #6511 Add LogicalType to Parquet reader
- PR #6297 cuDF Python Scalars
- PR #6723 Support creating decimal vectors from scalar

## Improvements

@@ -114,8 +316,8 @@
- PR #6326 Simplify internal csv/json kernel parameters
- PR #6308 Add dictionary support to cudf::scatter with scalar
- PR #6367 Add JNI bindings for byte casting
- PR #6346 Remove macros from CompactProtocolWriter
- PR #6312 Conda recipe dependency cleanup
- PR #6346 Remove macros from CompactProtocolWriter
- PR #6347 Add dictionary support to cudf::copy_range
- PR #6352 Add specific Topic support for Kafka "list_topics()" metadata requests
- PR #6332 Add support to return csv as string when `path=None` in `to_csv`
@@ -127,9 +329,16 @@
- PR #6400 Removed unused variables
- PR #6409 Allow CuPy 8.x
- PR #6407 Add RMM_LOGGING_LEVEL flag to Java docker build
- PR #6425 Factor out csv parse_options creation to pure function
- PR #6438 Fetch nvcomp v1.1.0 for JNI build
- PR #6459 Add `map` method to series
- PR #6379 Add list hashing functionality to MD5
- PR #6498 Add helper method to ColumnBuilder with some nits
- PR #6336 Add `join` functionality in cudf concat
- PR #6653 Replaced SHFL_XOR calls with cub::WarpReduce
- PR #6751 Rework ColumnViewAccess and its usage
- PR #6698 Remove macros from ORC reader and writer
- PR #6782 Replace cuio macros with constexpr and inline functions

## Bug Fixes

@@ -142,6 +351,7 @@
- PR #6118 Fix Java build for ORC read args change and update package version
- PR #6121 Replace calls to get_default_resource with get_current_device_resource
- PR #6128 Add support for numpy RandomState handling in `sample`
- PR #6134 Fix CUDA C/C++ debug builds
- PR #6137 Fix issue where `np.nan` is being returned instead of `NAT` for datetime/duration types
- PR #6298 Fix gcc-9 compilation error in dictionary/remove_keys.cu
- PR #6172 Fix slice issue with empty column
9 changes: 6 additions & 3 deletions Dockerfile
@@ -13,12 +13,15 @@ ARG CC=5
ARG CXX=5
RUN apt update -y --fix-missing && \
apt upgrade -y && \
-apt install -y \
+apt install -y --no-install-recommends \
git \
gcc-${CC} \
g++-${CXX} \
libboost-all-dev \
-tzdata
+tzdata && \
+apt-get autoremove -y && \
+apt-get clean && \
+rm -rf /var/lib/apt/lists/*

# Install conda
ADD https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh /miniconda.sh
@@ -70,7 +73,7 @@ RUN source activate cudf && \
mkdir -p /cudf/cpp/build && \
cd /cudf/cpp/build && \
cmake .. -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX} && \
-make -j install
+make -j"$(nproc)" install

# cuDF build/install
RUN source activate cudf && \
8 changes: 4 additions & 4 deletions ci/benchmark/build.sh
@@ -75,10 +75,10 @@ conda install "rmm=$MINOR_VERSION.*" "cudatoolkit=$CUDA_REL" \
# conda install "your-pkg=1.0.0"

# Install the master version of dask, distributed, and streamz
-logger "pip install git+https://github.com/dask/distributed.git --upgrade --no-deps"
-pip install "git+https://github.com/dask/distributed.git" --upgrade --no-deps
-logger "pip install git+https://github.com/dask/dask.git --upgrade --no-deps"
-pip install "git+https://github.com/dask/dask.git" --upgrade --no-deps
+logger "pip install git+https://github.com/dask/distributed.git@master --upgrade --no-deps"
+pip install "git+https://github.com/dask/distributed.git@master" --upgrade --no-deps
+logger "pip install git+https://github.com/dask/dask.git@master --upgrade --no-deps"
+pip install "git+https://github.com/dask/dask.git@master" --upgrade --no-deps
logger "pip install git+https://github.com/python-streamz/streamz.git --upgrade --no-deps"
pip install "git+https://github.com/python-streamz/streamz.git" --upgrade --no-deps

35 changes: 23 additions & 12 deletions ci/cpu/build.sh
@@ -55,24 +55,35 @@ conda config --set ssl_verify False
# BUILD - Conda package builds
################################################################################

-gpuci_logger "Build conda pkg for libcudf"
-gpuci_conda_retry build conda/recipes/libcudf
+if [[ -z "$PROJECT_FLASH" || "$PROJECT_FLASH" == "0" ]]; then
+  CONDA_BUILD_ARGS=""
+  CONDA_CHANNEL=""
+else
+  CONDA_BUILD_ARGS="--dirty --no-remove-work-dir"
+  CONDA_CHANNEL="-c $WORKSPACE/ci/artifacts/cudf/cpu/conda-bld/"
+fi

-gpuci_logger "Build conda pkg for libcudf_kafka"
-gpuci_conda_retry build conda/recipes/libcudf_kafka
+if [ "$BUILD_LIBCUDF" == '1' ]; then
+  gpuci_logger "Build conda pkg for libcudf"
+  gpuci_conda_retry build conda/recipes/libcudf $CONDA_BUILD_ARGS

-gpuci_logger "Build conda pkg for cudf"
-gpuci_conda_retry build conda/recipes/cudf --python=$PYTHON
+  gpuci_logger "Build conda pkg for libcudf_kafka"
+  gpuci_conda_retry build conda/recipes/libcudf_kafka $CONDA_BUILD_ARGS
+fi

-gpuci_logger "Build conda pkg for dask-cudf"
-gpuci_conda_retry build conda/recipes/dask-cudf --python=$PYTHON
+if [ "$BUILD_CUDF" == '1' ]; then
+  gpuci_logger "Build conda pkg for cudf"
+  gpuci_conda_retry build conda/recipes/cudf --python=$PYTHON $CONDA_BUILD_ARGS $CONDA_CHANNEL

-gpuci_logger "Build conda pkg for cudf_kafka"
-gpuci_conda_retry build conda/recipes/cudf_kafka --python=$PYTHON
+  gpuci_logger "Build conda pkg for dask-cudf"
+  gpuci_conda_retry build conda/recipes/dask-cudf --python=$PYTHON $CONDA_BUILD_ARGS $CONDA_CHANNEL

-gpuci_logger "Build conda pkg for custreamz"
-gpuci_conda_retry build conda/recipes/custreamz --python=$PYTHON
+  gpuci_logger "Build conda pkg for cudf_kafka"
+  gpuci_conda_retry build conda/recipes/cudf_kafka --python=$PYTHON $CONDA_BUILD_ARGS $CONDA_CHANNEL

+  gpuci_logger "Build conda pkg for custreamz"
+  gpuci_conda_retry build conda/recipes/custreamz --python=$PYTHON $CONDA_BUILD_ARGS $CONDA_CHANNEL
+fi
################################################################################
# UPLOAD - Conda packages
################################################################################
6 changes: 6 additions & 0 deletions ci/cpu/prebuild.sh
@@ -26,3 +26,9 @@ if [[ "$PYTHON" == "3.7" ]] && [[ "$CUDA" == "10.1" ]]; then
else
export UPLOAD_LIBCUDF_KAFKA=0
fi

+if [[ -z "$PROJECT_FLASH" || "$PROJECT_FLASH" == "0" ]]; then
+  # If Project Flash is not active, always build both
+  export BUILD_LIBCUDF=1
+  export BUILD_CUDF=1
+fi
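
The `PROJECT_FLASH` gating introduced in `ci/cpu/prebuild.sh` and `ci/cpu/build.sh` can be summarized with a small sketch. The function `resolve_build_plan` and its default workspace path are hypothetical and are not part of this PR; the sketch only restates the two branches visible in the hunks above. When Project Flash is active, this excerpt does not show where `BUILD_LIBCUDF` and `BUILD_CUDF` are set, so the sketch assumes they come from the caller's environment and leaves them out.

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): prints the settings that the
# PROJECT_FLASH checks in prebuild.sh and build.sh would produce.
resolve_build_plan() {
  plan_flash="${1:-}"               # value of PROJECT_FLASH, may be empty
  plan_workspace="${2:-/workspace}" # assumed stand-in for $WORKSPACE

  if [ -z "$plan_flash" ] || [ "$plan_flash" = "0" ]; then
    # Project Flash inactive: build both package groups, no extra arguments.
    printf '%s\n' \
      "BUILD_LIBCUDF=1" \
      "BUILD_CUDF=1" \
      "CONDA_BUILD_ARGS=" \
      "CONDA_CHANNEL="
  else
    # Project Flash active: reuse the work directory and add the local
    # artifact channel. BUILD_* flags are assumed to be set elsewhere.
    printf '%s\n' \
      "CONDA_BUILD_ARGS=--dirty --no-remove-work-dir" \
      "CONDA_CHANNEL=-c ${plan_workspace}/ci/artifacts/cudf/cpu/conda-bld/"
  fi
}

# Example invocations: inactive, then active with a sample workspace.
resolve_build_plan ""
resolve_build_plan "1" "/ws"
```

In the inactive case both build blocks in `build.sh` run exactly as they did before this change, since the extra arguments expand to nothing.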