[RELEASE] cudf v21.06 #8418
Fix automerge 0.19 --> 0.20
Replaces instances of `rmm::device_vector` with `rmm::device_uvector` in gather detail functions and in gather tests. Also adds a utility factory to create a device_uvector containing all zeros, `cudf::detail::make_zero_device_uvector_async()` (also sync() version).

Contributes to #7287

This speeds up small gathers, especially gathers that result in a lot of random accesses in multiple columns (`coalesce_o` benchmarks below).

```
(rapids) rapids@compose:~/cudf/cpp/build/release$ _deps/benchmark-src/tools/compare.py benchmarks ~/cudf/cpp/build/gather_vector.json ./gather_uvector.json
Comparing /home/mharris/rapids/cudf/cpp/build/gather_vector.json to ./gather_uvector.json
Benchmark                                           Time      CPU   Time Old   Time New    CPU Old    CPU New
-------------------------------------------------------------------------------------------------------------
Gather/double_coalesce_x/1024/1/manual_time      -0.2107  -0.1394      39220      30955      59030      50802
Gather/double_coalesce_x/2048/1/manual_time      -0.2020  -0.1347      39390      31434      59189      51214
Gather/double_coalesce_x/4096/1/manual_time      -0.3780  -0.2978      50106      31168      72432      50863
Gather/double_coalesce_x/8192/1/manual_time      -0.2221  -0.1525      40875      31795      60573      51336
Gather/double_coalesce_x/16384/1/manual_time     -0.2173  -0.1498      41056      32134      60631      51547
Gather/double_coalesce_x/32768/1/manual_time     -0.5170  -0.4413      67973      32830      93277      52116
Gather/double_coalesce_x/65536/1/manual_time     -0.2144  -0.1567      43154      33901      62330      52562
Gather/double_coalesce_x/131072/1/manual_time    -0.2190  -0.1610      45139      35252      63802      53532
Gather/double_coalesce_x/262144/1/manual_time    -0.1796  -0.1336      49142      40317      66700      57792
Gather/double_coalesce_x/524288/1/manual_time    -0.1448  -0.1153      66965      57268      83863      74192
Gather/double_coalesce_x/1048576/1/manual_time   -0.1144  -0.0940      84745      75048     104572      94738
Gather/double_coalesce_x/2097152/1/manual_time   -0.0756  -0.0637     129918     120090     149744     140208
Gather/double_coalesce_x/4194304/1/manual_time   -0.0482  -0.0442     211133     200956     231364     221128
Gather/double_coalesce_x/8388608/1/manual_time   -0.0274  -0.0257     381666     371191     401358     391041
Gather/double_coalesce_x/16777216/1/manual_time  -0.0172  -0.0169     715312     703014     735392     722967
Gather/double_coalesce_x/33554432/1/manual_time  -0.0103  -0.0103    1385742    1371471    1405822    1391363
Gather/double_coalesce_x/67108864/1/manual_time  -0.0104  -0.0105    2742975    2714326    2763156    2734156
Gather/double_coalesce_x/1024/2/manual_time      -0.1733  -0.1273      49524      40940      69564      60707
Gather/double_coalesce_x/2048/2/manual_time      -0.1771  -0.1301      49582      40801      69514      60468
Gather/double_coalesce_x/4096/2/manual_time      -0.2218  -0.1731      52703      41012      73221      60547
Gather/double_coalesce_x/8192/2/manual_time      -0.1816  -0.1355      51293      41978      70950      61333
Gather/double_coalesce_x/16384/2/manual_time     -0.2193  -0.1753      54360      42438      74558      61491
Gather/double_coalesce_x/32768/2/manual_time     -0.2082  -0.1667      54656      43278      74339      61950
Gather/double_coalesce_x/65536/2/manual_time     -0.2184  -0.1824      57593      45015      76367      62434
Gather/double_coalesce_x/131072/2/manual_time    -0.1687  -0.1318      57243      47587      75106      65204
Gather/double_coalesce_x/262144/2/manual_time    -0.1428  -0.1188      65898      56488      82826      72984
Gather/double_coalesce_x/524288/2/manual_time    -0.0977  -0.0840      94131      84931     114061     104483
Gather/double_coalesce_x/1048576/2/manual_time   -0.0765  -0.0668     130704     120700     150830     140753
Gather/double_coalesce_x/2097152/2/manual_time   -0.1651  -0.1693     241447     201587     266641     221508
Gather/double_coalesce_x/4194304/2/manual_time   -0.0370  -0.0366     367010     353439     387585     373393
Gather/double_coalesce_x/8388608/2/manual_time   -0.0463  -0.0488     698669     666290     721442     686251
Gather/double_coalesce_x/16777216/2/manual_time  -0.0190  -0.0200    1310748    1285850    1331922    1305220
Gather/double_coalesce_x/33554432/2/manual_time  -0.0103  -0.0116    2551725    2525356    2574809    2545046
Gather/double_coalesce_x/67108864/2/manual_time  -0.0077  -0.0076    5052399    5013498    5071455    5032820
Gather/double_coalesce_x/1024/4/manual_time      -0.1258  -0.0971      65293      57082      85152      76886
Gather/double_coalesce_x/2048/4/manual_time      -0.1314  -0.1025      66836      58056      86580      77704
Gather/double_coalesce_x/4096/4/manual_time      -0.1375  -0.1084      68618      59185      88189      78628
Gather/double_coalesce_x/8192/4/manual_time      -0.1244  -0.0985      69209      60602      88567      79841
Gather/double_coalesce_x/16384/4/manual_time     -0.1274  -0.1023      69878      60973      88767      79685
Gather/double_coalesce_x/32768/4/manual_time     -0.1300  -0.1074      71739      62413      89585      79963
Gather/double_coalesce_x/65536/4/manual_time     -0.1190  -0.0964      74570      65697      91701      82866
Gather/double_coalesce_x/131072/4/manual_time    -0.1246  -0.1042      81384      71245      98254      88015
Gather/double_coalesce_x/262144/4/manual_time    -0.0938  -0.0797      99592      90251     119523     109995
Gather/double_coalesce_x/524288/4/manual_time    -0.0692  -0.0606     150809     140368     170768     160422
Gather/double_coalesce_x/1048576/4/manual_time   -0.0464  -0.0416     222369     212056     242091     232029
Gather/double_coalesce_x/2097152/4/manual_time   -0.0309  -0.0294     375601     363979     395799     384156
Gather/double_coalesce_x/4194304/4/manual_time   -0.0225  -0.0219     677860     662615     697954     682665
Gather/double_coalesce_x/8388608/4/manual_time   -0.0120  -0.0116    1283815    1268472    1303842    1288669
Gather/double_coalesce_x/16777216/4/manual_time  -0.0066  -0.0065    2485080    2468648    2504552    2488272
Gather/double_coalesce_x/33554432/4/manual_time  -0.0065  -0.0066    4906254    4874451    4926452    4893939
Gather/double_coalesce_x/67108864/4/manual_time  -0.0051  -0.0052    9738184    9688070    9758293    9707412
Gather/double_coalesce_x/1024/8/manual_time      -0.0923  -0.0780      99750      90543     119460     110148
Gather/double_coalesce_x/2048/8/manual_time      -0.0943  -0.0812     104464      94616     124136     114057
Gather/double_coalesce_x/4096/8/manual_time      -0.0826  -0.0704     104342      95729     123625     114924
Gather/double_coalesce_x/8192/8/manual_time      -0.0863  -0.0745     106355      97181     125424     116081
Gather/double_coalesce_x/16384/8/manual_time     -0.0909  -0.0833     109639      99675     127917     117260
Gather/double_coalesce_x/32768/8/manual_time     -0.1033  -0.0904     111931     100369     129199     117524
Gather/double_coalesce_x/65536/8/manual_time     -0.0790  -0.0718     117680     108383     134615     124948
Gather/double_coalesce_x/131072/8/manual_time    -0.0698  -0.0648     128666     119686     149236     139564
Gather/double_coalesce_x/262144/8/manual_time    -0.0627  -0.0584     176862     165765     196982     185487
Gather/double_coalesce_x/524288/8/manual_time    -0.0423  -0.0409     260787     249757     281275     269766
Gather/double_coalesce_x/1048576/8/manual_time   -0.0621  -0.0643     429172     402535     451741     422675
Gather/double_coalesce_x/2097152/8/manual_time   -0.0184  -0.0183     710338     697286     730524     717175
Gather/double_coalesce_x/4194304/8/manual_time   -0.0121  -0.0122    1314264    1298322    1334627    1318292
Gather/double_coalesce_x/8388608/8/manual_time   -0.0077  -0.0080    2526130    2506596    2546852    2526362
Gather/double_coalesce_x/16777216/8/manual_time  -0.0068  -0.0071    4936576    4902969    4957924    4922492
Gather/double_coalesce_x/33554432/8/manual_time  -0.0060  -0.0061    9764109    9705650    9784089    9724860
Gather/double_coalesce_x/67108864/8/manual_time  +0.0024  +0.0040   19377452   19423850   19395926   19472639
Gather/double_coalesce_o/1024/1/manual_time      -0.2392  -0.1605      40288      30653      60137      50487
Gather/double_coalesce_o/2048/1/manual_time      -0.2229  -0.1488      40499      31474      60259      51293
Gather/double_coalesce_o/4096/1/manual_time      -0.2244  -0.1510      40255      31220      59939      50889
Gather/double_coalesce_o/8192/1/manual_time      -0.2386  -0.1638      41493      31591      61076      51074
Gather/double_coalesce_o/16384/1/manual_time     -0.2302  -0.1595      41620      32038      61060      51322
Gather/double_coalesce_o/32768/1/manual_time     -0.2312  -0.1580      42084      32356      61113      51459
Gather/double_coalesce_o/65536/1/manual_time     -0.2309  -0.1669      43309      33310      62104      51738
Gather/double_coalesce_o/131072/1/manual_time    -0.2152  -0.1568      45496      35707      63771      53769
Gather/double_coalesce_o/262144/1/manual_time    -0.1761  -0.1350      50637      41719      67984      58803
Gather/double_coalesce_o/524288/1/manual_time    -0.1268  -0.1018      77047      67276      93533      84013
Gather/double_coalesce_o/1048576/1/manual_time   -0.0656  -0.0607     139799     130622     157435     147879
Gather/double_coalesce_o/2097152/1/manual_time   -0.0339  -0.0315     310673     300142     327832     317499
Gather/double_coalesce_o/4194304/1/manual_time   -0.0130  -0.0126     745909     736213     763176     753529
Gather/double_coalesce_o/8388608/1/manual_time   -0.0107  -0.0124    1692938    1674891    1712953    1691766
Gather/double_coalesce_o/16777216/1/manual_time  -0.0065  -0.0065    3573639    3550361    3590792    3567319
Gather/double_coalesce_o/33554432/1/manual_time  -0.0181  -0.0182    7436123    7301471    7453930    7318261
Gather/double_coalesce_o/67108864/1/manual_time  -0.0044  -0.0043   14884920   14819239   14902115   14837304
Gather/double_coalesce_o/1024/2/manual_time      -0.2005  -0.1460      49442      39531      69426      59287
Gather/double_coalesce_o/2048/2/manual_time      -0.2277  -0.1716      51763      39974      72190      59805
Gather/double_coalesce_o/4096/2/manual_time      -0.2166  -0.1606      51292      40183      71093      59676
Gather/double_coalesce_o/8192/2/manual_time      -0.2113  -0.1591      51906      40937      71570      60185
Gather/double_coalesce_o/16384/2/manual_time     -0.2197  -0.1690      52719      41139      72230      60020
Gather/double_coalesce_o/32768/2/manual_time     -0.2017  -0.1526      52711      42078      71603      60679
Gather/double_coalesce_o/65536/2/manual_time     -0.1927  -0.1507      54591      44073      72393      61487
Gather/double_coalesce_o/131072/2/manual_time    -0.1870  -0.1466      59596      48452      77229      65910
Gather/double_coalesce_o/262144/2/manual_time    -0.1340  -0.1114      69884      60522      86743      77084
Gather/double_coalesce_o/524288/2/manual_time    -0.0926  -0.0804     108312      98280     126445     116280
Gather/double_coalesce_o/1048576/2/manual_time   -0.1730  -0.1738     272091     225021     293411     242413
Gather/double_coalesce_o/2097152/2/manual_time   -0.1032  -0.1047     620392     556378     640418     573389
Gather/double_coalesce_o/4194304/2/manual_time   -0.0101  -0.0100    1433137    1418717    1450168    1435703
Gather/double_coalesce_o/8388608/2/manual_time   -0.0109  -0.0099    3305723    3269542    3319432    3286452
Gather/double_coalesce_o/16777216/2/manual_time  -0.1273  -0.1264    7984650    6968108    7994776    6984542
Gather/double_coalesce_o/33554432/2/manual_time  -0.0020  -0.0021   14398179   14368669   14415640   14384911
Gather/double_coalesce_o/67108864/2/manual_time  -0.0010  -0.0008   29210287   29181238   29225105   29200345
Gather/double_coalesce_o/1024/4/manual_time      -0.1157  -0.0896      65460      57884      85237      77598
Gather/double_coalesce_o/2048/4/manual_time      -0.1334  -0.1051      67124      58169      86838      77711
Gather/double_coalesce_o/4096/4/manual_time      -0.1322  -0.1035      68776      59687      88221      79089
Gather/double_coalesce_o/8192/4/manual_time      -0.1390  -0.1142      70076      60334      89617      79381
Gather/double_coalesce_o/16384/4/manual_time     -0.2317  -0.2069      79393      60994     100311      79553
Gather/double_coalesce_o/32768/4/manual_time     -0.1678  -0.1471      75173      62562      93681      79902
Gather/double_coalesce_o/65536/4/manual_time     -0.2437  -0.2228      86798      65643     106518      82784
Gather/double_coalesce_o/131072/4/manual_time    -0.1594  -0.1438      89554      75282     107471      92020
Gather/double_coalesce_o/262144/4/manual_time    -0.1328  -0.1207     110766      96061     131076     115259
Gather/double_coalesce_o/524288/4/manual_time    -0.1644  -0.1578     193707     161855     213742     180009
Gather/double_coalesce_o/1048576/4/manual_time   -0.1602  -0.1590     494013     414850     513599     431942
Gather/double_coalesce_o/2097152/4/manual_time   -0.2039  -0.2083    1341067    1067671    1370419    1084992
Gather/double_coalesce_o/4194304/4/manual_time   -0.1348  -0.1357    3219329    2785257    3241492    2801606
Gather/double_coalesce_o/8388608/4/manual_time   -0.1156  -0.1168    7312386    6467063    7341849    6484533
Gather/double_coalesce_o/16777216/4/manual_time  -0.1382  -0.1402   16057935   13838182   16093164   13837500
Gather/double_coalesce_o/33554432/4/manual_time  -0.1464  -0.1484   33459659   28562634   33518865   28545498
Gather/double_coalesce_o/67108864/4/manual_time  -0.1525  -0.1536   68465952   58022936   68487360   57968251
Gather/double_coalesce_o/1024/8/manual_time      -0.1390  -0.1210     105500      90841     125697     110486
Gather/double_coalesce_o/2048/8/manual_time      -0.3868  -0.3634     153916      94385     178698     113762
Gather/double_coalesce_o/4096/8/manual_time      -0.3547  -0.3322     147806      95380     171262     114369
Gather/double_coalesce_o/8192/8/manual_time      -0.3625  -0.3416     151157      96369     174958     115193
Gather/double_coalesce_o/16384/8/manual_time     -0.3517  -0.3346     150529      97588     172980     115105
Gather/double_coalesce_o/32768/8/manual_time     -0.3548  -0.3371     155101     100069     176852     117230
Gather/double_coalesce_o/65536/8/manual_time     -0.3685  -0.3515     168954     106693     190043     123249
Gather/double_coalesce_o/131072/8/manual_time    -0.3300  -0.3157     187773     125801     212423     145360
Gather/double_coalesce_o/262144/8/manual_time    -0.2923  -0.2822     247248     174988     270806     194371
Gather/double_coalesce_o/524288/8/manual_time    -0.2347  -0.2317     374475     286568     396719     304800
Gather/double_coalesce_o/1048576/8/manual_time   -0.1865  -0.1882     986328     802362    1010052     819931
Gather/double_coalesce_o/2097152/8/manual_time   -0.0397  -0.0401    2187582    2100684    2206334    2117788
Gather/double_coalesce_o/4194304/8/manual_time   -0.0209  -0.0213    5659923    5541733    5679230    5558246
Gather/double_coalesce_o/8388608/8/manual_time   -0.0246  -0.0248   13221763   12896366   13239906   12912104
Gather/double_coalesce_o/16777216/8/manual_time  -0.0525  -0.0526   29191054   27657719   29210728   27672969
Gather/double_coalesce_o/33554432/8/manual_time  -0.0815  -0.0816   62161763   57097037   62180671   57109769
Gather/double_coalesce_o/67108864/8/manual_time  -0.0266  -0.0270  119143801  115977739  119214918  115998559
```

Authors:
  - Mark Harris (https://github.com/harrism)
  - Devavret Makkar (https://github.com/devavret)
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - Devavret Makkar (https://github.com/devavret)
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)
  - MithunR (https://github.com/mythrocks)

URL: #7758
There are a few classes that need `obj.foo` to behave like `obj['foo']`. They were implementing this independently, but getting it just right can be tricky, so this centralizes that logic into a single mixin.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Michael Wang (https://github.com/isVoid)

URL: #7845
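As a rough illustration of why the `obj.foo` to `obj['foo']` fallback is easy to get wrong, here is a minimal sketch in plain Python. The names `GetAttrGetItemMixin`, `_PROTECTED_KEYS`, and `Record` are hypothetical and chosen only for this example; the sketch does not reproduce the mixin added in #7845. The subtle part it shows is guarding against recursion when an instance is copied before its storage attribute exists.

```python
import copy


class GetAttrGetItemMixin:
    """Hypothetical mixin: fall back from obj.foo to obj['foo']."""

    # Attribute names that must never be routed through __getitem__.
    # Without this guard, copying an uninitialized instance could recurse.
    _PROTECTED_KEYS = frozenset()

    def __getattr__(self, key):
        # __getattr__ runs only after normal attribute lookup has failed.
        if key in self._PROTECTED_KEYS or key.startswith("__"):
            raise AttributeError(key)
        try:
            return self[key]
        except KeyError:
            raise AttributeError(
                f"{type(self).__name__} object has no attribute {key!r}"
            ) from None


class Record(GetAttrGetItemMixin):
    """Toy container used only to exercise the mixin."""

    _PROTECTED_KEYS = frozenset({"_data"})

    def __init__(self, **fields):
        self._data = dict(fields)

    def __getitem__(self, key):
        return self._data[key]


record = Record(foo=1)
duplicate = copy.copy(record)  # works because "_data" is protected
print(record.foo, duplicate.foo)
```

Raising `AttributeError` rather than letting `KeyError` escape keeps `hasattr()` and `getattr(obj, name, default)` working as callers expect.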
Closes #7879. Adds the ability to coerce an `int` or `Decimal` to a different `Decimal64Dtype` where possible and begins to plumb `pa.scalar` into some useful places within `cudf.Scalar`.

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Keith Kraus (https://github.com/kkraus14)
  - Paul Taylor (https://github.com/trxcllnt)

URL: #7899
While debugging a GDS issue, I found that all the other JNI functions make this call before doing anything. Adding these calls doesn't actually fix the GDS problem, but it seems a prudent thing to do.

Authors:
  - Rong Ou (https://github.com/rongou)

Approvers:
  - Jason Lowe (https://github.com/jlowe)

URL: #7983
Removes `device_vector` in favour of either `device_uvector` or `device_buffer` as appropriate in parquet reader and writer.

Contributes to #7287
Depends on #7758

Authors:
  - Devavret Makkar (https://github.com/devavret)
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Vukasin Milovanovic (https://github.com/vuule)
  - MithunR (https://github.com/mythrocks)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #7853
Update CPM with a [fix for `FETCHCONTENT_BASE_DIR`](cpm-cmake/CPM.cmake#244).

Authors:
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)

URL: #7982
Changes the `_global_set` union operation happening in `_is_supported()` to
```python
_global_set = _global_set.union(set(arg[col]))
```
since `set.union()` returns a new set and doesn't modify the original in place. Before this PR, passing something like `{"a": ["unsupported_agg"]}` into `_is_supported()` would always return `True`.

cc @rjzamora

Authors:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
  - Richard (Rick) Zamora (https://github.com/rjzamora)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7959
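The `set.union()` pitfall fixed in #7959 can be reproduced in a few lines. This is a standalone sketch: `is_supported_buggy`, `is_supported_fixed`, and the `SUPPORTED` set are hypothetical stand-ins, not the Dask-cuDF implementation of `_is_supported()`.

```python
SUPPORTED = {"sum", "min", "max", "count"}  # hypothetical set of aggregations


def is_supported_buggy(arg, supported):
    requested = set()
    for col in arg:
        # Bug: union() returns a new set; the result is discarded here,
        # so `requested` stays empty and the check below always passes.
        requested.union(set(arg[col]))
    return requested.issubset(supported)


def is_supported_fixed(arg, supported):
    requested = set()
    for col in arg:
        # Fix: rebind the name to the set returned by union().
        requested = requested.union(set(arg[col]))
    return requested.issubset(supported)


spec = {"a": ["unsupported_agg"]}
print(is_supported_buggy(spec, SUPPORTED))  # True, which is the wrong answer
print(is_supported_fixed(spec, SUPPORTED))  # False
```

An in-place alternative would be `requested.update(...)` or `requested |= ...`; the PR text shows the rebinding form.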
Fixes #5682.

- Structure `nvstrdesc_s` was replaced with `thrust::pair<const char*, size_type>;`.
- `nvstrdesc_s` related logical functions such as `nvstr_is_lesser`, `nvstr_is_greater` etc. were removed.
- Include directives for headers included by source files residing in the same directory were made relative as per the developer guide.
- `make_column` function related to `column_buffer` was moved from a header file to an implementation file.

Authors:
  - Kumar Aatish (https://github.com/kaatish)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - https://github.com/nvdbaranec
  - Devavret Makkar (https://github.com/devavret)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7841
This reverts commit 3327f7b. We have to revert this because the dependent project is broken and my system is in a broken state.

Authors:
  - Raza Jafri (https://github.com/razajafri)

Approvers:
  - Rong Ou (https://github.com/rongou)
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: #7987
This PR removes the `_char_width` member from the libcudf `cudf::string_view` class. This member was being used to record the character width of the bytes in the string but only if all the characters have the same width. This occurs when the string only contains ASCII encoded data which is all single-byte UTF-8 characters. The same optimization can be inferred when the existing `_length` and `_bytes` are equal.

This change reduces the memory footprint of this class from 24 bytes to 16 bytes and therefore matches the size of `thrust::pair<char*,size_type>`. Using this class in a vector would thereby reduce the memory requirements for that vector by 1/3.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #7914
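The inference described in #7914 (character count equals byte count exactly when every character is a single-byte UTF-8 character) and the 1/3 saving can be checked with a short Python sketch. The helper name `all_single_byte` is hypothetical; it only mirrors the comparison of `_length` against `_bytes`, not the libcudf code.

```python
from fractions import Fraction


def all_single_byte(text: str) -> bool:
    """Return True when every character encodes to one UTF-8 byte."""
    length_in_chars = len(text)                  # analogue of `_length`
    size_in_bytes = len(text.encode("utf-8"))    # analogue of `_bytes`
    return length_in_chars == size_in_bytes


# Footprint figures quoted in the PR description.
old_size, new_size = 24, 16
saving = Fraction(old_size - new_size, old_size)

print(all_single_byte("cudf"))
print(all_single_byte("naïve"))
print(saving)
```

Because a multi-byte character always adds more bytes than characters, the two counts can only match when no such character is present, so no separate width field is needed.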
This PR updates the CUDA version used in the build scripts.

Authors:
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: #7984
Resolves an issue with the ORC reader. There were two problems:

- A check to keep track of the number of streams that need to be accessed was missing.
- The position used to calculate the buffer length was wrong, so a non-zero value was assigned for a stream whose length is zero.

Authors:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

Approvers:
  - Ayush Dattagupta (https://github.com/ayushdg)
  - Paul Taylor (https://github.com/trxcllnt)
  - Devavret Makkar (https://github.com/devavret)

URL: #7988
This PR relaxes pandas version pinning which was introduced in `0.19`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Paul Taylor (https://github.com/trxcllnt)
  - Ray Douglass (https://github.com/raydouglass)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7992
This PR adds support for `lower_bound` and `upper_bound` binary searches on struct columns. This closes #7690.

In addition to adding binary search for structs, I also refactored `tests/search/search_test.cpp`, extracting the dictionary search tests from it. As a result, basic search tests, dictionary search tests and (the new) struct search tests live in separate source files, which makes them easier to navigate and maintain.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - David Wendt (https://github.com/davidwendt)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7865
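For readers unfamiliar with the terms, `lower_bound` and `upper_bound` on multi-field rows behave like Python's `bisect_left` and `bisect_right` on a sorted list of tuples. The sketch below uses tuples as a stand-in for struct rows and assumes plain lexicographic ordering with no nulls; it says nothing about how libcudf orders nulls or nested children.

```python
from bisect import bisect_left, bisect_right

# Sorted "struct" rows, compared field by field (lexicographically).
haystack = [(1, "a"), (1, "b"), (2, "a"), (2, "a"), (3, "c")]
needles = [(1, "b"), (2, "a"), (2, "z")]

# lower_bound: first position where the needle could be inserted.
lower = [bisect_left(haystack, needle) for needle in needles]
# upper_bound: last position where the needle could be inserted.
upper = [bisect_right(haystack, needle) for needle in needles]

print(lower)  # [1, 2, 4]
print(upper)  # [2, 4, 4]
```

The two results differ only for needles that occur in the haystack, and the gap between them is the number of matching rows.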
…` is non-numeric dtype (#7897)

Pandas interprets `idx` in the expression `sr[idx]` as an absolute position in the series `sr` when `idx`'s `dtype` is different from that of `sr`'s `Index`. In Pandas, the indexing takes both an integer and a string as the index:
```
>>> import pandas as pd
>>> x = pd.Series([1,2,3], index=pd.Index(["a", "b", "c"]))
>>> x["b"]
2
>>> x[1]
2
```
Whereas cuDF treats `idx` as a value to look up in `sr`'s Index, which can lead to different behaviors when indices have non-integral dtypes:
```
>>> import cudf
>>> x = cudf.Series([1,2,3], index=cudf.Index(["a", "b", "c"]))
>>> x["b"]
2
>>> x[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/series.py", line 921, in __getitem__
    return self.loc[arg]
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/indexing.py", line 120, in __getitem__
    raise KeyError(arg)
KeyError: 1
```
This PR fixes the mismatched behavior in cuDF by deferring to `iloc` when a Series has a non-numerical Index and the indexer `idx` is an integer-like value: `int`, cudf Scalar, or numpy int (`np.int8`, `np.uint32`, `int64`, ...).

Fixes: #7622
Replaces: #7775

Authors:
  - Sheilah Kirui (https://github.com/skirui-source)

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7897
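The dispatch rule described in #7897 (integer-like key plus non-numeric index means positional lookup, otherwise label lookup) can be sketched with a toy class. `MiniSeries` is hypothetical and written only for this illustration; it is not how `cudf.Series.__getitem__` is implemented, and it ignores slices, masks, and cudf Scalars.

```python
import numbers


class MiniSeries:
    """Toy series with a label index, for illustration only."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)

    def _index_is_numeric(self):
        return all(isinstance(label, numbers.Number) for label in self.index)

    def __getitem__(self, key):
        is_int_like = isinstance(key, numbers.Integral) and not isinstance(key, bool)
        if is_int_like and not self._index_is_numeric():
            # Non-numeric index and integer key: treat key as a position (iloc-like).
            return self.values[key]
        try:
            # Otherwise treat key as a label (loc-like).
            return self.values[self.index.index(key)]
        except ValueError:
            raise KeyError(key) from None


x = MiniSeries([1, 2, 3], index=["a", "b", "c"])
print(x["b"], x[1])

y = MiniSeries([1, 2, 3], index=[10, 20, 30])
print(y[20])
```

With a numeric index the integer key stays a label, so `y[1]` raises `KeyError` instead of silently returning the second element.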
)

Refactor `combine.cu` to split out `join_strings()` and enable `concatenate()` to use `make_strings_children` utility.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #7937
[gpuCI] Forward-merge branch-0.19 to branch-0.20 [skip ci]
This PR adds an API named `contiguousSplitGroups` to JNI, along with its unit tests. The API splits the groups in a table after a `groupby` operation instead of executing an aggregate on each group. It will be used by some Spark operators (e.g. Python UDFs) to process the data group by group.

Other changes:
- Renames `AggregateOperation` to `GroupByOperation`, which sounds better since it is returned from a `groupby` call.
- Adds some additional fields to `GroupByOptions` which will be used by native `groupby` to probably achieve better performance.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

Authors:
  - Liangcai Li (https://github.com/firestarman)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Jason Lowe (https://github.com/jlowe)

URL: #7954
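Conceptually, splitting groups after a `groupby` (rather than aggregating them) yields one contiguous chunk of rows per key. The following plain-Python sketch shows the idea on a list of tuples; `contiguous_split_groups` here is a hypothetical helper and does not reflect the signature or behavior details of the JNI `contiguousSplitGroups` API.

```python
from itertools import groupby
from operator import itemgetter


def contiguous_split_groups(rows, key_index):
    """Return one list of rows per distinct key, keys in sorted order."""
    get_key = itemgetter(key_index)
    # Sorting makes rows with the same key adjacent; sorted() is stable,
    # so rows keep their original relative order within each group.
    ordered = sorted(rows, key=get_key)
    return [list(group) for _, group in groupby(ordered, key=get_key)]


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
groups = contiguous_split_groups(rows, key_index=0)
for group in groups:
    print(group)
```

Each returned chunk can then be handed to per-group processing, which is the use case the PR describes for Python UDFs.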
…#7843)

The null values in the position column didn't match up to expectations exactly. It can't be directly copied from the exploded column as the exploded column may contain null values that shouldn't be null in the position column.

Fixes #7787

Authors:
  - Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Jason Lowe (https://github.com/jlowe)
  - Nghia Truong (https://github.com/ttnghia)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #7843
…7772)

Introduces `make_optional_iterator` for nullable columns and scalars, as the first step in fixing issues brought up in #6952 and #7573. The iterator produces `thrust::optional<T>` to better represent nullable column elements and scalars.

`make_optional_iterator` supports three different `contains_null` modes:
- `YES` means that the column supports nulls and has null values, therefore the optional might not contain a value.
- `NO` means that the column has no null values, therefore the optional will always have a value.
- `DYNAMIC` defers the assumption of nullability to runtime, with the user stating on construction of the iterator whether the column has nulls.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - Paul Taylor (https://github.com/trxcllnt)
  - David Wendt (https://github.com/davidwendt)

URL: #7772
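The three `contains_null` modes listed above can be mimicked in Python with `None` playing the role of an empty `thrust::optional<T>`. This is only an analogy: `ContainsNull`, `optional_iter`, and the `has_nulls` parameter are hypothetical names for this sketch and are not the libcudf interface.

```python
from enum import Enum


class ContainsNull(Enum):
    YES = "yes"          # validity mask is always consulted
    NO = "no"            # validity mask is never consulted
    DYNAMIC = "dynamic"  # caller states at construction whether nulls exist


def optional_iter(values, validity, mode, has_nulls=False):
    """Yield each value, or None where the element is null."""
    skip_mask = mode is ContainsNull.NO or (
        mode is ContainsNull.DYNAMIC and not has_nulls
    )
    if skip_mask:
        # Every element is assumed valid, so the mask is not read at all.
        yield from values
        return
    for value, is_valid in zip(values, validity):
        yield value if is_valid else None


values = [10, 20, 30]
validity = [True, False, True]
print(list(optional_iter(values, validity, ContainsNull.YES)))
print(list(optional_iter(values, validity, ContainsNull.NO)))
print(list(optional_iter(values, validity, ContainsNull.DYNAMIC, has_nulls=True)))
```

The point of the `NO` mode is that the mask check disappears entirely; in this sketch that corresponds to the early `yield from` branch.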
Added support for Decimal/fixed-point columns in the ORC reader, along with test cases. All decimal columns are read as Decimal64 columns, and if the precision is >18 the read fails loudly. This PR also removes a couple of options which are of no use after the addition of Decimal support.

#7126

Authors:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

Approvers:
  - Devavret Makkar (https://github.com/devavret)
  - Vukasin Milovanovic (https://github.com/vuule)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #7970
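The precision limit of 18 mentioned above follows from simple arithmetic, assuming (as is usual for a 64-bit decimal type) that the unscaled value is stored in a signed 64-bit integer. The constant and function names below are hypothetical and exist only for this check.

```python
INT64_MAX = 2**63 - 1
MAX_DECIMAL64_PRECISION = 18  # limit quoted in the PR description


def fits_in_int64(precision: int) -> bool:
    """True if every unscaled value with `precision` digits fits in int64."""
    largest_unscaled = 10**precision - 1  # e.g. 999 for precision 3
    return largest_unscaled <= INT64_MAX


print(fits_in_int64(18))  # True
print(fits_in_int64(19))  # False
```

Some 19-digit values do fit in 64 bits, but not all of them, which is why 18 is the largest precision that can be guaranteed.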
The `cudf::strings::replace_nulls()` is a public API that was replaced by `cudf::replace_nulls()`. The strings one should not be used since the base one handles any column type.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Conor Hoekstra (https://github.com/codereport)
  - Nghia Truong (https://github.com/ttnghia)
  - Christopher Harris (https://github.com/cwharris)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #7965
Closes #4882

Added `groupby.product` support in both hash and sort groupby.

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Jake Hemstad (https://github.com/jrhemstad)
  - https://github.com/brandon-b-miller

URL: #7763
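As a reminder of what a grouped product computes, here is a small dictionary-based (hash-style) sketch in plain Python. `groupby_product` is a hypothetical helper for illustration; it does not model null handling or the sort-based path mentioned in the PR.

```python
from collections import defaultdict
from math import prod


def groupby_product(keys, values):
    """Multiply together the values that share the same key."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: prod(members) for key, members in groups.items()}


result = groupby_product(["a", "b", "a", "b", "c"], [2, 3, 4, 5, 7])
print(result)
```

A sort-based variant would first order rows by key and then reduce each contiguous run, producing the same per-key products.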
`DOCKER_IMAGE` is out of date since #7953 was merged. This fixes that.

Authors:
  - Conor Hoekstra (https://github.com/codereport)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #8013
Closes #6836

Overall reduces test execution time and improves coverage:
- Replace around 2.5K test cases with multiple tests that only vary related options.
- Correctly verify the output with multicharacter `line_terminator` option (not supported by readers).
- Add a `seed` call before the random generator is used in one of the tests.
- Simplify a few tests by removing irrelevant comparisons.
- Use buffer output instead of file in affected tests (could be applied to many more).

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - https://github.com/brandon-b-miller

URL: #7851
This enables the quantile method for columns of type `decimal`.

Authors:
  - https://github.com/ChrisJar

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #7927
Closes #7628

This PR adds support for setting a column in the dataframe when the provided column name is a new column name. The specified rows can be a single row label, a collection of row labels, or a slice. The value to set can be a column-like object or a scalar.

E.g. you can now do this:
```
>>> x = cudf.DataFrame()
>>> x.loc[:, "a"] = [1, 2, 3] # set a new column with list
>>> x
   a
0  1
1  2
2  3
>>> x.loc[[1, 2], "b"] = ["abc", "cba"] # set part of the new column with list
>>> x
   a     b
0  1  <NA>
1  2   abc
2  3   cba
>>> x.loc[:, "c"] = 5 # set the new column to the scalar
>>> x
   a     b  c
0  1  <NA>  5
1  2   abc  5
2  3   cba  5
```

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #8012
Closes #8011

Dask-cuDF currently reads a single stripe to infer metadata in `read_orc`. When the first path corresponds to an empty file, there is no stripe "0" to read. This PR includes a simple fix (and test coverage).

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)

URL: #8021
…/writer (#7805)

Issue #7287

Replaces `device_vector` with `device_uvector` and `device_span`. Removed the `device_vector` data members.

Performance impact:
- Writer: None
- Reader: ~up to 10% slower, will look into this.~ None

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Devavret Makkar (https://github.com/devavret)
  - Kumar Aatish (https://github.com/kaatish)
  - Mark Harris (https://github.com/harrism)

URL: #7805
Fixes: #8323

Also fixes a recently introduced bug in the test column equality checker. The code was previously relying on accesses to device memory being transparently handled by `thrust::device_vector`.

Authors:
  - https://github.com/nvdbaranec

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Devavret Makkar (https://github.com/devavret)
  - Nghia Truong (https://github.com/ttnghia)

URL: #8350
Fixes: #8200

This PR adds support for merging between categorical data by implementing `union_categoricals_dispatch` in `dask-cudf`.

This PR is dependent on dask upstream changes: dask/dask#7699

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Ashwin Srinath (https://github.com/shwina)

URL: #8332
# Summary

Support space in WORKSPACE

Authors:
  - Joseph (https://github.com/jolorunyomi)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - https://github.com/GaryShen2008

URL: #7956
This fixes an issue where the precision assigned to the result of a decimal binary operation may exceed the maximum precision.

Closes #8291

Authors:
  - https://github.com/ChrisJar

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - Christopher Harris (https://github.com/cwharris)
  - Nghia Truong (https://github.com/ttnghia)

URL: #8194
Fixes the rolling-window part of #7611.

All the rolling window functions return empty results when the input aggregation column is empty, just as they should. But the column types are incorrectly set to match the input type. While this is alright for `[MIN(), MAX(), LEAD(), LAG()]`, it is incorrect for some aggregations:

| Aggregation  | Input Types          | Output Type                       |
|--------------|----------------------|-----------------------------------|
| COUNT_VALID  | All types            | INT32                             |
| COUNT_ALL    | All types            | INT32                             |
| ROW_NUMBER   | All types            | INT32                             |
| SUM          | Numerics (e.g. INT8) | 64-bit promoted type (e.g. INT64) |
| SUM          | Chrono               | Same as input type                |
| SUM          | All else             | Unsupported                       |
| MEAN         | Numerics             | FLOAT64                           |
| MEAN         | Chrono               | FLOAT64                           |
| MEAN         | All else             | Unsupported                       |
| COLLECT_LIST | All types T          | LIST with child of type T         |

This mapping is congruent with `cudf::target_type_t` from `<cudf/detail/aggregation/aggregation.hpp>`.

This commit corrects the type of the output column that results from an empty input. It adds tests for all the combinations listed above.

Note: This is dependent on #8158, and should be merged after that is committed.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - https://github.com/nvdbaranec
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #8274
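The type mapping in the table above can be written down as a small lookup function. This Python sketch is illustrative only: the type-name strings, the `SIGNED_INTS`, `FLOATS`, and `CHRONO` sets, and the rule that floating-point inputs promote to `FLOAT64` are assumptions made for the example, and the real mapping lives in `cudf::target_type_t`.

```python
SIGNED_INTS = {"INT8", "INT16", "INT32", "INT64"}      # assumed subset of numerics
FLOATS = {"FLOAT32", "FLOAT64"}                        # assumed subset of numerics
CHRONO = {"TIMESTAMP_DAYS", "DURATION_SECONDS"}        # assumed example chrono types


def empty_rolling_output_type(aggregation: str, input_type: str) -> str:
    """Output column type for a rolling aggregation over an empty input."""
    if aggregation in {"MIN", "MAX", "LEAD", "LAG"}:
        return input_type
    if aggregation in {"COUNT_VALID", "COUNT_ALL", "ROW_NUMBER"}:
        return "INT32"
    if aggregation == "COLLECT_LIST":
        return f"LIST<{input_type}>"
    if aggregation == "SUM":
        if input_type in SIGNED_INTS:
            return "INT64"      # 64-bit promoted type
        if input_type in FLOATS:
            return "FLOAT64"    # assumption: floats promote to FLOAT64
        if input_type in CHRONO:
            return input_type   # same as input type
        raise TypeError(f"SUM is unsupported for {input_type}")
    if aggregation == "MEAN":
        if input_type in SIGNED_INTS | FLOATS | CHRONO:
            return "FLOAT64"
        raise TypeError(f"MEAN is unsupported for {input_type}")
    raise ValueError(f"unknown aggregation {aggregation}")


print(empty_rolling_output_type("SUM", "INT8"))
print(empty_rolling_output_type("COLLECT_LIST", "STRING"))
```

Before the fix, the empty-input path effectively behaved like the first branch for every aggregation, returning the input type regardless of the operation.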
Allow Dask + Distributed 2021.05.1 to be installed. ~~This isn't released yet, but this tees things up so we are ready to go once it comes out.~~

Associated integration PR (rapidsai/integration#284)

Authors:
  - https://github.com/jakirkham

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Benjamin Zaitlen (https://github.com/quasiben)
  - Christopher Harris (https://github.com/cwharris)
  - Jordan Jacobelli (https://github.com/Ethyling)

URL: #8392
Codecov Report
@@ Coverage Diff @@
## main #8418 +/- ##
==========================================
+ Coverage 82.30% 82.83% +0.53%
==========================================
Files 101 109 +8
Lines 17053 17896 +843
==========================================
+ Hits 14035 14824 +789
- Misses 3018 3072 +54
Continue to review full report at Codecov.
[REVIEW] Pin dask for ci in `21.06`
Author:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #8446
❄️ Code freeze for `branch-21.06` and v21.06 release

What does this mean?
Only critical/hotfix level issues should be merged into `branch-21.06` until release (merging of this PR).

What is the purpose of this PR?
Merge `branch-21.06` into `main` for the release