[RELEASE] cudf v0.18 #7405
Commits on Nov 24, 2020
-
Commit: 80464ce
Commits on Nov 30, 2020
-
Add a cmake option to link to GDS/cuFile (#6847)
Add a cmake find module to locate cuFile. If found, add the include directory and link to the shared library. This shouldn't have any effect if cuFile is not installed locally.
Commit: 0e94bab
Commits on Dec 1, 2020
-
Merge pull request #6866 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: 2ed7e13
-
Merge pull request #6867 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: a091304
Commits on Dec 2, 2020
-
Merge pull request #6874 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: c0e03d6
-
Merge pull request #6876 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: 018d036
-
Merge pull request #6877 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: 7aa3863
-
Merge pull request #6878 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: 36c03a5
-
Merge pull request #6879 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: 48adcc0
-
Merge pull request #6880 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: 36d5205
Commits on Dec 3, 2020
-
Commit: 536d23a
-
Merge pull request #6890 from kkraus14/fix_automerge
Keith Kraus authored Dec 3, 2020
Commit: 737e715
-
Merge pull request #6896 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: 3d80bb8
Commits on Dec 4, 2020
-
Merge pull request #6900 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: c6f39b1
-
Merge pull request #6904 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: 009c307
-
Merge pull request #6906 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: dd6cf15
-
Merge pull request #6910 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: 8c8e05f
-
Merge pull request #6913 from rapidsai/branch-0.17
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Commit: 522103d
Commits on Dec 6, 2020
-
Implement DataFrame.quantile for datetime and timedelta data types (#6902)
This implements the `non_numeric` argument for `DataFrame.quantile`, meaning that it now works on `datetime` and `timedelta` data. However, because of the difference in how `DataFrame.iloc` behaves between Pandas and cuDF, this implementation returns a DataFrame when `non_numeric=False` even when Pandas returns a Series. Passes tests locally. This closes #6799. Authors: - Chris Jarrett <cjarrett@dt08.aselab.nvidia.com> - ChrisJar <chris.jarrett.0@gmail.com> Approvers: - Keith Kraus URL: #6902
Commit: 214dccc
Commits on Dec 7, 2020
-
Fix rmm_mode=managed parameter for gtests(#6912)
When using parameter `--rmm_mode=managed` for gtests `Invalid RMM allocation mode: managed` exception is thrown. The logic in `include/cudf_test/base_fixture.hpp` is just missing a return statement. Authors: - davidwendt <dwendt@nvidia.com> Approvers: - Paul Taylor - Mark Harris URL: #6912
Commit: 598a14d
Commits on Dec 8, 2020
-
Commit: 917759b
-
Fix `columns` & `index` handling in dataframe constructor (#6838)
Commit: f6b16ab
-
Commit: 9120992
-
Update to official libcu++ on Github(#6275)
Update to libcu++ on Github. Authors: - ptaylor <paul.e.taylor@me.com> - Paul Taylor <paul.e.taylor@me.com> Approvers: - Mark Harris - Keith Kraus - Christopher Harris - Mark Harris URL: #6275
Commit: 78f9789
-
Remove **kwargs from string/categorical methods(#6750)
This PR removes `**kwargs` from the string/categorical accessors where unnecessary, and exposes keyword arguments like `inplace` to the user directly. If we want to maintain parity with Pandas APIs for Dask/others using cuDF internally, we can consider using the approach described in #6135, which will automatically raise `NotimplementedError` when unsupported kwargs are passed. Authors: - Ashwin Srinath <shwina@users.noreply.github.com> Approvers: - GALI PREM SAGAR - Keith Kraus - Keith Kraus URL: #6750
Commit: 8a1a6d7
Commits on Dec 9, 2020
-
Fix N/A detection for empty fields in CSV reader(#6922)
Fixes #6682, #6680 Currently, empty fields are treated as N/A regardless on parsing options. However, the desired behavior is to handle empty fields the same way as fields with special values (apply default_na_values, na_filter logic). This PR irons out the behavior so it matches Pandas in this regard. - Tries now support matching empty strings. - The list of special NA values is now generated more robustly, so it has correct elements in any parameter combination. - Empty string is added to the list of special NA values. - Empty string string ("/"/"") is added to NA value list if empty string ("") is included (mirrors Pandas behavior). - Added tests for previously failing parameter combinations. - Reworked some of the tests to check against Pandas results instead of assumed desired behavior. Authors: - vuule <vmilovanovic@nvidia.com> - vuule <vukasin.milovanovic.87@gmail.com> - Vukasin Milovanovic <vukasin.milovanovic.87@gmail.com> - Vukasin Milovanovic <vmilovanovic@nvidia.com> Approvers: - Ram (Ramakrishna Prabhu) - Christopher Harris - Keith Kraus URL: #6922
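A small usage sketch of the behaviour this PR settles on; the inline CSV text and expected count are illustrative, not taken from the PR's tests:
```python
import cudf
from io import StringIO

data = "a,b\n1,\n2,NULL\n"

# With default options, empty fields and special values such as "NULL"
# should both come back as nulls in column "b", matching Pandas.
df = cudf.read_csv(StringIO(data))
print(df["b"].isnull().sum())  # expected: 2

# With na_filter=False, nothing is treated as a null value.
raw = cudf.read_csv(StringIO(data), na_filter=False)
```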
Commit: 17c8f97
-
fix libcu++ include path for jni(#6948)
The include directory was renamed from `simt` to `cuda`. Authors: - Rong Ou <rong.ou@gmail.com> Approvers: - Jason Lowe URL: #6948
Commit: 83b1851
-
Fix cudf::merge gtest for dictionary columns(#6942)
The `cudf::merge` API expects the key columns to be sorted. This means that if null rows are included, these null entries should all appear either at beginning or at the end of the column depending on the null_order for the sort. The `MergeDictionaryTest.WithNull` gtest placed null rows in the middle of the column. The expected results should also have included null entries at the beginning or the end. This PR also includes an extra test for checking merge results are consistent with the sort parameters `cudf::order` and `cudf::null_order`. This test also includes a larger number of rows to ensure `thrust::merge` requires more than one tile/block in its runtime logic. Authors: - davidwendt <dwendt@nvidia.com> Approvers: - Ram (Ramakrishna Prabhu) - Vukasin Milovanovic URL: #6942
Commit: b45fd4d
-
Update Java bindings version to 0.18-SNAPSHOT(#6949)
Updating the Java bindings package version to match the libcudf version. Authors: - Jason Lowe <jlowe@nvidia.com> Approvers: - Robert (Bobby) Evans URL: #6949
Commit: 44eeb70
-
Commit: 6d230ee
-
Add in basic support to JNI for logical_cast(#6954)
This exposes `logical_cast` through a JNI API. I also updated some of the test code to take a ColumnView instead of a ColumnVector so I could test it more easily. Authors: - Robert (Bobby) Evans <bobby@apache.org> Approvers: - Jason Lowe URL: #6954
Commit: a301e65
Commits on Dec 10, 2020
-
Use simplified `rmm::exec_policy` (#6939)
Updates libcudf to use the new, simplified `rmm::exec_policy` and include the new refactored headers `rmm/exec_policy.hpp` and `rmm/device_vector.hpp`. The new `exec_policy` can be passed directly to Thrust; there is no longer any need to call `rmm::exec_policy(stream)->on(stream)`. Depends on rapidsai/rmm#647
Commit: f117b68
-
Fix type comparison for java(#6970)
As a part of trying to support upper and lower bounds for decimal I found that type checking for this function was broken because it used `==` for equality instead of `.equals`. Looking further I found a few other places where this was a bug (one in ColumnVector that is mostly a performance issue and one in Scalar) I decided to update all of the code to use .equals for comparison of types to make it consistent so it is less likely to have bugs like this crop up in the future. I also took the opportunity to internally move away from using `isTimestamp` (which is deprecated) to `isTimestampType` Authors: - Robert (Bobby) Evans <bobby@apache.org> Approvers: - Jason Lowe URL: #6970
Commit: dc05261
-
Add Java bindings for URL conversion(#6972)
Adding Java bindings for the `url_decode` and `url_encode` functions. Authors: - Jason Lowe <jlowe@nvidia.com> Approvers: - Robert (Bobby) Evans - Kuhu Shukla URL: #6972
Commit: d028db6
-
Commit: 89938fa
-
Align `Series.groupby` API to match Pandas (#6964)
Currently we're missing a few kwargs in `Series.groupby` which is causing issues due to the dask change in dask/dask#6854. Adds the missing kwargs and validates that we support the values passed in. Authors: - Keith Kraus <keith.j.kraus@gmail.com> Approvers: - GALI PREM SAGAR - Michael Wang URL: #6964
Keith Kraus authored Dec 10, 2020
Commit: f965d9a
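A rough sketch of the aligned call signature; treating `level` and `sort` as part of the newly mirrored keywords is an assumption based on the description above:
```python
import cudf

ser = cudf.Series([1, 2, 3, 4], index=["a", "a", "b", "b"])

# Keywords mirroring pandas.Series.groupby can now be passed through,
# which is what the Dask change mentioned above relies on.
print(ser.groupby(level=0, sort=True).mean())
```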
Commits on Dec 11, 2020
-
Commit: ea9c689
-
Fix groupby agg/apply behaviour when no key columns are provided(#6945)
Commit: b136469
-
Make `cudf::round` for `fixed_point` when `scale = -decimal_places` a no-op (#6975)
@nartal1 found a small bug while working on NVIDIA/spark-rapids#1244. Problem is that for `fixed_point`, when the column `scale = -decimal_places`, it should be a no-op. Fix is to make it a no-op. Authors: - Conor Hoekstra <codereport@outlook.com> Approvers: - David - Karthikeyan URL: #6975
Commit: 13acc98
-
Remove duplicate file array_tests.cpp(#6953)
array_tests.cu with same content in same directory exists. (can't convert to .cpp because .cuh is included in array_tests.cu and template source code is tested) Also, array_tests.cpp is not referred in `cpp/tests/CMakeLists.txt` Authors: - Karthikeyan <6488848+karthikeyann@users.noreply.github.com> Approvers: - David - Vukasin Milovanovic URL: #6953
Commit: d842327
-
Add groupby idxmin, idxmax aggregation(#6856)
Addresses groupby part of #2188 - [x] Add cython interfaces for aggregation argmin, argmax as idxmin, idxmax - [x] unit tests Authors: - Karthikeyan Natarajan <karthikeyann@users.noreply.github.com> - Karthikeyan <6488848+karthikeyann@users.noreply.github.com> Approvers: - David - Ram (Ramakrishna Prabhu) - GALI PREM SAGAR - Jake Hemstad URL: #6856
Commit: 2b656b0
-
Add `replace_null` API with `replace_policy` parameter, `fixed_width` column support (#6907)
Part 1 for issue #1361 - Adds `PRECEDING` and `FOLLOWING` options to `replace_nulls` in `libcudf`. This PR provides support for `fixed_width_type` type columns. - Adds Cython binding Authors: - Michael Wang <michaelwang0905@gmail.com> - Michael Wang <isVoid@users.noreply.github.com> Approvers: - Ashwin Srinath - Jake Hemstad - Mark Harris - Mark Harris URL: #6907
Commit: f2b9a36
-
Minor `cudf::round` internal refactoring (#6976)
This is a small cleanup that replaces a `cudf::binary_operation` with a much cleaner `cudf::cast`. Authors: - Conor Hoekstra <codereport@outlook.com> Approvers: - Vukasin Milovanovic - Mark Harris URL: #6976
Commit: c017cb4
-
Add null mask `fixed_point_column_wrapper` constructors (#6951)
Currently has changes for #6950 included. The full set of null mask `fixed_point_column_wrapper` constructors aren't supported. This PR adds them all and also adds unit tests for each of them across different `fixed_point` API tests. **To Do List:** * [x] Add constructors * [x] Add basic unit test * [x] Add all unit tests * [x] Update docs Authors: - Mark Harris <mharris@nvidia.com> - Conor Hoekstra <codereport@outlook.com> Approvers: - null - Vukasin Milovanovic URL: #6951
Commit: 252f478
-
Fix default parameter values of `write_csv` and `write_parquet` (#6967)
Fixes #6671, #6851 - Set the `rows_per_chunk` in `csv_writer_options` to the size of the input table. - Change `rows_per_chunk` type to `size_type` (used for number of rows). - Set the default compression in `to_parquet`/`write_parquet` to "snappy". Authors: - vuule <vmilovanovic@nvidia.com> Approvers: - Keith Kraus - Conor Hoekstra - Ram (Ramakrishna Prabhu) - Mark Harris URL: #6967
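As a rough illustration of the new defaults (file names are placeholders, not from the PR):
```python
import cudf

df = cudf.DataFrame({"a": [1, 2, 3]})

# Parquet output should now default to snappy compression, as in Pandas.
df.to_parquet("out.parquet")

# The CSV writer now sizes rows_per_chunk to the whole table by default,
# so no chunking-related argument is needed for a simple dump.
df.to_csv("out.csv", index=False)
```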
Commit: df5d452
-
Fix java cufile tests when cufile is not installed(#6987)
Just disables the tests when cufile is not installed. Authors: - Robert (Bobby) Evans <bobby@apache.org> Approvers: - Kuhu Shukla URL: #6987
Commit: 3c15d30
-
Commit: 5735da5
-
Merge pull request #6995 from shwina/branch-0.18-merge-0.17
Keith Kraus authored Dec 11, 2020
Commit: 4c26155
-
Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943)
This change mirrors what is done in `groupby` to eliminate null-containing columns from the join hash table if nulls not equal is set. This prevents absolute runaway of the process. I added benchmarks for joins with nulls and I can't even get it to finish without these changes. The 195ms test without nulls takes 2,000,000ms to complete and the larger tests I haven't had the patience to even see complete. With this change, the timings are faster than without nulls proportional to the % of nulls. Meaning half the table is nulls means the query is twice as fast as the non-null version, which makes sense. closes #6052 Authors: - Mike Wilson <knobby@burntsheep.com> - Mike Wilson <hyperbolic2346@users.noreply.github.com> Approvers: - Jake Hemstad - Jake Hemstad - null - Mark Harris URL: #6943
Commit: ab8c931
Commits on Dec 12, 2020
-
Fix timestamp parsing in ORC reader for timezones without transitions (#6959)
Fixes #6947. When TZif file has no transitions (e.g. GMT), `build_timezone_transition_table` has an out-of-bounds read that leads to undefined behavior and intermittent issues. This PR makes two changes to behavior: 1. When there are no transitions, the ancient rule is initialized from the first time offset (instead of the first transition rule, which does not exist in this case). 2. When there are no transitions and the time offset is zero, an empty table is returned (avoid using a no-op table in CUDA). Authors: - vuule <vmilovanovic@nvidia.com> - Vukasin Milovanovic <vukasin.milovanovic.87@gmail.com> Approvers: - GALI PREM SAGAR - null - Ram (Ramakrishna Prabhu) - David URL: #6959
Commit: 929c3f4
Commits on Dec 13, 2020
-
Commit: 2ede7df
-
Disable some pragma unroll statements in thrust sort.h(#6982)
Commit: b986220
-
Fix nullmask offset handling in parquet and orc writer(#6889)
Commit: 8dbaa2f
-
Pass numeric scalars of the same dtype through numeric binops(#6938)
Allows for scalars of the same dtype as a column to be passed along a fast codepath to libcudf, instead of being inspected to reduce their dtype beforehand. Authors: - brandon-b-miller <brmiller@nvidia.com> - GALI PREM SAGAR <sagarprem75@gmail.com> Approvers: - GALI PREM SAGAR URL: #6938
Commit: b0cb9db
Commits on Dec 14, 2020
-
Fix Thrust unroll patch command(#7002)
PR #6982 added a `PATCH_COMMAND` when fetching Thrust to remove unrolling in `thrust::sort`, thereby improving compile time and performance in some cases. But the command failed on local builds from source (At least on my machine under rapids-compose). This PR simplifies the command. Authors: - Mark Harris <mharris@nvidia.com> Approvers: - Keith Kraus URL: #7002
Commit: 29c0af1
-
Check output size overflow on strings gather(#6997)
Closes #6801 This PR adds an extra reduce call in the libcudf gather specialization logic for strings column. This will check to make sure the output size of the gather does not exceed the size limit for the child characters column. The offsets column is first created with the individual output string sizes. Then the reduce call will add these sizes to check for overflow. Also added a gtest to check for the overflow condition. Authors: - davidwendt <dwendt@nvidia.com> Approvers: - Devavret Makkar - Karthikeyan URL: #6997
Commit: f37d42d
-
fix excluding cufile tests by default(#6988)
I think this is how I started and verified it was working, but then I was trying to exclude the source as well, which didn't work for tests. Then I realized we need the source built for the plugin so remove it. Anyway, no presubmit CI check is really painful. :( @revans2 ```console $ mvn test ... [WARNING] Tests run: 775, Failures: 0, Errors: 0, Skipped: 4 $ mvn test -DUSE-GDS=ON ... [WARNING] Tests run: 777, Failures: 0, Errors: 0, Skipped: 4 ``` Authors: - Rong Ou <rong.ou@gmail.com> Approvers: - Robert (Bobby) Evans URL: #6988
Commit: 2a2b4d6
-
Remove warning in from_dlpack and to_dlpack methods(#7001)
Fix #6926 . Hi! When invoking from_dlpack() and to_dlpack, the following warnings are displayed: from_dlpack() ``` /opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/dlpack.py:33: UserWarning: WARNING: cuDF from_dlpack() assumes column-major (Fortran order) input. If the input tensor is row-major, transpose it before passing it to this function. res = libdlpack.from_dlpack(pycapsule_obj) ``` to_dlpack() ``` /opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/dlpack.py:74: UserWarning: WARNING: cuDF to_dlpack() produces column-major (Fortran order) output. If the output tensor needs to be row major, transpose the output of this function. return libdlpack.to_dlpack(gdf_cols) ``` I think those warnings should be removed, because it contains information that should be available in the API documentation, and not necessarily displayed each time the methods are invoked. Some users, like me, love to have their notebooks/code without warnings. Even if it is possible to disable those warnings, I think the user should not go that way, because the warning is just repeating what the API documentation should cover. Hope it helps! Miguel Authors: - Miguel Martínez <26169771+miguelusque@users.noreply.github.com> - GALI PREM SAGAR <sagarprem75@gmail.com> Approvers: - GALI PREM SAGAR - Ram (Ramakrishna Prabhu) URL: #7001
Commit: a5515f2
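A sketch of the now warning-free round trip; the exact call sites are assumed from the module shown in the old warning text:
```python
import cudf

df = cudf.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# No UserWarning is emitted any more; the column-major (Fortran order)
# caveat from the removed warning still applies to the produced tensor.
capsule = df.to_dlpack()
roundtrip = cudf.from_dlpack(capsule)
```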
Commits on Dec 15, 2020
-
Add null count test for apply_boolean_mask(#6903)
This PR adds a test to exercise the issue described in #6733. This issue was only reproduced on a laptop Pascal GPU, but I think it's a good test to have. In summary, `copy_if`, used by `apply_boolean_mask` computes the output null count during as part of its custom scatter kernel, rather than using `cudf::count_unset_bits`. #6733 describes an issue where the former is different from the latter. So it's good to have a test that verifies they get the same null count. And since it's difficult to get a repro on a similar machine, this is a first step. Authors: - Mark Harris <mharris@nvidia.com> - Keith Kraus <kkraus@nvidia.com> Approvers: - Karthikeyan - Devavret Makkar URL: #6903
Commit: 15f9530
-
Skip Thrust sort patch if already applied(#7009)
#7002 attempted to fix the temporary Thrust sort patch introduced in #6982 which didn't work with CMake 3.19+. This PR updates the thirdparty CMakeLists.txt file to continue if the Thrust sort patch has already been applied. Today, the first time cmake is run, the Thrust sort.h is patched. But if cmake is run again without cleaning the build directory, the build will fail, because the file has already been patched. @trxcllnt showed us the correct `patch` incantation to ignore the patch if already applied. CC @davidwendt Authors: - Mark Harris <mharris@nvidia.com> Approvers: - Paul Taylor - Keith Kraus URL: #7009
Commit: 515a173
-
Commit: 6d1b076
-
Commit: b370963
-
Implement cudf.DateOffset for months(#6775)
Implements `cudf.DateOffset` - an object used for calendrical arithmetic, similar to pandas.DateOffset - for month units only. Closes #6754 Authors: - brandon-b-miller <brmiller@nvidia.com> - brandon-b-miller <53796099+brandon-b-miller@users.noreply.github.com> - Keith Kraus <kkraus@nvidia.com> Approvers: - GALI PREM SAGAR - Keith Kraus - Keith Kraus URL: #6775
Commit: 1963111
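A minimal sketch of the month arithmetic this enables (values are illustrative):
```python
import cudf

s = cudf.Series(cudf.to_datetime(["2020-01-31", "2020-03-15"]))

# Only month units are supported at this point.
print(s + cudf.DateOffset(months=1))
```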
Commits on Dec 16, 2020
-
Add pytest-xdist to dev environment.yml(#6958)
Resolves: #6370 This PR enables the parallel execution of pytests of `cudf`, `dask_cudf` & `custreamz` in CI. The changes also include adding `pytest-xdist` to dev environments. With these changes, here is the change in pytest execution times in CI:
| module | without pytest-xdist | with pytest-xdist(n=6) |
| ----------- | ----------- | ----------- |
| cudf | 1 hr | 14 min |
| dask_cudf | 4 min | 1 min |
| custreamz | 6 min | 2 min |
Related Integration changes: rapidsai/integration#188 Authors: - galipremsagar <sagarprem75@gmail.com> Approvers: - AJ Schmidt - Keith Kraus URL: #6958
Commit: 8c1f01e
-
Pin librdkafka to gcc 7 compatible version (#7021)
`librdkafka` 1.5.3 required gcc 9.3 (https://anaconda.org/conda-forge/librdkafka/files?version=1.5.3) and `libcudf_kafka` (https://anaconda.org/rapidsai-nightly/libcudf_kafka/files?version=0.18.0a201215) is being built requiring 1.5.3, which is not compatible with the rest of RAPIDS. Authors: - Ray Douglass Approvers: - Mike Wendt
Commit: 6bc71c8
-
Fix loc behaviour when key of incorrect type is used(#6993)
* Fixes #6823 * Raise a `KeyError` similar to Pandas rather than an `IndexError` when loc fails * Improve tests to compare more directly with Pandas behaviour Authors: - Ashwin Srinath <shwina@users.noreply.github.com> Approvers: - Ram (Ramakrishna Prabhu) - Michael Wang - GALI PREM SAGAR URL: #6993
Commit: 7ca3fad
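A quick sketch of the corrected behaviour:
```python
import cudf

s = cudf.Series([1, 2, 3], index=["a", "b", "c"])

try:
    s.loc["z"]  # label not present in the index
except KeyError:
    print("raises KeyError, as Pandas does")  # previously an IndexError
```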
Commits on Dec 17, 2020
-
Fix round operator's HALF_EVEN computation for negative integers(#7014)
Found a small bug while working on NVIDIA/spark-rapids#1244. For negative integers, it was not rounding to nearest even number. Authors: - Niranjan Artal <nartal@nvidia.com> - Conor Hoekstra <codereport@outlook.com> Approvers: - Conor Hoekstra - Mark Harris URL: #7014
Commit: e5d3742
-
Implement `cudf::reduce` for `decimal32` and `decimal64` (part 2) (#6980)
This PR resolves a part of #3556. Supporting `cudf::reduce`: 1. Part 1 (`MIN`, `MAX`, `SUM` & `PRODUCT` & `NUNIQUE`) #6814 2. Part 2 (the rest)
◀️ **Reduction Ops:** **Done in Previous PR**
✔️ `SUM, ///< sum reduction`
✔️ `PRODUCT, ///< product reduction`
✔️ `MIN, ///< min reduction`
✔️ `MAX, ///< max reduction`
✔️ `NUNIQUE, ///< count number of unique elements`
**Not supported by `cudf::reduce`:**
* [x] `COUNT_VALID, ///< count number of valid elements`
* [x] `COUNT_ALL, ///< count number of elements`
* [x] `COLLECT, ///< collect values into a list`
* [x] `LEAD, ///< window function, accesses row at specified offset following current row`
* [x] `LAG, ///< window function, accesses row at specified offset preceding current row`
* [x] `PTX, ///< PTX UDF based reduction`
* [x] `CUDA ///< CUDA UDf based reduction`
* [x] `ARGMAX, ///< Index of max element`
* [x] `ARGMIN, ///< Index of min element`
* [x] `ROW_NUMBER, ///< get row-number of element`
**Won't be supported:**
* [x] `ANY, ///< any reduction`
* [x] `ALL, ///< all reduction`
**To Do / Investigate:**
* [x] `SUM_OF_SQUARES, ///< sum of squares reduction`
* [x] `MEDIAN, ///< median reduction`
* [x] `QUANTILE, ///< compute specified quantile(s)`
* [x] `NTH_ELEMENT, ///< get the nth element`
**Deferred until requested**
* [x] `MEAN, ///< arithmetic mean reduction`
* [x] `VARIANCE, ///< groupwise variance`
* [x] `STD, ///< groupwise standard deviation`
Authors: - Conor Hoekstra <codereport@outlook.com> Approvers: - null - Karthikeyan - David URL: #6980
Commit: ae17c14
-
Extend `replace_nulls_policy` to `string` and `dictionary` type (#7004)
Follow up for PR #6907 - `replace_null` policy function now supports `string` and `dictionary` dtype column. Since original implementation depends only on column validity and index, this extension trivially removes SFINAE on `replace_null` functor and removes `type_dispatcher`. Authors: - Michael Wang <isVoid@users.noreply.github.com> Approvers: - Mark Harris - Karthikeyan URL: #7004
Commit: da60cce
-
Restore usual instance/subclass checking to cudf.DateOffset(#7029)
`pd.DateOffset` uses a metaclass that overrides the usual instance/subclass checking behaviour. Any subclass of `pd._libs.tslibs.offsets.BaseOffset` will be reported as a subclass of `pd.DateOffset` (itself a `pd.DateOffset`). This can lead to some surprising behaviour: ```python In [3]: isinstance(pd.DateOffset(), cudf.DateOffset) Out[3]: True ``` Note that `cudf.DateOffset` inherits from `pd.DateOffset`. But, a `pd.DateOffset` is reported as an instance of `cudf.DateOffset` -- [Child Is Father of the Man](https://en.wikipedia.org/wiki/Child_Is_Father_of_the_Man)! Authors: - Ashwin Srinath <shwina@users.noreply.github.com> Approvers: - GALI PREM SAGAR URL: #7029
Commit: 1c8f2a8
-
Add compression="infer" as default for dask_cudf.read_csv(#7013)
Closes #6850 dask_cudf version of the `dask.dataframe` changes proposed in [dask#6960](dask/dask#6960). Uses `fsspec` to infer the default `compression` argument from the suffix of the first file-path argument. Authors: - rjzamora <rzamora217@gmail.com> Approvers: - Keith Kraus URL: #7013
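A sketch of the inferred default (the glob path is a placeholder):
```python
import dask_cudf

# With compression="infer" as the default, the ".gz" suffix selects gzip
# decompression, mirroring dask.dataframe.read_csv.
ddf = dask_cudf.read_csv("data-*.csv.gz")
```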
Commit: 8c8c421
-
Refactor rolling.cu to reduce compile time(#6512)
Closes #6472. `rolling.cu` is taking inordinately long to compile, slowing down the `libcudf` build. The following changes were made to mitigate this: 1. Moved `grouped_rolling_window()` and `grouped_time_based_rolling_window()` to `grouped_rolling.cu`. Common functions were moved to `rolling_detail.cuh`. 2. Normalized timestamp columns to use int64_t representations. This reduces the number of template instantiations for `time_based_grouped_rolling_window()`. 3. `grouped_*_rolling_window()` functions used to pass around fancy iterators, causing massive template instantiations. This has been changed to materialize the window offsets as separate columns, and use those with existing `rolling_window()` functions to produce the final result. These changes have been tested by running a window function test from SparkSQL, over a 2.4GB ORC file with 155M records (1.5M groups of about 97 records each on average): 1. There has been no discernible change in the end-to-end runtime. (The `nsys` profile seems to indicate that the total time spent in the `gpu_rolling` kernel has reduced. This is still being examined, to confirm.) 2. Compiling `rolling.cu` and `grouped_rolling.cu` in parallel now takes 60s as opposed to about 300s before. 3. The object file size seems to have reduced by a factor of 3. Authors: - Mithun RK <mythrocks@gmail.com> Approvers: - Vukasin Milovanovic - Karthikeyan URL: #6512
Commit: ce21296
-
Decimal casts in JNI became a NOOP(#7032)
We compared the wrong thing on a cast optimization. This fixes that. Authors: - Robert (Bobby) Evans <bobby@apache.org> Approvers: - Jason Lowe - Alessandro Bellina URL: #7032
Commit: c16a0a5
-
Add `method` field to `fillna` for fixed width columns (#6998)
Closes #1361 - Provides "`ffill`" and "`bfill`" `fillna` methods for `Numerical`, `Datetime`, `Timedelta` and `Categorical` type column. - Supports `method` parameter for `Series.fillna` and `DataFrame.fillna` Authors: - Michael Wang <isVoid@users.noreply.github.com> Approvers: - Ashwin Srinath - GALI PREM SAGAR URL: #6998
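A minimal sketch of the new `method=` argument:
```python
import cudf

s = cudf.Series([1, None, None, 4])

print(s.fillna(method="ffill"))  # forward fill: 1, 1, 1, 4
print(s.fillna(method="bfill"))  # backward fill: 1, 4, 4, 4
```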
Commit: 4385f54
-
Correct ORC docstring; other minor cuIO improvements(#7012)
Fixes #6923 Included other minor cuIO improvements that are too small for individual PRs: - Remove unnecessary NaN-related conditions in JSON, CSV. - Expand a comment in `createSerializedTrie` to make initialization clearer. Authors: - vuule <vmilovanovic@nvidia.com> - Vukasin Milovanovic <vukasin.milovanovic.87@gmail.com> Approvers: - GALI PREM SAGAR - Karthikeyan - Christopher Harris URL: #7012
Commit: ff56585
-
Fix libcudf strings logic where size_type is used to access INT32 column data (#7020)
I tried experimenting with changing the `cudf::size_type` to `int64_t` and found many, many places that assume `size_type` and `int32_t` (and `int`) are interchangeable. This PR attempts to fix some of the places where the offsets column is created as INT32 but the column data is incorrectly referenced as `data<size_type>()` for example. Also, this PR fixes some places that accept/return only int32_t (regex internal functions) or size_type (factories) which should be casted or accounted for. This is not a full set of possible violations found but may help minimize future errors. No function has changed/added. Authors: - davidwendt <dwendt@nvidia.com> Approvers: - Conor Hoekstra - Devavret Makkar URL: #7020
Commit: 05653ef
-
Correct the sampling range when sampling with replacement(#6884)
This corrects an issue with the sampling range used when replacement=True. Before, it sampled the range 0 through `num_rows` meaning it could sample `num_rows` even though it's one position out of bounds. This caused sample to return values not present in the original DataFrame. I also created exceptions for sampling on empty DataFrames that match pandas, as well as an exception for sampling when `axis=1` and `replace=True` as cudf does not support DataFrames with duplicate columns. This closes #6532 Authors: - Chris Jarrett <cjarrett@dt08.aselab.nvidia.com> - Mark Harris <mharris@nvidia.com> - ChrisJar <chris.jarrett.0@gmail.com> Approvers: - Keith Kraus - Mark Harris URL: #6884
Commit: 3be4428
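A small check that illustrates the corrected sampling range (values are illustrative):
```python
import cudf

df = cudf.DataFrame({"a": [0, 1, 2, 3, 4]})

sampled = df.sample(n=20, replace=True)
# Every sampled value should come from the original rows 0..4; previously an
# out-of-range row index could yield values not present in the frame.
assert sampled["a"].isin([0, 1, 2, 3, 4]).all()
```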
Commits on Dec 18, 2020
-
Add `ffill` and `bfill` to string columns (#7036)
Follow up of PR #7004. Adds `method` field to `fillna` method in string type column to support `ffill` and `bfill`. Also involves a small change to a `datetime64` `ffill`, `bfill` test case to improve test robustness. Authors: - Michael Wang <isVoid@users.noreply.github.com> Approvers: - GALI PREM SAGAR URL: #7036
Commit: ae90dd9
-
Fix `read_orc` for decimal type (#7034)
The `run_pos` which was being used came from the data stream rather than from the secondary stream (which was for scale), but the resulting value was being used for the secondary stream `scale`. The code change fixes that issue and also adds a test case to cover the issue. closes #7016 Authors: - Ramakrishna Prabhu <ramakrishnap@nvidia.com> Approvers: - Vukasin Milovanovic - GALI PREM SAGAR - Devavret Makkar URL: #7034
Commit: c24171b
-
Add Ufunc alias look up for appropriate numpy ufunc dispatching(#6973)
This PR closes #6921 by dispatching to the appropriate cudf alias for numpy functions from the UFUNC_ALIASES dictionary:
```python
_UFUNC_ALIASES = {
    "power": "pow",
    "equal": "eq",
    "not_equal": "ne",
    "less": "lt",
    "less_equal": "le",
    "greater": "gt",
    "greater_equal": "ge",
    "absolute": "abs",
}
```
Authors: - Vibhu Jawa <vibhujawa@gmail.com> Approvers: - Keith Kraus - null URL: #6973
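A hedged sketch of the dispatch this enables:
```python
import numpy as np
import cudf

s = cudf.Series([1, 2, 3])

# np.greater should be looked up as the cudf "gt" operation via the alias
# table above, returning a cudf boolean Series.
print(np.greater(s, 2))
```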
Commit: 442985a
-
Fix backward compatibility of loading a 0.16 pkl file(#7033)
Fixes: #7025 This PR: 1. Handles loading of pickle files which have been created with RangeIndex prior to introduction of `step` parameter support. 2. Introduces special-case handling of string column size where we were previously storing it as a pickled object. Authors: - galipremsagar <sagarprem75@gmail.com> Approvers: - Ram (Ramakrishna Prabhu) URL: #7033
Commit: 9317361
Commits on Dec 19, 2020
-
Share `factorize` implementation with Index and cudf module (#6885)
Share the implementation of `cudf.Series.factorize` with the `Index` class and the `cudf` module namespace. Closes #6871 Authors: - brandon-b-miller <brmiller@nvidia.com> - Keith Kraus <kkraus@nvidia.com> - brandon-b-miller <53796099+brandon-b-miller@users.noreply.github.com> Approvers: - Ashwin Srinath - Keith Kraus URL: #6885
Commit: 923cf49
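A sketch of the now-shared entry points:
```python
import cudf

s = cudf.Series(["a", "b", "a", "c"])

codes, uniques = s.factorize()        # Series method
codes2, uniques2 = cudf.factorize(s)  # module-level function, same implementation
```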
Commits on Dec 21, 2020
-
Improve representation of `MultiIndex` (#6992)
Fixes: #6936 This PR introduces changes to `MultiIndex.__repr__`, where the output is now more readable and easy to understand, similar to that of pandas MultiIndex. Changes also include handling of `<NA>`, `nan` values and spacing issues around them. Authors: - galipremsagar <sagarprem75@gmail.com> Approvers: - null - Keith Kraus URL: #6992
Commit: 7556e23
Commits on Dec 23, 2020
-
Make Doxygen comments formatting consistent(#7041)
Making this PR since wrong formatting keeps getting propagated in new PRs and (sometimes) corrected in code review. Changes: - Ironed out the formatting of Doxygen comments to match the guidelines. - Removed the outdated file with formatting examples. Authors: - vuule <vmilovanovic@nvidia.com> - Vukasin Milovanovic <vukasin.milovanovic.87@gmail.com> - vukasin <vmilovanovic@nvidia.com> Approvers: - David - Karthikeyan URL: #7041
Commit: 2780a8c
Commits on Dec 29, 2020
-
Update cudf python docstrings with new null representation (`<NA>`) (#…)
Commit: 4a1e465
-
Reduce number of hostdevice_vector allocations in parquet reader(#7005)
Improves performance of parquet reader on certain multi-GPU systems, which take a long time to allocate pinned memory, by reducing the number of `hostdevice_vector` allocations. Closes #7049 Authors: - Devavret Makkar <dmakkar@nvidia.com> Approvers: - null - Ram (Ramakrishna Prabhu) - Karthikeyan URL: #7005
Commit: 277bd9f
Commits on Dec 31, 2020
-
Implement `cudf::rolling` for `decimal32` and `decimal64` (#7037)
This PR resolves a part of #3556. Aggregation ops supported: * `MIN` * `MAX` * `COUNT` (both `null_policy` - `EX/INCLUDE`) * `LEAD` * `LAG` **To Do List:** * [x] Basic unit tests * [x] Comprehensive unit tests * [x] Implementation * [x] Figure out which rolling ops to support Authors: - Conor Hoekstra <codereport@outlook.com> Approvers: - Vukasin Milovanovic - Ram (Ramakrishna Prabhu) URL: #7037
Commit: 28d18d6
Commits on Jan 4, 2021
-
Create sort gbenchmark for strings column(#7040)
Reference #7027 and #5698 This adds a strings column to the current gbenchmark for sort. This will help measure improvements or changes over time to the column and strings comparator functions. No code logic changed or added. Authors: - davidwendt <dwendt@nvidia.com> Approvers: - Vukasin Milovanovic - Devavret Makkar - Keith Kraus URL: #7040
Commit: af41136
-
Fix to_csv delimiter handling of timestamp format(#7023)
Closes #6699 The timestamp format(s) used by the CSV writer have the form `%Y-%m-%dT%H:%M:%SZ`. This means if the column delimiter `','` or the line delimiter `\n` is either `':'` or `'-'` then the timestamp string output could conflict with these delimiters. The current logic simply removed these delimiters from the format if they detected a conflicting column or line delimiter. For example, specifying a dash `'-'` as column delimiter caused the timestamp format to change to `%Y%m%d...` (the dash is removed). I admit this was kind of hacky and also made the output inconsistent with Pandas `to_csv()`. It is easy enough to simply add double-quotes around the timestamp format to prevent these conflicts as well as make the output consistent. This PR fixes that logic. Exception logic to check for a dash as column separator was also found in [csv.py](https://github.com/rapidsai/cudf/blob/8c1f01e1fd713d873cf3d943ab409f3e9efc48f8/python/cudf/cudf/io/csv.py#L139-L149), specifically citing issue 6699 in the exception message. Also, there was a pytest specifically created to check for this exception. The exception is removed and the pytest function updated in this PR as well. Authors: - davidwendt <dwendt@nvidia.com> Approvers: - GALI PREM SAGAR - Karthikeyan - null URL: #7023
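An illustrative sketch of the quoting behaviour described above; returning the CSV as a string is an assumption for brevity:
```python
import cudf

df = cudf.DataFrame({"t": cudf.to_datetime(["2020-01-01 10:30:00"]), "x": [1]})

# A '-' (or ':') column separator no longer strips characters from the
# timestamp format; the timestamp field is quoted instead, as in Pandas.
print(df.to_csv(sep="-", index=False))
```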
Commit: ca1a4d6
-
`cudf::scan` support for `decimal32` and `decimal64` (#7063)
Adding support for `cudf::scan` for `decimal32` and `decimal64`. `cudf::scan` only supports 4 operations (sum, product, min and max) but the decimal types will only support `SUM`, `MAX` and `MIN`. This PR resolves a part of #3556. Authors: - Conor Hoekstra <codereport@outlook.com> Approvers: - Jake Hemstad - Mark Harris URL: #7063
Commit: fc92bb9
-
Spark Murmur3 hash functionality(#7024)
Resolves #6863 Expands existing murmur3 hashing functionality to match Spark's murmur3 hashing algorithm by modifying tail processing for unaligned bytes and processing booleans as 32bit integers rather than singular bytes. Authors: - Ryan Lee <ryanlee@nvidia.com> - rwlee <rwlee@users.noreply.github.com> Approvers: - Jake Hemstad - null - Robert (Bobby) Evans - GALI PREM SAGAR URL: #7024
Commit: 8860baf
-
Upgrade nvcomp to 1.2.1(#7069)
This version is more friendly to ccache:
```console
ccache -C # clear the cache
time mvn clean package -DskipTests
real 4m43.015s
user 11m18.426s
sys 0m21.891s
time mvn clean package -DskipTests # everything is now cached
real 0m20.265s
user 0m45.810s
sys 0m3.670s
```
Not sure about the ABI flag, but leaving it in causes the .so to not load:
```console
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java: symbol lookup error: /tmp/nvcomp5478764208255606671.so: undefined symbol: _ZN6nvcomp5Check8not_nullEPKvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESA_i
```
@jlowe Authors: - Rong Ou <rong.ou@gmail.com> Approvers: - Jason Lowe URL: #7069
Commit: d641688
Commits on Jan 5, 2021
-
Adding decimal writing support to parquet(#7017)
Since cudf doesn't support precision, the precision must be passed in as a write option. This is handled as a vector of uint8's that indicates the precision of each flattened column in order to support nested types. Partially closes #6474 Authors: - Mike Wilson <knobby@burntsheep.com> - Mike Wilson <hyperbolic2346@users.noreply.github.com> Approvers: - Vukasin Milovanovic - Mark Harris URL: #7017
Commit: 31c0d29
-
Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028)
Closes #6774. This PR adds a check for a valid day value for a year/month (if these are specified in the format) in the `cudf::is_timestamp()` API. Also, a chunk of messy year/month/day logic in a related functor was replaced with the libcu++ implementation of the `year_month_day()` function instead. A gtest is also updated to include a test for an invalid day. Authors: - davidwendt <dwendt@nvidia.com> Approvers: - Vukasin Milovanovic - Devavret Makkar - Karthikeyan - Jake Hemstad URL: #7028
Commit: 6ebd264
-
Only upload packages that were built(#7077)
Should only upload the packages that were actually built. Project Flash sets `BUILD_CUDF` and `BUILD_LIBCUDF` as needed to control this. Skipping CI as this change only affects uploads which isn't tested by CI. Authors: - Raymond Douglass <ray@raydouglass.com> Approvers: - Dillon Cullinan URL: #7077
Commit: 873ab4a
-
`cudf::rolling` `ROW_NUMBER` support for `decimal32` and `decimal64` (#…)
Commit: 91322ba
-
Refactor ORC `ProtobufReader` to make it more extendable (#7055)
Related to #5826. Refactor the `ProtobufReader` API to facilitate expansion to support robust reading of column statistics. Changes include:
- Move `orc::metadata` from `reader_impl.cu` to `orc.h` so it can be reused for statistics related APIs.
- Removed duplicated code in `read_orc_statistics` - use `orc::metadata` instead.
- Rename `ColumnStatistics` to `ColStatsBlob`, since that's what it currently is.
- Avoid redundant copies in `read_orc_statistics`.
- Replace `get_u32`, `get_i32`, etc. with templated `get`.
- Replace per-type functors (e.g. `FieldUInt64`) with templated `field_reader`s to reduce code repetition. The two type-specific parts of `FieldXYZ` functors (field enum and read impl) are now separate to avoid redundant code. `field_reader` dispatches based on the value type, so also added `packed_field_reader` and `raw_field_reader` for packed fields and blob reads (respectively).
- Replace return value based error checking in `ProtobufReader` with `CUDF_EXPECTS`.
- Removed `InitSchema` from `ProtobufReader` - schema is only used to determine column names. The names are now lazily calculated in `metadata::get_column_name`.
Authors: - vuule <vmilovanovic@nvidia.com> - Vukasin Milovanovic <vukasin.milovanovic.87@gmail.com> Approvers: - Kumar Aatish - Conor Hoekstra URL: #7055
Commit: 7bf0505
-
Add dictionary support to libcudf groupby functions(#6585)
Reference #5963 Add dictionary support to groupby. - [x] argmax - [x] argmin - [x] collect - [x] count - [x] max - [x] mean* - [x] median - [x] min - [x] nth element - [x] nunique - [x] quantile - [x] std* - [x] sum* - [x] var* * _not supported due to 10.2 compile segfault_ Authors: - davidwendt <dwendt@nvidia.com> Approvers: - Jake Hemstad - Karthikeyan URL: #6585
Commit: 6828e2c
Commits on Jan 6, 2021
-
Commit: c0920e6
-
Add `unstack()` support for non-multiindexed dataframes (#7054)
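A sketch of the extended behaviour, assuming it mirrors what Pandas does for a regular (non-MultiIndex) index:
```python
import cudf

df = cudf.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# unstack() on a frame without a MultiIndex should now return a Series whose
# index combines the original columns and index, as in Pandas.
print(df.unstack())
```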
Commit: 1930432
-
Commit: 8787a64
Commits on Jan 7, 2021
-
JNI support for creating struct column from existing columns and fixed bug in struct with no children (#7084)
The primary goal of this is to add in java APIs to create a struct column from other existing columns. As a part of this work I found a very small bug in the column_vector constructor that copies data from a column view for a struct column with no children in it. Spark supports this use case so I thought it would be good to test/fix the issue. Authors: - Robert (Bobby) Evans <bobby@apache.org> Approvers: - Kuhu Shukla (@kuhushukla) - Vukasin Milovanovic (@vuule) - Jason Lowe (@jlowe) URL: #7084
Commit: f768da7
-
Handle `nan` values correctly in `Series.one_hot_encoding` (#7059)
Fixes: #7056 This PR handles `nan` values separately in `one_hot_encoding` when the given input category is `None`. Previously we were combining both `nan` & `<NA>` values to be the same when cat is `None`. Authors: - galipremsagar <sagarprem75@gmail.com> Approvers: - Keith Kraus (@kkraus14) URL: #7059
Commit: 9439ed8
-
Add `pyorc` to dev environment (#7085)
This PR adds the `pyorc` package to dev environment yml files. Authors: - galipremsagar <sagarprem75@gmail.com> Approvers: - Christopher Harris (@cwharris) - AJ Schmidt (@ajschmidt8) URL: #7085
Commit: ee65a47
-
Commit: aa38f85
Commits on Jan 8, 2021
-
Commit: 30e154c
-
Commit: 04aa30c
Commits on Jan 11, 2021
-
Adds in JNI support for creating a list column from existing columns (#…)
Commit: 11ebc3e
Commits on Jan 12, 2021
-
Add segmented_gather(list_column, gather_list)(#7003)
closes #6542 - [x] Add segmented_gather(list, list) - [x] Add unit tests - [x] Documentation Authors: - Karthikeyan Natarajan <karthikeyann@users.noreply.github.com> - Karthikeyan <6488848+karthikeyann@users.noreply.github.com> Approvers: - Vukasin Milovanovic (@vuule) - @nvdbaranec - AJ Schmidt (@ajschmidt8) - Jake Hemstad (@jrhemstad) URL: #7003
Commit: 87e414c
-
verify window operations on decimal with java tests(#7120)
This pull request is to verify window operations on decimal columns in java package, which is required by spark-rapids on [issue 1333](NVIDIA/spark-rapids#1333). Authors: - sperlingxx <lovedreamf@gmail.com> Approvers: - Robert (Bobby) Evans (@revans2) URL: #7120
Commit: 9a66576
-
Add `scale` and `value` methods to `fixed_point` (#7109)
This PR adds `fixed_point::scale()` and `fixed_point::value()`. It enables developers to avoid the following piece of code (which is how you currently have to access the scale and value). ```cpp auto si = numeric::scaled_integer<rep_type>{value}; // use si.value or si.scale ``` Note that this PR should be merged after #7105 (or I can resolve the conflict if it gets merged first) Authors: - Conor Hoekstra <codereport@outlook.com> - Conor Hoekstra <36027403+codereport@users.noreply.github.com> Approvers: - MithunR (@mythrocks) - David (@davidwendt) URL: #7109
Configuration menu - View commit details
-
Copy full SHA for 4da8312 - Browse repository at this point
Copy the full SHA 4da8312View commit details -
Handle nested string columns with no children in contiguous_split (#6864)
Fixes a specific corner case: string columns with no children (a special form of empty string column that can happen) that are nested inside a list (or struct) column. This would have been useful as a 0.17 PR but isn't strictly necessary, since it's pretty late. Edit: Updated the fix so that it always includes a record for src/dst buffers, even if they are of size 0 or have null data pointers. The previous method that only checked whether the data pointer was null was unclean and didn't handle a particularly strange case that came up with the Spark plugin: the plugin was reconstructing columns (on the receiver side of a shuffle) that had size 0 but a non-null data pointer. This is technically legal but super weird. Authors: - Dave Baranec <dbaranec@nvidia.com> - Karthikeyan <6488848+karthikeyann@users.noreply.github.com> Approvers: - Karthikeyan (@karthikeyann) - Alfred Xu (@sperlingxx) - Devavret Makkar (@devavret) URL: #6864
Configuration menu - View commit details
-
Copy full SHA for d791e20 - Browse repository at this point
Copy the full SHA d791e20View commit details -
Add `cudf::binary_operation` `NULL_MIN`, `NULL_MAX` & `NULL_EQUALS` for `decimal32` and `decimal64` (#7119)
This PR resolves #7115. Adds `cudf::binary_operation` support for `NULL_MAX`, `NULL_MIN` and `NULL_EQUALS` for `decimal32` and `decimal64`. Authors: - Conor Hoekstra <codereport@outlook.com> Approvers: - Mark Harris (@harrism) - David (@davidwendt) - Mike Wilson (@hyperbolic2346) URL: #7119
Configuration menu - View commit details
-
Copy full SHA for 9790ff7 - Browse repository at this point
Copy the full SHA 9790ff7View commit details -
Build libcudf with -Wall (#7105)
I discovered we're not building libcudf with the `-Wall` GCC flag. This PR enables `-Wall` for GCC and nvcc, and fixes most of the errors. ~~The only error I haven't fixed yet is `-Werror=uninitialized` on [this line](https://github.com/rapidsai/cudf/blob/branch-0.18/cpp/include/cudf/scalar/scalar.hpp#L334), but @codereport is on it.~~ Fixed ✔️ Authors: - ptaylor <paul.e.taylor@me.com> - Conor Hoekstra <codereport@outlook.com> - Paul Taylor <paul.e.taylor@me.com> Approvers: - Conor Hoekstra (@codereport) - Keith Kraus (@kkraus14) - Mark Harris (@harrism) URL: #7105
Configuration menu - View commit details
-
Copy full SHA for 68d4791 - Browse repository at this point
Copy the full SHA 68d4791View commit details
Commits on Jan 13, 2021
-
Configuration menu - View commit details
-
Copy full SHA for 0c7b36e - Browse repository at this point
Copy the full SHA 0c7b36eView commit details -
Configuration menu - View commit details
-
Copy full SHA for e647d1a - Browse repository at this point
Copy the full SHA e647d1aView commit details -
Fix compilation errors in libcudf (#7138)
After the recent changes to libcudf compilation in #7105, compiling libcudf on my local machine was broken; these changes fix the compilation errors. Authors: - galipremsagar <sagarprem75@gmail.com> Approvers: - Keith Kraus (@kkraus14) - Devavret Makkar (@devavret) - David (@davidwendt) URL: #7138
Configuration menu - View commit details
-
Copy full SHA for e0e2cf8 - Browse repository at this point
Copy the full SHA e0e2cf8View commit details
Commits on Jan 15, 2021
-
Fast path single strings column in cudf::sort (#7075)
Closes #7027 The internal `cudf::strings::detail::sort()` function is faster at sorting a single strings column than `cudf::sort`. Details are in the #7027 comments. Results using the sort gbenchmark: ``` Baseline: SortStrings/stringssort/1024/manual_time 1.18 ms 1.20 ms 593 SortStrings/stringssort/4096/manual_time 1.98 ms 2.00 ms 352 SortStrings/stringssort/32768/manual_time 2.73 ms 2.75 ms 256 SortStrings/stringssort/262144/manual_time 4.36 ms 4.38 ms 160 SortStrings/stringssort/2097152/manual_time 66.2 ms 66.2 ms 10 SortStrings/stringssort/16777216/manual_time 547 ms 548 ms 1 Calling cudf::strings::detail::sort from cudf::sort: SortStrings/stringssort/1024/manual_time 0.692 ms 0.711 ms 1002 SortStrings/stringssort/4096/manual_time 1.13 ms 1.15 ms 615 SortStrings/stringssort/32768/manual_time 1.59 ms 1.61 ms 440 SortStrings/stringssort/262144/manual_time 2.82 ms 2.84 ms 247 SortStrings/stringssort/2097152/manual_time 43.1 ms 43.1 ms 16 SortStrings/stringssort/16777216/manual_time 386 ms 386 ms 2 ``` Authors: - davidwendt <dwendt@nvidia.com> Approvers: - AJ Schmidt (@ajschmidt8) - Conor Hoekstra (@codereport) - Jake Hemstad (@jrhemstad) - Christopher Harris (@cwharris) URL: #7075
Configuration menu - View commit details
-
Copy full SHA for c2e9ffd - Browse repository at this point
Copy the full SHA c2e9ffdView commit details -
Fix JIT cache multi-process test flakiness on slow drives (#7142)
Fixes #6716 Authors: - Devavret Makkar <dmakkar@nvidia.com> Approvers: - @nvdbaranec - David (@davidwendt) URL: #7142
Configuration menu - View commit details
-
Copy full SHA for 5828cef - Browse repository at this point
Copy the full SHA 5828cefView commit details -
Add gbenchmarks for reduction aggregations any() and all() (#7129)
While investigating the long compile times of the reduction source files `any.cu` and `all.cu`, I found it necessary to build a gbenchmark to ensure changes did not affect the performance of these functions. Authors: - davidwendt <dwendt@nvidia.com> Approvers: - Conor Hoekstra (@codereport) - Vukasin Milovanovic (@vuule) - Paul Taylor (@trxcllnt) - Keith Kraus (@kkraus14) URL: #7129
Configuration menu - View commit details
-
Copy full SHA for bce9552 - Browse repository at this point
Copy the full SHA bce9552View commit details -
Add documentation for supported dtypes in all IO formats (#7139)
Fixes: #7103 This PR introduces: - [x] a new doc page which contains dtypes & IO formats matrix supported by cudf currently. This matrix currently lists whether a dtype is supported by a reader / writer. How the table looks can be seen in the below screenshot. - [x] As part of this PR I have also introduced informative error messages in some IO reader/writers. - [x] Raising an error in ORC writer if there is any categorical data. ![Screenshot from 2021-01-15 09-40-57](https://user-images.githubusercontent.com/11664259/104747156-cb335200-5715-11eb-92b3-85a246fbdc8a.png) Authors: - galipremsagar <sagarprem75@gmail.com> - GALI PREM SAGAR <sagarprem75@gmail.com> Approvers: - Vukasin Milovanovic (@vuule) - Ashwin Srinath (@shwina) - AJ Schmidt (@ajschmidt8) - Keith Kraus (@kkraus14) URL: #7139
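As a rough illustration of the new error behavior mentioned above, the sketch below writes a categorical column to ORC and expects the writer to refuse. This is not from the commit itself; the file name is hypothetical and the exact exception type/message is an assumption.

```python
import cudf

df = cudf.DataFrame({"c": cudf.Series(["a", "b", "a"], dtype="category")})
try:
    df.to_orc("example.orc")  # the ORC writer should now raise for categorical data
except Exception as err:      # exact exception type may differ
    print(type(err).__name__, err)
```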
Configuration menu - View commit details
-
Copy full SHA for e86cc65 - Browse repository at this point
Copy the full SHA e86cc65View commit details
Commits on Jan 18, 2021
-
`cudf::rolling_window` `SUM` support for `decimal32` and `decimal64` (#7147)
This PR resolves #7117 by adding support for `cudf::rolling` for the `SUM` option for `decimal32` and `decimal64`. Authors: - Conor Hoekstra <codereport@outlook.com> Approvers: - David (@davidwendt) - Karthikeyan (@karthikeyann) URL: #7147
Configuration menu - View commit details
-
Copy full SHA for 835ccf9 - Browse repository at this point
Copy the full SHA 835ccf9View commit details
Commits on Jan 19, 2021
-
Enable logic for GPU auto-detection in cudfjni(#7155)
Allow overriding `GPU_ARCHS` with an empty string in cudfjni to enable automatic detection ```bash mvn clean install -DARROW_STATIC_LIB=ON -DBoost_USE_STATIC_LIBS=ON -DGPU_ARCHS= ... [exec] -- CUDA_VERSION: 11.0 [exec] Auto detection of gpu-archs: 75 [exec] GPU_ARCHS = 75 ``` Allow `--h[elp]` switch to `$CUDF_HOME/build.sh` Authors: - Gera Shegalov <gshegalov@nvidia.com> - Gera Shegalov <gera@apache.org> Approvers: - Jason Lowe (@jlowe) - Keith Kraus (@kkraus14) URL: #7155
Configuration menu - View commit details
-
Copy full SHA for e8ecb24 - Browse repository at this point
Copy the full SHA e8ecb24View commit details -
Fix comparisons between Series and cudf.NA (#7072)
Fixes #7043, gives less than ideal results due to #7066. Authors: - brandon-b-miller <brmiller@nvidia.com> Approvers: - GALI PREM SAGAR (@galipremsagar) URL: #7072
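A short sketch of the fixed behavior (not from the commit itself): comparing a Series against `cudf.NA` should no longer raise, although, per the note above, the results are still less than ideal until #7066 is resolved, so the output is deliberately not shown.

```python
import cudf

s = cudf.Series([1, 2, None])
# After #7072 these comparisons run instead of raising; exact results depend on #7066.
print(s == cudf.NA)
print(s < cudf.NA)
```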
Configuration menu - View commit details
-
Copy full SHA for 8d80d5c - Browse repository at this point
Copy the full SHA 8d80d5cView commit details -
Fixing parquet precision writing failing if scale is equal to precision (#7146)
@razajafri noticed that precision could not be equal to scale when writing decimals. This should be allowed; this PR fixes that and adds a test to verify it. closes #7145 Authors: - Mike Wilson <knobby@burntsheep.com> Approvers: - Raza Jafri (@razajafri) - Vukasin Milovanovic (@vuule) URL: #7146
Configuration menu - View commit details
-
Copy full SHA for b0525f4 - Browse repository at this point
Copy the full SHA b0525f4View commit details
Commits on Jan 20, 2021
-
Fix -Werror=sign-compare errors in device code(#7164)
Not sure why these aren't being caught in local 10.2 envs or CI builds, but I can't build a local CUDA 11.0 env due to a mamba bug. Authors: - ptaylor <paul.e.taylor@me.com> Approvers: - Mark Harris (@harrism) - David (@davidwendt) URL: #7164
Configuration menu - View commit details
-
Copy full SHA for 5828be5 - Browse repository at this point
Copy the full SHA 5828be5View commit details -
Update Doxyfile project number (#7161)
The Doxyfile project number is set to 0.16. I know I've seen it in the UI before but cannot find it now. I've updated the number just in case. And I've added a line to update-version.sh (thanks @ajschmidt8) to automatically update the file when a new release is created. Neither of these 2 files affects the CI/CD build. Authors: - davidwendt <dwendt@nvidia.com> Approvers: - Karthikeyan (@karthikeyann) - AJ Schmidt (@ajschmidt8) - Mark Harris (@harrism) URL: #7161
Configuration menu - View commit details
-
Copy full SHA for 7df4a4c - Browse repository at this point
Copy the full SHA 7df4a4cView commit details -
Add libcudf API for parsing of ORC statistics(#7136)
Implementation of the feature includes: - Renamed libcudf `read_orc_statistics` to `read_raw_orc_statistics` to make a distinction from the new function. - Changed the `read_raw_orc_statistics` return type to `raw_orc_statistics` instead of the vector with heterogeneous data. - Added `read_parsed_orc_statistics` that also parses the statistics blobs to make the API usable without the Python layer. - Fixed a few compiler warnings (i.e. errors). - Added read functions for statistics to ProtobufReader. - Added support for optional fields to ProtobufReader (such fields are `std::unique_ptr` for now). Other changes: - Renamed the existing ORC statistics API to `read_raw_orc_statistics`. - Replaced some explicit H2D and D2H copies with appropriate abstractions. - Enabled several ORC tests for bool columns that were missed when the support for such columns was added. - Remove unused `zigzag(uint64_t)`. Authors: - vuule <vmilovanovic@nvidia.com> - Vukasin Milovanovic <vmilovanovic@nvidia.com> Approvers: - @brandon-b-miller - GALI PREM SAGAR (@galipremsagar) - Conor Hoekstra (@codereport) - Mark Harris (@harrism) URL: #7136
Configuration menu - View commit details
-
Copy full SHA for 3e0af46 - Browse repository at this point
Copy the full SHA 3e0af46View commit details -
Update s3 tests to use moto_server (#7144)
This PR updates the s3 tests to use `moto_server` instead of going via a moto mock_s3 context. This enables cleaner s3 testing with `s3fs>=0.5`, which incorporates aiobotocore for s3 connections. - The pytests start up a moto server for each worker running tests. - Ports used: `5000, 5550 - 5550+ (n_pytest_workers-1)` Updated integration repo with requirements: rapidsai/integration#207 Authors: - Ayush Dattagupta <ayushdg95@gmail.com> Approvers: - GALI PREM SAGAR (@galipremsagar) - Keith Kraus (@kkraus14) URL: #7144
Configuration menu - View commit details
-
Copy full SHA for 5855bfa - Browse repository at this point
Copy the full SHA 5855bfaView commit details -
Fix importing list & struct types in `from_arrow` (#7162)
Fixes: #7137, #7148 This PR fixes converting a pyarrow table which has list and struct types via `from_arrow`. In the case of the `list` dtype we shouldn't have to perform any typecast, and in the case of the `struct` dtype we should be renaming the fields appropriately. Authors: - galipremsagar <sagarprem75@gmail.com> Approvers: - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) - Keith Kraus (@kkraus14) URL: #7162
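An illustrative round-trip, assuming a pyarrow table with list and struct columns like the ones the fix targets (the column names and values here are made up, not taken from the PR):

```python
import pyarrow as pa
import cudf

tbl = pa.table({
    "l": pa.array([[1, 2], [3]], type=pa.list_(pa.int64())),
    "s": pa.array([{"a": 1}, {"a": 2}], type=pa.struct([("a", pa.int64())])),
})
# With the fix, list columns are imported without a spurious typecast and
# struct field names are preserved.
df = cudf.DataFrame.from_arrow(tbl)
print(df.dtypes)
print(df)
```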
Configuration menu - View commit details
-
Copy full SHA for 0515a42 - Browse repository at this point
Copy the full SHA 0515a42View commit details -
Replace offsets with iterators in cuIO utilities and CSV parser(#7150)
Closes #6210 - Rename datetime utils to snake_case; - Rename datetime utils so that `parse_xyz` functions move the input iterator past the parsed value, and `to_xyz` function do not change the input iterators. - Replace `findFirstOccurrence` with `thrust::find`. - Replace use of offsets with pointers in `datetime.cuh` and `csv_gpu.cu`; - Rename some variables in the CSV parser to make the code clearer. Note: the semantics of variables/parameters did not change in this PR - `T* end` points to the last element in the range in many places. Authors: - vuule <vmilovanovic@nvidia.com> - Vukasin Milovanovic <vmilovanovic@nvidia.com> Approvers: - Mark Harris (@harrism) - Christopher Harris (@cwharris) URL: #7150
Configuration menu - View commit details
-
Copy full SHA for 36f85dc - Browse repository at this point
Copy the full SHA 36f85dcView commit details -
Cross link RMM & libcudf Doxygen docs(#7149)
This PR updates `cpp/doxygen/Doxyfile` to consume the generated Doxygen tags from `rmm` (see rapidsai/rmm#672). This will enable linking between the `cudf` docs and `rmm` docs. This PR along with rapidsai/rmm#672 closes issue #5152. I also updated `docs/cudf/source/conf.py` and added it to `update-version.sh`. Authors: - AJ Schmidt <aschmidt@nvidia.com> - AJ Schmidt <ajschmidt8@users.noreply.github.com> Approvers: - Vukasin Milovanovic (@vuule) - @nvdbaranec - Dillon Cullinan (@dillon-cullinan) - Karthikeyan (@karthikeyann) URL: #7149
Configuration menu - View commit details
-
Copy full SHA for d79da2c - Browse repository at this point
Copy the full SHA d79da2cView commit details -
Add `MultiIndex.rename` API (#7172)
Closes #7057 Properly overrides `MultiIndex.rename` from `Index.rename`, reusing the API from `MultiIndex.set_names`. Authors: - Michael Wang (@isVoid) Approvers: - GALI PREM SAGAR (@galipremsagar) URL: #7172
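A small usage sketch (illustrative only; the column and level names are made up):

```python
import cudf

df = cudf.DataFrame({"x": [1, 2], "y": ["a", "b"], "v": [10, 20]}).set_index(["x", "y"])
idx = df.index  # a cudf MultiIndex
print(idx.names)                               # ['x', 'y']
print(idx.rename(["level0", "level1"]).names)  # ['level0', 'level1']
```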
Configuration menu - View commit details
-
Copy full SHA for 02e25b6 - Browse repository at this point
Copy the full SHA 02e25b6View commit details -
Implement `cudf::group_by` (sort) for `decimal32` and `decimal64` (#7169)
This PR resolves a part of #3556. I decided to push the changes for sort `cudf::group_by` and hash `group_by` in separate PRs. Authors: - Conor Hoekstra (@codereport) Approvers: - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) - Karthikeyan (@karthikeyann) URL: #7169
Configuration menu - View commit details
-
Copy full SHA for 27893db - Browse repository at this point
Copy the full SHA 27893dbView commit details -
Add encoding and compression argument to CSV writer (#7168)
This PR closes #7083 by adding an encoding argument to our CSV writer; it also adds a compression argument to the writer. This will help address some issues with featuretools compatibility [PR](alteryx/featuretools#1246). Authors: - Vibhu Jawa (@VibhuJawa) Approvers: - GALI PREM SAGAR (@galipremsagar) - Michael Wang (@isVoid) URL: #7168
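A hedged usage sketch: the file name is hypothetical, and the set of `encoding`/`compression` values accepted in this release may be narrower than in pandas, so treat the specific values as assumptions.

```python
import cudf

df = cudf.DataFrame({"a": [1, 2], "b": ["x", "y"]})
# The new keyword arguments mirror the pandas to_csv signature.
df.to_csv("example.csv", index=False, encoding="utf-8", compression=None)
```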
Configuration menu - View commit details
-
Copy full SHA for 95059b8 - Browse repository at this point
Copy the full SHA 95059b8View commit details -
Enable round in cudf for DataFrame and Series (#7022)
This enables round for DataFrames and Series using the libcudf round implementation and removes the old numba round implementation. Closes #1270 Authors: - @ChrisJar Approvers: - Ashwin Srinath (@shwina) - Michael Wang (@isVoid) - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) - GALI PREM SAGAR (@galipremsagar) URL: #7022
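For reference, a minimal sketch of the now-enabled API (the values are made up):

```python
import cudf

df = cudf.DataFrame({"a": [1.2345, 2.3456], "b": [0.5, 1.5]})
print(df.round(2))       # round every column to 2 decimal places
print(df["a"].round(1))  # Series.round works the same way
```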
Configuration menu - View commit details
-
Copy full SHA for a51caa5 - Browse repository at this point
Copy the full SHA a51caa5View commit details -
Configuration menu - View commit details
-
Copy full SHA for 81952d0 - Browse repository at this point
Copy the full SHA 81952d0View commit details
Commits on Jan 21, 2021
-
Replace ORC writer API with class (#7099)
Replaces the chunked ORC writer API with a class to ease usage; for additional information see #6911. This PR also adds support for ORC chunked writing in Python along with test cases. Authors: - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) - Jason Lowe (@jlowe) Approvers: - Vukasin Milovanovic (@vuule) - GALI PREM SAGAR (@galipremsagar) - Devavret Makkar (@devavret) - AJ Schmidt (@ajschmidt8) - Jason Lowe (@jlowe) - Robert (Bobby) Evans (@revans2) URL: #7099
Configuration menu - View commit details
-
Copy full SHA for 6390498 - Browse repository at this point
Copy the full SHA 6390498View commit details -
Java bindings for Fixed-point type support for Parquet (#7153)
Adds Java support for writing the fixed-point type to Parquet. Authors: - Raza Jafri (@razajafri) Approvers: - Karthikeyan (@karthikeyann) - Jason Lowe (@jlowe) URL: #7153
Configuration menu - View commit details
-
Copy full SHA for 4111cb7 - Browse repository at this point
Copy the full SHA 4111cb7View commit details -
Add support for array-like inputs in `cudf.get_dummies` (#7181)
Fixes: #7031 This PR introduces support for array-like inputs in `cudf.get_dummies`. I think in the near future we will have to deprecate and adopt a new name for `get_dummies`: pandas-dev/pandas#35724 Authors: - GALI PREM SAGAR (@galipremsagar) Approvers: - Keith Kraus (@kkraus14) URL: #7181
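A short sketch of the newly accepted input type (illustrative, not from the commit):

```python
import cudf

# get_dummies previously required a cudf object; array-like input now works too.
print(cudf.get_dummies([1, 2, 1, 3]))
print(cudf.get_dummies(cudf.Series(["a", "b", "a"])))
```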
Configuration menu - View commit details
-
Copy full SHA for 4c6a57c - Browse repository at this point
Copy the full SHA 4c6a57cView commit details -
Implement update() function (#6883)
Resolves: #5543 This PR adds support for updating a DataFrame with non-NA values from another DataFrame, whereby only the values at matching index/column labels are updated. Only a left join is supported, keeping the index and columns of the original DataFrame. Authors: - @skirui-source Approvers: - GALI PREM SAGAR (@galipremsagar) - Michael Wang (@isVoid) URL: #6883
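A minimal sketch of the added API (data values are made up):

```python
import cudf

df = cudf.DataFrame({"a": [1, 2, 3], "b": [400.0, 500.0, 600.0]})
other = cudf.DataFrame({"b": [4.0, None, 6.0]})
df.update(other)  # in place; only non-NA values from `other` overwrite df
print(df)         # column b becomes [4.0, 500.0, 6.0]
```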
Configuration menu - View commit details
-
Copy full SHA for 6c116e3 - Browse repository at this point
Copy the full SHA 6c116e3View commit details
Commits on Jan 22, 2021
-
Add Python DecimalColumn (#6715)
Resolves #6657. Authors: - Ashwin Srinath (@shwina) - Conor Hoekstra (@codereport) - Keith Kraus (@kkraus14) Approvers: - Karthikeyan (@karthikeyann) - Keith Kraus (@kkraus14) - Vukasin Milovanovic (@vuule) URL: #6715
Configuration menu - View commit details
-
Copy full SHA for 797f004 - Browse repository at this point
Copy the full SHA 797f004View commit details -
Fix `fillna` & `dropna` to also consider `np.nan` as a missing value (#7019)
Fixes: #7007 This PR introduces changes to handle the filling of `np.nan` values in the `fillna` code by converting `nan` to `null`. This fix surfaced an issue with `can_cast_safely`, where converting a float column containing `nan`s to an `int` column was being allowed - this is incorrect, so a check was added to return False if there is at least one `nan` value in the float column. `nan` was also not being handled in `dropna` (though it is handled in `isna`), so changes were introduced to handle `nan` in `dropna` too. Authors: - GALI PREM SAGAR (@galipremsagar) Approvers: - Christopher Harris (@cwharris) - Keith Kraus (@kkraus14) URL: #7019
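A small sketch of the behavior change (illustrative; the output is not reproduced from the PR):

```python
import numpy as np
import cudf

s = cudf.Series([1.0, np.nan, None], nan_as_null=False)
print(s.fillna(0))  # with the fix, the nan element is filled along with <NA>
print(s.dropna())   # with the fix, the nan element is dropped along with <NA>
```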
Configuration menu - View commit details
-
Copy full SHA for 78113f5 - Browse repository at this point
Copy the full SHA 78113f5View commit details
Commits on Jan 23, 2021
-
Adding unit tests for `fixed_point` with extremely large `scale`s (#7178)
After a discussion with Keith and Ashwin today, I realized `libcudf` was missing a couple of corner cases for `fixed_point` and decided to open a small PR to add them. Authors: - Conor Hoekstra (@codereport) Approvers: - Vukasin Milovanovic (@vuule) - Keith Kraus (@kkraus14) - Mike Wilson (@hyperbolic2346) - @nvdbaranec - Mark Harris (@harrism) URL: #7178
Configuration menu - View commit details
-
Copy full SHA for 70cefa4 - Browse repository at this point
Copy the full SHA 70cefa4View commit details -
Add CudfSeriesGroupBy to optimize dask_cudf groupby-mean (#7194)
Configuration menu - View commit details
-
Copy full SHA for 2e0889a - Browse repository at this point
Copy the full SHA 2e0889aView commit details
Commits on Jan 25, 2021
-
Adding support for explode to cuDF (#7140)
This is an operation that expands lists into rows and duplicates the existing rows from other columns. An explanation can be found in issue #6151. Partially fixes #6151; pos_explode support is still missing and is required to completely close out #6151. Authors: - Mike Wilson (@hyperbolic2346) Approvers: - Robert (Bobby) Evans (@revans2) - Jake Hemstad (@jrhemstad) - Karthikeyan (@karthikeyann) - @nvdbaranec URL: #7140
Configuration menu - View commit details
-
Copy full SHA for f422391 - Browse repository at this point
Copy the full SHA f422391View commit details -
Add libcudf lists column count_elements API (#7173)
This adds the libcudf part of #7157 ``` std::unique_ptr<column> cudf::lists::count_elements( lists_column_view const& input, rmm::mr::device_memory_resource* mr); ``` Returns the size of each element in the input lists column. The PR also includes gtests for this new API. Authors: - David (@davidwendt) Approvers: - @nvdbaranec - AJ Schmidt (@ajschmidt8) - Karthikeyan (@karthikeyann) - Mark Harris (@harrism) URL: #7173
Configuration menu - View commit details
-
Copy full SHA for 6c2675c - Browse repository at this point
Copy the full SHA 6c2675cView commit details -
Add Java interface for the new API 'explode' (#7151)
This PR is to add Java interface for the new API '`explode`', along with its unit tests. This PR depends on the PR #7140 . Authors: - Liangcai Li (@firestarman) Approvers: - Jason Lowe (@jlowe) - Robert (Bobby) Evans (@revans2) URL: #7151
Configuration menu - View commit details
-
Copy full SHA for bf0c37a - Browse repository at this point
Copy the full SHA bf0c37aView commit details -
Default `groupby` to `sort=False` (#7180)
Closes #5038, also closes #7026 Using `sort=False` yields better `groupby` performance, so this PR changes the `groupby` API to refrain from sorting the group index by default. This PR also updates the docstring to address the performance difference when using `sort=False`. Authors: - Michael Wang (@isVoid) Approvers: - Keith Kraus (@kkraus14) - Ashwin Srinath (@shwina) URL: #7180
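A quick sketch of the new default and how to recover the old ordering (data is made up):

```python
import cudf

df = cudf.DataFrame({"key": [2, 1, 2, 1], "val": [10, 20, 30, 40]})
print(df.groupby("key").sum())             # sort=False is now the default (faster)
print(df.groupby("key", sort=True).sum())  # opt back in to sorted group keys
```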
Configuration menu - View commit details
-
Copy full SHA for 93ef1d2 - Browse repository at this point
Copy the full SHA 93ef1d2View commit details -
Configuration menu - View commit details
-
Copy full SHA for f09a75f - Browse repository at this point
Copy the full SHA f09a75fView commit details -
Refactor cudf::string_view host and device code (#7159)
While working on improving the sort performance for strings columns in #7075, we tried a vector-load approach in the `string_view::compare()` function. This approach used some CUDA math intrinsic functions like `__funnelshift_r()` and `__byte_perm()`. Unfortunately, adding these to the `string_view` source would cause compile errors for some .cpp files. This is because `string_view.cuh` was being included by some .cpp files even though they only used the appropriate `__host__ __device__` functions. This PR breaks up the host/device from the device-only functions so the .cpp files can include `string_view.cuh` without processing the device-only definitions. The host/device functions are now defined in the `string_view.cuh` directly and the device-only source is isolated in the `string_view.inl`. The include of the `string_view.inl` is then wrapped in a `#if CUDA_ARCH` so it will not be processed by a .cpp file compilation. Also, I attempted to minimize includes of `string_view.cuh` by removing it from `traits.hpp` and replacing it with a forward reference. This found a few files that were not including `string_view.cuh` directly as they should've. This also exposed `cpp/tests/utilities/scalar_utilities.cu`, which appeared to be unused and was thus removed along with its header. No functionality has changed. Build times may be slightly faster since `string_view.cuh` is included in fewer source files and .cpp files no longer process the `string_view.inl`. This means changing this file will also have slightly less impact on rebuilding libcudf. Authors: - David (@davidwendt) Approvers: - Keith Kraus (@kkraus14) - AJ Schmidt (@ajschmidt8) - Karthikeyan (@karthikeyann) - Jake Hemstad (@jrhemstad) URL: #7159
Configuration menu - View commit details
-
Copy full SHA for 103c41a - Browse repository at this point
Copy the full SHA 103c41aView commit details -
Fast path single column sort (#7167)
This change is based on the changes in PR #7075. When `cudf::sort()` or `cudf::sorted_order()` is called with a `cudf::table_view` that specifies only a single strings column, we choose a fast-path sort algorithm with a simpler comparator specifically coded for string compares. The specialized code path was added to `cudf::sorted_order()`, which is called by the other libcudf sort functions. For example, `cudf::sort()` calls `cudf::sorted_order()` and then calls `cudf::gather()` on the input `cudf::table_view` to materialize the results. The libcudf `sorted_order` feature has two APIs: `cudf::sorted_order()` and `cudf::stable_sorted_order()`, which internally use `thrust::sort()` and `thrust::stable_sort()` respectively. Each uses the `row_lexicographic_comparator` for managing the sort of multiple columns. A simpler comparator can be used in the case of a single column per the implementation in #7075. In this PR, I generalized this fast path for other single column types. I found the same comparator from #7075, templated by type, could be used for speeding up sorting of any comparable type -- where a single column is specified. Further, there are some conditions with numeric types when a comparator is not required and where the `cub::DeviceRadixSort` functions can be used instead of thrust. The restrictions to account for when _not using a comparator_:
- the type must support an assignment operator as well as the compare operators (basically only numeric types)
- the column must not contain nulls since these are handled specially with a `null_order` parameter
- `thrust::sort()` and `thrust::stable_sort()` sort the input data in-place and do not support descending order
- `cub::DeviceRadixSort` does not sort in-place and does not have a stable sort but does have a descending order option

Here is how these are used in `cudf::detail::sorted_order<stable>()`, matching conditions with these restrictions.

| stable | nulls | numeric | ascending | function |
|:---:|:---:|:---:|:---:| --- |
| y | y | - | - | `thrust::stable_sort()` with comparator |
| y | - | n | - | `thrust::stable_sort()` with comparator |
| y | - | - | n | `thrust::stable_sort()` with comparator |
| y | n | y | y | `thrust::stable_sort_by_key()` with input column copied |
| n | y | - | - | `thrust::sort()` with comparator |
| n | - | n | - | `thrust::sort()` with comparator |
| n | n | y | y | `cub::DeviceRadixSort::SortPairs` with input column copied and output indices copied |
| n | n | y | n | `cub::DeviceRadixSort::SortPairsDescending` with input column copied and output indices copied |

The `sort_benchmarks.cu` was updated to include a non-nulls set of tests to show the speedups for the bottom half of the chart. The benchmark sorts integers in ascending order. With nulls, the sort is now 1.2x faster. With no nulls, the sort is about 14x faster. The faster speed comes at the expense of 2-3 times the memory required for the `thrust::stable_sort_by_key()` or `cub::DeviceRadixSort::SortPairs()` functions. The generalization using the new single-column comparator accounts for strings columns as well, so the strings-specific code for this has been removed in this PR. Authors: - David (@davidwendt) Approvers: - Jake Hemstad (@jrhemstad) - Karthikeyan (@karthikeyann) URL: #7167
Configuration menu - View commit details
-
Copy full SHA for eb1336f - Browse repository at this point
Copy the full SHA eb1336fView commit details -
Support contains() on lists of primitives (#7039)
Closes #6944. This commit adds a method (`contains()`) to check whether each row of a `LIST` column contains the scalar value specified as an argument. The operation returns a `BOOL8` column (with as many rows as the input `LIST`), each row indicating `true` if the value is found, `false` if not. Output `column[i]` is set to null if even one of the following holds true (in line with the semantics of `array_contains()` in SQL): 1. The search key `skey` is null 2. The list row `lists[i]` is null 3. The list row `lists[i]` contains even *one* null, *and* `lists[i]` does not contain the search key. This implementation currently supports the operation on lists of numerics or strings. Authors: - MithunR (@mythrocks) Approvers: - AJ Schmidt (@ajschmidt8) - Mark Harris (@harrism) - David (@davidwendt) - Karthikeyan (@karthikeyann) URL: #7039
Configuration menu - View commit details
-
Copy full SHA for b1e9e20 - Browse repository at this point
Copy the full SHA b1e9e20View commit details
Commits on Jan 26, 2021
-
Modify the semantics of `end` pointers in cuIO to match the standard library (#7179)
Closes #6252 Fix the `end` parameter semantics to match the standard C++ library. Move `is_whitespace` and `trim_field_start_end` to parsing_utils and use them in both CSV and JSON. Authors: - Vukasin Milovanovic (@vuule) Approvers: - Christopher Harris (@cwharris) - Conor Hoekstra (@codereport) URL: #7179
Configuration menu - View commit details
-
Copy full SHA for a1db5c5 - Browse repository at this point
Copy the full SHA a1db5c5View commit details -
Replace Parquet writer API with class (#7058)
This PR contains changes only pertaining to Parquet. Instead of a plain API, a class is used to control state and options, reducing the burden on the user. For more information see #6911. These changes will break Java since the main API changed. Authors: - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) - Jason Lowe (@jlowe) Approvers: - Vukasin Milovanovic (@vuule) - Devavret Makkar (@devavret) - Robert (Bobby) Evans (@revans2) - @brandon-b-miller - David (@davidwendt) URL: #7058
Configuration menu - View commit details
-
Copy full SHA for 6a4c760 - Browse repository at this point
Copy the full SHA 6a4c760View commit details -
Fixing parquet benchmarks (#7214)
`return_filemetadata` was removed in one of the recent PRs, but it was missed in the benchmarks. Authors: - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) Approvers: - Conor Hoekstra (@codereport) - Christopher Harris (@cwharris) URL: #7214
Configuration menu - View commit details
-
Copy full SHA for d97b09e - Browse repository at this point
Copy the full SHA d97b09eView commit details -
Add coverage for `skiprows` and `num_rows` in parquet reader fuzz testing (#7216)
This PR adds coverage for the `skiprows` and `num_rows` parameters in parquet reader fuzz tests. Authors: - GALI PREM SAGAR (@galipremsagar) Approvers: - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) - Vukasin Milovanovic (@vuule) - Keith Kraus (@kkraus14) URL: #7216
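For context, a sketch of the reader options being fuzzed. This is illustrative only: the file name is hypothetical, and `skiprows`/`num_rows` are assumed to be accepted by `cudf.read_parquet` in this release.

```python
import cudf

cudf.DataFrame({"a": list(range(10))}).to_parquet("example.parquet")
# Read a 3-row window starting after the first 2 rows.
print(cudf.read_parquet("example.parquet", skiprows=2, num_rows=3))
```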
Configuration menu - View commit details
-
Copy full SHA for ccf4ffa - Browse repository at this point
Copy the full SHA ccf4ffaView commit details
Commits on Jan 27, 2021
-
Remove floating point types from radix sort fast-path (#7215)
Closes #7212 Reference #7167 (comment) Using radix sort for all fixed-width types causes an [error in Spark when floating point columns contain NaN elements](NVIDIA/spark-rapids#1585). This PR removes floating-point column types from the radix fast-path. This means the original `relational_compare` row operator is used to handle sorting floating point columns since they could possibly contain NaN elements. The `NANSorting` gtest included null elements so it did not catch the fast-path output discrepancy. This PR adds a `NANSortingNonNull` gtest to check for the desired NaN sorting behavior. Authors: - David (@davidwendt) Approvers: - Jake Hemstad (@jrhemstad) - Conor Hoekstra (@codereport) URL: #7215
Configuration menu - View commit details
-
Copy full SHA for d19cb40 - Browse repository at this point
Copy the full SHA d19cb40View commit details -
Add static type checking via Mypy (#6381)
Adds static type checking to cuDF Python via MyPy. * An additional `mypy` style check is enabled in CI * `mypy` is run as part of the pre-commit hook * Many parts of the cuDF internal code now have type annotations * Any new internal code is expected to be written with type annotations (not public-facing APIs) Authors: - Ashwin Srinath (@shwina) Approvers: - Dillon Cullinan (@dillon-cullinan) - Keith Kraus (@kkraus14) - Christopher Harris (@cwharris) URL: #6381
Configuration menu - View commit details
-
Copy full SHA for fc40c52 - Browse repository at this point
Copy the full SHA fc40c52View commit details -
Add JNI and Java bindings for list_contains (#7125)
Adds JNI and Java side bindings for `list_contains` that is being added as part of #7039. Authors: - Kuhu Shukla (@kuhushukla) Approvers: - Robert (Bobby) Evans (@revans2) - MithunR (@mythrocks) URL: #7125
Kuhu Shukla authoredJan 27, 2021 Configuration menu - View commit details
-
Copy full SHA for dd1efe1 - Browse repository at this point
Copy the full SHA dd1efe1View commit details -
Fix missing null_count() comparison in test framework and related failures (#7219)
Fixes #7210 Fixes #6733 List of fixes included: - [x] Restore `null_count()` check in `expect_columns_equal` / `expect_columns_equivalent` - [x] Fix issue in `structs_column_view::get_sliced_child` - [x] Fix test failures in COPYING_TEST - [x] Fix test failures in STREAM_COMPACTION_TEST - [x] Fix test failures in RESHAPE_TEST Authors: - @nvdbaranec - Mark Harris (@harrism) Approvers: - Mark Harris (@harrism) - MithunR (@mythrocks) - Jake Hemstad (@jrhemstad) URL: #7219
Configuration menu - View commit details
-
Copy full SHA for 9631660 - Browse repository at this point
Copy the full SHA 9631660View commit details
Commits on Jan 28, 2021
-
Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222)
This adds in the JNI layer the ability to build up Arrow column vectors, which are just references to off-heap arrow buffers, and then convert those into CUDF ColumnVectors by directly copying the arrow data to the GPU. The way this works is you create an ArrowColumnBuilder for each column you need. You call addBatch for each separate arrow buffer you want to add into that column and then you call buildAndPutOnDevice() on the Builder. That will cause the arrow pointer to be passed into CUDF, an Arrow Table with 1 column is created, that Arrow table gets passed into cudf::from_arrow which returns a CUDF Table, and we grab the 1 column from that and return it. Note this only supports primitive types and Strings for now. List, Struct, Dictionary, and Decimal are not supported yet. Signed-off-by: Thomas Graves <tgraves@nvidia.com> Authors: - Thomas Graves (@tgravescs) Approvers: - Robert (Bobby) Evans (@revans2) - Jason Lowe (@jlowe) URL: #7222
Configuration menu - View commit details
-
Copy full SHA for cbc0394 - Browse repository at this point
Copy the full SHA cbc0394View commit details -
Support `numeric_only` field for `rank()` (#7213)
Closes #7174 This PR adds support for the `numeric_only` field for `DataFrame.rank()` and `Series.rank()`. When the user specifies `numeric_only=True`, only the numerical data type columns are selected to construct a cudf object and passed to the lower level for processing. Two minor refactors are also included in this PR: - This PR refactors the internal API of `Frame._get_columns_by_label`, which now supports dispatching to this method from both `DataFrame` and `Series`. - This PR refactors `test_rank.py`, moving test functions inside class `TestRank` out as top-level functions. All test variables shared among test cases are moved to a `pytest.fixture` method. A `DataFrame.rank` test case that expects to raise due to a [pandas bug](pandas-dev/pandas#32593) is now captured under `pytest.raises`. Authors: - Michael Wang (@isVoid) Approvers: - Ashwin Srinath (@shwina) - @brandon-b-miller URL: #7213
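A brief sketch of the new flag (illustrative data):

```python
import cudf

df = cudf.DataFrame({"a": [3.0, 1.0, 2.0], "b": ["x", "y", "z"]})
# numeric_only=True ranks only the numeric columns and drops the rest.
print(df.rank(numeric_only=True))
print(df["a"].rank())
```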
Configuration menu - View commit details
-
Copy full SHA for 7d52970 - Browse repository at this point
Copy the full SHA 7d52970View commit details -
Fix test column vector leak (#7238)
#7125 added a test column vector leak. This PR fixes this minor leak. Authors: - Kuhu Shukla (@kuhushukla) Approvers: - Jason Lowe (@jlowe) - Thomas Graves (@tgravescs) URL: #7238
Kuhu Shukla authoredJan 28, 2021 Configuration menu - View commit details
-
Copy full SHA for ab34580 - Browse repository at this point
Copy the full SHA ab34580View commit details -
Configuration menu - View commit details
-
Copy full SHA for 02166da - Browse repository at this point
Copy the full SHA 02166daView commit details -
Fix Arrow column test leaks (#7241)
Found leaks in the ArrowColumnVectorTest so fix them. Signed-off-by: Thomas Graves <tgraves@nvidia.com> Authors: - Thomas Graves (@tgravescs) Approvers: - Robert (Bobby) Evans (@revans2) - Jason Lowe (@jlowe) URL: #7241
Configuration menu - View commit details
-
Copy full SHA for 9672e3d - Browse repository at this point
Copy the full SHA 9672e3dView commit details -
Add dictionary column support to rolling_window (#7186)
Reference #5963 Add support for dictionary column to `cudf::rolling_window` (non-udf) Rolling aggregations - [x] min/max - [x] lead/lag - [x] counting, row-number These only require aggregating the dictionary indices and do not need to access the keys. Authors: - David (@davidwendt) Approvers: - Mark Harris (@harrism) - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) URL: #7186
Configuration menu - View commit details
-
Copy full SHA for b608832 - Browse repository at this point
Copy the full SHA b608832View commit details
Commits on Jan 29, 2021
-
Add support for `cudf::binary_operation` `TRUE_DIV` for `decimal32` and `decimal64` (#7198)
This resolves a part of #7132 **ToDo:** * [x] Simple unit test * [x] Comprehensive unit tests * [x] Initial Column + Column * [x] Full Column + Column * [x] Column + Scalar * [x] Scalar + Column * [x] Cleanup Authors: - Conor Hoekstra (@codereport) Approvers: - Mark Harris (@harrism) - @nvdbaranec URL: #7198
Configuration menu - View commit details
-
Copy full SHA for b097b5a - Browse repository at this point
Copy the full SHA b097b5aView commit details -
Refactor io memory fetches to use hostdevice_vector methods (#7035)
This replaces `cudaMemcpyAsync(hostdevice_vector)` with `hostdevice_vector.device_to_host()` or `hostdevice_vector.host_to_device()` when appropriate. Issue #6538 Authors: - @ChrisJar Approvers: - Karthikeyan (@karthikeyann) - Vukasin Milovanovic (@vuule) URL: #7035
Configuration menu - View commit details
-
Copy full SHA for fe5e07d - Browse repository at this point
Copy the full SHA fe5e07dView commit details -
Fix `loc` for Series with a MultiIndex (#7243)
Fixes #7221 and adds improvements to `loc` with a MultiIndex. * Previously, `loc` on a `Series` with a `MultiIndex` would fail. For example: ```python In [7]: sr Out[7]: n_workers type 1 fit 1 2 load 2 3 predict 3 Name: x, dtype: int64 In [8]: sr.loc[(1, "fit")] # KeyError ``` * Previously, `loc` on a `DataFrame` with a `MultiIndex` would fail when a slice without `start` or `end` was used. For example: ```python In [3]: df Out[3]: x n_workers type 1 fit 1 2 load 2 3 predict 3 In [4]: df.loc[:(2, "load")] # TypeError ``` Both the above issues have been addressed and tests added. Authors: - Ashwin Srinath (@shwina) Approvers: - Keith Kraus (@kkraus14) - Michael Wang (@isVoid) - GALI PREM SAGAR (@galipremsagar) - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) URL: #7243
Configuration menu - View commit details
-
Copy full SHA for 019d7cc - Browse repository at this point
Copy the full SHA 019d7ccView commit details
Commits on Jan 30, 2021
-
Implement COLLECT rolling window aggregation (#7189)
Closes #7133. This is an implementation of the `COLLECT` aggregation in the context of rolling window functions. This enables the collection of rows (of type `T`) within specified window boundaries into a list column (containing elements of type `T`). In this context, one list row would be generated per input row. E.g. Consider the following example: ```c++ auto input_col = fixed_width_column_wrapper<int32_t>{70, 71, 72, 73, 74}; ``` Calling `rolling_window()` with `preceding=2`, `following=1`, `min_periods=1` produces the following: ```c++ auto output_col = cudf::rolling_window(input_col, 2, 1, 1, collect_aggr); // == [ [70,71], [70,71,72], [71,72,73], [72,73,74], [73,74] ] ``` `COLLECT` is supported with `rolling_window()`, `grouped_rolling_window()`, and `grouped_time_range_rolling_window()`, across primitive types and arbitrarily nested lists and structs. `min_periods` is also honoured: If the number of observations is fewer than min_periods, the resulting list row is null. Authors: - MithunR (@mythrocks) Approvers: - Keith Kraus (@kkraus14) - Vukasin Milovanovic (@vuule) - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) URL: #7189
Configuration menu - View commit details
-
Copy full SHA for 14b0900 - Browse repository at this point
Copy the full SHA 14b0900View commit details
Commits on Feb 1, 2021
-
Add List types support in data generator (#7064)
Resolves: #6263 This PR introduces changes which will enable generation of random list columns in datagenerator which will be used as part of fuzz tests. cc: @vuule Authors: - GALI PREM SAGAR (@galipremsagar) Approvers: - Vukasin Milovanovic (@vuule) - @brandon-b-miller URL: #7064
Configuration menu - View commit details
-
Copy full SHA for 50be922 - Browse repository at this point
Copy the full SHA 50be922View commit details -
Handle various parameter combinations in `replace` API (#7207)
Fixes: #7206 The `replace` API has two parameters, `to_replace` & `value`, which are overloaded to support different types of inputs, and each combination of these two parameters has a different behavior. These changes introduce a clear code flow for each type of possible parameter combination. This way it will be easier to support newer parameters in the future, like `regex` & nested dict types, which would change the behaviour of the `to_replace` & `value` parameters. - [x] Ensure all combinations are covered for `to_replace` & `value` for both `DataFrame.replace` & `Series.replace`. - [x] Document changes inline & update func docs. - [x] Add tests to include coverage for all combinations that are not yet covered. Authors: - GALI PREM SAGAR (@galipremsagar) Approvers: - Keith Kraus (@kkraus14) - @brandon-b-miller URL: #7207
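A few of the parameter combinations the PR enumerates, sketched below (values are made up; the per-column dict form follows pandas semantics and is an assumption here):

```python
import cudf

df = cudf.DataFrame({"a": [0, 1, 2], "b": [5, 0, 10]})
print(df.replace(0, -1))               # scalar -> scalar
print(df.replace([0, 5], [100, 500]))  # list -> list
print(df.replace({0: -1, 5: -5}))      # dict mapping old -> new
print(df.replace({"a": 0}, -1))        # per-column to_replace with a scalar value
```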
Configuration menu - View commit details
-
Copy full SHA for b8cb8c7 - Browse repository at this point
Copy the full SHA b8cb8c7View commit details -
Define and implement more behavior for merging on categorical variables (#7209)
Fixes #6892 Defines the desired behavior for an implicit merge of two possibly differing categorical variables, or one categorical variable and one non-categorical variable, as a function of the dtypes and the merge configuration. The desired behavior is defined through the tests and then implemented in `casting_logic.py`. Authors: - @brandon-b-miller - Keith Kraus (@kkraus14) Approvers: - GALI PREM SAGAR (@galipremsagar) - Keith Kraus (@kkraus14) URL: #7209
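A hedged sketch of the kind of merge whose behavior is being defined: one categorical key against one plain string key. The data is invented, and the resulting key dtype follows the rules encoded in the PR's tests, so it is deliberately not asserted here.

```python
import cudf

left = cudf.DataFrame({"key": cudf.Series(["a", "b"], dtype="category"), "l": [1, 2]})
right = cudf.DataFrame({"key": ["a", "b"], "r": [10, 20]})
merged = left.merge(right, on="key")
print(merged)
print(merged["key"].dtype)  # dtype determined by the casting rules defined in #7209
```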
Configuration menu - View commit details
-
Copy full SHA for ccc9173 - Browse repository at this point
Copy the full SHA ccc9173View commit details -
Disallow picking output columns from nested columns. (#7248)
Only top-level columns can be selected by name. Fixes #7229 Authors: - Devavret Makkar (@devavret) Approvers: - Karthikeyan (@karthikeyann) - Vukasin Milovanovic (@vuule) - @nvdbaranec - Keith Kraus (@kkraus14) URL: #7248
Configuration menu - View commit details
-
Copy full SHA for 0ee8004 - Browse repository at this point
Copy the full SHA 0ee8004View commit details -
Remove floating point types from cudf::sort fast-path (#7250)
PR #7215 removed single floating point columns from radix sort fast-path but missed disabling the fast-path sort for floating-point in `cudf::sort()`. This PR fixes `cudf::sort` and adds a new test to the existing `RowOperatorTestForNAN.NANSortingNonNull` gtest. Authors: - David (@davidwendt) Approvers: - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) - Conor Hoekstra (@codereport) URL: #7250
Configuration menu - View commit details
-
Copy full SHA for 3ecde9d - Browse repository at this point
Copy the full SHA 3ecde9dView commit details
Commits on Feb 3, 2021
-
libcudf Developer Guide (#6977)
Adds a new developer guide for libcudf. This is based on the existing libcudf++ transition guide. Fixes #5273 TODO - [x] Description of `dictionary_column_wrapper` and `fixed_point_column_wrapper` - [x] Benchmarking Section (put in a new file, Benchmarking.md)? - [x] Better discussion of nested types - [x] Introductory section on data types - [x] Consider splitting into multiple documents: DEVELOPER_GUIDE.md, TESTING.md, BENCHMARKING.md? - [x] Placeholder for cuIO? - [x] Add section on code and documentation style and formatting Authors: - Mark Harris (@harrism) - Jake Hemstad (@jrhemstad) Approvers: - @nvdbaranec - Conor Hoekstra (@codereport) - Jake Hemstad (@jrhemstad) - David (@davidwendt) URL: #6977
Configuration menu - View commit details
-
Copy full SHA for 52f5b32 - Browse repository at this point
Copy the full SHA 52f5b32View commit details -
Fix style issues related to NumPy (#7279)
NumPy 1.20 is [typed](https://numpy.org/devdocs/release/1.20.0-notes.html#numpy-is-now-typed), which exposed a few typing errors in cuDF that this PR addresses. Authors: - Ashwin Srinath (@shwina) Approvers: - Keith Kraus (@kkraus14) - GALI PREM SAGAR (@galipremsagar) - AJ Schmidt (@ajschmidt8) URL: #7279
Configuration menu - View commit details
-
Copy full SHA for 900c1e1 - Browse repository at this point
Copy the full SHA 900c1e1View commit details -
Prepare Changelog for Automation (#7272)
This PR prepares the changelog to be automatically updated during releases. Authors: - AJ Schmidt (@ajschmidt8) Approvers: - Keith Kraus (@kkraus14) URL: #7272
Configuration menu - View commit details
-
Copy full SHA for 54cddb1 - Browse repository at this point
Copy the full SHA 54cddb1View commit details
Commits on Feb 4, 2021
-
Add docs for working with missing data (#7010)
Fixes: #6963 This PR introduces a "Working with missing data" doc page where we clearly outline how we can work with missing data in cudf. The behavior shown in #6963 is correct due to the fact that cudf treats `NaT` as `<NA>` values. Hence highlighted the difference in behavior of having `NaT` in datetime/timedelta values between pandas and cudf. Authors: - GALI PREM SAGAR (@galipremsagar) Approvers: - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) URL: #7010
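A tiny sketch of the datetime behavior the doc page calls out (illustrative only):

```python
import pandas as pd
import cudf

pd_s = pd.Series(pd.to_datetime(["2021-01-01", None]))
gdf_s = cudf.Series(pd_s)
print(pd_s)   # pandas shows the missing entry as NaT
print(gdf_s)  # cudf surfaces the same missing entry as <NA>
```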
Configuration menu - View commit details
-
Copy full SHA for 2e71b36 - Browse repository at this point
Copy the full SHA 2e71b36View commit details -
Pack/unpack functionality to convert tables to and from a serialized format (#7096)
Addresses #3793 Depends on #6864 (This affects contiguous_split.cu. For the purposes of this PR, the only changes that are relevant are those that involve the generation of metadata) - `pack()` performs a `contiguous_split()` on the incoming table to arrange the memory into a unified device buffer, and generates a host-side metadata buffer. These are returned in the `packed_columns` struct. - unpack() takes the data stored in the `packed_columns` struct and returns a deserialized `table_view` that points into it. The intent of this functionality is as follows (pseudocode) ``` // serialize-side table_view t; packed_columns p = pack(t); send_over_network(p.gpu_data); send_over_network(p.metadata); // deserialize-side packed_columns p = receive_from_network(); table_view t = unpack(p); ``` This PR also renames `contiguous_split_result` to `packed_table` (which is just a bundled `table_view` and `packed_column`) Authors: - @nvdbaranec Approvers: - Jake Hemstad (@jrhemstad) - Paul Taylor (@trxcllnt) - Mike Wilson (@hyperbolic2346) URL: #7096
Configuration menu - View commit details
-
Copy full SHA for fd2d0e2 - Browse repository at this point
Copy the full SHA fd2d0e2View commit details -
Move lists utility function definition out of header (#7266)
Fixes #7265. `cudf::detail::get_num_child_rows()` is currently defined in `cudf/lists/detail/utilities.cuh`. The build pipelines for #7189 are fine, but there seem to be build failures in dependent projects such as `spark-rapids`: ``` [2021-01-31T08:12:10.611Z] /.../workspace/spark/cudf18_nightly/cpp/include/cudf/lists/detail/utilities.cuh:31:18: error: 'cudf::size_type cudf::detail::get_num_child_rows(const cudf::column_view&, rmm::cuda_stream_view)' defined but not used [-Werror=unused-function] [2021-01-31T08:12:10.611Z] static cudf::size_type get_num_child_rows(cudf::column_view const& list_offsets, [2021-01-31T08:12:10.611Z] ^~~~~~~~~~~~~~~~~~ [2021-01-31T08:12:11.981Z] cc1plus: all warnings being treated as errors [2021-01-31T08:12:12.238Z] make[2]: *** [CMakeFiles/cudf_hash.dir/build.make:82: CMakeFiles/cudf_hash.dir/src/hash/hashing.cu.o] Error 1 [2021-01-31T08:12:12.238Z] make[1]: *** [CMakeFiles/Makefile2:220: CMakeFiles/cudf_hash.dir/all] Error 2 ``` In any case, it is less than ideal for the function to be completely defined in the header, especially given that the likes of `hashing.cu` are exposed to it (by way of `scatter.cuh`). This commit moves the function definition to a separate translation unit, without changing implementation or interface. Authors: - MithunR (@mythrocks) Approvers: - @nvdbaranec - Mike Wilson (@hyperbolic2346) - David (@davidwendt) URL: #7266
Configuration menu - View commit details
-
Copy full SHA for fd38b4c - Browse repository at this point
Copy the full SHA fd38b4cView commit details -
Addresses part of #6541 Segmented sort of lists - [x] lists_column_view segmented_sort - [x] numerical types (cub segmented sort limitation) - [x] sort_lists(table_view) - [x] unit tests closes #4603 Segmented sort - [x] segmented_sort - [x] unit tests. Authors: - Karthikeyan (@karthikeyann) Approvers: - AJ Schmidt (@ajschmidt8) - Keith Kraus (@kkraus14) - Jake Hemstad (@jrhemstad) - Conor Hoekstra (@codereport) URL: #7122
Configuration menu - View commit details
-
Copy full SHA for 369ec98 - Browse repository at this point
Copy the full SHA 369ec98View commit details -
Throw if bool column would cause incorrect result when writing to ORC (#7261)
Issue #6763 Authors: - Vukasin Milovanovic (@vuule) Approvers: - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) - @nvdbaranec - GALI PREM SAGAR (@galipremsagar) - Keith Kraus (@kkraus14) URL: #7261
Configuration menu - View commit details
-
Copy full SHA for 4f87a59 - Browse repository at this point
Copy the full SHA 4f87a59View commit details -
Update JNI for contiguous_split packed results (#7127)
This PR requires the libcudf changes in #7096, fixing the Java bindings to `contiguous_split` that are broken by that change. This also adds the ability to create a `ContiguousTable` instance without manifesting a `Table` instance and all `ColumnVector` instances underneath it which should prove useful during Spark's shuffle. Authors: - Jason Lowe (@jlowe) Approvers: - Robert (Bobby) Evans (@revans2) - Alessandro Bellina (@abellina) URL: #7127
Configuration menu - View commit details
-
Copy full SHA for 110ef3e - Browse repository at this point
Copy the full SHA 110ef3eView commit details -
Turns out we need version > 5.4 of the junit jupiter engine to support `@TempDir`. - Changed the file mode to match Spark's disk manager. - Changed to use `fstat` to get the file length when appending. - Add tests for when a file already exists. Authors: - Rong Ou (@rongou) Approvers: - Jason Lowe (@jlowe) - Robert (Bobby) Evans (@revans2) URL: #7296
Configuration menu - View commit details
-
Copy full SHA for 1062fbc - Browse repository at this point
Copy the full SHA 1062fbcView commit details -
Improve `assert_eq` handling of scalar (#7220)
Closes #7199 Refactors scalar handling inside `assert_eq`. At a higher level, this PR proposes "whitelist"-style testing: all compares should go to the "strict equal" code path unless explicitly allowed. This allows the test system to capture all unintended inequality except the cases that have been explicitly discussed. For example, this PR creates two whitelist items: - If the operands override `__eq__`, use it to determine equality. - If the operands are of floating type, assert approximate equality. For all other cases, the operands should be strictly equal. Note that for testing purposes, `np.nan` is considered equal to itself. Authors: - Michael Wang (@isVoid) Approvers: - GALI PREM SAGAR (@galipremsagar) - @brandon-b-miller URL: #7220
Configuration menu - View commit details
-
Copy full SHA for fc9a00f - Browse repository at this point
Copy the full SHA fc9a00fView commit details -
Prepare Changelog for Automation (#7309)
This PR prepares the changelog to be automatically updated during releases. Authors: - GALI PREM SAGAR (@galipremsagar) Approvers: - Keith Kraus (@kkraus14) - AJ Schmidt (@ajschmidt8) URL: #7309
Configuration menu - View commit details
-
Copy full SHA for 568df5b - Browse repository at this point
Copy the full SHA 568df5bView commit details -
Commit 8334700
Fix copying dtype metadata after calling libcudf functions (#7271)
Fixes #7249 Copies dtype metadata after calling `ColumnBase.copy()`. Moves logic for copying dtype metadata after calling libcudf functions from `Frame` to `ColumnBase`. Authors: - Ashwin Srinath (@shwina) Approvers: - Keith Kraus (@kkraus14) - GALI PREM SAGAR (@galipremsagar) URL: #7271
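The metadata in question is, for example, the categories of a categorical column. A small hedged illustration follows; the actual reproducer in #7249 may differ.

```python
# Hedged illustration: dtype metadata such as categorical categories
# should survive a copy. The exact reproducer in issue #7249 may differ.
import cudf

s = cudf.Series(["a", "b", "a"], dtype="category")
copied = s.copy()
assert copied.dtype == s.dtype  # categories and ordering preserved
print(copied.dtype)
```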
Commit 253dfdf
Use `uvector` in `replace_nulls`; Fix `sort_helper::grouped_value` doc (#7256)
Small PR to provide two fixes:
- Use `rmm::device_uvector` in place of `device_vector` to improve efficiency. This is scratch space, so the supplied stream and the default memory resource are used. Part of #5380
- Update the `sort_helper::grouped_value` docstring to reflect the switch to a stable sort.
Authors: - Michael Wang (@isVoid) Approvers: - Vukasin Milovanovic (@vuule) - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) - Mark Harris (@harrism) URL: #7256
Commit fb33b94
Fix failing CI ORC test (#7313)
Use a buffer for output in the newly added ORC test. Authors: - Vukasin Milovanovic (@vuule) Approvers: - GALI PREM SAGAR (@galipremsagar) URL: #7313
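The pattern referred to here is writing the ORC output into an in-memory buffer rather than a path on disk. Below is a hedged Python-level sketch of the same idea, assuming the writer accepts a file-like object; the fixed test itself is a C++ gtest, not this code.

```python
# Hedged sketch: write ORC output into an in-memory buffer instead of a
# temporary file. Assumes df.to_orc accepts a file-like object.
from io import BytesIO

import cudf

df = cudf.DataFrame({"x": [1, 2, 3]})
buffer = BytesIO()
df.to_orc(buffer)
roundtrip = cudf.read_orc(BytesIO(buffer.getvalue()))
print(roundtrip)
```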
Commit 3a52d93
Add Java unit tests for window aggregate 'collect' (#7121)
Add unit tests for aggregate 'collect' with windowing. This PR depends on PR #7189. Signed-off-by: Liangcai Li <liangcail@nvidia.com> Authors: - Liangcai Li (@firestarman) Approvers: - MithunR (@mythrocks) - Robert (Bobby) Evans (@revans2) URL: #7121
Commit e2f6952
Commits on Feb 5, 2021
Fix typo in cudf.core.column.string.extract docs (#7253)
change: on -> one
I read the contributing guidelines, but since this is just a documentation fix, I'm not sure which apply. Great library, I just got started using it. A little rough around the edges, but great so far, and well worth some of the added steps. Authors: - Alan deLevie (@adelevie) - AJ Schmidt (@ajschmidt8) Approvers: - GALI PREM SAGAR (@galipremsagar) - Keith Kraus (@kkraus14) - Michael Wang (@isVoid) - Ray Douglass (@raydouglass) URL: #7253
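For anyone unfamiliar with the API whose docstring was fixed, here is a small hedged example of `Series.str.extract`; the input values are chosen arbitrarily.

```python
# Hedged example of the string extract API: one output column per regex
# capture group; rows that do not match become null.
import cudf

s = cudf.Series(["a1", "b2", "zz"])
print(s.str.extract(r"([ab])(\d)"))
```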
Commit 3fef7f7
Remove incorrect std::move call on return variable (#7319)
Returning a unique pointer with `std::move` causes a compile error on gcc 9 and above. Simple fix to remove the incorrect `std::move` from `get_segment_indices` in `segmented_sort.cu`. Authors: - David (@davidwendt) Approvers: - Karthikeyan (@karthikeyann) - Devavret Makkar (@devavret) URL: #7319
Commit f1a6616
Disallow constructing frames from a ColumnAccessor (#7298)
Constructing a DataFrame from a ColumnAccessor previously had unintended side-effects:

```python
In [1]: import cudf

In [2]: a = cudf.DataFrame({'a': [1, 2, 3]})

In [3]: a._data['a'].__cuda_array_interface__
Out[3]: {'shape': (3,), 'strides': (8,), 'typestr': '<i8', 'data': (140409137266688, False), 'version': 1}

In [4]: a[['a']]
Out[4]:
   a
0  1
1  2
2  3

In [5]: a._data['a'].__cuda_array_interface__
Out[5]: {'shape': (3,), 'strides': (8,), 'typestr': '<i8', 'data': (140409137267200, False), 'version': 1}
```

In a discussion with @galipremsagar - we decided that it's probably best not to handle `ColumnAccessor` in the frame constructors.
* Remove special handling of `ColumnAccessor` in `Frame` constructors
* Collapse `Series.copy()` and `DataFrame.copy()` into a single `Frame.copy()`

Authors: - Ashwin Srinath (@shwina) - GALI PREM SAGAR (@galipremsagar) Approvers: - GALI PREM SAGAR (@galipremsagar) URL: #7298
Commit 26b8c60
Fix bug when `iloc` slice terminates at the before-the-zero position (#7277)
Closes #7246 This PR fixes a bug in `DataFrame.iloc`. When the slice provided to `iloc` is decrementing and terminates at the before-the-zero position, such as `slice(2, -1, -1)` or `slice(4, None, -1)`, the terminal position was incorrectly wrapped around to the end of the frame. `Frame._slice` is moved to `DataFrame._slice` to resolve a typing issue. Authors: - Michael Wang (@isVoid) Approvers: - Keith Kraus (@kkraus14) - GALI PREM SAGAR (@galipremsagar) URL: #7277
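For reference, this is how plain Python treats the slices named above: a `None` stop runs a decrementing slice all the way down through index 0, while a literal `-1` stop wraps around to the last element. The snippet below is only a plain-list illustration of those semantics, not the cudf reproducer from #7246.

```python
# Plain-Python illustration of the slice semantics discussed above.
data = [10, 20, 30, 40, 50]

print(data[slice(2, None, -1)])  # [30, 20, 10]        runs down through index 0
print(data[slice(4, None, -1)])  # [50, 40, 30, 20, 10]
print(data[slice(2, -1, -1)])    # []                  -1 wraps to the last element
```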
Commit 0410a36
Commit 658e91a
Closes #7311 Authors: - Ashwin Srinath (@shwina) Approvers: - Keith Kraus (@kkraus14) - AJ Schmidt (@ajschmidt8) URL: #7318
Commit da0e794
Commits on Feb 8, 2021
Auto-label PRs based on their content (#7044)
This PR adds the GitHub action [PR Labeler](https://github.com/actions/labeler) to auto-label PRs based on their content. Labeling is managed with a configuration file `.github/labeler.yml` using the following [options](https://github.com/actions/labeler#usage). Authors: - Joseph (@jolorunyomi) - Mike Wendt (@mike-wendt) Approvers: - AJ Schmidt (@ajschmidt8) - Keith Kraus (@kkraus14) - Mike Wendt (@mike-wendt) URL: #7044
Commit a86d5dd
Commits on Feb 9, 2021
Unpin from numpy < 1.20 (#7335)
Authors: - Ashwin Srinath (@shwina) Approvers: - Keith Kraus (@kkraus14) - @jakirkham - Ray Douglass (@raydouglass) URL: #7335
Commit d3f5add
Commits on Feb 16, 2021
Add GHA to mark issues/prs as stale/rotten (#7388)
Issues and PRs without activity for 30d will be marked as stale. If there is no activity for 90d, they will be marked as rotten. Authors: - Jordan Jacobelli (@Ethyling) Approvers: - Dillon Cullinan (@dillon-cullinan) URL: #7388
Commit 26c2dfe
Commits on Feb 17, 2021
Update stale GHA with exemptions & new labels (#7395)
Follows #7388 Updates the stale GHA with the following changes:
- [x] Uses `inactive-30d` and `inactive-90d` labels instead of `stale` and `rotten`
- [x] Updates comments to reflect changes in labels
- [x] Exempts the following labels from being marked `inactive-30d` or `inactive-90d`:
  - `0 - Blocked`
  - `0 - Backlog`
  - `good first issue`

Authors: - Mike Wendt (@mike-wendt) Approvers: - Keith Kraus (@kkraus14) - Ray Douglass (@raydouglass) URL: #7395
Commit 53ed28e
Commits on Feb 24, 2021
Commit 1544474