[RELEASE] cudf v0.18 #7405

Merged
merged 204 commits into from
Feb 24, 2021

GPUtester
Collaborator

❄️ Code freeze for branch-0.18 and v0.18 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-0.18 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-0.18 into main for the release

ajschmidt8 and others added 30 commits November 24, 2020 15:47
Add a cmake find module to locate cuFile. If found, add the include directory and link to the shared library.

This shouldn't have any effect if cuFile is not installed locally.
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]

This implements the `non_numeric` argument for `DataFrame.quantile`, meaning that it now works on `datetime` and `timedelta` data. However, because `DataFrame.iloc` behaves differently in Pandas and cuDF, this implementation returns a DataFrame when `non_numeric=False`, even in cases where Pandas returns a Series.
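
As a rough illustration of why quantiles are well defined for `datetime` and `timedelta` data, the sketch below interpolates linearly between sorted values using only the Python standard library. The helper `linear_quantile` is hypothetical and is not the cuDF implementation; it only assumes that the values support subtraction and scaling by a float, which both types do.

```python
from datetime import datetime, timedelta


def linear_quantile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)        # fractional position in the sorted data
    lo = int(pos)                       # index at or just below that position
    hi = min(lo + 1, len(ordered) - 1)  # neighbouring index, clamped to the end
    frac = pos - lo
    # datetime - datetime and timedelta - timedelta both yield a timedelta,
    # which can be scaled by a float and added back to the lower value.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


dates = [datetime(2021, 1, 5), datetime(2021, 1, 1), datetime(2021, 1, 3)]
durations = [timedelta(hours=1), timedelta(hours=3)]

print(linear_quantile(dates, 0.25))
print(linear_quantile(durations, 0.5))
```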

Passes tests locally

This closes #6799

Authors:
  - Chris Jarrett <cjarrett@dt08.aselab.nvidia.com>
  - ChrisJar <chris.jarrett.0@gmail.com>

Approvers:
  - Keith Kraus

URL: #6902
When using the parameter `--rmm_mode=managed` for gtests, an `Invalid RMM allocation mode: managed` exception is thrown.
The logic in `include/cudf_test/base_fixture.hpp` is just missing a return statement.
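
The effect of a missing return can be sketched in Python, even though the actual fix is in a C++ header. The function name, the resource labels, and the `missing_return` switch below are all hypothetical and exist only to show both behaviors side by side.

```python
def select_memory_resource(mode, missing_return=False):
    """Map an allocation mode name to a (hypothetical) resource label."""
    if mode == "cuda":
        return "cuda_memory_resource"
    if mode == "pool":
        return "pool_memory_resource"
    if mode == "managed":
        resource = "managed_memory_resource"
        if not missing_return:
            return resource  # the fix: return once the mode is handled
        # Buggy variant: execution falls through to the error below.
    raise ValueError(f"Invalid RMM allocation mode: {mode}")


print(select_memory_resource("managed"))
try:
    select_memory_resource("managed", missing_return=True)
except ValueError as err:
    print(err)
```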

Authors:
  - davidwendt <dwendt@nvidia.com>

Approvers:
  - Paul Taylor
  - Mark Harris

URL: #6912
Resolves: #6870 

This PR adds support for `set_names` API in both `Index` & `MultiIndex`.

Authors:
  - galipremsagar <sagarprem75@gmail.com>
  - GALI PREM SAGAR <sagarprem75@gmail.com>

Approvers:
  - Keith Kraus

URL: #6929
Fixes: #6821 

This PR fixes an issue where `columns` and `index` are not handled correctly in specific scenarios.

Authors:
  - galipremsagar <sagarprem75@gmail.com>
  - GALI PREM SAGAR <sagarprem75@gmail.com>

Approvers:
  - Richard (Rick) Zamora
  - Ashwin Srinath

URL: #6838
Update to libcu++ on GitHub.

Authors:
  - ptaylor <paul.e.taylor@me.com>
  - Paul Taylor <paul.e.taylor@me.com>

Approvers:
  - Mark Harris
  - Keith Kraus
  - Christopher Harris
  - Mark Harris

URL: #6275
This PR removes `**kwargs` from the string/categorical accessors where unnecessary, and exposes keyword arguments like `inplace` to the user directly.

If we want to maintain parity with Pandas APIs for Dask/others using cuDF internally, we can consider using the approach described in #6135, which will automatically raise `NotImplementedError` when unsupported kwargs are passed.
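
A minimal sketch of the difference, using two hypothetical helper functions rather than the real accessor methods: with `**kwargs`, a misspelled option is silently ignored, while an explicit keyword argument makes Python reject it.

```python
def upper_with_kwargs(values, **kwargs):
    """Old style: options are pulled out of an opaque **kwargs dict."""
    inplace = kwargs.get("inplace", False)  # misspelled options are silently ignored
    result = [v.upper() for v in values]
    if inplace:
        values[:] = result
        return None
    return result


def upper_explicit(values, inplace=False):
    """New style: supported options are part of the signature."""
    result = [v.upper() for v in values]
    if inplace:
        values[:] = result
        return None
    return result


data = ["a", "b"]
print(upper_with_kwargs(data, inplce=True))  # typo goes unnoticed; a new list is returned
try:
    upper_explicit(data, inplce=True)
except TypeError as err:
    print("rejected:", err)
```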

Authors:
  - Ashwin Srinath <shwina@users.noreply.github.com>

Approvers:
  - GALI PREM SAGAR
  - Keith Kraus
  - Keith Kraus

URL: #6750
Fixes #6682, #6680 

Currently, empty fields are treated as N/A regardless of parsing options. However, the desired behavior is to handle empty fields the same way as fields with special values (apply default_na_values, na_filter logic). 
This PR irons out the behavior so it matches Pandas in this regard.

- Tries now support matching empty strings.
- The list of special NA values is now generated more robustly, so it has correct elements in any parameter combination.
- Empty string is added to the list of special NA values.
- A field consisting of two double quotes (`""`) is added to the NA value list if the empty string is included (mirrors Pandas behavior).
- Added tests for previously failing parameter combinations.
- Reworked some of the tests to check against Pandas results instead of assumed desired behavior.
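
The rules above can be sketched in plain Python. This is not the libcudf reader: `DEFAULT_NA_VALUES` is an illustrative subset of the Pandas defaults, `build_na_values` and `parse_field` are hypothetical helpers, and the handling of the quoted empty string reflects one reading of the list above.

```python
# Illustrative subset only; the real default list in Pandas is longer.
DEFAULT_NA_VALUES = {"", "NA", "NaN", "null", "#N/A"}


def build_na_values(na_values=None, keep_default_na=True, na_filter=True):
    """Return the set of field strings that should be parsed as N/A."""
    if not na_filter:
        return set()  # NA detection is disabled entirely
    result = set(DEFAULT_NA_VALUES) if keep_default_na else set()
    result.update(na_values or [])
    if "" in result:
        result.add('""')  # assumption: a quoted empty field follows the empty string
    return result


def parse_field(field, na_set):
    """Return None for N/A fields and the raw string otherwise."""
    return None if field in na_set else field


# Empty fields are N/A by default, but not when the defaults are turned off.
print(parse_field("", build_na_values()))
print(repr(parse_field("", build_na_values(keep_default_na=False))))
print(repr(parse_field("", build_na_values(na_filter=False))))
```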

Authors:
  - vuule <vmilovanovic@nvidia.com>
  - vuule <vukasin.milovanovic.87@gmail.com>
  - Vukasin Milovanovic <vukasin.milovanovic.87@gmail.com>
  - Vukasin Milovanovic <vmilovanovic@nvidia.com>

Approvers:
  - Ram (Ramakrishna Prabhu)
  - Christopher Harris
  - Keith Kraus

URL: #6922
The include directory was renamed from `simt` to `cuda`.

Authors:
  - Rong Ou <rong.ou@gmail.com>

Approvers:
  - Jason Lowe

URL: #6948
The `cudf::merge` API expects the key columns to be sorted. This means that if null rows are included, these null entries should all appear either at the beginning or at the end of the column, depending on the null_order for the sort. The `MergeDictionaryTest.WithNull` gtest placed null rows in the middle of the column. The expected results should also have included null entries at the beginning or the end.

This PR also includes an extra test for checking merge results are consistent with the sort parameters `cudf::order` and `cudf::null_order`. This test also includes a larger number of rows to ensure `thrust::merge` requires more than one tile/block in its runtime logic.
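
The sortedness requirement can be illustrated with Python's `heapq.merge`, which likewise assumes pre-sorted inputs. This is only an analogy, not the `cudf::merge` API: `None` stands in for a null row, and `null_order_key` and `merge_sorted` are hypothetical helpers.

```python
import heapq


def null_order_key(nulls_first):
    """Build a sort key that places None before or after all other values."""
    def key(value):
        is_null = value is None
        if nulls_first:
            return (0, 0) if is_null else (1, value)
        return (1, 0) if is_null else (0, value)
    return key


def merge_sorted(left, right, nulls_first=True):
    """Merge two inputs that are already sorted with the same null ordering."""
    return list(heapq.merge(left, right, key=null_order_key(nulls_first)))


# Nulls grouped at the beginning of both inputs.
print(merge_sorted([None, 1, 4], [None, 2, 3]))
# Nulls grouped at the end of both inputs.
print(merge_sorted([1, 4, None], [2, None], nulls_first=False))
```

If an input had a null in the middle, it would not be sorted under either key, and the merged output would no longer be ordered.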

Authors:
  - davidwendt <dwendt@nvidia.com>

Approvers:
  - Ram (Ramakrishna Prabhu)
  - Vukasin Milovanovic

URL: #6942
Updating the Java bindings package version to match the libcudf version.

Authors:
  - Jason Lowe <jlowe@nvidia.com>

Approvers:
  - Robert (Bobby) Evans

URL: #6949
shwina and others added 14 commits February 4, 2021 18:35
Fixes #7249

Copies dtype metadata after calling `ColumnBase.copy()`. Moves logic for copying dtype metadata after calling libcudf functions from `Frame` to `ColumnBase`.

Authors:
  - Ashwin Srinath (@shwina)

Approvers:
  - Keith Kraus (@kkraus14)
  - GALI PREM SAGAR (@galipremsagar)

URL: #7271

Small PR to provide two fixes:
- Use `rmm::device_uvector` in place of `device_vector` to improve efficiency. This is a scratch space, so the supplied stream and the default memory resource are used. Part of #5380
- Update `sort_helper::grouped_value` docstring to reflect change after use of stable sort.

Authors:
  - Michael Wang (@isVoid)

Approvers:
  - Vukasin Milovanovic (@vuule)
  - Ram (Ramakrishna Prabhu) (@rgsl888prabhu)
  - Mark Harris (@harrism)

URL: #7256
Use a buffer for output in the newly added ORC test.

Authors:
  - Vukasin Milovanovic (@vuule)

Approvers:
  - GALI PREM SAGAR (@galipremsagar)

URL: #7313
Add unit tests for aggregate 'collect' with windowing.

This PR depends on PR #7189.

Signed-off-by: Liangcai Li <liangcail@nvidia.com>

Authors:
  - Liangcai Li (@firestarman)

Approvers:
  - MithunR (@mythrocks)
  - Robert (Bobby) Evans (@revans2)

URL: #7121
change: on -> one

I read the contributing guidelines, but since this is just a documentation fix, I'm not sure which apply.

Great library, I just got started using it. A little rough around the edges, but great so far, and well worth some of the added steps.

Authors:
  - Alan deLevie (@adelevie)
  - AJ Schmidt (@ajschmidt8)

Approvers:
  - GALI PREM SAGAR (@galipremsagar)
  - Keith Kraus (@kkraus14)
  - Michael Wang (@isVoid)
  - Ray Douglass (@raydouglass)

URL: #7253
Returning a unique pointer using `std::move` causes a compile error with gcc 9 and above.
This is a simple fix that removes the incorrect `std::move` from `get_segment_indices` in `segmented_sort.cu`.

Authors:
  - David (@davidwendt)

Approvers:
  - Karthikeyan (@karthikeyann)
  - Devavret Makkar (@devavret)

URL: #7319
Constructing a DataFrame from a ColumnAccessor previously had unintended side-effects:

```python

In [1]: import cudf

In [2]: a = cudf.DataFrame({'a': [1, 2, 3]})

In [3]: a._data['a'].__cuda_array_interface__
Out[3]:
{'shape': (3,),
 'strides': (8,),
 'typestr': '<i8',
 'data': (140409137266688, False),
 'version': 1}

In [4]: a[['a']]
Out[4]:
   a
0  1
1  2
2  3

In [5]: a._data['a'].__cuda_array_interface__
Out[5]:
{'shape': (3,),
 'strides': (8,),
 'typestr': '<i8',
 'data': (140409137267200, False),
 'version': 1}
```

In a discussion with @galipremsagar - we decided that it's probably best not to handle `ColumnAccessor` in the frame constructors. 

* Remove special handling of `ColumnAccessor` in `Frame` constructors
* Collapse `Series.copy()` and `DataFrame.copy()` into a single `Frame.copy()`

Authors:
  - Ashwin Srinath (@shwina)
  - GALI PREM SAGAR (@galipremsagar)

Approvers:
  - GALI PREM SAGAR (@galipremsagar)

URL: #7298
Closes #7246

This PR fixes a bug in `DataFrame.iloc`. When the slice provided to `iloc` is decrementing and also terminates at the `before-the-zero` position, such as `slice(2, -1, -1)` or `slice(4, None, -1)`, the terminal position still gets wrapped around. 
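
The pitfall can be reproduced with a plain Python list. The sketch below shows one way such a wrap-around can arise and is not necessarily the code path that was fixed in cuDF: `slice.indices` normalizes an open-ended reversed slice to a stop of `-1`, and treating that `-1` as an ordinary index again wraps it to the last position.

```python
data = [0, 1, 2, 3, 4]
n = len(data)

# Normalize an open-ended reversed slice against the length of the data.
start, stop, step = slice(4, None, -1).indices(n)
print((start, stop, step))  # (4, -1, -1)

# Correct: use the normalized bounds as a range; -1 means "before position 0".
correct = [data[i] for i in range(start, stop, step)]
print(correct)  # [4, 3, 2, 1, 0]

# Incorrect: slicing again wraps the stop of -1 around to the last position.
wrapped = data[start:stop:step]
print(wrapped)  # []
```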

`Frame._slice` is moved to `DataFrame._slice` to resolve a typing issue.

Authors:
  - Michael Wang (@isVoid)

Approvers:
  - Keith Kraus (@kkraus14)
  - GALI PREM SAGAR (@galipremsagar)

URL: #7277
This updates the "10 minutes to cuDF and CuPy" notebook to use the new methods for moving between cuDF data structures and CuPy arrays.

Closes #7160

Authors:
  - @ChrisJar

Approvers:
  - Ashwin Srinath (@shwina)

URL: #7158
Closes #7311

Authors:
  - Ashwin Srinath (@shwina)

Approvers:
  - Keith Kraus (@kkraus14)
  - AJ Schmidt (@ajschmidt8)

URL: #7318
This PR adds the GitHub action [PR Labeler](https://github.com/actions/labeler) to auto-label PRs based on their content. 

Labeling is managed with a configuration file `.github/labeler.yml` using the following [options](https://github.com/actions/labeler#usage).

Authors:
  - Joseph (@jolorunyomi)
  - Mike Wendt (@mike-wendt)

Approvers:
  - AJ Schmidt (@ajschmidt8)
  - Keith Kraus (@kkraus14)
  - Mike Wendt (@mike-wendt)

URL: #7044
Authors:
  - Ashwin Srinath (@shwina)

Approvers:
  - Keith Kraus (@kkraus14)
  - @jakirkham
  - Ray Douglass (@raydouglass)

URL: #7335
Issues and PRs without activity for 30d will be marked as stale.
If there is no activity for 90d, they will be marked as rotten.

Authors:
  - Jordan Jacobelli (@Ethyling)

Approvers:
  - Dillon Cullinan (@dillon-cullinan)

URL: #7388
Follows #7388

Updates the stale GHA with the following changes:

- [x] Uses `inactive-30d` and `inactive-90d` labels instead of `stale` and `rotten`
- [x] Updates comments to reflect changes in labels
- [x] Exempts the following labels from being marked `inactive-30d` or `inactive-90d`
  - `0 - Blocked`
  - `0 - Backlog`
  - `good first issue`
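
The labeling rules above can be summarized in a short Python sketch. The real behavior is configured in the GitHub Action, not implemented like this; `inactivity_label` is a hypothetical helper, and treating the 30 and 90 day thresholds as inclusive is an assumption.

```python
EXEMPT_LABELS = {"0 - Blocked", "0 - Backlog", "good first issue"}


def inactivity_label(days_inactive, labels):
    """Return the inactivity label for an issue or PR, or None if none applies."""
    if EXEMPT_LABELS & set(labels):
        return None  # exempt items are never marked inactive
    if days_inactive >= 90:  # assumption: thresholds are inclusive
        return "inactive-90d"
    if days_inactive >= 30:
        return "inactive-30d"
    return None


print(inactivity_label(45, ["bug"]))
print(inactivity_label(120, ["bug"]))
print(inactivity_label(120, ["0 - Backlog"]))
```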

Authors:
  - Mike Wendt (@mike-wendt)

Approvers:
  - Keith Kraus (@kkraus14)
  - Ray Douglass (@raydouglass)

URL: #7395

@ajschmidt8 ajschmidt8 added non-breaking Non-breaking change and removed non-breaking Non-breaking change labels Feb 23, 2021
@raydouglass raydouglass added the non-breaking Non-breaking change label Feb 24, 2021
@raydouglass raydouglass merged commit b7e1a85 into main Feb 24, 2021