Fix NaN handling in drop_list_duplicates #7662

Merged on Mar 31, 2021 (38 commits)
Changes from 2 commits
81e0d79
Add tests for drop_list_duplicates in case of input containing floati…
ttnghia Mar 17, 2021
e431549
Add negative NaN into the tests
ttnghia Mar 19, 2021
84b06a3
Rewrite tests: split tests into smaller tests with some improvements
ttnghia Mar 19, 2021
34305f2
Some improvement to floating point tests with NaNs
ttnghia Mar 19, 2021
fa46446
Add customized comparators for drop_list_duplicates, still need to up…
ttnghia Mar 20, 2021
452835b
Some cleanup
ttnghia Mar 20, 2021
42535a0
Rewrite doc for element_comparator and element_comparator_fn
ttnghia Mar 20, 2021
2b5b8e4
Using type_dispatcher only for host code
ttnghia Mar 22, 2021
dfa1c8a
Fix memory access violation bug when using reference to column_device…
ttnghia Mar 22, 2021
f2c4d5d
Merge remote-tracking branch 'origin/branch-0.19' into fix_nan_drop_l…
ttnghia Mar 22, 2021
9d07cc7
Add test case when the list contains both -0.0 and 0.0
ttnghia Mar 22, 2021
ef6d7e2
Rename constants
ttnghia Mar 23, 2021
725155b
Change has_null from template parameter to runtime parameter
ttnghia Mar 23, 2021
97815de
Merge branch 'branch-0.19' into fix_nan_drop_list_duplicates
ttnghia Mar 23, 2021
6f76f8e
Remove redundant qualifiers from class constructor
ttnghia Mar 23, 2021
c66b88f
Add `nan_equality` enum to specify whether NaN elements should be …
ttnghia Mar 25, 2021
806b900
Rewrite `drop_list_duplicate`, adding `nans_equal` parameter, allowin…
ttnghia Mar 25, 2021
d96bc79
Rewrite tests for `drop_list_duplicates`
ttnghia Mar 25, 2021
0246c78
Rewrite `collect_set_aggregation`, adding `nans_equal` parameter
ttnghia Mar 25, 2021
2185d8a
Fix typo
ttnghia Mar 25, 2021
919e859
Change `nan_equality` enum names
ttnghia Mar 25, 2021
a002e62
Fix enum in unit tests for `drop_list_duplicates`
ttnghia Mar 25, 2021
26cae09
Add an option to specify NaNs are compared equal only if they have th…
ttnghia Mar 25, 2021
4356bf6
Rework `drop_list_duplicates` for the new `nan_equality` option
ttnghia Mar 25, 2021
e4cfa11
Rewrite unit tests for `drop_list_duplicates` that can test for all c…
ttnghia Mar 25, 2021
98ec9b1
Revert "Rewrite unit tests for `drop_list_duplicates` that can test f…
ttnghia Mar 25, 2021
7a9f850
Revert "Rework `drop_list_duplicates` for the new `nan_equality` option"
ttnghia Mar 25, 2021
04b120d
Revert "Add an option to specify NaNs are compared equal only if they…
ttnghia Mar 25, 2021
0973be9
Fix typo
ttnghia Mar 25, 2021
fa035a6
Avoid initialize-then-assign
ttnghia Mar 30, 2021
8520f0d
Replace `thrust::any_of` by `thrust::count_if`
ttnghia Mar 30, 2021
49bfe13
Copy column by constructor
ttnghia Mar 30, 2021
3d50d8e
Merge remote-tracking branch 'origin/branch-0.19' into fix_nan_drop_l…
ttnghia Mar 30, 2021
27b5beb
Minor cleanup
ttnghia Mar 30, 2021
1406a25
Replace `is_null` by `is_null_nocheck`
ttnghia Mar 30, 2021
32b4393
Replace `make_numeric_column` by `device_uvector`
ttnghia Mar 30, 2021
b5af91e
Rewrite comments, and add a condition check for nans_equal == ALL_EQU…
ttnghia Mar 30, 2021
7812e05
Merge branch 'branch-0.19' into fix_nan_drop_list_duplicates
ttnghia Mar 30, 2021
24 changes: 3 additions & 21 deletions cpp/src/lists/drop_list_duplicates.cu
@@ -35,8 +35,6 @@ namespace cudf {
namespace lists {
namespace detail {
namespace {
- using offset_type = lists_column_view::offset_type;
-
template <typename Type>
struct has_negative_nans {
column_device_view const d_entries;
@@ -114,25 +112,9 @@ std::unique_ptr<column> replace_negative_nans_entries(column_view const& lists_e
lists_column_view const& lists_column,
rmm::cuda_stream_view stream)
{
- auto const mr = rmm::mr::get_current_device_resource();
-
- // Copy list offsets from the given lists column to the new lists column
- auto new_offsets = std::make_unique<column>(
- data_type{type_id::INT32},
- lists_column.size() + 1,
- rmm::device_buffer{(lists_column.size() + 1) * sizeof(offset_type), stream, mr});
- thrust::copy_n(rmm::exec_policy(stream),
- lists_column.offsets_begin(),
- lists_column.size() + 1,
- new_offsets->mutable_view().begin<offset_type>());
-
- // Copy entries from the given lists column to the new lists column, replacing all -NaNs by NaNs
- auto new_entries = std::make_unique<column>(
- lists_entries.type(),
- lists_entries.size(),
- rmm::device_buffer{lists_entries.size() * cudf::size_of(lists_entries.type()), stream, mr},
- cudf::detail::copy_bitmask(lists_entries, stream, mr),
- lists_entries.null_count());
+ auto const mr = rmm::mr::get_current_device_resource();
+ auto new_offsets = std::make_unique<column>(lists_column.offsets());
+ auto new_entries = std::make_unique<column>(lists_entries);

type_dispatcher(lists_entries.type(),
detail::replace_negative_nans_fn{},
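The hunk above copies the list entries and then replaces every -NaN with NaN before duplicates are dropped. The following is a minimal host-side sketch of that idea using only the C++ standard library. It is not the libcudf implementation: `nan_policy`, `normalize_nan`, `elements_equal`, `drop_duplicates`, and `example` are hypothetical names chosen for illustration, the sketch handles a single list of `float` rather than a lists column, and it keeps first occurrences in input order with a quadratic scan, which is an assumption of this example rather than something the PR states.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical stand-in for the PR's NaN-equality option (names are assumed).
enum class nan_policy { all_equal, unequal };

// Map any NaN whose sign bit is set to the canonical positive quiet NaN.
inline float normalize_nan(float x)
{
  return (std::isnan(x) && std::signbit(x)) ? std::numeric_limits<float>::quiet_NaN() : x;
}

// Equality used for deduplication. Under IEEE 754, -0.0f == 0.0f is already
// true, while NaN == NaN is false unless the policy says otherwise.
inline bool elements_equal(float a, float b, nan_policy policy)
{
  if (policy == nan_policy::all_equal && std::isnan(a) && std::isnan(b)) { return true; }
  return a == b;
}

// Keep the first occurrence of each distinct value (O(n^2), order-preserving).
inline std::vector<float> drop_duplicates(std::vector<float> const& list, nan_policy policy)
{
  std::vector<float> out;
  for (float raw : list) {
    float const x = normalize_nan(raw);
    bool seen     = false;
    for (float y : out) {
      if (elements_equal(x, y, policy)) {
        seen = true;
        break;
      }
    }
    if (!seen) { out.push_back(x); }
  }
  return out;
}

// Usage sketch: one list mixing NaN, -NaN, 0.0 and -0.0.
inline void example()
{
  float const nan = std::numeric_limits<float>::quiet_NaN();
  std::vector<float> const list{1.0f, nan, std::copysign(nan, -1.0f), 0.0f, -0.0f, 1.0f};
  assert(drop_duplicates(list, nan_policy::all_equal).size() == 3);  // 1, NaN, 0
  assert(drop_duplicates(list, nan_policy::unequal).size() == 4);    // 1, NaN, NaN, 0
}
```

Normalizing the sign first means the equality check never has to distinguish NaN from -NaN, which is presumably why the PR performs the replacement on a copy of the entries before deduplicating.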