[REVIEW] Add dictionary support to cudf::reduce #6666

Merged
merged 46 commits into from
Dec 1, 2020
Changes from 40 commits
bedcfda
Add cudf::dictionary::make_dictionary_pair_iterator
davidwendt Nov 3, 2020
f8e523d
Add dictionary support to cudf::reduce
davidwendt Nov 4, 2020
890d21b
fixed hardcoded dictionary index type
davidwendt Nov 4, 2020
7034379
add support for some compound reduce ops
davidwendt Nov 4, 2020
c846f9f
Merge branch 'branch-0.17' into dictionary-make-pair-iterator
davidwendt Nov 4, 2020
92a185a
Merge branch 'dictionary-make-pair-iterator' into dictionary-reduce
davidwendt Nov 4, 2020
24a4ed9
Merge branch 'branch-0.17' into dictionary-make-pair-iterator
davidwendt Nov 5, 2020
ac3801e
Merge branch 'dictionary-make-pair-iterator' into dictionary-reduce
davidwendt Nov 5, 2020
b2da2c0
add detail::get_element declaration
davidwendt Nov 6, 2020
d70eba4
remove unneeded includes
davidwendt Nov 6, 2020
022afe0
do min/max on dictionary indices
davidwendt Nov 6, 2020
3b2ece0
remove 2nd dispatch to improve compile time
davidwendt Nov 6, 2020
dc78bee
fix merge conflict
davidwendt Nov 6, 2020
ccbdb38
refactor any/all to hardcode expected BOOL8 output type
davidwendt Nov 9, 2020
8b49010
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 9, 2020
0208389
remove unneeded includes
davidwendt Nov 11, 2020
cddba90
fix merge conflicts
davidwendt Nov 11, 2020
ffa7dd6
re-add dictionary to refactored reduce without 2nd dispatch
davidwendt Nov 11, 2020
bc7304f
use detail::get_element with stream parameter
davidwendt Nov 11, 2020
70bf53c
use dictionary keys for min,max,any,all ops
davidwendt Nov 12, 2020
399d7b9
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 12, 2020
190aae5
add detail quantile function and header
davidwendt Nov 12, 2020
2e12845
use detail quantile function
davidwendt Nov 12, 2020
d180c93
add tests for median and quantile; also use dictionary_column_wrapper
davidwendt Nov 12, 2020
f221c20
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 12, 2020
c739489
add more doxygen comments
davidwendt Nov 13, 2020
e7969bd
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 13, 2020
2508c37
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 13, 2020
6e5b1d9
remove unneeded include
davidwendt Nov 13, 2020
3cc9735
simplified dictionary keys logic for min,max,any,all ops
davidwendt Nov 13, 2020
78b252d
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 13, 2020
a806b0d
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 17, 2020
50baa74
add specialization for dictionary in cudf::reduction::simple::detail:…
davidwendt Nov 19, 2020
83f0198
use dictionary32 reduce for min/max ops
davidwendt Nov 19, 2020
aa196b9
fix any()/all() ops to honor sliced columns
davidwendt Nov 19, 2020
f12a769
fix min()/max() for sliced dictionary
davidwendt Nov 19, 2020
06d7359
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 19, 2020
0dcf147
fix merge conflicts
davidwendt Nov 20, 2020
dc6a47d
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 21, 2020
7ca0c94
remove redundant detail/quantile.hpp file
davidwendt Nov 21, 2020
40e661a
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 24, 2020
28e93bc
fix merge conflict
davidwendt Nov 24, 2020
62d1f4a
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 28, 2020
4cdd121
update assert to include mention of unsigned
davidwendt Nov 29, 2020
05bad28
Merge branch 'branch-0.17' into dictionary-reduce
davidwendt Nov 29, 2020
95f02a7
update stream.value() to just stream for detail fn calls
davidwendt Nov 30, 2020
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -75,6 +75,7 @@
 - PR #6644 Cover different CSV reader/writer options in benchmarks
 - PR #6741 Cover different ORC and Parquet reader/writer options in benchmarks
 - PR #6651 Add cudf::dictionary::make_dictionary_pair_iterator
+- PR #6666 Add dictionary support to `cudf::reduce`
 - PR #6635 Add cudf::test::dictionary_column_wrapper class
 - PR #6702 Fix orc read corruption on boolean column
 - PR #6676 Add dictionary support to `cudf::quantile`
2 changes: 1 addition & 1 deletion conda/recipes/libcudf/meta.yaml
@@ -71,12 +71,12 @@ test:
     - test -f $PREFIX/include/cudf/detail/null_mask.hpp
     - test -f $PREFIX/include/cudf/detail/nvtx/nvtx3.hpp
     - test -f $PREFIX/include/cudf/detail/nvtx/ranges.hpp
+    - test -f $PREFIX/include/cudf/detail/quantiles.hpp
     - test -f $PREFIX/include/cudf/detail/reduction_functions.hpp
     - test -f $PREFIX/include/cudf/detail/repeat.hpp
     - test -f $PREFIX/include/cudf/detail/replace.hpp
     - test -f $PREFIX/include/cudf/detail/reshape.hpp
     - test -f $PREFIX/include/cudf/detail/round.hpp
-    - test -f $PREFIX/include/cudf/detail/quantiles.hpp
     - test -f $PREFIX/include/cudf/detail/scatter.hpp
     - test -f $PREFIX/include/cudf/detail/search.hpp
     - test -f $PREFIX/include/cudf/detail/sequence.hpp
7 changes: 5 additions & 2 deletions cpp/include/cudf/column/column_device_view.cuh
@@ -705,14 +705,17 @@ __device__ inline string_view const column_device_view::element<string_view>(
  * The basic dictionary elements are the indices which can be any index type.
  */
 struct index_element_fn {
-  template <typename IndexType, std::enable_if_t<is_index_type<IndexType>()>* = nullptr>
+  template <
+    typename IndexType,
+    std::enable_if_t<is_index_type<IndexType>() and std::is_unsigned<IndexType>::value>* = nullptr>
   __device__ size_type operator()(column_device_view const& input, size_type index)
   {
     return static_cast<size_type>(input.element<IndexType>(index));
   }
   template <typename IndexType,
             typename... Args,
-            std::enable_if_t<not is_index_type<IndexType>()>* = nullptr>
+            std::enable_if_t<not(is_index_type<IndexType>() and
+                                 std::is_unsigned<IndexType>::value)>* = nullptr>
   __device__ size_type operator()(Args&&... args)
   {
     release_assert(false and "indices must be an integral type");
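The `index_element_fn` change above narrows the enabled overload to unsigned index types by adding `std::is_unsigned` to the `std::enable_if_t` condition, and widens the fallback overload to match. A minimal host-only sketch of that overload pattern follows. It is not the cudf implementation: `is_index_like`, `index_element_sketch`, and `index_at` are hypothetical names, a `std::vector` stands in for `column_device_view`, and the fallback throws instead of calling `release_assert`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for cudf's is_index_type<T>(): integral, but not bool.
template <typename T>
constexpr bool is_index_like()
{
  return std::is_integral<T>::value && !std::is_same<T, bool>::value;
}

struct index_element_sketch {
  // Enabled only when IndexType is an unsigned integral index type.
  template <typename IndexType,
            std::enable_if_t<is_index_like<IndexType>() &&
                             std::is_unsigned<IndexType>::value>* = nullptr>
  std::int32_t operator()(std::vector<IndexType> const& indices, std::size_t row) const
  {
    assert(row < indices.size());
    return static_cast<std::int32_t>(indices[row]);
  }

  // Fallback for every other type: the exact negation of the condition above,
  // so exactly one overload is viable for any IndexType.
  template <typename IndexType,
            typename... Args,
            std::enable_if_t<!(is_index_like<IndexType>() &&
                               std::is_unsigned<IndexType>::value)>* = nullptr>
  std::int32_t operator()(Args&&...) const
  {
    throw std::logic_error("indices must be an unsigned integral type");
  }
};

// Convenience wrapper that names the index type explicitly, as a type
// dispatcher would.
template <typename IndexType>
std::int32_t index_at(std::vector<IndexType> const& indices, std::size_t row)
{
  return index_element_sketch{}.operator()<IndexType>(indices, row);
}
```

With this sketch, `index_at` on a `std::vector<std::uint8_t>` selects the first overload, while a `std::vector<std::int32_t>` falls through to the second and throws.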
14 changes: 0 additions & 14 deletions cpp/include/cudf/detail/reduction.cuh
@@ -131,20 +131,6 @@ std::unique_ptr<scalar> reduce(InputIterator d_in,
   return std::unique_ptr<scalar>(s);
 }

-// @brief dictionary specialization of simple reduction
-template <typename Op,
-          typename InputIterator,
-          typename OutputType = typename thrust::iterator_value<InputIterator>::type,
-          typename std::enable_if_t<std::is_same<OutputType, dictionary32>::value>* = nullptr>
-std::unique_ptr<scalar> reduce(InputIterator d_in,
-                               cudf::size_type num_items,
-                               op::simple_op<Op> sop,
-                               rmm::cuda_stream_view stream,
-                               rmm::mr::device_memory_resource* mr)
-{
-  CUDF_FAIL("dictionary type not supported");
-}
-
 /** --------------------------------------------------------------------------*
  * @brief compute reduction by the compound operator (reduce and transform)
  *
37 changes: 23 additions & 14 deletions cpp/src/copying/get_element.cu
@@ -16,6 +16,7 @@

 #include <cudf/column/column_device_view.cuh>
 #include <cudf/copying.hpp>
+#include <cudf/detail/indexalator.cuh>
 #include <cudf/dictionary/dictionary_column_view.hpp>
 #include <cudf/scalar/scalar_device_view.cuh>
 #include <cudf/scalar/scalar_factories.hpp>
@@ -84,25 +85,33 @@ struct get_element_functor {
     cudaStream_t stream = 0,
     rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())
   {
-    auto dict_view = dictionary_column_view(input);
-    auto key_index_scalar =
-      get_element_functor{}.operator()<int32_t>(dict_view.indices(), index, stream);
-
-    size_type key_index =
-      static_cast<numeric_scalar<int32_t> const *>(key_index_scalar.get())->value(stream);
-    auto result = type_dispatcher(
-      dict_view.keys().type(), get_element_functor{}, dict_view.keys(), key_index, stream, mr);
-
-    auto result_validity = result->validity_data();
-    auto device_col = column_device_view::create(input, stream);
+    auto dict_view = dictionary_column_view(input);
+    auto indices_iter = detail::indexalator_factory::make_input_iterator(dict_view.indices());
+    numeric_scalar<size_type> key_index_scalar{index, true, stream};
+    auto d_key_index = get_scalar_device_view(key_index_scalar);
+    auto d_col = column_device_view::create(input, stream);

+    // retrieve the indices value at index
     device_single_thread(
-      [result_validity, d_col = *device_col, index] __device__() mutable {
-        *result_validity = d_col.is_valid(index);
+      [d_key_index, d_col = *d_col, indices_iter, index] __device__() mutable {
+        d_key_index.set_value(indices_iter[index]);
+        d_key_index.set_valid(d_col.is_valid(index));
       },
       stream);

-    return result;
+    if (!key_index_scalar.is_valid(stream)) {
+      auto null_result = make_default_constructed_scalar(dict_view.keys().type());
+      null_result->set_valid(false, stream);
+      return null_result;
+    }
+
+    // retrieve the key element using the key-index
+    return type_dispatcher(dict_view.keys().type(),
+                           get_element_functor{},
+                           dict_view.keys(),
+                           key_index_scalar.value(stream),
+                           stream,
+                           mr);
   }

   template <typename T, std::enable_if_t<std::is_same<T, list_view>::value> *p = nullptr>
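The reworked dictionary branch of `get_element_functor` above reads the index stored at the requested row together with the row's validity, returns a null scalar early when the row is null, and only then looks up the key. The host-only sketch below mirrors that order of operations under simplifying assumptions: `dictionary_sketch`, `element_sketch`, and `get_element_sketch` are hypothetical names, plain vectors replace device columns, and no streams or memory resources are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side model of a dictionary column:
// each row stores an index into `keys`, plus a validity flag.
struct dictionary_sketch {
  std::vector<std::string> keys;
  std::vector<std::uint32_t> indices;
  std::vector<bool> valid;
};

// Hypothetical stand-in for a scalar: a value plus a validity flag.
struct element_sketch {
  bool valid;
  std::string value;
};

element_sketch get_element_sketch(dictionary_sketch const& col, std::size_t row)
{
  assert(row < col.indices.size() && row < col.valid.size());

  // A null row yields a null result without touching the keys.
  if (!col.valid[row]) { return element_sketch{false, std::string{}}; }

  // Otherwise retrieve the key element using the key index stored at `row`.
  std::uint32_t const key_index = col.indices[row];
  assert(key_index < col.keys.size());
  return element_sketch{true, col.keys[key_index]};
}
```

Checking validity before the key lookup means the index stored at a null row is never used, so it does not matter what value it holds.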
48 changes: 11 additions & 37 deletions cpp/src/quantiles/quantile.cu
@@ -140,54 +140,28 @@ std::unique_ptr<column> quantile(column_view const& input,
 std::unique_ptr<column> quantile(column_view const& input,
                                  std::vector<double> const& q,
                                  interpolation interp,
-                                 column_view const& ordered_indices,
+                                 column_view const& indices,
                                  bool exact,
                                  rmm::cuda_stream_view stream,
                                  rmm::mr::device_memory_resource* mr)
 {
-  if (ordered_indices.is_empty()) {
+  if (indices.is_empty()) {
+    auto begin = thrust::make_counting_iterator<size_type>(0);
     if (exact) {
-      return detail::quantile<true>(input,
-                                    thrust::make_counting_iterator<size_type>(0),
-                                    input.size(),
-                                    q,
-                                    interp,
-                                    exact,
-                                    stream,
-                                    mr);
+      return quantile<true>(input, begin, input.size(), q, interp, exact, stream, mr);
     } else {
-      return detail::quantile<false>(input,
-                                     thrust::make_counting_iterator<size_type>(0),
-                                     input.size(),
-                                     q,
-                                     interp,
-                                     exact,
-                                     stream,
-                                     mr);
+      return quantile<false>(input, begin, input.size(), q, interp, exact, stream, mr);
     }

   } else {
-    CUDF_EXPECTS(ordered_indices.type() == data_type{type_to_id<size_type>()},
-                 "`ordered_indicies` type must be `INT32`.");
-
+    CUDF_EXPECTS(indices.type() == data_type{type_to_id<size_type>()},
+                 "`indicies` type must be `INT32`.");
     if (exact) {
-      return detail::quantile<true>(input,
-                                    ordered_indices.data<size_type>(),
-                                    ordered_indices.size(),
-                                    q,
-                                    interp,
-                                    exact,
-                                    stream,
-                                    mr);
+      return quantile<true>(
+        input, indices.begin<size_type>(), indices.size(), q, interp, exact, stream, mr);
     } else {
-      return detail::quantile<false>(input,
-                                     ordered_indices.data<size_type>(),
-                                     ordered_indices.size(),
-                                     q,
-                                     interp,
-                                     exact,
-                                     stream,
-                                     mr);
+      return quantile<false>(
+        input, indices.begin<size_type>(), indices.size(), q, interp, exact, stream, mr);
     }
   }
 }
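In the `quantile` overload above, an empty `indices` column makes the function walk `input` in its stored order through a counting iterator, and a non-empty one supplies the order explicitly. A rough host-only sketch of that branching follows. `lower_quantile_sketch` is a hypothetical name, it assumes the input is already sorted when no ordering is given, and it implements only a simplified "take the lower neighbour" selection rather than cudf's interpolation modes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: select the value at quantile q (0 <= q <= 1).
// An empty `ordering` means "use the rows of `input` in stored order",
// which plays the role of the counting iterator in the diff above.
double lower_quantile_sketch(std::vector<double> const& input,
                             std::vector<std::int32_t> const& ordering,
                             double q)
{
  std::size_t const n = ordering.empty() ? input.size() : ordering.size();
  assert(n > 0 && q >= 0.0 && q <= 1.0);

  // Position within the ordered sequence, truncated toward the lower row.
  auto const pos = static_cast<std::size_t>(q * static_cast<double>(n - 1));

  if (ordering.empty()) { return input[pos]; }            // identity ordering
  return input[static_cast<std::size_t>(ordering[pos])];  // explicit ordering
}
```

Passing an explicit ordering lets a caller compute quantiles over unsorted data without copying or sorting the values themselves.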
10 changes: 5 additions & 5 deletions cpp/src/reductions/all.cu
@@ -14,11 +14,9 @@
  * limitations under the License.
  */

-#include <reductions/simple.cuh>
-
 #include <cudf/detail/reduction_functions.hpp>
-
-#include <rmm/cuda_stream_view.hpp>
+#include <cudf/dictionary/dictionary_column_view.hpp>
+#include <reductions/simple.cuh>

 namespace cudf {
 namespace reduction {
@@ -30,7 +28,9 @@ std::unique_ptr<cudf::scalar> all(column_view const& col,
 {
   CUDF_EXPECTS(output_dtype == cudf::data_type(cudf::type_id::BOOL8),
                "all() operation can be applied with output type `BOOL8` only");
-  return cudf::type_dispatcher(col.type(),
+  auto const dispatch_type =
+    cudf::is_dictionary(col.type()) ? dictionary_column_view(col).keys().type() : col.type();
+  return cudf::type_dispatcher(dispatch_type,
                                simple::bool_result_element_dispatcher<cudf::reduction::op::min>{},
                                col,
                                stream,
10 changes: 5 additions & 5 deletions cpp/src/reductions/any.cu
@@ -14,11 +14,9 @@
  * limitations under the License.
  */

-#include <reductions/simple.cuh>
-
 #include <cudf/detail/reduction_functions.hpp>
-
-#include <rmm/cuda_stream_view.hpp>
+#include <cudf/dictionary/dictionary_column_view.hpp>
+#include <reductions/simple.cuh>

 namespace cudf {
 namespace reduction {
@@ -30,7 +28,9 @@ std::unique_ptr<cudf::scalar> any(column_view const& col,
 {
   CUDF_EXPECTS(output_dtype == cudf::data_type(cudf::type_id::BOOL8),
                "any() operation can be applied with output type `bool8` only");
-  return cudf::type_dispatcher(col.type(),
+  auto const dispatch_type =
+    cudf::is_dictionary(col.type()) ? dictionary_column_view(col).keys().type() : col.type();
+  return cudf::type_dispatcher(dispatch_type,
                                simple::bool_result_element_dispatcher<cudf::reduction::op::max>{},
                                col,
                                stream,
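Both `all()` and `any()` above dispatch on the dictionary's keys type and reuse the existing reduction operators: `op::min` for `all()` and `op::max` for `any()`, each producing a `BOOL8` result. The host-only sketch below illustrates why min and max over boolean-converted values give those answers. `dictionary_sketch`, `all_sketch`, and `any_sketch` are hypothetical names, the keys are assumed to be `double`, and nulls are left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a dictionary column without nulls.
struct dictionary_sketch {
  std::vector<double> keys;
  std::vector<std::uint32_t> indices;
};

// all(): the minimum of the boolean-converted row values.
// One false (zero) row pulls the minimum down to false.
bool all_sketch(dictionary_sketch const& col)
{
  bool result = true;
  for (auto const idx : col.indices) {
    assert(idx < col.keys.size());
    result = std::min(result, col.keys[idx] != 0.0);
  }
  return result;
}

// any(): the maximum of the boolean-converted row values.
// One true (non-zero) row pushes the maximum up to true.
bool any_sketch(dictionary_sketch const& col)
{
  bool result = false;
  for (auto const idx : col.indices) {
    assert(idx < col.keys.size());
    result = std::max(result, col.keys[idx] != 0.0);
  }
  return result;
}
```

The sketch iterates over rows rather than over the keys, so a key that no row references cannot affect the result.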
48 changes: 30 additions & 18 deletions cpp/src/reductions/compound.cuh
@@ -17,13 +17,11 @@
#pragma once

#include <cudf/detail/reduction.cuh>

#include <cudf/dictionary/detail/iterator.cuh>
#include <cudf/scalar/scalar_factories.hpp>
#include <cudf/utilities/traits.hpp>
#include <cudf/utilities/type_dispatcher.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_scalar.hpp>

namespace cudf {
namespace reduction {
namespace compound {
@@ -58,23 +56,37 @@ std::unique_ptr<scalar> compound_reduction(column_view const& col,
   std::unique_ptr<scalar> result;
   Op compound_op{};

-  if (col.has_nulls()) {
-    auto it = thrust::make_transform_iterator(
-      dcol->pair_begin<ElementType, true>(),
-      compound_op.template get_null_replacing_element_transformer<ResultType>());
-    result = detail::reduce<Op, decltype(it), ResultType>(
-      it, col.size(), compound_op, valid_count, ddof, stream, mr);
+  if (!cudf::is_dictionary(col.type())) {
+    if (col.has_nulls()) {
+      auto it = thrust::make_transform_iterator(
+        dcol->pair_begin<ElementType, true>(),
+        compound_op.template get_null_replacing_element_transformer<ResultType>());
+      result = detail::reduce<Op, decltype(it), ResultType>(
+        it, col.size(), compound_op, valid_count, ddof, stream, mr);
+    } else {
+      auto it = thrust::make_transform_iterator(
+        dcol->begin<ElementType>(), compound_op.template get_element_transformer<ResultType>());
+      result = detail::reduce<Op, decltype(it), ResultType>(
+        it, col.size(), compound_op, valid_count, ddof, stream, mr);
+    }
   } else {
-    auto it = thrust::make_transform_iterator(
-      dcol->begin<ElementType>(), compound_op.template get_element_transformer<ResultType>());
-    result = detail::reduce<Op, decltype(it), ResultType>(
-      it, col.size(), compound_op, valid_count, ddof, stream, mr);
+    if (col.has_nulls()) {
+      auto it = thrust::make_transform_iterator(
+        cudf::dictionary::detail::make_dictionary_pair_iterator<ElementType, true>(*dcol),
+        compound_op.template get_null_replacing_element_transformer<ResultType>());
+      result = detail::reduce<Op, decltype(it), ResultType>(
+        it, col.size(), compound_op, valid_count, ddof, stream, mr);
+    } else {
+      auto it = thrust::make_transform_iterator(
+        cudf::dictionary::detail::make_dictionary_iterator<ElementType>(*dcol),
+        compound_op.template get_element_transformer<ResultType>());
+      result = detail::reduce<Op, decltype(it), ResultType>(
+        it, col.size(), compound_op, valid_count, ddof, stream, mr);
+    }
   }
+
   // set scalar is valid
-  if (col.null_count() < col.size())
-    result->set_valid(true, stream);
-  else
-    result->set_valid(false, stream);
+  result->set_valid(col.null_count() < col.size(), stream);
   return result;
 };

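The dictionary branch added to `compound_reduction` above feeds the same reduction with an iterator that yields key values instead of raw indices, and marks the result valid only when at least one row is non-null. A host-only sketch of that idea for a mean follows. `dictionary_sketch`, `mean_sketch_result`, and `mean_sketch` are hypothetical names, and skipping null rows here stands in for the null-replacing transformer used in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a dictionary column with nulls.
struct dictionary_sketch {
  std::vector<double> keys;
  std::vector<std::uint32_t> indices;
  std::vector<bool> valid;
};

// Hypothetical stand-in for the returned scalar.
struct mean_sketch_result {
  bool valid;
  double value;
};

mean_sketch_result mean_sketch(dictionary_sketch const& col)
{
  assert(col.indices.size() == col.valid.size());
  double sum              = 0.0;
  std::size_t valid_count = 0;

  for (std::size_t row = 0; row < col.indices.size(); ++row) {
    if (!col.valid[row]) { continue; }    // null rows contribute nothing
    sum += col.keys[col.indices[row]];    // decode index -> key on the fly
    ++valid_count;
  }

  // Mirrors set_valid(null_count < size): valid only if some row is non-null.
  if (valid_count == 0) { return {false, 0.0}; }
  return {true, sum / static_cast<double>(valid_count)};
}
```

Decoding each row inside the loop avoids materializing a full column of key values before reducing.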
18 changes: 13 additions & 5 deletions cpp/src/reductions/max.cu
@@ -14,9 +14,9 @@
  * limitations under the License.
  */

-#include <reductions/simple.cuh>
-
 #include <cudf/detail/reduction_functions.hpp>
+#include <cudf/dictionary/dictionary_column_view.hpp>
+#include <reductions/simple.cuh>

 #include <rmm/cuda_stream_view.hpp>

@@ -28,9 +28,17 @@ std::unique_ptr<cudf::scalar> max(column_view const& col,
                                   rmm::cuda_stream_view stream,
                                   rmm::mr::device_memory_resource* mr)
 {
-  CUDF_EXPECTS(col.type() == output_dtype, "max() operation requires matching output type");
-  return cudf::type_dispatcher(
-    col.type(), simple::same_element_type_dispatcher<cudf::reduction::op::max>{}, col, stream, mr);
+  auto const input_type =
+    cudf::is_dictionary(col.type()) ? cudf::dictionary_column_view(col).keys().type() : col.type();
+  CUDF_EXPECTS(input_type == output_dtype, "max() operation requires matching output type");
+  auto const dispatch_type = cudf::is_dictionary(col.type())
+                               ? cudf::dictionary_column_view(col).indices().type()
+                               : col.type();
+  return cudf::type_dispatcher(dispatch_type,
+                               simple::same_element_type_dispatcher<cudf::reduction::op::max>{},
+                               col,
+                               stream,
+                               mr);
 }

 } // namespace reduction
8 changes: 5 additions & 3 deletions cpp/src/reductions/mean.cu
@@ -15,9 +15,9 @@
  */
 // The translation unit for reduction `mean`

-#include "compound.cuh"
-
 #include <cudf/detail/reduction_functions.hpp>
+#include <cudf/dictionary/dictionary_column_view.hpp>
+#include <reductions/compound.cuh>

 #include <rmm/cuda_stream_view.hpp>

@@ -27,6 +27,8 @@ std::unique_ptr<cudf::scalar> cudf::reduction::mean(column_view const& col,
                                                     rmm::mr::device_memory_resource* mr)
 {
   using reducer = cudf::reduction::compound::element_type_dispatcher<cudf::reduction::op::mean>;
+  auto col_type =
+    cudf::is_dictionary(col.type()) ? dictionary_column_view(col).keys().type() : col.type();
   return cudf::type_dispatcher(
-    col.type(), reducer(), col, output_dtype, /* ddof is not used for mean*/ 1, stream, mr);
+    col_type, reducer(), col, output_dtype, /* ddof is not used for mean*/ 1, stream, mr);
 }
20 changes: 13 additions & 7 deletions cpp/src/reductions/min.cu
@@ -14,11 +14,9 @@
  * limitations under the License.
  */

-#include <reductions/simple.cuh>
-
 #include <cudf/detail/reduction_functions.hpp>
-
-#include <rmm/cuda_stream_view.hpp>
+#include <cudf/dictionary/dictionary_column_view.hpp>
+#include <reductions/simple.cuh>

 namespace cudf {
 namespace reduction {
@@ -28,9 +26,17 @@ std::unique_ptr<cudf::scalar> min(column_view const& col,
                                   rmm::cuda_stream_view stream,
                                   rmm::mr::device_memory_resource* mr)
 {
-  CUDF_EXPECTS(col.type() == output_dtype, "min() operation requires matching output type");
-  return cudf::type_dispatcher(
-    col.type(), simple::same_element_type_dispatcher<cudf::reduction::op::min>{}, col, stream, mr);
+  auto const input_type =
+    cudf::is_dictionary(col.type()) ? cudf::dictionary_column_view(col).keys().type() : col.type();
+  CUDF_EXPECTS(input_type == output_dtype, "min() operation requires matching output type");
+  auto const dispatch_type = cudf::is_dictionary(col.type())
+                               ? cudf::dictionary_column_view(col).indices().type()
+                               : col.type();
+  return cudf::type_dispatcher(dispatch_type,
+                               simple::same_element_type_dispatcher<cudf::reduction::op::min>{},
+                               col,
+                               stream,
+                               mr);
 }

 } // namespace reduction
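For `min()` and `max()` above, the output type is checked against the keys type while the dispatch uses the indices type, which matches the commit note "do min/max on dictionary indices". The host-only sketch below shows why reducing the indices is enough, assuming the keys are stored in sorted order so that the smallest index refers to the smallest key. `dictionary_sketch`, `min_sketch`, and `max_sketch` are hypothetical names, and nulls are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side model of a dictionary column without nulls.
// Assumption: `keys` is sorted ascending.
struct dictionary_sketch {
  std::vector<std::string> keys;
  std::vector<std::uint32_t> indices;
};

// Reduce over the (cheap, fixed-width) indices, then look up a single key.
std::string min_sketch(dictionary_sketch const& col)
{
  assert(!col.indices.empty());
  assert(std::is_sorted(col.keys.begin(), col.keys.end()));
  auto const it = std::min_element(col.indices.begin(), col.indices.end());
  return col.keys[*it];
}

std::string max_sketch(dictionary_sketch const& col)
{
  assert(!col.indices.empty());
  assert(std::is_sorted(col.keys.begin(), col.keys.end()));
  auto const it = std::max_element(col.indices.begin(), col.indices.end());
  return col.keys[*it];
}
```

Comparing integers instead of keys matters most when the keys are expensive to compare, such as strings, and the result still has the keys type because only the final lookup touches the keys.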