Two-table comparators with strong index types #10730

Merged
merged 35 commits into from
May 18, 2022
Commits
50b8891
Add strong index type.
bdice Apr 16, 2022
b9ed4d7
Revert changes to non-experimental row operators.
bdice Apr 20, 2022
d67f17e
Use enum for strongly typed index.
bdice May 3, 2022
464ed2b
Add two table comparator and adapter.
bdice May 3, 2022
b26b318
Add friends. :)
bdice May 3, 2022
1fd199d
Apply two-table comparator to search algorithms.
bdice May 3, 2022
18bd9f0
Move shared lhs/rhs logic into launch_search.
bdice May 3, 2022
b5b8b39
Improve comments, remove old code.
bdice May 3, 2022
4060b4f
Merge remote-tracking branch 'upstream/branch-22.06' into strong-inde…
bdice May 11, 2022
73c4b27
Move strong typing code into cudf::experimental::row::lexicographic.
bdice May 11, 2022
9cdbe27
Merge remote-tracking branch 'upstream/branch-22.06' into strong-inde…
bdice May 13, 2022
c8a38fe
Improve comment.
bdice May 13, 2022
8b5ef34
Fix docstrings.
bdice May 13, 2022
77f85b4
Enable weak ordering machinery (weak_ordering_comparator_impl) to wra…
bdice May 13, 2022
529e944
Remove template template parameters.
bdice May 13, 2022
fb0e192
Use references.
bdice May 13, 2022
56d99ba
Use Ts const...
bdice May 13, 2022
c5998b7
Move strong typing to cudf::experimental::row.
bdice May 13, 2022
b78d978
Use constexpr.
bdice May 13, 2022
3aea8d4
Use custom iterator class.
bdice May 14, 2022
bbaf360
Use __device__ only.
bdice May 14, 2022
4a1d7aa
Add comment.
bdice May 14, 2022
09c5661
Use symmetry of comparator (now possible with weak ordering) to avoid…
bdice May 14, 2022
290323f
Add constexpr to two_table_device_row_comparator_adapter.
bdice May 14, 2022
4c69edd
Remove forward (always accepts lvalues).
bdice May 14, 2022
fbd5b90
Indicate reversed signature.
bdice May 16, 2022
3db6484
Move constructor to implementation, add shape compatibility check.
bdice May 16, 2022
3e81b53
Improve docstrings.
bdice May 16, 2022
1834095
Use thrust::iterator_facade.
bdice May 16, 2022
ff26024
Use const for struct members.
bdice May 17, 2022
f779bff
Slim down the strong index layer by using a templated struct.
bdice May 17, 2022
157abbc
Simplify construction.
bdice May 17, 2022
a2ac19d
Use size_type const where possible.
bdice May 17, 2022
75249e8
Require weakly or strongly typed values for lhs_index and rhs_index.
bdice May 17, 2022
bed1162
Unconstrain template typenames.
bdice May 18, 2022
210 changes: 188 additions & 22 deletions cpp/include/cudf/table/experimental/row_operators.cuh
@@ -32,6 +32,8 @@
#include <cudf/utilities/type_dispatcher.hpp>

#include <thrust/equal.h>
#include <thrust/iterator/iterator_adaptor.h>
#include <thrust/iterator/iterator_facade.h>
#include <thrust/logical.h>
#include <thrust/swap.h>
#include <thrust/transform_reduce.h>
@@ -69,6 +71,48 @@ struct dispatch_void_if_nested {
};

namespace row {

enum class lhs_index_type : size_type {};
enum class rhs_index_type : size_type {};

template <typename Index, typename Underlying = std::underlying_type_t<Index>>
struct strong_index_iterator : public thrust::iterator_facade<strong_index_iterator<Index>,
Comment on lines +78 to +79

Contributor

@bdice I realize that I am late to the party here, but this struct has zero docs. A new reader of the code won't have any idea why this struct exists since the purpose is not conveyed by its implementation, but rather by its existence alone. Could you make a PR with some brief documentation (or maybe just stick that into one of the downstream PRs that either you or @ttnghia is working on)?

Contributor Author

Sure thing. I can create a standalone PR for this.

Contributor

Thank you!

Index,
thrust::use_default,
thrust::random_access_traversal_tag,
Index,
Underlying> {
using super_t = thrust::iterator_adaptor<strong_index_iterator<Index>, Index>;

explicit constexpr strong_index_iterator(Underlying n) : begin{n} {}

friend class thrust::iterator_core_access;

private:
__device__ constexpr void increment() { ++begin; }
__device__ constexpr void decrement() { --begin; }

__device__ constexpr void advance(Underlying n) { begin += n; }

__device__ constexpr bool equal(strong_index_iterator<Index> const& other) const noexcept
{
return begin == other.begin;
}

__device__ constexpr Index dereference() const noexcept { return static_cast<Index>(begin); }

__device__ constexpr Underlying distance_to(
strong_index_iterator<Index> const& other) const noexcept
{
return other.begin - begin;
}

Underlying begin{};
};

using lhs_iterator = strong_index_iterator<lhs_index_type>;
using rhs_iterator = strong_index_iterator<rhs_index_type>;
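
The review thread above asks why the strong index types exist. A minimal host-only sketch of the idea follows, assuming `size_type` is a 32-bit signed integer and using a hypothetical `make_indices` helper (not part of the PR) in place of the Thrust-based `strong_index_iterator`. The point is that empty scoped enums give two distinct integer-like types, so a left-table row index cannot be passed where a right-table row index is expected without an explicit cast.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

using size_type = std::int32_t;  // assumption: mirrors cudf::size_type

// Empty scoped enums: distinct types with no implicit conversion to or from integers.
enum class lhs_index_type : size_type {};
enum class rhs_index_type : size_type {};

// Hypothetical helper (not in the PR): materialize the strong indices [0, n).
// The PR produces the same sequence lazily with strong_index_iterator.
template <typename Index>
std::vector<Index> make_indices(size_type n)
{
  std::vector<Index> out;
  for (size_type i = 0; i < n; ++i) { out.push_back(static_cast<Index>(i)); }
  return out;
}

// Hypothetical consumer: accepts only (left, right); swapped arguments do not compile.
inline size_type index_sum(lhs_index_type l, rhs_index_type r)
{
  return static_cast<size_type>(l) + static_cast<size_type>(r);
}
```

With these definitions, `index_sum(rhs_index_type{0}, lhs_index_type{0})` is rejected at compile time, which is the property the two-table comparator relies on to tell which table an index refers to.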

namespace lexicographic {

/**
@@ -91,6 +135,8 @@ namespace lexicographic {
template <typename Nullate>
class device_row_comparator {
friend class self_comparator;
friend class two_table_comparator;

/**
* @brief Construct a function object for performing a lexicographic
* comparison between the rows of two tables.
@@ -183,9 +229,9 @@ class device_row_comparator {

template <typename Element,
CUDF_ENABLE_IF(not cudf::is_relationally_comparable<Element, Element>() and
- not std::is_same_v<Element, cudf::struct_view>),
- typename... Args>
- __device__ cuda::std::pair<weak_ordering, int> operator()(Args...) const noexcept
+ not std::is_same_v<Element, cudf::struct_view>)>
+ __device__ cuda::std::pair<weak_ordering, int> operator()(size_type const,
+ size_type const) const noexcept
{
CUDF_UNREACHABLE("Attempted to compare elements of uncomparable types.");
}
@@ -234,12 +280,13 @@ class device_row_comparator {
* @brief Checks whether the row at `lhs_index` in the `lhs` table compares
* lexicographically less, greater, or equivalent to the row at `rhs_index` in the `rhs` table.
*
- * @param lhs_index The index of row in the `lhs` table to examine
+ * @param lhs_index The index of the row in the `lhs` table to examine
* @param rhs_index The index of the row in the `rhs` table to examine
* @return weak ordering comparison of the row in the `lhs` table relative to the row in the `rhs`
* table
*/
- __device__ weak_ordering operator()(size_type lhs_index, size_type rhs_index) const noexcept
+ __device__ weak_ordering operator()(size_type const lhs_index,
+ size_type const rhs_index) const noexcept
{
int last_null_depth = std::numeric_limits<int>::max();
for (size_type i = 0; i < _lhs.num_columns(); ++i) {
@@ -288,12 +335,14 @@ class device_row_comparator {
*/
template <typename Comparator, weak_ordering... values>
struct weak_ordering_comparator_impl {
- __device__ bool operator()(size_type const lhs, size_type const rhs) const noexcept
+ template <typename LhsType, typename RhsType>
+ __device__ constexpr bool operator()(LhsType const lhs_index,
+ RhsType const rhs_index) const noexcept
{
- weak_ordering const result = comparator(lhs, rhs);
+ weak_ordering const result = comparator(lhs_index, rhs_index);
return ((result == values) || ...);
}
- Comparator comparator;
+ Comparator const comparator;
};
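
The hunk above turns a three-way comparator into a boolean predicate by checking the result against a compile-time list of accepted orderings. A host-only sketch of that mechanism follows, with `__device__` dropped so it compiles as plain C++17. The `weak_ordering` enumerators are assumed to match cudf's, and `int_three_way` is a hypothetical stand-in for `device_row_comparator`.

```cpp
// Assumption: same enumerators as cudf's weak_ordering.
enum class weak_ordering { LESS, EQUIVALENT, GREATER };

// Hypothetical three-way comparator on ints (the PR compares table rows instead).
struct int_three_way {
  constexpr weak_ordering operator()(int lhs, int rhs) const noexcept
  {
    if (lhs < rhs) { return weak_ordering::LESS; }
    if (rhs < lhs) { return weak_ordering::GREATER; }
    return weak_ordering::EQUIVALENT;
  }
};

template <typename Comparator, weak_ordering... values>
struct weak_ordering_comparator_impl {
  // Templated argument types let the wrapper forward weak or strong index types unchanged.
  template <typename LhsType, typename RhsType>
  constexpr bool operator()(LhsType const lhs_index, RhsType const rhs_index) const noexcept
  {
    weak_ordering const result = comparator(lhs_index, rhs_index);
    // C++17 fold expression: true if result equals any of the listed values.
    return ((result == values) || ...);
  }
  Comparator const comparator;
};
```

Instantiating the wrapper with `weak_ordering::LESS` alone gives a strict less-than predicate, and adding `weak_ordering::EQUIVALENT` gives less-than-or-equivalent, which is how the two alias templates in the next hunk are defined.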

/**
@@ -302,14 +351,12 @@ struct weak_ordering_comparator_impl {
*
* @tparam Nullate A cudf::nullate type describing whether to check for nulls.
*/
- template <typename Nullate>
- using less_comparator =
- weak_ordering_comparator_impl<device_row_comparator<Nullate>, weak_ordering::LESS>;
+ template <typename Comparator>
+ using less_comparator = weak_ordering_comparator_impl<Comparator, weak_ordering::LESS>;

- template <typename Nullate>
- using less_equivalent_comparator = weak_ordering_comparator_impl<device_row_comparator<Nullate>,
- weak_ordering::LESS,
- weak_ordering::EQUIVALENT>;
+ template <typename Comparator>
+ using less_equivalent_comparator =
+ weak_ordering_comparator_impl<Comparator, weak_ordering::LESS, weak_ordering::EQUIVALENT>;

struct preprocessed_table {
using table_device_view_owner =
@@ -319,7 +366,7 @@ struct preprocessed_table {
* @brief Preprocess table for use with lexicographical comparison
*
* Sets up the table for use with lexicographical comparison. The resulting preprocessed table can
- * be passed to the constructor of `lex::self_comparator` to avoid preprocessing again.
+ * be passed to the constructor of `lexicographic::self_comparator` to avoid preprocessing again.
*
* @param table The table to preprocess
* @param column_order Optional, host array the same length as a row that indicates the desired
@@ -337,6 +384,7 @@ struct preprocessed_table {

private:
friend class self_comparator;
friend class two_table_comparator;

preprocessed_table(table_device_view_owner&& table,
rmm::device_uvector<order>&& column_order,
@@ -395,10 +443,10 @@ struct preprocessed_table {
}

private:
- table_device_view_owner _t;
- rmm::device_uvector<order> _column_order;
- rmm::device_uvector<null_order> _null_precedence;
- rmm::device_uvector<size_type> _depths;
+ table_device_view_owner const _t;
+ rmm::device_uvector<order> const _column_order;
+ rmm::device_uvector<null_order> const _null_precedence;
+ rmm::device_uvector<size_type> const _depths;
};

/**
@@ -459,16 +507,134 @@ class self_comparator {
* @tparam Nullate A cudf::nullate type describing whether to check for nulls.
*/
template <typename Nullate>
- less_comparator<Nullate> device_comparator(Nullate nullate = {}) const
+ less_comparator<device_row_comparator<Nullate>> device_comparator(Nullate nullate = {}) const
{
- return less_comparator<Nullate>{device_row_comparator<Nullate>(
+ return less_comparator<device_row_comparator<Nullate>>{device_row_comparator<Nullate>(
nullate, *d_t, *d_t, d_t->depths(), d_t->column_order(), d_t->null_precedence())};
}

private:
std::shared_ptr<preprocessed_table> d_t;
};

template <typename Comparator>
struct strong_index_comparator_adapter {
__device__ constexpr weak_ordering operator()(lhs_index_type const lhs_index,
rhs_index_type const rhs_index) const noexcept
{
return comparator(static_cast<cudf::size_type>(lhs_index),
static_cast<cudf::size_type>(rhs_index));
}

__device__ constexpr weak_ordering operator()(rhs_index_type const rhs_index,
lhs_index_type const lhs_index) const noexcept
{
auto const left_right_ordering =
comparator(static_cast<cudf::size_type>(lhs_index), static_cast<cudf::size_type>(rhs_index));

// Invert less/greater values to reflect right to left ordering
if (left_right_ordering == weak_ordering::LESS) {
return weak_ordering::GREATER;
} else if (left_right_ordering == weak_ordering::GREATER) {
return weak_ordering::LESS;
}
return weak_ordering::EQUIVALENT;
}

Comparator const comparator;
};
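
The adapter above accepts its two strong indices in either order and flips the result when the right-table index comes first. A host-only sketch of that behavior follows, with `__device__` and `constexpr` dropped. `array_comparator` is a hypothetical stand-in for `device_row_comparator` that compares one `int` per row, and the `weak_ordering` enumerators are assumed to match cudf's.

```cpp
#include <cstdint>

using size_type = std::int32_t;  // assumption: mirrors cudf::size_type
enum class weak_ordering { LESS, EQUIVALENT, GREATER };  // assumption: cudf's enumerators
enum class lhs_index_type : size_type {};
enum class rhs_index_type : size_type {};

// Hypothetical stand-in for device_row_comparator: compares left[i] against right[j].
struct array_comparator {
  int const* left;
  int const* right;
  weak_ordering operator()(size_type i, size_type j) const noexcept
  {
    if (left[i] < right[j]) { return weak_ordering::LESS; }
    if (right[j] < left[i]) { return weak_ordering::GREATER; }
    return weak_ordering::EQUIVALENT;
  }
};

template <typename Comparator>
struct strong_index_comparator_adapter {
  // (left, right): forward to the wrapped comparator unchanged.
  weak_ordering operator()(lhs_index_type lhs_index, rhs_index_type rhs_index) const noexcept
  {
    return comparator(static_cast<size_type>(lhs_index), static_cast<size_type>(rhs_index));
  }

  // (right, left): evaluate left-vs-right, then invert LESS and GREATER.
  weak_ordering operator()(rhs_index_type rhs_index, lhs_index_type lhs_index) const noexcept
  {
    auto const left_right =
      comparator(static_cast<size_type>(lhs_index), static_cast<size_type>(rhs_index));
    if (left_right == weak_ordering::LESS) { return weak_ordering::GREATER; }
    if (left_right == weak_ordering::GREATER) { return weak_ordering::LESS; }
    return weak_ordering::EQUIVALENT;
  }

  Comparator const comparator;
};
```

Because the wrapped comparator is only ever called as (left, right), a single underlying comparator serves both argument orders, and the strong types select the overload at compile time.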

/**
* @brief An owning object that can be used to lexicographically compare rows of two different
* tables
*
* This class takes two table_views and preprocesses certain columns to allow for lexicographical
* comparison. The preprocessed table and temporary data required for the comparison are created and
* owned by this class.
*
* Alternatively, `two_table_comparator` can be constructed from two existing
* `shared_ptr<preprocessed_table>`s when sharing the same tables among multiple comparators.
*
* This class can then provide a functor object that can used on the device.
* The object of this class must outlive the usage of the device functor.
*/
class two_table_comparator {
public:
/**
* @brief Construct an owning object for performing a lexicographic comparison between rows of
* two different tables.
*
* The left and right table are expected to have the same number of columns
* and data types for each column.
*
* @param left The left table to compare
* @param right The right table to compare
* @param column_order Optional, host array the same length as a row that indicates the desired
* ascending/descending order of each column in a row. If empty, it is assumed all columns are
* sorted in ascending order.
* @param null_precedence Optional, device array the same length as a row and indicates how null
* values compare to all other for every column. If empty, then null precedence would be
* `null_order::BEFORE` for all columns.
* @param stream The stream to construct this object on. Not the stream that will be used for
* comparisons using this object.
*/
two_table_comparator(table_view const& left,
table_view const& right,
host_span<order const> column_order = {},
host_span<null_order const> null_precedence = {},
rmm::cuda_stream_view stream = rmm::cuda_stream_default);

/**
* @brief Construct an owning object for performing a lexicographic comparison between two rows of
* the same preprocessed table.
*
* This constructor allows independently constructing a `preprocessed_table` and sharing it among
* multiple comparators.
*
* @param left A table preprocessed for lexicographic comparison
* @param right A table preprocessed for lexicographic comparison
*/
two_table_comparator(std::shared_ptr<preprocessed_table> left,
std::shared_ptr<preprocessed_table> right)
: d_left_table{std::move(left)}, d_right_table{std::move(right)}
{
}

/**
* @brief Return the binary operator for comparing rows in the table.
*
* Returns a binary callable, `F`, with signatures
* `bool F(lhs_index_type, rhs_index_type)` and
* `bool F(rhs_index_type, lhs_index_type)`.
*
* `F(lhs_index_type i, rhs_index_type j)` returns true if and only if row
* `i` of the left table compares lexicographically less than row `j` of the
* right table.
*
* Similarly, `F(rhs_index_type i, lhs_index_type j)` returns true if and
* only if row `i` of the right table compares lexicographically less than row
* `j` of the left table.
*
* @tparam Nullate A cudf::nullate type describing whether to check for nulls.
*/
template <typename Nullate>
less_comparator<strong_index_comparator_adapter<device_row_comparator<Nullate>>>
device_comparator(Nullate nullate = {}) const
{
return less_comparator<strong_index_comparator_adapter<device_row_comparator<Nullate>>>{
device_row_comparator<Nullate>(nullate,
*d_left_table,
*d_right_table,
d_left_table->depths(),
d_left_table->column_order(),
d_left_table->null_precedence())};
}

private:
std::shared_ptr<preprocessed_table> d_left_table;
std::shared_ptr<preprocessed_table> d_right_table;
};
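
The `device_comparator` docstring above promises both `F(lhs_index_type, rhs_index_type)` and `F(rhs_index_type, lhs_index_type)`. One reason a search needs both can be sketched on the host with `std::lower_bound` and `std::upper_bound`, which call the predicate with the searched value in opposite positions. This is an illustration only, not the PR's search code: `two_vector_less` and `search_bounds` are hypothetical names, each row is a single `int`, and the haystack is assumed to play the role of the left table.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using size_type = std::int32_t;  // assumption: mirrors cudf::size_type
enum class lhs_index_type : size_type {};
enum class rhs_index_type : size_type {};

// Hypothetical stand-in for the functor returned by two_table_comparator::device_comparator().
struct two_vector_less {
  std::vector<int> const& left;
  std::vector<int> const& right;
  bool operator()(lhs_index_type i, rhs_index_type j) const
  {
    return left[static_cast<size_type>(i)] < right[static_cast<size_type>(j)];
  }
  bool operator()(rhs_index_type i, lhs_index_type j) const
  {
    return right[static_cast<size_type>(i)] < left[static_cast<size_type>(j)];
  }
};

struct bounds {
  size_type lower;
  size_type upper;
};

// Hypothetical helper: locate one needle row within a sorted haystack.
inline bounds search_bounds(std::vector<int> const& haystack,
                            std::vector<int> const& needles,
                            size_type needle_row)
{
  std::vector<lhs_index_type> rows;
  for (size_type i = 0; i < static_cast<size_type>(haystack.size()); ++i) {
    rows.push_back(static_cast<lhs_index_type>(i));
  }
  two_vector_less const less{haystack, needles};
  auto const needle = static_cast<rhs_index_type>(needle_row);
  // lower_bound evaluates less(element, value), i.e. the (lhs, rhs) overload.
  auto const lo = std::lower_bound(rows.begin(), rows.end(), needle, less);
  // upper_bound evaluates less(value, element), i.e. the (rhs, lhs) overload.
  auto const hi = std::upper_bound(rows.begin(), rows.end(), needle, less);
  return {static_cast<size_type>(lo - rows.begin()), static_cast<size_type>(hi - rows.begin())};
}
```

With a haystack of `{10, 20, 20, 30}` and a needle value of `20`, the two calls bracket the run of equal rows, and each call compiles only because the matching overload exists.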

} // namespace lexicographic

namespace hash {
2 changes: 1 addition & 1 deletion cpp/include/cudf/table/row_operators.cuh
@@ -389,7 +389,7 @@ class row_lexicographic_comparator {
* @brief Checks whether the row at `lhs_index` in the `lhs` table compares
* lexicographically less than the row at `rhs_index` in the `rhs` table.
*
- * @param lhs_index The index of row in the `lhs` table to examine
+ * @param lhs_index The index of the row in the `lhs` table to examine
* @param rhs_index The index of the row in the `rhs` table to examine
* @return `true` if row from the `lhs` table compares less than row in the
* `rhs` table
1 change: 1 addition & 0 deletions cpp/src/search/contains.cu
@@ -23,6 +23,7 @@
#include <cudf/scalar/scalar_device_view.cuh>
#include <cudf/search.hpp>
#include <cudf/structs/detail/contains.hpp>
#include <cudf/table/experimental/row_operators.cuh>
#include <cudf/table/row_operators.cuh>
#include <cudf/table/table_device_view.cuh>
#include <cudf/table/table_view.hpp>