Skip to content

Commit

Permalink
Check column type equality, handling nested types correctly. (#14531)
Browse files Browse the repository at this point in the history
Addresses most of #14527. See also #14494.

This PR expands the use of `cudf::column_types_equal(lhs, rhs)` and adds new methods `cudf::column_scalar_types_equal`, `cudf::scalar_types_equal`, and `cudf::all_column_types_equal`.

These type check functions are now employed throughout the code base instead of raw checks like `a.type() == b.type()` because those do not correctly handle nested types.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Lawrence Mitchell (https://github.com/wence-)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #14531
  • Loading branch information
bdice committed May 2, 2024
1 parent 4494991 commit 500cb29
Show file tree
Hide file tree
Showing 62 changed files with 615 additions and 319 deletions.
15 changes: 11 additions & 4 deletions cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -943,13 +943,14 @@ Use the `CUDF_EXPECTS` macro to enforce runtime conditions necessary for correct
Example usage:

```c++
CUDF_EXPECTS(lhs.type() == rhs.type(), "Column type mismatch");
CUDF_EXPECTS(cudf::have_same_types(lhs, rhs), "Type mismatch", cudf::data_type_error);
```
The first argument is the conditional expression expected to resolve to `true` under normal
conditions. If the conditional evaluates to `false`, then an error has occurred and an instance of
`cudf::logic_error` is thrown. The second argument to `CUDF_EXPECTS` is a short description of the
error that has occurred and is used for the exception's `what()` message.
conditions. The second argument to `CUDF_EXPECTS` is a short description of the error that has
occurred and is used for the exception's `what()` message. If the conditional evaluates to
`false`, then an error has occurred and an instance of the exception class in the third argument
(or the default, `cudf::logic_error`) is thrown.
There are times where a particular code path, if reached, should indicate an error no matter what.
For example, often the `default` case of a `switch` statement represents an invalid alternative.
Expand Down Expand Up @@ -1048,6 +1049,12 @@ types such as numeric types and timestamps/durations, adding support for nested
Enabling an algorithm differently for different types uses either template specialization or SFINAE,
as discussed in [Specializing Type-Dispatched Code Paths](#specializing-type-dispatched-code-paths).

## Comparing Data Types

When comparing the data types of two columns or scalars, do not directly compare
`a.type() == b.type()`. Nested types such as lists of structs of integers will not be handled
properly if only the top level type is compared. Instead, use the `cudf::have_same_types` function.

# Type Dispatcher

libcudf stores data (for columns and scalars) "type erased" in `void*` device memory. This
Expand Down
7 changes: 5 additions & 2 deletions cpp/include/cudf/detail/scatter.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,9 @@
#include <cudf/strings/detail/scatter.cuh>
#include <cudf/strings/string_view.cuh>
#include <cudf/utilities/default_stream.hpp>
#include <cudf/utilities/error.hpp>
#include <cudf/utilities/traits.hpp>
#include <cudf/utilities/type_checks.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/exec_policy.hpp>
Expand Down Expand Up @@ -213,8 +215,9 @@ struct column_scatterer_impl<dictionary32> {
// check the keys match
dictionary_column_view const source(source_in);
dictionary_column_view const target(target_in);
CUDF_EXPECTS(source.keys().type() == target.keys().type(),
"scatter dictionary keys must be the same type");
CUDF_EXPECTS(cudf::have_same_types(source.keys(), target.keys()),
"scatter dictionary keys must be the same type",
cudf::data_type_error);

// first combine keys so both dictionaries have the same set
auto target_matched = dictionary::detail::add_keys(target, source.keys(), stream, mr);
Expand Down
2 changes: 1 addition & 1 deletion cpp/include/cudf/lists/detail/scatter.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -101,7 +101,7 @@ std::unique_ptr<column> scatter_impl(rmm::device_uvector<unbound_list_view> cons
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr)
{
CUDF_EXPECTS(column_types_equal(source, target), "Mismatched column types.");
CUDF_EXPECTS(have_same_types(source, target), "Mismatched column types.");

auto const child_column_type = lists_column_view(target).child().type();

Expand Down
11 changes: 1 addition & 10 deletions cpp/include/cudf/table/table_view.hpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
* Copyright (c) 2019-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -339,15 +339,6 @@ bool has_nested_nullable_columns(table_view const& input);
*/
std::vector<column_view> get_nullable_columns(table_view const& table);

/**
* @brief Checks if two `table_view`s have columns of same types
*
* @param lhs left-side table_view operand
* @param rhs right-side table_view operand
* @return boolean comparison result
*/
bool have_same_types(table_view const& lhs, table_view const& rhs);

/**
* @brief Copy column_views from a table_view into another table_view according to
* a column indices map.
Expand Down
106 changes: 103 additions & 3 deletions cpp/include/cudf/utilities/type_checks.hpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021-2023, NVIDIA CORPORATION.
* Copyright (c) 2021-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand All @@ -16,11 +16,16 @@
#pragma once

#include <cudf/column/column_view.hpp>
#include <cudf/scalar/scalar.hpp>

#include <algorithm>

namespace cudf {

/**
* @brief Compares the type of two `column_view`s
* @brief Compare the types of two `column_view`s
*
* @deprecated Since 24.06. Use cudf::have_same_types instead.
*
* This function returns true if the type of `lhs` equals that of `rhs`.
* - For fixed point types, the scale is compared.
Expand All @@ -34,10 +39,11 @@ namespace cudf {
* @param rhs The second `column_view` to compare
* @return true if column types match
*/
bool column_types_equal(column_view const& lhs, column_view const& rhs);
[[deprecated]] bool column_types_equal(column_view const& lhs, column_view const& rhs);

/**
* @brief Compare the type IDs of two `column_view`s
*
* This function returns true if the type of `lhs` equals that of `rhs`.
* - For fixed point types, the scale is ignored.
*
Expand All @@ -47,4 +53,98 @@ bool column_types_equal(column_view const& lhs, column_view const& rhs);
*/
bool column_types_equivalent(column_view const& lhs, column_view const& rhs);

/**
* @brief Compares the type of two `column_view`s
*
* This function returns true if the type of `lhs` equals that of `rhs`.
* - For fixed point types, the scale is compared.
* - For dictionary types, the type of the keys are compared if both are
* non-empty columns.
* - For lists types, the type of child columns are compared recursively.
* - For struct types, the type of each field are compared in order.
* - For all other types, the `id` of `data_type` is compared.
*
* @param lhs The first `column_view` to compare
* @param rhs The second `column_view` to compare
* @return true if types match
*/
bool have_same_types(column_view const& lhs, column_view const& rhs);

/**
* @brief Compare the types of a `column_view` and a `scalar`
*
* This function returns true if the type of `lhs` equals that of `rhs`.
* - For fixed point types, the scale is compared.
* - For dictionary column types, the type of the keys is compared to the
* scalar type.
* - For lists types, the types of child columns are compared recursively.
* - For struct types, the types of each field are compared in order.
* - For all other types, the `id` of `data_type` is compared.
*
* @param lhs The `column_view` to compare
* @param rhs The `scalar` to compare
* @return true if types match
*/
bool have_same_types(column_view const& lhs, scalar const& rhs);

/**
* @brief Compare the types of a `scalar` and a `column_view`
*
* This function returns true if the type of `lhs` equals that of `rhs`.
* - For fixed point types, the scale is compared.
* - For dictionary column types, the type of the keys is compared to the
* scalar type.
* - For lists types, the types of child columns are compared recursively.
* - For struct types, the types of each field are compared in order.
* - For all other types, the `id` of `data_type` is compared.
*
* @param lhs The `scalar` to compare
* @param rhs The `column_view` to compare
* @return true if types match
*/
bool have_same_types(scalar const& lhs, column_view const& rhs);

/**
* @brief Compare the types of two `scalar`s
*
* This function returns true if the type of `lhs` equals that of `rhs`.
* - For fixed point types, the scale is compared.
* - For lists types, the types of child columns are compared recursively.
* - For struct types, the types of each field are compared in order.
* - For all other types, the `id` of `data_type` is compared.
*
* @param lhs The first `scalar` to compare
* @param rhs The second `scalar` to compare
* @return true if types match
*/
bool have_same_types(scalar const& lhs, scalar const& rhs);

/**
* @brief Checks if two `table_view`s have columns of same types
*
* @param lhs left-side table_view operand
* @param rhs right-side table_view operand
* @return boolean comparison result
*/
bool have_same_types(table_view const& lhs, table_view const& rhs);

/**
* @brief Compare the types of a range of `column_view` or `scalar` objects
*
* This function returns true if all objects in the range have the same type, in the sense of
* cudf::have_same_types.
*
* @tparam ForwardIt Forward iterator
* @param first The first iterator
* @param last The last iterator
* @return true if all types match
*/
template <typename ForwardIt>
inline bool all_have_same_types(ForwardIt first, ForwardIt last)
{
return first == last || std::all_of(std::next(first), last, [want = *first](auto const& c) {
return cudf::have_same_types(want, c);
});
}

} // namespace cudf
11 changes: 5 additions & 6 deletions cpp/src/copying/concatenate.cu
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@
#include <cudf/table/table.hpp>
#include <cudf/table/table_device_view.cuh>
#include <cudf/utilities/default_stream.hpp>
#include <cudf/utilities/error.hpp>
#include <cudf/utilities/type_checks.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/exec_policy.hpp>
Expand Down Expand Up @@ -461,12 +463,9 @@ void traverse_children::operator()<cudf::list_view>(host_span<column_view const>
*/
void bounds_and_type_check(host_span<column_view const> cols, rmm::cuda_stream_view stream)
{
CUDF_EXPECTS(std::all_of(cols.begin(),
cols.end(),
[expected_type = cols.front().type()](auto const& c) {
return c.type() == expected_type;
}),
"Type mismatch in columns to concatenate.");
CUDF_EXPECTS(cudf::all_have_same_types(cols.begin(), cols.end()),
"Type mismatch in columns to concatenate.",
cudf::data_type_error);

// total size of all concatenated rows
size_t const total_row_count =
Expand Down
18 changes: 7 additions & 11 deletions cpp/src/copying/copy.cu
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
#include <cudf/strings/string_view.cuh>
#include <cudf/utilities/default_stream.hpp>
#include <cudf/utilities/traits.hpp>
#include <cudf/utilities/type_checks.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>
Expand Down Expand Up @@ -362,9 +363,10 @@ std::unique_ptr<column> copy_if_else(column_view const& lhs,
CUDF_EXPECTS(boolean_mask.size() == lhs.size(),
"Boolean mask column must be the same size as lhs and rhs columns",
std::invalid_argument);
CUDF_EXPECTS(lhs.size() == rhs.size(), "Both columns must be of the size", std::invalid_argument);
CUDF_EXPECTS(
lhs.type() == rhs.type(), "Both inputs must be of the same type", cudf::data_type_error);
lhs.size() == rhs.size(), "Both columns must be of the same size", std::invalid_argument);
CUDF_EXPECTS(
cudf::have_same_types(lhs, rhs), "Both inputs must be of the same type", cudf::data_type_error);

return copy_if_else(lhs, rhs, lhs.has_nulls(), rhs.has_nulls(), boolean_mask, stream, mr);
}
Expand All @@ -378,11 +380,8 @@ std::unique_ptr<column> copy_if_else(scalar const& lhs,
CUDF_EXPECTS(boolean_mask.size() == rhs.size(),
"Boolean mask column must be the same size as rhs column",
std::invalid_argument);

auto rhs_type =
cudf::is_dictionary(rhs.type()) ? cudf::dictionary_column_view(rhs).keys_type() : rhs.type();
CUDF_EXPECTS(
lhs.type() == rhs_type, "Both inputs must be of the same type", cudf::data_type_error);
cudf::have_same_types(rhs, lhs), "Both inputs must be of the same type", cudf::data_type_error);

return copy_if_else(lhs, rhs, !lhs.is_valid(stream), rhs.has_nulls(), boolean_mask, stream, mr);
}
Expand All @@ -396,11 +395,8 @@ std::unique_ptr<column> copy_if_else(column_view const& lhs,
CUDF_EXPECTS(boolean_mask.size() == lhs.size(),
"Boolean mask column must be the same size as lhs column",
std::invalid_argument);

auto lhs_type =
cudf::is_dictionary(lhs.type()) ? cudf::dictionary_column_view(lhs).keys_type() : lhs.type();
CUDF_EXPECTS(
lhs_type == rhs.type(), "Both inputs must be of the same type", cudf::data_type_error);
cudf::have_same_types(lhs, rhs), "Both inputs must be of the same type", cudf::data_type_error);

return copy_if_else(lhs, rhs, lhs.has_nulls(), !rhs.is_valid(stream), boolean_mask, stream, mr);
}
Expand All @@ -412,7 +408,7 @@ std::unique_ptr<column> copy_if_else(scalar const& lhs,
rmm::device_async_resource_ref mr)
{
CUDF_EXPECTS(
lhs.type() == rhs.type(), "Both inputs must be of the same type", cudf::data_type_error);
cudf::have_same_types(lhs, rhs), "Both inputs must be of the same type", cudf::data_type_error);
return copy_if_else(
lhs, rhs, !lhs.is_valid(stream), !rhs.is_valid(stream), boolean_mask, stream, mr);
}
Expand Down
10 changes: 6 additions & 4 deletions cpp/src/copying/copy_range.cu
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
#include <cudf/utilities/default_stream.hpp>
#include <cudf/utilities/error.hpp>
#include <cudf/utilities/traits.hpp>
#include <cudf/utilities/type_checks.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/resource_ref.hpp>
Expand Down Expand Up @@ -147,8 +148,9 @@ std::unique_ptr<cudf::column> out_of_place_copy_range_dispatch::operator()<cudf:
// check the keys in the source and target
cudf::dictionary_column_view const dict_source(source);
cudf::dictionary_column_view const dict_target(target);
CUDF_EXPECTS(dict_source.keys().type() == dict_target.keys().type(),
"dictionary keys must be the same type");
CUDF_EXPECTS(cudf::have_same_types(dict_source.keys(), dict_target.keys()),
"dictionary keys must be the same type",
cudf::data_type_error);

// combine keys so both dictionaries have the same set
auto target_matched =
Expand Down Expand Up @@ -211,7 +213,7 @@ void copy_range_in_place(column_view const& source,
(target_begin <= target.size() - (source_end - source_begin)),
"Range is out of bounds.",
std::out_of_range);
CUDF_EXPECTS(target.type() == source.type(), "Data type mismatch.", cudf::data_type_error);
CUDF_EXPECTS(cudf::have_same_types(target, source), "Data type mismatch.", cudf::data_type_error);
CUDF_EXPECTS(target.nullable() || not source.has_nulls(),
"target should be nullable if source has null values.",
std::invalid_argument);
Expand Down Expand Up @@ -239,7 +241,7 @@ std::unique_ptr<column> copy_range(column_view const& source,
(target_begin <= target.size() - (source_end - source_begin)),
"Range is out of bounds.",
std::out_of_range);
CUDF_EXPECTS(target.type() == source.type(), "Data type mismatch.", cudf::data_type_error);
CUDF_EXPECTS(cudf::have_same_types(target, source), "Data type mismatch.", cudf::data_type_error);

return cudf::type_dispatcher<dispatch_storage_type>(
target.type(),
Expand Down
Loading

0 comments on commit 500cb29

Please sign in to comment.