Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check column type equality, handling nested types correctly. #14531

Merged
merged 39 commits into from
May 2, 2024

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Nov 30, 2023

Description

Addresses most of #14527. See also #14494.

This PR expands the use of cudf::column_types_equal(lhs, rhs) and adds new methods cudf::column_scalar_types_equal, cudf::scalar_types_equal, and cudf::all_column_types_equal.

These type check functions are now employed throughout the code base instead of raw checks like a.type() == b.type() because those do not correctly handle nested types.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Nov 30, 2023
@bdice
Copy link
Contributor Author

bdice commented Nov 30, 2023

@wence- Is this the kind of change you had in mind? I looked over the list of files you put in #14527 and made changes for most of them.

I'd like your thoughts on what level of testing we need to merge this PR. Should we test every algorithm with mismatching inputs that have the same top-level type, like list<int> and list<float>? I haven't put in any time on test writing yet, but that seems like what we need to give it proper coverage.

@bdice bdice self-assigned this Nov 30, 2023
@bdice bdice changed the title Check column types with a utility that handles nesting correctly. Check column types with a utility that handles nested types correctly. Nov 30, 2023
@bdice bdice changed the title Check column types with a utility that handles nested types correctly. Check column type equality, handling nested types correctly. Nov 30, 2023
@bdice bdice added bug Something isn't working non-breaking Non-breaking change labels Nov 30, 2023
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good step, although please check the logic in your implementation of all_columns_equal.

cpp/include/cudf/utilities/type_checks.hpp Outdated Show resolved Hide resolved
@wence-
Copy link
Contributor

wence- commented Nov 30, 2023

@wence- Is this the kind of change you had in mind? I looked over the list of files you put in #14527 and made changes for most of them.

I think if we don't want to change the structure of the data_type object, then this looks basically the same as what I was thinking of.

I'd like your thoughts on what level of testing we need to merge this PR. Should we test every algorithm with mismatching inputs that have the same top-level type, like list<int> and list<float>? I haven't put in any time on test writing yet, but that seems like what we need to give it proper coverage.

I think we want some unit tests for the equality comparisons first. Since the data-type description is an inductive type, the tests can be "exhaustive" by checking the base cases and the inductive cases, and then convincing ourselves we've implemented the inductive definition correctly. We should also then test that the table-equality comparators work correctly.

In terms of then testing the advertised API surface, can we collect which functions advertise that they throw on mismatched types and then consider the testing strategy.

One thing I wonder is whether it should be the case that any such type-equality checking happens in the detail API, or if those functions have as (implicit) preconditions that they are passed valid data. We can't encode any such requirement in the type system because columns are type-erased: the best I can think of is that detail APIs that require matching types gain an argument that carries a token indicating whether equality constraints have been satisfied by the caller. But that might not be very ergonomic. The thing I would like to avoid is redoing equality checks when the caller of some function has already performed them.

rapids-bot bot pushed a commit that referenced this pull request Dec 5, 2023
… table_view.cpp (#14535)

Moves some `inline` functions from the `table_view.hpp` to the `table_view.cpp` to help reduce dependency bloat.
Also reference #14531: I'd like to minimize changes to the highly inclusive `table_view.hpp`.

Closes #14431

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Karthikeyan (https://github.com/karthikeyann)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #14535
@davidwendt
Copy link
Contributor

One extra caveat I'd like to consider here. We are planning for STRING type columns to contain either INT32 or INT64 offsets children. In this case, I think we'd want two STRING type columns to be equal (in type only) regardless of their children types.
So I would propose a special case for STRING types in the type_checks functions to ignore STRING children.

@github-actions github-actions bot added Python Affects Python cuDF API. CMake CMake build issue conda Java Affects Java cuDF API. labels Jan 23, 2024
@bdice bdice changed the base branch from branch-24.02 to branch-24.04 March 11, 2024 15:33
@github-actions github-actions bot removed conda Java Affects Java cuDF API. labels Mar 11, 2024
@bdice bdice force-pushed the nested-column-type-checks branch from d92c5ea to 39d14c8 Compare March 11, 2024 16:44
@github-actions github-actions bot removed Python Affects Python cuDF API. CMake CMake build issue labels Mar 11, 2024
@github-actions github-actions bot added the Python Affects Python cuDF API. label Mar 26, 2024
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I have some very minor (all non-blocking) comments that are mostly stylistic choices (feel free to ignore).

cpp/include/cudf/utilities/type_checks.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/utilities/type_checks.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/utilities/type_checks.hpp Show resolved Hide resolved
cpp/src/dictionary/detail/concatenate.cu Outdated Show resolved Hide resolved
cpp/src/groupby/groupby.cu Outdated Show resolved Hide resolved
cpp/src/table/table_view.cpp Outdated Show resolved Hide resolved
cpp/src/utilities/type_checks.cpp Show resolved Hide resolved
cpp/src/utilities/type_checks.cpp Show resolved Hide resolved
cpp/src/utilities/type_checks.cpp Outdated Show resolved Hide resolved
cpp/src/utilities/type_checks.cpp Outdated Show resolved Hide resolved
Copy link
Contributor Author

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wence- Thanks for the great review comments. I am slowly working through this as I have time. I had a couple replies to your comments, and I'll ping you when I'm ready for another review.

cpp/include/cudf/utilities/type_checks.hpp Show resolved Hide resolved
cpp/src/dictionary/detail/concatenate.cu Outdated Show resolved Hide resolved
cpp/src/groupby/groupby.cu Outdated Show resolved Hide resolved
@bdice bdice requested a review from wence- May 1, 2024 20:55
@bdice
Copy link
Contributor Author

bdice commented May 1, 2024

@wence- I applied most of your suggested changes. Feel free to have a final look, if you'd like. I'm aiming to merge in the next 2 days to avoid conflicts and get this in before 24.06 burndown begins.

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bdice! I think you will need to solicit another C++ review, but this looks great from my end.

@bdice
Copy link
Contributor Author

bdice commented May 2, 2024

/merge

@rapids-bot rapids-bot bot merged commit 500cb29 into rapidsai:branch-24.06 May 2, 2024
70 checks passed
@bdice
Copy link
Contributor Author

bdice commented May 2, 2024

Thanks @davidwendt @wence- @PointKernel for the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Archived in project
Status: To be revisited
Development

Successfully merging this pull request may close these issues.

None yet

5 participants