Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert byte_array_view to use std::byte #11424

Merged

Conversation

hyperbolic2346
Copy link
Contributor

Description

When reviewing PR #11322 it was noted that it would be preferable to use std::byte for the data type, but at the time that didn't work out, so the plan was to address it later and issue #11362 was created to track it.

Fixes #11362

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@hyperbolic2346 hyperbolic2346 added 3 - Ready for Review Ready for review by team code quality improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 2, 2022
@hyperbolic2346 hyperbolic2346 requested a review from a team as a code owner August 2, 2022 00:39
@hyperbolic2346 hyperbolic2346 self-assigned this Aug 2, 2022
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Aug 2, 2022
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Using std::byte anywhere that arithmetic operations don’t matter and we just want a “data byte” with bitwise ops is a good choice.

I’m a little unsure about adding a “rep layout compatible” trait for a type that lacks a corresponding cudf data type. Can you illuminate that decision a bit more?

template <typename T>
constexpr inline bool is_byte()
{
return std::is_same_v<std::remove_const_t<T>, std::byte>;
Copy link
Contributor

@bdice bdice Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want a full decay, cv removal, reference removal, or just const removal? I’m not certain what choice is most appropriate for a trait like this

edit: removed suggestion for decay in favor of alternative remove_cv below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even go as far as claiming something like is_byte<volatile T&> should fail to compile, so this seems sensible to me.

Copy link
Contributor

@bdice bdice Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@upsj I’m not sure if I follow. Do you mean you prefer the current state of remove_const?

On further inquiry, the std::is_integral and std::is_floating_point traits return true for any cv-qualifiers but not references. We probably want to remove volatile qualifiers but not reference-ness. (Not that we use volatile often, but I would like to align trait behaviors with the standard where possible.)

Suggested change
return std::is_same_v<std::remove_const_t<T>, std::byte>;
return std::is_same_v<std::remove_cv_t<T>, std::byte>;

Copy link
Contributor

@upsj upsj Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, that was ambiguous. Yes, I believe a trait like is_byte should be rather restrictive, or are there use cases in cuDF where you need to call it on a decltype of an expression, or a volatile type? remove_cv seems more appropriate than decay though, I agree with staying consistent with the std library

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't remove volatile because I wasn't sure the use-case where it was needed and I would rather have it fail to compile and force a decision at that point than to assume it would be ok. I'm ok with changing it to remove_cv_t though.

Copy link
Contributor

@bdice bdice Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #11424 (comment)

@bdice: On further inquiry, the std::is_integral and std::is_floating_point traits return true for any cv-qualifiers but not references. We probably want to remove volatile qualifiers but not reference-ness.

@upsj: I agree with staying consistent with the std library

@hyperbolic2346 I think this is consensus among reviewers that remove_cv_t is a good change for consistency with std / <type_traits>.

@codecov
Copy link

codecov bot commented Aug 2, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@35a7c81). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head a3c5f13 differs from pull request most recent head 59218fd. Consider uploading reports for the commit 59218fd to get more accurate results

@@               Coverage Diff               @@
##             branch-22.10   #11424   +/-   ##
===============================================
  Coverage                ?   86.47%           
===============================================
  Files                   ?      144           
  Lines                   ?    22856           
  Branches                ?        0           
===============================================
  Hits                    ?    19765           
  Misses                  ?     3091           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@ttnghia
Copy link
Contributor

ttnghia commented Aug 2, 2022

Can you explain more what is the benefit of using std::byte instead of int8_t?

Copy link
Contributor

@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

template <typename T>
constexpr inline bool is_byte()
{
return std::is_same_v<std::remove_const_t<T>, std::byte>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even go as far as claiming something like is_byte<volatile T&> should fail to compile, so this seems sensible to me.

cpp/include/cudf/utilities/traits.hpp Outdated Show resolved Hide resolved
@bdice
Copy link
Contributor

bdice commented Aug 2, 2022

Can you explain more what is the benefit of using std::byte instead of int8_t?

Integer types and chars (signed or unsigned) are implied to represent numbers or letters of some kind. They have arithmetic operators defined like +, they exhibit overflow, they can be upcast to larger integer types, etc. None of these behaviors are desirable for a representation of pure bytes. The only behaviors we want are bitwise operators between byte types and the ability to safely use type aliasing as with char types (not used in this code, but in other places like hash functions).

std::byte is a distinct type that implements the concept of byte as specified in the C++ language definition.
Like char and unsigned char, it can be used to access raw memory occupied by other objects (object representation), but unlike those types, it is not a character type and is not an arithmetic type. A byte is only a collection of bits, and the only operators defined for it are the bitwise ones.

https://en.cppreference.com/w/cpp/types/byte

@ttnghia
Copy link
Contributor

ttnghia commented Aug 2, 2022

Yes but that's in general. I wanted to ask about the benefit of using it in this PR. What if we don't do that? Or this is just something that paves the way for better future development?

@bdice
Copy link
Contributor

bdice commented Aug 2, 2022

Yes but that's in general. I wanted to ask about the benefit of using it in this PR. What if we don't do that? Or this is just something that paves the way for better future development?

This change is motivated by type safety. Only use an arithmetic type like uint8_t or char if you want arithmetic behaviors, otherwise always use std::byte.

@hyperbolic2346
Copy link
Contributor Author

I’m a little unsure about adding a “rep layout compatible” trait for a type that lacks a corresponding cudf data type. Can you illuminate that decision a bit more?

This was Jake's suggestion. I asked him how he would expect this to move forward and he thought it would go into is_rep_layout_compatible. I don't want to speak for him or suggest this is the best route though. I noticed the issue with byte_array_view using std::byte and started to figure out why .data<std::byte>() wasn't working and it led me here. Considering we need to be able to convert a cudf data type to bytes it didn't seem to be off target to me.

Co-authored-by: Tobias Ribizel <ribizel@kit.edu>
@bdice
Copy link
Contributor

bdice commented Aug 2, 2022

I’m a little unsure about adding a “rep layout compatible” trait for a type that lacks a corresponding cudf data type. Can you illuminate that decision a bit more?

This was Jake's suggestion. I asked him how he would expect this to move forward and he thought it would go into is_rep_layout_compatible. I don't want to speak for him or suggest this is the best route though. I noticed the issue with byte_array_view using std::byte and started to figure out why .data<std::byte>() wasn't working and it led me here. Considering we need to be able to convert a cudf data type to bytes it didn't seem to be off target to me.

Sounds good to me! Approving.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. You might apply the remove_cv_t change, otherwise LGTM.

@hyperbolic2346
Copy link
Contributor Author

Approving. You might apply the remove_cv_t change, otherwise LGTM.

Is there a consensus on that? I'm ok going either way and debated it myself before submitting.

@hyperbolic2346
Copy link
Contributor Author

rerun tests

1 similar comment
@hyperbolic2346
Copy link
Contributor Author

rerun tests

@etseidl etseidl mentioned this pull request Aug 3, 2022
3 tasks
@ajschmidt8 ajschmidt8 changed the title Convert byte_array_view to use std::byte Convert byte_array_view to use std::byte Aug 3, 2022
@ajschmidt8 ajschmidt8 changed the title Convert byte_array_view to use std::byte Convert byte_array_view to use std::byte Aug 3, 2022
@hyperbolic2346
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 554fb10 into rapidsai:branch-22.10 Aug 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support for std::byte in column_device_view's data function
4 participants