Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for the to_bytes and from_bytes methods of integers #49792

Closed
SimonSapin opened this issue Apr 8, 2018 · 14 comments

Comments

@SimonSapin
Copy link
Contributor

commented Apr 8, 2018

This tracks the stabilization of two methods on each primitive integer type, added in PR #49871:

impl $Int {
    pub fn to_bytes(self) -> [u8; mem::size_of::<Self>()] {
        unsafe { mem::transmute(self) }
    }
    pub fn from_bytes(bytes: [u8; mem::size_of::<Self>()]) -> Self {
        unsafe { mem::transmute(bytes) }
    }
}

Previous issue message:


I’d like to propose adding to the standard library between various integer types $Int and byte arrays [u8; size_of::<$Int>()] (which by the way is literally a valid type today). The implementation would be exactly transmute, but since the signature is much more restricted and all bit patterns are valid for each of the types involved, these conversions are safe.

Transmuting produces arrays with the target platform’s endianness. When something different is desired, the existing to_be/to_le/from_be/from_le methods can be combined with these new conversions. Keeping these concerns orthogonal (instead of multiplying ad-hoc conversions) allows to keep the API surface small.

Wrapping specific forms of transmute into safe APIs makes good candidates for the standard library IMO since they can save users from needing writing (and reviewing and maintaining) unsafe code themselves. See Box::into_raw for example. Together with the existing {to,from}_{be,le} methods and TryFrom<&[T]> for &[T; $N] impls, these new conversions would cover much of the functionality of the popular byteorder crate with little code and a relatively small API surface.

What I’m less certain about (and why this isn’t a PR yet) is what API should we expose these conversions as. Options are:

  • Impls of the From trait, or
  • Named methods, similar to f32::to_bits and f32::from_bits. The advantage over From is that we can give specific names to these conversions in order to communicate what they do. The downside is that we need to pick names.
    • I initially thought of to_native_endian_bytes and from_native_endian_bytes but that’s not great because:
      • It’s somewhat inconsistent with to_be and friends which are much more abbreviated. (But maybe they shouldn’t be. It is worth adding to_big_endian & co and deprecating the short ones?)
      • It looks weird when combining them for writing portable code: n.to_be().to_native_endian(): now "native endian" is inaccurate, but that’s partly the fault of to_be for changing the meaning of a value without changing its type.
    • Another idea is simply to_bytes and from_bytes, but that’s uninformative enough that they could just as well be From impls.

@rust-lang/libs or anyone, any thoughts?

@sfackler

This comment has been minimized.

Copy link
Member

commented Apr 9, 2018

I definitely like the idea of having these. Because of the multiple possible interpretations I'd lean towards inherent methods rather than From impls. I think to_bytes and from_bytes are reasonable names - to_native_endian is a bit confusing I agree.

@scottmcm

This comment has been minimized.

Copy link
Member

commented Apr 9, 2018

Two recent internals threads with thoughts around this area:

It seems to me like there's a general common theme here of "safe, but a bit weird and rather transmute-y" conversions: this thread's uN <=> [u8; N/8], the first thread's u16x8 <=> u32x4 and u32 <=> f32, some parts of as like u32 <=> i32 that currently don't have a method version, and extended versions of that like &'a u32 <=> &'a i32 that are never exposed as safe today (but could be).

So here's a sketch of an idea using #[marker] traits:

#[marker] unsafe trait InplaceReinterpretAs<T> {}
unsafe impl<T> InplaceReinterpretAs<T> for T {}
unsafe impl InplaceReinterpretAs<[u8; 4]> for u32 {}
unsafe impl InplaceReinterpretAs<i32> for u32 {}
unsafe impl InplaceReinterpretAs<u32> for i32 {}
unsafe impl<T, U> InplaceReinterpretAs<*const U> for *const T {}
unsafe impl<T, U> InplaceReinterpretAs<*mut U> for *mut T {}
unsafe impl InplaceReinterpretAs<u16x8> for u32x4 {}
unsafe impl InplaceReinterpretAs<u32x4> for u16x8 {}

#[marker] unsafe trait ReinterpretAs<T> {
    // Because it's a marker trait, these cannot be overridden,
    // and thus their behaviour is always predicatable
    fn reinterpret(self) -> T {
        unsafe {
            let r = ptr::read_unaligned(&self as *const Self as *const T);
            mem::forget(self);
            r
        }
    }
    unsafe fn reinterpret_unchecked(x: T) -> Self {
        let r = ptr::read_unaligned(&x as *const T as *const Self);
        mem::forget(x);
        r
    }
}
unsafe impl<T, U> ReinterpretAs<U> for T where T: InplaceReinterpretAs<U> {}
unsafe impl<'a, T, U> ReinterpretAs<&'a U> for &'a T where T: InplaceReinterpretAs<U> {}
unsafe impl<'a, T, U> ReinterpretAs<&'a mut U> for &'a mut T where T: InplaceReinterpretAs<U> {}
unsafe impl ReinterpretAs<u32> for [u8;4] {} // not ok in-place, but fine as memcpy

Certainly std is generally adverse to introducing these using traits, but I think the recursiveness of the scenario makes the trait version more compelling than normal in this case. If one can turn a u32 into a [u8; 4] safely, why not also be able to turn a &[u32] into a &[[u8; 4]] safely?

(Name inspired by C++'s reinterpret_cast, obviously.)

@SimonSapin SimonSapin changed the title Conversions between integers and typed arrays Conversions between integers and byte arrays Apr 9, 2018

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

commented Apr 9, 2018

@scottmcm There’s probably something interesting there, but this feels like the definition of scope creep and I’m not gonna personally pursue it today. Byte arrays/slices are "special" because they’re what’s used for I/O, let’s keep this particular issue specifically about them.

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

commented Apr 9, 2018

@sfackler Alright, from_bytes and to_bytes sound good. Remaining questions:

  • Should we have these methods for signed integers? The Reference already specifies that they’re two’s complement.
  • Should we have them for pointer-sized integers? The portability story with arrays of different size feels more iffy than with endianness.
@sfackler

This comment has been minimized.

Copy link
Member

commented Apr 10, 2018

It definitely makes sense for signed integers. The case for usize/isize is interesting. I think we should probably still have them and maybe just note in the docs that people should be careful about using literal sizes.

@SimonSapin SimonSapin changed the title Conversions between integers and byte arrays Tracking issue for the to_bytes and from_bytes methods of integers Apr 11, 2018

kennytm added a commit to kennytm/rust that referenced this issue Apr 14, 2018
Rollup merge of rust-lang#49871 - SimonSapin:int-bytes, r=sfackler
Add to_bytes and from_bytes to primitive integers

Discussion issue turned tracking issue: rust-lang#49792
@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2018

This landed almost two months ago. Let’s stabilize?

@rfcbot fcp merge

@rfcbot

This comment has been minimized.

Copy link

commented Jun 6, 2018

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot

This comment has been minimized.

Copy link

commented Jun 19, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@dhardy

This comment has been minimized.

Copy link
Contributor

commented Jun 21, 2018

Two comments:

  • Adding new safe ways to achieve platform-specific behaviour may not be advisable? i.to_le().to_bytes() is easy to use but also easy to forget. i.to_bytes_le() would be an alternative — though it's unfortunate that this implies the need for four functions instead of two.
  • It would be preferable to have a trait covering this. Unfortunately at the moment that implies either using slices (not preferred) or generic constants. Perhaps waiting for the latter would be preferable:
trait ByteConversions {
    const N: usize;
    pub fn to_bytes(self) -> [u8; Self::N];
    pub fn from_bytes(bytes: [u8; Self::N]) -> Self;
}
@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

commented Jun 21, 2018

I think there are two separate relevant operations here: the type cast (which compiles to a no-op), and the byte-order normalization. They are often used together, but not necessarily always. For example, a serialization mechanism for communicating over a (byte-stream) pipe with a child process on the same machine might want to use native-endian to avoid the cost of swapping the byte order only to swap it back on the other end of the pipe.

So if we want to provide separate APIs for each endianness I think we should still include native-endian, for a total of 6 new methods.

As to a trait, I don’t think it is preferable for this specifically. Integer types already have plenty of "duplicated" inherent methods, why are these ones different? If we want to bring back an Int trait into the standard library, it should be a larger proposal that includes all integer APIs.

@dhardy

This comment has been minimized.

Copy link
Contributor

commented Jun 21, 2018

You are correct that some users will want native Endianness. IMO it's wrong to make the easiest option platform-dependent, but I don't see a good, concise alternative.

You are also right about these not being special with regards to integer operations. Having traits is very useful for generic code, so an Int trait is a good idea IMO. Not sure why the current design has so many duplicated methods, but never mind.

Then I have no more concerns regarding this proposal. 👍

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

commented Jun 21, 2018

(The Int trait was removed in #23549 and moved to https://crates.io/crates/num / https://crates.io/crates/num-traits.)

tmccombs added a commit to tmccombs/rust that referenced this issue Jun 27, 2018
bors added a commit that referenced this issue Jun 27, 2018
Auto merge of #51835 - tmccombs:stable-int-to-from-bytes, r=joshtriplett
Stabilize to_bytes and from_bytes for integers.

Fixes #49792

@bors bors closed this in #51835 Jun 27, 2018

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

commented Jul 30, 2018

Reopening for now because of #52850 and #51919.

@SimonSapin SimonSapin reopened this Jul 30, 2018

Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Jul 30, 2018
Rollup merge of rust-lang#52850 - SimonSapin:unstablize, r=alexcrichton
Revert "Stabilize to_bytes and from_bytes for integers."

This reverts commit c8f9b84 / PR rust-lang#51835, and reopens the tracking issue rust-lang#49792.

These methods were stabilized in Rust 1.29, which is still in Nightly as of this writing. So my understanding is that it is still time to change our minds. Given the ongoing discussion in rust-lang#51919 about possibly renaming these APIs and since 1.29 goes to beta soon, I’d like to revert this stabilization for now until a decision is made in that PR. It’s possible that a decision will be made in time for 1.29, but there is no urgency. At most I expect this functionality to make it into 1.30.
bors added a commit that referenced this issue Jul 31, 2018
Auto merge of #52850 - SimonSapin:unstablize, r=alexcrichton
Revert "Stabilize to_bytes and from_bytes for integers."

This reverts commit c8f9b84 / PR #51835, and reopens the tracking issue #49792.

These methods were stabilized in Rust 1.29, which is still in Nightly as of this writing. So my understanding is that it is still time to change our minds. Given the ongoing discussion in #51919 about possibly renaming these APIs and since 1.29 goes to beta soon, I’d like to revert this stabilization for now until a decision is made in that PR. It’s possible that a decision will be made in time for 1.29, but there is no urgency. At most I expect this functionality to make it into 1.30.
@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2018

It looks like unstabilization #52850 made it into beta 1.29. Moving to #52963, as I don’t know if rfcbot will enjoy doing a second FCP in the same issue.

@SimonSapin SimonSapin closed this Aug 1, 2018

tbu- added a commit to tbu-/rust that referenced this issue Aug 4, 2018
Change tracking issue from rust-lang#49792 to rust-lang#51919
The old issue has already been in FCP, a new issue was opened for the
new API.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.