New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide `{to,from}_{ne,le,be}_bytes` functions on integers #51919

Merged
merged 2 commits into from Aug 4, 2018

Conversation

Projects
None yet
@tbu-
Contributor

tbu- commented Jun 29, 2018

If one doesn't view integers as containers of bytes, converting them to
bytes necessarily needs the specfication of encoding.

I think Rust is a language that wants to be explicit. The to_bytes
function is basically the opposite of that – it converts an integer into
the native byte representation, but there's no mention (in the function
name) of it being very much platform dependent. Therefore, I think it
would be better to replace that method by three methods, the explicit
to_ne_bytes ("native endian") which does the same thing and
to_{le,be}_bytes which return the little- resp. big-endian encoding.

@rust-highfive

This comment has been minimized.

Collaborator

rust-highfive commented Jun 29, 2018

r? @bluss

(rust_highfive has picked a reviewer for you, use r? to override)

@tbu-

This comment has been minimized.

Contributor

tbu- commented Jun 29, 2018

I think these methods also make more sense than the to_be and to_le function from a type-system point of view: It doesn't make sense to repeatedly call to_be on an integer, but you can, the same way that it doesn't make sense to repeatedly call encode() on a string in Python 2, but you can do so there (which will raise an error for non-ASCII, but still).

@tbu-

This comment has been minimized.

Contributor

tbu- commented Jun 29, 2018

CC #49792

@kennytm kennytm added the T-libs label Jun 30, 2018

///
/// [`to_native_bytes`]: #method.to_native_bytes
#[rustc_deprecated(since = "1.29.0", reason = "method doesn't communicate
intent, use `to_native_bytes`, `to_le_bytes` or `to_be_bytes` instead")]
#[stable(feature = "int_to_from_bytes", since = "1.29.0")]

This comment has been minimized.

@kennytm

kennytm Jun 30, 2018

Member

If this PR is accepted, we should better keep to_bytes()/from_bytes() unstable or just remove them, instead of stabilize and immediately deprecate them.

@alexreg

This comment has been minimized.

Contributor

alexreg commented Jul 1, 2018

Very fair point! Three different method classes makes sense.

@felix91gr

This comment has been minimized.

felix91gr commented Jul 2, 2018

Hi! I don't know if I'm supposed to add my 2 cents here, but since I can, I will.

I've been using the to_bytes kind of transformation without regards for the endianness. This is because this is what my program is doing:

  • Receive an N-Dimensional Vec<f32>
  • Take its (non-randomly-seeded) u64 hash
  • Take the bytes of that hash
  • Feed those bytes to a pseudorandom generator as its seed.

I'm using this to make a generalized N-D Perlin (and in the future, Simplex) noise library. Since I'm using the hash's bytes as the seed for the pseudorandom generator, I didn't care about endianness when I implemented it.

But now that I think of it, specifying the endianness would allow me to have the exact same noise patterns regardless of the platform in which the library is running. So I think, even in my case, this is a better feature than the recently merged to_bytes :)

Thanks for your hard work 🙇

@kennytm kennytm added the I-nominated label Jul 4, 2018

@tbu-

This comment has been minimized.

Contributor

tbu- commented Jul 5, 2018

Python implements a similar API on integers:

 |  from_bytes(...) from builtins.type
 |      int.from_bytes(bytes, byteorder, *, signed=False) -> int
 |      
 |      Return the integer represented by the given array of bytes.
 |      
 |      The bytes argument must be a bytes-like object (e.g. bytes or bytearray).
 |      
 |      The byteorder argument determines the byte order used to represent the
 |      integer.  If byteorder is 'big', the most significant byte is at the
 |      beginning of the byte array.  If byteorder is 'little', the most
 |      significant byte is at the end of the byte array.  To request the native
 |      byte order of the host system, use `sys.byteorder' as the byte order value.
 |      
 |      The signed keyword-only argument indicates whether two's complement is
 |      used to represent the integer.
 |  
 |  to_bytes(...)
 |      int.to_bytes(length, byteorder, *, signed=False) -> bytes
 |      
 |      Return an array of bytes representing an integer.
 |      
 |      The integer is represented using length bytes.  An OverflowError is
 |      raised if the integer is not representable with the given number of
 |      bytes.
 |      
 |      The byteorder argument determines the byte order used to represent the
 |      integer.  If byteorder is 'big', the most significant byte is at the
 |      beginning of the byte array.  If byteorder is 'little', the most
 |      significant byte is at the end of the byte array.  To request the native
 |      byte order of the host system, use `sys.byteorder' as the byte order value.
 |      
 |      The signed keyword-only argument determines whether two's complement is
 |      used to represent the integer.  If signed is False and a negative integer
 |      is given, an OverflowError is raised.

(Note that the byteorder argument is mandatory.)

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Jul 12, 2018

This was discussed briefly at @rust-lang/libs triage yesterday but the conclusion was that we're going to leave comments individually here if we feel so.

I personally would not want to merge this PR. I think that leveraging the already-stable to_le and to_be methods to work in combination with to_bytes and from_bytes (already implemented) is the best way to work with these methods. Given that to_le and to_be are already stable I wouldn't feel we need to stabilize to_le_bytes or to_be_bytes. The question then to me is whether we need to include "native" in the name. I feel that the fact that these are native-endianness is implied by the lack of other information in the method name as otherwise we'd specify big or little.

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Jul 12, 2018

@rfcbot fcp close

Let's see what others think though!

@rfcbot

This comment has been minimized.

rfcbot commented Jul 12, 2018

Team member @alexcrichton has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@tbu-

This comment has been minimized.

Contributor

tbu- commented Jul 13, 2018

The question then to me is whether we need to include "native" in the name. I feel that the fact that these are native-endianness is implied by the lack of other information in the method name as otherwise we'd specify big or little.

As you can see in this thread, the implied native isn't clear for everyone. In fact, I think one should actively think about whether one wants a big-endian, little-endian or native byte representation for the integer. Having to_le and to_be on the integers does not achieve this, as you can just omit calls to them and never notice. If you instead have the choice between to_le_bytes, to_be_bytes, to_native_bytes, it is very clear that you have to make a choice, and it's immediately documented, too. If you see to_bytes in the wild, you wouldn't really know whether the author intended to make the code platform-specific or just forgot to add to_be or to_le.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Jul 13, 2018

This alternative was discussed in #49792 and I mildly preferred the current design in order to avoid the multiplication of API surface. This PR proposes replacing two methods with six.

I don’t really mind changing, especially since to_bytes and from_bytes were only stabilized in this cycle (1.29) so we can still remove them. However the motivation based only on principle feels a bit weak.

@tbu-, do you feel that this change would prevent bugs? What do you think of the "intermediate" option of renaming the existing methods but not adding the be/le ones? (Keep documenting them as to be combined with to_le and to_be.)

@tbu-

This comment has been minimized.

Contributor

tbu- commented Jul 13, 2018

@tbu-, do you feel that this change would prevent bugs?

Yes, this is the intention behind this change. As you can see above, we already caught one with the PR alone. :)

What do you think of the "intermediate" option of renaming the existing methods but not adding the be/le ones?

I think that would be worse than what we currently have, as to_be().to_native_bytes() looks wrong, as we're combining a big-endian function with a function operating on native endian.

I think the optimization based on number of functions is misguided.

In C, zero-terminated "strings" share the same type, whether they're encoded as UTF-8 or just treated as bag of bytes. Rust makes the distinction because it is useful to prevent errors, it makes sense to make the user cast between bytes and UTF-8-strings, even if this results in an enormous method duplication between byte slices and strings.

As shown with the Python example, other languages consider the endianness of the integers part of the encoding process, too.

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Jul 13, 2018

I think I'm personally just not sold on this. I see the cost of six functions, some of which duplicate the functionality of existing functions, as not worth it. Dealing with endianness is inherently tricky and I don't feel like making all the methods longer and more wordy will really reduce the trickiness, but rather just make it less ergonomic to call when you already know which one you'd like.

@tbu-

This comment has been minimized.

Contributor

tbu- commented Jul 14, 2018

I see the cost of six functions, some of which duplicate the functionality of existing functions, as not worth it.

I understand this part.

don't feel like making all the methods longer and more wordy will really reduce the trickiness, but rather just make it less ergonomic to call when you already know which one you'd like.

I don't really understand this part. It's

to_be().to_bytes()              → to_be_bytes()            [reduction by  5 chars]
to_le().to_bytes()              → to_le_bytes()            [reduction by  5 chars]
to_bytes()                      → to_native_bytes()        [increase  by  7 chars]
i32::from_be(i32::from_bytes()) → i32::from_be_bytes()     [reduction by 10 chars]
i32::from_le(i32::from_bytes()) → i32::from_le_bytes()     [reduction by 10 chars]
i32::from_bytes()               → i32::from_native_bytes() [increase  by  7 chars]

To me, it looks like it decreases the ergonomics of doing the platform-specific thing slightly but also increases the ergonomics of doing the platform-independent thing slightly, and the important part: It brings them to the same level. Currently, doing the platform-independent thing is slightly less ergonomic.

Rust wants to encourage good code, if we want to bring ergonomics of the methods into play, then I think an even playing field between platform-independent methods and platform-dependent methods is in line with Rust's objectives.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Jul 15, 2018

I feel that "native" in a name doesn’t really work on its own. It only sort of does in "native endian", and even then only by opposition to little/big endian.

What do you all think of transmute_to_bytes and transmute_from_bytes, to express that this is a type-level change without a memory representation change?

Also the more I think about it the more I’m becoming partial to having explicit {to,from}_{le,be}_bytes conversions. The {to,from}_{le,be} methods have the merit of already being stable, but they’re kinda weird in taking and returning the same type. In fact the implementation of to_be and from_be for example is identical! I understand that they’re intended to be thought about differently, and perhaps their use is self-documenting what direction of conversion is intended, but it feels like the type system would be better suited to this (even though adding a whole series of dedicated types like struct BigEndianU32(u32); is probably not worth it).

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Jul 17, 2018

@tbu- to me ergonomics isn't literally how many characters you type but also how easy it is to remember idioms, and remembering that there's endian conversions on integers plus byte<-> int conversions means it's easy to remember how to compose. With everything it can be confusing about knowing which one was the one implemented and there's now a choice to be made of which to do in various circumstances.

@SimonSapin I think we could try to find a better name, yeah, but I'd reserve transmute for unsafe-like operations and would prefer to not have it here on a safe method.

@marshrayms

This comment has been minimized.

marshrayms commented Jul 18, 2018

I am today starting on my first Rust program. I need to read a binary file format. When reading and writing files or network data, the endianness is an inherent part of the format's or protocol's definition.

It seems far more complicated to me to first read an incorrect intermediate value using host endianness and then adjust it to the correct value with bit manipulations. Byte-swapping is a completely unnecessary complication when we can just read it correctly in the first place.

I would really appreciate it if conversions of the form 'u32::from_le_bytes([u8; 4])' were made available.

Provide `{to,from}_{ne,le,be}_bytes` functions on integers
If one doesn't view integers as containers of bytes, converting them to
bytes necessarily needs the specfication of encoding.

I think Rust is a language that wants to be explicit. The `to_bytes`
function is basically the opposite of that – it converts an integer into
the native byte representation, but there's no mention (in the function
name) of it being very much platform dependent. Therefore, I think it
would be better to replace that method by three methods, the explicit
`to_ne_bytes` ("native endian") which does the same thing and
`to_{le,be}_bytes` which return the little- resp. big-endian encoding.

@tbu- tbu- changed the title from Provide `{to,from}_{native,le,be}_bytes` functions on integers to Provide `{to,from}_{ne,le,be}_bytes` functions on integers Aug 3, 2018

@tbu-

This comment has been minimized.

Contributor

tbu- commented Aug 3, 2018

Rebased, changed to_native_bytes to to_ne_bytes.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Aug 3, 2018

This looks great, thanks! Sorry for one last hoop to jump through, could you change the tracking issue in #[unstable] attributes to #52963 ? r=me with that.

Change tracking issue from #49792 to #51919
The old issue has already been in FCP, a new issue was opened for the
new API.
@tbu-

This comment has been minimized.

Contributor

tbu- commented Aug 4, 2018

@SimonSapin Done.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Aug 4, 2018

@bors r+

Thanks!

@bors

This comment has been minimized.

Contributor

bors commented Aug 4, 2018

📌 Commit 0ddfae5 has been approved by SimonSapin

kennytm added a commit to kennytm/rust that referenced this pull request Aug 4, 2018

Rollup merge of rust-lang#51919 - tbu-:pr_num_to_from_bytes2, r=Simon…
…Sapin

Provide `{to,from}_{ne,le,be}_bytes` functions on integers

If one doesn't view integers as containers of bytes, converting them to
bytes necessarily needs the specfication of encoding.

I think Rust is a language that wants to be explicit. The `to_bytes`
function is basically the opposite of that – it converts an integer into
the native byte representation, but there's no mention (in the function
name) of it being very much platform dependent. Therefore, I think it
would be better to replace that method by three methods, the explicit
`to_ne_bytes` ("native endian") which does the same thing and
`to_{le,be}_bytes` which return the little- resp. big-endian encoding.
@rust-highfive

This comment was marked as off-topic.

Collaborator

rust-highfive commented Aug 4, 2018

Your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
Automatic restarts limited: Please try restarting this job later or contact support@travis-ci.com.
travis_fold:end:step_start_instance

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

bors added a commit that referenced this pull request Aug 4, 2018

Auto merge of #53056 - kennytm:rollup, r=kennytm
Rollup of 14 pull requests

Successful merges:

 - #51919 (Provide `{to,from}_{ne,le,be}_bytes` functions on integers)
 - #52940 (Align 6-week cycle check with beta promotion instead of stable release.)
 - #52968 (App-lint-cability)
 - #52969 (rustbuild: fix local_rebuild)
 - #52995 (Remove unnecessary local in await! generator)
 - #52996 (RELEASES.md: fix the `hash_map::Entry::or_default` link)
 - #53001 (privacy: Fix an ICE in `path_is_private_type`)
 - #53003 (Stabilize --color and --error-format options in rustdoc)
 - #53022 (volatile operations docs: clarify that this does not help wrt. concurrency)
 - #53024 (Specify reentrancy gurantees of `Once::call_once`)
 - #53041 (Fix invalid code css rule)
 - #53047 (Make entire row of doc search results clickable)
 - #53050 (Make left column of rustdoc search results narrower)
 - #53062 (Remove redundant field names in structs)
@bors

This comment has been minimized.

Contributor

bors commented Aug 4, 2018

⌛️ Testing commit 0ddfae5 with merge 215bf3a...

@bors bors merged commit 0ddfae5 into rust-lang:master Aug 4, 2018

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
homu Testing commit 0ddfae5ba2d0104a33f3a7162571b761458f0464 with merge 215bf3abd9c0eb1fd0c6cae7c92a842732dc033d...
Details
@bors

This comment has been minimized.

Contributor

bors commented Aug 4, 2018

💥 Test timed out

SimonSapin added a commit to SimonSapin/rust that referenced this pull request Aug 14, 2018

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Aug 17, 2018

Rollup merge of rust-lang#53358 - SimonSapin:int-bytes, r=shepmaster
`{to,from}_{ne,le,be}_bytes` for unsigned integer types

Same as rust-lang#51919 did for signed integers.

Tracking issue: rust-lang#52963
@gamozolabs

This comment has been minimized.

gamozolabs commented Aug 27, 2018

I would really like to see this be put into a trait rather than functions.

As a trait this could be implemented by a builtin derive which would make it work for structures composed of primitive types like this is being implemented on currently.

This would allow for simple creation of packed (or C structures by ignoring padding) structures in Rust (like an IP header or binary file header) and then the automatic conversion to and from the raw bytes on the wire/file.

I have a library to do this now and I use it in many of my projects. I think the ability to be able to handle reading and writing packed structures safely from bytestreams is critical enough to a systems language to justify it being put into the core language rather than being an external library. Especially given it would be almost no code increase to the Rust base (would be a procmacro), and would have no implications to programs unless they decide to use the feature.

@BurntSushi

This comment has been minimized.

Member

BurntSushi commented Aug 27, 2018

@gamozolabs I think it would be appropriate to publish that library on crates.io, garner adoption and prove out the API. At that point, someone could consider writing an RFC, but I think it's probably immature to jump to that right now.

zhzy0077 added a commit to zhzy0077/LocustDB that referenced this pull request Sep 18, 2018

As rust-lang/rust#51919 changes function name, use to_ne_bytes instea…
…d. Therefore we don't need to specify toolchain version.

cswinter added a commit to cswinter/LocustDB that referenced this pull request Sep 18, 2018

Update rust-toolchain and fix some typos (#27)
* Fix typos and unused lifetimes
* Remove duplicated cast
* As rust-lang/rust#51919 changes function name, use to_ne_bytes instead. 
* Remove stabled feature flag
* Update toolchain to latest version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment