RFC: Hex literals #2244

Closed
@newpavlov

newpavlov commented Dec 12, 2017

Rendered

You will be able to write:

const BYTES: &'static [u8; 4] = h"00 aa cc ff";

assert_eq!(BYTES, &[0u8, 170u8, 204u8, 255u8])
@KillTheMule

KillTheMule commented Dec 12, 2017

Unless I'm missing something, the "Rendered" link does not point where it should.

@newpavlov

Author

newpavlov commented Dec 12, 2017

Ups... I've fixed the link.

@nox

Contributor

nox commented Dec 12, 2017

This looks like even more of a niche use case than NUL-terminated string literals, which were unfortunately rejected.

- Instead of `h` use a base modifier on the `b` prefix, e.g. `bx` for hex
binaries, `bo` for octal ones, `bb` for binary, or `bN` where N is the base
(between 2 and 36 included?)
- Built-in macro, e.g. `hex!("00 ff ee")`

@Manishearth

Manishearth Dec 12, 2017

Member

This can also be a non-builtin macro; procedural macros exist and work pretty well.

@newpavlov

newpavlov Dec 12, 2017

Author

Is there an example of implementing this functionality using procedural macros? I am not that familiar with them, so I was not sure if they are expressive enough to do it. Either way I'll change wording a bit.

@Manishearth

Manishearth Dec 12, 2017

Member

It's pretty straightforward; I don't have an example on hand but you'd basically have a hex!() macro that parses the hex string and outputs a byte array literal
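
The parsing half of such a macro can be sketched as an ordinary function. This is only an illustration: `parse_hex` is a hypothetical name, and skipping all ASCII whitespace is an assumption rather than something the RFC specifies. A real `hex!()` would run logic like this at expansion time and emit the resulting bytes as an array literal.

```rust
/// Hypothetical helper: decode a hex string into bytes.
/// Assumption: ASCII whitespace anywhere in the input is ignored.
fn parse_hex(input: &str) -> Result<Vec<u8>, String> {
    // Collect the numeric value of every hex digit, skipping whitespace.
    let mut nibbles = Vec::new();
    for c in input.chars() {
        if c.is_ascii_whitespace() {
            continue;
        }
        match c.to_digit(16) {
            Some(d) => nibbles.push(d as u8),
            None => return Err(format!("invalid hex digit: {:?}", c)),
        }
    }
    // Two digits make one byte, so an odd count cannot be decoded.
    if nibbles.len() % 2 != 0 {
        return Err("odd number of hex digits".to_string());
    }
    // Combine each (high, low) pair of nibbles into a byte.
    Ok(nibbles.chunks(2).map(|p| (p[0] << 4) | p[1]).collect())
}
```

With this sketch, `parse_hex("00 aa cc ff")` yields the same four bytes as the example in the RFC description, while inputs such as `"0g"` or `"abc"` produce an error.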

@rkruppe

Member

rkruppe commented Dec 12, 2017

As people already mentioned in the pre-RFC thread, a proc macro crate on crates.io would work just as well. Most of the motivation applies unmodified, and even the token drawback is resolved. Therefore I'm opposed to this language extension, at least until someone gives reasons why this should be in the language as opposed to a macro.

@SimonSapin

Contributor

SimonSapin commented Dec 12, 2017

Comparison with existing syntax:

let udp_data = h"
    1111 2222
    0c00 ffff
    6461 7461
";

let udp_data = b"\
    \x11\x11\x22\x22\
    \x0c\x00\xff\xff\
    \x64\x61\x74\x61\
";

let udp_data = [
    0x11, 0x11,  0x22, 0x22,
    0x0c, 0x00,  0xff, 0xff,
    0x64, 0x61,  0x74, 0x61,
];
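
As a quick sanity check on the comparison above, the two spellings that already compile can be shown to produce the same bytes. The function names below are made up for illustration; the proposed `h"..."` form is left out because it is not valid Rust.

```rust
/// The byte-string spelling: `\` at the end of a line skips the newline
/// and the leading whitespace of the next line.
fn udp_data_byte_string() -> &'static [u8] {
    b"\
        \x11\x11\x22\x22\
        \x0c\x00\xff\xff\
        \x64\x61\x74\x61\
    "
}

/// The array spelling of the same twelve bytes.
fn udp_data_array() -> [u8; 12] {
    [
        0x11, 0x11, 0x22, 0x22,
        0x0c, 0x00, 0xff, 0xff,
        0x64, 0x61, 0x74, 0x61,
    ]
}
```

Both functions return the same twelve bytes, and the last four spell the ASCII string `data`.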
@ketsuban

ketsuban commented Dec 12, 2017

> This looks like even more of a niche use case than NUL-terminated string literals, which were unfortunately rejected.

I'm not sure I agree with that assessment, but I'm also not favourably inclined towards this proposal. I'd like to see prior art in another language - if the developers of some other better-established language said "this is something valuable to us" then that's a good argument in favour of adopting it. As is, it seems like the syntactic gain is minimal compared to a procedural macro.

@vitiral

vitiral commented Dec 12, 2017

(question) is it possible with a macro to have syntax like:

hex![
    00 05 aa cc ff
    00 aa cc ff 07
];

or does the mixing of numbers+letters+spaces make that impossible? It would probably also have odd syntax highlighting...

If not, maybe something like this is possible?

hex![
    x00_05_aa_cc_ff
    x00_aa_cc_ff_07
];

If these are possible, I actually prefer this kind of syntax as it no longer looks like a string and gives you a lot of freedom in how you want to format the blocks.

@Manishearth

Member

Manishearth commented Dec 12, 2017

That is all possible with a proc macro. It just needs to be a valid token tree.

@rkruppe

Member

rkruppe commented Dec 12, 2017

Some hex strings are not valid tokens though. For example, 0b (https://play.rust-lang.org/?gist=e27373e75ff0e2be93b00a2ff39eefa6&version=stable)

@Diggsey

Contributor

Diggsey commented Dec 12, 2017

You can just quote it: hex!("0123456789abcdef")
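
For illustration only, a quoted hex string can also be decoded at compile time without any macro on a recent compiler. The sketch below assumes a toolchain with const generics and panics in `const fn` (neither existed when this thread was written); `hex_val` and `decode_hex` are hypothetical names, and the input must contain exactly two digits per byte with no whitespace.

```rust
/// Hypothetical helper: value of one ASCII hex digit, usable in const contexts.
const fn hex_val(c: u8) -> u8 {
    match c {
        b'0'..=b'9' => c - b'0',
        b'a'..=b'f' => c - b'a' + 10,
        b'A'..=b'F' => c - b'A' + 10,
        _ => panic!("invalid hex digit"),
    }
}

/// Hypothetical helper: decode exactly 2 * N hex digits into N bytes.
const fn decode_hex<const N: usize>(s: &str) -> [u8; N] {
    let bytes = s.as_bytes();
    assert!(bytes.len() == 2 * N, "expected exactly 2 * N hex digits");
    let mut out = [0u8; N];
    let mut i = 0;
    // `for` loops are not allowed in const fn, so use `while`.
    while i < N {
        out[i] = (hex_val(bytes[2 * i]) << 4) | hex_val(bytes[2 * i + 1]);
        i += 1;
    }
    out
}

// Evaluated at compile time; a malformed string would fail the build.
const BYTES: [u8; 8] = decode_hex("0123456789abcdef");
```

The array length is inferred from the type annotation on `BYTES`, so a string of the wrong length is rejected during compilation rather than at run time.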

@rkruppe

Member

rkruppe commented Dec 12, 2017

Yes, but that's not what @vitiral asked about.

@kdar

kdar commented Dec 12, 2017

I would like to see this as a proc macro as well.

@eminence

Contributor

eminence commented Dec 12, 2017

I've only seen documentation on how to use a proc macro to implement a custom derive. But this proposed hex!() macro seems to be a different type of proc macro. If this is possible, I think a new tutorial based on this RFC would be a wonderful resource. In addition to showing the community how to build this type of macro, it could also be used as a reference for people who want to prototype small language extensions as a macro crate in the crates.io ecosystem.

@rkruppe

Member

rkruppe commented Dec 12, 2017

For now the stable way to have a procedural macro is to abuse custom derive, see proc-macro-hack. Eventually real procedural macros will be stabilized and that should include docs.

Centril added the T-lang label Dec 13, 2017

@FlorentBecker

FlorentBecker commented Dec 13, 2017

Spaces (newlines / tabs) should be disallowed at odd offsets within the literal: h"012 34 056" should be rejected.
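
One way to read this rule is that whitespace may only appear after an even number of hex digits, that is, between whole bytes. A minimal sketch of that check follows; the function name is hypothetical and the function deliberately does not validate that the other characters are hex digits.

```rust
/// Hypothetical check: whitespace is allowed only between whole bytes,
/// i.e. after an even number of non-whitespace characters.
fn whitespace_is_byte_aligned(literal: &str) -> bool {
    let mut digits_seen = 0usize;
    for c in literal.chars() {
        if c.is_whitespace() {
            // A separator in the middle of a byte splits a digit pair.
            if digits_seen % 2 != 0 {
                return false;
            }
        } else {
            digits_seen += 1;
        }
    }
    true
}
```

Under this reading `"012 34 056"` is rejected because the first space follows three digits, while `"00 aa cc ff"` and the multi-line `udp_data` layout are accepted.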

@newpavlov

Author

newpavlov commented Dec 13, 2017

I've added a paragraph on my opinion regarding procedural macros. While I understand the general reluctance to add new syntax, I still believe it's the best solution to the problem, especially considering the small size and isolated nature of the feature. Yes, most Rust programmers do not work with static arrays (if possible, I would like to ask commenters to note whether they deal with static byte arrays in their code), but in some use cases it's quite an unpleasant papercut.

As for prior art, as was mentioned in the internals thread D has a similar feature.

@rkruppe

Member

rkruppe commented Dec 13, 2017

I don't understand this argument. How exactly are the people who need this feature measurably worse off if it's a macro in an external crate? Why does having to add #[macro_use] extern crate foo; and write a couple more characters on each invocation make it "less useful"?

@newpavlov

Author

newpavlov commented Dec 13, 2017

For one, you will not be able to use it in code examples if your crate does not otherwise depend on it. But yes, "less ergonomic" would have been better wording. In the same way that writing to_string() annoys people, resulting in various proposals (1, 2) to remove those "couple more characters", hex literals try to remove one of Rust's papercuts, just one that is arguably less familiar to most Rust programmers.

@rkruppe

Member

rkruppe commented Dec 13, 2017

Doctests can use dev-dependencies, right?

Admittedly, it's still a tall order to pull in a proc macro for a small example. Or more generally, for very small programs, or programs that include only one or two small hex strings. However, in those cases the existing syntax is also available and not that bad (precisely because the program/hex string is so small).

@matthieu-m

matthieu-m commented Dec 13, 2017

The more I think about it, the more it appears to me that this is a very niche case, and a symptom of a larger issue.

C++11 introduced user-defined literal suffixes, which allows writing 1_h for 1 hour and 1_s for 1 second. It makes creating typed quantities easy, and is really useful for "big nums", etc...

I'm not sure I see the need for str: hhmmss("193542") is an easy enough way to convert a string into a time. However, for integer literals and floating point literals, it may be useful, especially when the literals exceed the capacity of a regular integer/floating point.

Which would you rather read (assuming they are strictly identical):

let big = BigInt::new(b"0x123456789012345678901234567890");

or:

let big = 0x123_456_789_012_345_678_901_234_567_890::BigInt::new; // made-up syntax

?

The latter is nice because the compiler will take care of removing the pesky _ which are so appreciated by a human reader.

Or to repeat Simon's comparison:

let udp_data = 0x1111_2222___0c00_ffff___6461_7461::h;

Assuming that fn h(data: &[u8]) -> &[u8] { assert_eq!(&data[0..2], b"0x"); &data[2..] } is defined above.

(It does not seem as necessary for &str and &[u8] literals as one can always create a trait to get b"1111 2222 0c00 ffff".h())
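
The trait-based variant mentioned in the parenthetical can be sketched roughly as follows. `HexExt` is a hypothetical name, and unlike the `h` function above this version decodes at run time and returns an owned `Vec<u8>`, so it is not a compile-time literal.

```rust
/// Hypothetical extension trait giving byte strings an `h()` method.
trait HexExt {
    /// Decodes ASCII hex digits, skipping ASCII whitespace.
    /// Returns `None` on a non-hex character or an odd digit count.
    fn h(&self) -> Option<Vec<u8>>;
}

impl HexExt for [u8] {
    fn h(&self) -> Option<Vec<u8>> {
        let mut out = Vec::new();
        // Holds the high nibble while waiting for its low partner.
        let mut high: Option<u8> = None;
        for &b in self {
            if b.is_ascii_whitespace() {
                continue;
            }
            let v = (b as char).to_digit(16)? as u8;
            match high.take() {
                Some(hi) => out.push((hi << 4) | v),
                None => high = Some(v),
            }
        }
        // A leftover high nibble means an odd number of digits.
        if high.is_some() { None } else { Some(out) }
    }
}
```

With the trait in scope, `b"1111 2222 0c00 ffff".h()` resolves through the usual array-to-slice coercion and returns the eight decoded bytes.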

@nox

Contributor

nox commented Dec 14, 2017

> The latter is nice because the compiler will take care of removing the pesky _ which are so appreciated by a human reader.

What's the type of BigInt::new in the latter code? Why wouldn't the former code be able to strip the _ too?

@matthieu-m

matthieu-m commented Dec 14, 2017

@nox: BigInt::new would be the same in both cases; and yes of course it could also remove the _, if the developer thought about it.

@petertodd

Contributor

petertodd commented Dec 15, 2017

FWIW, regarding use-cases, this is really useful for cryptography. As an example, most of the Bitcoin libraries I've seen have added some kind of hex-literal kludge to make hex literals more usable, including my own python-bitcoinlib. Similarly @apoelstra's rust-bitcoin uses a lot of hex literals, usually done at run-time with strings.

@apoelstra

Contributor

apoelstra commented Dec 15, 2017

Almost all of the code that I write involves hex literals, including rust-bitcoin and rust-secp256k1 but also some proprietary code that I've been working on over the last couple of years. I don't understand how this is a "niche usecase". It is nearly impossible to write unit tests for anything that involves binary protocols without using hex literals.

@tmccombs

tmccombs commented Dec 16, 2017

I like the idea, but I'd rather see a more general way to define procedural macros like this in user code. Scala and JavaScript both have ways to do this (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals and http://docs.scala-lang.org/overviews/core/string-interpolation.html#advanced-usage). This certainly isn't the only type of string literal that would be useful. Some others I can think of include compile-time regex, string interpolation, SQL queries with safe parameter interpolation, etc.

@nox

Contributor

nox commented Dec 16, 2017

@apoelstra @tmccombs There is nothing about these use cases that can't be implemented as a procedural macro.

@tmccombs

tmccombs commented Dec 17, 2017

@nox that's true, but:

  1. Writing such procedural macros, requires a fair amount of boiler-plate to make sure the TokenStream is a single string literal, and produce an appropriate error message if it isn't. It would be more convenient to define the macros as a function that takes a string and returns a TokenStream.
  2. Procedural macros haven't stabilized yet, and hygiene hasn't been ironed out yet, which is pretty important for things like string interpolation.
  3. There are an extra three characters required s!("...") vs s"...".

Of these, I think 1 is the most important, but it could probably be solved by making a macro to define these types of macros in a third-party library.

@Manishearth

Member

Manishearth commented Dec 17, 2017

> Writing such procedural macros, requires a fair amount of boiler-plate to make sure the TokenStream is a single string literal, and produce an appropriate error message if it isn't.

You need to write such procedural macros exactly once and then they're easy to use. In fact, that is less than the amount of effort needed to add this to the language, since you need to handle the parsing and move it through the stabilization process 😀

"This crate is nontrivial to write" is not an argument for making something part of the language.

FWIW long term plans/ideas for proc macros include the ability to define them in the same crate.

> Procedural macros haven't stabilized yet, and hygiene hasn't been ironed out yet, which is pretty important for things like string interpolation.

I don't see how that is relevant to the proposed proc macro that just takes a string and outputs an array literal. Hygiene is only involved when you deal with identifiers, which we aren't doing.


Bear in mind, stuff like the regex crate isn't included in the stdlib either, nor do we have regex literals. The bar for inclusion in the language/stdlib is quite high for things which can work approximately as well as a crate.

@scottmcm

Member

scottmcm commented Dec 17, 2017

I think the proposed syntax is a breaking change for macros 1.0:

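// Accepts an identifier followed by an expression.
macro_rules! foo {
    ($f:ident $g:expr) => {2}
}

fn main() {
    // At the time of writing, `h"abcdef"` lexes as the identifier `h` followed
    // by the string literal "abcdef", so this matches the rule and prints 2.
    let x = foo!(h"abcdef");
    println!("{}", x);
}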
@tmccombs

tmccombs commented Dec 18, 2017

@Manishearth I'm not saying it necessarily has to be part of the language.

> it could probably be solved by making a macro to define these types of macros in a third-party library

I think there should be a supported (but not part of std) crate that allows you to do something like:

string_macro!{ 
  fn my_macro(text: &str) -> TokenStream {
    ...
  }
}
@Manishearth

Member

Manishearth commented Dec 18, 2017

@tmccombs as I already said, that's something that may be the eventual state of proc macros. I don't think that's something we should propose now since there's a well decided planned evolution for proc macros (and RFCs / implementations are in flight right now)

@nikomatsakis

Contributor

nikomatsakis commented Jan 25, 2018

@rfcbot fcp close

This seems like a useful addition, but I think not useful enough to merit adding to the base language -- especially when it could readily be prototyped in a procedural macro (this is the sort of case where proc-macro-hack would work just fine). Therefore, I move to close.

Thanks to @newpavlov for the suggestion, in any case!

@rfcbot

rfcbot commented Jan 25, 2018

Team member @nikomatsakis has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.
