RFC: Numeric literal types #2507

Closed
wants to merge 2 commits into from

@mcy mcy commented Jul 30, 2018

This RFC proposes two new primitive types, ulit and flit, which represent unsized numeric literals. These are intended to be used as the types of untyped constants (like, for example, C-style #defines or Go-style constants), and as the argument types for a future custom literals feature.

Rendered.

CC @scottmcm

// it is unlikely that an flit will need to be coerced at runtime
struct flit {
    middle: usize,
    bytes: [u8]
Member

Please update the fields to reflect that flit is now a ratio.

In the ratio representation, how do we ensure the compiler won't be DoS'ed by the following (it prints inf today)?

fn main() {
    let m = 1.0e+999_999_999_999_999_999_999_999_999;
    println!("{:?}", m);
}
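
For a sense of scale, here is a back-of-the-envelope sketch (not part of the RFC) of how many bits an exact integer numerator for a literal of the form `1.0e+N` would need. The helper name `numerator_bits` is made up for illustration, and the estimate uses `f64` arithmetic, so it is only approximate.

```rust
use std::f64::consts::LOG2_10;

/// Approximate number of bits needed to store 10^exp10 exactly.
/// Hypothetical helper for illustration; uses f64, so the result is an estimate.
fn numerator_bits(exp10: f64) -> f64 {
    exp10 * LOG2_10
}

fn main() {
    // The exponent in the literal above is about 1e27.
    let bits = numerator_bits(1e27);
    println!("roughly {:e} bits ({:e} bytes)", bits, bits / 8.0);
}
```

Under this estimate, a lossless ratio for that one literal would need far more memory than any machine has, which is the heart of the DoS concern.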

This representation doesn't work either, at least not the "fdiv to convert" bit, since to do that you first need to convert numerator and denominator to the float format.

The "ratio of bignums" aspect is at least lossless and can therefore be salvaged by more expensive conversion algorithms, but it's no more useful than other representations for those algorithms. Frankly, I don't think that there's anything better than a plain old string for representing literals.

In any case, the conversion will be rather expensive when done at runtime. In the worst case it requires at least one integer division-plus-remainder calculation on operands of more than 1000 bits, and to the best of my knowledge even several. Yet another reason not to permit such values to leak into runtime.

Author

@kennytm I don't quite understand what you mean: the middle field indicates where the numerator ends and the denominator starts? I wrote ([u8], [u8]) originally, but I didn't want to have to explain the pseudocode.

Also... this seems like a problem, since neither a ratio nor an IEEE representation seems better: the former makes scientific notation awful, and the latter makes most decimals require truncation. I think @rkruppe has a point: I don't intend these values to ever reach a runtime context, so in practice a string representation with a very expensive conversion is fine. I'll update the RFC later with a list of possible representations and their drawbacks.

@hanna-kruppe hanna-kruppe Jul 30, 2018

If flit is going to be just a string without (I assume) any compile-time arbitrary-precision arithmetic and the conversion happens at compile time anyway, I don't see any reason to have flit in the first place: just use strings and use the standard float parsing facilities (which aren't currently available at compile time, but should be).
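
As a rough sketch of what "just use strings" looks like with today's runtime facilities, the literal's text (minus underscores) can be handed to `str::parse::<f64>()`. The helper name `parse_literal_text` is an assumption for illustration, not something proposed in this thread.

```rust
use std::num::ParseFloatError;

/// Hypothetical helper: strip digit separators, then defer to the standard parser.
fn parse_literal_text(text: &str) -> Result<f64, ParseFloatError> {
    let cleaned: String = text.chars().filter(|&c| c != '_').collect();
    cleaned.parse::<f64>()
}

fn main() {
    // An exponent far outside f64's range saturates to infinity rather than failing.
    let huge = parse_literal_text("1.0e+999_999_999_999_999_999_999_999_999").unwrap();
    println!("{:?}", huge);

    let small = parse_literal_text("1_000.5").unwrap();
    println!("{:?}", small);
}
```

The parser only looks at the text, so the size of the exponent does not translate into a proportionally large computation.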

Author

Can the standard float parsing facilities handle arbitrary-size floats? I assume you're referring to f64::parse and friends. For most use-cases what you suggest is totally fine, but I worry about restricting ourselves to the capacity of f64.

As I mentioned below, I think an opaque type (which could even just be a lang item!) might be a neater abstraction.

Hold on. Let's be careful about what the language/compiler can actually provide to a user-defined type in this area. It can't reasonably do base conversion and rounding for them. For example, a "bignum rational" library type, a base 10 floating point, and a base 2 fixed point type all do very different things with the decimal digits in the source code. Library types will often have to do their own custom parsing, period.

Furthermore, even if there's code to be shared between many such libraries, it can simply be yet more library code. It doesn't have to be built into the compiler (just as f64::parse isn't!).
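
To make the point about differing needs concrete, here is a minimal sketch of two library-style parsers that treat the same digits differently. All names (`split_digits`, `parse_ratio`, `parse_hundredths`) are hypothetical, and `u128` stands in for a real bignum, so this only handles short inputs.

```rust
/// Euclid's algorithm, used to reduce fractions.
fn gcd(mut a: u128, mut b: u128) -> u128 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Split plain decimal text (no sign, no exponent) into whole and fractional digits.
fn split_digits(text: &str) -> Option<(&str, &str)> {
    let (whole, frac) = text.split_once('.').unwrap_or((text, ""));
    let all_digits = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
    if whole.is_empty() || !all_digits(whole) || !all_digits(frac) {
        return None;
    }
    Some((whole, frac))
}

/// A rational type keeps every digit: "0.125" becomes 1/8.
fn parse_ratio(text: &str) -> Option<(u128, u128)> {
    let (whole, frac) = split_digits(text)?;
    let numer: u128 = format!("{}{}", whole, frac).parse().ok()?;
    let denom = 10u128.checked_pow(frac.len() as u32)?;
    let g = gcd(numer, denom);
    Some((numer / g, denom / g))
}

/// A fixed-point type with two fractional digits must reject or round "0.125";
/// this sketch rejects it.
fn parse_hundredths(text: &str) -> Option<i64> {
    let (whole, frac) = split_digits(text)?;
    if frac.len() > 2 {
        return None;
    }
    let whole: i64 = whole.parse().ok()?;
    // Pad the fraction to two digits, so "5" means 50 hundredths.
    let frac: i64 = format!("{:0<2}", frac).parse().ok()?;
    whole.checked_mul(100)?.checked_add(frac)
}

fn main() {
    for text in ["0.125", "12.5", "3"] {
        println!(
            "{:>6}: ratio {:?}, hundredths {:?}",
            text,
            parse_ratio(text),
            parse_hundredths(text)
        );
    }
}
```

Neither parser needs anything from the compiler beyond the literal's text, and the policy choices (reject versus round, how many digits to keep) belong to each type.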

Author

@mcy mcy Jul 30, 2018

For sure! Here's what I imagine if we go for the string-based alternative:

// mod core::ops? core::num is private iirc

// the compiler needs to know about this type, since it needs to
// be able to construct it from literals in source code.
#[lang = "float_lit"] 
// sticking with flit for now, though since it's not a primitive 
// will definitely want to go with FloatLit... which I would be ok with
struct flit(str); 

impl flit {
    /// The literal, as it appears in source code.
    /// (Canonicalized to remove underscores.)
    ///
    /// Consider, e.g. `lit.verbatim().parse::<f64>().expect("...")`
    const fn verbatim(&self) -> &str { .. }

    // the following are all *very* expensive str-manipulation fns
    const fn numer_bytes(&self) -> &[u8] { .. }
    const fn denom_bytes(&self) -> &[u8] { .. }

    const fn mantissa_bytes(&self, base: usize) -> &[u8] { .. }
    const fn exponent_bytes(&self, base: usize) -> &[u8] { .. }
}

Who would benefit from {numer,denom}_bytes, and how much? The only types that come to mind that could use them are rationals, and it only saves them calling a function they'd probably have anyway (parsing decimal strings) -- and they may not otherwise have a way to interpret these u8 slices.

As for mantissa_bytes, exponent_bytes, I struggle to understand what these even do, let alone how they're useful for anything. The (mantissa, exponent) representation generally requires a periodic mantissa, since it's just a different way to write scientific notation. So, does it round? Then it's missing the target precision. But even the target precision is not enough for types with fixed-size exponent field, because overflowing the exponent range can impact rounding (e.g., if you have subnormals like IEEE 754).

What you can write is a parsing function for IEEE 754 floating point parametrized by the base, precision, and exponent range (that's the way the standard is written, even!) but that function is even more niche than the rational-based functions discussed before.


And yet again my most important contention remains unaddressed: what justifies putting these conversion functions into the standard library rather than letting those who need that conversion either do it themselves or import a third party library for it?

Contributor

oli-obk commented Jul 30, 2018

Am I right in understanding that there shall never be a runtime value of these types? So no coercion from ulit to u8 at runtime, but always at compile-time?

Otherwise we run into not so nice situations where the runtime value is silently losing bits due to a coercion and we can't even guarantee that we'd lint it. We can still do a best-effort variant, but that feels less nice.
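
For reference, this is the kind of silent bit loss at stake, shown with today's fixed-size types. The function names are illustrative only.

```rust
use std::convert::TryFrom;

/// An `as` cast keeps only the low 8 bits and never reports the loss.
fn silent(x: u32) -> u8 {
    x as u8
}

/// `try_from` surfaces the loss as an error, mapped to `None` here.
fn checked(x: u32) -> Option<u8> {
    u8::try_from(x).ok()
}

fn main() {
    // 300 does not fit in a u8; the cast wraps modulo 256.
    println!("silent: {}, checked: {:?}", silent(300), checked(300));
    println!("silent: {}, checked: {:?}", silent(200), checked(200));
}
```

If the conversion always happens at compile time, the out-of-range case can be a hard error instead of either of these runtime behaviours.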

Contributor

Ixrec commented Jul 30, 2018

It's not clear to me that Go-style "untyped" constants and custom literals should be using the same underlying types, or that we want custom literals.

My understanding of Go's "untyped" arbitrary-precision constants is that the only advantage they have relative to type inference of constants (which we still haven't figured out) is that Go can do arbitrary-precision arithmetic with those constants, deferring any coercion into a fixed-size type until some non-constant code is involved. If I understand this RFC correctly, ulit/flit get coerced to a fixed-size type on first use no matter what, so they don't actually provide any arbitrary-precision arithmetic.

Of course, if they did provide arbitrary-precision arithmetic, then they wouldn't be "literal" types anymore and the names ulit/flit wouldn't feel right.

Since this RFC doesn't mention arbitrary-precision arithmetic or type inference of constants at all, I'm confused as to what this RFC on its own actually accomplishes or what the argument for it is supposed to be. Right now, it just feels like half of a custom literals proposal that doesn't make sense to discuss in isolation (especially since my concern with custom literals is motivation, not mechanism).

Did I misunderstand Go or this RFC somehow?
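
A small sketch of the difference being described, using only current Rust: with a fixed-size type the intermediate value has to fit, while a wider type (here `u128`, standing in for arbitrary precision) lets the same expression go through. The function names are made up for this example.

```rust
// In a constant, the intermediate overflow is a compile-time error:
// const TOO_BIG: u64 = ((1u64 << 62) * 4) / 4;

/// Multiply by 4 and divide by 4 entirely in u64; fails if the product overflows.
fn scale_roundtrip_u64(x: u64) -> Option<u64> {
    x.checked_mul(4).map(|v| v / 4)
}

/// Same computation with a wider intermediate, coercing back only at the end.
fn scale_roundtrip_wide(x: u64) -> u64 {
    ((x as u128 * 4) / 4) as u64
}

fn main() {
    let x = 1u64 << 62;
    println!("u64 only: {:?}", scale_roundtrip_u64(x));
    println!("wide:     {}", scale_roundtrip_wide(x));
}
```

Deferring the coercion only helps if the arithmetic itself is done at the wider precision, which is the part the question above finds missing.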

Contributor

Ixrec commented Jul 30, 2018

I think making custom literal machinery involve an arbitrary-precision type (unlike C++, where you only get the builtin fixed-size number and char types) only makes sense if we plan on going all the way with it and supporting arbitrary-precision arithmetic of custom-literal-using types, e.g.:

const SUPER_SAIYAN_SPEED: mps = 9_000_000_000_000_000_000_000_000_000_000_000_000_001_km / 1_hr;

Though I have no idea how we'd want to design custom km, hr, and mps types that can be arbitrary precision yet are somehow usable by runtime code. Seems like it'd get very magical very fast unless we actually embedded a BigNum type into the core language.

C++, the only language with custom literals, simply takes its versions of
`u64` and `f64` as arguments for literals; this is an unnecessary restriction
in Rust, given that we recently stabilized the `u128` type. This problem
cannot be neatly worked around, as far as we know.

The far simpler solution for custom literals is to just take a string. In fact C++ supports that too.

Contributor

What type would a function-style proc macro get if it was "passed" a superlong literal like this? Just a numeric literal node containing a string?

Member

@kennytm kennytm Jul 30, 2018

@Ixrec you get a Literal. Note that there's no public methods to extract the content besides .to_string().

Author

I don't quite like the idea of taking literally &'static str, since any application using a size that doesn't fit in the biggest type (and thus has a FromStr implementation) will need to parse the literal. I'm in favor of a string-based representation, but I think it should be opaque, with methods to extract components, like the exponent of a scientific notation string.

Why? Chopping the string into its basic components is the easiest part of the parsing process by a mile. The compiler can't help with any of the actually hard parts, and inventing a whole new (thoroughly weird) kind of primitive type just to save a few library types the trouble of doing .split('.') and .split('e') seems disproportionate.
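
A minimal sketch of that chopping step, assuming the literal arrives as plain text with underscores already removed. The `Parts` struct and `split_literal` function are hypothetical names, and no validation of the digits is attempted.

```rust
/// The three textual pieces of a decimal literal such as "6.022e23".
#[derive(Debug, PartialEq)]
struct Parts<'a> {
    integer: &'a str,
    fraction: &'a str,
    exponent: &'a str,
}

/// Split on the exponent marker first, then on the decimal point.
/// Missing pieces come back as empty strings.
fn split_literal(text: &str) -> Parts<'_> {
    let (mantissa, exponent) = text
        .split_once(|c: char| c == 'e' || c == 'E')
        .unwrap_or((text, ""));
    let (integer, fraction) = mantissa.split_once('.').unwrap_or((mantissa, ""));
    Parts { integer, fraction, exponent }
}

fn main() {
    println!("{:?}", split_literal("6.022e23"));
    println!("{:?}", split_literal("1.0e+999"));
    println!("{:?}", split_literal("42"));
}
```

Everything after this step (base conversion, rounding, range checks) is specific to the target type.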

Literal types are otherwise *mostly* like normal integers. They support
arithmetic and comparisons (but don't implement any `std::ops` traits, since

Does this include arithmetic for flit?

Author

Yes; I guess that's not clear. I'll update the RFC noting that later.

Why? Where's the motivation for arbitrary-precision "float" arithmetic?

Also: how? Completely accurate arithmetic beyond integers is a difficult area without any local minima that are clearly good enough to bake into the language. Usually people envision bignum rationals for this sort of thing, but besides not being able to handle anything beyond add/sub/mul/div, it has serious downsides: even if you always use irreducible fractions and eat the significant overhead of doing so, the space requirements are pretty bad for many numbers (and truly awful for some).
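
To illustrate the space concern, here is a small sketch that sums 1/1 + 1/2 + ... + 1/n exactly and keeps the result as an irreducible fraction. It uses `u128` as a stand-in for a bignum, so it overflows for large `n`; the function names are illustrative.

```rust
/// Euclid's algorithm, used to keep the fraction irreducible.
fn gcd(mut a: u128, mut b: u128) -> u128 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Exact sum 1/1 + 1/2 + ... + 1/n as a reduced (numerator, denominator) pair.
fn harmonic(n: u128) -> (u128, u128) {
    let (mut num, mut den) = (0u128, 1u128);
    for k in 1..=n {
        // num/den + 1/k = (num*k + den) / (den*k), then reduce.
        num = num * k + den;
        den *= k;
        let g = gcd(num, den);
        num /= g;
        den /= g;
    }
    (num, den)
}

fn main() {
    for n in [5u128, 10, 20, 40] {
        let (num, den) = harmonic(n);
        println!("H({}) = {}/{}", n, num, den);
    }
}
```

Even with a gcd reduction after every step, the numerator and denominator keep growing as more small terms are added, although the value itself stays modest.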

Author

mcy commented Jul 30, 2018

@oli-obk @Ixrec
I'll update the RFC in an attempt to make this a bit more clear, but I'll try to answer some questions somewhat more informally, for now:

  • The whole "in a Sized context" business is more of a proxy for "when it exits compile-time mode", which seems quite a bit harder to define. I haven't nailed it down yet, and I'm hoping someone with a bit more experience can help me do that.
  • Arithmetic is meant to be supported in a constant setting, though as @kennytm suggests, I worry there is potential for some innocent-looking expressions to DoS the compiler. I'll have to think about this more.
  • @scottmcm suggests another reason to have arbitrary arithmetic: const generics. However, I think a bespoke const bignum would be the better choice here.
  • If we judge that such compile-time constants, as Go and C do them, are not valuable or too hard to pin down at the moment, that probably means a pared-down version of the proposal can be folded into whatever custom literals RFC ends up happening (which I'd hold off on proposing until after Rust2018).

I'm proposing this RFC now as a result of this internals thread, in an attempt to start resolving the big open questions.

Contributor

Ixrec commented Jul 30, 2018

Arithmetic is meant to be supported in a constant setting

Cool, thanks for clarifying.

In that case my major concern is, as mentioned above, whether this is possible without designing some sort of bignum type that's built into the core runtime language. I see the appeal of a "const-only bignum", but that would mean a massive divergence between const code and runtime code (instead of the status quo where const code is mostly a subset of runtime code), and getting the final bignum value into runtime code raises a lot of questions that Go didn't have to answer. If we just say that bignum is const-only and it coerces to fixed-size types, does that mean we can never have that bignum at runtime in the future? Does that create compatibility hazards for adding type inference in constants someday? How does a "const-only type" with implicit coercions interact with generics, e.g. can I write a fn foo<T: Debug, const x: bignum>(...) { ... } function that constructs a Vec<T> of size x? (I guess in this case it'd round x down to usize::MAX, but I can see this sort of thing getting pretty subtle)
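
The "round down to usize::MAX" guess can be written out as follows, again with `u128` standing in for a const-only bignum. `clamp_to_usize` is a hypothetical name, and whether saturation is the right semantics is exactly the open question.

```rust
use std::convert::TryFrom;

/// Saturating conversion: values too large for usize become usize::MAX.
fn clamp_to_usize(x: u128) -> usize {
    usize::try_from(x).unwrap_or(usize::MAX)
}

fn main() {
    println!("{}", clamp_to_usize(5));
    println!("{}", clamp_to_usize(u128::MAX));
}
```

A `Vec` sized from such a clamped value would be silently different from what the source code asked for, which is the subtlety being pointed at.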

hanna-kruppe commented Jul 30, 2018

Taking a step back from the various technical issues, I find it extremely difficult to read this as an RFC. It seems more like a superposition of several different and only slightly overlapping proposals. While in some sense the union of all the features needed for the various proposals gives something that would technically address all the use cases, the result feels more like a hard-to-follow Frankenstein proposal than a single general design to me.

As I see it, there are at least three mostly-to-entirely separate mechanisms that this RFC tries to define:

  • "Named literals" matching the existing inference variables {int} and {float}. This is the smallest proposal of the bunch, and the only one to receive actual motivation in the RFC text. Although it could presumably be emulated with solutions for the other two use cases below, it needs only a tiny subset of them and so bundling it up with the much more grandiose proposals does it a disservice IMO.
    • In short: why are we wracking our heads over bignums and rounding modes just to enable 123i8 & MASK and 123456u32 & MASK?
  • Compile-time arbitrary-precision arithmetic. This is a huge and hard topic, especially wrt "pseudo-floats". It does not seem necessary for "named literals" (as literals today are always inferred to be a specific fixed-size type before use); it goes much further and enables arbitrary-precision compile-time calculations that are unprecedented in the core language and not at all motivated in the RFC text. Honestly it seems obfuscating to tie this to literals, and it also complicates this part of the proposal compared to a standalone "compile-time bignums" proposal.
    • For example, the "implicit conversion to fixed-size primitive" part could be an explicit operation if this wasn't supposed to address the "typeless constants" use case.
  • Something that user-defined literals can be built on. While the RFC text says this is out of scope, fact is that it shapes the design and comes up frequently in the discussion. I think it's a lost cause to try and anticipate the needs of hypothetical future features that will themselves require significant design and discussion. Furthermore, none of the many complications of the above two points are relevant to user-defined literals: they just need to be a vessel in which we can safely transport the programmer's source code to the user-defined function, no arithmetic or coercions to primitives are needed.

I admit I'm not convinced by any of these three proposals, but it seems clear to me that muddling them together hurts all of them.
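
For the "named literals" use case in the first bullet, one way to approximate the behaviour in current Rust is a macro that expands to a bare literal, so each use site infers its own type. This is a workaround sketch, not what the RFC proposes; `mask!` and the two functions are made-up names.

```rust
/// Expands to an untyped literal; a `const MASK: u32` would only work with u32.
macro_rules! mask {
    () => {
        0x0F
    };
}

/// Here the literal is inferred as i8.
fn low_nibble_i8(x: i8) -> i8 {
    x & mask!()
}

/// Here the same literal is inferred as u32.
fn low_nibble_u32(x: u32) -> u32 {
    x & mask!()
}

fn main() {
    println!("{}", low_nibble_i8(123));
    println!("{}", low_nibble_u32(123456));
}
```

No bignums, rounding modes, or new primitive types are involved, which supports the argument that this use case can be considered on its own.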

Author

mcy commented Jul 30, 2018

Thanks all for the feedback! It looks like I accidentally coupled a bunch of orthogonal features, and it'll take me a couple days to disentangle them. I'm going to close the PR for now, and I'll re-open it once I'm done consulting with Dr Frankenstein (i.e., once I have something a bit more workable)!
