
Add `f16b` floating-point type for native support of `bfloat16` #2690

Open · wants to merge 5 commits into base: master
@joshtriplett (Member) commented Apr 19, 2019

Rendered

@joshtriplett (Member Author) commented Apr 19, 2019 (this comment has been minimized)

@joshtriplett joshtriplett added the T-lang label Apr 19, 2019

@rkruppe (Member) commented Apr 20, 2019

After how #2686 exploded overnight to a degree where I had trouble even following it, I am almost afraid to leave the first comment here too 😄 but I think this RFC should map out the motivation & alternatives more. Adding a new primitive type is a big commitment, and this particular type is still new & niche -- who's to say Rust won't long outlast all hardware that supports it? Countering this concern is, I assume, the desire for Rust to "get in on the ground floor" and become useful for the domains where this floating point type is already gaining ground. But the RFC should lay out in detail:

  • what are those domains?
  • how is Rust going to be used/useful there? (e.g., I don't see it being super useful for neural networks, as models are generally authored in higher level DSLs and the niche of writing the underlying computational kernel is small and doesn't benefit much from Rust's advantages over C or Fortran)
  • what, specifically, are the advantages of a portable built-in type over hardware-specific types and intrinsics?

I'm also curious how this interacts with SIMD. Many of the domains where I see bfloats being proposed also greatly benefit from data parallelism and low-level performance tuning, so my (possibly inaccurate) mental picture was that bfloat-using software heavily uses intrinsics rather than scalar code, or possibly auto-generates the instructions from DSLs.

@eaglgenes101 commented Apr 21, 2019

I can work with the existence of this type in stdlib, but I'm not really feeling the justification vs a crate that implements this, lopping off the 16 least significant bits of intermediate f32s as needed.
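A minimal sketch of what such a crate-level type could look like (hypothetical names; simple truncation rather than round-to-nearest-even, and no special NaN handling):

```rust
/// A hypothetical crate-level bfloat16 type: the top 16 bits of an f32.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Bf16(u16);

impl Bf16 {
    /// Narrow an f32 to bfloat16 by dropping the low 16 mantissa bits.
    pub fn from_f32(x: f32) -> Self {
        Bf16((x.to_bits() >> 16) as u16)
    }

    /// Widening to f32 is exact: just restore the dropped bits as zero.
    pub fn to_f32(self) -> f32 {
        f32::from_bits((self.0 as u32) << 16)
    }
}

/// Arithmetic falls back to f32, then re-truncates the result.
impl std::ops::Add for Bf16 {
    type Output = Bf16;
    fn add(self, rhs: Bf16) -> Bf16 {
        Bf16::from_f32(self.to_f32() + rhs.to_f32())
    }
}

fn main() {
    let a = Bf16::from_f32(1.5);  // 1.5 is exactly representable
    let b = Bf16::from_f32(0.25);
    assert_eq!((a + b).to_f32(), 1.75);
    let pi = Bf16::from_f32(3.14159265); // truncates to 3.140625
    assert_eq!(pi.to_f32(), 3.140625);
}
```

This is roughly the shape the half crate uses for binary16; the open question in this thread is whether such a wrapper can match the codegen of a native type.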

@gnzlbg (Contributor) commented Apr 22, 2019

Why can't this be implemented in a crate on crates.io? The RFC does not mention this, and that would seem like the most important design alternative to adding a new primitive type.

AFAICT, the only proposed API that a crate can't support is literals (e.g. 1.23_f16b), but this problem comes up often enough that it would IMO be worth solving.

Also, how does this fit in the 2019 Roadmap?

@joshtriplett (Member Author) commented Apr 22, 2019

The justification for having native support, rather than a crate, is that this needs dedicated code generation in LLVM to make use of hardware support for bfloat16 on specific platforms. Falling back to f32 or f64 is exclusively for platforms that don't have hardware support for bfloat16.

My expectation would be that code using bfloat16 may use intrinsics, carefully written code, or both. Intrinsics force the compiler to generate specific instructions, rather than giving it the flexibility to group operations and take advantage of whatever instructions the native platform has.

bfloat16 is heavily used in matrix operations, especially in neural network implementations. That includes both writing kernels and moving data around, and for both purposes I'd like to have native support in Rust.

If this were a purely software format, I wouldn't suggest giving it native support, and a separate crate would make perfect sense. However, we're talking about a new floating-point format that has hardware support in CPUs and coprocessors, and is extensively used by prominent floating-point-crunching software.

Keep in mind that right now I'm proposing support in nightly, and I fully expect this to bake for a while before becoming stable.

I'll make several updates to the RFC to address the various concerns raised, including some clear discussion of the idea of a separate crate.

joshtriplett added some commits Apr 22, 2019

Clarify standard library size
Additional code in the standard library doesn't mean larger programs,
unless those programs use the new code.
@darconeous commented Apr 22, 2019

If this gets added, is the IEEE 754-2008 binary16 type (f16, I presume) going to be considered as well? It, too, has extensive hardware support. It is also the only half-precision type supported by CBOR.

@joshtriplett (Member Author) commented Apr 23, 2019

@darconeous As the RFC mentions, it could certainly be considered. I would suggest filing a separate RFC.

The proposal in this RFC is to not have any type be f16, and to instead use distinct names to disambiguate the two common 16-bit floating-point formats. So, for instance, I'd suggest f16h, for "half-float".

One notable downside of the IEEE binary16 type, however: if it isn't supported in hardware, it's far more expensive to emulate, requiring either software implementation or conversions, along with handling for overflow and similar. That makes it hard to support universally.
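To illustrate the asymmetry in emulation cost (my sketch, not part of the RFC): bfloat16 shares f32's 8-bit exponent, so widening is a 16-bit shift, while IEEE binary16 has a 5-bit exponent and needs re-biasing plus special-case handling. The sketch below handles normal binary16 numbers only; real code must also handle zeros, subnormals, infinities, and NaN, which is exactly the extra expense mentioned above.

```rust
/// bfloat16 -> f32: exact, just a shift (bfloat16 is f32's top 16 bits).
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// IEEE binary16 -> f32, normal numbers only, for brevity.
fn half_to_f32(bits: u16) -> f32 {
    let sign = (bits as u32 >> 15) & 1;
    let exp = (bits as u32 >> 10) & 0x1F; // 5-bit exponent, bias 15
    let frac = bits as u32 & 0x3FF;       // 10-bit mantissa
    assert!(exp != 0 && exp != 0x1F, "sketch handles normal numbers only");
    let exp32 = exp + (127 - 15);         // re-bias to f32's bias of 127
    f32::from_bits((sign << 31) | (exp32 << 23) | (frac << 13))
}

fn main() {
    assert_eq!(bf16_to_f32(0x3F80), 1.0);      // bfloat16 1.0
    assert_eq!(half_to_f32(0x3C00), 1.0);      // binary16 1.0
    assert_eq!(half_to_f32(0x4248), 3.140625); // binary16 approximation of pi
}
```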

@gnzlbg (Contributor) commented Apr 23, 2019

The justification for having native support, rather than a crate, is that this needs dedicated code generation in LLVM to make use of hardware support for bfloat16 on specific platforms.

Which platforms? Could you be more precise about which code generation is needed? And to repeat @rkruppe's question: why aren't intrinsics enough?

My expectation would be that code using bfloat16 may use intrinsics, carefully written code, or both. Intrinsics force the compiler to generate specific instructions, rather than giving it the flexibility to group operations and take advantage of whatever instructions the native platform has.

How is the LLVM-IR emitted using intrinsics any different from "whatever else" this RFC is proposing? (It is completely unclear to me at this point what exactly this RFC is proposing).

@oli-obk (Contributor) commented Apr 23, 2019

Intrinsics force the compiler to generate specific instructions, rather than giving it the flexibility to group operations and take advantage of whatever instructions the native platform has.

Expanding on what @gnzlbg wrote: A "native" type in Rust will still call LLVM intrinsics. If we expose those via Rust intrinsics and then higher-level functions, you will lose nothing (even Rust's checked addition uses intrinsics). IIRC, exposing more unstable intrinsics in core::intrinsics does not require an RFC, and doing so would allow implementing a support crate (on nightly) that offers the f16b type.

Once the support crate has given us enough data to know what to expose in order to make it stable, an RFC can be opened to propose that stable API.

@gnzlbg (Contributor) commented Apr 23, 2019

Expanding on @oli-obk's point a bit more: if we wanted to support IEEE-754 binary16, it might make sense to at least add a perma-unstable way to directly use LLVM's half type, so that libcore and libstd can use, e.g., feature(link_llvm_intrinsics) to interface with such a type (which some platform intrinsics use in their LLVM ABIs). I still think it might be dubious to expose such a type to users, because we can just use struct f16(i16); (just like the half crate does), and as long as we expose core::{arch, intrinsics, ...} functions that take and return i16s, we could expose the whole f16 API without having to add a new primitive type to the language (modulo literals, but that's probably better solved with user-defined literals).

But this RFC is not about binary16, it is about bfloat16, which is something completely different.

The only argument in favor of this RFC with some weight is @joshtriplett 's: "this needs dedicated code generation in LLVM to make use of hardware support for bfloat16 on specific platforms".

Yet AFAICT, from reading the LangRef (which is often arguably incomplete), LLVM does not have data-types and/or intrinsics for bfloat16, so such "dedicated code generation" does not exist (yet).

For some targets, some intrinsics for operating on bfloat16 might be added to LLVM in the future (e.g. avx512bf16: https://reviews.llvm.org/D60552), but these are not part of LLVM yet, and they do not need any kind of bfloat16 primitive-type support (or at least not the ones currently proposed; they all accept i16s, vector types, etc., and can't accept bfloat16 because LLVM does not have such a type yet).


I need to add that I don't think these RFCs by the @rust-lang/lang team are a good use of our time.

It took less time to land some of these intrinsics in core::arch (rust-lang-nursery/stdsimd#737), with documentation, tests, run-time feature detection, automatic verification, etc., than it took me to write this comment, much less what it probably took to write this RFC, for everyone to read it, discuss it, etc.

It also feels that the RFC was written just because it could be written, not because it directly improves or solves a problem that actually needs solving [0] or because it is part of the roadmap or anything.

All of this would have been completely avoidable if somebody had reviewed this before, or if there had been some discussion about this somewhere (an RFC repo issue, an internals thread, a pre-RFC, etc.). If I could vote, I'd say let's close this, open an issue or an internals thread, and start by exploring basic design questions, like "why can't this be in a library".

[0] E.g. because a crates.io crate already exists to solve this problem but does not work well because of X, therefore we need a new primitive type. Such a crate does exist for IEEE-754 binary16: https://crates.io/crates/half, but this does not appear to be the case for bfloat16, yet here we are.

@joshtriplett (Member Author) commented Apr 23, 2019

@gnzlbg

Which platforms?

As called out in the bfloat16 Wikipedia article I linked to in the RFC:

The bfloat16 format is utilized in upcoming Intel AI processors, such as Nervana NNP-L1000, Xeon processors, and Intel FPGAs, Google Cloud TPUs, and TensorFlow.

LLVM does not have data-types and/or intrinsics for bfloat16, so such "dedicated code generation" does not exist (yet).

"yet" would be the operative word. Support in toolchains (gcc, binutils, LLVM, etc.) is in progress right now. The intention would be to add type-level support in LLVM (not just intrinsics) and then pipe that through to rustc.

I certainly would agree that without such native support, this could just as easily be a crate. The whole point of this RFC is to enable native support.

Assume for the moment the development of such support in LLVM. Would you agree that enabling such support in Rust would require native support, not just intrinsics and a crate?

The only argument in favor of this RFC with some weight is @joshtriplett 's: "this needs dedicated code generation in LLVM to make use of hardware support for bfloat16 on specific platforms".

This is precisely the argument that motivated submitting this RFC in the first place.

@oli-obk

A "native" type in Rust will still call llvm intrinsics.

As I understand it, types natively supported by LLVM get handled directly rather than via intrinsics. I certainly want to see the intrinsics as well, but the point of this RFC is to work towards native support in LLVM (and future backends).

@gnzlbg

It took less time to land some of these intrinsics

This RFC isn't asking for intrinsic support. Again, the whole point of this RFC is to enable native support, ideally in time for an upcoming processor. I'd like to make Rust an obvious first-class choice for development on that platform.

It also feels that the RFC was written just because it could be written, not because it directly improves or solves a problem that actually needs solving

I'd really appreciate it if you would stop repeatedly using insulting approaches like this, such as treating your own feeling that a problem doesn't need solving as an authoritative pronouncement that it doesn't.

I'm certainly interested in discussing the issue and approach. If your position is "we don't need this", that position has been heard.

start by exploring basic design questions, like "why can't this be in a library".

That question has been asked, and answered (as has the question of why intrinsics don't suffice).

@Centril Centril added the A-primitive label Apr 23, 2019

@oli-obk (Contributor) commented Apr 24, 2019

As I understand it, types natively supported by LLVM get handled directly rather than via intrinsics. I certainly want to see the intrinsics as well, but the point of this RFC is to work towards native support in LLVM (and future backends).

True, regular integer addition and such work without intrinsics, but AFAIK optimizations understand a lot of the intrinsics. That is not a guarantee, though, especially with the introduction of a new type and new intrinsics that first need to gain support in the optimizations.

One thing still speaking for not making it a native type is that it's much easier to implement the fallback (for unsupported platforms) on a non-native type (since you just cfg stuff in the Rust source). There are some non-fun workarounds for u128 on unsupported platforms in rustc.

Assume for the moment the development of such support in LLVM. Would you agree that enabling such support in Rust would require native support, not just intrinsics and a crate?

Note that std::intrinsics::* do not need to correspond to LLVM intrinsics; they can be (and in some cases are) lowered to native operations. The main reason I'm arguing for development outside rustc is development speed. Adding a new intrinsic to Rust is easy and can get merged fast. Modifying a basic type is more involved. Since bfloat16 will be very experimental by its very nature, I think just providing support for the type in rustc, but doing all the actual implementation in a crate, does make sense.

@gnzlbg (Contributor) commented Apr 24, 2019

@joshtriplett

My position is that this RFC is a misuse of the RFC process, not that we don't need this, so I'm unsubscribing.

@rkruppe (Member) commented Apr 24, 2019

As @oli-obk described from various angles above, whether we present bfloat16 operations to Rust programmers as first-class operations on a new primitive type or as intrinsics does not make an iota of difference for the optimization and code generation pipeline. And in any case, what LLVM IR we need to generate (e.g., an fadd instruction or a call to something like @llvm.bfloat16.fadd) is ultimately not our decision but LLVM's. Whatever IR representation they choose for these operations, we can (and will) trivially generate it from either kind of source-code representation.

So the decision should be based solely on source-level factors, such as ease of use and portability. @joshtriplett rightly points out that platform-specific intrinsics will be a burden for portability, and that the type can be useful for its compact memory footprint even if the target hardware doesn't specifically support it. So if we go with intrinsics in rustc, we may want to make them portable (just like many other existing intrinsics are portable). However, if we're going to commit to providing these intrinsics portably and on stable, then I don't see much justification for not providing the significant usability boost of a proper type with literals, arithmetic operators, etc. to users.

As pointed out before, such a type could be built in library code, even as a third-party crate, on top of intrinsics. What's more, this library can be built even on top of platform-specific intrinsics, by judicious use of cfg and by implementing the "software fallback" (extend to f32, do the operation in higher precision, and convert back to f16b) in library code. However, @joshtriplett seems to suggest it would be valuable for adoption to sell Rust as having "first-class" support for it, which would be aided by bundling it (even if it's "only" a library type placed in the prelude). I don't know how I feel about this, but I want to note it as a separate point from "importing a third-party crate is a bother", which I don't think is significant if one uses Cargo.

Another downside of implementing the "extend to f32" fallback in library code is that I don't see how it could benefit from allowing intermediate results of higher precision at the optimizer's discretion (as proposed in #2686), which would eliminate some intermediate f16b->f32->f16b roundtrips. I am not entirely sure how this will play out in practice, though, because if we generate the software-fallback sequence in the MIR -> LLVM IR step, then we will also largely miss opportunities for higher precision. I expect it will ultimately depend on how LLVM implements bfloat16 support.

We should also provide hardware intrinsics for platforms with native `bfloat16`
support. However, such intrinsics do not obviate the need for native code
generation support in the compiler. Intrinsics only allow code to directly
force the use of specific instructions, rather than supporting high-level

@rkruppe (Member) commented Apr 24, 2019
I think @oli-obk may have covered this, but I want to attach it to this line because it is wrong: intrinsics do not in general force the use of a particular instruction, and in particular don't have to impede vectorization. Not only can Rust-level intrinsics expand to inline LLVM IR instruction sequences (e.g., as a radical example, arch::x86_64::_mm256_setzero_ps maps to the constant zeroinitializer in LLVM IR), even things that are intrinsics at the LLVM IR level can be detached from any particular target instruction and be as optimizable as anything else. For example, @llvm.sqrt can be constant folded, is subject to algebraic simplifications (though most are only valid in fast-math mode), can be vectorized, etc., and any Rust program that calls f{32,64}::sqrt is using it.

@rkruppe (Member) commented Apr 24, 2019

@joshtriplett

Aside from the above implementation considerations, I am still unsatisfied with the use cases.

bfloat16 is heavily used in matrix operations, especially in neural network implementations. That includes both writing kernels and moving data around, and for both purposes I'd like to have native support in Rust.

Moving data around doesn't need any operations, you can just as well use any 16 bit data type for the memory loads and stores. Passing scalars by value across FFI may run into ABI issues that can only be solved with some compiler support, but again no need for arithmetic, maybe just a different repr attribute.

As I mentioned at the start, I don't see why writing those kernels in Rust is important enough to justify adding a new primitive type to Rust, for three reasons:

  • Why would anyone want to write these kernels in Rust? They are not the kind of code where I could see Rust, or its ecosystem, bringing much to the table over C or Fortran.
  • Why would the Rust project particularly care to get people to write these kernels in Rust? It's a tiny scrap of the market, and the kernels are well isolated from the rest of the AI ecosystem by ML compiler backends and the C ABI boundary (there is work on exposing kernel bodies for automatic kernel fusion, but bringing Rust into that space has almost entirely orthogonal requirements to expressing bfloat16 operations in Rust).
  • How many of these kernels would even be portable, scalar Rust? While there is the occasional parametrized scalar implementation that's subject to forced vectorization (+ auto-tuning for the parameters), in practice a whole lot of implementations are written exclusively for one target and writing them in assembly or with intrinsics is completely par for the course. That is not to say a higher level language wouldn't be useful (in fact, just weeks ago I was talking to someone working on such kernels in assembly wishing for C toolchain support). But for this domain, how useful is a portable type with portable scalar operations, vs. a suite of target-specific scalar and SIMD intrinsics (possibly wrapped in a library that provides operator overloads etc.)?

@dzamlo commented Apr 24, 2019

I have another suggestion; I'm not sure how much sense it makes:

Another approach would be to support this type via a crate, with software emulation for the operations on a bfloat16, and then optimize those operations either in LLVM or as a MIR optimization.

So instead of exposing bfloat16 at the language/stdlib level, we make sure that operations on bfloat16 are optimized the same way as if there were a native type/intrinsics. I believe this could be implemented as "simple" peephole optimizations.

This would have an advantage for backward/forward compatibility: we would not guarantee these optimizations, and thus could drop them if the cost of maintaining them is too big. Code that uses them would still compile with older rustc versions; it would just run slower.

@joshtriplett (Member Author) commented Apr 24, 2019

@rkruppe

Moving data around doesn't need any operations, you can just as well use any 16 bit data type for the memory loads and stores.

I don't want to make any assumptions about the ABI for passing a bfloat16 value to C or receiving it from C, such as it always matching the ABI for uint16_t. I could easily imagine a platform ABI that specifies a different register-based convention, just as some ABIs specify separate conventions for floating-point parameters versus integer parameters.

As you mentioned, that ABI could use a repr, but that seems like no less work to add to the compiler than adding a native type.

As @oli-obk's described from various angles above, whether we present bfloat16 operations to Rust programmers as first-class operations on a new primitive type or as intrinsic does not make an iota of a difference for the optimization and code generation pipeline.

So the decision should be solely based on source-level factors, such as ease of use and portability. @joshtriplett rightly points out that platform-specific intrinsics will be a burden for portability, and that the type can be useful for its compact memory footprint even if the target hardware doesn't specifically support it. So if we go with intrinsics in rustc, we may want to make them portable (just like many other existing intrinsics are portable). However, if we're going to commit to providing these intrinsics portably and on stable, then I don't see much justification for not provide the significant usability boost of a proper type with literals, arithmetic operators, etc. to users.

I think that we've been talking past each other in the entire discussion of "intrinsics", and I'm going to update the RFC with some clarifications. I was assuming, here, that the mentions of "intrinsics" referred exclusively to platform-specific intrinsics that map one-to-one to hardware instructions, rather than to something portable provided by LLVM that's subject to all the same optimizations as any a + b operation could be.

I was responding to the former, when I suggested that I don't think an entirely intrinsic-based implementation makes sense. To respond to the latter, I don't have any problem with using "portable intrinsics" provided by LLVM, as long as they result in a type that Rust and LLVM can optimize just as well as f32 and f64. I think that achieving the same level of integration as f32 and f64 would require lang items similar to those for f32 and f64, but I'd be happy to discover otherwise in the course of implementation. I'll adapt the RFC to make the high-level requirements more clear, and not constrain the implementation as much.

However, @joshtriplett seems to suggest it would be valuable for adoption to sell Rust as having "first-class" support for it, which would be aided by bundling it

A correction: this has nothing to do with marketing. This is about the usability and performance of having first-class support. I'm expecting that some parts of the implementation will require a lang item, which we can't use from a crate.

They are not the kind of code where I could see Rust, or its ecosystem, bringing much to the table over C or Fortran.

I don't expect Rust to bring less to the table. One of my goals is "there should never be a use case for which C is more capable or more usable than Rust".

Native support for bfloat16 will make Rust more capable than portable C in that area (though at some point I'd expect people to implement non-portable extension types for specific C compilers).

Why would the Rust project particularly care to get people to write these kernels in Rust? It's a tiny scrap of the market

I don't expect the use cases to be limited to those kernels; I listed them because they're a prominent use case. Imagine trying to list all the use cases for native floating-point support in Rust. It's a long list, and I'm expecting bfloat16 to apply to many of those.

But for this domain, how useful is a portable type with portable scalar operations, vs. a suite of target-specific scalar and SIMD intrinsics (possibly wrapped in a library that provides operator overloads etc.)?

I'd like to have both of those. I absolutely want all the target-specific intrinsics, but I'd also like to be able to write scalar code for automatic vectorization, offload engines, and similar. Currently, users of f32 and f64 have and use both options, and I expect the same to be true of f16b. I don't want it to be a self-fulfilling prophecy that we assume people will write target-specific assembly or intrinsics and thus create a world in which people's only reasonable option is to write target-specific assembly or intrinsics. :)

Also, I'm expecting a somewhat smoother spectrum, with target-specific intrinsics for specific hotspots, and higher-level Rust code for anywhere that can use higher-level Rust code. Productivity is an aspect of performance; the code you haven't had time to write yet is the slowest of all, with a throughput of 0.

@starkat99 commented Apr 25, 2019

There's a PR to the half crate adding bfloat16 support, which I'm all for and plan to merge; I just haven't had an opportunity to really review the PR yet, due to being swamped the past few weeks. You can see the PR here: starkat99/half-rs#22

I think it is better for this to live at the crate level rather than the language level, especially considering how sparse hardware support is. Exposing std::arch intrinsics for hardware that supports it, and using them when available at the crate level, is the way to go.

@clarfon (Contributor) commented Apr 25, 2019

I personally think that while binary16 support might be useful in Rust, bfloat16 seems a bit niche. It might be something worth adding to std::arch, but certainly not as a built-in type in the language.

@gnzlbg (Contributor) commented Apr 25, 2019

I was asked to properly write up my “concerns” here, so here are my concerns, phrased in the most constructive way I could find.

  • concern: llvm-bfloat16

This text claims that LLVM will add a native type for bfloat16. The claim is not supported in the text (e.g. no links to mailing-list threads of people working on or discussing this). The text should either support the claim or justify the native type in a universe where that never happens.

  • concern: unspecified-abi-issues

LLVM's native binary16 type, half, is exposed by clang as __fp16. This is not a standard type in C, ABIs (e.g. SysV) do not cover it, and clang raises a compilation error if it's used in return-type or argument positions.

I don't see any information or arguments suggesting that the ABI story for bfloat16 might be different. If such information exists, the text should include it, and also include prior-art about the ABI of similar types.

  • concern: incomplete-prior-art

The text does not mention how these problems are already solved in Rust (e.g. no mention of the half crate, which provides a binary16 type, or the TensorFlow crates, which provide a bfloat16 type), or in other languages like C++ or Julia (which provide bfloat16 as a user-defined type), and it does not link to previous relevant discussions (about C++, Julia, Rust, or LLVM).

The text's prior-art section should be comprehensive.

  • concern: unsupported-claim-library-solution-impossible

The text claims that a library solution is impossible and proposes a solution based on the assumption that the claim is true.

Prior art in Rust (e.g. tensorflow-rs) and other languages (C++ Tensorflow, Julia) shows that it is indeed possible to implement bfloat16 in a library just fine.

If LLVM ever adds native intrinsics for bfloat16, we can always expose Rust intrinsics, e.g., in a stable core::bfloat16 module, that libraries can use, e.g., core::bfloat16::add(i16, i16) -> i16 or core::bfloat16::as_f32(i16) -> f32, etc. in a backward-compatible way.

If LLVM also adds a native type, some of these Rust intrinsics might need to, internally, and transparently to users, bitcast from i16 to bfloat16 and back. In my experience LLVM is good enough at removing useless bitcasts. This change would be backward-compatible and transparent to users.

LLVM might also add target-specific intrinsics for this type (e.g. the AVX-512 extensions). Those would be exposed via core::arch anyway, and a library could just use them directly.

Prior art suggests that the ABI of such a type might just be unspecified. In Rust, this means we can do whatever we want, as long as we make the type an improper C type; for struct bfloat16(i16); this is automatically the case. If we ever want to give the type a more defined ABI, we can always add a flag, #[repr(bfloat16)] struct bfloat16(i16);, in a backward-compatible way.

It has also been claimed that such a library implementation cannot be efficient, but that claim is not supported. AFAICT, the claim is incorrect: a library and the native type can be made to generate identical LLVM IR (e.g. by making the library call core::bfloat16 intrinsics). From the POV of optimizations, precision, and performance, I don't see why there should be a difference.
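As a concreteness check (my sketch, not from this thread's authors): intrinsic-style functions over raw i16 bits can be emulated in software today, with round-to-nearest-even on the f32 result; LLVM or hardware intrinsics could later replace the bodies without changing the signatures. NaN handling is elided for brevity.

```rust
// Hypothetical software fallback for a core::bfloat16-style module.
// i16 holds the raw bit pattern, matching the proposed signatures.

fn as_f32(a: i16) -> f32 {
    // Widening is exact: bfloat16 is the top 16 bits of an f32.
    f32::from_bits((a as u16 as u32) << 16)
}

/// f32 -> bfloat16 with round-to-nearest-even (NaN handling elided).
fn from_f32(x: f32) -> i16 {
    let bits = x.to_bits();
    // Add 0x7FFF, plus 1 more when the kept bit is odd: ties round to even.
    let bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(bias) >> 16) as u16 as i16
}

fn add(a: i16, b: i16) -> i16 {
    // Widen, operate in f32, round back: the software fallback discussed here.
    from_f32(as_f32(a) + as_f32(b))
}

fn main() {
    let one = from_f32(1.0);
    assert_eq!(as_f32(add(one, one)), 2.0);
    // 3.14159265 rounds to the nearest bfloat16 value, 3.140625
    assert_eq!(as_f32(from_f32(3.14159265)), 3.140625);
}
```

The point being tested is the one made above: nothing in these operations requires a primitive type, so the same LLVM IR should be reachable from a library.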

  • concern: lack-of-comparison-with-library-solution

Even if a library solution has no codegen drawbacks over a language-level solution, there are many trade-offs at play that might make one approach better than the other.

For example, a library solution can be more flexible (e.g. supporting different behaviors via cargo features or different libraries), iterate quicker, and have a smaller impact on the implementation than a language-level solution. A native-type solution would properly support literals, casts, etc., and would have a "marketing advantage", e.g., Rust has first-class support for bfloat16.

The text does not explore these trade-offs, and it critically should. For example, the better-marketing claim can easily be misunderstood as Rust not being expressive enough to implement the type as a user-defined type (as Julia and C++ do). If the library ends up being worse than the language-level solution, which would be fine, the text should make the extra effort of identifying whether other language features could make the library solution good enough (e.g. user-defined literals and user-defined casts), since pursuing those instead might have a larger impact on the ecosystem than adding new primitive types every time we hit those roadblocks.

  • concern: insufficiently-baked

I'm surprised that this was submitted to the process in its current form. It feels insufficiently baked and would benefit from iteration on internals or in an RFC issue to collect different use cases, alternatives, prior art, etc. If the tensorflow-rs implementation cannot be refactored as a library, it might make sense to implement this as a library first to gain experience in the meantime, and wait to see what LLVM does. If the library is not good enough due to other limitations in the language, one can attempt to address those.

Some comments have argued that the intent was to discuss the idea. IIUC the RFC process is not the right venue for that, though that might be changing. If that was the intent, the text should instead have focused on the problem that needs solving (e.g. how Rust programs that already use bfloat16 are solving it today), included a comprehensive review of prior art, enumerated the alternatives discussed here, and analyzed their trade-offs. Such a text would lend itself better to exploring the design space than a concrete proposal without much rationale, prior art, or alternatives.

@comex

commented Apr 25, 2019

Potential compromise: add f16b as a primitive type, but not in the prelude (it would have to be imported from somewhere in core), and without a dedicated literal suffix (you'd have to rely on inference).

That way you still get the benefits of a primitive type, e.g. the implementation can be more straightforward, and ABI concerns can be handled in the compiler where they belong, but it wouldn't take up any room in the global namespace.

That would still commit the Rust compiler and stdlib to having a notion of bfloat16 forever, which is a valid concern. But a hypothetical core::bfloat16 set of intrinsics, as proposed by @gnzlbg, would do the same, assuming it were implemented on all platforms rather than just those with hardware support. core::bfloat16 might be a smaller API and thus slightly less of an implementation burden, but not by much, I think. The biggest burden would probably be implementing software emulation for bfloat16. That wouldn't be needed as long as Rust only targets LLVM (assuming LLVM intrinsics are added, as people have suggested: Rust could piggyback off them, and const evaluation could use the same intrinsics in the compiler itself), but it would be needed when porting Rust to a backend without native bfloat16 support. Either way, that cost is the same for both the primitive approach and the core::bfloat16 approach.
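A small caveat on the const-evaluation point: the widening half of the emulation is exact and needs no compiler support at all; it can be a plain const fn in a library. This sketch assumes a toolchain where f32::from_bits is const-stable (Rust 1.83 or later):

```rust
// Widening a bfloat16 to f32 just restores the low 16 bits as zero,
// so it can be evaluated at compile time with no intrinsics.
// (Assumes f32::from_bits is const-stable, Rust 1.83+.)
pub const fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

// 0x3F00 is the bfloat16 encoding of 0.5.
const HALF: f32 = bf16_to_f32(0x3F00);

fn main() {
    assert_eq!(HALF, 0.5);
}
```

Only the narrowing direction (with its rounding modes) and the arithmetic itself would need real emulation work on a backend without hardware support.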

@gnzlbg

commented Apr 29, 2019

@comex

but it wouldn't take up any room in the global namespace.

Which problem does this solve?

The biggest burden would probably be implementing software emulation for bfloat16s,

The main advantage of the intrinsics is that people can just use a bfloat16 library today, and if LLVM (or some other backend) ever adds functionality for this type, we can easily expose it as it appears and retrofit the library type to use it.

The intrinsics could be portable, but I don't think that needs to be a requirement, and there wouldn't be much value in it, since a library would already contain a fallback implementation (so why duplicate that in the compiler?). That is, the library would be in charge of choosing intrinsics depending on the target, codegen backend, etc., and of using them correctly (all of this can be done with cfgs at compile time).
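As a sketch of that compile-time selection: the cfg name hw_bf16 and the function hw_bf16_widen below are hypothetical placeholders for whatever feature flag and intrinsic wrapper a real library would gate on; only the portable fallback is real code:

```rust
// Compile-time dispatch via cfg: pick a hardware path when the
// (hypothetical) "hw_bf16" cargo feature is enabled, otherwise use
// the portable software fallback. The cfg'd-out branch is stripped
// before name resolution, so the placeholder costs nothing.
pub fn bf16_widen(bits: u16) -> f32 {
    #[cfg(feature = "hw_bf16")]
    return hw_bf16_widen(bits); // hypothetical intrinsic wrapper

    // Portable fallback: a bfloat16 is the high 16 bits of an f32,
    // so widening is a lossless shift.
    f32::from_bits((bits as u32) << 16)
}

fn main() {
    // 0x3FC0 is the bfloat16 encoding of 1.5.
    assert_eq!(bf16_widen(0x3FC0), 1.5);
}
```

The same pattern extends to per-backend or per-target cfgs (target_arch, target_feature), so the dispatch is resolved entirely at compile time with no runtime cost.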

If we end up implementing intrinsics for all bfloat16 operations on most targets and backends, then a primitive type would arguably have a much stronger motivation. But if that ever happens, we could add a primitive type then and refactor the library into a type alias... or not: maybe the library solution is good enough by itself, and adding a primitive type at that point wouldn't really add much value.
