Un-inline float intrinsics into runtime. #694
Conversation
@bollu What about https://github.com/leanprover/lean4/blob/master/src/Init/Data/UInt.lean? We have a bunch of similar definitions there.
@leodemoura Yes, they create problems as well. I'd be happy to modify this patch to replace all of these in one shot, or to send separate patches for each component.
I was also stumbling into this issue while coding an LLVM emitter for Lean. I don't think extracting them into runtime functions is a good idea: in LLVM, for instance, linking to an external function and performing a call is more expensive than emitting the operation inline. Instead, if possible, I would suggest replacing the `c inline` entries with a proper notion of intrinsics. Alternatively, one way to resolve this problem without modifying Lean or parsing C expressions is to special-case these definitions by matching on the name itself.
In the medium to long term, I quite like the intrinsics solution, since it means that these intrinsics will be documented, which makes it easy to judge what a new backend needs to support to get the Lean runtime up and running. I am not too fond of special-casing on the names, since it feels like working with a contract written in magic string constants with no stability guarantees. We also lose the nice benefit of exhaustiveness checking when pattern matching on intrinsics.
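To illustrate the exhaustiveness point, here is a rough sketch (all names are hypothetical and do not exist in the Lean 4 codebase) of what a closed enumeration of intrinsics could look like. A backend that matches on such a type and forgets a case fails to compile, which string-based dispatch cannot guarantee:

```lean
-- Hypothetical sketch: a closed enumeration of backend intrinsics.
inductive Intrinsic where
  | uint64Add
  | floatDiv
  | floatNegate
  deriving Repr, DecidableEq

-- A backend must handle every constructor; omitting one is a
-- compile-time error, unlike matching on magic string constants.
def emitC : Intrinsic → String
  | .uint64Add   => "#1 + #2"
  | .floatDiv    => "#1 / #2"
  | .floatNegate => "-#1"
```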
I am sympathetic to avoiding 'inline' definitions and instead just calling the runtime library. I see two benefits:
AFAIU GHC has hundreds of primops, which makes writing a new backend rather painful. I somehow like the fact that backend development could be incremental: first, we just use the Lean runtime always; second, we inline the definitions from the Lean runtime for performance (or emit the corresponding IR directly); then, we optimize the most performance-critical functions by emitting custom IR for them that outperforms the IR clang emits.
We had loads of conversations about doing intrinsics. Long-term it would be desirable to express the intrinsics directly in Lean somehow, but that entails defining a whole new C-like language to write them in, and it is not easy: you'd have to define a language that is sufficiently C-like that we can emit it both via the C emitter and via the LLVM backend. The low-level embedded DSL approach is more desirable as a long-term solution.
Right, long-term an embedded DSL would be great. This reminds me very much of Terra+Lua: https://terralang.org/ AFAIU Siddharth had an idea to build something similar, but this seems orthogonal to this very bug report. |
orthogonal, yes. I just wanted to be transparent about the thinking we have done and about what the outcome was. |
I fail to see how it would be possible to not re-implement the intrinsics in each (completely new) backend. Most of the current intrinsics are simple one-liners anyway. An exception to this would be examples like @tobiasgrosser's, where the idea is only to incrementally modify Lean's current backend; in that case, one may want to fall back on the existing implementation of the intrinsic. But that would be just as possible with the intrinsic attribute.
How would the intrinsic attribute make the chance for bugs likelier or harder to debug? The only difference I see would be that, instead of specifying the relevant C code at the declaration, it is specified in the emitter. For example, the current definition of `emitExternCall` is:

```lean
def emitExternCall (f : FunId) (ps : Array Param) (extData : ExternAttrData) (ys : Array Arg) : M Unit :=
  match getExternEntryFor extData `c with
  | some (ExternEntry.inline _ pat) => do emit (expandExternPattern pat (toStringArgs ys)); emitLn ";"
  | -- ...
```

where, for an addition intrinsic, expanding the pattern effectively amounts to `emitLn s!"{argToCString ys[0]} + {argToCString ys[1]};"`. With the intrinsic attribute, one would instead have:

```lean
def emitExternCall (f : FunId) (ps : Array Param) (extData : ExternAttrData) (ys : Array Arg) : M Unit :=
  match getExternEntryFor extData `c with
  | some (ExternEntry.intrinsic `uint64_add) => emitLn s!"{argToCString ys[0]} + {argToCString ys[1]};"
  | -- ...
```

which is essentially equivalent. If anything, I would think it would be easier to debug, as it would no longer be necessary to support arbitrary C injections (which can introduce all kinds of interesting bugs).
@tydeu thank you for your comment. I am curious to understand the difference between your intrinsic ids and the current extern c references to the lean runtime. As far as I understand, both assign a string to refer to some external functionality. The difference seems to be that extern c ships a default implementation of this functionality in the lean runtime, while the intrinsic annotation requires a backend to emit the relevant definitions inline. Are there situations and reasons where we would prefer to not have a default implementation available? I guess your argument is that some functionality would be prohibitively slow, so one should require a backend to implement it natively? Is this understanding correct?
@tobiasgrosser in my view, there are a few reasons:
@tydeu the kinds of intrinsics the DSL is mostly meant for are the ref counts, memory layout, unboxing, etc. Not the 1-line ones, but rather the 10-20 line ones, which are inexpressible in Lean.
@DanielFabian I was just using you as inspiration. I didn't mean to imply that that was what you meant by the DSL. XD |
@tydeu my comment was responding to how the dsl can help re-implementing for all backends. The non-trivial intrinsics, like ref counting or memory layout do change, albeit sparingly. But I certainly remember a fix in thunks not too long back. If they are implemented in the DSL, they'll make it to all backends. If they are implemented in the backend itself, each backend will have the bug if we find one or will have to make similar changes when we want to change the memory layout. Also, keeping 3-4 code bases vs. 1 in sync is harder. E.g. adding support for structs (value types, unboxed types, however you want to call them) is something we are considering a bit further out. This'll almost certainly entail changing the memory layout. And then, the DSL vs intrinsics in the backend would help. |
By 'backend' are we talking about the Lean C++ implementation, the IR, or the C emitter? I was focusing on alternate emitters, which I don't think would be heavily affected by such changes.
I did mean the emitter, yes. Because the DSL would be a new low-level IR that's below lean's usual IR. But it would be simple, it would maybe know loops (not even sure of that), it would know pointers, and maybe 2-3 more concepts. Now C and LLVM would have to emit code for the low-level IR. OTOH, you'd be able to write intrinsics in lean (using the DSL). |
Yes, they would, because e.g. ref counting has to be written in C++ right now, since it's inexpressible in Lean.
Regarding different emitters, we already have support for them at the attribute level:

```lean
import Lean

@[extern c inline "#1 + #2 * 2" llvm "llvm_foo" javascript "jfoo"]
constant foo (x y : Nat) : Nat

open Lean

def test : MetaM Unit := do
  IO.println (getExternNameFor (← getEnv) `llvm `foo)

#eval test
-- some llvm_foo
```

There are other APIs for extracting the extern entries. As far as I understand, @bollu is developing support for LLVM without writing a new emitter. That is, he is reusing the C emitter. @bollu Could you confirm?
@leodemoura I'm writing a new emitter. I was unaware that one can declare custom extern entries for backends other than C; thanks for pointing that out. My feeling is that the runtime intrinsics ought not to depend on the particular strings attached to declarations in the core library, since that again amounts to a contract written in string constants. These problems are avoided if we have a first-class concept of an intrinsic (as @tydeu suggests) which the backends can then implement as they see fit.
If you are building a new emitter, this is the preferred way.
I don't think this is a problem. The annotation is just a string. The emitter can do whatever it wants with it, even creating fresh names.
I am sorry. I don't have cycles to discuss new features (e.g., an intrinsic attribute) right now. Recall that one can already write

```lean
@[extern "f_impl"]
constant f : Nat → Nat
```

to bind a declaration to a named external symbol. I wish we had more cycles to explore all these exciting possibilities, but right now our top priority is the Mathlib port, bugs, and missing features.
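To make the contrast explicit, here is a small hedged sketch (the `myFloatDiv`/`my_float_div` names are made up for illustration) of the two existing `@[extern]` forms being discussed. The symbol form only requires a backend to emit a call; the `c inline` form requires it to splice a C expression into its output:

```lean
-- Symbol-based: any backend just emits a call to a function named
-- `my_float_div` and lets the linker resolve it against the runtime.
@[extern "my_float_div"]
constant myFloatDiv : Float → Float → Float

-- Inline-based: the emitter must substitute the arguments into the
-- C expression "#1 / #2", which only a C-emitting backend can do.
@[extern c inline "#1 / #2"]
constant myFloatDivInline : Float → Float → Float
```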
It might make sense to implement those as a "lean_rt" MLIR dialect. That way the lowering can be implemented relatively easily in MLIR and there is a clear separation of concerns; in particular, one doesn't have to invent a whole new C-like intermediate language.
@cpehle The MLIR dialect defined at https://github.com/bollu/lz/blob/e9c5c0e65975637cb3b3d675afedf69b4a7b2b60/lib/Hask/HaskDialect.cpp#L45-L73 may be of interest. The file name is a misnomer; the MLIR dialect has a combination of Haskell and Lean constructs.
@bollu Sorry for the delay. Thanks again for the PR. @Kha and I discussed it today. We want to propose the following:
Then, we will …
Yes, this works perfectly. I imagine we'll do the same for the `UInt` intrinsics.
```diff
@@ -1757,6 +1762,7 @@ static inline uint64_t lean_name_hash(b_lean_obj_arg n) {
 static inline double lean_float_div(double a, double b) { return a / b; }
 static inline double lean_float_negate(double a) { return -a; }
+
```
Spurious whitespace change?
You shouldn't need to update stage 0 here, I think.
This is necessary to support backends such as LLVM which do not emit C.
Force-pushed from bd4f624 to 13a8412.
Right. I've rebased and fixed the spurious whitespace change.
!bench |
Here are the benchmark results for commit 13a8412.

```
Benchmark   Metric              Change
================================================
+ stdlib    C code generation   -2% (-27.8 σ)
```
* outline all intrinsics into the runtime. This is necessary to support backends such as LLVM which do not emit C.
* fix style
This patch extracts out floating-point functions in the core library which are marked as `@[extern c inline ...]`. This aids the development of new backends such as an LLVM backend, since the new backend can now link against the corresponding runtime function, instead of having to parse and code-generate arbitrary C expressions.

If this PR is of interest, I shall send a sequence of PRs that applies the same transformation to the rest of the `src/Init/*` library.
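As a concrete sketch of the transformation this PR performs (the exact declaration shown here is illustrative, not a quote from the patch; `lean_float_div` is the runtime function visible in the diff above):

```lean
-- Before: the declaration carries an inline C expression that every
-- backend would have to parse and splice into its output.
@[extern c inline "#1 / #2"]
constant myFloatDiv : Float → Float → Float

-- After: the declaration references a named runtime function, so a
-- backend such as LLVM can simply emit a call to `lean_float_div`
-- and let the linker resolve it.
@[extern "lean_float_div"]
constant myFloatDiv : Float → Float → Float
```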