[move-compiler][move-stdlib] Switch to vector bytecode instructions #37
Conversation
cc @emmazzz
Those ops are in fact not supported by either the prover or the EVM. I also never liked them ;) In any case, adding them won't be trivial, and we first need to do that before we merge this. The team is currently heads down preparing a demo for the end of the week, so this may take a week or longer.
Force-pushed from 9a2c4f9 to a8e97b6
@wrwg @emmazzz I tried to add support to the stackless bytecode for the vector operations but have failed. See the attached test and generated output.
Test
Generated output
@tnowacki This test seems to be for the stackless bytecode interpreter, which I'm not familiar with. I don't see why your changes would cause writebacks not to be generated, though; that happens in move-prover/bytecode/src/memory_instrumentation.rs. cc @meng-xu-cs, who authored the interpreter. How did you generate the output, by the way? Is it possible the output was generated before memory_instrumentation?
The interpreter uses the stackless bytecode at the end of the transformation pipeline, after the memory_instrumentation pass has run. With that said, I have the same guess as @emmazzz, that is, the output was generated before the memory_instrumentation pass. If this can be confirmed, the cause of this test failure is probably not in memory_instrumentation itself.
This PR does seem to include some changes to the interpreter to recognize the vector bytecodes, in move-prover/interpreter/src/concrete/player.rs. Do they look good to you, @meng-xu-cs? Are some additional changes needed?
@meng-xu-cs, the output you see here is from my own debug printing during stackless bytecode generation.
}
Ok(())
let ty = convert_model_base_type(self.target.global_env(), _ty_arg, &self.ty_args);
Since _ty_arg is actually used here, it can be called ty_arg, I guess?
Ah, thanks for pointing out the changes! I did not notice that before, and the changes to the interpreter look good to me.
Hmm... if this is coming from your own debug printing during generation, then the output predates the instrumentation passes.
Confirmed that the output was produced before the memory_instrumentation pass; the end-of-pipeline dump can be produced from the interpreter side as well.
Now my guess is that somewhere in the pipeline the new vector bytecodes are not being handled.
This is more fundamental... The whole move-prover pipeline needs to be updated to catch the new set of bytecodes, including the various transformation passes, and possibly more... I would like to consult @wrwg on what would be the best way forward. Failing the interpreter test is just the tip of the iceberg. I guess that any Move code that uses vectors will run into this.
Yeah, that is what I thought also initially. Plus, as long as DPN isn't back and full prover tests based on it are not in place, this will be very risky. I wonder why we are doing this (or have done part of it). What is the advantage of having opcodes instead of well-known native functions? The latter concept is quite common in compilers. The only reason I can imagine is that code with opcodes for vector ops is potentially more compact. As we move forward and have maps, sets, and tables, we will not be able to add opcodes for each of them. So those will also be represented via native functions. I can imagine at some point we may want to add special compiler support for maps/sets, and then we are back to square one, that is, having to treat native functions as well-known to the compiler anyway.
The goal is to have bytecode operations for anything that is a primitive type in Move. So the signer and vector native functions are being replaced by bytecode operations. @meng-xu-cs started this process last fall, but was unable to finish the migration before leaving. This PR is trying to finish that migration, for the compiler at least. I think I have an idea for moving this forward with less invasive changes to the prover; let me play around with it.
I understand the goal and history, but what is the rationale for doing this? Why not stay with what we have right now? As I said above, there are likely additional types, not only vector, which may get compiler support. Using well-known functions is a common pattern in compilers; there is nothing wrong with it.
The functions still exist in the stdlib at the source level; they just don't exist at the bytecode level. The "well-known functions" are the way we communicate the signature information of these bytecode operations to the source language. It is just tricky for the prover, as the functions will no longer exist in the compiled modules.
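To make that concrete, here is a minimal, hypothetical sketch of the compiler-side swap being described: a call to a well-known vector native is lowered to a dedicated instruction, and everything else stays an ordinary call. The Bytecode variants, the module/function names, and the lower_call helper are illustrative placeholders, not the actual move-compiler or file-format APIs.

// Hypothetical, simplified stand-ins for the compiler IR and file-format
// bytecodes; only meant to illustrate swapping a well-known native call
// for a dedicated instruction.
#[derive(Debug, PartialEq)]
enum Bytecode {
    Call { module: String, function: String },
    VecPushBack,
    VecPopBack,
    VecLen,
}

// If the callee is one of the well-known vector natives, emit the dedicated
// instruction; otherwise fall back to an ordinary call.
fn lower_call(module: &str, function: &str) -> Bytecode {
    match (module, function) {
        ("Vector", "push_back") => Bytecode::VecPushBack,
        ("Vector", "pop_back") => Bytecode::VecPopBack,
        ("Vector", "length") => Bytecode::VecLen,
        _ => Bytecode::Call {
            module: module.to_string(),
            function: function.to_string(),
        },
    }
}

fn main() {
    // The source-level signature still comes from the stdlib declaration;
    // only the emitted bytecode changes.
    assert_eq!(lower_call("Vector", "push_back"), Bytecode::VecPushBack);
    assert!(matches!(lower_call("M", "f"), Bytecode::Call { .. }));
}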
@wrwg, my idea was to just fake it back to a function call in stackless bytecode generation, but I am having trouble doing that as I don't know how to get the Id/function symbols for the functions. Maybe this new approach won't be doable?
In addition to other reasons mentioned, another factor is performance. Move calls are quite expensive compared to bytecode instructions--a call requires a new stack frame, new locals, moving data from the caller's stack into the callee's locals, and so on. This isn't just theoretical--one project we spoke to was considering using Move and was benchmarking against WASM, Python, and a few other VMs. One of the benchmarks they were using was fannkuch, which requires a lot of vector code. Move was slow on this benchmark, but it got ~2x faster when we hacked the compiler to use vector operations instead of calls. Of course, we should look at improving the performance of calls and the VM in general as well, but using the vector bytecodes is an easy win that's available to us in the short term.
Do you say this because you are worried about opcode space, or because there may not be enough consensus in the Move community to add these as opcodes? On the opcode space point, I took a quick look at where we are today. There are 71 bytecode instructions; 8 of them are vector instructions. Let's assume maps, sets, and tables require 10 instructions each. That puts us at around 100 instructions, which still leaves us with a good amount of room to grow the instruction set. Not that we should necessarily go in this direction, but I think we're in an OK position w.r.t. opcode space if we want to go that route.
Would keeping them around at the bytecode level allow us to move forward with this without changing the prover?
There is no reason for this inefficiency. A well-known function can just be treated like an opcode by the VM. This is really about the bytecode representation, not how the tools handle the bytecode.
No, because the prover verifies bytecode, not the AST. Indeed, we can eventually get the prover to support these bytecodes; I just find this change mostly unnecessary (the change in the VM and bytecode representation in the first place). We are just spending cycles on something with little value, IMHO. Plus, I prefer the design of an IR where everything is just a function, based on my experience. (Notice that in stackless bytecode, +/- etc. are all just functions.) The map/set example is just there to show that you will most likely never get rid of well-known functions completely. So if the argument was to avoid well-known functions, I don't think it will work.
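To illustrate the IR point about everything being a function/operation, here is a simplified, hypothetical sketch; TempIndex, Operation, and Bytecode are stand-ins, not the actual definitions in move-prover/bytecode. In this style, an addition, a call to a well-known native, and a would-be dedicated vector operation all take the same shape.

// Simplified stand-ins for the stackless bytecode IR, only to illustrate the
// "everything is an operation applied to temporaries" design point.
type TempIndex = usize;

#[derive(Debug)]
enum Operation {
    Add,                // +, -, etc. are just operations
    Function(String),   // calls, including well-known natives
    VecPushBack,        // a dedicated vector op would fit the same shape
}

#[derive(Debug)]
enum Bytecode {
    // dests := op(srcs)
    Call(Vec<TempIndex>, Operation, Vec<TempIndex>),
}

fn main() {
    // t2 := t0 + t1, then push t2 onto the vector in t3, written either way.
    let code = vec![
        Bytecode::Call(vec![2], Operation::Add, vec![0, 1]),
        Bytecode::Call(vec![], Operation::Function("Std::Vector::push_back".into()), vec![3, 2]),
        Bytecode::Call(vec![], Operation::VecPushBack, vec![3, 2]),
    ];
    println!("{:#?}", code);
}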
Wouldn't this imply that every data structure would then benefit from having opcodes made for it? What about data structures we haven't built yet (min-heaps, BSTs, linked lists)? This could result in somewhat stifled innovation. If there is a way to improve the performance of known functions (sounds like function inlining?), that would seem like a generalized win.
Taking a step back here, and this is probably something we should discuss at the Thursday meeting. As the Move team, the philosophy we have historically held is that "core" types and values of the language should be ground types in the bytecode and should have supporting bytecode operations. This drove the addition of the signer and vector bytecodes.

I can hear that you, Wolfgang, do not agree with this philosophy and would have preferred not to add these bytecodes. But a team decision was made to add these bytecodes, and they have already been designed and implemented. We can have a discussion about whether to reverse this decision and eliminate the bytecodes (and I would welcome this discussion), but I don't think the right way to make these decisions is to hold up PRs that are trying to leverage language features that already exist. I don't think it makes sense for us to add these bytecodes and then never use them.

Let me be clear that expressing concerns about the migration path for the prover and tooling is totally legitimate + a good reason to hold up a PR until we can come up with a plan to keep the prover working. But this is a separate issue from disagreeing with a decision that has already been made. As Move increasingly gets pulled in multiple directions by its various stakeholders, I think we will often have to be comfortable with disagreeing and committing, including working together to implement/support features we don't like. Otherwise, I think it will be too hard for us to make progress.

Let me also say that no path forward on this PR commits us to adding (or not adding) new ground types and/or bytecodes for future collections like maps, tables, and sets. The historical philosophy is about making the bytecode operations available for existing "core" ground types--we may or may not choose to expand the set with new collection types in the future.
Totally agree, @sblackshear. However, this PR is not held up in any way. There is no approval or similar that this PR is waiting for; it's additional work that blocks this PR, namely upgrading all code which depends on stackless bytecode and move-model to the new opcodes (prover, stackless interpreter, EVM). In this context, I brought up whether it is worth doing this work.

A possible technical way forward with this PR could be to modify the stackless bytecode generator to re-introduce the vector operations for the new bytecodes. @tnowacki already mentioned that he knows a way forward, and I guess that is what he is referring to. The other option would be to first land a PR which refactors the prover & co to the new bytecodes, which I or Emma could look at; however, we currently have higher-priority work at Meta to consolidate the EVM work, so for this we may need a little longer.

In any case, while we are trying to make a call on what to do here, I thought it was worthwhile to discuss how strongly we really need this. Perhaps such discussions should go to an issue instead of being put on a PR. (Will remember next time.)
Force-pushed from c3cfbb2 to 9ad3af3
@wrwg: For the stackless bytecode, I hacked in the translation where the vec bytecode gets translated back into the call (for the native function). (Note: it has the unfortunate downside of needing to keep the native functions around in the Vector module, but we can tackle that at a later date.) I had to make a peculiar change to the prover's borrow analysis; maybe you could take a look? I feel like it is indicative of something else going slightly wrong.
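For readers following along, a rough sketch (using invented placeholder types) of what translating a vector bytecode back into a call could look like; the real generator works against the move-model and the stackless bytecode crate, and the hard part mentioned above, obtaining proper function ids/symbols, is sidestepped here by using plain strings.

// Placeholder types; not the real move-prover/bytecode definitions.
#[derive(Debug)]
enum FileFormatBytecode {
    VecPushBack,
    VecLen,
}

#[derive(Debug)]
enum Operation {
    Function { module: String, name: String }, // a call in the stackless IR
}

#[derive(Debug)]
struct StacklessInstr {
    dests: Vec<usize>,
    op: Operation,
    srcs: Vec<usize>,
}

// Map a vector instruction back onto a call to the well-known native, so the
// downstream prover passes keep seeing the function calls they already handle.
fn fake_back_to_call(bc: &FileFormatBytecode, dests: Vec<usize>, srcs: Vec<usize>) -> StacklessInstr {
    let name = match bc {
        FileFormatBytecode::VecPushBack => "push_back",
        FileFormatBytecode::VecLen => "length",
    };
    StacklessInstr {
        dests,
        op: Operation::Function { module: "Std::Vector".to_string(), name: name.to_string() },
        srcs,
    }
}

fn main() {
    let instr = fake_back_to_call(&FileFormatBytecode::VecPushBack, vec![], vec![0, 1]);
    println!("{:?}", instr);
}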
The changes to stackless bytecode look ok. Some notes on the PR: I'm unclear where this is supposed to go once we have the stackless bytecode upgraded. Will the function declarations go away and be hardcoded into the compiler?
@@ -555,6 +556,12 @@ pub mod known_attributes {
        ExpectedFailure,
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    pub enum NativeAttribute {
        // It is a fake native function that actually compiles to a bytecode instruction
Is this really a fake? Where would you get the typing information from otherwise?
The term "fake" is used because the design is the functions won't exist in the compiled module.
Normally, a native
function still produces an entry in function definitions, just with no code.
For this PR, they are still in the compiled module. After the stackless bytecode gets full support, we will add the filtering.
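A hedged sketch of what that later filtering step might amount to; FunctionDef and its fields are invented placeholders rather than the actual move-binary-format structures.

// Invented placeholder for a compiled-module function definition.
#[derive(Debug)]
struct FunctionDef {
    name: String,
    is_native: bool,
    // Whether the source declaration carried the bytecode-instruction attribute.
    is_bytecode_instruction: bool,
}

// Once every consumer understands the vector bytecodes, definitions for the
// "fake" natives could simply be dropped from the compiled module.
fn filter_fake_natives(defs: Vec<FunctionDef>) -> Vec<FunctionDef> {
    defs.into_iter()
        .filter(|d| !(d.is_native && d.is_bytecode_instruction))
        .collect()
}

fn main() {
    let defs = vec![
        FunctionDef { name: "push_back".into(), is_native: true, is_bytecode_instruction: true },
        FunctionDef { name: "singleton".into(), is_native: false, is_bytecode_instruction: false },
    ];
    let kept = filter_fake_natives(defs);
    assert_eq!(kept.len(), 1);
    println!("{:?}", kept);
}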
if let Some(callee_an) =
let native_an;
let callee_an_opt = if callee_env.is_native() {
    native_an = native_annotation(callee_env);
Not sure why you need to do this, but before, the native function was visited by the process function, which initialized the borrow annotation, so it could be retrieved here without special casing. Is the "fake" native function no longer appearing in the list of module functions, and thus process is never called? This would explain why you need to do this here.
Right now the native function is still in the module functions. But maybe it isn't getting marked as a dependency somehow?
Is that process done somewhere else?
I guess what I'm trying to say is I suspect that Vector::borrow_mut isn't being visited beforehand. And I do not know what spot of code would be causing that to happen.
@wrwg, I'll take your approval as a sign that you are fine with this for now. Let me know if you want me to fix it. Also happy to do so after this lands, if that is easier :)
While I still think it's better to keep all those guys as well-known functions instead of turning them into bytecodes, I'm OK to merge this. I can come back to this and turn the functions into Operation::VectorBorrow etc.; that will then resolve this issue here.
@@ -13,31 +13,39 @@ module Std::Vector {
    /// The index into the vector is out of bounds
    const EINDEX_OUT_OF_BOUNDS: u64 = 0;

    #[bytecode_instruction]
So is the plan here to eventually remove this annotation, and that is why you call it 'fake'? If not, I would consider using a different notion than 'fake', because then it would be by design.
The annotation exists purely for devex. It is there so if someone is looking at the compiled module, they are not confused as to why these functions aren't there.
No. The function declarations will always exist for tooling and for other devex experiences.
- Add annotation for native functions that are bytecode instructions
- Add compiler support for swapping those annotated functions to known bytecode instructions
Force-pushed from a4c7ea5 to 8e546a0
[move-compiler][move-stdlib] Switch to vector bytecode instructions (move-language#37)
* [move-compiler][move-stdlib] Switch to vector bytecode instructions
  - Add annotation for native functions that are bytecode instructions
  - Add compiler support for swapping those annotated functions to known bytecode instructions
* fixup! [move-compiler][move-stdlib] Switch to vector bytecode instructions
* fixup! fixup! [move-compiler][move-stdlib] Switch to vector bytecode instructions
* fixup! fixup! fixup! [move-compiler][move-stdlib] Switch to vector bytecode instructions
* fixup! fixup! fixup! fixup! [move-compiler][move-stdlib] Switch to vector bytecode instructions
Motivation
Test Plan