This repository has been archived by the owner on May 4, 2024. It is now read-only.

[move-compiler][move-stdlib] Switch to vector bytecode instructions #37

Merged
merged 5 commits into move-language:main from bytecode-instruction on May 11, 2022

Conversation

tnowacki
Member

  • Add annotation for native functions that are bytecode instructions
  • Add compiler support for swapping those annotated functions to known bytecode instructions (see the sketch after this list)
  • Used source files instead of interface files for source language tests
  • Stopped interface files from shadowing fully compiled deps
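For reference, a minimal sketch of what the annotated declarations look like in Std::Vector (the #[bytecode_instruction] attribute is the one added in this PR's diff; the two signatures are the existing native declarations from the stdlib and are shown here only as an illustration):

module Std::Vector {
    // The attribute marks a native function that the compiler lowers to the
    // corresponding vector bytecode instruction instead of emitting a call.
    #[bytecode_instruction]
    native public fun empty<Element>(): vector<Element>;

    #[bytecode_instruction]
    native public fun push_back<Element>(v: &mut vector<Element>, e: Element);
}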

Motivation

  • The native bytecode operations have been there for a while, but have been unused

Test Plan

  • Existing tests, just swapping out behavior

@tnowacki
Member Author

@wrwg and @vgao1996, looks like I'm hitting some prover and EVM test failures. Maybe the vector bytecode operations weren't fully supported?

This isn't urgent, but let me know how to resolve these test failures.

@vgao1996
Member

cc @emmazzz

@wrwg
Member

wrwg commented Apr 18, 2022

@wrwg and @vgao1996, looks like I'm hitting some prover and EVM test failures. Maybe the vector bytecode operations weren't fully supported?

This isn't urgent, but let me know how to resolve these test failures.

Those ops are in fact not supported by either the prover or the EVM. I also never liked them ;) In any case, adding them won't be trivial, and that work needs to happen before we can merge this. The team is currently heads down preparing a demo for the end of the week, so this may take a week or longer.

@tnowacki tnowacki force-pushed the bytecode-instruction branch 2 times, most recently from 9a2c4f9 to a8e97b6 Compare May 2, 2022 18:49
@tnowacki tnowacki requested a review from emmazzz May 2, 2022 18:49
@tnowacki
Member Author

tnowacki commented May 2, 2022

@wrwg @emmazzz I tried to add support for the vector operations to the stackless bytecode, but have failed.

For the test property/vector.move, the write_backs aren't being generated following the calls to vec_push_back, but they are for other tests (I think). Any idea what I need to do to fix this?

Test

module 0x2::A {
    use Std::Vector;

    #[test]
    public fun check_vector() {
        let a = Vector::empty();
        let b = Vector::empty();
        spec {
            assert len(a) == 0;
            assert a == vec();
        };

        Vector::push_back(&mut a, 42u128);
        Vector::push_back(&mut b, 0u128);
        spec {
            assert len(a) == 1;
            assert a == vec(42);
            assert a[0] == 42;
            assert contains(a, 42);
            assert index_of(a, 42) == 0;
            assert update(b, 0, 42) == a;
            assert concat(a, b) == concat(vec(42), vec(0));
            assert in_range(a, 0);
            assert !in_range(b, 2);
        };
    }
}

Generated output

$t0 := vec_pack_0<u128>()
trace_local[a]($t0)
$t1 := vec_pack_0<u128>()
trace_local[b]($t1)
assert Eq<num>(Len<u128>($t0), 0)
assert Eq<vector<u128>>($t0, EmptyVec<u128>())
$t2 := borrow_local($t0)
$t3 := 42
vec_push_back<u128>($t2, $t3)
// MISSING WRITE BACK?
trace_local[a]($t0)
$t4 := borrow_local($t1)
$t5 := 0
vec_push_back<u128>($t4, $t5)
// MISSING WRITE BACK?
trace_local[b]($t1)
// ALL OF THESE FAIL AS $t0 IS NOT UPDATED
assert Eq<num>(Len<u128>($t0), 1)
assert Eq<vector<u128>>($t0, SingleVec<u128>(42))
assert Eq<u128>(Index($t0, 0), 42)
assert ContainsVec<u128>($t0, 42)
assert Eq<num>(IndexOfVec<u128>($t0, 42), 0)
assert Eq<vector<u128>>(UpdateVec<u128>($t1, 0, 42), $t0)
assert Eq<vector<u128>>(ConcatVec<u128>($t0, $t1), ConcatVec<u128>(SingleVec<u128>(42), SingleVec<u128>(0)))
assert InRangeVec<u128>($t0, 0)
assert Not(InRangeVec<u128>($t1, 2))
label L1
return ()

@emmazzz
Member

emmazzz commented May 3, 2022

For the test property/vector.move, the write_backs aren't being generated following the calls to vec_push_back, but they are for other tests (I think). Any idea what I need to do to fix this?

@tnowacki This test seems to be for the stackless bytecode interpreter, which I'm not familiar with. I don't see why your changes would cause writebacks not to be generated, though; that happens in move-prover/bytecode/src/memory_instrumentation.rs. cc @meng-xu-cs, who authored the interpreter.

How did you generate the output by the way? Is it possible the output was generated before memory_instrumentation?

@meng-xu-cs
Collaborator

The interpreter uses the stackless bytecode at the end of the transformation pipeline, after the memory_instrumentation pass, so if a WriteBack is produced, it should be there.

With that said, I have the same guess as @emmazzz, that is, the output was generated before the memory_instrumentation pass.

If this can be confirmed, the cause of this test failure is probably not that WriteBack isn't being instrumented. Instead, it is because the interpreter does not support vector bytecodes. Some extra work needs to be done to make the interpreter recognize and interpret vector bytecodes (similar to the Move VM). I am happy to author a PR for this. In the meantime, feel free to disable this test case to unblock.

@emmazzz
Member

emmazzz commented May 3, 2022

This PR does seem to include some changes to the interpreter to recognize the vector bytecodes, in move-prover/interpreter/src/concrete/player.rs. Do they look good to you @meng-xu-cs ? Are some additional changes needed?

@tnowacki
Member Author

tnowacki commented May 3, 2022

@meng-xu-cs, the output you see here is from my own debug printing in player.rs. So it should be after the memory_instrumentation pass?

}
Ok(())
let ty = convert_model_base_type(self.target.global_env(), _ty_arg, &self.ty_args);
Collaborator

Since _ty_arg is actually used here, it could be called ty_arg, I guess?

@meng-xu-cs
Collaborator

This PR does seem to include some changes to the interpreter to recognize the vector bytecodes, in move-prover/interpreter/src/concrete/player.rs. Do they look good to you @meng-xu-cs ? Are some additional changes needed?

Ah, thanks for pointing out the changes! I did not notice that before and the changes to player.rs look good to me!

@meng-xu-cs
Collaborator

@meng-xu-cs, the output you see here is from my own debug printing in player.rs. So it should be after the memory_instrumentation pass?

Hmm... if this is coming from player.rs (for example, as an execution trace), it should be after the memory_instrumentation pass and the WriteBack should be there. Let me pull this locally and test.

@meng-xu-cs
Collaborator

Confirmed that the WriteBacks are not there...

The end-of-pipeline dump can be produced by

cd language/move-prover
cargo run -- interpreter-testsuite/tests/concrete_check/property/vector.move -d ../move-stdlib/sources/Vector.move -a Std=0x1 --dump-bytecode
cat vector.move_17_mono_analysis.bytecode
public fun A::check_vector() {
     var $t0|a: vector<u128>
     var $t1|b: vector<u128>
     var $t2: &mut vector<u128>
     var $t3: u128
     var $t4: &mut vector<u128>
     var $t5: u128
  0: $t0 := vec_pack_0<u128>()
  1: trace_local[a]($t0)
  2: $t1 := vec_pack_0<u128>()
  3: trace_local[b]($t1)
  4: assert Eq<num>(Len<u128>($t0), 0)
  5: assert Eq<vector<u128>>($t0, EmptyVec<u128>())
  6: $t2 := borrow_local($t0)
  7: $t3 := 42
  8: vec_push_back<u128>($t2, $t3)
// MISSING WRITE BACK
  9: trace_local[a]($t0)
 10: $t4 := borrow_local($t1)
 11: $t5 := 0
 12: vec_push_back<u128>($t4, $t5)
// MISSING WRITE BACK
 13: trace_local[b]($t1)
 14: assert Eq<num>(Len<u128>($t0), 1)
 15: assert Eq<vector<u128>>($t0, SingleVec<u128>(42))
 16: assert Eq<u128>(Index($t0, 0), 42)
 17: assert ContainsVec<u128>($t0, 42)
 18: assert Eq<num>(IndexOfVec<u128>($t0, 42), 0)
 19: assert Eq<vector<u128>>(UpdateVec<u128>($t1, 0, 42), $t0)
 20: assert Eq<vector<u128>>(ConcatVec<u128>($t0, $t1), ConcatVec<u128>(SingleVec<u128>(42), SingleVec<u128>(0)))
 21: assert InRangeVec<u128>($t0, 0)
 22: assert Not(InRangeVec<u128>($t1, 2))
 23: label L1
 24: return ()
}

Now my guess is that something in memory_instrumentation needs to be updated. I'll do some quick exploration there.

@meng-xu-cs
Collaborator

This is more fundamental...

The whole move-prover pipeline needs to be updated to catch the new set of bytecodes, including

EliminateImmRefsProcessor
MutRefInstrumenter
ReachingDefProcessor
BorrowAnalysisProcessor
MemoryInstrumentationProcessor

and possibly more...

I would like to consult @wrwg on what would be the best way forward. Failing the interpreter test is just the tip of the iceberg. I guess that any Move code that uses Std::Vector will not pass the prover...

@wrwg
Member

wrwg commented May 6, 2022

This is more fundamental...

The whole move-prover pipeline needs to be updated to catch the new set of bytecodes, including

EliminateImmRefsProcessor
MutRefInstrumenter
ReachingDefProcessor
BorrowAnalysisProcessor
MemoryInstrumentationProcessor

and possibly more...

I would like to consult @wrwg on what would be the best way forward. Failing the interpreter test is just the tip of the iceberg. I guess that any Move code that uses Std::Vector will not pass the prover...

Yeah, that is what I initially thought as well. Plus, as long as DPN isn't back and full prover tests based on it aren't in place, this will be very risky.

I wonder why we are doing this (or have done part of it). What is the advantage of having opcodes instead of well-known native functions? The latter concept is quite common in compilers. The only reason I can imagine is that code with opcodes for vector ops is potentially more compact.

As we move forward and have maps, sets, tables, we will not be able to add opcodes for each of them. So those will also be represented via native functions. I can imagine at some point we may want to add special compiler support for maps/sets, and then we are back to square one, that is, having to treat native functions as well-known to the compiler anyway.

@tnowacki
Member Author

tnowacki commented May 6, 2022

As we move forward and have maps, sets, tables, we will not be able to add opcodes for each of them

The goal is to have bytecode operations for anything that is a primitive type in Move. So the signer and vector native functions are being replaced by bytecode operations. @meng-xu-cs started this process last fall, but was unable to finish the migration before leaving. This PR is trying to finish that migration for the compiler at least.

I think I have an idea for moving this forward with less invasive changes to the prover; let me play around with it.

@wrwg
Member

wrwg commented May 6, 2022

As we move forward and have maps, sets, tables, we will not be able to add opcodes for each of them

The goal is to have bytecode operations for anything that is a primitive type in Move.

I understand the goal and the history, but what is the rationale for doing this? Why not stay with what we have right now? As I said above, there are likely additional types besides vector which may get compiler support. Using well-known functions is a common pattern in compilers; there is nothing wrong with it.

@tnowacki
Member Author

tnowacki commented May 6, 2022

Using well-known functions is a common pattern in compilers; there is nothing wrong with it.

The functions still exist in the stdlib at the source level; they just don't exist at the bytecode level. The "well-known functions" are how we communicate the signature information of these bytecode operations to the source language.

It is just tricky for the prover, as the functions will no longer exist in the compiled modules.
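As a rough illustration of that split (the snippet below is made up for this example; the instruction names in the comments are the vector bytecodes referenced elsewhere in this thread, and the exact disassembly format is not shown):

module 0x2::Example {
    use Std::Vector;

    public fun make(): vector<u64> {
        // Source level: ordinary calls into the stdlib module.
        let v = Vector::empty<u64>();
        // Bytecode level (roughly): the compiler now emits VecPack / VecPushBack
        // directly, so no Call into Std::Vector appears in the compiled module.
        Vector::push_back(&mut v, 1);
        v
    }
}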

@tnowacki
Member Author

tnowacki commented May 6, 2022

As we move forward and have maps, sets, tables, we will not be able to add opcodes for each of them

The goal is to have bytecode operations for anything that is a primitive type in Move. So the signer and vector native functions are being replaced by bytecode operations. @meng-xu-cs started this process last fall, but was unable to finish the migration before leaving. This PR is trying to finish that migration for the compiler at least.

I think I have an idea for moving this forward with less invasive changes to the prover; let me play around with it.

@wrwg, my idea was to just fake it back to a function call in stackless bytecode generation, but I am having trouble doing that as I don't know how to get Id/function symbols for the functions. Maybe this new approach won't be doable?

@sblackshear
Member

I wonder why we are doing this (or have done part of it). What is the advantage of having opcodes instead of well-known native functions? The latter concept is quite common in compilers. The only reason I can imagine is that code with opcodes for vector ops is potentially more compact.

In addition to other reasons mentioned, another factor is performance. Move calls are quite expensive compared to bytecode instructions--a call requires a new stack frame, new locals, moving data from the caller's stack into the callee's locals, and so on.

This isn't just theoretical--one project we spoke to was considering using Move and was benchmarking against WASM, Python, and a few other VMs. One of the benchmarks they were using was fannkuch, which requires a lot of vector code. Move was slow on this benchmark, but it got ~2x faster when we hacked the compiler to use vector operations instead of calls.

Of course, we should look at improving the performance of calls and the VM in general as well, but using the vector bytecodes is an easy win that's available to us in the short term.

@sblackshear
Member

As we move forward and have maps, sets, tables, we will not be able to add opcodes for each of them.

Do you say this because you are worried about opcode space, or because there may not be enough consensus in the Move community to add these as opcodes?

On the opcode space point, I took a quick look at where we are today. There are 71 bytecode instructions, 8 of which are vector instructions. Let's assume maps, sets, and tables require 10 instructions each. That puts us at around 100 instructions, which still leaves us a good amount of room to grow the instruction set. Not that we should necessarily go in this direction, but I think we're in an OK position w.r.t. opcode space if we want to go that route.

@sblackshear
Member

The functions still exist in the stdlib at the source level; they just don't exist at the bytecode level.

Would keeping them around at the bytecode level allow us to move forward with this without changing the prover?

@wrwg
Member

wrwg commented May 8, 2022

I wonder why we are doing this (or have done part of it). What is the advantage of having opcodes instead of well-known native functions? The latter concept is quite common in compilers. The only reason I can imagine is that code with opcodes for vector ops is potentially more compact.

In addition to other reasons mentioned, another factor is performance. Move calls are quite expensive compared to bytecode instructions--a call requires a new stack frame, new locals, moving data from the caller's stack into the callee's locals, and so on.

This isn't just theoretical--one project we spoke to was considering using Move and was benchmarking against WASM, Python, and a few other VMs. One of the benchmarks they were using was fannkuch, which requires a lot of vector code. Move was slow on this benchmark, but it got ~2x faster when we hacked the compiler to use vector operations instead of calls.

Of course, we should look at improving the performance of calls and the VM in general as well, but using the vector bytecodes is an easy win that's available to us in the short term.

There is no reason for this inefficiency. A well-known function can simply be treated like an opcode by the VM. This is really about the bytecode representation, not about how the tools handle the bytecode.

@wrwg
Member

wrwg commented May 8, 2022

The functions still exist in the stdlib at the source level; they just don't exist at the bytecode level.

Would keeping them around at the bytecode level allow us to move forward with this without changing the prover?

No, because the prover verifies bytecode, not the AST.

Indeed, we can eventually get the prover to support these bytecodes; I just find this change mostly unnecessary (the change to the VM and the bytecode representation in the first place). We are just spending cycles on something with little value, IMHO. Plus, based on my experience, I prefer the design of an IR where everything is just a function. (Notice that in stackless bytecode, +/- etc. are all just functions.)

The map/set example is just there to show that you will most likely never get rid of well-known functions completely. So if the argument is to avoid well-known functions, I don't think it will hold.

@davidiw
Collaborator

davidiw commented May 8, 2022

Wouldn't this imply that every data structure would then benefit from having opcodes made for it? What about data structures we haven't built yet (min-heaps, BSTs, linked lists)? This could somewhat stifle innovation. If there is a way to improve the performance of well-known functions (sounds like function inlining?), that would seem like a more general win.

@sblackshear
Member

Taking a step back here--this is probably something we should discuss at the Thursday meeting.

The philosophy we, as the Move team, have historically held is that "core" types and values of the language should be ground types in the bytecode and should have supporting bytecode operations. This drove the addition of vector and signer to the ground types, and the addition of the vector-related bytecode operations. There was a plan for signer-related bytecodes to follow, but no one has gotten around to this yet.

I can hear that you, Wolfgang, do not agree with this philosophy and would have preferred not to add these bytecodes. But a team decision was made to add these bytecodes, and they have already been designed and implemented. We can have a discussion about whether to reverse this decision and eliminate the bytecodes (and I would welcome this discussion), but I don't think the right way to make these decisions is to hold up PR's that are trying to leverage language features that already exist. I don't think it makes sense for us to add these bytecodes and then never use them.

Let me be clear that expressing concerns about the migration path for the prover and tooling is totally legitimate + a good reason to hold up a PR until we can come up with a plan to keep the prover working. But this is a separate issue from disagreeing with a decision that has already been made. As Move increasingly gets pulled in multiple directions by its various stakeholders, I think we will often have to be comfortable with disagreeing and committing, including working together to implement/support features we don't like. Otherwise, I think it will be too hard for us to make progress.

Let me also say that no path forward on this PR commits us to adding (or not adding) new ground types and/or bytecodes for future collections like maps, tables, sets. The historical philosophy is about making the bytecode operations available for existing "core" ground types--we may or may not choose to expand the set with new collection types in the future.

@wrwg
Member

wrwg commented May 9, 2022

We can have a discussion about whether to reverse this decision and eliminate the bytecodes (and I would welcome this discussion), but I don't think the right way to make these decisions is to hold up PR's that are trying to leverage language features that already exist

Totally agree, @sblackshear. However, this PR is not held up in any way. There is no approval or similar that this PR is waiting for; it's additional work that blocks this PR, namely upgrading all code which depends on stackless bytecode and the move-model to the new opcodes (prover, stackless interpreter, EVM). In this context, I brought up whether it is worth doing this work.

A possible technical way forward with this PR could be to modify the stackless bytecode generator to re-introduce the vector operations for the new bytecodes. @tnowacki already mentioned that he knows a way forward, and I guess that is what he is referring to. The other option would be to first land a PR which refactors the prover & co. to the new bytecodes, which I or Emma could look at; however, we currently have higher priorities at Meta around consolidating the EVM work, so that may take a little longer. In any case, while we are trying to make a call on what to do here, I thought it worthwhile to discuss how strongly we really need this. Perhaps such discussions should go into an issue instead of onto a PR. (Will remember next time.)

@tnowacki tnowacki force-pushed the bytecode-instruction branch 2 times, most recently from c3cfbb2 to 9ad3af3 Compare May 9, 2022 19:15
@tnowacki
Member Author

tnowacki commented May 9, 2022

@wrwg: For the stackless bytecode, I hacked in a translation where the vec bytecodes get translated back into the calls (to the native functions). (Note: this has the unfortunate downside of needing to keep the native functions around in the Vector module, but we can tackle that at a later date.)

I had to make a peculiar change to the prover's borrow analysis; maybe you could take a look? I feel like it is indicative of something else going slightly wrong.

Member

@wrwg wrwg left a comment

The changes to stackless bytecode look OK. Some notes on the PR. I'm unclear where this is supposed to go once we have the stackless bytecode upgraded. Will the function declarations go away and be hardcoded into the compiler?

language/move-compiler/src/naming/fake_natives.rs
@@ -555,6 +556,12 @@ pub mod known_attributes {
        ExpectedFailure,
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    pub enum NativeAttribute {
        // It is a fake native function that actually compiles to a bytecode instruction
Member

Is this really a fake? Where would you get the typing information from otherwise?

Member Author

The term "fake" is used because the design is the functions won't exist in the compiled module.
Normally, a native function still produces an entry in function definitions, just with no code.

For this PR, they are still in the compiled module. After the stackless bytecode gets full support, we will add the filtering.

if let Some(callee_an) =
let native_an;
let callee_an_opt = if callee_env.is_native() {
native_an = native_annotation(callee_env);
Member

Not sure why you need to do this, but before, the native function was visited by the process function, which initialized the borrow annotation, so it could be retrieved here without special casing. Is the "fake" native function no longer appearing in the list of module functions, and thus process never called? This would explain why you need to do this here.

Member Author

Right now the native function is still in the module functions. But maybe it isn't getting marked as a dependency somehow?
Is that process done somewhere else?

I guess what I'm trying to say is that I suspect Vector::borrow_mut isn't being visited beforehand, and I do not know what spot in the code would be causing that.

Member Author

@wrwg, I'll take your approval as a sign that you are fine with this for now. Let me know if you want me to fix it. Also happy to do so after this lands, if that is easier :)

Member

While I still think it's better to keep all those guys as well-known functions instead of turning them into bytecodes, I'm OK to merge this. I can come back to this and turn the functions into Operation::VectorBorrow etc.; that will then resolve this issue here.

@@ -13,31 +13,39 @@ module Std::Vector {
    /// The index into the vector is out of bounds
    const EINDEX_OUT_OF_BOUNDS: u64 = 0;

    #[bytecode_instruction]
Member

So is the plan here to eventually remove this annotation, and that is why you call it 'fake'? If not, I would consider using a different term than 'fake', because then it would be by design.

Member Author

The annotation exists purely for devex. It is there so that, if someone is looking at the compiled module, they are not confused as to why these functions aren't there.

@tnowacki
Member Author

Will the function declarations go away and be hardcoded into the compiler?

No. The function declarations will always exist, for tooling and for other devex experiences.
Ideally we should have very few builtin functions/operations in the compiler, as having them written down somewhere like normal code is always a win for devex.
For some things in Move (e.g. +, ==, freeze, or borrow_global), our type system lacks the expressivity to declare them as module functions.
For some other things, namely assert!, it would be nice to stuff that in a module somewhere once we have macros of some form.

- Add annotation for native functions that are bytecode instructions
- Add compiler support for swapping those annotated functions to known bytecode instructions
@tnowacki tnowacki merged commit 96c2f62 into move-language:main May 11, 2022
@tnowacki tnowacki deleted the bytecode-instruction branch May 11, 2022 23:14
villesundell pushed a commit to villesundell/move that referenced this pull request Jul 7, 2022
…ove-language#37)

* [move-compiler][move-stdlib] Switch to vector bytecode instructions

- Add annotation for native functions that are bytecode instructions
- Add compiler support for swapping those annotated functions to known bytecode instructions

* fixup! [move-compiler][move-stdlib] Switch to vector bytecode instructions

* fixup! fixup! [move-compiler][move-stdlib] Switch to vector bytecode instructions

* fixup! fixup! fixup! [move-compiler][move-stdlib] Switch to vector bytecode instructions

* fixup! fixup! fixup! fixup! [move-compiler][move-stdlib] Switch to vector bytecode instructions
brson pushed a commit to brson/move that referenced this pull request Apr 27, 2023
brson pushed a commit to brson/move that referenced this pull request Jul 17, 2023