Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store/deploy execution traces #602

Closed
igormcoelho opened this issue Feb 23, 2019 · 6 comments
Closed

Store/deploy execution traces #602

igormcoelho opened this issue Feb 23, 2019 · 6 comments
Labels
discussion Initial issue state - proposed but not yet accepted feature Type: Large changes or new features vm New features that affect the Neo Virtual Machine or the Interop layer

Comments

@igormcoelho
Copy link
Contributor

igormcoelho commented Feb 23, 2019

This proposal is very abstract now, but perhaps can be useful in two aspects: (1) provide a good way of achieving parallelism on existing contracts, being Native or not (2) allow users to deploy of Native Contracts or parts of it ("Execution Traces" or "Native Functions").
The idea is to store the Execution Trace over an existing (or perhaps new) contract. Over an execution trace, besides being deterministic, all branch decisions should also be pre-determined, meaning that NeoVM/NeoContract will know exactly which opcodes to be executed, only varying some input parameters. The scripthash of the trace could consist of the set of operations executed on the contract (for specific calls of interest), and a binary could be automatically generated and cached on nodes, after trace creation is requested (perhaps someone could pay a price to "deploy" that, don't know yet...). The price payed by the user executing the trace could be the same as executed without it (the only advantage is faster execution), or a discounted/fixed price to incentive this model.

Consider this example (if x is more than 10, double it, else subtracts one):

        public static int Twice(int x) {
            return x*2;
        }
        
        public static int Minus1(int x) {
            return x-1;
        }
        
        public static int Main(int x) {
            if(x > 10) return Twice(x);
            else return Minus1(x);
        }

It has the following opcodes:

51c56b6c766b00527ac46a00c35aa1630e006a00c361651200616c75666a
00c361651a00616c756651c56b6c766b00527ac46a00c35295616c756651
c56b6c766b00527ac46a00c35194616c7566
scripthash (0x8c2074abcdba9fec8a7559e6d1448fa64eda0d3d)

Meaning that, if x = 30 (and all x>10), only a few opcodes will be used (asterisk are removed):

51 PUSH1  # The number 1 is pushed onto the stack.
c5 NEWARRAY  #
6b TOALTSTACK  # Puts the input onto the top of the alt stack. Removes it from the main stack.
6c FROMALTSTACK  # Puts the input onto the top of the main stack. Removes it from the alt stack.
76 DUP  # Duplicates the top stack item.
6b TOALTSTACK  # Puts the input onto the top of the alt stack. Removes it from the main stack.
00 PUSH0  #An empty array of bytes is pushed onto the stack
52 PUSH2  # The number 2 is pushed onto the stack.
7a ROLL  # The item n back in the stack is moved to the top.
c4 SETITEM  #
6a DUPFROMALTSTACK  #
00 PUSH0  #An empty array of bytes is pushed onto the stack
c3 PICKITEM  #
5a PUSH10  # The number 10 is pushed onto the stack.
a1 LTE  # Returns 1 if a is less than or equal to b, 0 otherwise.
63 JMPIF 0e00 # 14     <============== THIS IS JUMP FOR ELSE CASE <= 10 (ALWAYS FAILS ON THIS TRACE)
6a DUPFROMALTSTACK  # <========== CODE ALWAYS COMES HERE ON THIS TRACE
00 PUSH0  #An empty array of bytes is pushed onto the stack
c3 PICKITEM  #
61 NOP  # Does nothing.
65 CALL 1200 # 18 <============== THIS IS INVOCATION FOR CASE x>10 (ALWAYS CALLED)
61 NOP  # Does nothing. 
6c FROMALTSTACK  # Puts the input onto the top of the main stack. Removes it from the alt stack.
75 DROP  # Removes the top stack item.
66 RET  # <============ ALWAYS FINISHES HERE
* 6a DUPFROMALTSTACK  #    <============= THIS PART IS UNUSED BECAUSE OF ELSE
* 00 PUSH0  #An empty array of bytes is pushed onto the stack
* c3 PICKITEM  #
* 61 NOP  # Does nothing.
* 65 CALL 1a00 # 26
* 61 NOP  # Does nothing.
* 6c FROMALTSTACK  # Puts the input onto the top of the main stack. Removes it from the alt stack.
* 75 DROP  # Removes the top stack item.
* 66 RET  #
51 PUSH1  # <========= THIS IS THE IMPLEMENTATION OF Twice FUNCTION (ALWAYS EXECUTES)
c5 NEWARRAY  #
6b TOALTSTACK  # Puts the input onto the top of the alt stack. Removes it from the main stack.
6c FROMALTSTACK  # Puts the input onto the top of the main stack. Removes it from the alt stack.
76 DUP  # Duplicates the top stack item.
6b TOALTSTACK  # Puts the input onto the top of the alt stack. Removes it from the main stack.
00 PUSH0  #An empty array of bytes is pushed onto the stack
52 PUSH2  # The number 2 is pushed onto the stack.
7a ROLL  # The item n back in the stack is moved to the top.
c4 SETITEM  #
6a DUPFROMALTSTACK  #
00 PUSH0  #An empty array of bytes is pushed onto the stack
c3 PICKITEM  #
52 PUSH2  # The number 2 is pushed onto the stack.
95 MUL  # a is multiplied by b.
61 NOP  # Does nothing.
6c FROMALTSTACK  # Puts the input onto the top of the main stack. Removes it from the alt stack.
75 DROP  # Removes the top stack item.
66 RET  # <====== ALWAYS GO TO THIS PART TOO (END OF Twice FUNCTION)
* 51 PUSH1  # <= THIS PART IS ALSO UNUSED BECAUSE OF ELSE
* c5 NEWARRAY  #
* 6b TOALTSTACK  # Puts the input onto the top of the alt stack. Removes it from the main stack.
* 6c FROMALTSTACK  # Puts the input onto the top of the main stack. Removes it from the alt stack.
* 76 DUP  # Duplicates the top stack item.
* 6b TOALTSTACK  # Puts the input onto the top of the alt stack. Removes it from the main stack.
* 00 PUSH0  #An empty array of bytes is pushed onto the stack
* 52 PUSH2  # The number 2 is pushed onto the stack.
* 7a ROLL  # The item n back in the stack is moved to the top.
* c4 SETITEM  #
* 6a DUPFROMALTSTACK  #
* 00 PUSH0  #An empty array of bytes is pushed onto the stack
* c3 PICKITEM  #
* 51 PUSH1  # The number 1 is pushed onto the stack.
* 94 SUB  # b is subtracted from a.
* 61 NOP  # Does nothing.
* 6c FROMALTSTACK  # Puts the input onto the top of the main stack. Removes it from the alt stack.
* 75 DROP  # Removes the top stack item.
* 66 RET  #

So, on this example, trace would be: 51c56b6c766b00527ac46a00c35aa1630e006a00c36165120051c56b6c766b00527ac46a00c35295616c7566616c7566 (scripthash: 0xcab571c009a140cc7a0879592bcd3003ca3601fc).
Note that trace is smaller than avm, but if code contained loops, these loops could make trace even bigger than original avm. If any branch fails the follow the expected sequence, the trace fails and throws.

This way, we could possibly store the sequence of used opcodes as a Native Contract (or "Native Function" of a contract), and the user provides the desired trace during invocation.
Deploy Trace example: Neo.Contract.DeployTrace Contract_scripthash + trace_operations => Neo.Contract.DeployTrace 0x8c2074abcdba9fec8a7559e6d1448fa64eda0d3d 51c56b6c766b00527ac46a00c35aa1630e006a00c36165120051c56b6c766b00527ac46a00c35295616c7566616c7566.
Invocation could be the same (passing trace scripthash as execution hash): PUSH30, APPCALL 0xcab571c009a140cc7a0879592bcd3003ca3601fc (instead of APPCALL 0x8c2074abcdba9fec8a7559e6d1448fa64eda0d3d). Note that PUSH9, APPCALL 0xcab571c009a140cc7a0879592bcd3003ca3601fc would fail because of unexpected jumps (as x = 9 <= 10).

Advantage: parallel code can explore the trace the predict conflicts and if no storage-write is used, all traces free of storage-put could be done in batch parallel.
Cons: The only extra overhead is the storage of these traces, so it would be important to have a price on that (but so much less than a regular contract...).

@igormcoelho
Copy link
Contributor Author

@erikzhang erikzhang added the discussion Initial issue state - proposed but not yet accepted label Feb 23, 2019
@shargon
Copy link
Member

shargon commented Feb 25, 2019

Why we don't add the allowed contracts calls in a manifest (like permissions), for ensure that the VM know the possible relationships between contracts before the execution?

@erikzhang
Copy link
Member

erikzhang commented Feb 25, 2019

@shargon #287

Permission

The permissions field is an array containing a set of Permission objects. It describes which contracts may be invoked and which methods are called.

The definition of the Permission object is as follows:

{
  "contract": "hash | group | *",
  "methods": [] | "*"
}

The contract field indicates the contract to be invoked. It can be a hash of a contract, a public key of a group, or a wildcard *.

If it specifies a hash of a contract, then the contract will be invoked; If it specifies a public key of a group, then any contract in this group will be invoked; If it specifies a wildcard *, then any contract will be invoked.

The methods field is an array containing a set of methods to be called. It can also be assigned with a wildcard *. If it is a wildcard *, then it means that any method can be called.

If a contract invokes a contract or method that is not declared in the manifest at runtime, the invocation will fail.

@shargon
Copy link
Member

shargon commented Feb 25, 2019

So.. we should have enough information to know the relationship with the permission file.

@jsolman
Copy link
Contributor

jsolman commented Feb 25, 2019

parallel code can explore the trace the predict conflicts and if no storage-write is used, all traces free of storage-put could be done in batch parallel.

Code analysis is one way to try to determine what can be run in parallel; however, the solution needs to be able to support executing contracts that are using storage-get and storage-put. As long as there is guaranteed not to be any storage collisions it should execute them still in parallel. This is where the permission manifest should provide the necessary information.

Having a solution that runs parts of a contract sequentially and other parts of a contract in parallel is more complex and likely shouldn't be in the initial design. What do you think?

Perhaps we should discuss byte code / trace analysis as a mechanism to support parallel contract execution separately from adding support for a lightweight contract deployment in the form of an execution trace, since the two things shouldn't need to be coupled together, should they?

@igormcoelho
Copy link
Contributor Author

igormcoelho commented Feb 26, 2019

I fully agree that the permission system is very fundamental, top priority, mainly from security aspects, and much less from parallel computing perspective. So, this discussion here is not meant to replace any existing proposal, but to complement some of them, if necessary. Going back to the permission system, I personally don't see how it will help us developing parallel code, because most of the information that will be provided to us can currently be extracted from contract bytecode. For example, if contract supports dynamic invoke, we could "know it" because of two things: (1) it payed for it (2) it has some APPCALL 000...000 somewhere. Regarding static invokes, we could also "know it", because it will have somewhere: APPCALL scripthash. I emphasize "know it", because it's not 100% sure just to analyse opcodes, we would need to analyse all possible execution traces to see if some opcodes are unreachable, otherwise it would provide us false positives on static invokes (but even that we can detect in most situations). So, taking aside these static invokes false positives, we could already (and easily) provide an "upper bound" on the current capabilities of all existing contracts on the network just by analysing the opcodes (and I'm pretty confident this will match the expected manifests for these contracts, unless people already deployed really messed up contracts). The only thing is nearly impossible to do right now is to predict dynamic invoke possibilities, so this is a fundamental contribution of the manifest system, but parallel challenges are hard already even if only static invokes are provided.

Having this said, with manifest or not, we may already know, for most situations, which contracts could be invoked from others. Question is: how to explore these properties for parallel execution?

This proposal tries to tackle efficient exploration of situations where a common code is often executed, including static invokes, with full parallelization (but storage-write-free). I agree that this is a huge flaw on the proposal (not having storage writes on the traces), because this would be useless for NEP-5, which is the most interesting scenario. In my opinion, this is also the hardest part (not the external contract invocation parts), which I've thought a lot but haven't yet found a good solution.

The only thought I've had so far, which solves Native NEP-5 parallelization (but it's hard for other cases), is whitelisting storage writes. This way, user should provide a list of storage keys that are affected by its operation, so the transaction execution engine will know in advance which keys are affected, so non-conflicting transactions could be run in parallel. On the other hand, this will work on Native NEP-5, but will not work on contracts that have unpredictable results (depending on block height, random values and other info), which is a common use-case for blockchain too. Another problem is that, if too many keys are affected, a longer transaction would be issued (because of longer whitelisting list) and processing time could also increase for tx scheduler.

@lock9 lock9 added feature Type: Large changes or new features vm New features that affect the Neo Virtual Machine or the Interop layer labels Aug 12, 2019
@lock9 lock9 added this to Backlog in NEO 3 Releases Aug 12, 2019
Thacryba pushed a commit to simplitech/neo that referenced this issue Feb 17, 2020
Thacryba pushed a commit to simplitech/neo that referenced this issue Feb 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Initial issue state - proposed but not yet accepted feature Type: Large changes or new features vm New features that affect the Neo Virtual Machine or the Interop layer
Projects
Development

No branches or pull requests

5 participants