Add a text describing runtimes and their requirements #5881
Conversation
I haven't yet proof-read this and there's definitely a lot that can still be added (e.g. a comparison between the frontends, wasmer vs wasmtime), but I appreciate all nitpicks ^^
runtime/near-vm-runner/RUNTIMES.md
Outdated
> A VM implementation must, first and foremost, implement the Wasm specification precisely in order
> for it to be considered correct. Any deviations from the specification would render an
> implementation incorrect.
>
> In addition to this, the NEAR protocol adds a requirement that all executions of a Wasm program
> must be deterministic. In other words, it must not be possible to observe different execution
> results based on the environment within which the VM runs.
Yeah, I think determinism is more important than strict adherence to the spec. Roughly, if the runtime implements WASM incorrectly in some edge case, that makes devx worse, but is something we can rectify in a protocol upgrade. Like what we did with wasmer0 -> wasmer2 upgrade.
In contrast, non-determinism would be a critical issue, as it'll break consensus.
That is, it's bad, but acceptable, if all the nodes come to the same wrong answer.
runtime/near-vm-runner/RUNTIMES.md
Outdated
> On a scale of trade-offs, runtime performance is probably one of the less important metrics. Slower
> execution of a contract doesn't make NEAR protocol unsound as an idea, whereas something like a
> non-deterministic execution outcome would.
The wants/needs framing from https://apenwarr.ca/log/?m=202110 is good here. Performance is a "want" for us, but the critical one. I also see in theory that we can trade correctness for performance. E.g., if we find out that not adhering to the wasm spec in some cases can make the code ten times faster, that would actually be a strong reason to break the spec. Hope we won't have to make such calls though.
Actually, I'd say that for us the order of priorities, if we have to make hard choices, is:
// Needs:
* security
* reliability (== our confidence in security)
// Wants:
* performance
* correctness (adherence to the wasm spec)
Can we really afford correctness to be just a “want”? Wasm is a target for optimizing compilers, and they can definitely make optimizations with an underlying presumption that the generated code will be run according to the spec. With a non-conformant runtime it can take a very innocuous deviation in behaviour before e.g. a wrong branch is taken, an unauthorized user initiates a token transfer, etc.
🤔 that's a very good point actually, and something I didn't consider before. To give a specific example: wasmer0 has this fun property that some memory accesses wrap around, rather than trap (toroidal virtual memory). I thought that's a weird quirk, but not too bad, as it doesn't violate security. But this reasoning is wrong. While modulo arithmetic for addresses indeed doesn't allow the contract to escape the sandbox, it may make the smart contracts themselves exploitable.
runtime/near-vm-runner/RUNTIMES.md
Outdated
> operating systems it would be a huge boon to the development experience if the runtime used also
> supported these other targets.
>
> ## Runtime performance
Let's maybe split this into two primary metrics we care about:
- latency to run a simple function in a big contract (rationale: many contracts are relatively simple in terms of business logic and do little compute)
- throughput to run a relatively long wasm computation as fast as possible (rationale: at the same time, there are some compute-heavy contracts, especially those that run interpreters in wasm)
In general, I feel like our runtime model has three parts to it:
- compilation (read wasm, compile to machine code, write machine code to disk)
- linking/loading (load machine code from disk, load it into the executable memory, unload it after execution)
- actual execution
I attempted rewording the sections around performance to introduce more detailed view of the requirements and the parts that go into it.
> soon-to-be replaced regalloc algorithm is currently `O(n²)`.
>
> One detail where Cranelift may fall short is in the ability to produce super-optimized machine code
> sequences for hot operations such as gas counting.
I feel something's missing here... Especially, my main worry with Cranelift is that it'd be hard to estimate costs for the generated code. Basically, with singlepass we are somewhat confident that the compiler isn't being overly smart. With Cranelift, it feels like it could happen that we are benchmarking the happy case.
runtime/near-vm-runner/RUNTIMES.md
Outdated
> operating systems it would be a huge boon to the development experience if the runtime used also
> supported these other targets.
>
> ## Runtime performance
I think there is another important point to be mentioned about performance: predictability.
Essentially, our blockchain is a real-time system: a transaction that consumes e.g. 1 TGas should finish within 1 ms. Optimizing performance for an RT system is a completely different engineering problem than optimization otherwise, since in RT it is always the worst case that counts.
This might sound like something that's only relevant to the preparation cost, but I would argue it is just as important for the generated code. For example, an optimization which tries to be smart about the arrangement of more/less likely branches could be problematic for us, for several reasons that come to mind:
- It doesn't improve worst-case performance but creates extra compiler overhead.
- Gaining some extra performance on average doesn't help us a bit in terms of overall system performance, since we have to be conservative in how many transactions go in a block anyway.
- It makes it harder for us to estimate a realistic worst-case execution time. As the average time per branch goes down, we are even tempted to lower the gas fee parameters for it. Which can lead to subtle security vulnerabilities which are potentially hard to spot.
I tried to incorporate a paragraph about this.
This is great! It provides a very good overview of the problems we face and our priorities. One question: it seems that we did not cover wasmtime in the document. Is that intentional?
> * correctness – does the runtime do what it claims to do;
> * reliability – how much confidence is there in the implementation of runtime;
> * platform support – does the runtime support targets that we want to target;
> * performance – how quickly can the runtime execute the requested operations;
This is interesting. What is the reason for putting platform support above performance?
I reasoned about this in #5881 (comment). In short, I think our requirements to support x86_64-linux
bring this above performance. But given that the question comes up a second time I'm wondering if either the description is lacking or this really should be less important compared to performance.
Guess we need to define "requested operations".
I want to avoid defining specifics in this list of criteria as the requirements section that comes after is better suited for elaboration in this aspect.
Re-reading the section on the performance requirements below I think it does list the kinds of operations we tend to ask a runtime to execute, but I do agree that the wording does not necessarily make the connection obvious.
> An estimate of the deployment cost will typically involve a function which uses the size of the
> input Wasm code as its primary input. Such a function can only exist if the runtime's time
> complexity properties are well known. For our purposes a linear or `O(n log n)` relationship
> between the input size and execution time is the highest we can accept.
Actually, what is the reason why a quadratic algorithm is absolutely unacceptable? @matklad
The max size of a contract is 4 MB, so we can have up to roughly 10^6 "things" (functions, instructions in a single function, total call depth, etc). (10^6)^2 = 10^12 would be prohibitively expensive to compute.
That being said, we could accept worse time complexities if we reflect them in the cost model. For example, if we know that our register allocation is, e.g., quadratic in the number of local variables, we can charge a quadratic cost for that. Non-linear cost models feel like a can of worms though.
I think it could be feasible to utilize quadratic algorithms for select stages of our runtime if we did meter and charge for those stages separately from everything else. That way the linear operations with larger constant factors would be isolated from the quadratic stuff with potentially small factors and either approach would be reasonably feasible.
That said, since we're trying to estimate fees from the input wasm code, establishing the relationship between the input wasm and the work an algorithm such as regalloc would do would probably involve a fair amount of tasseomancy.
> When executing a tiny function part of a larger contract these operations will dominate and
> contribute greatly to the observed latency of the contract execution. These overheads contribute to
> the fees paid by anybody using the protocol, making any unnecessary overhead a potential roadblock
> in NEAR protocol's adoption.
N00b question: is the overhead here fundamentally unavoidable? I.e., is there some way to only deserialize part of the compiled wasm module to reduce the cost when only a tiny function is invoked?
In theory it is possible to only load the functions necessary for execution by constructing a call graph. In practice a non-negligible number of contracts will utilize the `call_indirect` instruction, which makes this sort of analysis much more complicated.
An alternative could be to load the machine code into memory lazily, as it is accessed. This is exactly what operating systems like Linux do when they execute a program.
runtime/near-vm-runner/RUNTIMES.md
Outdated
> implementation of the `wasmer-singlepass` codegen. The global state is definitely a source of
> potential spooky action at a distance problems where changing code generation of a specific
> instruction affects correctness and behaviour of another one.
I have trouble parsing this sentence. Is there a typo somewhere?
Gave a shot at rewording this.
> NEAR protocol. Some of the criteria are already listed in the [FAQ] document and this document
> gives a more thorough look. Listed roughly in the order of importance:
>
> * security – how well does the runtime deal with untrusted input;
"How well" is somewhat vague; we'd better define what the exact criteria are here: i.e. the ability to limit resource consumption, trap on incorrect input, etc.
Right, the requirements section lists some of the… requirements with regards to security, but I definitely missed e.g. the resource consumption aspect. Will add that in.
> * security – how well does the runtime deal with untrusted input;
> * correctness – does the runtime do what it claims to do;
> * reliability – how much confidence is there in the implementation of runtime;
> * platform support – does the runtime support targets that we want to target;
Guess this sums up to "Linux/x86 for development and deployment and macOS arm/x64 for development".
Correct, this is also described in the Platform Support section within Requirements.
It is nice to agree on and maintain a document describing our requirements, priorities, and the criteria we evaluate the runtimes against.
This PR adds a document that primarily aims to record our requirements and criteria. While I also added some text describing two backends we have considered, I wouldn't consider those sections to be the primary focus of this PR. Instead I think it would make sense to update them in a follow-up alongside the discussion on the frontend options we've considered.