
bench: Add design specs for ZK WASM benchmarking infrastructure #182

Closed
wants to merge 4 commits

Conversation

@mooori mooori commented Jan 5, 2024

Outlines the benchmarking infrastructure to be built for ZK WASM.

@aborg-dev aborg-dev mentioned this pull request Jan 9, 2024
@mooori mooori marked this pull request as ready for review January 9, 2024 16:45

mooori commented Jan 10, 2024

@krlosMata this PR adds the benchmarking infrastructure design doc which was discussed in the last sync meeting. (GitHub unfortunately only allows requesting reviews from members of the repo’s org, so I couldn’t request a review from you.)

@aborg-dev aborg-dev left a comment


I like how well this document summarizes the requirements that we have. I think it would still be useful to go a bit deeper into how we are going to achieve these goals. Ideally, it should be possible for someone to read this document and implement the benchmarking infrastructure without having to make any large design decisions.

I've left a few comments with examples of big questions I would still have if I tried to implement this design now.


## Different hardware

Some benchmarks of other ZK systems are run on different hardware. ZK WASM might run benchmarks on the hardware used by other systems as well to allow for a more complete comparison. This work is lower priority and rather unlikely to be realized in the first iteration of the benchmarking infrastructure.


Another reason why running benchmarks on different hardware doesn't make sense for us right now is that we're not directly measuring the proving time, but instead tracking a proxy for it, the number of VM cycles, and that number is independent of the hardware in use.


## Instrumentation

Instrumentation is based on logging zkASM instructions by calling JavaScript from zkASM. The indirection via JavaScript is required as `zkevm-proverjs` executes compiled PIL instead of zkASM. The code for zkASM instruction logging might be inserted into the zkASM by enabling a compilation flag or in a post-processing step.
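
For illustration only, a minimal TypeScript sketch of what a raw log entry and a JS logging helper could look like; the field names and function names are assumptions, not part of this design:

```typescript
// Hypothetical shape of one raw log entry; the field names are assumptions.
interface InstructionLogEntry {
  line: number;    // zkASM source line that emitted the instruction
  op: string;      // primitive opcode, e.g. "ADD" or "MSTORE"
  label: string;   // enclosing zkASM label, if any
  cycle: number;   // VM cycle counter at the time of execution
}

const instructionLog: InstructionLogEntry[] = [];

// Hypothetical JS helper called (via the JS indirection) once per executed instruction.
export function logInstruction(entry: InstructionLogEntry): void {
  instructionLog.push(entry);
}

// Dump the collected log, e.g. as JSON, for later aggregation and visualization.
export function dumpInstructionLog(): string {
  return JSON.stringify(instructionLog, null, 2);
}
```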


Let's try to expand this section with a bit more detail on:

  • What a sample raw instruction log would look like (e.g. which fields we need to track)
  • The JS helpers that we plan to add to facilitate this tracking (e.g. their signatures and what they are supposed to do)
  • Changes to ZKASM that are needed to facilitate this tracking

This all depends on the visualizations that we have in mind. We can start with simple tables, but eventually, I think we'll need to use something more visual like CPU flamegraphs to reason about large programs. So I would have that target in mind when answering the questions above.

mooori (Author) replied

Added a proposal for a visualization schema in 8c2bcf0. I would follow up with details on the other points once we have specified the visualization schema (as you mentioned, the other points depend on that).

cranelift/zkasm_data/benchmarks/Design.md
- zkASM instructions and the number of cycles required for their execution.
- The number of cycles required for benchmarks across different points in the git commit history.

## Relating zkASM instructions to cycles


What exactly are the instructions that we are going to count? Consider the following ZKASM line from ZKEVM ROM:

```
$               :ADD, MSTORE(SP++), JMP(readCode)
```

This is valid code that pipelines the execution of three different primitive instructions in sequence to use fewer cycles and intermediate registers. I would expect our codegen to use this feature heavily in the future.
How will we track this? Do we canonicalize this into a single compound instruction? Or do we track it as 3 separate instructions?

mooori (Author) replied

> Do we canonicalize this into a single compound instruction? Or do we track it as 3 separate instructions?

I think both might be interesting, depending on the context of the analysis for which benchmarking is used. Therefore I would suggest collecting data that enables both and providing a UI or CLI switch to choose whether an instruction list is accounted for as a single compound instruction or as separate instructions.
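
For illustration, a minimal TypeScript sketch of a record shape that would support both accountings; all names and fields here are hypothetical:

```typescript
// Hypothetical record for one executed zkASM source line.
interface LineRecord {
  sourceLine: string;  // e.g. "$ :ADD, MSTORE(SP++), JMP(readCode)"
  ops: string[];       // primitive instructions on that line, e.g. ["ADD", "MSTORE", "JMP"]
  cycles: number;      // cycles spent on this line
}

// Count occurrences either per compound line or per primitive instruction,
// selected by a (hypothetical) UI/CLI switch.
function countOps(records: LineRecord[], mode: "compound" | "separate"): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of records) {
    const keys = mode === "compound" ? [r.ops.join(", ")] : r.ops;
    for (const key of keys) {
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}
```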


### Format

If feasible, visualizations should be SVG files, again taking inspiration from flame graphs. To avoid cluttering the graph, `<count>` is initially hidden and revealed for a rectangle on hovering.
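
One possible approach, purely as an illustrative sketch, is to emit such an SVG by hand and rely on the standard SVG `<title>` element, which browsers render as a hover tooltip; the row shape and layout below are assumptions:

```typescript
// Each rectangle carries its <count> as an SVG <title> child, which browsers
// show as a tooltip on hover, so the value stays hidden until needed.
interface Row {
  label: string;  // e.g. an MInst name
  count: string;  // formatted <count> tuple, e.g. "(num_cycles=120, num_reg_writes=40)"
}

function renderSvg(rows: Row[], rowHeight = 20, width = 400): string {
  const rects = rows
    .map(
      (r, i) => `
  <g>
    <title>${r.count}</title>
    <rect x="0" y="${i * rowHeight}" width="${width}" height="${rowHeight - 2}" fill="#cbe2f7"/>
    <text x="4" y="${i * rowHeight + 14}" font-family="monospace">${r.label}</text>
  </g>`
    )
    .join("");
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${rows.length * rowHeight}">${rects}</svg>`;
}
```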


Do you have any thoughts on how we can produce such SVGs for the profile? Are there any existing tools we can reuse for this?

- For `zkasm_op_x` sums are taken separately for each `MInst` that emitted `zkasm_op_x`.
- The order of the tuple elements can be chosen via a UI or CLI flag. For example, `num_cycles` can be shown first, in which case `a = num_cycles`.
- Sorting is done by comparing the first element of the `<count>` tuples (see the sketch after this list).
- All `zkasm_op_*` rectangles have the same width.
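
A minimal sketch of this reordering and sorting step, assuming a hypothetical `<count>` shape with the two metrics mentioned in this document (`num_cycles`, `num_reg_writes`) and illustrative `MInst` names:

```typescript
// Hypothetical <count> tuple; the metric names come from this document but the
// concrete shape is an assumption.
interface Count {
  num_cycles: number;
  num_reg_writes: number;
}

// Reorder the graph by the metric chosen (via a hypothetical CLI flag) as the
// first tuple element `a`, sorting rows descending by it.
function sortRows(rows: [string, Count][], a: keyof Count): [string, Count][] {
  return [...rows].sort(([, x], [, y]) => y[a] - x[a]);
}

// Example: the same data yields different graphs for a = num_cycles vs a = num_reg_writes.
const rows: [string, Count][] = [
  ["MInst::AluRRR", { num_cycles: 120, num_reg_writes: 40 }],
  ["MInst::Load64", { num_cycles: 300, num_reg_writes: 20 }],
];
console.log(sortRows(rows, "num_reg_writes"));
```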


What was your motivation for this two-level breakdown?

The more I think about this, the more it seems that the further breakdown of `MInst::Instruction` into `zkasm_op_*` will not bring much value compared to the complexity it will introduce.
The lowering from a concrete `MInst` to ZKASM opcodes is fairly deterministic and can be seen in the ZKASM backend code. I think we'll all soon get an intuition for how much each instruction costs and won't need to dive into this detail when optimizing a benchmark like SHA256.
The only scenario where we would want to see this breakdown is when we optimize a specific instruction, but in that case it's enough to see the breakdown for that instruction only, not for all instructions in the program.

It might be useful to think end-to-end about the common optimization journeys and what information we need to be able to do them. Two that come to mind right now are:

  • Make a single benchmark faster
  • Make a particular instruction faster

mooori (Author) replied

**How to use flame graphs?**

I assume flame graphs are typically used to see where a program spends most of its time and then try to optimize that work. For instance, a web frontend might be slow because it spends a lot of time drawing colors on the screen. Further examination shows that it is mostly already-blue rectangles that are re-colored blue, which is unnecessary. So the frontend is optimized by drawing a color only if the area currently has another color.

The compiler receives wasm as input and generates zkASM. We also want to see where most of the time is spent and then optimize that work. However, our understanding of the work done by the program is not as deep as in the previous example, and therefore we cannot change what the program does (draw a color only if it differs from the area’s current color), at least in the general case, i.e. when the program is not a benchmark for which we control the original source code.

The wasm we receive as input has likely already been optimized by the compiler that produced it, since developers are not expected to write larger programs directly in wasm. This makes it even harder to figure out what the program does.

For these reasons I’m wondering whether (conventional) flame graphs contain a lot of information that we cannot utilize to optimize the emitted zkASM. In that case something simpler might already be sufficient. Or perhaps I’m missing something or misunderstanding something?

**Usage of Analysis benchmarking tools**

My assumption was that Analysis benchmarking tools will be used to identify:

  1. Hot `MInst`s for a given program and a breakdown of their opcodes and costs on the zkASM side. This helps to identify what to optimize, since even micro-optimizations of hot opcodes can have a significant impact overall.
  2. Opcode sequences that lend themselves to peephole optimizations. For this task I think it would be helpful to see which blocks/labels of a program are on the hot path, since optimizations here are likely to have a higher impact.

**Motivation for graph schema in 8c2bcf0**

The motivation was to help with 1. described above. Looking at WAT files like the sha256 benchmark doesn’t reveal which `MInst`s or zkASM opcodes are hot. The graph aggregates `<count>` values over the execution of the program and hence helps to identify hot `MInst`s and zkASM opcodes.

Having the breakdown of `MInst` to zkASM opcodes in the graph helps to make the information available in one place. Though, as you mentioned, the zkASM opcodes can be looked up in the backend code, and including them in the graph might not be worth the extra complexity.

**Another visualization: hot labels**

The visualization proposed above wouldn’t help with 2. An idea for a visualization that helps with identifying peephole optimization opportunities would be printing hot labels. Below is a rough sketch of how that might look. There could be CLI flags to determine how much data is included, e.g. to include only the X most frequent instructions per block to avoid cluttering the graph.

(image: labels_graph, a rough sketch of the hot-labels visualization)

This shows hot labels for a given program and would allow us to identify the labels where peephole optimizations might have the biggest impact on performance. It could also help with the “make a single benchmark faster” optimization journey.
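
A minimal TypeScript sketch of how hot labels could be extracted from the raw instruction log; the log entry shape, the use of cycles as the hotness metric, and the CLI knobs are assumptions:

```typescript
// Hypothetical raw log entry used for the hot-labels view.
interface LogEntry {
  label: string;   // enclosing zkASM label
  op: string;      // primitive opcode
  cycles: number;  // cycles spent on this instruction
}

// Return the topN labels by total cycles, each with its opsPerLabel most
// frequent opcodes (the "X most frequent instructions per block" CLI knob).
function hotLabels(log: LogEntry[], topN: number, opsPerLabel: number) {
  const byLabel = new Map<string, { cycles: number; ops: Map<string, number> }>();
  for (const e of log) {
    const bucket = byLabel.get(e.label) ?? { cycles: 0, ops: new Map<string, number>() };
    bucket.cycles += e.cycles;
    bucket.ops.set(e.op, (bucket.ops.get(e.op) ?? 0) + 1);
    byLabel.set(e.label, bucket);
  }
  return [...byLabel.entries()]
    .sort(([, a], [, b]) => b.cycles - a.cycles)
    .slice(0, topN)
    .map(([label, { cycles, ops }]) => ({
      label,
      cycles,
      topOps: [...ops.entries()].sort(([, a], [, b]) => b - a).slice(0, opsPerLabel),
    }));
}
```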

**Flame graphs via wasm profiling**

If the control flow of wasm and the zkASM it is compiled to are comparable, we might consider producing flame graphs by profiling wasm execution. Profiling is supported by wasmtime, and there are many libraries that generate flame graphs from standard profiling data (e.g. `flamegraph` for Rust). This procedure could be used for the benchmark optimization journey, where we know and control the Rust and wasm of the program that is compiled to zkASM.


> The compiler receives wasm as input and generates zkASM. We also want to see where most of the time is spent and then optimize that work. However, our understanding of the work done by the program is not as deep as in the previous example, and therefore we cannot change what the program does (draw a color only if it differs from the area’s current color), at least in the general case, i.e. when the program is not a benchmark for which we control the original source code.

Thanks for raising this very important point. When we build the tooling for performance optimization, we need to have a clear set of use cases we aim to address. For Stage 2, those use cases are:

  1. Support and optimize benchmarks equivalent to https://risc0.github.io/ghpages/dev/benchmarks/index.html for the purpose of fair comparison
  2. Support and optimize full WASM interpreter

In both cases, we do have control over the code that will be compiled to WASM and later to ZKASM, and I would expect us to change that code to yield better performance (like we already do for benchmarks by compiling them with `no_std`). We can also annotate this code with tracing statements (e.g. using the `tracing` crate) to make it easier to see the correspondence between the original and generated code.

In other words, we're not aiming to build tooling to optimize arbitrary ZKASM programs in the wild; instead, we focus on a selected subset of "pet" programs that are important to us.
Note that this will also be the case beyond Stage 2: in Stage 3 we will optimize specifically for NEAR Protocol code and NEAR host functions. Similarly, any users of ZKWASM will have a particular Rust program that they are trying to produce ZK proofs for (e.g. a NEAR light client); they will be willing to optimize this Rust program and will also benefit from such tooling.

In this light, I think we should focus on visualizations that allow us to work effectively with medium-to-large programs that are annotated/structured well enough for us to optimize them efficiently.


- No dependencies required as SVG files can be viewed in a web browser.
- They can be embedded in GitHub comments, markdown files and other documents.

aborg-dev commented Jan 12, 2024

I agree with these advantages, but I think it's also worth listing the limitations and how SVGs compare with alternative solutions (e.g. no built-in diffing unlike plain text, limited interactivity compared to HTML).
Then, based on our requirements for visualization, we can choose between these solutions.


Due to the sort order defined above, there are different graphs for different assignments of `a`. This allows developers to highlight different costs, for instance register writes if `a = num_reg_writes` or virtual machine cycles if `a = num_cycles`.

Once the data described in the previous section is available and has been used while working on the repository, opportunities for visualizations might be identified. They should help to make the information more easily digestible. Some examples of possible visualizations are:


Just to throw in one more visualization idea from the perf world: a trace profile visualizer like https://profiler.firefox.com/docs/#/

It would allow us to use an interactive profile explorer like https://share.firefox.dev/3OFPnAP that supports flamegraphs as long as we export the profile in a standard JSON format.

mooori commented Jan 19, 2024

@nagisa brought up the following points in today’s sync meeting:

**Wall clock time**

  • We might want to include wall clock time in the output of benchmarking tools.
  • Even if zkASM instrumentation does not affect VM cycles, its impact on wall clock time might still become an issue. If the benchmarking infrastructure is too slow, it might not be used.

**Additional things that could be measured**

  • Number of columns or rows.

**Approach to measuring the impact of register allocation**

  • Execute a program on machines with different numbers of registers and then compare:
    • cycles
    • wall clock time
  • Better performance with more registers implies that we can optimize generated zkASM by making register allocation more efficient.
  • Increasing the number of VM registers probably requires support from Polygon.
  • Decreasing the number of used registers could be achieved by modifying cranelift. However, since the number of registers is already small, we probably couldn’t use fewer than the current number.

mooori commented Feb 9, 2024

Work on this design doc has shown that there are open questions regarding what exactly we want to build. To answer that question, an MVP is being built:

Once conclusions from the MVP have been drawn, I suggest starting a new design doc. This might be easier to review than an almost complete rewrite of this one.

@mooori mooori closed this Feb 9, 2024