zkVM tests: add feature flag to build multi-test with docker environment #912

Merged: 19 commits, Sep 28, 2023


SchmErik
Contributor

This PR adds a `test-exact-cycles` feature to `risc0-zkvm`. With this feature enabled, `cargo test` builds test guest binaries such as `multi-test` in the Docker environment. The intention is to run the tests against a reproducible ELF binary, eliminating test failures across architectures that are caused by entropy that Rust build tools introduce into the ELF binaries.

I've gated 4 tests behind this feature flag. My initial plan was to isolate these tests in a separate CI test step, but I realized that isolating the 4 tests as integration tests while preventing the other tests from running was cumbersome, and would have required scattering a bunch of #[cfg(feature = ...)] directives. Instead, I created a new feature flag that CI can enable to run all tests, including the ones that rely on exact cycle counts. Users who do not want to run the tests that require the reproducible binary can simply omit the `test-exact-cycles` feature flag from their `cargo test` invocation.
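A minimal sketch of the gating pattern described above, assuming the feature is declared in `risc0-zkvm`'s `Cargo.toml` as `test-exact-cycles = []`. The helpers `exact_cycle_tests_enabled` and `assert_exact_cycles`, the test name, and the cycle value are hypothetical placeholders, not the tests this PR actually gates.

```rust
/// True when this crate was compiled with the `test-exact-cycles` feature,
/// i.e. when the reproducible, Docker-built guest ELF is expected.
fn exact_cycle_tests_enabled() -> bool {
    cfg!(feature = "test-exact-cycles")
}

/// Hypothetical helper: compare a measured cycle count against a pinned value.
/// Only meaningful when the guest ELF is byte-for-byte reproducible.
fn assert_exact_cycles(measured: u64, expected: u64) {
    assert_eq!(
        measured, expected,
        "cycle count changed; was the guest ELF built reproducibly?"
    );
}

#[cfg(test)]
mod tests {
    use super::*;

    // Compiled and run only with `cargo test --features test-exact-cycles`;
    // a plain `cargo test` compiles this test out entirely.
    #[cfg(feature = "test-exact-cycles")]
    #[test]
    fn multi_test_exact_cycles() {
        // Hypothetical placeholder: a real test would execute the guest and
        // read the cycle count from the executor's output.
        let measured: u64 = 1024;
        assert_exact_cycles(measured, 1024);
    }
}
```

With this layout, `cargo test --features test-exact-cycles` runs everything, while a plain `cargo test` skips the gated test instead of failing it.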

Most of the code changes in this PR involve moving code from the `cargo risczero build` command to the `risc0-build` crate.

The reproducible build implementation used to be part of the cargo-risczero
utility. This change moves the code from cargo-risczero to risc0-build. By
doing so, we will be able to integrate Docker builds into the risc0-build
mechanism. Eventually, it would be nice for users to be able to build guest
code by setting feature flags on the risc0-build crate.
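One way a feature flag could steer how a guest is built is through the environment Cargo gives build scripts: for each enabled feature of the package being built, Cargo sets `CARGO_FEATURE_<NAME>`, with the name uppercased and `-` replaced by `_`. The sketch below assumes a `build.rs` that picks a Docker build when `test-exact-cycles` is on. `GuestBuild`, `select_guest_build`, and `select_from_env` are hypothetical names; this is not the `risc0-build` API.

```rust
use std::env;

/// Name of the env var Cargo sets for a build script when `feature` is enabled.
fn feature_env_var(feature: &str) -> String {
    format!("CARGO_FEATURE_{}", feature.to_uppercase().replace('-', "_"))
}

/// How the guest ELF should be produced (hypothetical enum for this sketch).
#[derive(Debug, PartialEq)]
enum GuestBuild {
    /// Build inside a pinned Docker image for a reproducible ELF.
    Docker,
    /// Build with the host toolchain (faster, but not reproducible).
    Local,
}

/// Decide the build mode. `is_set` abstracts the environment lookup so the
/// decision can be exercised without a real build script.
fn select_guest_build(is_set: impl Fn(&str) -> bool) -> GuestBuild {
    let var = feature_env_var("test-exact-cycles");
    if is_set(var.as_str()) {
        GuestBuild::Docker
    } else {
        GuestBuild::Local
    }
}

/// What a real `build.rs` would call: read the actual environment.
fn select_from_env() -> GuestBuild {
    select_guest_build(|var| env::var_os(var).is_some())
}
```

Passing the lookup as a closure keeps the decision testable outside of Cargo; inside `build.rs`, `select_from_env()` would be the entry point.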

This code movement is the first step toward letting users build their guest
code with Docker.

The tests gated by this feature flag measure cycles and segments and are
extremely sensitive to changes in the ELF binary. We have seen CI machines on
different architectures fail these tests depending on the commit.
Interestingly, the test behavior changes even for commits that do not touch
the test. The root of the problem is that Rust does not produce reproducible
builds, so each architecture ends up running a slightly different ELF binary.
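To see why a slightly different ELF can break these tests, consider a simplified model (an assumption for illustration, not the zkVM executor's exact accounting) in which each segment holds at most `2^po2` cycles. The segment count is then a ceiling division, so a single extra cycle, for example from a slightly different instruction sequence, can add a whole segment. `segment_count` and `segments_with_drift` are hypothetical helpers.

```rust
/// Simplified model (assumption): each segment holds at most 2^po2 cycles,
/// so the number of segments is ceil(total_cycles / 2^po2).
fn segment_count(total_cycles: u64, po2: u32) -> u64 {
    let segment_size = 1u64 << po2;
    (total_cycles + segment_size - 1) / segment_size
}

/// Segment counts before and after `extra` cycles of drift, e.g. from a
/// non-reproducible ELF that compiles to a slightly longer code path.
fn segments_with_drift(total_cycles: u64, extra: u64, po2: u32) -> (u64, u64) {
    (
        segment_count(total_cycles, po2),
        segment_count(total_cycles + extra, po2),
    )
}
```

Under this model, a test that hard-codes one segment for a 1024-cycle run with `po2 = 10` fails as soon as the ELF costs one more cycle.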

This is an attempt to eliminate test failures caused by non-reproducible
builds. The tests under this new flag must be run on ELFs generated by the
Docker environment. All other tests can be run with the usual `cargo test` command.
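A complementary, hypothetical safeguard is to pin a digest of the Docker-built guest ELF and only trust exact cycle assertions when the ELF under test matches it. The sketch uses 64-bit FNV-1a solely because it is short and deterministic; it is not cryptographic, and a real check would more likely use a cryptographic hash. `fnv1a64` and `elf_matches_pinned` are illustrative names, not part of risc0.

```rust
/// 64-bit FNV-1a over a byte slice (deterministic, NOT cryptographic).
fn fnv1a64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical guard: only trust exact cycle/segment assertions when the
/// guest ELF matches the digest pinned from the reproducible Docker build.
fn elf_matches_pinned(elf: &[u8], pinned_digest: u64) -> bool {
    fnv1a64(elf) == pinned_digest
}
```

A byte-identical ELF yields the same digest on every machine, which is exactly the property a reproducible build is meant to guarantee.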

risc0/zkvm/methods/build.rs (outdated, resolved)
.github/workflows/main.yml (outdated, resolved)
risc0/build/src/docker.rs (outdated, resolved)
.github/workflows/main.yml (outdated, resolved)
.github/workflows/main.yml (outdated, resolved)
@github-actions

Benchmark for Linux-cuda fabecb5

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 5.3±0.17ms | 5.2±0.14ms | -1.89% |
| fib/100/prove | 1165.7±28.87ms | 851.3±21.79ms | -26.97% |
| fib/100/total | 1157.2±11.06ms | 838.4±15.07ms | -27.55% |
| fib/1000/execute | 5.9±0.07ms | 5.7±0.11ms | -3.39% |
| fib/1000/prove | 1192.3±22.61ms | 872.0±13.56ms | -26.86% |
| fib/1000/total | 1174.8±16.46ms | 877.2±14.20ms | -25.33% |
| fib/10000/execute | 12.0±0.19ms | 11.9±0.20ms | -0.83% |
| fib/10000/prove | 3.6±0.03s | 3.4±0.02s | -5.56% |
| fib/10000/total | 3.5±0.01s | 3.4±0.02s | -2.86% |

Benchmark for Linux-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-default fabecb5

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 2.8±0.16ms | 2.8±0.12ms | 0.00% |
| fib/100/prove | 3.7±0.05s | 3.6±0.05s | -2.70% |
| fib/100/total | 3.7±0.06s | 3.7±0.07s | 0.00% |
| fib/1000/execute | 3.1±0.10ms | 3.0±0.08ms | -3.23% |
| fib/1000/prove | 3.7±0.06s | 3.7±0.08s | 0.00% |
| fib/1000/total | 3.7±0.10s | 3.7±0.08s | 0.00% |
| fib/10000/execute | 6.2±0.10ms | 6.1±0.08ms | -1.61% |
| fib/10000/prove | 15.1±0.10s | 15.0±0.15s | -0.66% |
| fib/10000/total | 15.0±0.13s | 15.0±0.19s | 0.00% |

Benchmark for macOS-metal fabecb5

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 2.9±0.12ms | 2.7±0.11ms | -6.90% |
| fib/100/prove | 814.4±6.26ms | 802.4±4.93ms | -1.47% |
| fib/100/total | 833.6±7.47ms | 830.4±5.16ms | -0.38% |
| fib/1000/execute | 3.2±0.09ms | 3.1±0.02ms | -3.13% |
| fib/1000/prove | 833.8±6.21ms | 820.1±4.04ms | -1.64% |
| fib/1000/total | 850.3±6.23ms | 849.6±4.25ms | -0.08% |
| fib/10000/execute | 6.2±0.09ms | 6.1±0.15ms | -1.61% |
| fib/10000/prove | 3.1±0.01s | 3.1±0.02s | 0.00% |
| fib/10000/total | 3.1±0.01s | 3.1±0.01s | 0.00% |

@github-actions

Benchmark for Linux-cuda f7725bc

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 5.0±0.10ms | 5.0±0.08ms | 0.00% |
| fib/100/prove | 1522.1±78.09ms | 1146.4±8.49ms | -24.68% |
| fib/100/total | 1416.2±23.73ms | 1123.9±6.46ms | -20.64% |
| fib/1000/execute | 5.6±0.11ms | 5.6±0.09ms | 0.00% |
| fib/1000/prove | 1515.3±32.76ms | 1173.4±15.12ms | -22.56% |
| fib/1000/total | 1402.8±23.78ms | 1149.8±9.98ms | -18.04% |
| fib/10000/execute | 11.6±0.12ms | 11.6±0.09ms | 0.00% |
| fib/10000/prove | 4.6±0.02s | 3.7±0.03s | -19.57% |
| fib/10000/total | 4.4±0.04s | 3.8±0.08s | -13.64% |

Benchmark for Linux-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-metal

Benchmarks have changed between the two branches, unable to diff.

@github-actions

Benchmark for Linux-cuda

Benchmarks have changed between the two branches, unable to diff.

Benchmark for Linux-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-default 9e31afc

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 2.8±0.15ms | 2.8±0.15ms | 0.00% |
| fib/100/prove | 3.6±0.06s | 3.6±0.08s | 0.00% |
| fib/100/total | 3.7±0.05s | 3.6±0.05s | -2.70% |
| fib/1000/execute | 3.1±0.08ms | 3.0±0.05ms | -3.23% |
| fib/1000/prove | 3.7±0.06s | 3.6±0.08s | -2.70% |
| fib/1000/total | 3.7±0.09s | 3.6±0.07s | -2.70% |
| fib/10000/execute | 6.2±0.10ms | 6.2±0.07ms | 0.00% |
| fib/10000/prove | 15.1±0.08s | 15.0±0.11s | -0.66% |
| fib/10000/total | 15.1±0.17s | 15.0±0.18s | -0.66% |

Benchmark for macOS-metal 9e31afc

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 2.9±0.16ms | 2.8±0.04ms | -3.45% |
| fib/100/prove | 802.2±5.54ms | 802.0±4.84ms | -0.02% |
| fib/100/total | 827.4±4.64ms | 823.9±6.47ms | -0.42% |
| fib/1000/execute | 3.1±0.05ms | 3.1±0.03ms | 0.00% |
| fib/1000/prove | 821.9±4.12ms | 821.5±3.41ms | -0.05% |
| fib/1000/total | 852.2±6.72ms | 846.6±5.96ms | -0.66% |
| fib/10000/execute | 6.2±0.05ms | 6.0±0.13ms | -3.23% |
| fib/10000/prove | 3.1±0.01s | 3.1±0.02s | 0.00% |
| fib/10000/total | 3.1±0.01s | 3.1±0.01s | 0.00% |

@github-actions

Benchmark for Linux-cuda

Benchmarks have changed between the two branches, unable to diff.

Benchmark for Linux-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-default c2a7b26

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 2.8±0.09ms | 2.8±0.10ms | 0.00% |
| fib/100/prove | 3.6±0.06s | 3.6±0.07s | 0.00% |
| fib/100/total | 3.6±0.05s | 3.6±0.06s | 0.00% |
| fib/1000/execute | 3.1±0.09ms | 3.0±0.07ms | -3.23% |
| fib/1000/prove | 3.7±0.06s | 3.7±0.08s | 0.00% |
| fib/1000/total | 3.7±0.09s | 3.6±0.09s | -2.70% |
| fib/10000/execute | 6.2±0.04ms | 6.0±0.05ms | -3.23% |
| fib/10000/prove | 15.1±0.10s | 15.0±0.08s | -0.66% |
| fib/10000/total | 15.0±0.10s | 15.0±0.08s | 0.00% |

Benchmark for macOS-metal

Benchmarks have changed between the two branches, unable to diff.

The cargo metadata crate seems to not play nicely with absolute paths...
@github-actions

Benchmark for Linux-cuda

Benchmarks have changed between the two branches, unable to diff.

Benchmark for Linux-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-metal

Benchmarks have changed between the two branches, unable to diff.

@github-actions

Benchmark for Linux-cuda 101d599

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 5.1±0.09ms | 5.1±0.09ms | 0.00% |
| fib/100/prove | 1171.4±47.79ms | 1161.4±20.35ms | -0.85% |
| fib/100/total | 1141.3±22.91ms | 1103.7±8.83ms | -3.29% |
| fib/1000/execute | 5.7±0.10ms | 5.7±0.11ms | 0.00% |
| fib/1000/prove | 1181.4±15.36ms | 1111.5±12.84ms | -5.92% |
| fib/1000/total | 1145.0±6.63ms | 1135.1±16.79ms | -0.86% |
| fib/10000/execute | 11.9±0.11ms | 11.7±0.12ms | -1.68% |
| fib/10000/prove | 4.1±0.04s | 3.5±0.03s | -14.63% |
| fib/10000/total | 4.2±0.03s | 3.5±0.01s | -16.67% |

Benchmark for Linux-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-metal 101d599

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 2.8±0.08ms | 2.8±0.13ms | 0.00% |
| fib/100/prove | 805.6±5.23ms | 798.7±3.36ms | -0.86% |
| fib/100/total | 826.0±4.18ms | 822.5±7.08ms | -0.42% |
| fib/1000/execute | 3.1±0.07ms | 3.1±0.05ms | 0.00% |
| fib/1000/prove | 820.8±3.57ms | 820.6±3.35ms | -0.02% |
| fib/1000/total | 843.7±5.99ms | 843.4±5.67ms | -0.04% |
| fib/10000/execute | 6.2±0.06ms | 6.1±0.05ms | -1.61% |
| fib/10000/prove | 3.1±0.01s | 3.1±0.02s | 0.00% |
| fib/10000/total | 3.1±0.01s | 3.1±0.01s | 0.00% |

@github-actions

Benchmark for Linux-cuda

Benchmarks have changed between the two branches, unable to diff.

Benchmark for Linux-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-default 848e1ee

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 2.8±0.10ms | 2.8±0.16ms | 0.00% |
| fib/100/prove | 3.6±0.04s | 3.6±0.07s | 0.00% |
| fib/100/total | 3.6±0.07s | 3.6±0.05s | 0.00% |
| fib/1000/execute | 3.0±0.08ms | 3.0±0.10ms | 0.00% |
| fib/1000/prove | 3.7±0.04s | 3.6±0.04s | -2.70% |
| fib/1000/total | 3.7±0.06s | 3.7±0.05s | 0.00% |
| fib/10000/execute | 6.1±0.06ms | 6.0±0.05ms | -1.64% |
| fib/10000/prove | 15.0±0.13s | 15.0±0.17s | 0.00% |
| fib/10000/total | 15.1±0.14s | 15.0±0.16s | -0.66% |

Benchmark for macOS-metal

Benchmarks have changed between the two branches, unable to diff.

SchmErik enabled auto-merge (squash) on September 28, 2023 18:00

@github-actions

Benchmark for Linux-cuda acfdd8e

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 5.1±0.15ms | 5.1±0.14ms | 0.00% |
| fib/100/prove | 1461.9±26.41ms | 1179.0±66.73ms | -19.35% |
| fib/100/total | 1348.0±43.42ms | 1134.5±9.88ms | -15.84% |
| fib/1000/execute | 5.8±0.06ms | 5.7±0.12ms | -1.72% |
| fib/1000/prove | 1405.4±65.90ms | 1192.3±69.01ms | -15.16% |
| fib/1000/total | 1357.6±17.26ms | 1156.3±8.05ms | -14.83% |
| fib/10000/execute | 12.1±0.18ms | 11.7±0.16ms | -3.31% |
| fib/10000/prove | 4.6±0.01s | 4.3±0.01s | -6.52% |
| fib/10000/total | 4.6±0.03s | 4.1±0.03s | -10.87% |

Benchmark for Linux-default

Benchmarks have changed between the two branches, unable to diff.

Benchmark for macOS-default acfdd8e

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 2.8±0.15ms | 2.7±0.14ms | -3.57% |
| fib/100/prove | 3.7±0.05s | 3.6±0.05s | -2.70% |
| fib/100/total | 3.6±0.08s | 3.6±0.05s | 0.00% |
| fib/1000/execute | 3.1±0.09ms | 3.0±0.10ms | -3.23% |
| fib/1000/prove | 3.7±0.06s | 3.6±0.06s | -2.70% |
| fib/1000/total | 3.7±0.05s | 3.7±0.07s | 0.00% |
| fib/10000/execute | 6.2±0.13ms | 6.2±0.06ms | 0.00% |
| fib/10000/prove | 15.0±0.16s | 15.0±0.12s | 0.00% |
| fib/10000/total | 15.0±0.15s | 15.0±0.10s | 0.00% |

Benchmark for macOS-metal acfdd8e

| Test | Base | PR | % |
|---|---|---|---|
| fib/100/execute | 2.9±0.14ms | 2.8±0.09ms | -3.45% |
| fib/100/prove | 800.3±5.01ms | 795.1±4.01ms | -0.65% |
| fib/100/total | 824.3±5.82ms | 820.4±4.46ms | -0.47% |
| fib/1000/execute | 3.1±0.08ms | 3.0±0.04ms | -3.23% |
| fib/1000/prove | 815.9±3.98ms | 814.8±4.45ms | -0.13% |
| fib/1000/total | 845.4±3.01ms | 838.9±7.42ms | -0.77% |
| fib/10000/execute | 6.1±0.15ms | 6.0±0.04ms | -1.64% |
| fib/10000/prove | 3.1±0.01s | 3.1±0.01s | 0.00% |
| fib/10000/total | 3.1±0.01s | 3.1±0.01s | 0.00% |

SchmErik merged commit dcad3a9 into main on Sep 28, 2023
20 checks passed
SchmErik deleted the erik/repro-multi-test branch on September 28, 2023 18:33

2 participants