feat: stabilize alt_bn128 family of host functions #6824

Merged: 6 commits, Jun 2, 2022

@matklad matklad commented May 18, 2022

Feature to stabilize

This PR stabilizes three host functions: alt_bn128_g1_multiexp, alt_bn128_g1_sum, alt_bn128_pairing_check. They implement addition, scalar multiplication, and pairing check for a specific elliptic curve used in the ethereum ecosystem (eip-196).
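
For readers unfamiliar with the three operations, here is a toy sketch in Rust. It uses the same curve equation as alt_bn128, y^2 = x^3 + 3, but over the tiny field F_7 so that the arithmetic fits in an `i64`. The names `Point`, `add`, `mul`, and `multiexp` are illustrative only and are not the nearcore host function signatures (the real functions work on serialized byte buffers). The pairing check is not modeled here.

```rust
// Toy model over a tiny prime field (P = 7). The real alt_bn128 curve uses
// the same equation, y^2 = x^3 + 3, over a 254-bit prime. Illustration only.
const P: i64 = 7;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Point {
    Infinity,
    Affine(i64, i64),
}

fn modp(a: i64) -> i64 {
    a.rem_euclid(P)
}

// Modular inverse via Fermat's little theorem: a^(P-2) mod P.
fn inv(a: i64) -> i64 {
    let (mut result, mut base, mut exp) = (1, modp(a), P - 2);
    while exp > 0 {
        if exp & 1 == 1 {
            result = modp(result * base);
        }
        base = modp(base * base);
        exp >>= 1;
    }
    result
}

// Group law: what a "sum" of two curve points computes.
fn add(a: Point, b: Point) -> Point {
    match (a, b) {
        (Point::Infinity, q) | (q, Point::Infinity) => q,
        (Point::Affine(x1, y1), Point::Affine(x2, y2)) => {
            if x1 == x2 && modp(y1 + y2) == 0 {
                return Point::Infinity; // a point plus its negation
            }
            let lambda = if x1 == x2 {
                modp(3 * x1 * x1 * inv(2 * y1)) // tangent slope (doubling)
            } else {
                modp((y2 - y1) * inv(x2 - x1)) // chord slope
            };
            let x3 = modp(lambda * lambda - x1 - x2);
            let y3 = modp(lambda * (x1 - x3) - y1);
            Point::Affine(x3, y3)
        }
    }
}

// Scalar multiplication by double-and-add.
fn mul(mut p: Point, mut k: u64) -> Point {
    let mut acc = Point::Infinity;
    while k > 0 {
        if k & 1 == 1 {
            acc = add(acc, p);
        }
        p = add(p, p);
        k >>= 1;
    }
    acc
}

// "multiexp": the sum of scalar multiples, k_1 * P_1 + ... + k_n * P_n.
fn multiexp(terms: &[(Point, u64)]) -> Point {
    terms.iter().fold(Point::Infinity, |acc, &(p, k)| add(acc, mul(p, k)))
}
```

With G = (1, 2), `add(G, G)` gives (6, 3), and `multiexp(&[(G, 1), (G, 3)])` equals `mul(G, 4)`.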

Testing and QA

This feature underwent extensive testing:

  • we had several audits
  • aurora implements ethereum precompiles on top of these host functions, and those precompiles pass ethereum tests
  • this PR adds a couple more tests generated using the implementation used in go-ethereum.
  • we verified our costs against costs in ethereum, they are roughly comparable in terms of wall-clock time

Pre-mortem

The biggest risk I see is that we are not experts in elliptic curve crypto, so it's hard to judge if the API overall makes sense. Maybe it could be more general, maybe there are better curves, etc. However, it does fit aurora use-case and, given that the impl here is rather straightforward, even if we change something in the future, keeping the current functions won't be too onerous.

Checklist

@matklad matklad requested a review from a team as a code owner May 18, 2022 12:07
@matklad matklad requested a review from mm-near May 18, 2022 12:07
@matklad matklad requested review from akhi3030 and jakmeier May 18, 2022 12:10
@@ -166,7 +166,7 @@ pub fn arbitrary_contract(seed: u64) -> Vec<u8> {
     config.exceptions_enabled = false;
     config.saturating_float_to_int_enabled = false;
     config.sign_extension_enabled = false;
-    config.available_imports = Some(rs_contract().to_vec());
+    config.available_imports = Some(base_rs_contract().to_vec());
Contributor

I have some problems understanding why this is changed. Can you explain it to me please?
For one, it seems counter-intuitive that arbitrary_contract no longer returns the standard test contract. I would have thought that a caller that specifically asks for "arbitrary" is able to handle any contract, so the standard contract should be good enough.
Further, I don't really see how this relates to the feature being stabilized here. Is it to avoid testing the change in test_wasmer2_artifact_output_stability? Why wouldn't we want to test that?

Contributor Author

We rely on arbitrary_contract being deterministic, as we use it in our artifact stability test here:

fn test_wasmer2_artifact_output_stability() {

rs_contract may grow new imports over time.

Contributor

That explains why we are doing it. But doesn't it change the semantics of arbitrary_contract to an extent that we should rename the function and change the comment on it? (I think we only call it from this test, which relies on contract properties making it non-arbitrary)

Contributor

I’m confused as well. What does ‘arbitrary_contract being deterministic’ mean here? There is no guarantee that base_rs_contract will not be changed in the future.

Contributor Author

The only thing that matters here is the imports of the base_rs_contract, and those are unlikely to change (b/c adding an import is a protocol change).

We could rename it to arbitrary_deterministic_contract, or add a comment explaining how we rely on it being deterministic, but I'd rather not do this. Today, we have a single call-site for this function, and it seems too early to enshrine specific semantics here. Maybe we'd want to just move this function from the common library to that test!

Contributor

Ok, thanks for explaining. Moving it to the test would probably make sense, yeah. But it doesn't make a big difference. The signature of arbitrary_contract, to which I count the name itself, too, is still awkward to me.
But I feel it does not matter that much. I was only worried that we are adding a tiny bit of technical debt here but IMO it's not worth the time spent on further discussions. :)

@PandaRR007

Hi guys, will this feature be included in release 1.27.0? Thanks.

@matklad
Contributor Author

matklad commented May 19, 2022

@PandaRR007 I think it will be!

@matklad matklad requested a review from mina86 May 19, 2022 09:22
@PandaRR007

@PandaRR007 I think it will be!

Good news. I'm looking forward to it.

@mina86
Contributor

mina86 commented May 21, 2022

Hi guys, will this feature be included in release 1.27.0? Thanks.

It most likely won’t. The current policy is that we cut a release a week before the testnet release, which happens next Wednesday. In other words, only things which were in master on the 18th will be included in 1.27.0-rc.1, and no new features come in during the -rc cycle.

@@ -233,10 +232,9 @@ impl ProtocolFeature {
| ProtocolFeature::LimitContractLocals
| ProtocolFeature::ChunkNodesCache
| ProtocolFeature::LowerStorageKeyLimit => 53,
ProtocolFeature::AltBn128 => 54,
Contributor

55, and you’ll also need to increase STABLE_PROTOCOL_VERSION.

Contributor Author

Good catch! I see that now we "jump" over versions 52 and 54, in the sense that these versions won't have any protocol features associated with them. How does this happen? Intuitively it seems that, to make a protocol change, we should have a ProtocolFeature, so that we can apply old logic for old protocol versions.

Contributor

54 is a network layer protocol change when @pompon0 added protobuf support so it doesn’t affect the chain itself. We probably should decouple the two versions at some point. Though with protobufs it might be easier not to worry about network layer protocol version that much.
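
The gating idea discussed in this thread can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual nearcore code: the `enabled` and `host_function_available` helpers are invented for illustration, and only the version numbers 53 and 55 and the feature names come from this discussion.

```rust
// Hypothetical, simplified model of version-gated protocol features.
type ProtocolVersion = u32;

#[derive(Clone, Copy, Debug, PartialEq)]
enum ProtocolFeature {
    LowerStorageKeyLimit,
    AltBn128,
}

impl ProtocolFeature {
    // Version at which each feature turns on (numbers from this thread).
    const fn protocol_version(self) -> ProtocolVersion {
        match self {
            ProtocolFeature::LowerStorageKeyLimit => 53,
            ProtocolFeature::AltBn128 => 55,
        }
    }

    // Illustrative helper: a feature is active from its version onwards.
    fn enabled(self, current: ProtocolVersion) -> bool {
        current >= self.protocol_version()
    }
}

// Old behaviour for old protocol versions, new behaviour once the feature
// is active. Assumption: every other import is always available.
fn host_function_available(name: &str, current: ProtocolVersion) -> bool {
    match name {
        "alt_bn128_g1_multiexp" | "alt_bn128_g1_sum" | "alt_bn128_pairing_check" => {
            ProtocolFeature::AltBn128.enabled(current)
        }
        _ => true,
    }
}
```

In this model a version such as 54, which only changed the network layer, simply has no `ProtocolFeature` variant mapped to it.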

Comment on lines +49 to 55
+# FIXME(#6822): we should just remove the payload logic. 10KiB variant is
+# broken, because the baseline contract is >10KiB (data for alt_bn estimations).

 # 10KiB
-dd if=/dev/urandom of=./res/payload bs=$(expr 10240 - ${bare_wasm}) count=1
+dd if=/dev/urandom of=./res/payload bs=1 count=1
 cargo build --target wasm32-unknown-unknown --release --features "payload$features_with_comma"
 cp target/wasm32-unknown-unknown/release/test_contract.wasm ./res/stable_small_contract.wasm
Contributor

Why FIXME? Can’t we just do it now? If bare_wasm is ≥ 10240 then just cp -- test_contract.wasm stable_small_contract.wasm and we can go on with our lives.

Contributor Author

If bare_wasm is ≥ 10240 then just cp -- test_contract.wasm stable_small_contract.wasm and we can go on with our lives.

The contract will then be bigger than 10KiB, but the current code is written as if it being exactly 10KiB matters.

Ultimately, I suspect that this whole file is mostly dead code at this point, but I'd rather not deal with it during stabilization PR.

Contributor

Yeah, please do not change it in this PR. The estimator uses the sizes. Even though it doesn't rely on them being exact, I would still want to check that, and a stabilization PR is not the right place for such a change anyway.
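
The size arithmetic behind this thread can be sketched as follows, in Rust rather than shell to match the rest of the repository. The `payload_len` helper and `TARGET_SIZE` constant are hypothetical names, and the sketch assumes that the payload is meant to pad the bare contract up to the 10KiB target; the subtraction in the original `dd` line goes negative as soon as the bare contract exceeds that target.

```rust
// 10 KiB target size, as in the `expr 10240 - ${bare_wasm}` line.
const TARGET_SIZE: usize = 10 * 1024;

// Number of padding bytes needed to reach TARGET_SIZE, or None when the
// bare contract is already at least that large and cannot be padded to fit.
// Assumption: a contract of exactly TARGET_SIZE needs no payload either.
fn payload_len(bare_wasm: usize) -> Option<usize> {
    TARGET_SIZE.checked_sub(bare_wasm).filter(|&n| n > 0)
}
```

A caller would pad when this returns `Some(n)` and fall back to copying the bare contract (or failing loudly) on `None`, which is the choice being debated above.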

@jakmeier jakmeier left a comment

Approving to stabilise this, LGTM. I am happy to see this moving forward! Too bad we will have to wait for another cycle, I did not have the 1 week on my radar...

(Second approval is also required although not enforced by GH.)

@mina86
Contributor

mina86 commented May 23, 2022

I did not have the 1 week on my radar...

Yeah, it’s a new policy. What has been happening so far is that on the day of the release we would scramble to make the cut and then test things before pushing testnet release which wasn’t ideal.

@matklad matklad force-pushed the m/stabilize-altbn-128 branch 5 times, most recently from 7bf52a3 to 02a4655 Compare May 24, 2022 12:32
@matklad
Contributor Author

matklad commented May 24, 2022

Test failure is interesting:

[2022-05-24 12:35:35] INFO: Got protocol 53 in mainnet release 1.26.0.
[2022-05-24 12:35:35] INFO: Got protocol 53 in testnet release 1.26.0-rc.3.
[2022-05-24 12:35:35] INFO: Got protocol 55 on master branch.

This is probably a side-effect of our time-based protocol upgrade process?

@mina86
Contributor

mina86 commented May 24, 2022

I was afraid upgradable.py might be an issue. The time-based upgrades aren’t an issue here. The test compares versions that --version outputs. The problem in this instance is that 1.27.0-rc.1 with version 54 hasn’t been released yet and the test doesn’t understand that upcoming 1.27.0-rc.1 will use protocol version 54. I think at this point the easiest solution is to wait till Wednesday evening or Thursday once 1.27.0 rolls out and then the test will see 53 on mainnet, 54 on testnet and will allow 55 on master.

@matklad
Contributor Author

matklad commented May 24, 2022

sgtm!

@matklad matklad force-pushed the m/stabilize-altbn-128 branch 3 times, most recently from 7f310fc to 27e6560 Compare June 1, 2022 17:56
@matklad matklad requested a review from mina86 June 1, 2022 18:10
@matklad
Contributor Author

matklad commented Jun 1, 2022

@mina86 PTAL!

@@ -109,12 +109,31 @@ fn test_alt_bn128_g1_multiexp() {
}

check_ok(&le_bytes![], &le_bytes![0x0 0x0]);
check_ok(
&le_bytes![
0x2d6b17489d86fcd5f91e8e92eb55081d8cb4413e408047249ef4fb5baa1b518b 0x1e4d0a30dbadd9dad40f7847c7013754ded8d0371c052d19f01453f4ae1506d7 0x1,
Contributor

I’m confused by the formatting in this file. Sometimes the data is on a separate line, sometimes the whole thing is a single line. Commas also seem to be used arbitrarily. Furthermore, I’d wrap all the buffers at the space. It’s probably too much noise to change it all though so whatever.

Comment on lines 107 to 125
let prepared_hashes = [
5920482302426237644,
4305202105567340810,
5775536517394665889,
6282866610476321669,
9987754974020503265,
2522443647498253022,
1434775828544411571,
12248437801724644735,
2647244875869025389,
892153519407678490,
8592050243596620350,
2309330154575012917,
9323529151210819831,
11488755771702465226,
];
let mut got_prepared_hashes = Vec::with_capacity(seeds.len());
let compiled_hashes = [
4678798493694903297,
4722680261811640693,
7795642610370765019,
15143423944524767029,
7504125870827587271,
3662584175683490815,
13449186496170384379,
5827744486935367002,
3163481497450515654,
12932669301919595047,
4509630115775888919,
5285162149441033812,
15892844827657184765,
7871022777077203514,
];
Contributor

Looks like we want to change this to use insta as well at some point?

Contributor Author

Yeah, we probably should, though this shouldn't be changing all that often (this change in particular is because I adjusted the infra to be more stable, not because this is a genuine change).
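
A minimal sketch of this kind of stability check, under stated assumptions: `fake_artifact` is a hypothetical stand-in for the real seeded generation and compilation steps, and FNV-1a is swapped in as a hash with a fixed, well-known definition. The actual test may hash its artifacts differently.

```rust
// FNV-1a, 64-bit. Its definition is fixed, so recorded values do not depend
// on the hasher shipped with a particular Rust standard library version.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

// Hypothetical stand-in for a seeded generator such as `arbitrary_contract`:
// the output depends on the seed and on nothing else.
fn fake_artifact(seed: u64) -> Vec<u8> {
    let mut state = seed;
    (0..32)
        .map(|_| {
            // One splitmix64 step; keep only the low byte of the output.
            state = state.wrapping_add(0x9e3779b97f4a7c15);
            let mut z = state;
            z = (z ^ (z >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94d049bb133111eb);
            (z ^ (z >> 31)) as u8
        })
        .collect()
}

// One hash per seed; a stability test compares this list with recorded values.
fn artifact_hashes(seeds: &[u64]) -> Vec<u64> {
    seeds.iter().map(|&s| fnv1a(&fake_artifact(s))).collect()
}
```

The recorded lists only stay valid while every input to the generator is pinned, which is why the import list handed to `arbitrary_contract` matters for this test.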

Extra test cases were generated using go-ethereum implementation of the
curves.
@near-bulldozer near-bulldozer bot merged commit 60d9f4b into master Jun 2, 2022
@near-bulldozer near-bulldozer bot deleted the m/stabilize-altbn-128 branch June 2, 2022 10:27
nikurt pushed a commit that referenced this pull request Jun 2, 2022
# Checklist
- [x] Link to nightly nayduck run: https://nayduck.near.org/#/run/2510
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
@frol
Collaborator

frol commented Jul 28, 2022

DISCLAIMER: I just ask it out of curiosity, so feel free to ignore it if you don't have the time to answer.

@matklad I am quite late to the party, but I am curious whether we measured the performance of these host functions vs Wasm implementation. It sounds quite unfortunate that we need to have host functions to optimize number crunching performance as Wasm by design supposedly should have covered us here.

P.S. It would be helpful to have link(s) to the PRs that implemented this as a nightly feature to see potential discussions there

@matklad
Contributor Author

matklad commented Jul 28, 2022

P.S. It would be helpful to have link(s) to the PRs that implemented this as a nightly feature to see potential discussions there

Good call, #7288

I am quite late to the party, but I am curious whether we measured the performance of these host functions vs Wasm implementation.

There are two questions here:

  • what is the perf gap between native and our particular WebAssembly runtime (optimized for reliability)
  • what is the perf gap between native and a WebAssembly runtime optimized for performance

For 1, I can't recall super-specific numbers, but I think we did measure a massive cost reduction. To get specific numbers, you'd want to play with this test locally before and after this commit:

birchmd/aurora-engine@fd4243b#diff-ed8b4fc612dfece459decfe0a47cf4079f5b3e3b7c29cc1f7c4e3be2b42d9b87

The test proves that the host fn brought the cost under 200TGas. I don't know the exact difference, but my vague recollection is that it was huge.

For 2, we didn't do any measurements, though I'd still expect a significant perf difference there.

Wasm by design supposedly should have covered us here.

My current gut feeling is that our wasm runtime provides non-horrible number crunching perf, but that it is expected to be significantly worse than what you get from a host function.

Reasons specific to our WebAssembly runtime (reliability over perf):

  • we have a simple non-optimizing single-pass compiler
  • gas-counting is non-trivial overhead
  • we don't support certain perf-oriented wasm extensions, e.g. SIMD. Note that even if we did support SIMD, there would be a perf penalty for more complicated gas accounting.
  • with a wasm impl, the cost is pessimistic -- even if a particular hash function runs fast in WebAssembly, we have to pessimistically estimate it (we use the worst-case cost for a WebAssembly instruction). For a host function, we estimate a fixed computation, so we don't need to be pessimistic across this dimension.

Reasons general to WebAssembly:

  • At the moment, WebAssembly generally doesn't expose performance-oriented CPU instructions like fused multiply-add or add-with-carry. My intuition here is that WebAssembly is 2X slower for run-of-the-mill code (pointer chasing, conditions) and 20X slower for really hot code, something you traditionally write in asm (codecs, crypto, interpreters).
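
To make the add-with-carry point concrete, here is a small Rust sketch of multi-limb addition, the kind of inner loop that 256-bit field arithmetic is built from. The `add_256` name and the four-limb layout are illustrative assumptions. On native targets a compiler can typically lower such a carry chain to dedicated add-with-carry instructions, whereas core WebAssembly has no carry flag, so each limb costs extra comparisons and additions.

```rust
// Adds two 256-bit integers stored as four little-endian u64 limbs.
// Returns the wrapped sum and the final carry-out.
fn add_256(a: [u64; 4], b: [u64; 4]) -> ([u64; 4], bool) {
    let mut out = [0u64; 4];
    let mut carry = false;
    for i in 0..4 {
        // Add the limbs, then fold in the carry from the previous limb.
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry as u64);
        out[i] = s2;
        // At most one of the two additions can overflow for a given limb.
        carry = c1 || c2;
    }
    (out, carry)
}
```

For example, adding 1 to `[u64::MAX, 0, 0, 0]` carries into the second limb and yields `[0, 1, 0, 0]` with no carry-out.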

@akhi3030
Collaborator

As a related data point, in https://gov.near.org/t/near-polkadot-using-ibc-trustless-bridging-requests/22807/5, @blasrodri benchmarked the performance difference between wasm and native execution for some signature verification. More specifically they showed that native execution can be much faster.

@robert-zaremba
Contributor

alt_bn128 is far from being fast (in 2022), and there are some security concerns.

In the meantime, most projects have opted for BLS12-381. Now, I think the most exciting option is Pasta (halo2). It would be great to consider them as well.
