Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD-0095: Extendable Output (XOF) Hashing Support #95

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ankeleralph
Copy link

This proposal introduces three new concepts to the Solana runtime:

  • Support extendable Output Functions (XOF) hashing, based on cSHAKE
  • Support cSHAKE as a customable
    version of SHAKE
  • Support the STROBE protocol based on cSHAKE

Using the above new concepts would enable regular Solana programs to:

  • Use merlin transcripts, automating the Fiat-Shamir transform for
    zero-knowledge proofs, which turns interactive proofs into non-interactive proofs
  • Use the widely used BulletProofs zero-knowledge library

@ankeleralph ankeleralph changed the title SIMD-XXXX: Extendable Output (XOF) Hashing Support SIMD-0095: Extendable Output (XOF) Hashing Support Dec 14, 2023
@ripatel-fd
Copy link
Contributor

ripatel-fd commented Dec 29, 2023

@ankeleralph Thanks for submitting this SIMD. Somewhat related: The Firedancer team has also been working on a native new cryptographic set accumulator to replace the epoch account hash, but we haven't gotten to publishing a SIMD yet. This accumulator needs a 2048-bit hash function. Conveniently, accounts are already hashed with BLAKE3, which can be trivially expanded to an XOF while calculating the final block of the hash function.

Solana already suffers from a proliferation of too many hash functions, and it costs a lot of time to write optimized implementations for validator clients. We should therefore avoid standardizing two different XOF hash functions.

Without considering the cryptographic quality of SHAKE vs BLAKE3, I would prefer BLAKE3: It is already widely used and native to the Solana runtime. Exposing it to smart contracts is therefore also a much smaller implementation change than introducing a new hash function.

Could you amend this proposal to evaluate BLAKE3 XOF as an alternative? I'm sure there are some arguments against BLAKE3 for your proposed use cases.

Comment on lines +84 to +88
Another alternative is to implement the BulletProof zero-knowledge library as a
native program entirely, however, this would limit the use cases that can
additionally be enabled by supporting the customable extendable output functin
cSHAKE, and the merlin transcripts. Though, supporting a native zero-knowledge
proof library would likely be more efficient.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like introducing syscalls for cSHAKE would only serve to accelerate computation.

When these syscalls are introduced, usually before/after benchmarks are provided.
How much faster would your proposed syscalls be compared to an eBPF implementation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will re-double this message. Adding new syscalls can hurt overall performance for all smart contracts. It would be useful to see an SBPF implementation as that may be sufficient, especially if only one or a handful of apps need this right now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are much more benefits that other developers could have from existing cSHAKE syscalls. A cSHAKE implementation would enable to build customisable output length hash functions, domain separation for different protocol components and potential higher security due to customisation features. Overall it would make it a more versatile and potentially robust choice for some cryptographic operations within the Solana VM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankeleralph I'm comparing against a pure eBPF implementation here, not to other syscalls. A pure eBPF implementation is even more versatile than the proposed syscall because you can optimize and extend the logic (e.g. use different parameters for the sponge function) without a hard fork.

Comment on lines +157 to +160
define_syscall!(fn sol_cshake128(vals: *const u8, val_len: u64, func_name:
*const u8, cust_string: *const u8, hash_result: *mut u8) -> u64);
define_syscall!(fn sol_cshake256(vals: *const u8, val_len: u64, func_name:
*const u8, cust_string: *const u8, hash_result: *mut u8) -> u64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, cshake is only a thin wrapper over SHAKE and Keccak. keccak256 is already exposed via a syscall. Why not separately expose a SHAKE syscall and implement the wrapper in eBPF? This would be more flexible and have small overhead in eBPF.

Although other syscalls don't do this yet, I'd strongly recommend a batch-style API that takes multiple inputs.
On modern x86, the fastest hashing technique for the SHA2/3 and BLAKE2/3 families of hash functions is typically a SIMD approach where 8 or 16 hash states are calculated at once. This is typically 2-4x faster over hashing one message at a time.

Comment on lines +238 to +246
define_syscall!(fn sol_strobe128_ad(...) -> u64);
define_syscall!(fn sol_strobe128_meta_ad(...) -> u64);
define_syscall!(fn sol_strobe128_key(...) -> u64);
define_syscall!(fn sol_strobe128_prf(...) -> u64);

define_syscall!(fn sol_strobe256_ad(...) -> u64);
define_syscall!(fn sol_strobe256_meta_ad(...) -> u64);
define_syscall!(fn sol_strobe256_key(...) -> u64);
define_syscall!(fn sol_strobe256_prf(...) -> u64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that separate syscalls would provide a significant speedup over an eBPF implementation of strobe that uses a batch SHAKE or Keccak syscall as described here: https://github.com/solana-foundation/solana-improvement-documents/pull/95/files#r1438427328

AFAICT, the most expensive operation in the Strobe framework is Keccak/SHAKE hashing. The rest seems to be just byte array concatenation, which is decently fast in eBPF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We need benchmarks to see how long the keccak portion takes vs the other work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its obvious that most performance will be required by the underlying call to the sponge function (in this case cSHAKE or Keccak), as the metadata operations will not require a lot of overhead. Let me know if you want to see some more concrete numbers for benchmarks, happy to quickly do a few to support the adoption of this proposal.

Copy link
Contributor

@ripatel-fd ripatel-fd Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove these metadata operations then? My overall concern as a validator maintainer is the maintenance burden.

It would take me a week to build cross-client testing infrastructure for all of these syscalls. This would also include formal verification (https://saw.galois.com/) of the Rust (Labs/Agave) and C (Firedancer) implementations for equivalence. All this overhead would be unnecessary if implemented in eBPF.

Copy link
Contributor

@lheeger-jump lheeger-jump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the proposal is a bit confusing. It would be useful to know who would actually be the concrete app(s)/user(s) of all of these new syscalls and hashing formulations. Its seems as though the STROBE bits can be implemented in the VM and do not require their own syscalls. Benchmarks will also be necessary. I would recommend also addressing @ripatel-fd's comments.


This proposal introduces three new concepts to the Solana runtime:

- Support extendable Output Functions (XOF) hashing, based on cSHAKE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BLAKE3 has XOF, could we not make the existing syscall for BLAKE3 just support XOF?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding, cSHAKE vs BLAKE3, as @samkim-crypto already mentioned this would require developers to change the proof generation as well, which would create an additional barrier to move the proof verification on-chain.

Comment on lines +84 to +88
Another alternative is to implement the BulletProof zero-knowledge library as a
native program entirely, however, this would limit the use cases that can
additionally be enabled by supporting the customable extendable output functin
cSHAKE, and the merlin transcripts. Though, supporting a native zero-knowledge
proof library would likely be more efficient.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will re-double this message. Adding new syscalls can hurt overall performance for all smart contracts. It would be useful to see an SBPF implementation as that may be sufficient, especially if only one or a handful of apps need this right now.


#### Implementation Details

The Strobe designers released an official implementation in C available at
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unlikely the FD will use this implementation and will roll our own.

Comment on lines +238 to +246
define_syscall!(fn sol_strobe128_ad(...) -> u64);
define_syscall!(fn sol_strobe128_meta_ad(...) -> u64);
define_syscall!(fn sol_strobe128_key(...) -> u64);
define_syscall!(fn sol_strobe128_prf(...) -> u64);

define_syscall!(fn sol_strobe256_ad(...) -> u64);
define_syscall!(fn sol_strobe256_meta_ad(...) -> u64);
define_syscall!(fn sol_strobe256_key(...) -> u64);
define_syscall!(fn sol_strobe256_prf(...) -> u64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We need benchmarks to see how long the keccak portion takes vs the other work.

Comment on lines +238 to +245
define_syscall!(fn sol_strobe128_ad(...) -> u64);
define_syscall!(fn sol_strobe128_meta_ad(...) -> u64);
define_syscall!(fn sol_strobe128_key(...) -> u64);
define_syscall!(fn sol_strobe128_prf(...) -> u64);

define_syscall!(fn sol_strobe256_ad(...) -> u64);
define_syscall!(fn sol_strobe256_meta_ad(...) -> u64);
define_syscall!(fn sol_strobe256_key(...) -> u64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every new syscall adds overhead and hurts performance for the VM vs an SBPF implementation for some specific app. I would recommend finding an approach similar to what @ripatel-fd has mentioned.

Copy link
Contributor

@ripatel-fd ripatel-fd Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concur with @lheeger-jump. The technical reasons for this is the code footprint of a compiled interpreter. Right now, all the interpreter core and all its cheap syscalls probably fit in L2 cache. The more syscalls we add, the more instruction cache pressure is increased across every program execution.

On an FPGA implementation of the VM, this is even worse as you start running into hard physical constraints.

@samkim-crypto
Copy link
Contributor

The main application for this would be the boomerang protocol for Brave (https://arxiv.org/pdf/2401.01353.pdf), which uses Bulletproofs for the zkp. The main components that currently make the bulletproof implementation difficult is the curve operations and the support for hash function needed for Fiat-Shamir. The curve operations can use the curve25519 syscalls, but we currently do not have support for XOF hash support to implement Fiat-Shamir.

In theory, the Fiat-Shamir heuristic can be implemented using any cryptographic hash function including BLAKE3 XOF.
However, many standard zk implementations use SHAKE as it is more standard in academic works. This is the case for bulletproofs as well. Re-implementing bulletproofs/other zk implementations using alternative hash functions would require quite a bit of effort and time for application builders.

Brave expects boomerang to have about 5 million users for the boomerang protocol (@ankeleralph ). Since token22 also uses bulletproofs, this proposal can also benefit future zkp additions to token22 as well. Given that SHAKE is used in many of the newer (non-bulletproof) zk systems for Fiat-Shamir as well, I would be in favor of this SIMD given that the above points are addressed.

@lheeger-jump
Copy link
Contributor

Since token22 also uses bulletproofs, this proposal can also benefit future zkp additions to token22 as well

I think then instead, I would propose a replacement for ZkTokenProgram as these two would be redundant

@samkim-crypto
Copy link
Contributor

Since token22 also uses bulletproofs, this proposal can also benefit future zkp additions to token22 as well

I think then instead, I would propose a replacement for ZkTokenProgram as these two would be redundant

Yes, ZkTokenProgram uses bulletproofs plus additional smaller proofs that are outside the scope of bulletproofs, but if this proposal is implemented, then the instructions in ZkTokenProgram could just be replaced with the curve25519 and XOF hash support.

@ankeleralph
Copy link
Author

Hi @lheeger-jump, @ripatel-fd, to clarify my intended usecase, we are building a protocol (more details in our research paper, or blog post) that uses Bulletproof zero-knowledge proofs, which we would like to verify within a Solana program. Therefore, we would need support for the merlin transcripts, that are based on STROBE, and STROBE is using cSHAKE.
While I agree, that the STROBE functionality can be implemented in a eBPF program, I think it would be nice if Solana would provide an implementation within the VM, as it would simplify other developers to verify the widely used BulletProof zero-knowledge proofs within a Solana program.
Regarding, cSHAKE vs BLAKE3, as @samkim-crypto already mentioned this would require developers to change the proof generation as well, which would create an additional barrier to move the proof verification on-chain.
With regards to benchmarks of the STROBE framework, I think its obvious that most performance will be required by the underlying call to the sponge function (in this case cSHAKE or Keccak), as the metadata operations will not require a lot of overhead. Let me know if you want to see some more concrete numbers for benchmarks, happy to quickly do a few to support the adoption of this proposal.

@ripatel-fd
Copy link
Contributor

@ankeleralph Thank you for the context.

While I agree, that the STROBE functionality can be implemented in a eBPF program, I think it would be nice if Solana would provide an implementation within the VM, as it would simplify other developers to verify the widely used BulletProof zero-knowledge proofs within a Solana program.

Solana provides facilities for sharing eBPF code across different programs either by statically linking it in via Cargo or by deploying a shared program invoked via cross-program-invocation. Would you be open to trying that first?

@lheeger-jump
Copy link
Contributor

Doing an implementation on chain as @ripatel-fd mentioned is a good way to verify the feasibility of the feature.

@sakridge sakridge added the core Standard SIMD with type Core label Feb 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Standard SIMD with type Core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants