
Test vectors & benchmark for PCS #222

Merged
merged 14 commits from feature/test-ipa into lurk-lab:dev on Jan 5, 2024

Conversation


@tchataigner tchataigner commented Jan 2, 2024

Goal of this PR

The goal of this PR is to implement test vectors & benchmark for PCS.

Current progress

Implemented two generic test functions that can be used over any <Engine, EvaluationEngine<Engine>> pair, covering both the correct prove/verify flow and the verification of a bad proof.
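A hedged usage sketch of how these helpers get instantiated per scheme (the helper names test_from_seed and test_fail_bad_proof come from the commit list further down; the concrete type paths and the num_vars argument here are illustrative assumptions, not the exact code):

// Hypothetical instantiation for IPA over Grumpkin; MLKZG and ZM over Bn256 follow the same pattern.
#[test]
fn test_ipa_grumpkin_prove_verify() {
  // happy path: seeded random polynomial and point, prove, then verify successfully
  test_from_seed::<GrumpkinEngine, ipa_pc::EvaluationEngine<GrumpkinEngine>>(5);
}

#[test]
fn test_ipa_grumpkin_bad_proof() {
  // failure path: verification against an altered point/eval must return an error
  test_fail_bad_proof::<GrumpkinEngine, ipa_pc::EvaluationEngine<GrumpkinEngine>>(5);
}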

Left TODO

  • Part of the issue is to tackle benchmarks for proof size, proving time, and verification time.

Related issue

This is part of tackling issue #212.

benches/pcs.rs (outdated; resolved)
src/provider/util/mod.rs (outdated; resolved)
src/provider/non_hiding_zeromorph.rs (outdated; resolved)
huitseeker added a commit to huitseeker/arecibo that referenced this pull request Jan 2, 2024
* refactor: removed ROConstantsTrait

- trait refactored to Default in associated type,
- obviates the need for a separate trait,
- refactored call sites and only implementation

* fix: remove unneeded Sized bounds

* fix: rename VanillaRO -> NativeRO
@tchataigner tchataigner force-pushed the feature/test-ipa branch 2 times, most recently from 2ca3654 to fb570c5 on January 3, 2024 at 13:35
@tchataigner tchataigner (Member, Author) commented Jan 3, 2024

I think that I've got the bulk of the logic down.

A sample of what the benches look like can be found here: https://phnmlt.csb.app/

Now, I would like to iterate on what we actually display in the benchmark and how it should be formatted. Currently, the results are divided into two groups, proving and verifying, and they are displayed per power of two used to generate the polynomial we are testing over.

I'm not sure this fully corresponds to what we wanted. I started to look into the Throughput measurements to capture proof size, but I'm not sure how to capture a proof size in a generic manner.
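On capturing proof size generically: one option, sketched below under the assumption that the proof type implements serde::Serialize (so bincode can measure its encoded length; Throughput::Bytes is Criterion's existing API for attaching a byte count to a group), is to report the serialized proof length as the group's throughput:

use criterion::{measurement::WallTime, BenchmarkGroup, Throughput};

// Hypothetical helper: record the serialized size of a proof as the group's
// throughput, so proof size shows up alongside the timing results.
fn record_proof_size<P: serde::Serialize>(group: &mut BenchmarkGroup<'_, WallTime>, proof: &P) {
  let size = bincode::serialize(proof).expect("proof should serialize").len();
  group.throughput(Throughput::Bytes(size as u64));
}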

@tchataigner tchataigner marked this pull request as ready for review January 3, 2024 13:51
@huitseeker huitseeker (Member) left a comment:

This mostly looks great, I left some inline comments for simplification.
A rebase would be great, as it would allow us to see improvements from #216.

benches/recursive-snark.rs (resolved)
Comment on lines 85 to 89
let poly = Self::new(
  std::iter::from_fn(|| Some(Scalar::random(&mut rng)))
    .take(1 << num_vars)
    .collect(),
);
Member:

This should use random to avoid code duplication.
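A minimal sketch of the suggested replacement, assuming random(num_vars, &mut rng) is the signature of the existing constructor being referred to:

// Reuse the existing random constructor instead of rebuilding the coefficient vector by hand.
let poly = Self::random(num_vars, &mut rng);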

benches/pcs.rs Outdated
}
}

fn deterministic_assets<E: Engine, EE: EvaluationEngineTrait<E>, R: CryptoRng + RngCore>(
Member:

The name throws me off: if you have an &mut rng as an argument, are you creating something deterministic?

(I notice you are using this with a seeded rng, but the determinism isn't an invariant upheld by your function, rather by your usage of it)

Member Author:

You're right that the naming here is off. I named it deterministic... because it was a pre-determined polynomial size used to generate the assets. I've changed the implementation to be over BenchAssets and renamed the method to from_num_vars; it should be more explicit this way.

@@ -73,6 +77,27 @@ impl<Scalar: PrimeField> MultilinearPolynomial<Scalar> {
)
}

/// Returns a random polynomial and a point, and calculates its evaluation.
pub fn random_with_eval<R: RngCore + CryptoRng>(
Member:

The use case for generating a random polynomial, a random point, and the corresponding evaluation is pretty narrow. So it may be quite confusing to see this in the public API.

Do we need this somewhere not in benchmark/testing code, or can we just move it to benchmark/testing code where it's needed?
If we need this both in benchmarks and testing, can we just inline this short function?

Member Author:

This was actually an update to my code based on a discussion with @adr1anh, who pointed out that if we had a random we could also have a random_with_eval.

What you're saying about it being a test function is true, and it is also the same for random. I could remove both from the public API and instead inline their code where we have some usage.

Would that be alright with you?

Member:

The use case for random is a bit larger than random_with_eval, so I'd tend to leave random, and move random_with_eval. I think my overall point is that as long as we don't add things to the public API of Arecibo, any decision we take is a 2-way door.

Member Author:

Moved random_with_eval only to eval :)

benches/pcs.rs Outdated

// Macro to generate benchmark code for multiple engine and evaluation engine types
macro_rules! benchmark_engines {
($ell:expr, $rng:expr, $group:expr, $internal_fn:expr, $( ($engine:ty, $eval_engine:ty, $engine_name:expr) ),*) => {
Member:

You can use type_name! to avoid the need for the last parameter
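For reference, std::any::type_name is a function in the standard library (there is no type_name! macro); a hedged sketch of deriving the label from the engine type, where the helper name bench_label is illustrative:

use std::any::type_name;

// Hypothetical helper: derive a Criterion label from the engine type itself,
// so the macro no longer needs an explicit $engine_name argument.
fn bench_label<E>(suffix: &str) -> String {
  // type_name returns the full path (e.g. something like "arecibo::provider::GrumpkinEngine");
  // keeping only the last path segment gives shorter benchmark ids.
  let name = type_name::<E>().rsplit("::").next().unwrap_or("unknown");
  format!("{name}-{suffix}")
}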

benches/pcs.rs Outdated
macro_rules! benchmark_engines {
($ell:expr, $rng:expr, $group:expr, $internal_fn:expr, $( ($engine:ty, $eval_engine:ty, $engine_name:expr) ),*) => {
$(
let mut assets = deterministic_assets::<$engine, $eval_engine, StdRng>($ell, &mut $rng);
Member:

Once you populate the eval_engine, type inference should populate the first parameter. IOW, this should work and remove the need for the engine param:
deterministic_assets::<_, $eval_engine, StdRng>($ell, &mut $rng);


/// Generates a random polynomial and point from a seed to test a proving/verifying flow of one
/// of our EvaluationEngine over a given Engine.
pub(crate) fn prove_verify_from_ell<E: Engine, EE: EvaluationEngineTrait<E>>(ell: usize) {
Member:

This function looks great, though I'm not sure about its name: ell isn't a standard nomenclature.
Here are things that would help me grok what this function is for out of context, and that you could use to edit the comment, the name of this function, or more likely both:

  • make it clear we're testing the prove and verify flow of Multilinear Polynomial Commitment Schemes (PCS),
  • which in Nova are represented by EvaluationEngineTrait implementations for proving,
  • replacing ell by the more evocative num_vars, which becomes clear once the first bullet point has been stated clearly,
  • avoiding the mention of an Engine, which is implied and selected by the choice of EvaluationEngineTrait implementation (a possible renamed signature is sketched below).
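A possible shape for the renamed helper following these points (the name prove_verify_with and the doc comment are a sketch, not the final code; the body stays the same apart from the ell → num_vars rename):

/// Generates a seeded random polynomial and point to test the prove/verify flow of one of our
/// multilinear PCS implementations, i.e. an EvaluationEngineTrait implementation.
pub(crate) fn prove_verify_with<E: Engine, EE: EvaluationEngineTrait<E>>(num_vars: usize) {
  // ... body identical to prove_verify_from_ell, with `ell` renamed to `num_vars` ...
}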

},
)
.unzip();
.zip_eq(offsets_of_x)
Member:

Weird that cargo fmt didn't complain with the original indentation!?

Member Author:

Yes, pretty strange. I kept the update as it seemed OK.

src/spartan/polys/multilinear.rs (resolved)
@huitseeker huitseeker (Member) left a comment:

One issue that only pops off the page now: at large values of ell, the generation of a proof is expensive, as you've no doubt noticed. Nonetheless, the output is completely deterministic.

One bad thing that's obscured by the benchmark_engines macro is that you're generating assets for a specific size (and EE choice), benching the proving, then re-generating those assets for the verification, and finally benching the verification. The assets could and should be shared between proving and verification I think.
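A hedged sketch of that restructuring (from_num_vars is the constructor name adopted earlier in this review; the two internal bench function names are placeholders):

// Build the assets once per (num_vars, EE) pair and hand the same instance to both
// the proving and the verifying benchmark, instead of regenerating them in each group.
let mut assets = BenchAssets::<E, EE>::from_num_vars(num_vars, &mut rng);
bench_pcs_proving_internal(&mut proving_group, &mut assets);
bench_pcs_verifying_internal(&mut verifying_group, &assets);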

benches/pcs.rs Outdated

criterion_main!(pcs);

const TEST_ELL: [usize; 11] = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20];
@huitseeker huitseeker (Member) commented Jan 3, 2024:

I'm not sure we need all points in the [10-20] interval. [10, 12, 14 ,16, 18, 20] seem amply sufficient to me.
/cc @adr1anh

Contributor:

Given that the protocols we are benchmarking are linear in ell, I think it's fair to sample fewer iterations as we should be able to interpolate the results to the odd sizes.


Comment on lines +75 to +119
if evaluate_bad_proof {
  // Generate another point to verify proof. Also produce eval.
  let altered_verifier_point = point
    .iter()
    .map(|s| s.add(<E as Engine>::Scalar::ONE))
    .collect::<Vec<_>>();
  let altered_verifier_eval =
    MultilinearPolynomial::evaluate_with(poly.evaluations(), &altered_verifier_point);
Contributor:

Instead of running the verification twice, you can do something like

// Assumes `eval` is the honest evaluation computed earlier in this helper.
let (point, eval) = if evaluate_bad_proof {
  (altered_verifier_point, altered_verifier_eval)
} else {
  (point, eval)
};

// Verify the proof; verification should fail exactly when the inputs were altered.
let mut verifier_transcript = E::TE::new(b"TestEval");
let res = EE::verify(
  &verifier_key,
  &mut verifier_transcript,
  commitment,
  &point,
  &eval,
  &proof,
);
assert_eq!(res.is_err(), evaluate_bad_proof);

Member Author:

That's true; however, this would also change how the method behaves. Currently, if evaluate_bad_proof is set to true, the method tests both a valid and an invalid proof, while your proposal would make it test only one of the two cases. The downside is that I would then have to call this helper method twice in unit tests, doubling the number of prove calls.

Even after this comment, if you still think I should go with your proposal I'll be happy to do it :)

@storojs72 storojs72 (Member) left a comment:

LGTM!

Obviously, it is hard to make statements about performance without running benchmarks on a variety of ELLs, but I would maybe consider specifying the ELL as a single value via the command line. That could be useful when running a single instance of the benchmark on a developer's machine, to get a very rough idea of whether performance has increased, decreased, or stayed unchanged after some PR is merged.

benches/pcs.rs Outdated
{
let mut proving_group = c.benchmark_group("PCS-Proving");
proving_group
  .sampling_mode(SamplingMode::Flat)
Member:

As far as I understand, Flat is only suitable for very long-running benchmarks. Auto is probably more appropriate for the short-running ones, where ell is, say, 10 / 11 / 12. Does it make sense to split the benchmarks into short- and long-running categories?

@huitseeker huitseeker (Member) commented Jan 4, 2024:

Funny enough: in general the Flat sampling mode is fine for most cases ... except I looked at the sample distribution for this PR, and the standard deviation for ZM verification at ell = 10 doesn't look great. You're probably right @storojs72, good catch!

Though I would note one very simple way of addressing this: note the bench runtimes, which are reasonable (<20s per ell setting; see the bench log on my M1 Mac at https://gist.github.com/huitseeker/015f07534f410e94d615684ecd86876f), re-run the benches without setting the sampling mode (defaulting to Auto), and see if the runtimes are still reasonable.

[Four screenshots of Criterion sample-distribution plots, captured Jan 4, 2024.]

Member Author:

As we have removed all odd sizes, we are only left with 10 and 12 from your example. I'm not sure there is a need to split the code into two cases.

I would either make it Auto for all (which might not be great for the higher-end numbers of variables) or Flat for all (losing some iterations on the smaller numbers of variables, but we should still get enough to have convincing results).

Member:

I'm explicitly suggesting Auto for all, and then revisiting the question if the overall bench runtime is larger than say 15 min.

Member Author:

I don't know why I couldn't see your comment when I put mine up. I'll update to Auto.
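That is, something like the following sketch; since Auto is also Criterion's default, dropping the sampling_mode call entirely is equivalent:

let mut proving_group = c.benchmark_group("PCS-Proving");
proving_group.sampling_mode(SamplingMode::Auto);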

benches/pcs.rs Outdated
)
.unwrap();

BenchAssests {
Member:

BenchAssets?

Member Author:

Assets used during our benchmark that are shared between the proving and verification steps. Useful to avoid lengthy method signatures.

@huitseeker huitseeker (Member), replying to @storojs72's suggestion above about specifying the ELL from the command line:
A way to do this is to have the ELL array (currently a constant) be overridden by an environment variable. We can probably open an issue and tackle this after merging this PR.
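A hedged sketch of what that could look like (the variable name PCS_BENCH_NUM_VARS and the fallback list are illustrative, not an agreed-upon interface):

use std::env;

// Let an environment variable override the constant list of sizes, e.g.
//   PCS_BENCH_NUM_VARS="10,14,20" cargo bench --bench pcs
fn bench_sizes() -> Vec<usize> {
  match env::var("PCS_BENCH_NUM_VARS") {
    Ok(s) => s.split(',').filter_map(|v| v.trim().parse().ok()).collect(),
    Err(_) => vec![10, 12, 14, 16, 18, 20],
  }
}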

test(ipa): proof verified

test(ipa): switch from keccak to grumpkin transcript

test(ipa): wip test IPA prove & verify

chore: added JetBrains config to gitignore
- Implemented test_fail_bad_proof, which generates a test case from a seed and tries to verify an invalid proof.
- Implemented both test_fail_bad_proof & test_from_seed for IPA over Grumpkin, MLKZG over Bn256 and ZM over Bn256
- Created a benchmark for proof generation and verification for IPA over Grumpkin, MLKZG over Bn256 and ZM over Bn256
- Made multilinear.rs public for benchmark purposes
- Refactored generic test methods per @adr1anh's comment
@huitseeker huitseeker (Member) left a comment:

I've re-run the benches on the same machine and the end-to-end runtime stands at 00:16:30, which I deem acceptable.
This looks great @tchataigner, thank you so much!

@tchataigner tchataigner added this pull request to the merge queue Jan 5, 2024
Merged via the queue into lurk-lab:dev with commit ba29a22 Jan 5, 2024
7 checks passed
@tchataigner tchataigner deleted the feature/test-ipa branch January 5, 2024 13:32
huitseeker added commits to huitseeker/arecibo referencing this pull request on Jan 30 and Feb 1, 2024, and the github-merge-queue bot pushed a commit referencing it on Feb 2, 2024, each noting:

See #222, the commit_open_prove_verify test is subsumed by the subsequent unit test.
@tchataigner was right from the start.
Labels: none · Projects: none · Closes issues: none · 4 participants