fix: Improve serialization for prime fields #85

huitseeker · 2023-08-31T17:49:30Z

Summary

256-bit field elements are currently serialized as 4 x u64, i.e. the native in-memory format, which isn't a common interchange format.
This implements the standard byte serialization for those fields (corresponding to PrimeField::{to,from}_repr), and a hex-encoded variant of those bytes for (de)serializers that are human-readable (e.g. JSON) and hence typically not raw-byte-friendly.

Hex is a straightforward, common and very simple choice for string formats that's harder to mess up than Base64.
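To make the difference between the two formats concrete, here is a minimal, standard-library-only sketch. The helper names (`limbs_to_le_bytes`, `to_hex`, `from_hex`) are hypothetical and are not part of this PR, which relies on `to_repr`/`from_repr` and the `hex` crate; the sketch also ignores any internal (e.g. Montgomery) form of the limbs.

```rust
/// Hypothetical helper: flatten four u64 limbs (least significant limb first)
/// into 32 bytes, least significant byte first.
fn limbs_to_le_bytes(limbs: [u64; 4]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, limb) in limbs.iter().enumerate() {
        out[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_le_bytes());
    }
    out
}

/// Lowercase hex encoding, two characters per byte.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Inverse of `to_hex`. Returns `None` on odd length or any non-hex character.
fn from_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Safe to slice by byte index: the string is pure ASCII at this point.
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}
```

A JSON document would then carry one 64-character string per field element instead of an array of four integers.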

Details

Added a new macro serialize_deserialize_32_byte_primefield! for custom serialization and deserialization of 32-byte prime field elements in several structs (Fq, Fp, Fr) across the secp256r1, bn256, and secp256k1 modules.
Applied the new macro to those structs, replacing the previous direct derive of serde::{Deserialize, Serialize}.
Enhanced error checking in the custom deserialization methods so that only valid field elements are accepted.
Updated the test function in tests/field.rs to include JSON serialization and deserialization tests.
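The validity check mentioned above amounts to rejecting byte strings that do not encode a reduced element. The following is a rough sketch only: `is_canonical_le` and `decode_checked` are hypothetical names, the comparison is not constant-time, and in the actual code the check is delegated to `from_repr` (which in the `ff` crate returns a `CtOption`).

```rust
/// Returns true if `candidate`, read as a little-endian integer,
/// is strictly smaller than `modulus` (also little-endian).
fn is_canonical_le(candidate: &[u8; 32], modulus: &[u8; 32]) -> bool {
    // Compare starting from the most significant byte (index 31).
    for i in (0..32).rev() {
        if candidate[i] != modulus[i] {
            return candidate[i] < modulus[i];
        }
    }
    // Equal to the modulus: not a reduced element.
    false
}

/// Hypothetical deserialization step: accept the bytes only if they
/// encode a value in the range [0, modulus).
fn decode_checked(candidate: [u8; 32], modulus: &[u8; 32]) -> Result<[u8; 32], String> {
    if is_canonical_le(&candidate, modulus) {
        Ok(candidate)
    } else {
        Err("deserialized bytes are not a valid field element".to_string())
    }
}
```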


han0110

LGTM!

src/derive/field.rs

mratsim · 2023-09-05T07:31:23Z

For fields, we should point to the I2OSP and OS2IP spec as a reference: https://www.rfc-editor.org/rfc/rfc8017.html#section-4

This serializes byte by byte, most significant byte first (i.e. big-endian serialization).
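For readers unfamiliar with the RFC 8017 primitives, here is a small sketch of the two conversions on `u128` values. It is only an illustration of the big-endian convention; the function names mirror the RFC, but the signatures and the `u128` restriction are assumptions made for brevity.

```rust
/// I2OSP: encode `x` as exactly `len` bytes, most significant byte first.
/// Returns `None` when `x` does not fit (the RFC's "integer too large").
fn i2osp(x: u128, len: usize) -> Option<Vec<u8>> {
    if len < 16 && (x >> (8 * len)) != 0 {
        return None;
    }
    let mut out = vec![0u8; len];
    let mut rest = x;
    // Fill from the last byte (least significant) towards the first.
    for slot in out.iter_mut().rev() {
        *slot = (rest & 0xff) as u8;
        rest >>= 8;
    }
    Some(out)
}

/// OS2IP: decode a big-endian byte string into an integer.
/// Returns `None` if the value does not fit in a u128.
fn os2ip(bytes: &[u8]) -> Option<u128> {
    let (head, tail) = bytes.split_at(bytes.len().saturating_sub(16));
    if head.iter().any(|&b| b != 0) {
        return None;
    }
    Some(tail.iter().fold(0u128, |acc, &b| (acc << 8) | b as u128))
}
```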

han0110 · 2023-09-05T08:20:17Z

> This serializes byte by byte, most significant byte first (i.e. big-endian serialization).

I think this makes more sense considering it's for human reading; I didn't realize that the output would be in little-endian. Though pasta_curves has it in little-endian.

huitseeker · 2023-09-05T17:24:38Z

RFC 8017 compliance is spotty at best, but more importantly it really is the domain of the field definition: the PrimeField trait owns the methods (to_repr, from_repr). This PR has the narrower scope of simply reusing this trait, which defines the binary representation, for serialization.
Perhaps remarks on RFC conformance should be placed in the PrimeField implementation?

mratsim · 2023-09-05T19:55:30Z

Looking at upstream: https://github.com/zkcrypto/ff/blob/e853770/ff_derive/src/lib.rs#L51-L103 their purpose doesn't look like serialization, but rather providing a way to dump the physical representation (and not the logical one). Maybe @str4d can confirm.

Looking at zkcrypto/bls12_381, for serialization they use to_bytes:

https://github.com/zkcrypto/bls12_381/blob/7de7b9d/src/g1.rs#L219-L224
and it is big-endian as specified in https://github.com/zkcrypto/bls12_381/blob/7de7b9d/src/notes/serialization.rs

huitseeker · 2023-09-06T00:53:17Z

Looking at zkcrypto/bls12_381, the to_bytes of the Scalar field is little-endian:
https://github.com/zkcrypto/bls12_381/blob/7de7b9d9c509b9973b35a3241b74bbbea95e700a/src/scalar.rs#L282-L296
The correct link for Fp's definition of to_bytes is this:
https://github.com/zkcrypto/bls12_381/blob/7de7b9d9c509b9973b35a3241b74bbbea95e700a/src/fp.rs#L211-L227
and you're right, it's big-endian. That to_bytes method is not in any trait, incidentally, and seems specific to serialization for BLS12-381. But note that to_repr and from_repr proxy those.

The point to me seems to be: to_repr, from_repr encode a specific choice of conversion of prime field elements to a byte string.

You mentioned the ff-derive macros; I would note that these include:

a modular reduction,
requiring the user to specify one compile-time choice of endianness, using the macro attribute PrimeFieldReprEndianness. There is no runtime optionality re: endianness.

In that context, I'm not sure what you mean by "dumping the physical representation".

But the footprint of these derive macros is rather sparse, so perhaps taking a tour through some landmark implementations would help ¹:

pasta: canonical little-endian,
rustcrypto: canonical big-endian, and the linked PR documents why they removed endianness optionality (their to_repr implementations point to the to_bytes I link to),
curve25519-dalek: canonical little-endian, and their to_repr implementation points to the to_bytes I link to,
JubJub: canonical little-endian, and their to_repr implementation points to the to_bytes I link to.

All of this to say: there is a plurality of choices regarding endianness in {to,from}_repr, that's correct. Due to the orphan rule, those choices can only be made by the implementer of the curve, without overrides. Using to_repr in serialization (resp. from_repr in deserialization) offers a way to defer to the curve implementer's choice in that regard, which guarantees compatibility with those implementations, if not, admittedly, RFC 8017 compliance.
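The "defer to the implementer" argument can be sketched as follows. Everything here is hypothetical: `Repr` is a stand-in for the relevant part of `PrimeField`, and `LeField`/`BeField` are toy 8-byte types rather than real curves. The point is only that the generic (de)serialization code never picks an endianness itself.

```rust
use std::convert::TryFrom;

/// Stand-in for the byte-representation part of `PrimeField`.
trait Repr: Sized {
    fn to_repr(&self) -> [u8; 8];
    fn from_repr(bytes: [u8; 8]) -> Option<Self>;
}

/// Toy type whose implementer chose little-endian.
#[derive(Debug, PartialEq)]
struct LeField(u64);

/// Toy type whose implementer chose big-endian.
#[derive(Debug, PartialEq)]
struct BeField(u64);

impl Repr for LeField {
    fn to_repr(&self) -> [u8; 8] { self.0.to_le_bytes() }
    fn from_repr(bytes: [u8; 8]) -> Option<Self> { Some(LeField(u64::from_le_bytes(bytes))) }
}

impl Repr for BeField {
    fn to_repr(&self) -> [u8; 8] { self.0.to_be_bytes() }
    fn from_repr(bytes: [u8; 8]) -> Option<Self> { Some(BeField(u64::from_be_bytes(bytes))) }
}

/// Generic serialization: whatever byte order the type defines is what gets written.
fn serialize<F: Repr>(x: &F) -> Vec<u8> {
    x.to_repr().to_vec()
}

/// Generic deserialization: length is checked here, the rest is up to `from_repr`.
fn deserialize<F: Repr>(bytes: &[u8]) -> Option<F> {
    let arr = <[u8; 8]>::try_from(bytes).ok()?;
    F::from_repr(arr)
}
```

With this shape, changing a curve's endianness is a change to its `to_repr`/`from_repr` only; the serialization layer stays untouched.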

If we want to revisit the endianness choices for the curves defined in this repo, to e.g. align them with RFC 8017, I'm all for giving this consideration. But I think that falls beyond the scope of this PR, which is about making sure we don't "dump the physical representation" (as you say), and use a defined byte representation, as specified by the traits we both implement and use.

¹ By canonical I mean the reduced form of the field element.

mratsim · 2023-09-06T07:51:29Z

To be clear, I'm OK with the PR as is. I'll open an RFC to discuss the standard.

CPerezz

LGTM! I would appreciate a reply to my comment from @huitseeker first.

Also, two things to add.

I did not know about is_human_readable and it's super cool!
Endianness-wise, I'm a big supporter of LE. I kind of agree that the concern with the spec should probably be lifted to ff/group first, and then resolved here. While I understand that the spec is important, it's also important to consider that, as @huitseeker points out, all the traits & libs use LE almost everywhere.

I believe we can open an issue for that, maybe? And discuss it over the next few days?

CPerezz · 2023-09-18T07:06:02Z

src/derive/field.rs

+                if serializer.is_human_readable() {
+                    hex::serde::serialize(bytes, serializer)
+                } else {
+                    bytes.serialize(serializer)
+                }


I'm afraid this `if` will even make performance worse when it comes to serialization.

On the other hand, this never has anything to do with raw serialization (which is usually the one used for performance).
Do I understand correctly that we're using serde as "debugging serialization" then?

huitseeker · 2023-09-18

Serializers usually just implement is_human_readable as a constant, so this `if` is as cheap as can be. Examples:

the default, inlined and often not overridden
bincode
ciborium

The semantics of is_human_readable, as the doc indicates, are not about debugging. It simply articulates the difference between formats that are meant to be human-readable (JSON, YAML) and those that are not (bincode, CBOR, RLP, ...).
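A standard-library-only sketch of this dispatch is shown here. The `Format` trait, the `JsonLike`/`BinaryLike` types and the `Encoded` enum are hypothetical stand-ins for serde's `Serializer` and its output; only the default-`true` behaviour of `is_human_readable` is taken from serde itself.

```rust
/// Stand-in for the relevant part of serde's `Serializer` trait.
trait Format {
    /// Mirrors serde: formats are assumed human-readable unless they say otherwise.
    fn is_human_readable(&self) -> bool {
        true
    }
}

/// Text-oriented format: keeps the default.
struct JsonLike;
impl Format for JsonLike {}

/// Binary format: overrides the default with a constant `false`.
struct BinaryLike;
impl Format for BinaryLike {
    fn is_human_readable(&self) -> bool {
        false
    }
}

/// What ends up on the wire in this sketch.
#[derive(Debug, PartialEq)]
enum Encoded {
    Text(String),
    Raw(Vec<u8>),
}

/// Hex string for human-readable formats, raw bytes otherwise.
fn encode<F: Format>(format: &F, bytes: &[u8; 32]) -> Encoded {
    if format.is_human_readable() {
        Encoded::Text(bytes.iter().map(|b| format!("{:02x}", b)).collect())
    } else {
        Encoded::Raw(bytes.to_vec())
    }
}
```

Because each implementation returns a constant, the branch is typically resolved after inlining and monomorphization, which is the basis for the "as cheap as can be" remark.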

CPerezz · 2023-09-18

> Serializers usually just implement is_human_readable as a constant, so this if is as cheap as can be.

Fair point.


huitseeker force-pushed the field_serialization branch from 3ec8689 to 4217a8a Compare August 31, 2023 17:50

huitseeker mentioned this pull request Sep 1, 2023

Use common serialisation format for Pasta and Grumping curves lurk-lab/arecibo#20

Closed

han0110 self-requested a review September 1, 2023 10:59

han0110 approved these changes Sep 2, 2023

huitseeker force-pushed the field_serialization branch 3 times, most recently from b7c1690 to 7dfe10f Compare September 4, 2023 17:50

fixup! fix: Improve serialization for prime fields

f018c0c

huitseeker force-pushed the field_serialization branch from 7dfe10f to f018c0c Compare September 4, 2023 17:53

mratsim mentioned this pull request Sep 8, 2023

Insert MSM and FFT code and their benchmarks. #86

Merged

huitseeker mentioned this pull request Sep 14, 2023

refactor: (De)Serialization of points using GroupEncoding #88

Merged

han0110 requested a review from CPerezz September 18, 2023 03:23

CPerezz approved these changes Sep 18, 2023

Merge branch 'main' into field_serialization

f3a70f5

CPerezz added this pull request to the merge queue Sep 18, 2023

Merged via the queue into privacy-scaling-explorations:main with commit 8e3a33a Sep 18, 2023
7 checks passed

huitseeker mentioned this pull request Oct 23, 2023

New crate release #96

Closed

CPerezz mentioned this pull request Dec 18, 2023

chore: Bump to 0.5.0 for release #114

Merged
