fix: Improve serialization for prime fields #85

Merged

Conversation

huitseeker
Contributor

@huitseeker huitseeker commented Aug 31, 2023

Summary

256-bit field serialization is currently 4x u64, i.e. the native format, which isn't common.
This implements the standard byte serialization for those fields (corresponding to PrimeField::{to,from}_repr), and a hex-encoded variant of those bytes for (de)serializers that are human-readable (e.g. JSON) and hence typically not raw-byte-friendly.

Hex is a straightforward, common and very simple choice for string formats that's harder to mess up than Base64.
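
As a rough illustration of why hex is hard to get wrong, here is a minimal std-only sketch of a 32-byte hex round trip. The function names to_hex and from_hex_32 are hypothetical and are not taken from this PR, which relies on the hex crate instead.

```rust
/// Encode bytes as lowercase hex, two characters per byte.
/// (Hypothetical helper for illustration; the PR itself uses the `hex` crate.)
fn to_hex(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(DIGITS[(b >> 4) as usize] as char);
        out.push(DIGITS[(b & 0x0f) as usize] as char);
    }
    out
}

/// Decode a hex string into exactly 32 bytes.
/// Rejects inputs of the wrong length and inputs with non-hex characters.
fn from_hex_32(s: &str) -> Result<[u8; 32], String> {
    let raw = s.as_bytes();
    if raw.len() != 64 {
        return Err(format!("expected 64 hex characters, got {}", raw.len()));
    }
    let mut out = [0u8; 32];
    for (i, pair) in raw.chunks(2).enumerate() {
        let hi = (pair[0] as char).to_digit(16).ok_or("invalid hex digit")?;
        let lo = (pair[1] as char).to_digit(16).ok_or("invalid hex digit")?;
        out[i] = ((hi << 4) | lo) as u8;
    }
    Ok(out)
}
```

There is no padding and no alternative alphabet to agree on, which is the main practical difference from Base64.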

Details

  • Added a new macro serialize_deserialize_32_byte_primefield! for custom serialization and deserialization of 32-byte prime field elements in the different structs (Fq, Fp, Fr) across the secp256r1, bn256, and secp256k1 modules.
  • Implemented the new macro for serialization and deserialization in various structs, replacing the previous direct derive of serde::{Deserialize, Serialize}.
  • Enhanced error checking in the custom serialization methods to ensure valid field elements.
  • Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests.
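
To make the "valid field elements" check concrete, here is a minimal sketch of the kind of validation a from_repr-style method performs when deserializing. It assumes a little-endian byte layout. MODULUS is a made-up value, not the modulus of any curve in this repo, and from_repr_le is a hypothetical name. Real implementations typically do this comparison in constant time, which this sketch does not attempt.

```rust
/// Hypothetical modulus as four u64 limbs, least significant limb first.
/// This is NOT the modulus of any real curve; it is for illustration only.
const MODULUS: [u64; 4] = [0xffff_ffff_0000_0001, 0, 0, 0x0fff_ffff_ffff_ffff];

/// Parse 32 little-endian bytes into limbs, rejecting any value >= MODULUS.
/// Returns None for non-canonical (unreduced) encodings.
fn from_repr_le(bytes: [u8; 32]) -> Option<[u64; 4]> {
    let mut limbs = [0u64; 4];
    for (limb, chunk) in limbs.iter_mut().zip(bytes.chunks_exact(8)) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        *limb = u64::from_le_bytes(buf);
    }
    // Compare against the modulus from the most significant limb down.
    for i in (0..4).rev() {
        if limbs[i] < MODULUS[i] {
            return Some(limbs);
        }
        if limbs[i] > MODULUS[i] {
            return None;
        }
    }
    // All limbs equal: the value is the modulus itself, which is not reduced.
    None
}
```

A deserializer built on such a check would map the None case to a deserialization error rather than silently reducing the input.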

See also

#88

Summary: 256-bit field serialization is currently 4x u64, ie. the native format. This implements the standard of byte-serialization (corresponding to the PrimeField::{to,from}_repr), and an hex-encoded variant of
that for (de)serializers that are human-readable (concretely, json).

- Added a new macro `serialize_deserialize_32_byte_primefield!` for custom serialization and deserialization of 32-byte prime field in different struct (Fq, Fp, Fr) across the secp256r, bn256, and derive libraries.
- Implemented the new macro for serialization and deserialization in various structs, replacing the previous `serde::{Deserialize, Serialize}` direct use.
- Enhanced error checking in the custom serialization methods to ensure valid field elements.
- Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests for object integrity checking.
Contributor

@han0110 han0110 left a comment

LGTM!

Review comment on src/derive/field.rs (outdated, resolved)
@huitseeker huitseeker force-pushed the field_serialization branch 3 times, most recently from b7c1690 to 7dfe10f, September 4, 2023 17:50
@mratsim
Contributor

mratsim commented Sep 5, 2023

For fields, we should point to the I2OSP and OS2IP spec as reference: https://www.rfc-editor.org/rfc/rfc8017.html#section-4

This means serializing byte by byte, most significant byte first (i.e. big-endian serialization).
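
For readers unfamiliar with the RFC 8017 primitives, the following is a minimal sketch of what a fixed-length, 32-byte version of the two conversions could look like for a value stored as four u64 limbs. The limb layout (least significant limb first) and the function names i2osp_32 and os2ip_32 are assumptions made for illustration; they are not code from this repository.

```rust
/// Big-endian (most significant byte first) encoding of a 256-bit integer.
/// Assumes `limbs` stores the least significant 64 bits at index 0.
fn i2osp_32(limbs: [u64; 4]) -> [u8; 32] {
    let mut out = [0u8; 32];
    // Walk limbs from most to least significant, writing each one big-endian.
    for (i, limb) in limbs.iter().rev().enumerate() {
        out[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_be_bytes());
    }
    out
}

/// Inverse of `i2osp_32`: read 32 big-endian bytes back into limbs.
fn os2ip_32(bytes: [u8; 32]) -> [u64; 4] {
    let mut limbs = [0u64; 4];
    for (i, chunk) in bytes.chunks_exact(8).enumerate() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // The first 8 bytes are the most significant limb.
        limbs[3 - i] = u64::from_be_bytes(buf);
    }
    limbs
}
```

With this layout the integer 1 encodes as 31 zero bytes followed by 0x01, which is also how it would read when printed as hex.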

@han0110
Contributor

han0110 commented Sep 5, 2023

This means serializing byte by byte, most significant byte first (i.e. big-endian serialization).

I think this makes more sense considering it's for human reading; I didn't realize that the output will be in little-endian. Though pasta_curves has it in little-endian.

@huitseeker
Contributor Author

huitseeker commented Sep 5, 2023

RFC 8017 compliance is spotty at best, but more importantly it really is the domain of the field definition, which, through the PrimeField trait, owns the methods (to_repr, from_repr). This PR deals with the narrower scope of simply reusing this trait, which defines the binary representation, for serialization.
Perhaps remarks on RFC conformance should be placed in the PrimeField implementation?

@mratsim
Contributor

mratsim commented Sep 5, 2023

Looking at upstream: https://github.com/zkcrypto/ff/blob/e853770/ff_derive/src/lib.rs#L51-L103 these methods don't actually look like they are meant for serialization, but rather as a way to dump the physical representation (and not the logical one). Maybe @str4d can confirm.

Looking at zkcrypto/bls12_381, for serialization they use to_bytes:

@huitseeker
Contributor Author

huitseeker commented Sep 6, 2023

Looking at zkcrypto/bls12_381, the to_bytes of the Scalar field is little-endian:
https://github.com/zkcrypto/bls12_381/blob/7de7b9d9c509b9973b35a3241b74bbbea95e700a/src/scalar.rs#L282-L296
The correct link for Fp's definition of to_bytes is this:
https://github.com/zkcrypto/bls12_381/blob/7de7b9d9c509b9973b35a3241b74bbbea95e700a/src/fp.rs#L211-L227
and you're right, it's big-endian. Incidentally, that to_bytes method is not part of any trait and seems specific to serialization for BLS12-381. But note that to_repr and from_repr proxy those.

The point to me seems to be: to_repr, from_repr encode a specific choice of conversion of prime field elements to a byte string.

You mentioned the ff-derive macros, I would note that this includes:

  • a modular reduction,
  • requiring the user to specify one compile-time choice of endianness, using the macro attribute PrimeFieldReprEndianness. There is no runtime optionality re: endianness.

In that context, I'm not sure what you mean by "dumping the physical representation".

But the footprint of these derive macros is rather sparse, so perhaps taking a tour through some landmark implementations would help [1]:

  • pasta: canonical little-endian,
  • rustcrypto: canonical big-endian, and the linked PR documents why they removed endianness optionality (their to_repr implementations point to the to_bytes I link to),
  • curve25519-dalek: canonical little-endian, and their to_repr implementation points to the to_bytes I link to,
  • JubJub: canonical little-endian, and their to_repr implementation points to the to_bytes I link to.

All of this to say: there is a plurality of choices regarding endianness in {to,from}_repr, that's correct. Due to the orphan rule, those choices can only be made by the implementer of the curve, without overrides. Using to_repr in serialization (resp. from_repr in deserialization) offers a way to defer to the curve implementer's choice in that regard, which guarantees compatibility with those implementations, if not, admittedly, RFC 8017 compliance.

If we want to revisit the endianness choices for the curves defined in this repo, to e.g. align them with RFC 8017, I'm all for giving this consideration. But I think that falls beyond the scope of this PR, which is about making sure we don't "dump the physical representation" (as you say), and use a defined byte representation, as specified by the traits we both implement and use.
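
The argument about deferring to the curve implementer can be sketched as follows. Repr32 is a hypothetical stand-in for the relevant part of PrimeField, and LeField and BeField are made-up types that differ only in the endianness their author picked. None of these names exist in this repo or in ff.

```rust
/// Hypothetical stand-in for the `to_repr` part of a prime field trait.
trait Repr32 {
    fn to_repr(&self) -> [u8; 32];
}

/// Two made-up field types; both store limbs least significant first,
/// but their implementers chose different byte orders for `to_repr`.
struct LeField([u64; 4]);
struct BeField([u64; 4]);

impl Repr32 for LeField {
    fn to_repr(&self) -> [u8; 32] {
        let mut out = [0u8; 32];
        for (i, limb) in self.0.iter().enumerate() {
            out[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_le_bytes());
        }
        out
    }
}

impl Repr32 for BeField {
    fn to_repr(&self) -> [u8; 32] {
        let mut out = [0u8; 32];
        for (i, limb) in self.0.iter().rev().enumerate() {
            out[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_be_bytes());
        }
        out
    }
}

/// Generic serialization never chooses an endianness itself:
/// it emits whatever bytes the field's own `to_repr` defines.
fn serialize_bytes<F: Repr32>(x: &F) -> Vec<u8> {
    x.to_repr().to_vec()
}
```

Under this design, changing a curve's byte order is a change to its to_repr implementation, and the serialization layer follows automatically.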

Footnotes

  1. by canonical I mean the reduced form of the field element

@mratsim
Contributor

mratsim commented Sep 6, 2023

To be clear, I'm OK with the PR as is. I'll open an RFC to discuss the standard.

Member

@CPerezz CPerezz left a comment

LGTM! I would appreciate a reply to my comment from @huitseeker first.

Also, two things to add.

  • I did not know about is_human_readable and it's super cool!
  • Endianness-wise, I'm a big supporter of LE. I kinda agree that the concern with the spec should probably be lifted to ff/group first, and then resolved here. While I understand that the spec is important, it's also important to consider that, as @huitseeker said, the traits & libs use LE almost everywhere.

I believe we can open an issue for that, maybe, and discuss it over the next few days?

Comment on lines +699 to +703
if serializer.is_human_readable() {
    hex::serde::serialize(bytes, serializer)
} else {
    bytes.serialize(serializer)
}
Member

I'm afraid this if will even make serialization performance worse.

On the other hand, this never has anything to do with raw serialization (which is what is usually used for performance).
Do I understand correctly that we're then using serde as "debugging serialization"?

Contributor Author

Serializers usually just implement is_human_readable as a constant, so this if is as cheap as can be. Examples:

The semantics of is_human_readable, as the doc indicates, is not debugging. It's simply articulating the difference between formats that are meant to be human-readable (JSON, YAML), and those that are not (bincode, CBOR, RLP, ...).
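
To illustrate why the branch is cheap, here is a toy model of the pattern. ToySerializer, JsonLike, BinaryLike and encode are hypothetical names. This is not the real serde::Serializer trait, only a std-only sketch of the same shape, assuming each format returns a constant from is_human_readable.

```rust
/// Toy model of a serializer; NOT the real `serde::Serializer` trait.
trait ToySerializer {
    fn is_human_readable(&self) -> bool;
}

/// Stand-ins for a text format and a binary format.
struct JsonLike;
struct BinaryLike;

impl ToySerializer for JsonLike {
    #[inline]
    fn is_human_readable(&self) -> bool {
        true // a constant for this format
    }
}

impl ToySerializer for BinaryLike {
    #[inline]
    fn is_human_readable(&self) -> bool {
        false // a constant for this format
    }
}

/// Output of the toy encoding: hex text or raw bytes.
#[derive(Debug, PartialEq)]
enum Encoded {
    Text(String),
    Bytes(Vec<u8>),
}

/// Same shape as the branch under review: hex for human-readable formats,
/// raw bytes otherwise. `encode` is monomorphized per serializer type.
fn encode<S: ToySerializer>(s: &S, bytes: &[u8; 32]) -> Encoded {
    if s.is_human_readable() {
        Encoded::Text(bytes.iter().map(|b| format!("{:02x}", b)).collect())
    } else {
        Encoded::Bytes(bytes.to_vec())
    }
}
```

Because the function is generic over the serializer type and the method returns a constant, the optimizer can usually resolve the condition at compile time for each instantiation.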

Member

Serializers usually just implement is_human_readable as a constant, so this if is as cheap as can be.

Fair point.

@CPerezz CPerezz added this pull request to the merge queue Sep 18, 2023
Merged via the queue into privacy-scaling-explorations:main with commit 8e3a33a Sep 18, 2023
7 checks passed
jonathanpwang added a commit to axiom-crypto/halo2curves that referenced this pull request Sep 23, 2023
* Add field conversion to/from `[u64;4]` (privacy-scaling-explorations#80)

* feat: add field conversion to/from `[u64;4]`

* Added conversion tests
* Added `montgomery_reduce_short` for no-asm
* For bn256, uses assembly conversion when asm feature is on

* fix: remove conflict for asm

* chore: bump rust-toolchain to 1.67.0

* Compute Legendre symbol for `hash_to_curve` (privacy-scaling-explorations#77)

* Add `Legendre` trait and macro

 - Add Legendre macro with norm and legendre symbol computation
 - Add macro for automatic implementation in prime fields

* Add legendre macro call for prime fields

* Remove unused imports

* Remove leftover

* Add `is_quadratic_non_residue` for hash_to_curve

* Add `legendre` function

* Compute modulus separately

* Substitute division for shift

* Update modulus computation

* Add quadratic residue check func

* Add quadratic residue tests

* Add hash_to_curve bench

* Implement Legendre trait for all curves

* Move misplaced comment

* Add all curves to hash bench

* fix: add suggestion for legendre_exp

* fix: imports after rebase

* Add simplified SWU method (privacy-scaling-explorations#81)

* Fix broken link

* Add simple SWU algorithm

* Add simplified SWU hash_to_curve for secp256r1

* add: sswu z reference

* update MAP_ID identifier

Co-authored-by: Han <tinghan0110@gmail.com>

---------

Co-authored-by: Han <tinghan0110@gmail.com>

* Bring back curve algorithms for `a = 0` (privacy-scaling-explorations#82)

* refactor: bring back curve algorithms for `a = 0`

* fix: clippy warning

* fix: Improve serialization for prime fields (privacy-scaling-explorations#85)

* fix: Improve serialization for prime fields

Summary: 256-bit field serialization is currently 4x u64, ie. the native format. This implements the standard of byte-serialization (corresponding to the PrimeField::{to,from}_repr), and an hex-encoded variant of
that for (de)serializers that are human-readable (concretely, json).

- Added a new macro `serialize_deserialize_32_byte_primefield!` for custom serialization and deserialization of 32-byte prime field in different struct (Fq, Fp, Fr) across the secp256r, bn256, and derive libraries.
- Implemented the new macro for serialization and deserialization in various structs, replacing the previous `serde::{Deserialize, Serialize}` direct use.
- Enhanced error checking in the custom serialization methods to ensure valid field elements.
- Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests for object integrity checking.

* fixup! fix: Improve serialization for prime fields

---------

Co-authored-by: Carlos Pérez <37264926+CPerezz@users.noreply.github.com>

* refactor: (De)Serialization of points using `GroupEncoding` (privacy-scaling-explorations#88)

* refactor: implement (De)Serialization of points using the `GroupEncoding` trait

- Updated curve point (de)serialization logic from the internal representation to the
  representation offered by the implementation of the `GroupEncoding` trait.

* fix: add explicit json serde tests

* Insert MSM and FFT code and their benchmarks. (privacy-scaling-explorations#86)

* Insert MSM and FFT code and their benchmarks.

Resolves taikoxyz/zkevm-circuits#150.

* feedback

* Add instructions

* feeback

* Implement feedback:  Actually supply the correct arguments to `best_multiexp`.

Split into `singlecore` and `multicore` benchmarks so Criterion's result
caching and comparison over multiple runs makes sense.

Rewrite point and scalar generation.

* Use slicing and parallelism to to decrease running time.

Laptop measurements:
k=22: 109 sec
k=16:   1 sec

* Refactor msm

* Refactor fft

* Update module comments

* Fix formatting

* Implement suggestion for fixing CI

---------

Co-authored-by: David Nevado <davidnevadoc@users.noreply.github.com>
Co-authored-by: Han <tinghan0110@gmail.com>
Co-authored-by: François Garillot <4142+huitseeker@users.noreply.github.com>
Co-authored-by: Carlos Pérez <37264926+CPerezz@users.noreply.github.com>
Co-authored-by: einar-taiko <126954546+einar-taiko@users.noreply.github.com>
@huitseeker huitseeker mentioned this pull request Oct 23, 2023
jonathanpwang added a commit to axiom-crypto/halo2curves that referenced this pull request Nov 13, 2023
* Add field conversion to/from `[u64;4]` (privacy-scaling-explorations#80)

* feat: add field conversion to/from `[u64;4]`

* Added conversion tests
* Added `montgomery_reduce_short` for no-asm
* For bn256, uses assembly conversion when asm feature is on

* fix: remove conflict for asm

* chore: bump rust-toolchain to 1.67.0

* Compute Legendre symbol for `hash_to_curve` (privacy-scaling-explorations#77)

* Add `Legendre` trait and macro

 - Add Legendre macro with norm and legendre symbol computation
 - Add macro for automatic implementation in prime fields

* Add legendre macro call for prime fields

* Remove unused imports

* Remove leftover

* Add `is_quadratic_non_residue` for hash_to_curve

* Add `legendre` function

* Compute modulus separately

* Substitute division for shift

* Update modulus computation

* Add quadratic residue check func

* Add quadratic residue tests

* Add hash_to_curve bench

* Implement Legendre trait for all curves

* Move misplaced comment

* Add all curves to hash bench

* fix: add suggestion for legendre_exp

* fix: imports after rebase

* Add simplified SWU method (privacy-scaling-explorations#81)

* Fix broken link

* Add simple SWU algorithm

* Add simplified SWU hash_to_curve for secp256r1

* add: sswu z reference

* update MAP_ID identifier

Co-authored-by: Han <tinghan0110@gmail.com>

---------

Co-authored-by: Han <tinghan0110@gmail.com>

* Bring back curve algorithms for `a = 0` (privacy-scaling-explorations#82)

* refactor: bring back curve algorithms for `a = 0`

* fix: clippy warning

* fix: Improve serialization for prime fields (privacy-scaling-explorations#85)

* fix: Improve serialization for prime fields

Summary: 256-bit field serialization is currently 4x u64, ie. the native format. This implements the standard of byte-serialization (corresponding to the PrimeField::{to,from}_repr), and an hex-encoded variant of
that for (de)serializers that are human-readable (concretely, json).

- Added a new macro `serialize_deserialize_32_byte_primefield!` for custom serialization and deserialization of 32-byte prime field in different struct (Fq, Fp, Fr) across the secp256r, bn256, and derive libraries.
- Implemented the new macro for serialization and deserialization in various structs, replacing the previous `serde::{Deserialize, Serialize}` direct use.
- Enhanced error checking in the custom serialization methods to ensure valid field elements.
- Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests for object integrity checking.

* fixup! fix: Improve serialization for prime fields

---------

Co-authored-by: Carlos Pérez <37264926+CPerezz@users.noreply.github.com>

* refactor: (De)Serialization of points using `GroupEncoding` (privacy-scaling-explorations#88)

* refactor: implement (De)Serialization of points using the `GroupEncoding` trait

- Updated curve point (de)serialization logic from the internal representation to the
  representation offered by the implementation of the `GroupEncoding` trait.

* fix: add explicit json serde tests

* Insert MSM and FFT code and their benchmarks. (privacy-scaling-explorations#86)

* Insert MSM and FFT code and their benchmarks.

Resolves taikoxyz/zkevm-circuits#150.

* feedback

* Add instructions

* feeback

* Implement feedback:  Actually supply the correct arguments to `best_multiexp`.

Split into `singlecore` and `multicore` benchmarks so Criterion's result
caching and comparison over multiple runs makes sense.

Rewrite point and scalar generation.

* Use slicing and parallelism to to decrease running time.

Laptop measurements:
k=22: 109 sec
k=16:   1 sec

* Refactor msm

* Refactor fft

* Update module comments

* Fix formatting

* Implement suggestion for fixing CI

* Re-export also mod `pairing` and remove flag `reexport` to alwasy re-export (privacy-scaling-explorations#93)

fix: re-export also mod `pairing` and remove flag `reexport` to alwasy re-export

* fix regression in privacy-scaling-explorations#93 reexport field benches aren't run (privacy-scaling-explorations#94)

fix regression in privacy-scaling-explorations#93, field benches aren't run

* Fast modular inverse - 9.4x acceleration (privacy-scaling-explorations#83)

* Bernstein yang modular multiplicative inverter (#2)

* rename similar to privacy-scaling-explorations#95

---------

Co-authored-by: Aleksei Vambol <77882392+AlekseiVambol@users.noreply.github.com>

* Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration (privacy-scaling-explorations#95)

* Derivatives of the Pornin's method (taikoxyz#3)

* renaming file

* make cargo fmt happy

* clarifications from privacy-scaling-explorations#95 (comment) [skip ci]

* Formatting and slightly changing a comment

---------

Co-authored-by: Aleksei Vambol <77882392+AlekseiVambol@users.noreply.github.com>

* chore: delete bernsteinyang module (replaced by ff_inverse)

* Bump version to 0.4.1

---------

Co-authored-by: David Nevado <davidnevadoc@users.noreply.github.com>
Co-authored-by: Han <tinghan0110@gmail.com>
Co-authored-by: François Garillot <4142+huitseeker@users.noreply.github.com>
Co-authored-by: Carlos Pérez <37264926+CPerezz@users.noreply.github.com>
Co-authored-by: einar-taiko <126954546+einar-taiko@users.noreply.github.com>
Co-authored-by: Mamy Ratsimbazafy <mamy_github@numforge.co>
Co-authored-by: Aleksei Vambol <77882392+AlekseiVambol@users.noreply.github.com>