proposal: serialization format for BLS12-381 points #16

kwantam · 2019-05-26T23:21:30Z

Here's what zcash uses for BLS12-381 point serialization (see the issue where this was decided).

Reproducing here:

Fq elements are encoded in big-endian form. They occupy 48 bytes in this form.

Fq2 elements are encoded in big-endian form, meaning that the Fq element c0 + c1 * u is represented by the Fq element c1 followed by the Fq element c0. This means Fq2 elements occupy 96 bytes in this form.

The group G1 uses Fq elements for coordinates. The group G2 uses Fq2 elements for coordinates.

G1 and G2 elements can be encoded in uncompressed form (the x-coordinate followed by the y-coordinate) or in compressed form (just the x-coordinate). G1 elements occupy 96 bytes in uncompressed form, and 48 bytes in compressed form. G2 elements occupy 192 bytes in uncompressed form, and 96 bytes in compressed form.

The most-significant three bits of a G1 or G2 encoding should be masked away before the coordinate(s) are interpreted. These bits are used to unambiguously represent the underlying element:

The most significant bit, when set, indicates that the point is in compressed form. Otherwise, the point is in uncompressed form.

The second-most significant bit indicates that the point is at infinity. If this bit is set, the remaining bits of the group element's encoding should be set to zero.

The third-most significant bit is set if (and only if) this point is in compressed form and it is not the point at infinity and its y-coordinate is the lexicographically largest of the two associated with the encoded x-coordinate.

The above works, but suffers from a potential issue: it does not differentiate between points on G1 and points on G2.

Admittedly, this might not matter, because it may be that the group is unambiguously determined by the context. For example, if the ciphersuite byte is part of a serialized signature, that suffices to indicate whether the expected point is in G1 or G2.

On the other hand, encoding the group in the serialization format may be more convenient, because it means that the encoding unambiguously indicates the size in bytes of the serialized point. In contrast, the ZCash approach relies on either knowing the group a priori or using the size of the encoding to distinguish between points on G1 and G2. This means that messages encoding points may in some cases require a higher-level preamble or header indicating message length.

Conveniently, it turns out that there are enough spare bits in the above encoding to differentiate between points on G1 and points on G2 in a way that allows the first byte of an encoded point to fully specify the length of the encoding. The reason is that elements of G2 are encoded as c1 || c0 (per the above) with the 3 header bits squirreled away in the 3 MSBs of the encoding of c1; and this leaves 3 MSBs of c0 with which to encode more information.

Here's my concrete proposal.

3 MSBs of byte 0	ZCash semantics	Proposed semantics
000	uncompressed point	same, G1 only
001	invalid	invalid (but see below)
010	uncompressed point at infinity	same, G1 only (but see below)
011	invalid	uncompressed point on G2
100	compressed point, +y	same, G1 only
101	compressed point, -y	same, G1 only
110	compressed point at infinity	same, G1 only
111	invalid	compressed point on G2

When the 3 MSBs are 011 or 111, this is a point on G2, so its serialized length is 192 or 96 bytes, respectively. Otherwise, 0xx is 96 bytes and 1xx is 48 bytes. Thus, the first byte of the serialized point lets you know exactly how many more bytes are to come.

When the point is on G2, the most significant bit of byte 48 should be 1, and the remaining two bits are determined by the table below. Note that compressed/uncompressed is determined in byte 0, so it is not redundantly encoded here.

3 MSBs of byte 48	Proposed semantics for G2 point
000	invalid
001	invalid
010	invalid
011	invalid
100	compressed point on G2, +y; or uncompressed point
101	compressed point on G2, -y
110	point at infinity (but see below)
111	invalid

The 3 MSBs of the encoding of all other residues mod p in any encoded point must be 000. This means that byte 48 of uncompressed points on G1 and bytes 96 and 144 of uncompressed points on G2 must begin 000.

There's one thing about the above that is slightly annoying: why bother sending (at least) 47 null bytes when encoding a point at infinity? Not only does this waste space on the wire, it means (at least in the ZCash spec) that user code must check that those bytes are actually 0 (see above, "the remaining bits ... should be set to zero") and reject malformed points of this sort.

One argument in favor of keeping "long" encodings of the point at infinity is that it means that all encoded points are some multiple of 48 bytes, and thus user code doesn't have to read one byte, figure out how many more to read, and then read that many more. And arguably it's not worthwhile to make things complex just to shorten the encoding of a point that will only very rarely be encoded in the first place!

Perhaps it would be worthwhile to make 001 in the MSBits of byte 0 mean "point at infinity on G2, next 47 bytes must be null," and get rid of the other point at infinity representations for G2. In this case I'd also argue that the "uncompressed point at infinity on G1" encoding should be illegal. But I don't think these are crucial.

The text was updated successfully, but these errors were encountered:

kwantam · 2019-05-27T17:01:05Z

note: updated the above proposal to avoid redundancy in the encoding that could potentially require additional consistency checks

This impl follows the proposal pairingwg/bls_standard#16

kwantam · 2019-06-12T18:51:19Z

when deciding on serialization format, it occurs that we should also decide whether to require serialized points to be in the prime-order subgroup or not. This makes deserialization more expensive (need to check that the point has the correct order).

If strongly unforgeable signatures are required, then probably the signature has to be checked for subgroup membership (otherwise---I think---one can obtain a new signature by adding a point of small order to an existing signature). But I have to admit I don't remember whether this is true (I suppose checking is easy...), and I just plain don't know whether strong unforgeability is a requirement here.

Probably there's also some attack that involves adding a point of small order to an existing public key. But intuitively it seems like proof of possession should help. Again, I need to think more carefully about this.

In any case, all of this thinking should happen before deciding on whether or not to require serialized points to be in the prime-order subgroup :)

kwantam · 2019-06-23T22:01:21Z

Someone asked in the last meeting whether this can be made backwards compatible with the ZCash format, and the answer is approximately "yes."

The short-short-short version is, given a serialized point in the ZCash format, as long as its length is known a priori (which is already a requirement for deserializing such points, because that's the only way to tell a G1 from a G2 point), it's very simple to distinguish it from a point in the format described above, and thereby unambiguously decode it.

In more detail, here's how "compatible decoding" would work. Rows in the following table correspond to the 3 MSBs of the serialized point, and columns correspond to an optional length argument supplied to the deserialization procedure.

tag	length == `None`	length == 48	length == 96	length == 192
000	uncompressed G1	invalid	uncompressed G1	uncompressed G2
001	invalid	invalid	invalid	invalid
010	uncompressed G1 ∞	invalid	uncompressed G1 ∞	uncompressed G2 ∞
011	uncompressed G2	invalid	invalid	uncompressed G2
100	compressed G1, +y	compressed G1, +y	compressed G2, +y	invalid
101	compressed G1, -y	compressed G1, -y	compressed G2, -y	invalid
110	compressed G1 ∞	compressed G1 ∞	compressed G2 ∞	invalid
111	compressed G2	invalid	compressed G2	invalid

Decoding according to the above table simultaneously supports both the ZCash and the extended serialization format.

Note that it's not obvious to me that this should be done; the point is only to establish that it can be done :)

ebfull · 2019-07-23T19:20:11Z

My current position leans toward "don't change the serialization" mainly because not only has the existing format been deployed in Zcash, but it has also been deployed in protocols for public resource like Powers of Tau. I'm not sure it's worth it really.

kwantam · 2019-08-02T14:13:59Z

Probably we won't specify serialization formats in this draft, which essentially moots this proposal.

Beyond that, @hoeteck and @sergeynog point out (out-of-band) that the ciphersuite will be necessary when validating signatures, and knowing the ciphersuite is sufficient to disambiguate the serialized point (because csuite will encode which curve sigs live on). This means we can live with the low-level ambiguity.

So: closing this.

kwantam added a commit to kwantam/bls_sigs_ref-archive that referenced this issue May 27, 2019

sage: serialization / deserialization

20ce24c

This impl follows the proposal pairingwg/bls_standard#16

kwantam mentioned this issue Aug 2, 2019

string-to-curve should be injective (for valid inputs) #12

Closed

kwantam closed this as completed Aug 2, 2019

hwwhww mentioned this issue Dec 4, 2020

Proposal: BLS serialization tests ethereum/consensus-spec-tests#24

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: serialization format for BLS12-381 points #16

proposal: serialization format for BLS12-381 points #16

kwantam commented May 26, 2019 •

edited

Loading

kwantam commented May 27, 2019

kwantam commented Jun 12, 2019 •

edited

Loading

kwantam commented Jun 23, 2019 •

edited

Loading

ebfull commented Jul 23, 2019

kwantam commented Aug 2, 2019 •

edited

Loading

proposal: serialization format for BLS12-381 points #16

proposal: serialization format for BLS12-381 points #16

Comments

kwantam commented May 26, 2019 • edited Loading

kwantam commented May 27, 2019

kwantam commented Jun 12, 2019 • edited Loading

kwantam commented Jun 23, 2019 • edited Loading

ebfull commented Jul 23, 2019

kwantam commented Aug 2, 2019 • edited Loading

kwantam commented May 26, 2019 •

edited

Loading

kwantam commented Jun 12, 2019 •

edited

Loading

kwantam commented Jun 23, 2019 •

edited

Loading

kwantam commented Aug 2, 2019 •

edited

Loading