Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dag-cbor,dag-json: relax strictness rules for decoders #214

Merged
merged 2 commits into from
May 13, 2022

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented May 10, 2022

Fixes: #196

Current text reflects an idealistic perspective, not a pragmatic one that
accounts for historical data and handling data from encoders that don't quite
follow the rules. Nor do the current statements requiring strictness reflect
the reality of our current decoders, which are lose and don't even yet have
full opt-in strictness modes; even though we note the lack of strictness in
the implementation notes.

@rvagg rvagg requested review from vmx, aschmahmann and a team May 10, 2022 11:15
Copy link
Member

@willscott willscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy about having this as a must on encode and should on decode.

@rvagg
Copy link
Member Author

rvagg commented May 10, 2022

I'm happy about having this as a must on encode and should on decode.

so I was just writing a response to this and getting references to existing code when I noticed that I actually do some decode strictness in JS for dag-cbor:

> dagCBOR.decode(Buffer.from('190064', 'hex'))
Uncaught:
Error: CBOR decode error: integer encoded in more bytes than necessary (strict decode)

When I wrote the new codec stack for dag-cbor I started building in a decode strictness option but only did the easiest bits, with the hopes of getting to the rest later (I still hope to); and I enabled that strictness all the way up to the decode() people use. It's only doing checks on int sizes, but that also applies to lengths and tags. And this has been the dominant JS decoder for nearly 2 years now without anyone reporting problems with that, which is nice.

So the non-strict pieces for the JS codec is:

  • Map key ordering: map entries may be accepted in any order
  • Integer encoding lengths need not be as short as possible
  • Length descriptors of major types 2 through 5 need not be as short as possible
  • The expression of tag 42 need not be as short as possible (0xd82a)
  • Floating point values may be represented as single and half precision

Floating point values are rarely used and we recommend against them; but map key ordering is something we can't afford to break, cbor-gen does it wrong, the Filecoin genesis block has it wrong, 🤷.

So I've pushed another commit that adjusts things somewhat. The decode strictness section now starts with:

Due to the existence and active use of historical data, and the existence and active use of non-conforming encoders, DAG-CBOR decoders may relax strictness requirements by default. A strictness opt-in may be offered for systems where round-trip determinism is a desirable feature and backward compatibility with old, non-strict data is unnecessary.

And I've also added cbor-gen to the implementations list while I'm in there, which lead me to add a recommendation for what to use for both JS and Go.. I'm not comfortable recommending cbor-gen while whyrusleeping/cbor-gen#56 remains unmerged.

@@ -68,12 +68,12 @@ Therefore the DAG-CBOR codec must:

### Decode strictness

DAG-CBOR decoders should not enforce all of the above strictness requirements by default, but may provide an opt-in for systems where round-trip determinism is a desireable feature and backward compatibility with old, non-strict data is unnecessary.
Due to the existence and active use of historical data, and the existence and active use of non-conforming encoders, DAG-CBOR decoders may relax strictness requirements by default. A strictness opt-in may be offered for systems where round-trip determinism is a desirable feature and backward compatibility with old, non-strict data is unnecessary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this wording even more! I was close commenting that I would prefer a "may" over a "should", but then didn't want to block this PR.

Copy link
Contributor

@RangerMauve RangerMauve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The wording and explanation is clear and concise.

rvagg added 2 commits May 13, 2022 18:04
Fixes: #196

Current text reflects an idealistic perspective, not a pragmatic one that
accounts for historical data and handling data from encoders that don't quite
follow the rules. Nor do the current statements requiring strictness reflect
the reality of our current decoders, which are lose and don't even yet have
full opt-in strictness modes; even though we note the lack of strictness in
the implementation notes.
@rvagg rvagg merged commit 756520d into master May 13, 2022
@rvagg rvagg deleted the rvagg/strictness-nope branch May 13, 2022 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Codec specs should clarify that decoders shouldn't require strictness
4 participants