Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce maximum length #142

Merged
merged 2 commits into from
Oct 27, 2023

Conversation

tcharding
Copy link
Member

@tcharding tcharding commented Oct 17, 2023

Enforce maximum string length when encoding.

Fix: #140

@tcharding tcharding force-pushed the 10-17-enforce-maximum-length branch 2 times, most recently from c41081c to 44e605e Compare October 17, 2023 04:08
@apoelstra
Copy link
Member

The maximum length needs to be a parameter of the checksum. 93 is only for bech32.

And IMO we should only enforce the maximum length of 93 for segwit ... for the non-segwit API we should use 1023 since that's the actual length of the code.

@tcharding
Copy link
Member Author

Patch separation for the win! I had a feeling there was going to be something to this that I was missing :) Thanks, will re-spin.

@tcharding
Copy link
Member Author

tcharding commented Oct 18, 2023

The maximum length needs to be a parameter of the checksum. 93 is only for bech32.

And IMO we should only enforce the maximum length of 93 for segwit ... for the non-segwit API we should use 1023 since that's the actual length of the code.

Why do you write 93 here? In BIP-173 it says 90, or am I missing something?

A Bech32[2] string is at most 90 characters long and consists of:

https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki

@apoelstra
Copy link
Member

Sorry, I was confusing 93 (which is 1023 / 11, a fact which is relevant when searching for BCH codes) with 90 (an empirically derived limit in BIP 173).

@tcharding
Copy link
Member Author

Deleted that long message to save anyone having to read it. I found my problem

MAY exceed the 90-character limit specified in BIP-0173.

ref: https://github.com/lightning/bolts/blob/master/11-payment-encoding.md

src/segwit.rs Outdated
@@ -364,6 +378,8 @@ pub enum EncodeError {
WitnessVersion(InvalidWitnessVersionError),
/// Invalid witness length.
WitnessLength(WitnessLengthError),
/// Encoding HRP, witver, and program into a bech32 string exceeds limit of 90 characters.
TooLong(EncodedLengthError),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not EncodedLengh here?

Copy link
Member Author

@tcharding tcharding Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were reviewing while I coded so this comment is stale now since I removed the error type all together. FTR the policy we have come up with is to always prefix error types with Error.

Thanks for the review, please see the new version for better handling of the maximum string length.

@tcharding tcharding force-pushed the 10-17-enforce-maximum-length branch 3 times, most recently from b45c4f7 to b728eab Compare October 18, 2023 23:52
@tcharding tcharding marked this pull request as draft October 18, 2023 23:54
@tcharding
Copy link
Member Author

Converted to draft while I push up the changes in the first three patches.

@tcharding tcharding force-pushed the 10-17-enforce-maximum-length branch 4 times, most recently from d1ead65 to 64310d5 Compare October 22, 2023 17:08
@tcharding
Copy link
Member Author

Rebased. Also I pulled a random column width fix out into its own patch because I just mentioned it to @realeinherjar in another PR/repo :)

@tcharding tcharding marked this pull request as ready for review October 22, 2023 17:15
@apoelstra
Copy link
Member

OK, I think this is fine. TBH I think we should just enforce the 1023-character maximum and say "screw LN" because if they're actually using an error-correcting code with no length limit then the resulting code is unable to correct even a single error and cannot detect more than one.

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 65790bf

@tcharding
Copy link
Member Author

I can do that, can you point me to some docs so I can (a) understand the 1023 limit and (b) reference it in the code?

@tcharding
Copy link
Member Author

A real limit is bettor for #138 too, saves looping.

@apoelstra
Copy link
Member

I don't know where it's documented. But it's a fact of the bech32 and bech32m checksums that they have length 1023. sipa's find_bch.py can be used to regenerate the code and it outputs some extra data.

codex32 has length 93; the "long codex32" checksum has length 1023.

I'm not sure about the descriptor checksum, I'll have to check.

@tcharding
Copy link
Member Author

I don't know where it's documented.

I found this at the end of bip-173 (in the "Checksum design" section):

Even though the chosen code performs reasonably well up to 1023 characters, other designs are preferable for lengths above 89 characters (excluding the separator).

I'll add that as a reference for the numbers.

@tcharding tcharding force-pushed the 10-17-enforce-maximum-length branch 2 times, most recently from d244386 to 24f1496 Compare October 24, 2023 00:06
@apoelstra
Copy link
Member

We can't unconditionally limit strings to 1023. The constant needs to be a property of the checksum.

@tcharding
Copy link
Member Author

tcharding commented Oct 24, 2023

Here is your previous comment:

And IMO we should only enforce the maximum length of 93 for segwit ... for the non-segwit API we should use 1023 since that's the actual length of the code.

I went searching for what "since that's the actual length of the code" meant, and all I could find was the docs in bip-173 "Checksum design" section. I went through a bunch of wiki pages on codes and couldn't find any mention of an inherent "length". Its just to do with how many errors can be detected with a certain length checksum for a certain amount of input data, right?

@apoelstra
Copy link
Member

https://en.wikipedia.org/wiki/BCH_code#General_BCH_codes defines length, though pretty obtusely. Here q is 32 for every code this library supports, m is always 2 (though this only affects error correction so maybe some users could define checksums with other values..). So q^m - 1 is 1023, and the length of a BCH code, being the order of an element in a multiplicative group of size 1023, must be 1023 or one of its factors.

A more useful definition is probably the one in https://secretcodex32.com/docs/2023-08-23--math.pdf on page 18:

The length of the code is how long a coded message can be (including the checksum!)
for the code to retain its error-correcting properties.

@tcharding

I don't see how (abstractly not technically). The segwit stuff wants to enforce 90 char limit for Bech32m but lib.rs does not? And the limit is a bip-173 thing not a BCH code thing, right?

Yes, the limit is defined by bip 173 based on empirical error-correcting properties. It is not related to the design length of the code and non-segwit users can (probably) get away without caring about it. Though they will get somewhat worse error correcting properties. It is a requirement of segwit but for non-segwit bech32 uses we shouldn't enforce it, because the code does continue to work up to length 1023.

Also, if its on Checksum folk that implement the trait would assume it is checked in lib.rs functions?

Yes, if we add a LENGTH field to the Checksum trait, people would assume that it's checked in lib.rs. And if people want to override this they can make their own copy of Bech32 where they change the field to usize::MAX or whatever.

@tcharding tcharding force-pushed the 10-17-enforce-maximum-length branch 3 times, most recently from 0a8b2ca to 6885ece Compare October 25, 2023 02:11
@tcharding tcharding mentioned this pull request Oct 25, 2023
@tcharding tcharding force-pushed the 10-17-enforce-maximum-length branch 2 times, most recently from 2a2a703 to 78180aa Compare October 25, 2023 02:24
@tcharding
Copy link
Member Author

I had an epic rebase fail, not totally sure I didn't loose stuff ...

apoelstra added a commit that referenced this pull request Oct 25, 2023
8426386 Use string not address in lib.rs (Tobin C. Harding)
c9aa166 Use NoData instead of MissingWitnessVersion (Tobin C. Harding)
2eac8e6 Fix colum width typo in rustdoc (Tobin C. Harding)

Pull request description:

  This is the first 4 patches from #142, pushing up separately because that PR is quite complicated to review well and these cleanups are unrelated.

ACKs for top commit:
  apoelstra:
    ACK 8426386

Tree-SHA512: 154bce12df068e80f17aa3fa5096390ce43eafd5efb9ed2319e0e26767ebb178b729f268bdb2a76c8a709c867d226d67522999da31dbb6f0bc30eb7106c0be9e
Currently we do not enforce the BIP-173 maximum character length for
bech32 endoded strings.

Add two public functions, both called `encoded_length` that calculate
the length of an encoded bech32 string.

Do not call the functions yet.
@tcharding tcharding force-pushed the 10-17-enforce-maximum-length branch 4 times, most recently from 801febf to 1235674 Compare October 26, 2023 00:19
@tcharding
Copy link
Member Author

Force push is fix a couple of things from yesterday's rebase fail.

@tcharding
Copy link
Member Author

This one turned out to be pretty complicated, thanks for your patience reviewing.

/// The maximum enforced string length of a segwit address.
///
/// The maximum length as specified in BIP-173, this is less than the 1023 character code length.
/// This limit is based on empirical error-correcting properties, see ["Checksum design"] section.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 6ca06b1:

I'd grep for , see. Almost all of the instances should . See :).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! I literally was looking at someone else's code yesterday thinking the opposite. I'm going to go ahead and trust you, your English tends to be better than mine :)

@apoelstra
Copy link
Member

6ca06b1 looks good other than the doc nit.

In a followup PR I think we should change the checksum length check so that we enforce it only on the data part, not the HRP or separator. (It appears that for segwit, the BIP says the 90 limit applies to the whole string.)

But this should be a followup PR because it's a debatable thing and I'd rather it have its own reference for future readers.

BIP-173 states that a bech32 string must not exceed 90 characters but
this is a bip specification, the bech32/bech32m checksum algos can
actually cover more characters than that.

To keep the `segwit` stuff bip compliant and also keep the library
general add an associated `CODE_LENGTH` const to the `Checksum` trait.
Then in the `segwit` module and types enforce the 90 character limit but
in the general API and modules enforce the `Ck::CODE_LENGTH` limit.

FTR in `bech32 v0.9.0` no lengths were not enforced.
@tcharding
Copy link
Member Author

Force push is just the docs thing, no other changes.

@tcharding tcharding mentioned this pull request Oct 26, 2023
@tcharding
Copy link
Member Author

Please see #156, I think this PR may be buggy. Perhaps we should just drop the length stuff all together from the Checksum trait and lib.rs? That would still be an improvement over v0.9 (because it enforces 90 chars) but not a regression because it allows any length strings with lib.rs?

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 12d7b38

@apoelstra
Copy link
Member

Perhaps we should just drop the length stuff all together from the Checksum trait and lib.rs? That would still be an improvement

It would still be an improvement, yes, but it would really be better if you couldn't (easily) use checksums for strings for which the checksums' properties didn't apply.

@apoelstra
Copy link
Member

In any case I would like to move the checksum length discussion into a separate PR, since this one also contains doccomment improvements and segwit stuff and adds the CODE_LENGTH field to the Checksum trait, all of which we definitely want to do.

@apoelstra apoelstra merged commit ac74415 into rust-bitcoin:master Oct 27, 2023
12 checks passed
@tcharding tcharding deleted the 10-17-enforce-maximum-length branch October 27, 2023 01:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bech32 string maximum length is not enforced
3 participants