New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve base58 crate #2481
Improve base58 crate #2481
Conversation
Pull Request Test Coverage Report for Build 8334091085Details
💛 - Coveralls |
e9d0618
to
3cef6a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, light review.
/// Legacy address is too long. | ||
LegacyAddressTooLong(LegacyAddressTooLongError), | ||
/// Invalid base58 payload data length for legacy address. | ||
InvalidBase58PayloadLength(InvalidBase58PayloadLengthError), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not treat too long and InvalidBase58PayloadLenght
the same? They are both invalid lengths just in different layers. It'd be nicer to merge them into a simple InvalidLength
if we can compute one from the other (which I believe we can) and only present the outer one to the user. (The user is only concerned with number of chars, not decoded bytes.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning, hand wavey response.
Without commenting too much on this specific case, I find complex errors annoying to work with. Error handling is proving to be hard enough as it is using simple types, every time we have more complex errors I end up having to spend ages working out if they conform to all the million nuanced rules we have, this is compounded when errors get used in more places - then when they break the rules it takes much longer to detangle them than if there are just many types. One of the reasons this PR adds multiple errors that are the same "shape" (same fields). I have the gut feeling that forward compatibility will be harder the more we reuse errors in different places also. I think the error code should all be brain dead simple and ridiculously uniform if we are to have any hope of limiting mistakes. Its a moving target still, unfortunately, I can do error stuff mechanically in the afternoon if its brain dead easy - if its complex it means I have to do it earlier in the day when I have more brain power and it also needs more thought when reviewing. Long wall of text, sorry, I've spent a lot of time writing error stuff lately :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree -- and to the extent possible, as a user I like when every error is only constructible from one place. Then I can trace backward from the rust-bitcoin source to figure out where the error actually came from (after grepping to reverse the error message into an error variant).
While conceptually "invalid length" always means that the string was somehow an invalid length, it's useful to know which layer of the decoding complained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tcharding is it possible that you're confusing this with an objection against same-shaped errors? Because I'm not objecting to that. Same-shaped errors with different semantics are fine. Differently-shaped errors with the same semantics are not really that great.
I know it can be annoying but really, programming is hard anyway. If we merge the errors in thee library we can save downstream users from verbose matching. I tend to prefer making libraries internally complex if it can make downstream crates less complex because there are multiple downstream projects.
@apoelstra by "users" here I mean the people who are not programmers. I see no reason why they should concern themselves with layers. If you need to debug things we could make the debug representation print out the source or we could even add optional backtrace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apoelstra by "users" here I mean the people who are not programmers
You've defined "user" in a way that can't possibly include me, which isn't fair. I also use software. In basically every software I use when I encounter an error of some sort I dig into the source code.
If there is no source code (e.g. the opaque "Contact Key Verification" error that you get when you try to enable the new iMessage fingerprint verification tool) then generally I behave like a non-technical person by giving up and doing something else that doesn't involve computers.
If you need to debug things we could make the debug representation print out the source or we could even add optional backtrace.
I really like this idea. We could compile out that stuff when debug_assertions
is off.
The backtrace API isn't stable until 1.65 so something to revisit, not related to this PR or anything soon.
BTW this is an example of something we could add where @tcharding's constructors would be unaffected :). They would just need to have #[track_caller]
on and they could conditionally internally generate backtraces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've defined "user" in a way that can't possibly include me, which isn't fair. I also use software. In basically every software I use when I encounter an error of some sort I dig into the source code.
Fair, I do this too. :D
Maybe the error messages can be slightly different such that it doesn't bother users but allows debugging.
We could compile out that stuff when
debug_assertions
is off.
I'd rather use a different feature flag if we want to provide a flag at all. The Backtrace
type already handles not capturing when RUST_BACKTRACE
is not set
The backtrace API isn't stable until 1.65 so something to revisit
We can use cfg
and/or replace it with core::panic::Location
in the meantime (and maybe we should do it anyway because panic location is no_std
while backtrace isn't).
There's also external library for this but we can't make it conditional on Rust version so I'd rather not use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this thread resolved @Kixunil or do you still want to see changes to the introduced error types?
Base58(base58::Error), | ||
/// Base58 decoded data was an invalid length. | ||
InvalidBase58PayloadLength(InvalidBase58PayloadLengthError), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nicer to just call this InvalidLength
and present the number of chars to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can get behind that change though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a bit of a play with doing this but because one error uses expected length and one uses expected length I believe its more clear if left as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand, can you explain more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that was bad English I wrote. Should have been - one uses expected length and one uses "should have been 33 or 34". WIP accepts data length of 33 or 34
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could distinguish those internally with an enum but maybe it's too much. I suggest we at least make the error messages more user-friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the error messages and did not see exactly where you wanted them improved. FTR I'm not confident that all our nested errors stack error messages in a meaningful, non-duplicate way, I've kind of given up for now and am just putting something grep'able. I think we will need to come back pre-1.0 and do some deep investigation like I tried to do in rust-bitcoin/rust-bech32#151
/// Checksum was not correct. | ||
BadChecksum(BadChecksumError), | ||
/// Checked data was too short. | ||
TooShort(TooShortError), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also invalid length so we should then just convert this one into InvalidLength
downstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually you are correct but since I've argued above that we should not have the InvalidLength
then I believe this comment is mute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced. The parsed type has a set of valid lengths. Either the length is in the set or isn't. The only issue is how we represent the sets (although they are statically known for their types).
Thanks for the review @Kixunil.
|
3cef6a5
to
dcb771d
Compare
4463ffc
to
f6b86b3
Compare
Note, I just saw "incorrect" vs "invalid" comment, will change tomorrow. |
f4135a1
to
6221c76
Compare
dce2aab
to
d7ceb25
Compare
I do not claim that this PR makes the errors perfect but IMO it does get the crate into a state that it can be released as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding invalid length, I'm fine with not doing it in this PR.
| TooShort(_) => None, | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this change if the error is reexported anyway? I find this annoying to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure exactly what is annoying you, did you mean commit: 4f6f0bf6 base58: Add error module
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kixunil I guess the point is just code organization. I do a very lax review of these things.
- Check that number of variants in one file is the same of one in another file.
- Check some random method if it is correctly copied or not.
- Check that total lines are the same.
- Check the total diff is small.
While reviewing move-only/formatting commits from contributors whose PRs I have been reviewing for years, I am looking for accidental mistakes not for deliberate/malicious errors. Sure, it is possible that @tcharding's or @apoelstra's account is hacked or they deliberately sneak in a change that causes issues. If the commit from a regular contributor cargo fmt something
or move something
, I tend to a spot sanity check and not use my brain trying to carefully review it.
If these guys wanted to break something, they just push a malicious crate to crates io that is unrelated to github code :) . I do rely on strongly on what commit messages and certainly it is possible that authors sneak in a code logic change along under a move/cargo fmt label under a PR that I ACKed.
d7ceb25
to
6957218
Compare
6957218
to
9561468
Compare
9561468
to
1784b0d
Compare
Rebased and changed |
I have a mild preference for using But this is just a nit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 1784b0d
Yeah, I like that. I'll fix and re-spin tomorrow. |
1784b0d
to
33347ac
Compare
Just the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 33347ac
This whole PR is error handling, totally boring, and can be iterated upon later. However, we need to release this crate already so we can get to the RC release of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 33347ac
#[derive(Debug, Clone, PartialEq, Eq)] | ||
pub struct IncorrectChecksumError { | ||
/// The incorrect checksum. | ||
pub(super) incorrect: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Most of errors I have seen got
and expected
format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to change all the errors across the board and just use got
/expected
, will leave it for another day (or month). Thanks man.
I only reviewed this for correctness and I think the moves the code in forward direction. There are still some unresolved issues about which error variants to emit or whether to combine them or not. I don't have a strong opinion on them and all of those decisions fall well within the acceptable good API boundary and I did not think about what would be the "perfect" solution. |
Awesome, thanks for the review man. |
Frustratingly this needs a rebase now because of #2518, which was basically trivial. |
In preparation for improving the `base58` error types crate an `error` module and move the single current error type there. Make the module public and reexport the type.
The `base58::decode` function can only return a single error type, add a `InvalidCharacterError` struct (leaf error) to use as the return type.
We are currently using the `base58::Error` type to create errors in `bitcoin`, these are bitcoin errors not `base58` errors. Note that we add what looks like duplicate `InvalidBase58PayloadLengthError` types but they are different because of the expected length. This could have been a field but I elected not to do so for two reasons: 1. We will need to do so anyways if we crate smash more 2. The `crypto::key` one can have one of two values 33 or 34. With this applied we can remove the now unused error variants from `base58::Error`.
As is convention here in `rust-bitcoin`, hide the `base58::Error` internals by adding struct error types.
33347ac
to
af49841
Compare
Done! No other changes. |
This is a good example of how something in our set up requires manual intervention when it shouldn't really, there were no conflicts, I just ran (Lazy comment, I did not spend time thinking why.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK af49841
Process idea: when there is an ack and then a force push with minor changes could a single ack from you @apoelstra mean "I ack the changes and I checked the diff with git range-diff since Dev A acked and its still reasonable to think that ack stands"? Especially when you previously acked the same hash so you are running |
Yeah, agreed. Though we should PR such a change separately to CONTRIBUTING.md as a new one-ack carveout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK af49841
| TooShort(_) => None, | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kixunil I guess the point is just code organization. I do a very lax review of these things.
- Check that number of variants in one file is the same of one in another file.
- Check some random method if it is correctly copied or not.
- Check that total lines are the same.
- Check the total diff is small.
While reviewing move-only/formatting commits from contributors whose PRs I have been reviewing for years, I am looking for accidental mistakes not for deliberate/malicious errors. Sure, it is possible that @tcharding's or @apoelstra's account is hacked or they deliberately sneak in a change that causes issues. If the commit from a regular contributor cargo fmt something
or move something
, I tend to a spot sanity check and not use my brain trying to carefully review it.
If these guys wanted to break something, they just push a malicious crate to crates io that is unrelated to github code :) . I do rely on strongly on what commit messages and certainly it is possible that authors sneak in a code logic change along under a move/cargo fmt label under a PR that I ACKed.
@@ -732,11 +735,11 @@ impl FromStr for Address<NetworkUnchecked> { | |||
// If segwit decoding fails, assume its a legacy address. | |||
|
|||
if s.len() > 50 { | |||
return Err(ParseError::Base58(base58::Error::InvalidLength(s.len() * 11 / 15))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch!
Improve the error code in the new
base58
crate.