New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-write the weight/size API #2076
Conversation
FTR I feel I may have been responsible for some of the confusion introduced recently because of my review suggestions being either incorrect or misleading. |
4cf8b56
to
3d69a9b
Compare
bitcoin/src/blockdata/transaction.rs
Outdated
// FIXME: Is this vbyte/byte comment correct, sane, and meaningful? | ||
// For outputs size is equivalent to virtual size since it is the same for legacy and segwit. | ||
// FIXME: Is this unchecked usage correct? | ||
Weight::from_vb_unchecked(self.size() as u64) // Unchecked because size() cannot overflow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment would be simpler as size is equivalent to virtual size since all bytes of a TxOut are non-witness bytes
.
As for the claims of overflow, strictly speaking no. When convirting from vsize to weight there is a *4 multiplier, which could cause an overflow even if size
does not overflow.
IMO we shoud panic here and document that if the script size exceeds 2^62 then it'll panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks, will work in the suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not be better to have all the weight
functions return Option<Weight>
and add weight_unchecked
functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno, this feels like a crappy API given that under normal usage weights will never come within a factor of a million of overflowing.
We could have weight
functions that panic or truncate and weight_checked
ones that don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I solved this by just using saturating add/mull and commenting that its for defensive reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think panicking would be better because if someone hits it it means something is very wrong somewhere else. Finding why the value is u64::MAX
is much harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, after discussion I have also come around to "better to panic than to saturate" viewpoint. In theory it's a DoS vector. In practice if a user manages to hit this panic, they'll also hit panics using +
on normal int types, and there's nothing we can do to prevent that.
fn size_from_script_pubkey(script_pubkey: &Script) -> usize { | ||
let len = script_pubkey.len(); | ||
Amount::SIZE + VarInt::from(len).size() + len | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 3d69a9b:
Same story: this will panic if the script size is within a few bytes of 2^64. We should just document it and move on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth introducing an invariant on the ScriptBuf
and Script
types to limit the size?
A quick websearch brought up this https://bitcoin.stackexchange.com/questions/117594/what-are-bitcoins-transaction-and-script-limits#117595
In case anyone is not using it this search site is awesome: https://bitcoinsearch.xyz/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we should restrict Script
to have a maximum 2^31 size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving this for another day because I don't want to hackishly add an invariant and I don't feel like thinking hard about it right now. I used saturating add/mul to resolve this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as I made elsewhere, but just saturating can be bad IMO since it leads to a silent failure. I agree with @apoelstra here to just panic and mark as panic and move on if wrapping in an error or none is too heavy to do now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apoelstra why limit the script? I don't remember this limit in consensus rules. Also I suspect it could make using scripts much more annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kixunil in practice, scripts have to fit into blocks, which are limited to 4M.
Currently we have limited script to "whatever the maximum number of elements a Vec
can hold" which is at most 2^32 on 32-bit systems, but which is likely to be 2^31 in practice because 32-bit allocators want to be able to hold allocation-sized signed diffs. These system-specific limits require us to have overflow checks all over the place when manipulating scripts, such as in this case where we are trying to add 12 to a script size and are worried that we'll overflow usize
.
If we had a 2^31 limit this would:
- Never be encountered in practice.
- Not affect the API for any of the "template constructors" like p2wsh, or conversions from addresses, or parsing from transactions (which I believe already have a "maximum vector size" check which has a much lower limit).
- In the case of
script::Builder
we could defer the API impact tointo_script
so that users could continue to usepush_*
whatever without needing to check errors constantly, and only check for overflow in the end. (And we could provide a panicking variant for users who were sure they weren't going to overflow.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would also have to limit input and output counts and transaction counts, since you could technically have so many of them it overflows. I'm not convinced putting all these checks everywhere is better than overflow checks everywhere. Overflow checks are at least pretty easy to understand and get right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true -- trying to bound all types to stay with in a non-overflowing window would be really hard (and probably impossible in some cases; e.g. you can have a taptree with 2^128 leaves in theory which would still be legal to use on-chain).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 3d69a9b
3d69a9b
to
07ad0bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 07ad0bc
07ad0bc
to
14e796e
Compare
Used saturating add/mul. |
So my effort yesterday did not adhere to the epic list of PR requirements #843, I'm not sure how we are supposed to enforce that other than everyone internalising it. |
Works for me. I wonder if we should change the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 14e796e
I had a play with it but it requires a bit of thinking and feels like there are more important things to do right now, I added #2086 to flag it. |
bitcoin/src/blockdata/block.rs
Outdated
/// > Block weight is defined as Base size * 3 + Total size. | ||
pub fn weight(&self) -> Weight { | ||
// This is the exact definition of a weight unit, as defined by BIP-141 (quote above). | ||
let wu = self.base_size().saturating_mul(3).saturating_add(self.total_size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not convinced that using saturated_mul
is better than checked_mul
. As a downstream consumer, I'd prefer to know when the bounds are exceeded and get an error or panic instead of silently succeeding at the saturation point. There are a number of checked
helper methods in weighthttps://github.com/rust-bitcoin/rust-bitcoin/blob/master/bitcoin/src/blockdata/weight.rs#L97).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No sane use case should ever hit this limit, if a user really wants to know they can use base_size()
and total_size()
themselves and use the checked
versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a downstream consumer, I'd prefer to know when the bounds are exceeded
I can tell you right now, without my reading or your writing the code, that the bounds are not exceeded. Would you really like to type a ton of untested panicking extra code every time you use the API, as a reminder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't have any strong opinions on usage here, but I don't like this method returning a result/option. Any other way (including how it is done) of doing this is fine is me.
@@ -298,6 +310,9 @@ impl Sequence { | |||
/// disables relative lock time. | |||
pub const ENABLE_RBF_NO_LOCKTIME: Self = Sequence(0xFFFFFFFD); | |||
|
|||
/// The number of bytes that a sequence number contributes to the size of a transaction. | |||
const SIZE: usize = 4; // Serialized length of a u32. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the serialized length of a u32, then why not just make this a u32 instead of a usize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lengths are type usize
in Rust by convention.
/// Keep in mind that when adding a TxOut to a transaction, the total weight of the transaction | ||
/// might increase more than `TxOut::weight`. This happens when the new output added causes | ||
/// the output length `VarInt` to increase its encoding length. | ||
/// # Panics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like this ever Panics now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my bad, thanks. Will fix.
bitcoin/src/blockdata/transaction.rs
Outdated
Some(weight) => weight, | ||
// This should not happen under normal conditions, but in case someone is doing | ||
// something malicious return max so as not to silently overflow. | ||
None => Weight::MAX, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem any better to return MAX silently instead of overflow silently. Could None be returned instead of Max if the bounds are exceeded so that the consumer knows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion this code shows that the Weight::from_vb
API is wrong and that it should do saturating multiplication, please see #2086.
let outputs = self.output.iter().map(|txout| txout.script_pubkey.len()); | ||
predict_weight(inputs, outputs) | ||
// This is the exact definition of a weight unit, as defined by BIP-141 (quote above). | ||
let wu = self.base_size() * 3 + self.total_size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that this could overflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow sloppy work by me when I re-did this version of the PR, all math operations in the PR should be handled the same way. Will fix, thanks.
This seems to conflict with a earlier PR #2069. I'll close 2069 when/if this one is merged. |
One of our stated aims is to make it possible to learn bitcoin by using our library. To help with this aim add to private consts for the segwit transaction marker and flag serialization fields.
In an attempt to help super new devs add code comments about transaction serialization formats pre and post segwit.
14e796e
to
7d1f04e
Compare
I've documented the saturating add/mull on all the size functions. I have not documented the saturating behaviour on Also fixed my mistakes highlighted in review, sorry for the Friday afternoon sloppy AF push yesterday. |
To be explicit I have gone ahead with the saturating add/mull here, can we leave it like that and argue the pros/cons of that over in #2086, the point of this PR is just to clear up the size/weight stuff before the imminent release. The saturating behaviour can and should be argued through more thoroughly before v1.0.0 |
@@ -200,9 +208,6 @@ pub struct TxIn { | |||
} | |||
|
|||
impl TxIn { | |||
/// The weight of a `TxIn` excluding the `script_sig` and `witness`. | |||
pub const BASE_WEIGHT: Weight = Weight::from_wu(32 + 4 + 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using this. Can you leave this in? There's a comment by @stevenroose which is correct in saying this should be multiplied by 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, "base [size]" is well defined term in the bip and this is not what it means. I should never have acked this const, its badly named, badly documented, and wrong. If you want this you can use Weight::from_vb(Sequence::SIZE + OutPoint::SIZE)
(or from_vb_uncheked
if you want const).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should never have acked this const
You do a lot of code review which is a thankless job most of the time and helps the project a lot. And when one of the many you reviewed has a mistake then people notice your review. There are a lot of maintainers on this project that never bother to code review as far as I can tell. Really I'm not sure what the point is of having lots of maintainers that never code review.
IMO it would be better to just fix the semantics of the API and leave all of the saturation changes for a different PR. Although I can understand wanting to make sure there is no hidden overflows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a small preference for serialized_size
instead of size
in the API names. Overall, I am okay with any decision in the PR as long as:
- We don't silently overflow.
- We don't have option/result types.
Any implementation using panics/saturation is good with me.
bitcoin/src/blockdata/block.rs
Outdated
/// > Block weight is defined as Base size * 3 + Total size. | ||
pub fn weight(&self) -> Weight { | ||
// This is the exact definition of a weight unit, as defined by BIP-141 (quote above). | ||
let wu = self.base_size().saturating_mul(3).saturating_add(self.total_size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't have any strong opinions on usage here, but I don't like this method returning a result/option. Any other way (including how it is done) of doing this is fine is me.
bitcoin/src/blockdata/block.rs
Outdated
let mut size = Header::SIZE; | ||
|
||
size = size.saturating_add(VarInt::from(self.txdata.len()).size()); | ||
for tx in self.txdata.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: slightly prefer more rust idiomatic
self.txdata.iter().map(|tx| tx.base_size())
.fold(size, |acc, elem| acc.saturating_add(elem))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some part of me does not like the inefficiency of having bounds check everytime in the loop when we are sure that we are not exceeding it. But I can live with it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment suggesting to convert loop to rust iterators where possible. More reading if you are interested: https://ipthomas.com/blog/2023/07/n-times-faster-than-c-where-n-128/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice article. Besides a performance increase, using bitwise operations also helps prevent against timing attacks by avoiding branching. It would be possible to write a C version using bitwise operations as well I think.
Ok, this whole saturating thing is starting to piss me off. I'm going to change this whole PR back to use normal operators |
In (2) did you mean to write "We don't return option/result types"? |
Recently we introduced a bug in the weight/size code, while investigating I found that our `Transaction`/`Block` weight/size APIs were in a total mess because: - The docs were stale - The concept of weight (weight units) and size (bytes) were mixed up I audited all the API functions, read some bips (141, 144) and re-wrote the API with the following goals: - Use terminology from the bips - Use abstractions that mirror the bips where possible
7d1f04e
to
c34e3cc
Compare
This now introduces silent overflows, the whole saturating/panic thing turned out to be an epic bikeshedding event and totally out of scope for this PR which is explicitly "fix the weight/size mess we recently introduced" - leaving all arithmetic stuff for #2086 @sanket1729 this is against your review, requires your concept ack/nack to proceed please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK c34e3cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK c34e3cc.
Sorry @tcharding, I forgot my "don't" in the comment. I meant we don't have Option/Result types.
I have this really bad habit of sometimes omitting negation while writing. I might need some professional diagnosis :P
LOLZ |
Thanks for fixing this @tcharding. I'd have preferred to fix my own f* ups but I understand with imminent 1.0 release it's important to fix this quickly. |
LOL, not really that imminent. |
I think OP meant v0.31.0 being imminent. |
.input | ||
.iter() | ||
.map(|input| { | ||
if self.use_segwit_serialization() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice quadratic complexity we made here. How TF I didn't see this before?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How TF I didn't see this before?!
The name for this function is bad. If the core developers make mistakes like that, they will be happening downstream a lot, and there will be no one to notice them for years.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be it, I'm not sure. I vaguely remember looking at it during my review but seeing that I didn't post a proper review I think I didn't actually do a proper review. Looking at my calendar there was a conference around that time so clearly I was preparing for that. :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the name is bad and even after seeing this code it does not look quadratic.
Audit and re-write the weight/size API for
Block
andTransaction
. First two patches are trivial, patch 3 contains justification and explanation for this work, copied here:Please note, this PR introduces panics if a sciptPubkey overflows the calculation
weight = spk.size() * 4
.Fix #2049