Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Count sigops for Transaction #2073

Merged
merged 1 commit into from Sep 22, 2023

Conversation

junderw
Copy link
Contributor

@junderw junderw commented Sep 15, 2023

I copied over the sigop counting logic from Bitcoin Core, but I made a few adjustments.

  1. I removed 2 consensus flags that checked for P2SH and SegWit activation. This code assumes both are activated. If we were to include that, what would be a good way to go about it? (ie. If I run this method on a transaction from the 1000th block and it just so happened to have a P2SH-like input, Bitcoin Core wouldn't accidentally count those sigops because the consensus flag will stop them from running the P2SH logic. Same goes for SegWit)
  2. Since there's no guarantee that we have an index from which we can get the prevout scripts, I made it into a generic closure that looks up the prevout script for us. If the caller doesn't provide it, We can only count sigops directly in the scriptSig and scriptPubkey (no P2SH or SegWit).

TODO

  • Write tests for transaction sigop counting

Edit: The test changes are just to get the 1.48 tests passing. I'll remove them and replace them with whatever solution that is agreed upon in another PR etc.

Edit 2: This is the code I used as a guide:

https://github.com/bitcoin/bitcoin/blob/8105bce5b384c72cf08b25b7c5343622754e7337/src/consensus/tx_verify.cpp#L147-L166

Edit 3: I found a subtle bug in the implementation of count_sigops (#2073 (comment))

@junderw
Copy link
Contributor Author

junderw commented Sep 15, 2023

Another pinning issue appeared?

@junderw
Copy link
Contributor Author

junderw commented Sep 15, 2023

Weird... I verified the 1.48.0 test suite passes locally. Maybe it was a fluke (needs a re-run)

@RCasatta
Copy link
Collaborator

re-kicked

@yancyribbens
Copy link
Contributor

yancyribbens commented Sep 15, 2023

Another pinning issue appeared?

yeah I opened an issue for this a few days ago: #2071

syn v2.0.33 requires rustc +1.56

@yancyribbens
Copy link
Contributor

Fix 1.48 CI error

@junderw nice! How did you know what dep to update?

@yancyribbens
Copy link
Contributor

@junderw is this just a copy of the commit I made here: d34df0c?

@junderw
Copy link
Contributor Author

junderw commented Sep 15, 2023

@junderw is this just a copy of the commit I made here: d34df0c?

Yep. I'm going to remove it once a fix is merged into master.

@yancyribbens
Copy link
Contributor

yancyribbens commented Sep 15, 2023

Yep. I'm going to remove it once a fix is merged into master.

@junderw ah ok. Feel free to comment on #2074

@apoelstra
Copy link
Member

concept ACK. I didn't review the implementation beyond checking the "pushonly" stuff since I happened to remember how weird that computation is. We are veering into consensus-code territory that is full of nasty surprises, but OTOH getting accurate sigop counts is a pretty important thing for Script authors so I think we should try to do it.

README.md Outdated Show resolved Hide resolved
@apoelstra
Copy link
Member

Could you fold your changes into the original commits? It's hard to review code that changes the same lines across multiple commits, and I think this PR is small enough that it won't throw off in-progress reviewers.

@junderw junderw force-pushed the feat/wip-sigop-tx branch 2 times, most recently from 78cd80c to ab8242f Compare September 17, 2023 06:14
@junderw
Copy link
Contributor Author

junderw commented Sep 17, 2023

I found a bug in Script count_sigops and count_sigops_legacy.

f1220af34ed325cbe56fd76527acfb830b38c8a7edfe524165de16098a8d0f44

This Coinbase transaction has an early ending scriptSig that was unable to be parsed.

Upon looking at Bitcoin Core's logic:

if (!GetOp(pc, opcode)) break;

Then looking at the GetOp code, it returns false when it runs into an early end of script.

So Bitcoin Core returns early with the accumulated n when it hits an error. I have changed the logic to fit that.

@junderw
Copy link
Contributor Author

junderw commented Sep 17, 2023

I rebased the bugfix into a separate commit and made a new PR. This PR now depends on #2075

Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR, I had a lot of fun reviewing this and learned a tonne. In doing so I made a whole bunch of changes locally, in case you want to look at them I pushed them to tmp-junderw-sigop-tx branch on my tree. In case you prefer using github UI here is the code I have in transaction.rs

    // TODO: Find a better reference for how taproot sigops are limited.
    /// Counts the total number of sigops.
    ///
    /// This value is for pre-taproot transactions only.
    ///
    /// > In taproot, a different mechanism is used. Instead of having a global per-block limit,
    /// > there is a per-transaction-input limit, proportional to the size of that input.
    /// > ref: <https://bitcoin.stackexchange.com/questions/117356/what-is-sigop-signature-operation#117359>
    ///
    /// The `spent` parameter is an optional closure/function that takes in an [`OutPoint`] and
    /// returns a [`TxOut`]. Without access to the previous [`TxOut`], any sigops in redeemScripts,
    /// witnessScripts, and P2WPKH sigops will not be counted.
    pub fn total_sigop_cost<S, F>(&self, mut spent: Option<S>) -> usize
    where
        S: FnMut(&OutPoint) -> Option<TxOut>,
    {
        // TODO: Use checked multiplication here and below.
        let mut cost = self.count_p2pk_p2pkh_sigops() * 4;

        // coinbase tx is correctly handled because `spent` will always returns None.
        cost += self.count_p2sh_sigops(spent.as_mut()) * 4;
        cost + self.count_witness_sigops(spent.as_mut())
    }

    /// Gets the sigop count.
    ///
    /// Counts sigops for this transaction's input scriptSigs and output scriptPubkeys i.e., doesn't
    /// count sigops in the redeemScript for p2sh or the sigops in the witness (use
    /// `count_p2sh_sigops` and `count_witness_sigops` respectively).
    fn count_p2pk_p2pkh_sigops(&self) -> usize {
        let mut n = 0;
        for input in &self.input {
            // 0 for p2wpkh, p2wsh, and p2sh (including wrapped segwit).
            n += input.script_sig.count_sigops_legacy();
        }
        for output in &self.output {
            n += output.script_pubkey.count_sigops_legacy();
        }
        n
    }

    /// Does not include wrapped segwit (see `count_witness_sigops`).
    fn count_p2sh_sigops<S>(&self, mut spent: Option<&mut S>) -> usize
    where
        S: FnMut(&OutPoint) -> Option<TxOut>,
    {
        fn count_sigops(prevout: &TxOut, input: &TxIn) -> usize {
            let mut count = 0;
            if prevout.script_pubkey.is_p2sh() {
                if let Some(Ok(script::Instruction::PushBytes(redeem))) =
                    input.script_sig.instructions().last()
                {
                    let script = Script::from_bytes(redeem.as_bytes());
                    count += script.count_sigops_accurate();
                }
            }
            count
        }

        let mut count = 0;
        for input in &self.input {
            if let Some(Some(prevout)) = spent.as_mut().map(|s| s(&input.previous_output)) {
                count += count_sigops(&prevout, input);
            }
        }
        count
    }

    /// Includes wrapped segwit (returns 0 for taproot spends).
    fn count_witness_sigops<S>(&self, mut spent: Option<&mut S>) -> usize
    where
        S: FnMut(&OutPoint) -> Option<TxOut>,
    {
        fn count_sigops(prevout: TxOut, input: &TxIn) -> usize {
            let script_sig = &input.script_sig;
            let witness = &input.witness;

            let script = if prevout.script_pubkey.is_witness_program() {
                &prevout.script_pubkey
            } else if prevout.script_pubkey.is_p2sh() && script_sig.is_push_only() {
                // TODO: Comment this line.
                if let Some(Push::Data(push_bytes)) = script_sig.last_pushdata() {
                    Script::from_bytes(push_bytes.as_bytes())
                } else {
                    return 0;
                }
            } else {
                return 0;
            };

            witness.sig_ops(script)
        }

        let mut count = 0;
        for input in &self.input {
            if let Some(Some(prevout)) = spent.as_mut().map(|s| s(&input.previous_output)) {
                count += count_sigops(prevout, input);
            }
        }
        count
    }
}

bitcoin/src/blockdata/script/borrowed.rs Outdated Show resolved Hide resolved
bitcoin/src/blockdata/script/borrowed.rs Outdated Show resolved Hide resolved
bitcoin/src/blockdata/transaction.rs Outdated Show resolved Hide resolved
bitcoin/src/blockdata/transaction.rs Outdated Show resolved Hide resolved
bitcoin/src/blockdata/transaction.rs Outdated Show resolved Hide resolved
bitcoin/src/blockdata/transaction.rs Outdated Show resolved Hide resolved
@junderw junderw force-pushed the feat/wip-sigop-tx branch 3 times, most recently from 6b4aa88 to d949373 Compare September 20, 2023 04:05
@junderw
Copy link
Contributor Author

junderw commented Sep 20, 2023

Thanks for the review. I have accepted all of your fixes with a few minor tweaks (diff locally to see what I fixed, it was mostly comments and whatnot.

Comment on lines 854 to 858
// TODO: Use checked multiplication here and below.
let mut cost = self.count_p2pk_p2pkh_sigops() * 4;

// coinbase tx is correctly handled because `spent` will always returns None.
cost += self.count_p2sh_sigops(spent.as_mut()) * 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the library level, we do not have 16-bit support, so usize is at least 32 bits. Is it possible to have a sigop count greater than 2^30 (1,073,741,824)?

I don't see TODO notes on the addition operations in other functions...

Perhaps we move away from usize for these, since we're performing a measurement operation? IMO, it would bring more clarity to just commit to u32 or u64, whichever makes more sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two paths here I can see:

  • Moving to u64 everywhere, which kinda sucks because all the stdlib .len() functions return a usize and there is no u64::from(usize). Forcing us to cast or panic.
  • Circle back to Script/ScriptBuf and refuse to allow construction of scripts that exceed 2^31 bytes (say) so we can stop worrying about this.

In practice no script can exceed 4Mb because that's the maximum number of bytes in a block.

Copy link
Contributor Author

@junderw junderw Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the biggest contributor to sigops counts for transactions on mainnet is bare multisig in the output. (20 sigops hard coded per OP_CHECKMULTISIG)

Just counting those, it would require 53687092 bare OP_CHECKMULTISIGs in the outputs in order to overflow 32 bits.

With current consensus rules, we are nowhere close to a scenario where this will overflow for 32 bits.

u32 is more than enough. The decision on u32/u64/usize should be dictated solely by ergonomics of use.

At this point I think usize is the most ergonomic.

Copy link
Member

@tcharding tcharding Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While working on the weight/size stuff I stumbled across #843

We should be using saturating add/mul here I believe.

@junderw
Copy link
Contributor Author

junderw commented Sep 21, 2023

Added tests to cover (I think) all cases. (Also removed a superfluous generic F that was there for some reason heh)

@junderw junderw marked this pull request as ready for review September 21, 2023 08:58
@yancyribbens
Copy link
Contributor

I think we should use saturating add/mul everywhere in this PR. Apologies for not realising that yesterday.

@tcharding Is there a compelling reason to use saturating add/mul instead of checked add/mul here?

@junderw
Copy link
Contributor Author

junderw commented Sep 21, 2023

I think we should use saturating add/mul everywhere in this PR. Apologies for not realising that yesterday.

@tcharding Is there a compelling reason to use saturating add/mul instead of checked add/mul here?

In order to go over the 32 bit threshold of 4,294,967,295, the fastest way would be to have as many bare-multi-sigs in the outputs as possible.

1-of-1 multisig (smallest to fit as many as possible in the blocksize) is 46 bytes per output. Since it's non-segwit, we can only fit up to about 1 MB in a single transaction (if a miner mines it for us) so we could fit around ~ 21730 multisigs in there.

21730 * 20 * 4 is 1738400

We need about 2470 times that to hit the 32 bit threshold.

So until we get 2.4 GB blocksizes, we won't have to worry at all. If we ever hit that, that is just the worst-case-scenario of an obviously malicious transaction on a 32 bit machine.

By the time (if at all) we get 2.4 GB blocks, we might not even support 32 bit systems anymore.

saturating will ensure that there is no wrapping while also keeping the ergonomics of the API in a better state. (Checking Options for everything when we only have 1 MB blocks is bad ergonomics)

@apoelstra
Copy link
Member

Maybe we could just mention this monstrosity in the docs, "oh yeah when you use None, write this hideous line of code."

@junderw I've run into this issue in a few contexts (taking a generic option). One solution is to provide a correctly-typed None constant with a name like NO_OUTPOINT_MAP. Another is to use a custom enum rather than Option (though to hold a closure it might still need to be generic which means you may have the same issue..).

@tcharding Is there a compelling reason to use saturating add/mul instead of checked add/mul here?

Because checked add/mul will require everything returns a Result even though the error variant is practically inaccessible.

@junderw
Copy link
Contributor Author

junderw commented Sep 21, 2023

@junderw I've run into this issue in a few contexts (taking a generic option). One solution is to provide a correctly-typed None constant with a name like NO_OUTPOINT_MAP. Another is to use a custom enum rather than Option (though to hold a closure it might still need to be generic which means you may have the same issue..).

What about taking Option<&'a mut (dyn FnMut(&OutPoint) -> Option<TxOut> + 'b)>? It adds a &mut to the closure unnecessarily, but it makes the None case nicer.

Actually, this is still kind of awkward for the Some case now, and the variance of 'a in this case is invariant because it's being used as the T of a &mut T so we need to separate it into a new lifetime while ensuring it lives long enough.

Using one of the functions directly with &mut my_func works fine. Also, writing a closure just requires a &mut and it works fine too. (ie. &mut |_| None works) so it seems like variance won't cause trouble for a majority of cases.

diff --git a/bitcoin/src/blockdata/transaction.rs b/bitcoin/src/blockdata/transaction.rs
index e87c562f..7c554fa8 100644
--- a/bitcoin/src/blockdata/transaction.rs
+++ b/bitcoin/src/blockdata/transaction.rs
@@ -847,10 +847,10 @@ impl Transaction {
     /// The `spent` parameter is an optional closure/function that takes in an [`OutPoint`] and
     /// returns a [`TxOut`]. Without access to the previous [`TxOut`], any sigops in redeemScripts,
     /// witnessScripts, and P2WPKH sigops will not be counted.
-    pub fn total_sigop_cost<S>(&self, mut spent: Option<S>) -> usize
-    where
-        S: FnMut(&OutPoint) -> Option<TxOut>,
-    {
+    pub fn total_sigop_cost<'a, 'b : 'a>(
+        &'a self,
+        mut spent: Option<&'a mut (dyn FnMut(&OutPoint) -> Option<TxOut> + 'b)>,
+    ) -> usize {
         // TODO: Use checked multiplication here and below.
         let mut cost = self.count_p2pk_p2pkh_sigops().saturating_mul(4);
 
@@ -1925,7 +1925,7 @@ mod tests {
 
     #[test]
     fn tx_sigop_count() {
-        let tx_hexes = [
+        let mut tx_hexes = [
             // 0 sigops (p2pkh in + p2wpkh out)
             (
                 "0200000001725aab4d23f76ad10bb569a68f8702ebfb8b076e015179ff9b9425234953\
@@ -2063,11 +2063,11 @@ mod tests {
         }
         fn return_none(_outpoint: &OutPoint) -> Option<TxOut> { None }
 
-        for (hx, expected, spent_fn, expected_none) in tx_hexes.iter() {
+        for (hx, expected, spent_fn, expected_none) in tx_hexes.iter_mut() {
             let tx_bytes = hex!(hx);
             let tx: Transaction = deserialize(&tx_bytes).unwrap();
             assert_eq!(tx.total_sigop_cost(Some(spent_fn)), *expected);
-            assert_eq!(tx.total_sigop_cost(None::<fn(&OutPoint) -> Option<TxOut>>), *expected_none);
+            assert_eq!(tx.total_sigop_cost(None), *expected_none);
         }
     }
 }

@junderw
Copy link
Contributor Author

junderw commented Sep 21, 2023

Using dyn and requiring Fn and only taking & could be another answer... it would cut off any users that somehow need to utilize mutable state in their lookup closure...

But tbh, they can use interior mutability, and 99% of use cases will likely just be a function with no state anyways or a non-capturing closure... or a capturing closure that only reads a HashMap etc. (which doesn't need mutable state anyways)...

This would lower the requirements for ownership of the actual closure itself, but increase the requirements by restricting what can be accessed inside the closure. Since most use cases using a state-full closure would just be temporary values anyways, not bound to a variable... so the lowering of the requirements on the closure itself seems unimportant.

Option<&'a mut (dyn FnMut(&OutPoint) -> Option<TxOut> + 'b)> where 'b: 'a feels like an ok idea. (Passing in &mut my_func for a regular function does feel awkward, though)

Let me know if there's some trap I'm not thinking of here. I'm kind of mumbling on about my thought process of whether this is good or bad for the API from a consumer perspective, but I could be missing something obvious that you all are privy to.

@junderw
Copy link
Contributor Author

junderw commented Sep 21, 2023

The more I think about it, the more I am warming up to the NO_OUTPOINT_MAP constant and keeping it generic idea.

I'm going to stop thinking about this now until I get some more feedback, my brain is too excited here. lol

@junderw junderw force-pushed the feat/wip-sigop-tx branch 2 times, most recently from 49c20c0 to b8aefe3 Compare September 21, 2023 21:13
@apoelstra
Copy link
Member

I've never used dyn with Fn before so it's hard for me to say what the merits are, and what compiler weirdness we're likely to hit :).

@junderw
Copy link
Contributor Author

junderw commented Sep 21, 2023

Yeah, getting rid of the Option was the best option tbh.

apoelstra
apoelstra previously approved these changes Sep 21, 2023
Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK b8aefe3

Co-authored-by: Tobin C. Harding <me@tobin.cc>
@junderw
Copy link
Contributor Author

junderw commented Sep 21, 2023

Sorry for the constant dismissal of ACKs 😅, documentation fix.

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 158ba26

Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 158ba26

@tcharding
Copy link
Member

FTR I didn't put much thought into the spent parameter stuff.

@apoelstra
Copy link
Member

@junderw looks like Github hid my comment as "outdated" which talks about the lack of Taproot support. I think we should change count_sigops_internal to also count CHECKSIGADD, which I believe is all we need to add Taproot functionality. (In Taproot you compare the sigop count to some weight*50 formula rather than a fixed maximum, but the idea of counting sigops against a limit is the same. And I believe the limit check isn't actually implemented in this PR, so there's no reason to not support Taproot.)

Do you want to update this PR or should I merge this and we'll do it in a followup?

@junderw
Copy link
Contributor Author

junderw commented Sep 22, 2023

I replied in the thread (you can click to unhide and read it)

This function is a Transaction level function, but Taproot scripts don't count toward the transaction's sigop count.

We could add a taproot variant to the Script level count function, though. Since the only time sigops are tracked during taproot verification is when running the tapscript, the script interpreter holds a running budget that it ticks down every time it performs SIGADD etc.

We could also then build upon that and add a Witness method that would calculate the budget (which is serialized witness stack size + 50) and see if the sigops of the tapscript go over that.

I think this is a separate PR.

@apoelstra
Copy link
Member

Oof, I understand, you're right. I forgot that the pre-Taproot and Taproot limits are compared against different things (blocks and transactions vs individual txins) so you can't just aggregate them like this.

@apoelstra apoelstra merged commit 141d805 into rust-bitcoin:master Sep 22, 2023
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants