Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Introduce block authorship soft deadline #9663

Merged
14 commits merged into from
Oct 4, 2021
Merged

Introduce block authorship soft deadline #9663

14 commits merged into from
Oct 4, 2021

Conversation

tomusdrw
Copy link
Contributor

Currently, when packing transactions to a block we have few conditions when we decide the block is ready to be signed by the consensus engine and gossiped to the network:

  1. block size limit is reached,
  2. block weight limit is reached (indirectly through "exhausts resources" errors)
  3. we run out of transactions in the transaction pool,
  4. deadline to produce the block is reached.

The first 3 I'd consider "regular" conditions, the last one is rather a safety valve, ensuring that the block producer does not miss it's slot due to some irregular slowness.

However, since optimal block packing is a difficult problem (i.e. producing a block with optimal utilisation, while maximising the cumulative priority of all transactions in that block), we use a very simple greedy heurstic currently, with a twist, that whenever we run into first transaction that reports resources exhaustion, we attempt to insert at least MAX_SKIPPED_TRANSACTIONS more to the block before concluding it's actually full.

This heuristic is obviously sub-optimal and can be gamed, hence the PR introduces another variant of the heuristic to potentially increase block utilisation, but without reaching the hard deadline defined by the consensus engine.

The PR introduces "soft deadline" (half of the hard deadline time). Before soft deadline is reached we can try as many transactions as desired from the transaction pool, even if they report resources exhaustion. After soft deadline we switch to the previous heuristic of trying at most MAX_SKIPPED_TRANSACTIONS.

@tomusdrw tomusdrw added A0-please_review Pull request needs code review. B5-clientnoteworthy C3-medium PR touches the given topic and has a medium impact on builders. labels Aug 31, 2021
@tomusdrw tomusdrw added the D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. label Aug 31, 2021
@gilescope
Copy link
Contributor

What's the smallest size a tx can be? Maybe we should be checking if block_size + MIN_POSSIBLE_TX_SIZE > block_size_limit because there's always going to be a little bit left over but there may be zero chance that any tx can realistically fit in there.

@tomusdrw
Copy link
Contributor Author

tomusdrw commented Sep 2, 2021

What's the smallest size a tx can be?

That's a very good point, but it's up to the runtime to decide and we currently have no good way to ask the runtime about that. Also note that any additional heuristic we add, it should be added for both the size limit and the "resource/weight" limit to make sure we cater for all kinds of runtimes.

@tomusdrw tomusdrw requested a review from bkchr September 13, 2021 13:50
@tomusdrw
Copy link
Contributor Author

@bkchr can you take a look if soft deadline is okay, or if you'd rather prefer to add transactions up to the hard deadline?

@gww-parity
Copy link

What about failed test:

failures:
---- basic_authorship::tests::should_not_remove_invalid_transactions_when_skipping stdout ----
thread 'basic_authorship::tests::should_not_remove_invalid_transactions_when_skipping' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `2`', client/basic-authorship/src/basic_authorship.rs:759:13
stack backtrace:
   0: rust_begin_unwind
             at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/panicking.rs:515:5
   1: core::panicking::panic_fmt
             at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/panicking.rs:92:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
   4: sc_basic_authorship::basic_authorship::tests::should_not_remove_invalid_transactions_when_skipping::{{closure}}
   5: sc_basic_authorship::basic_authorship::tests::should_not_remove_invalid_transactions_when_skipping
   6: core::ops::function::FnOnce::call_once
             at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
failures:
    basic_authorship::tests::should_not_remove_invalid_transactions_when_skipping
test result: FAILED. 6 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.25s
error: test failed, to rerun pass '-p sc-basic-authorship --lib'

?

Copy link

@gww-parity gww-parity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

);

// when
let deadline = time::Duration::from_secs(300);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this number is hardcoded:

  • should it be explained in few words why 300 and when may user want to tweak it?
  • or maybe make const out of it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a test.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still I don't know why this value for test. To me tests are documentation, and I don't know how to read it -> is it random? Or semi-random? etc. Overall, yes, it's not blocking from approval, just something imho nice to polish.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing tests as docs is really a bad idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Create a proposal.
here are the docs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deadline is that high to not have it triggered.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing tests as docs is really a bad idea.

I see we come from different background and experience ;).

Otherwise, thank you for pointers and links.

Again, I agree with you that this is just a detail, so whatever will be fine with me (may be even ignored) and as you see my approval for PR was already given regardless :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we come from different background and experience ;).

I mean, I would not say that I do not read tests from time to time, but this normally means that the docs are shit and I need some examples on how to use it :D

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:). I consider that good tests have a lot of value. They can document expectations in non-ambiguous way (contrary to human natural language), and keep assumptions in check (e.g. setting time limits with not too much slack, can help you in detecting if someone accidentally introduces performance regression (that may be not caught by benchmarking or make diagnosis easier).... maybe not in this particular piece of code but in general).
Ideally, tests may cover expected and edge cases, so I can read from them who edge cases may look like, but yeah, not always worth the effort.

Regarding code review process, asking questions about tests I consider practice motivating to rethink hidden assumptions, and if we can do better. E.g. here, maybe 300 is 10x too much and maybe we can go 10x lower? Maybe it will help to catch other issues? Or maybe it's not worth it and we keep 300 and move on. :)

TL;DR- to me documentation and tests have potential (but not always have to) document different things, from different angles :).

);

// when
let deadline = time::Duration::from_secs(300);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a test.

@@ -386,6 +390,13 @@ where
MAX_SKIPPED_TRANSACTIONS - skipped,
);
continue
} else if (self.now)() < soft_deadline {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (self.now)() < soft_deadline {
} else if now < soft_deadline {

Copy link

@gww-parity gww-parity Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, isn't actually original (self.now)() intentional to get more recent reading at a time of test?

Otherwise, to me both are good enough to not block approval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good suggestion, there is nothing intensive happening between these calls, and it's worth saving this one extra syscall. That was my intention when I introduced now variable initially as well, but somehow lost it in all the refactorings.

@bkchr
Copy link
Member

bkchr commented Sep 29, 2021

What about failed test:

failures:
---- basic_authorship::tests::should_not_remove_invalid_transactions_when_skipping stdout ----
thread 'basic_authorship::tests::should_not_remove_invalid_transactions_when_skipping' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `2`', client/basic-authorship/src/basic_authorship.rs:759:13
stack backtrace:
   0: rust_begin_unwind
             at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/panicking.rs:515:5
   1: core::panicking::panic_fmt
             at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/panicking.rs:92:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
   4: sc_basic_authorship::basic_authorship::tests::should_not_remove_invalid_transactions_when_skipping::{{closure}}
   5: sc_basic_authorship::basic_authorship::tests::should_not_remove_invalid_transactions_when_skipping
   6: core::ops::function::FnOnce::call_once
             at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
failures:
    basic_authorship::tests::should_not_remove_invalid_transactions_when_skipping
test result: FAILED. 6 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.25s
error: test failed, to rerun pass '-p sc-basic-authorship --lib'

?

This is a flaky test

@tomusdrw
Copy link
Contributor Author

tomusdrw commented Oct 4, 2021

bot merge

@ghost
Copy link

ghost commented Oct 4, 2021

Trying merge.

@ghost ghost merged commit 125092f into master Oct 4, 2021
@ghost ghost deleted the td-limit branch October 4, 2021 14:30
ordian added a commit that referenced this pull request Oct 5, 2021
* master: (125 commits)
  Update multiple dependencies (#9936)
  Speed up timestamp generation when logging (#9933)
  First word should be Substrate not Polkadot (#9935)
  Improved file not found error message (#9931)
  don't read events in elections anymore. (#9898)
  Remove incorrect sanity check (#9924)
  Require crypto scheme for `insert-key` (#9909)
  chore: refresh of the substrate_builder image (#9808)
  Introduce block authorship soft deadline (#9663)
  Rework Transaction Priority calculation (#9834)
  Do not propagate host RUSTFLAGS when checking for WASM toolchain (#9926)
  Small quoting comment fix (#9927)
  add clippy to CI (#9694)
  Ensure BeforeBestBlockBy voting rule accounts for base (#9920)
  rm `.maintain` lock (#9919)
  Downstream `node-template` pull (#9915)
  Implement core::fmt::Debug for BoundedVec (#9914)
  Quickly skip invalid transactions during block authorship. (#9789)
  Add SS58 prefix for Automata (#9805)
  Clean up sc-peerset (#9806)
  ...
@crystalin
Copy link
Contributor

@tomusdrw it looks great. Would it be possible to have either the MAX_SKIPPED_TRANSACTIONS or the soft_deadline be configurable. In our case we often have to skip transactions. With this PR we can have half the time used, which is better than before but we can't fill a block as the MAX_SKIPPED_TRANSACTIONS can easily be exceeded already when reaching half the block allocated time.

@bkchr
Copy link
Member

bkchr commented Oct 24, 2021

@crystalin why do you often have to skip transactions?

@crystalin
Copy link
Contributor

@bkchr
Because in Moonbeam, we are allowing native Ethereum transactions. To deal with weight, we simply do a multiplication of the gas_limit provided with a given ratio number.
However, this allows a user to provide a transaction with a very high gas_limit and so with a very high weight associated, even if the actual weight used would be a lot lower.

In Ethereum maintnet the logic is to try executing the transaction even if the current block gas + the current gas_limit is over the block maximum gas.
But in substrate, the transaction with an expected weight that might go over the block weight limit are skipped (which could be a good optimization)

We observed however that when someone spam few transactions with very high gas_limit, it can prevent the block from producing more than 8 transactions per block.

This PR improves the situation making sure we would use a least 1/2 of the time in block production no matter if many skipped transactions

@bkchr
Copy link
Member

bkchr commented Oct 25, 2021

@crystalin if you prepare a pr, we can merge it.

@crystalin
Copy link
Contributor

@bkchr What would the preferred way ? allow to control the MAX_SKIPPED_TRANSACTIONS or the soft_deadline (or both) ?

@bkchr
Copy link
Member

bkchr commented Oct 25, 2021

Both? IDK. What do you need?

@tomusdrw
Copy link
Contributor Author

I think soft_deadline is less of a footgun (basically it should be configured as a percentage of the hard deadline), with configuration max skipped transactions your are risking reaching the hard_deadline and potentially missing a slot.

I find it a bit surprising that the current hardcoded soft deadline is insufficient though. If we assume the block time of 6s and hard deadline of 3s, having around 1.5s (soft deadline) to produce a block should be more than enough - if block production/import takes more than this I feel that maybe the client is not running on powerful enough hardware or the maximal block weight is quite high.

@crystalin
Copy link
Contributor

@tomusdrw the issue is not with blockchain but with parachain:
The hard_deadline is set to 333ms (2/3rd of the max block time 500ms), so the soft_deadline is around 166ms. It is enough to include some transactions but still very limiting

@tomusdrw
Copy link
Contributor Author

I see, making soft_deadline(_percentage) configurable makes sense then.

This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. C3-medium PR touches the given topic and has a medium impact on builders. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants