
proposal: Unified Scheduler #28307

Closed
wants to merge 23 commits

Conversation

@ryoqun (Member) commented Oct 9, 2022

Problem

As outlined in #23548, my idea improves replay time by 2x, as evidenced in #27666, but it still lacks a design doc for others to review and evaluate.

(My draft PR stack is too deep; hopefully I can pop the PRs off one by one, starting with this one.)

Summary of Changes

Context: #23548

@apfitzge self-requested a review October 10, 2022 14:08

## Current implementation problem

- Why isn't batching good?
Contributor

I can definitely see that this may be the case for replay, but is it for banking as well? I'm still learning all the different parts of banking, but I thought one of the main benefits of batching for banking is that we record to PoH less often, which is more efficient.

@carllin, any thoughts here?

@ryoqun (Member Author) Oct 10, 2022

> I thought one of the main benefits of batching for banking is that we record to PoH less often, which is more efficient.

That concern is valid; however:

  • The PoH recorder's lock contention can be reduced significantly (maybe even eliminated) once there is only a single unified thread recording to it (i.e., the scheduler thread).
  • Also, the scheduler can simply buffer txs to record to PoH until a lock-conflicting tx is encountered.

That said, I thought the main benefit of batching would be the reduction of AccountsDb locking overhead.
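
For concreteness, here is a minimal sketch of that buffering idea. All names here (`Tx`, `PohRecorder`, `record_batch`, `SchedulerPohBuffer`) are made up for illustration and are not the actual solana-runtime APIs; the point is just that a single scheduler thread can keep appending non-conflicting txs to a buffer and only issue one PoH record call once a lock-conflicting tx forces a flush:

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for illustration only; not the actual solana-runtime types.
struct Tx {
    id: u64,
    write_locked_addresses: Vec<String>,
}

struct PohRecorder;

impl PohRecorder {
    // One record call per buffered batch instead of one per tiny batch.
    fn record_batch(&mut self, txs: &[Tx]) {
        let ids: Vec<u64> = txs.iter().map(|tx| tx.id).collect();
        println!("recording txs {ids:?} to PoH in a single call");
    }
}

#[derive(Default)]
struct SchedulerPohBuffer {
    buffered: Vec<Tx>,
    locked: HashSet<String>,
}

impl SchedulerPohBuffer {
    // Buffer the tx if it doesn't conflict; flush everything buffered so far
    // to PoH the moment a lock-conflicting tx shows up.
    fn schedule(&mut self, tx: Tx, poh: &mut PohRecorder) {
        let conflicts = tx
            .write_locked_addresses
            .iter()
            .any(|addr| self.locked.contains(addr));
        if conflicts {
            self.flush(poh);
        }
        self.locked.extend(tx.write_locked_addresses.iter().cloned());
        self.buffered.push(tx);
    }

    fn flush(&mut self, poh: &mut PohRecorder) {
        if !self.buffered.is_empty() {
            poh.record_batch(&self.buffered);
            self.buffered.clear();
            self.locked.clear();
        }
    }
}

fn main() {
    let mut poh = PohRecorder;
    let mut buffer = SchedulerPohBuffer::default();
    for tx in [
        Tx { id: 1, write_locked_addresses: vec!["A".into()] },
        Tx { id: 2, write_locked_addresses: vec!["B".into()] },
        Tx { id: 3, write_locked_addresses: vec!["A".into()] }, // conflicts, so 1 and 2 get recorded first
    ] {
        buffer.schedule(tx, &mut poh);
    }
    buffer.flush(&mut poh); // record whatever is left
}
```

With only the scheduler thread ever recording, the PoH recorder lock has a single writer, which is the contention reduction mentioned above.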

- determinism
- strict fairness, considering only priority_fee/FCFS
- approx. O(n), where n is the gross total of addresses in the transactions
- strict adherence to the local fee market
Contributor

This seems more fair from the user perspective, but it may not yield the best fee-collection results for validators. A greedy validator will not adhere to this, and we have no way to enforce it.

Just making a note, not saying it shouldn't be your goal; it may be the case that you can make some optimizations this way which do result in higher throughput and fee collection!

Member Author

> This seems more fair from the user perspective, but it may not yield the best fee-collection results for validators. A greedy validator will not adhere to this, and we have no way to enforce it.

Yep. A greedy validator will just seek MEV revenue rather than fee-collection maximization, considering the relatively cheap tx fees. cc: @buffalu

> Just making a note, not saying it shouldn't be your goal; it may be the case that you can make some optimizations this way which do result in higher throughput and fee collection!

Yep, I'll write that reasoning down later for scrutiny.

Contributor

MEV tries to maximize fees collected. I have a feeling that fully respecting priority payment won't maximize fees.

- strict adherence to the local fee market
- censorship resistant
- 100k scheduling/s
- a highly contended address with 1M pending txs doesn't affect overall performance at all
Contributor

I've talked with @taozhu-chicago about this; likely we'll want to move the QoS checks for account limits up into the scheduler:

  1. The scheduler will already be looking at the accounts locked by each tx, so it's easy to track per-account costs as we schedule.
  2. This could allow us to drop lower-priority txs early in the scheduling stage so we don't even add them into the scheduling structures (see the sketch after this comment).

Good goal to have, but if we include these changes we probably won't have to deal with 1M pending txs on the same address.
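
A rough sketch of what that could look like, with made-up names (`SchedulerQos`, `PER_ACCOUNT_BLOCK_COST_LIMIT`, `try_reserve`) standing in for whatever the real cost_model types and limits are: while the scheduler is already iterating a tx's locked accounts, it reserves the tx's cost against each writable account and rejects the tx up front if any account would exceed its per-block limit.

```rust
use std::collections::HashMap;

// Illustrative only: made-up types and limit, not the actual cost_model API.
const PER_ACCOUNT_BLOCK_COST_LIMIT: u64 = 12_000_000;

struct Tx {
    priority: u64,
    cost: u64,
    writable_accounts: Vec<String>,
}

#[derive(Default)]
struct SchedulerQos {
    // Cost already scheduled against each writable account in this block.
    per_account_cost: HashMap<String, u64>,
}

impl SchedulerQos {
    /// Returns true if the tx fits under every writable account's limit and
    /// reserves its cost; returns false so the caller can drop the tx before
    /// it ever enters the scheduling structures.
    fn try_reserve(&mut self, tx: &Tx) -> bool {
        // The scheduler is already iterating the tx's locked accounts here,
        // so the limit check is almost free.
        for account in &tx.writable_accounts {
            let used = self.per_account_cost.get(account).copied().unwrap_or(0);
            if used + tx.cost > PER_ACCOUNT_BLOCK_COST_LIMIT {
                return false;
            }
        }
        for account in &tx.writable_accounts {
            *self.per_account_cost.entry(account.clone()).or_insert(0) += tx.cost;
        }
        true
    }
}

fn main() {
    let mut qos = SchedulerQos::default();
    let tx = Tx {
        priority: 100,
        cost: 200_000,
        writable_accounts: vec!["hot_account".into()],
    };
    if qos.try_reserve(&tx) {
        println!("schedule tx with priority {}", tx.priority);
    } else {
        println!("drop lower-priority tx early");
    }
}
```

Whether a rejected tx is dropped outright or re-queued for a later block is a separate policy choice.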

Contributor

Ah, but this would conflict with your goal of "no estimator/prediction" 😅

## Further work
- Rework compute units; consider runtime account state transition verification / N addresses (i.e., scheduling cost)
- What's going on with the bankless?: meh
- Scrambled tx
Contributor
What does this mean?

@ryoqun (Member Author) Oct 10, 2022

This: #23837

Note that that proposal is way more futuristic than even this unified scheduler.

- single threaded
- determinism
- strict fairness, considering only priority_fee/FCFS
- approx. O(n), where n is the gross total of addresses in the transactions
Contributor

I'm not sure O(n) is going to be possible while guaranteeing priority order.
The simplest case I can think of is n txs all write-locking account A, received in a batch in random order. Even building a simple priority queue will be O(n log n).
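
To make the bound concrete, a toy version of that worst case: with n txs contending on one account, just establishing priority order with a binary heap is a comparison sort, i.e. O(n log n) overall.

```rust
use std::collections::BinaryHeap;

// Toy example: n txs all write-locking the same account, arriving in random order.
// Merely putting them into priority order costs O(n log n).
fn main() {
    let arrival_order_priorities: Vec<u64> = vec![7, 1, 42, 13, 5];
    let mut queue: BinaryHeap<u64> = BinaryHeap::new();
    for priority in arrival_order_priorities {
        queue.push(priority); // O(log n) per insert
    }
    while let Some(priority) = queue.pop() {
        // Highest priority first; each pop is also O(log n).
        println!("schedule tx with priority {priority}");
    }
}
```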


Also, increasing the number of replaying/banking threads doesn't scale linearly with the number of CPU cores.

## Present (and projected) transaction patterns
@ryoqun (Member Author) Oct 14, 2022

(1/2)

That means synthesized benchmark results should be taken with a grain of salt, because they tend to be overly uniform and do not reflect realistic usage.

## Redefined scheduler's problem space
Member Author

(2/2)

@github-actions bot added the stale label Dec 29, 2022
@github-actions bot closed this Jan 6, 2023
@ryoqun reopened this Jan 11, 2023
@github-actions bot removed the stale label Jan 12, 2023
@github-actions bot added the stale label Jan 27, 2023
@github-actions bot closed this Feb 6, 2023
@tao-stones reopened this Feb 6, 2023
@github-actions bot removed the stale label Feb 7, 2023
@Huzaifa696
"Hello! I have a question about transaction patterns. In the third paragraph of the document, It says, "when seen from the viewpoint of the on-chain state, a very few of its addresses can be highly contended." I was wondering if you could give me an estimate of what "a very few" means in terms of the total transactions that a particular leader is processing.

For instance, if a leader is handling 6,000 transactions per second that involve a total of 30,000 accounts (chosen arbitrarily), would it be reasonable to assume that, on average, around 300 of these accounts, or 0.1%, are in conflict with each other?"

@github-actions bot added the stale label Mar 6, 2023
@github-actions bot closed this Mar 13, 2023