Introduce primitive threading in unified scheduler #34676

ryoqun · 2024-01-06T14:32:17Z

(no need to start to review this at weekend).

Problem

There's no threading support in the unified scheduler, which will exhibit an extremely minimized overhead for unbatched processing.

Summary of Changes

Add minimum threading model and wire it, which still lacks suspending (and resuming) threads and aborting them on transaction error in dead blocks. Also, current impl is still full of .unwrap()s.

also, note that the actual scheduling code will come in later prs. after that, the code in this pr is ready for multi threaded with proper scheduling code. yet, it's still single threaded at the time of this pr. all this breakdowns are done for bite-sized pr reviewing.

Also, patching crossbeam is planned but yet not included in this pr as well.

Context

extracted from: #33070

ryoqun · 2024-01-06T14:33:21Z

unified-scheduler-pool/src/lib.rs

@@ -31,6 +35,7 @@ use {
            atomic::{AtomicU64, Ordering::Relaxed},
            Arc, Mutex, Weak,
        },
+        thread::{self, JoinHandle},


hi, thread. welcome to the unified scheduler land. lol

ryoqun · 2024-01-06T14:35:31Z

unified-scheduler-pool/src/lib.rs

+// minimum at the cost of a single heap allocation per switching for the sake of Box-ing the Self
+// type to avoid infinite mem::size_of() due to the recursive type structure. Needless to say, such
+// an allocation can be amortized to be negligible.
+enum ChainedChannel<P1, P2> {


originally, i used condvar, but later i noticed this trick and never looked back. :) hope i explained it well in the above comment...

Appreciate the detailed comment here; I think the chained channel idea makes sense. My only real gripe is the visibility to internals within this code, the chained channel is used for synchronization but could easily be screwed up by a change in the code of the handler thread.

It seems possible we can isolate this type and offer an interface for recving that will internally mutate the receiving channel; which will guarantee each receiver can only receive it once (which is currently depending on the handler to overwrite its' receiver).

(pseudo-code with removed generics)

struct ChainedChannelReceiver<...> { receiver: Receiver<...>, context: Context, } impl ChainedChannelReceiver { /// Blocks to receive the next Payload. /// If a context-switch is recveived, the context and channel are mutated, /// and the next payload message will be blocked on. pub fn recv(&mut self) -> Result<Payload> { loop { match self.receiver.recv()? { Payload(p) => return Ok(p), PayloadAndChannel((p, new_channel)) => { self.receiver = new_channel; self.context = p; } } } } }

Not sure my suggestion works if the handler threads will receive from other channels that would actually require a select! though.

nice suggestion :) 9b5e338

Not sure my suggestion works if the handler threads will receive from other channels that would actually require a select! though.

this is good point as well. i did my best in the commit.

ryoqun · 2024-01-06T14:41:07Z

unified-scheduler-pool/src/lib.rs

@@ -577,8 +966,10 @@ mod tests {

    #[derive(Debug)]
    struct AsyncScheduler<const TRIGGER_RACE_CONDITION: bool>(
-        PooledScheduler<DefaultTaskHandler>,


now stop using the impl by wrapping as the impl is getting mature, meaning it's harder to use for this race condition test

ryoqun · 2024-01-06T14:44:47Z

unified-scheduler-pool/src/lib.rs

 }

 #[derive(Debug)]
 pub struct PooledSchedulerInner<S: SpawnableScheduler<TH>, TH: TaskHandler> {
-    id: SchedulerId,
+    thread_manager: ThreadManager<S, TH>,


note that ThreadManager is planned to be wrapped in Arc<Mutex<_>> soon. so, i didn't flatten it into PooledSchedulerInner.

well well well, I ended up not doing so with different impl.

ryoqun · 2024-01-06T14:47:29Z

unified-scheduler-pool/src/lib.rs

@@ -295,7 +677,7 @@ impl<TH: TaskHandler> InstalledScheduler for PooledScheduler<TH> {
    }

    fn pause_for_recent_blockhash(&mut self) {
-        // not surprisingly, there's nothing to do for this min impl!


ryoqun · 2024-01-06T14:51:19Z

unified-scheduler-pool/src/lib.rs

+            move || loop {
+                let mut is_finished = false;
+                while !is_finished {
+                    select! {


select!'s directive order is logically off intentionally for upcoming use of select_biased!..

ryoqun · 2024-01-06T15:00:57Z

unified-scheduler-pool/src/lib.rs

+        // High-level flow of new tasks:
+        // 1. the replay stage thread send a new task.
+        // 2. the scheduler thread accepts the task.
+        // 3. the scheduler thread dispatches the task after proper locking.
+        // 4. the handler thread processes the dispatched task.
+        // 5. the handler thread reply back to the scheduler thread as an executed task.
+        // 6. the scheduler thread post-processes the executed task.
+        // 7. the scheduler thread send the executed task to the accumulator thread.
+        // 8. the accumulator thread examines the executed task's result and accumulate its timing,
+        //    finally dropping the transaction inside the executed task.


across the new code, i used the term task instead of transaction for envisioning bundles as well along with transactions under the generic struct of Task.....

ryoqun · 2024-01-06T15:04:40Z

unified-scheduler-pool/src/lib.rs

+        );
+    }
+
+    fn start_threads(&mut self, context: &SchedulingContext) {


closures in this method is pretty full of unwraps on crossbeam channels. most of them will be addressed in later prs.

that's said, these unwraps should be acceptable even for production. That's because once scheduler is spawn, it never stops any of the backing threads in this pr.

ryoqun · 2024-01-06T15:05:18Z

unified-scheduler-pool/src/lib.rs

+    scheduler_thread: Option<JoinHandle<()>>,
+    handler_threads: Vec<JoinHandle<()>>,
+    accumulator_thread: Option<JoinHandle<()>>,


note that these join handles aren't used properly yet in this pr.

codecov · 2024-01-06T15:51:26Z

Codecov Report

Attention: 18 lines in your changes are missing coverage. Please review.

Comparison is base (e2c2029) 81.7% compared to head (afd580e) 81.7%.
Report is 22 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff            @@
##           master   #34676    +/-   ##
========================================
  Coverage    81.7%    81.7%            
========================================
  Files         825      825            
  Lines      223261   223447   +186     
========================================
+ Hits       182605   182778   +173     
- Misses      40656    40669    +13

apfitzge · 2024-01-08T19:46:31Z

unified-scheduler-pool/src/lib.rs

+        // 7. the scheduler thread send the executed task to the accumulator thread.
+        // 8. the accumulator thread examines the executed task's result and accumulate its timing,


Is accumulation expensive? Why not just do accumulation in the scheduling thread?

yeah, it is expensive. any offload-able work should be put elsewhere from the scheduling thread: beae861

like #34676 (comment), i actually observed this.

apfitzge · 2024-01-08T20:25:06Z

unified-scheduler-pool/src/lib.rs

+            }
+        };
+
+        // This thread is needed because .accumualte() and drop::<SanitizedTransaction>() both are


what is heavy in SanitizedTransaction::drop, it should just be free-ing some heap allocations, right?

yep, just free-ing some heap allocations is relatively heavy for the unified scheduler. Also, SanitizedTransaction is rather complex struct: beae861 i documented how it's severe in that long monologue in the commit

I observed some visible cpu usage just from SanitizedTransaction::drop when casual profiling.

apfitzge · 2024-01-08T20:48:49Z

unified-scheduler-pool/src/lib.rs

+            let finished_task_sender = finished_task_sender.clone();
+
+            move || loop {
+                let (task, sender) = select! {


there's no selection going on here, since there's only one channel receiving?

oh, nice question. select! here is intentional because the handler will soon use prioritized queue, which will be realized just by using two channels together: 1. low priority tasks (not conflicting with others), 2. high priority tasks (conflicting with others).

so, I'll leave this rather odd select! as-is in this pr.

the handler will soon use prioritized queue, ....

soon ~~anza-xyz#627 (comment)~~ (EDIT: anza-xyz#627 (comment))

apfitzge · 2024-01-08T20:53:23Z

unified-scheduler-pool/src/lib.rs

+// minimum at the cost of a single heap allocation per switching for the sake of Box-ing the Self
+// type to avoid infinite mem::size_of() due to the recursive type structure. Needless to say, such
+// an allocation can be amortized to be negligible.
+enum ChainedChannel<P1, P2> {


Appreciate the detailed comment here; I think the chained channel idea makes sense. My only real gripe is the visibility to internals within this code, the chained channel is used for synchronization but could easily be screwed up by a change in the code of the handler thread.

It seems possible we can isolate this type and offer an interface for recving that will internally mutate the receiving channel; which will guarantee each receiver can only receive it once (which is currently depending on the handler to overwrite its' receiver).

(pseudo-code with removed generics)

struct ChainedChannelReceiver<...> { receiver: Receiver<...>, context: Context, } impl ChainedChannelReceiver { /// Blocks to receive the next Payload. /// If a context-switch is recveived, the context and channel are mutated, /// and the next payload message will be blocked on. pub fn recv(&mut self) -> Result<Payload> { loop { match self.receiver.recv()? { Payload(p) => return Ok(p), PayloadAndChannel((p, new_channel)) => { self.receiver = new_channel; self.context = p; } } } } }

apfitzge · 2024-01-08T20:54:16Z

unified-scheduler-pool/src/lib.rs

+// minimum at the cost of a single heap allocation per switching for the sake of Box-ing the Self
+// type to avoid infinite mem::size_of() due to the recursive type structure. Needless to say, such
+// an allocation can be amortized to be negligible.
+enum ChainedChannel<P1, P2> {


Not sure my suggestion works if the handler threads will receive from other channels that would actually require a select! though.

ryoqun · 2024-01-09T03:17:50Z

unified-scheduler-pool/src/lib.rs

+        // and unified_scheduler both tend to process non-conflicting batches in parallel as normal
+        // operation.


wording: as part of normal functioning

done: d781743

ryoqun · 2024-01-09T03:20:19Z

unified-scheduler-pool/src/lib.rs

@@ -194,34 +199,422 @@ impl TaskHandler for DefaultTaskHandler {
    }
 }

+pub struct ExecutedTask {


nit: remove pub

done: a2eb73e

ryoqun · 2024-01-09T16:11:39Z

@apfitzge thanks for timely review! i think I've addressed all your review comments.

apfitzge · 2024-01-09T18:36:48Z

unified-scheduler-pool/src/lib.rs

+            // That's because it's the most notable bottleneck of throughput. Unified scheduler's
+            // overall throughput is largely dependant on its ultra-low latency characteristic,
+            // which is the most important design goal of the scheduler in order to reduce the
+            // transaction confirmation latency for end users.


It might be the bottleneck of just scheduling, but the main bottleneck of replay will still be the processing of transactions (load, execute, commit), right?

The scheduler's ultra-sensitivity to latency seems to be a self-imposed design choice, and I am trying to keep us from introducing additional threads if we can avoid it. We already have a huge over-use of threads in the validator; so trying to avoid introduction of new ones where possible, unless it has demonstrated significant improvement.

I think the sensitivity is most important when all or nearly all transactions conflict. Let's just assume there's a single handler and all transactions conflict for simplicity. A simple sequence diagram:

sequenceDiagram Scheduler->>+Handler: Task(Tx0) Handler->>+Scheduler: ExecutedTask(Tx0) Scheduler->>+Handler: Task(Tx1)

Loading

Correct me if I'm wrong, but it seems the sensitiviy is there because between the time of ExecutedTask(Tx0) and Task(Tx1), the handler thread is idle.
Even with the additional thread which does the accumulation and dropping, there is still going to be some amount of idle time here.

Just as a note, I also ran into this in the block-production scheduler in core, but opted to allow the queueing of txs for specific handlers. Even trying to keep just 2 txs always enqueued gives significant leeway in the latency, since as long as the scheduler can queue another in the time it takes to process a transaction there is nearly no idling time.

It might be the bottleneck of just scheduling, but the main bottleneck of replay will still be the processing of transactions (load, execute, commit), right?

Currently, yes. given ~100 handler threads, scheduler thread can be the bottleneck. and I'm developing this scheduler with that future in mind.

Correct me if I'm wrong, but it seems the sensitiviy is there because between the time of ExecutedTask(Tx0) and Task(Tx1), the handler thread is idle.
Even with the additional thread which does the accumulation and dropping, there is still going to be some amount of idle time here.

yeah. these are correct.

Just as a note, I also ran into this in the block-production scheduler in core, but opted to allow the queueing of txs for specific handlers. Even trying to keep just 2 txs always enqueued gives significant leeway in the latency, since as long as the scheduler can queue another in the time it takes to process a transaction there is nearly no idling time.

hmm, this seems odd. busy looping at the other hand of mpmc channels usually gives round-trip less-than-500ns latency, which i'm planning to employ for unified scheduler. i think those threads were yielding to the kernel (ie parking)?

as you pointed out, i can't eliminate the idle time. but then, 1us (500ns for channel + 500ns for scheduling) per a single conflicting tx is acceptable latency.

actually, i wanted to show this as some bench, but i'm running out of time.. i spend too much on the other concern... lol #34676 (comment)

i think those threads were yielding to the kernel (ie parking)?

possible.

as you pointed out, i can't eliminate the idle time. but then, 1us (500ns for channel + 500ns for scheduling) per a single conflicting tx is acceptable latency.

what's the added latency from dropping the txs? or in other words, what would be the affect on overall throughput if we did not have the separate "drop" thread? My main concern here was just that we're spawning this new thread as part of the scheduler, and if that's necessary we should be able to back that decision up with some numbers on how much it helps (or how much we are hurt by not doing that).

as you pointed out, i can't eliminate the idle time.

yeah I think without allowing conflicts to actually queue up, this cannot be resolved entirely, just mitigated.

thanks for detailing your concern.

i've finally wrote benches: 6120684 (won't push to this pr)....:

test bench_tx_throughput_drop_in_scheduler ... bench: 8,556,627 ns/iter (+/- 3,168,724) test bench_tx_throughput_drop_in_accumulator ... bench: 7,334,701 ns/iter (+/- 2,600,740) test bench_tx_throughput_drop_in_scheduler_conflicting ... bench: 11,403,197 ns/iter (+/- 3,062,017) test bench_tx_throughput_drop_in_accumulator_conflicting ... bench: 10,794,434 ns/iter (+/- 6,808,692)

so, processing 10,000 txes takes always longer when we drop txes in the scheduler thread, regardless conflicting or not.

note that you have to ensure to rebuild when repoto these results.. cargo usually skips rebuild with feature unification... (i.e use cargo clean --release -p solana-unified-scheduler-pool).

EDIT: oh, i forget to execute_timings.accumulate() in the scheduler thread. so, these numbers could go worse for dropping in the scheduler thread..

The raw numbers looks like dropping in scheduler takes longer, which isn't surprising since its doing more work; but they are well within the statistical noise.

bench_tx_throughput_drop_in_scheduler_conflicting raw shows 609us higher than bench_tx_throughput_drop_in_accumulator_conflicting, but the error bounds are 3ms and 6ms.

This seems to imply that there's not a statistically significant effect from having the separate accumulator thread. Am I missing something in these results?

edit: I think we are not really seeing eye-to-eye on this additional thread, and might just be we have different perspectives. Feel free to pull another core contributor in for their opinion on this aspect of the PR. I don't recall seeing anything else that would block this PR for me, but will take another look once the thread-issue is settled.

I appreciate the feedback. Seems casual benchmark wasn't enough.

I reverted the accumulator thread (2f592f5) for now to land this pr.

Now that I've done somewhat extended synthesized bench-marking on this pr. That said, I still think the accumulator thread performs better in the end.

However, because this pr doesn't contain any actual scheduling impl, i concluded that presenting some convincing numbers is rather difficult. ;) So, I'll try to ship the scheduling impl first, instead of finishing up the scheduler pool (was original plan). Even later, I'll try to land the accumulator thread along with other missing things in the scheduler pool with more proper justification / more realistic benchmark numbers.

In other words, landing the accumulator thread wasn't essential for now. So, I was too early to ship it.

Am I missing something in these results?

yea. blame me for not explaining enough. didn't expect the accumulator thread to be so hot...

the accumulator thread will contain other offloaded tasks.

Namely, it's used for dumping all detailed timings for individual transactions to influxdb at background fashion (the code is in the base pr). Also, I'm planning to move rpc related blockstore writes here, as this isn't essential for bank hash. These aren't happening in this pr, so the accumulator thread itself was just ahead of it.

dropping transactions can be really heavy

in the intention of a quick sanity check bench, i wasn't bothered to do so. but transactions can be really large with many instructions with each having instruction data blobs, pubkeys...

when blocks contain these transactions (it does usually), it clearly throttles the scheduler thread when dropped inside that.

medians inside overlapping deviation range don't necessarily mean statistical noise.

firstly, these multi-threaded benches tend to contain high variance in runtime duration. so, i did little attention to the deviations. I just shared the numbers after some casual multiple runs, thinking these numbers are relatively stable enough.

maybe, seeing is believing. :) here's comparable criterion bench results, which i wrote with different parameters/settings:

note that I think these periodic outliers are due to cpu freq scaling. Sans these outlifers, I think the primary convex of pdf is shifted in favor of dropping in accumlator, clearly showing statistically significant improvement. at the same time, these median differencies are close to each other within the deviations (=~ error bounds). Also, this improvement can visibly be confirmed by plotted individual samples.

I think we are not really seeing eye-to-eye on this additional thread, and might just be we have different perspectives.

i think we have a constructive discussion. thanks for patience and expressing your perspective.

Feel free to pull another core contributor in for their opinion on this aspect of the PR.

thanks for suggestion. yeah, we could seek other's opinions, should we be stuck in other topics.

I don't recall seeing anything else that would block this PR for me, but will take another look once the thread-issue is settled.

that's good to hear. :)

unified-scheduler-pool/src/lib.rs

ryoqun · 2024-01-10T05:39:23Z

unified-scheduler-pool/src/lib.rs

+    }
+
+    // P doesn't need to be `: Clone`, yet rustc derive can't handle it.
+    // see https://github.com/rust-lang/rust/issues/26925


ref: rust-lang/rust#26925

ryoqun · 2024-01-12T15:51:02Z

@apfitzge thanks for in-depth code review! it took time at my side, but i think it's ready for another review round from you.

also, i clarified comments a bit from the prev review round: d6093e1

thanks as always.

if possible, i want to land this pr soon before the feature freeze: https://discord.com/channels/428295358100013066/910937142182682656/1194783158676230268

ryoqun · 2024-01-18T15:30:57Z

@apfitzge thanks for review! I've replied to you: #34676 (comment)

also, i pushed new commit: cc5a1f0 and rebase was needed to pick 0e8f2de

apfitzge

Just a couple of minor comments, feel free to just do them as follow-up PRs.
Appreciate all the back-and-forth on the PR!

apfitzge · 2024-01-18T19:08:27Z

unified-scheduler-pool/src/lib.rs

+}
+
+// A very tiny generic message type to signal about opening and closing of subchannels, which are
+// logically segmented series of Payloads (P1) over a signle continuous time-span, potentially


Suggested change

// logically segmented series of Payloads (P1) over a signle continuous time-span, potentially

// logically segmented series of Payloads (P1) over a single continuous time-span, potentially

oops, nice!: 07de5b4

apfitzge · 2024-01-18T19:37:42Z

unified-scheduler-pool/src/lib.rs

        }
    }

    impl<const TRIGGER_RACE_CONDITION: bool> InstalledScheduler
        for AsyncScheduler<TRIGGER_RACE_CONDITION>
    {
        fn id(&self) -> SchedulerId {
-            self.0.id()
+            todo!();


probably better to use unimplemented!(), I don't believe you are intending to do this, right?

you're right: afd580e

apfitzge · 2024-01-18T19:41:49Z

unified-scheduler-pool/src/lib.rs

+            // cycles out of the scheduler thread. Thus, any kinds of unessential overhead sources
+            // like syscalls, VDSO, and even memory (de)allocation should be avoided at all costs
+            // by design or by means of offloading at the last resort.
+            move || loop {


There's no exit from this loop aside from crashing as far as I can tell.
If replay stage is shutting down, channels get disconnected on the send side and we'd hit Errs in the unwrap - instead of the scheduler loop cleaning shutting down.

Feel free to address in a follow-up PR, as I think you mentioned you were planning already.

your analysis is right.

Feel free to address in a follow-up PR, as I think you mentioned you were planning already.

yeah. I'll do. but sadly, solana-validator/solana-ledger-tool's process termination is so broken in multiple ways. so, the cascading panics won't happen actually. that's why ci is green, maybe.

namely, there's some cyclic link which prevents bank_forks from being dropped. i have a fix in #33070. note that this bug quite low-priority, imo. there's more deeper problems: solana-validator's exit subcommand and solana-ledger-tool themselvs aren't trying to do sane shutdown cleaning up.

so, I won't dig in this tempting rabbit hole before i ship the scheduler code at least.

ryoqun · 2024-01-19T15:05:30Z

Just a couple of minor comments, feel free to just do them as follow-up PRs. Appreciate all the back-and-forth on the PR!

thanks for timely review as always. according to the discord talk, this pr is blocked (https://discord.com/channels/428295358100013066/910937142182682656/1197716176764153966) and i just addressed nits meanwhile and re-requested another code review.

apfitzge · 2024-01-19T17:26:58Z

Just a couple of minor comments, feel free to just do them as follow-up PRs. Appreciate all the back-and-forth on the PR!

thanks for timely review as always. according to the discord talk, this pr is blocked (discord.com/channels/428295358100013066/910937142182682656/1197716176764153966) and i just addressed nits meanwhile and re-requested another code review.

Sorry my slowness made us miss the cutoff!

ryoqun · 2024-01-23T14:19:49Z

Sorry my slowness made us miss the cutoff!

well, i don't think so. as i said above, your reviews were very timely. missing the cutoff was mainly due to my hacking and benching, resulting in extended back-and-forth..

seems v1.18 branching is imminent (https://discord.com/channels/428295358100013066/910937142182682656/1199023180598218803), so i just humbly request another approval other then a normal comment. :)

apfitzge

lgtm

ryoqun · 2024-03-23T07:13:36Z

unified-scheduler-pool/src/lib.rs

+            // Now, this is the main loop for the scheduler thread, which is a special beast.
+            //
+            // That's because it could be the most notable bottleneck of throughput in the future
+            // when there are ~100 handler threads. Unified scheduler's overall throughput is
+            // largely dependant on its ultra-low latency characteristic, which is the most
+            // important design goal of the scheduler in order to reduce the transaction
+            // confirmation latency for end users.
+            //
+            // Firstly, the scheduler thread must handle incoming messages from thread(s) owned by
+            // the replay stage or the banking stage. It also must handle incoming messages from
+            // the multi-threaded handlers. This heavily-multi-threaded whole processing load must
+            // be coped just with the single-threaded scheduler, to attain ideal cpu cache
+            // friendliness and main memory bandwidth saturation with its shared-nothing
+            // single-threaded account locking implementation. In other words, the per-task
+            // processing efficiency of the main loop codifies the upper bound of horizontal
+            // scalability of the unified scheduler.
+            //
+            // Moreover, the scheduler is designed to handle tasks without batching at all in the
+            // pursuit of saturating all of the handler threads with maximally-fine-grained
+            // concurrency density for throughput as the second design goal. This design goal
+            // relies on the assumption that there's no considerable penalty arising from the
+            // unbatched manner of processing.
+            //
+            // Note that this assumption isn't true as of writing. The current code path
+            // underneath execute_batch() isn't optimized for unified scheduler's load pattern (ie.
+            // batches just with a single transaction) at all. This will be addressed in the
+            // future.
+            //
+            // These two key elements of the design philosophy lead to the rather unforgiving
+            // implementation burden: Degraded performance would acutely manifest from an even tiny
+            // amount of individual cpu-bound processing delay in the scheduler thread, like when
+            // dispatching the next conflicting task after receiving the previous finished one from
+            // the handler.
+            //
+            // Thus, it's fatal for unified scheduler's advertised superiority to squeeze every cpu
+            // cycles out of the scheduler thread. Thus, any kinds of unessential overhead sources
+            // like syscalls, VDSO, and even memory (de)allocation should be avoided at all costs
+            // by design or by means of offloading at the last resort.


here context: anza-xyz#129 (comment)

ryoqun · 2024-06-21T04:10:27Z

unified-scheduler-pool/src/lib.rs

-            result_with_timings: Mutex::new((Ok(()), ExecuteTimings::default())),
-        }
+    fn from_inner(mut inner: Self::Inner, context: SchedulingContext) -> Self {
+        inner.thread_manager.start_session(&context);


ryoqun · 2024-06-21T04:11:37Z

unified-scheduler-pool/src/lib.rs

+        // we're hard-coding the number of handler thread to 1, meaning this impl is currently
+        // single-threaded still.
+        let handler_count = 1;
+
        Self::from_inner(


this from_inner() is especially problematic. (back ref: anza-xyz#1815)

ryoqun requested a review from apfitzge January 6, 2024 14:32

ryoqun commented Jan 6, 2024

View reviewed changes

apfitzge reviewed Jan 8, 2024

View reviewed changes

apfitzge self-requested a review January 8, 2024 20:54

ryoqun commented Jan 9, 2024

View reviewed changes

ryoqun force-pushed the unified-scheduler-min-threading branch from 36632b4 to 9b5e338 Compare January 9, 2024 15:58

apfitzge reviewed Jan 9, 2024

View reviewed changes

ryoqun commented Jan 10, 2024

View reviewed changes

ryoqun mentioned this pull request Jan 12, 2024

Bench execute batch no more overhead hacks apfitzge/agave#3

Closed

ryoqun requested a review from apfitzge January 12, 2024 15:47

ryoqun force-pushed the unified-scheduler-min-threading branch 2 times, most recently from 305c2c5 to 5b62918 Compare January 18, 2024 07:06

ryoqun added 6 commits January 18, 2024 16:14

Introduce primitive threading in unified scheduler

0ba181c

Make the internal struct ExecutedTask not pub

2dd6344

Improve wording a bit

7550bb9

Explain scheduler main loop's overhead sensitivity

30f6abe

Improve wording a bit

3623a34

Define ChainedChannel{Sender, Receiver} wrappers

1334811

ryoqun added 5 commits January 18, 2024 16:14

Clean up a bit

f9fbbfb

Use derivative to avoid manual Clone impl

2a046e4

Clarify comment

ebab9a7

Remove extra whitespace in comment

4225cba

Remove unneeded dyn trait for ChainedChannel

cc5a1f0

ryoqun force-pushed the unified-scheduler-min-threading branch from 5b62918 to cc5a1f0 Compare January 18, 2024 07:15

Remove the accumulator thread for now

2f592f5

apfitzge previously approved these changes Jan 18, 2024

View reviewed changes

ryoqun added 2 commits January 19, 2024 23:47

Fix typo

07de5b4

Use unimplemented!() to convey intention better

afd580e

ryoqun dismissed apfitzge’s stale review via afd580e January 19, 2024 14:52

ryoqun requested a review from apfitzge January 19, 2024 15:02

apfitzge approved these changes Jan 24, 2024

View reviewed changes

ryoqun merged commit bd10386 into solana-labs:master Jan 24, 2024
46 checks passed

ryoqun mentioned this pull request Feb 14, 2024

Add --unified-scheduler-handler-threads #35195

Merged

ryoqun commented Mar 23, 2024

View reviewed changes

ryoqun mentioned this pull request Apr 6, 2024

Prioritize blocked task messaging over idle tasks anza-xyz/agave#627

Merged

ryoqun mentioned this pull request Apr 30, 2024

Simplify {session_,}result_with_timings types anza-xyz/agave#1126

Merged

ryoqun mentioned this pull request Jun 21, 2024

Avoid unneeded start_session() when spawning anza-xyz/agave#1815

Merged

ryoqun commented Jun 21, 2024

View reviewed changes

mergify bot mentioned this pull request Jun 25, 2024

v2.0: Avoid unneeded start_session() with cleanups (bp: #1815, #1861) anza-xyz/agave#1854

Merged

		// 7. the scheduler thread send the executed task to the accumulator thread.
		// 8. the accumulator thread examines the executed task's result and accumulate its timing,

		// and unified_scheduler both tend to process non-conflicting batches in parallel as normal
		// operation.

	// logically segmented series of Payloads (P1) over a signle continuous time-span, potentially
	// logically segmented series of Payloads (P1) over a single continuous time-span, potentially

Introduce primitive threading in unified scheduler #34676

Introduce primitive threading in unified scheduler #34676

Conversation

ryoqun commented Jan 6, 2024 • edited Loading

Problem

Summary of Changes

Context

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryoqun Jan 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryoqun Jan 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Jan 6, 2024 • edited Loading

Codecov Report

Choose a reason for hiding this comment

ryoqun Jan 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryoqun Jan 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryoqun Jan 9, 2024 • edited Loading

Choose a reason for hiding this comment

ryoqun Apr 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryoqun commented Jan 9, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryoqun Jan 13, 2024 • edited Loading

Choose a reason for hiding this comment

apfitzge Jan 15, 2024 • edited Loading

Choose a reason for hiding this comment

ryoqun Jan 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryoqun commented Jan 12, 2024

ryoqun commented Jan 18, 2024 • edited Loading

apfitzge left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryoqun Jan 19, 2024 • edited Loading

Choose a reason for hiding this comment

ryoqun commented Jan 19, 2024 • edited Loading

apfitzge commented Jan 19, 2024

ryoqun commented Jan 23, 2024

apfitzge left a comment

Choose a reason for hiding this comment

ryoqun Mar 23, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryoqun Jun 21, 2024 • edited Loading

Choose a reason for hiding this comment

ryoqun commented Jan 6, 2024 •

edited

Loading

ryoqun Jan 6, 2024 •

edited

Loading

ryoqun Jan 6, 2024 •

edited

Loading

codecov bot commented Jan 6, 2024 •

edited

Loading

ryoqun Jan 9, 2024 •

edited

Loading

ryoqun Jan 9, 2024 •

edited

Loading

ryoqun Jan 9, 2024 •

edited

Loading

ryoqun Apr 6, 2024 •

edited

Loading

ryoqun Jan 13, 2024 •

edited

Loading

apfitzge Jan 15, 2024 •

edited

Loading

ryoqun Jan 18, 2024 •

edited

Loading

ryoqun commented Jan 18, 2024 •

edited

Loading

ryoqun Jan 19, 2024 •

edited

Loading

ryoqun commented Jan 19, 2024 •

edited

Loading

ryoqun Mar 23, 2024 •

edited

Loading

ryoqun Jun 21, 2024 •

edited

Loading