-
Notifications
You must be signed in to change notification settings - Fork 199
scx_utils::BpfBuilder: Add 'static to BPF_H_TAR definition #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Newer rustc is unhappy about lack of explicit lifetime parameter.
vnepogodin
added a commit
to CachyOS/scx
that referenced
this pull request
Dec 12, 2024
Into trait was calling the Into<&SupportedSched> which was calling
Into<SupportedSched> and so on.
```
#0 0x622450e96149 in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#1 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#2 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#3 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#4 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#5 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#6 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#7 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#8 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#9 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#10 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#11 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#12 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#13 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#14 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
```
etsal
pushed a commit
to etsal/scx
that referenced
this pull request
Dec 16, 2024
Into trait was calling the Into<&SupportedSched> which was calling
Into<SupportedSched> and so on.
```
#0 0x622450e96149 in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
#1 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
#2 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#3 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#4 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#5 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#6 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#7 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#8 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#9 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#10 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#11 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#12 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
sched-ext#13 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
sched-ext#14 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
```
RitzDaCat
pushed a commit
to RitzDaCat/scx
that referenced
this pull request
Nov 23, 2025
Implements three ultra-low-risk optimizations to reduce input latency from ~151ns to ~117-144ns while REDUCING CPU usage by 0.10-0.25%. Optimization sched-ext#1: Eliminate Timestamp Call (Save 10-15ns) ======================================================== BEFORE: if (unlikely(is_input_handler_cached(p))) { now = scx_bpf_now(); // ~10-15ns overhead if (time_before(now, input_until_global)) { // ... fast path } } AFTER: if (unlikely(is_input_handler_cached(p))) { // Skip timestamp entirely - input handlers always latency-critical // Window check only affects deadline (done in enqueue/runnable) // ... fast path (no timestamp, no window check) } Rationale: - Input handlers are ALWAYS latency-critical - Window check (time_before) only affects deadline calculation - Deadline calculated in enqueue/runnable, NOT in select_cpu - Removing timestamp saves 10-15ns with zero behavioral impact CPU Impact: -0.06% (saves 85-127k ns/sec across 8.5k calls/sec) Optimization sched-ext#2: Fixed Slice Constant (Save 2-5ns) =================================================== BEFORE: u64 input_slice = continuous_input_mode ? slice_ns : (slice_ns >> 2); AFTER: #define INPUT_HANDLER_SLICE_NS 2500ULL // 2.5µs optimal // Just use INPUT_HANDLER_SLICE_NS directly Rationale: - Input handlers yield quickly (process event then sleep) - 2.5µs is already the optimal bursty mode slice - Fixed slice eliminates conditional evaluation overhead - Provides consistent, predictable scheduling CPU Impact: -0.01% (saves 17-42k ns/sec across 8.5k calls/sec) Optimization sched-ext#4: Direct Return (Save 5-10ns) ============================================= BEFORE: scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, input_slice, 0); RETURN_SELECTED_CPU(prev_cpu); // Updates hints, profiling, etc. AFTER: scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, INPUT_HANDLER_SLICE_NS, 0); return prev_cpu; // Direct return, skip macro overhead Rationale: - RETURN_SELECTED_CPU macro updates idle CPU hints - Input handlers don't benefit from hints (always cache-warm) - Macro also calls PROF_END_HIST (already coalesced 64:1) - Direct return saves 5-10ns per call CPU Impact: -0.03% (saves 42-85k ns/sec across 8.5k calls/sec) Performance Impact: =================== Latency Improvements: - input_event_raw: 26-30ns (unchanged, kernel-side) - Input boost: 26ns (unchanged, already optimized) - gamer_select_cpu: 95-105ns → 61-88ns (-34ns, 36% faster!) - Total end-to-end: ~151ns → ~117-144ns (-17-30ns, 11-20% faster!) CPU Improvements: - Total savings: 0.10-0.25% CPU reduction - Before: 6.65% total BPF CPU - After: ~6.50-6.55% total BPF CPU - Mechanism: Eliminated work across 8,500 input handler calls/sec This is a PERFECT optimization: both faster AND more CPU efficient! ✅ Input Types Benefited (Both): - ✅ Mouse movement/clicks (EV_REL, EV_KEY with BTN_MOUSE) - ✅ Keyboard presses/releases (EV_KEY) Both processed by same input handler thread (libinput/X11/Wayland). Code Changes: ============= - src/bpf/main.bpf.c: - Added INPUT_HANDLER_SLICE_NS constant (2500ULL) - Removed scx_bpf_now() call from input handler fast path - Removed time_before() window check - Replaced dynamic slice with INPUT_HANDLER_SLICE_NS - Replaced RETURN_SELECTED_CPU with direct return Testing: ======== - Build succeeds with 0 errors - Input window validation removed (not needed for CPU selection) - CPU affinity checks intact - Migration safety preserved Expected bpftop Results: ======================== - gamer_select_cp: ~135-150ns (was 160ns, -10-25ns average) - Input handlers: ~61-88ns individual (-34ns!) - Other threads: ~167ns (unchanged) - Weighted average: ~135-150ns - Total BPF CPU: ~6.50-6.55% (was 6.65%, -0.10-0.15%) Next Steps: =========== - Test in Palworld to verify latency reduction - Monitor bpftop for gamer_select_cpu runtime - Test subjective mouse/keyboard feel - If still need lower latency, consider Strategy B (shared timestamp) Related: Input latency, mouse responsiveness, keyboard latency, sub-150ns scheduling, CPU efficiency, perfect optimization
RitzDaCat
added a commit
to RitzDaCat/scx
that referenced
this pull request
Nov 26, 2025
…14 bugs) FIRST SWEEP (7 bugs - Critical dispatch and timing issues): Bug sched-ext#1-2: SCX_DSQ_LOCAL mismatch and missing CPU kicks - Problem: Using SCX_DSQ_LOCAL dispatched to calling CPU's DSQ, not target CPU - Impact: Tasks queued on wrong CPU, causing 50-500µs delays waiting for migration - Fix: Changed to SCX_DSQ_LOCAL_ON | target_cpu + added scx_bpf_kick_cpu() - Locations: 15+ fast paths in gamer_select_cpu_slowpath Bug sched-ext#3: Task slice oscillation (continuous_input_mode hysteresis) - Problem: Exit threshold (75) too close to entry threshold (100), causing toggling - Impact: Slice duration oscillated rapidly, causing micro-stutters during input - Fix: Lowered exit threshold from 75 to 50 for proper hysteresis band Bug sched-ext#4: hotpath_signals race condition - Problem: Non-atomic read-then-clear of hotpath_signals.input_ns[] - Impact: Two CPUs could process same input event (double-dispatch) - Fix: Used __atomic_exchange_n() for atomic read-and-clear Bug sched-ext#5: Frame stabilization premature activation - Problem: Checked frame_stabilization_active before timestamp validity - Impact: Stale stabilization state from previous frame applied incorrectly - Fix: Reordered to check timestamp first: now < until && active Bug sched-ext#6: Integer overflow in dynamic window calculation - Problem: base << 1 could overflow when base > ULLONG_MAX/2 - Impact: max_window wrapped to tiny value, breaking input boost timing - Fix: Added explicit overflow check before shift operation Bug sched-ext#7: Stale idle CPU hint (cache thrashing) - Problem: IDLE_HINT_VALID_NS was 500µs - too long for gaming workloads - Impact: Tasks dispatched to CPUs that became busy, causing cache misses - Fix: Reduced to 100µs for fresher idle hints SECOND SWEEP (7 bugs - Type safety and more missing kicks): Bug sched-ext#8: Type mismatch in system_audio_tgids_map lookup (CRITICAL) - Problem: Cast struct system_audio_entry* to u8*, reading only first byte - Impact: Audio detection failed when refcount > 255 (byte wrapped to 0) - Fix: Properly use struct system_audio_entry* and check entry->refcount > 0 Bug sched-ext#9-14: Missing CPU kicks after scx_bpf_test_and_clear_cpu_idle - Problem: Cleared CPU idle state but didn't kick CPU to run task - Impact: Tasks waited until next timer tick (1-10ms) instead of immediate wake - Locations: GPU/compositor path, compiler corral, idle scan loop, taskgraph - Fix: Added scx_bpf_kick_cpu(candidate, SCX_KICK_IDLE) after each clear CUMULATIVE IMPACT ANALYSIS: Before fixes (worst case latency budget per frame at 144Hz = 6.94ms): - DSQ mismatch: +50-500µs per critical task dispatch - Missing kicks: +1-10ms waiting for timer tick (devastating) - Slice oscillation: +100-500µs per slice recalculation - Race condition: +200-1000µs from double-dispatch confusion - Stale hints: +50-200µs from cache misses - Audio detection: Complete audio thread priority loss when refcount > 255 Total potential added latency: 1.4-12.2ms per frame (exceeds frame budget!) After fixes: - All fast paths dispatch to correct CPU immediately - CPU kicks ensure <10µs wakeup latency - Stable slice durations with proper hysteresis - Atomic operations prevent race conditions - Fresh idle hints improve cache locality - Audio detection works correctly at all refcounts Expected latency reduction: 90-99% of added jitter eliminated TESTING RECOMMENDED: 1. High-refresh gaming (144Hz+) - should see more consistent frame times 2. High-polling mouse (1000Hz+) - should feel smoother, less chunky 3. Audio with multiple sources - should maintain priority correctly 4. Sustained input (keyboard held) - should not see slice flickering
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Newer rustc is unhappy about lack of explicit lifetime parameter.