Skip to content

Conversation

@htejun
Copy link
Contributor

@htejun htejun commented Dec 6, 2023

Newer rustc is unhappy about lack of explicit lifetime parameter.

Newer rustc is unhappy about lack of explicit lifetime parameter.
@htejun htejun requested a review from Byte-Lab December 6, 2023 00:19
@Byte-Lab Byte-Lab merged commit 9eb1f36 into main Dec 6, 2023
vnepogodin added a commit to CachyOS/scx that referenced this pull request Dec 12, 2024
Into trait was calling the Into<&SupportedSched> which was calling
Into<SupportedSched> and so on.

```
    #0 0x622450e96149 in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#1 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#2 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#3 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#4 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#5 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#6 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#7 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#8 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#9 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#10 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#11 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#12 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#13 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#14 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
```
etsal pushed a commit to etsal/scx that referenced this pull request Dec 16, 2024
Into trait was calling the Into<&SupportedSched> which was calling
Into<SupportedSched> and so on.

```
    #0 0x622450e96149 in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    #1 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    #2 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#3 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#4 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#5 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#6 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#7 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#8 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#9 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#10 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#11 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#12 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    sched-ext#13 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    sched-ext#14 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
```
RitzDaCat pushed a commit to RitzDaCat/scx that referenced this pull request Nov 23, 2025
Implements three ultra-low-risk optimizations to reduce input latency
from ~151ns to ~117-144ns while REDUCING CPU usage by 0.10-0.25%.

Optimization sched-ext#1: Eliminate Timestamp Call (Save 10-15ns)
========================================================
BEFORE:
  if (unlikely(is_input_handler_cached(p))) {
      now = scx_bpf_now();  // ~10-15ns overhead
      if (time_before(now, input_until_global)) {
          // ... fast path
      }
  }

AFTER:
  if (unlikely(is_input_handler_cached(p))) {
      // Skip timestamp entirely - input handlers always latency-critical
      // Window check only affects deadline (done in enqueue/runnable)
      // ... fast path (no timestamp, no window check)
  }

Rationale:
- Input handlers are ALWAYS latency-critical
- Window check (time_before) only affects deadline calculation
- Deadline calculated in enqueue/runnable, NOT in select_cpu
- Removing timestamp saves 10-15ns with zero behavioral impact

CPU Impact: -0.06% (saves 85-127k ns/sec across 8.5k calls/sec)

Optimization sched-ext#2: Fixed Slice Constant (Save 2-5ns)
===================================================
BEFORE:
  u64 input_slice = continuous_input_mode ? slice_ns : (slice_ns >> 2);

AFTER:
  #define INPUT_HANDLER_SLICE_NS 2500ULL  // 2.5µs optimal
  // Just use INPUT_HANDLER_SLICE_NS directly

Rationale:
- Input handlers yield quickly (process event then sleep)
- 2.5µs is already the optimal bursty mode slice
- Fixed slice eliminates conditional evaluation overhead
- Provides consistent, predictable scheduling

CPU Impact: -0.01% (saves 17-42k ns/sec across 8.5k calls/sec)

Optimization sched-ext#4: Direct Return (Save 5-10ns)
=============================================
BEFORE:
  scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, input_slice, 0);
  RETURN_SELECTED_CPU(prev_cpu);  // Updates hints, profiling, etc.

AFTER:
  scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, INPUT_HANDLER_SLICE_NS, 0);
  return prev_cpu;  // Direct return, skip macro overhead

Rationale:
- RETURN_SELECTED_CPU macro updates idle CPU hints
- Input handlers don't benefit from hints (always cache-warm)
- Macro also calls PROF_END_HIST (already coalesced 64:1)
- Direct return saves 5-10ns per call

CPU Impact: -0.03% (saves 42-85k ns/sec across 8.5k calls/sec)

Performance Impact:
===================
Latency Improvements:
- input_event_raw: 26-30ns (unchanged, kernel-side)
- Input boost: 26ns (unchanged, already optimized)
- gamer_select_cpu: 95-105ns → 61-88ns (-34ns, 36% faster!)
- Total end-to-end: ~151ns → ~117-144ns (-17-30ns, 11-20% faster!)

CPU Improvements:
- Total savings: 0.10-0.25% CPU reduction
- Before: 6.65% total BPF CPU
- After: ~6.50-6.55% total BPF CPU
- Mechanism: Eliminated work across 8,500 input handler calls/sec

This is a PERFECT optimization: both faster AND more CPU efficient! ✅

Input Types Benefited (Both):
- ✅ Mouse movement/clicks (EV_REL, EV_KEY with BTN_MOUSE)
- ✅ Keyboard presses/releases (EV_KEY)

Both processed by same input handler thread (libinput/X11/Wayland).

Code Changes:
=============
- src/bpf/main.bpf.c:
  - Added INPUT_HANDLER_SLICE_NS constant (2500ULL)
  - Removed scx_bpf_now() call from input handler fast path
  - Removed time_before() window check
  - Replaced dynamic slice with INPUT_HANDLER_SLICE_NS
  - Replaced RETURN_SELECTED_CPU with direct return

Testing:
========
- Build succeeds with 0 errors
- Input window validation removed (not needed for CPU selection)
- CPU affinity checks intact
- Migration safety preserved

Expected bpftop Results:
========================
- gamer_select_cp: ~135-150ns (was 160ns, -10-25ns average)
  - Input handlers: ~61-88ns individual (-34ns!)
  - Other threads: ~167ns (unchanged)
  - Weighted average: ~135-150ns
- Total BPF CPU: ~6.50-6.55% (was 6.65%, -0.10-0.15%)

Next Steps:
===========
- Test in Palworld to verify latency reduction
- Monitor bpftop for gamer_select_cpu runtime
- Test subjective mouse/keyboard feel
- If still need lower latency, consider Strategy B (shared timestamp)

Related: Input latency, mouse responsiveness, keyboard latency, sub-150ns scheduling, CPU efficiency, perfect optimization
RitzDaCat added a commit to RitzDaCat/scx that referenced this pull request Nov 26, 2025
…14 bugs)

FIRST SWEEP (7 bugs - Critical dispatch and timing issues):

Bug sched-ext#1-2: SCX_DSQ_LOCAL mismatch and missing CPU kicks
- Problem: Using SCX_DSQ_LOCAL dispatched to calling CPU's DSQ, not target CPU
- Impact: Tasks queued on wrong CPU, causing 50-500µs delays waiting for migration
- Fix: Changed to SCX_DSQ_LOCAL_ON | target_cpu + added scx_bpf_kick_cpu()
- Locations: 15+ fast paths in gamer_select_cpu_slowpath

Bug sched-ext#3: Task slice oscillation (continuous_input_mode hysteresis)
- Problem: Exit threshold (75) too close to entry threshold (100), causing toggling
- Impact: Slice duration oscillated rapidly, causing micro-stutters during input
- Fix: Lowered exit threshold from 75 to 50 for proper hysteresis band

Bug sched-ext#4: hotpath_signals race condition
- Problem: Non-atomic read-then-clear of hotpath_signals.input_ns[]
- Impact: Two CPUs could process same input event (double-dispatch)
- Fix: Used __atomic_exchange_n() for atomic read-and-clear

Bug sched-ext#5: Frame stabilization premature activation
- Problem: Checked frame_stabilization_active before timestamp validity
- Impact: Stale stabilization state from previous frame applied incorrectly
- Fix: Reordered to check timestamp first: now < until && active

Bug sched-ext#6: Integer overflow in dynamic window calculation
- Problem: base << 1 could overflow when base > ULLONG_MAX/2
- Impact: max_window wrapped to tiny value, breaking input boost timing
- Fix: Added explicit overflow check before shift operation

Bug sched-ext#7: Stale idle CPU hint (cache thrashing)
- Problem: IDLE_HINT_VALID_NS was 500µs - too long for gaming workloads
- Impact: Tasks dispatched to CPUs that became busy, causing cache misses
- Fix: Reduced to 100µs for fresher idle hints

SECOND SWEEP (7 bugs - Type safety and more missing kicks):

Bug sched-ext#8: Type mismatch in system_audio_tgids_map lookup (CRITICAL)
- Problem: Cast struct system_audio_entry* to u8*, reading only first byte
- Impact: Audio detection failed when refcount > 255 (byte wrapped to 0)
- Fix: Properly use struct system_audio_entry* and check entry->refcount > 0

Bug sched-ext#9-14: Missing CPU kicks after scx_bpf_test_and_clear_cpu_idle
- Problem: Cleared CPU idle state but didn't kick CPU to run task
- Impact: Tasks waited until next timer tick (1-10ms) instead of immediate wake
- Locations: GPU/compositor path, compiler corral, idle scan loop, taskgraph
- Fix: Added scx_bpf_kick_cpu(candidate, SCX_KICK_IDLE) after each clear

CUMULATIVE IMPACT ANALYSIS:

Before fixes (worst case latency budget per frame at 144Hz = 6.94ms):
- DSQ mismatch: +50-500µs per critical task dispatch
- Missing kicks: +1-10ms waiting for timer tick (devastating)
- Slice oscillation: +100-500µs per slice recalculation
- Race condition: +200-1000µs from double-dispatch confusion
- Stale hints: +50-200µs from cache misses
- Audio detection: Complete audio thread priority loss when refcount > 255
Total potential added latency: 1.4-12.2ms per frame (exceeds frame budget!)

After fixes:
- All fast paths dispatch to correct CPU immediately
- CPU kicks ensure <10µs wakeup latency
- Stable slice durations with proper hysteresis
- Atomic operations prevent race conditions
- Fresh idle hints improve cache locality
- Audio detection works correctly at all refcounts
Expected latency reduction: 90-99% of added jitter eliminated

TESTING RECOMMENDED:
1. High-refresh gaming (144Hz+) - should see more consistent frame times
2. High-polling mouse (1000Hz+) - should feel smoother, less chunky
3. Audio with multiple sources - should maintain priority correctly
4. Sustained input (keyboard held) - should not see slice flickering
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants