Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event processing seems to be slow and dependent on the SDL_Delay interval between frames #6479

Closed
furious-programming opened this issue Nov 2, 2022 · 15 comments

Comments

@furious-programming
Copy link

furious-programming commented Nov 2, 2022

I have created a fairly complex main loop in my engine. This loop calls the logic update and rendering functions at approximately equal intervals (fixed/limited frame rate). I have neither the target logic update code nor the rendering code yet, so I implemented spinlocks that mimic the actual code.

These spinlocks consume a certain number of milliseconds over and over again by calling SDL_GetPerformanceCounter, so they wait a given number of milliseconds with a very high precision (usually down to a microsecond). I added code for updating and rendering time measurements, which are represented as milliseconds (as floats). If the spinlock of the logic update is to be 5ms and the rendering time is 10ms, the measurement results look something like this:

Update: 5.00059ms | Render: 10.00030ms

A high enough precision. This is where the problem arises. Logic update is also event queue processing, so the update measurement relates to event processing and the update function (the one with a spinlock for 5ms). The results are as follows:

Update: 5.14063ms | Render: 10.00047ms

If I exclude the 5ms spinlock from the measurement and measure only the event queue processing time, the results look like this:

Update: 0.18426ms | Render: 10.00049ms

Thus, processing the event queue itself takes from a few tenths of a millisecond to even more than one millisecond (sometimes almost two milliseconds). It is unacceptable for event processing to take up to 5-10% of the time devoted to generating a frame (at 60fps). I measured the SDL event processing time by running the program and not generating any events (I didn't move the mouse, I didn't press any keys), so the events queue was empty almost all the time.


I thought the problem was with my loop, so I created a minimal test program to test the event processing only measurement. It looks like this:

{$MODE OBJFPC}

uses
  SDL2;
var
  Window: PSDL_Window;
  Event:  TSDL_Event;
var
  Stopped: Boolean = False;
var
  TicksLast:    Int64;
  TicksCurrent: Int64;
var
  CounterBegin:     Int64;
  CounterEnd:       Int64;
  CounterSample:    Int64;
  CounterPerSecond: Int64;
begin
  SDL_Init(SDL_INIT_EVERYTHING);

  Event  := Default(TSDL_Event);
  Window := SDL_CreateWindow(nil, SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 800, 600, SDL_WINDOW_SHOWN);

  CounterPerSecond := SDL_GetPerformanceFrequency();
  TicksLast        := SDL_GetTicks64() div (1000 div 4);

  repeat
    CounterBegin := SDL_GetPerformanceCounter();

      while SDL_PollEvent(@Event) = 1 do
        if Event.Type_ = SDL_QUITEV then
          Stopped := True;

    CounterEnd    := SDL_GetPerformanceCounter();
    CounterSample := CounterEnd - CounterBegin;

    TicksCurrent := SDL_GetTicks64() div (1000 div 4);

    if TicksCurrent <> TicksLast then
    begin
      WriteLn('time: ', (CounterSample * 1000 / CounterPerSecond):1:3, 'ms | sample: ', CounterSample);
      TicksLast := TicksCurrent;
    end;

    SDL_Delay(1);
  until Stopped;

  SDL_DestroyWindow(Window);
  SDL_Quit();
end.

Operation is simple:

  1. get the number of ticks before processing the event loop,
  2. process all queued events
  3. get the number of ticks after processing the event loop,
  4. display the measurement in the console (limited to four times per second)
  5. wait a millisecond for the next measurement (it saves CPU) and go back to step 1.

The results are as follows:

time: 0.046ms | sample: 461
time: 0.010ms | sample: 100
time: 0.047ms | sample: 465
time: 0.010ms | sample: 103
time: 0.047ms | sample: 473
time: 0.010ms | sample: 103
time: 0.047ms | sample: 469
time: 0.010ms | sample: 103
time: 0.047ms | sample: 468
time: 0.010ms | sample: 101
time: 0.124ms | sample: 1238
time: 0.047ms | sample: 467
time: 0.010ms | sample: 101
time: 0.046ms | sample: 463
time: 0.136ms | sample: 1359
time: 0.136ms | sample: 1362
time: 0.045ms | sample: 449

As you can see, the very processing of events (despite not generating them) takes from a few hundredths of a millisecond to a few tenths of a millisecond. Now the most interesting — I change SDL_Delay(1) to SDL_Delay(10):

time: 0.091ms | sample: 905
time: 0.068ms | sample: 681
time: 0.379ms | sample: 3794
time: 0.103ms | sample: 1033
time: 0.069ms | sample: 686
time: 0.176ms | sample: 1764
time: 0.294ms | sample: 2942
time: 0.175ms | sample: 1754
time: 0.121ms | sample: 1210
time: 0.093ms | sample: 933
time: 0.095ms | sample: 949
time: 0.090ms | sample: 901
time: 0.089ms | sample: 893
time: 0.076ms | sample: 761
time: 0.179ms | sample: 1793

Now event processing takes from a few hundredths of a millisecond to even 0.4ms. Now for SDL_Delay(100):

time: 0.295ms | sample: 2945
time: 0.345ms | sample: 3451
time: 0.794ms | sample: 7941
time: 0.323ms | sample: 3227
time: 0.298ms | sample: 2982
time: 0.295ms | sample: 2953
time: 0.469ms | sample: 4691
time: 0.295ms | sample: 2950
time: 0.323ms | sample: 3230
time: 0.615ms | sample: 6154
time: 0.340ms | sample: 3403
time: 0.339ms | sample: 3388
time: 0.356ms | sample: 3556
time: 0.337ms | sample: 3370
time: 0.343ms | sample: 3433
time: 1.159ms | sample: 11585
time: 0.660ms | sample: 6601
time: 0.341ms | sample: 3411
time: 0.352ms | sample: 3523
time: 0.342ms | sample: 3415

Processing of events takes from a few tenths of a millisecond to over one millisecond. For SDL_Delay(500):

time: 2.073ms | sample: 20731
time: 1.399ms | sample: 13985
time: 3.798ms | sample: 37983
time: 4.745ms | sample: 47452
time: 2.030ms | sample: 20296
time: 2.040ms | sample: 20396
time: 2.259ms | sample: 22590
time: 4.019ms | sample: 40187
time: 2.196ms | sample: 21957
time: 1.975ms | sample: 19749
time: 2.307ms | sample: 23074
time: 2.301ms | sample: 23013
time: 2.593ms | sample: 25927
time: 1.972ms | sample: 19722
time: 2.032ms | sample: 20317
time: 3.758ms | sample: 37582

Now it takes a few milliseconds all the time. For SDL_Delay(1000):

time: 4.549ms | sample: 45487
time: 4.860ms | sample: 48596
time: 4.597ms | sample: 45973
time: 4.699ms | sample: 46986
time: 5.158ms | sample: 51582
time: 4.550ms | sample: 45502
time: 5.308ms | sample: 53084
time: 5.304ms | sample: 53037
time: 2.319ms | sample: 23192
time: 4.635ms | sample: 46354
time: 4.761ms | sample: 47608
time: 4.454ms | sample: 44542
time: 4.637ms | sample: 46371
time: 4.949ms | sample: 49490
time: 4.469ms | sample: 44686
time: 4.661ms | sample: 46608
time: 4.728ms | sample: 47281

As you can see, the larger the SDL_Delay interval between individual measurements, the longer it takes to process the event queue, or at least this is what the measurements show. The larger the interval, the greater the difference in the number of ticks taken before and after the event loop.

Why does the interval of the SDL_Delay function called between measurements affect the measurement of event processing time? After all, this function is called between measurements, not during them, so it should not affect them (only increase or decrease the number of measurements per second).

The same is true if I explicitly use the QueryPerformanceFrequency, QueryPerformanceCounter and Sleep functions, so this is not a problem with SDL_Delay (especially since it uses Sleep internally).


I know guys that you use C language and I use Free Pascal, so in the attachment I added test programs compiled in FPC so that you can test them on your own (Win 64-bit). Attached is the last stable library SDL2.dll, downloaded from this repository.

There are several executable files in the attachment — the number in the executable name is the interval used in the SDL_Delay function called between measurements (no number in the name is a version without using SDL_Delay).

Attachment — timming_test.zip

@peppy
Copy link
Contributor

peppy commented Nov 5, 2022

Looking at this from the outside with no testing, would it not make sense for the polling for events to take longer if there were ongoing events arriving during this period? I know you say that you didn't move the mouse, but your output isn't showing how many events are being processed, so I'd suggest you check that as a starting point (especially for the longer intervals).

Also you may want to try switching to SDL_PeepEvents (see discussion in #4794 and probably elsewhere) as a more efficient way to process larger numbers of events.

@furious-programming
Copy link
Author

furious-programming commented Nov 5, 2022

Before creating this issue, I checked if the queue was actually empty, so I did not include it in the logs. Anyway, it is logical that if I do not move the mouse over the window and do not press the keys, then no events are generated.

Okay, here are the logs with added event count tracking and comments that indicate problematic lines. This time, I alternately did nothing and tested the event generation by moving the cursor over the window. I removed the excess lines showing 0 events to shorten the log. This is a test for SDL_Delay(1):

time: 0.010ms | sample:   104 | events:  0
time: 0.148ms | sample:  1484 | events: 11 // 11 events, 0.148ms - bad
time: 0.048ms | sample:   477 | events: 22
time: 0.010ms | sample:   103 | events:  0
time: 0.046ms | sample:   464 | events:  0
time: 0.046ms | sample:   461 | events:  0
time: 0.046ms | sample:   462 | events:  0
time: 0.310ms | sample:  3104 | events: 28 // 0.310ms to process 28 events
time: 0.175ms | sample:  1748 | events: 29
time: 0.026ms | sample:   257 | events:  0
time: 0.201ms | sample:  2010 | events:  0 // 0.201ms but no events processed...
time: 0.013ms | sample:   129 | events:  0
time: 0.048ms | sample:   480 | events:  0
time: 0.010ms | sample:   103 | events:  0
time: 0.010ms | sample:   101 | events: 11 // 11 events, only 0.010ms
time: 0.011ms | sample:   109 | events:  0
time: 0.043ms | sample:   425 | events:  5
time: 0.011ms | sample:   108 | events: 12 // 0.011ms to process 12 events
time: 0.010ms | sample:   102 | events:  0
time: 0.047ms | sample:   470 | events:  0
time: 0.010ms | sample:   102 | events:  0
time: 0.047ms | sample:   473 | events:  0
time: 0.047ms | sample:   466 | events:  0
time: 0.012ms | sample:   118 | events: 21 // 21 events, only 0.012ms to process
time: 0.047ms | sample:   467 | events: 26 // 26 events, only 0.047ms
time: 0.011ms | sample:   109 | events:  0
time: 0.042ms | sample:   417 | events: 25 // 25 events, only 0.042ms
time: 0.010ms | sample:   100 | events: 17 // 17 events, only 0.010ms
time: 0.162ms | sample:  1615 | events: 26 // 26 events, but many times longer processing
time: 0.047ms | sample:   471 | events: 16 // 16 events, 0.047ms
time: 0.010ms | sample:   100 | events: 14
time: 0.010ms | sample:   102 | events:  0
time: 0.047ms | sample:   466 | events:  0
time: 0.010ms | sample:   103 | events:  0
time: 0.047ms | sample:   465 | events:  0
time: 0.049ms | sample:   488 | events:  0
time: 0.048ms | sample:   481 | events: 16 // 16 events - again 0.048ms
time: 0.010ms | sample:   100 | events:  0
time: 0.049ms | sample:   486 | events:  0
time: 0.010ms | sample:   100 | events:  0
time: 0.011ms | sample:   110 | events:  0
time: 0.047ms | sample:   470 | events:  0
time: 0.011ms | sample:   106 | events:  0
time: 0.192ms | sample:  1925 | events: 12 // 0.192ms to process 12 events?!
time: 0.010ms | sample:   103 | events: 31 // 0.010ms to process 31 events - what a difference!
time: 0.048ms | sample:   483 | events: 26
time: 0.010ms | sample:   100 | events:  0
time: 0.047ms | sample:   465 | events:  2
time: 0.009ms | sample:    93 | events:  3
time: 0.047ms | sample:   473 | events: 19 // 19 events, 0.047ms
time: 0.009ms | sample:    93 | events:  0
time: 0.049ms | sample:   489 | events:  0
{...}

As can be clearly seen in the logs, when events appear in the queue, their processing time varies greatly. SDL can return 30 events in 0.01 ms, and then returns a similar number of events in several dozen times longer. Besides, once in a while it takes even 0.2 ms to process an empty queue (so calling the SDL_PollEvent function to get the result 0).

Another problem I noticed is that when you drag a window with the mouse, by grabbing its title bar, it adds moving time to adding the event processing time itself. Example for SDL_Delay(1):

time: 0.024ms | sample:   241 | events: 10
time: 0.346ms | sample:  3462 | events: 22
time: 0.010ms | sample:   104 | events: 27
time: 0.050ms | sample:   500 | events: 17
time: 0.051ms | sample:   508 | events: 14
time: 0.046ms | sample:   461 | events:  0
time: 2160.101ms | sample: 21601013 | events:  2 // time of moving window is added to the sample
time: 0.287ms | sample:  2870 | events:  0
time: 0.042ms | sample:   417 | events: 10
time: 0.047ms | sample:   468 | events: 23
time: 0.047ms | sample:   470 | events:  0
time: 0.010ms | sample:   103 | events:  5

I held the window by the title bar and moved it for about two seconds. The time of holding and moving the window has been added to the measurement — why? This happens at any SDL_Delay interval and every time I move the window.


And this is the test for SDL_Delay(100) between measurements:

// doing nothing:
time: 1.131ms | sample: 11307 | events:  0 // A la la la la long long li long long long, oh yeah!
time: 0.454ms | sample:  4544 | events:  0
time: 0.441ms | sample:  4410 | events:  0
time: 0.653ms | sample:  6534 | events:  0
time: 0.433ms | sample:  4333 | events:  0
time: 0.746ms | sample:  7463 | events:  0
time: 0.548ms | sample:  5476 | events:  0
time: 0.456ms | sample:  4562 | events:  0
time: 0.620ms | sample:  6199 | events:  0
time: 0.439ms | sample:  4390 | events:  0
time: 0.656ms | sample:  6565 | events:  0

// moving the mouse over the window:
time: 0.622ms | sample:  6224 | events:  2
time: 0.619ms | sample:  6186 | events:  3
time: 2.047ms | sample: 20474 | events:  2
time: 1.361ms | sample: 13611 | events:  3
time: 0.660ms | sample:  6595 | events:  2
time: 1.066ms | sample: 10658 | events:  2
time: 0.887ms | sample:  8870 | events:  3
time: 0.654ms | sample:  6542 | events:  2
time: 0.617ms | sample:  6168 | events:  3
time: 0.672ms | sample:  6719 | events:  2
time: 0.602ms | sample:  6019 | events:  3
time: 1.153ms | sample: 11528 | events:  2
time: 1.354ms | sample: 13538 | events:  3
time: 0.688ms | sample:  6879 | events:  2
time: 0.725ms | sample:  7253 | events:  3
time: 0.646ms | sample:  6463 | events:  2
time: 0.856ms | sample:  8562 | events:  3
time: 0.978ms | sample:  9775 | events:  2
time: 0.446ms | sample:  4458 | events:  2
time: 0.444ms | sample:  4444 | events:  0
time: 0.437ms | sample:  4370 | events:  0
time: 0.672ms | sample:  6721 | events:  1
time: 2.953ms | sample: 29532 | events:  3
time: 0.473ms | sample:  4729 | events:  0
time: 0.443ms | sample:  4434 | events:  0
time: 0.441ms | sample:  4413 | events:  0
time: 0.444ms | sample:  4444 | events:  0
time: 0.446ms | sample:  4461 | events:  0
time: 0.446ms | sample:  4462 | events:  0
time: 1.435ms | sample: 14351 | events:  0 // WTF is this...
time: 0.451ms | sample:  4511 | events:  0
time: 0.448ms | sample:  4476 | events:  0
time: 0.450ms | sample:  4503 | events:  0

When no events is processed, the times are significantly longer. For the SDL_Delay(1000):

// doing nothing:
time: 4.594ms | sample: 45936 | events:  0
time: 5.578ms | sample: 55778 | events:  0
time: 3.030ms | sample: 30298 | events:  0
time: 2.894ms | sample: 28938 | events:  0
time: 4.830ms | sample: 48301 | events:  0
time: 4.991ms | sample: 49907 | events:  0
time: 4.583ms | sample: 45831 | events:  0
time: 4.654ms | sample: 46535 | events:  0
time: 4.562ms | sample: 45616 | events:  0
time: 6.130ms | sample: 61298 | events:  0
time: 6.193ms | sample: 61930 | events:  0
time: 4.664ms | sample: 46643 | events:  0
time: 4.570ms | sample: 45696 | events:  0
time: 5.418ms | sample: 54185 | events:  0
time: 7.045ms | sample: 70447 | events:  0

// moving the cursor over the window:
time: 7.582ms | sample: 75824 | events:  2
time: 6.830ms | sample: 68295 | events:  1
time: 5.856ms | sample: 58558 | events:  1
time: 6.802ms | sample: 68017 | events:  1
time: 5.739ms | sample: 57389 | events:  1
time: 5.226ms | sample: 52257 | events:  0
time: 4.910ms | sample: 49103 | events:  1
time: 4.972ms | sample: 49716 | events:  1
time: 7.504ms | sample: 75042 | events:  1
time: 2.953ms | sample: 29532 | events:  1
time: 5.022ms | sample: 50224 | events:  1
time: 5.323ms | sample: 53233 | events:  1
time: 5.271ms | sample: 52711 | events:  1

The times are ridiculous. It's also interesting that I'm moving the cursor all the time, making circles, but the queue only gets one event each second. The situation is different if I also click LMB while making circles with the cursor:

time: 6.046ms | sample: 60461 | events: 11
time: 4.670ms | sample: 46701 | events:  5
time: 7.254ms | sample: 72539 | events:  9
time: 11.297ms | sample: 112965 | events: 20
time: 12.228ms | sample: 122276 | events: 21
time: 15.361ms | sample: 153605 | events: 17
time: 10.515ms | sample: 105149 | events: 15
time: 8.147ms | sample: 81474 | events:  3
time: 15.132ms | sample: 151317 | events: 17
time: 12.836ms | sample: 128358 | events: 21
time: 12.371ms | sample: 123712 | events: 21
time: 12.695ms | sample: 126953 | events: 21
time: 16.724ms | sample: 167241 | events: 26
time: 12.847ms | sample: 128474 | events: 23
time: 10.596ms | sample: 105959 | events: 17
time: 10.694ms | sample: 106942 | events: 13

There are many events in the queue, but I didn't click 21 times per second, but only few times per second (3-5), so the rest of the events must be related to the mouse movement.

It should also be taken into account that if the SDL_Delay interval is less than a quarter of a second (e.g. 1 or 10), the number of events displayed in the console does not apply to the number of events per frame, but to the sum of events in the last 250 ms. If the event counter is reset every frame (below printing data, at the end of the loop), the number of events in the console will be smaller, but the time samples will be the same (so much the worse for SDL).


It is difficult for me to say why the measurements look like this, and it is especially difficult for me to understand why manipulating the SDL_Delay interval (which is called between frames) has a real impact on the amount of time used by SDL_PollEvent. I am adding a new package of test programs to the attachments, which also displays the number of processed events every quarter second (events counter is reset 4 times per second).

Attachment — timming_test_2.zip

Here is the code template of these programs:

{$MODE OBJFPC}

uses
  SDL2;
var
  Window: PSDL_Window;
  Event:  TSDL_Event;
var
  Stopped: Boolean = False;
var
  TicksLast:    Int64;
  TicksCurrent: Int64;
var
  CounterBegin:     Int64;
  CounterEnd:       Int64;
  CounterSample:    Int64;
  CounterPerSecond: Int64;
var
  EventsTime: Single = 0.0;
  EventsAccu: Int32  = 0;
begin
  SDL_Init(SDL_INIT_EVERYTHING);

  Event  := Default(TSDL_Event);
  Window := SDL_CreateWindow(nil, SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 800, 600, SDL_WINDOW_SHOWN);

  CounterPerSecond := SDL_GetPerformanceFrequency();
  TicksLast        := SDL_GetTicks64() div (1000 div 4);

  repeat
    CounterBegin := SDL_GetPerformanceCounter();

      while SDL_PollEvent(@Event) = 1 do
      begin
        EventsAccu += 1;

        if Event.Type_ = SDL_QUITEV then
          Stopped := True;
      end;

    CounterEnd    := SDL_GetPerformanceCounter();
    CounterSample := CounterEnd - CounterBegin;

    TicksCurrent := SDL_GetTicks64() div (1000 div 4);

    if TicksCurrent <> TicksLast then
    begin
      EventsTime := CounterSample * 1000 / CounterPerSecond;

      WriteLn('time: ', EventsTime:1:3, 'ms | sample: ', CounterSample:5, ' | events:',  EventsAccu:3); // new print format

      TicksLast  := TicksCurrent;
      EventsAccu := 0;
    end;

    SDL_Delay(1); // delay between frames
  until Stopped;

  SDL_DestroyWindow(Window);
  SDL_Quit();
end.

@pionere
Copy link
Contributor

pionere commented Nov 5, 2022

I'm not sure if it matters or not in this case (probably not), but the implementation of SDL_AtomicGet and SDL_AtomicGetPrt are a bit ugly in my opinion. In a worse case scenario those could lead to an infinite/long loop.

E.g. in case of HAVE_MSC_ATOMICS, I would change the code to: return _InterlockedCompareExchange((long*)&a->value, 0, 0);

Edit: using int* as a long* is quite risky. Maybe an SDL_COMPILE_TIME_ASSERT(atomic, sizeof(long) == sizeof(int)); would be appropriate?

@meyraud705
Copy link
Contributor

SDL_GetPerformanceCounter() uses CLOCK_MONOTONIC_RAW on Linux and QueryPerformanceCounter() on Windows which measure system time. We don't know where the time is spent, my guess about the spike is that your OS decided to let another process run.

To measure only time spent in SDL, on Linux, you could use clock_gettime() with CLOCK_PROCESS_CPUTIME_ID that measures CPU time consumed by the process or CLOCK_THREAD_CPUTIME_ID that measures CPU time consumed by the thread.

@furious-programming
Copy link
Author

furious-programming commented Nov 8, 2022

@meyraud705: for clarification, I'm using Windows 10 64-bit.

We don't know where the time is spent […]

I know where this time is spent — this goddamn SDL_PumpEvents eats it. ;)

I converted my test program to one that uses SDL_PeekEvents instead of SDL_PollEvent and the result is unambiguous. Time is measured in exactly the same way as before, ie using SDL_GetPerformanceCounter which uses QueryPerformanceCounter internally. The program now looks like this:

{$MODE OBJFPC}

uses
  SDL2;
var
  Window:     PSDL_Window;
  Event:      PSDL_Event;
  EventsNum:  Integer;
  EventsBuff: array [0 .. 255] of TSDL_Event;
var
  Stopped: Boolean = False;
var
  TicksLast:    Int64;
  TicksCurrent: Int64;
var
  CounterBegin:     Int64;
  CounterEnd:       Int64;
  CounterSample:    Int64;
  CounterPerSecond: Int64;
var
  EventsTime: Single = 0.0;
  EventsAccu: Int32  = 0;
begin
  SDL_Init(SDL_INIT_EVERYTHING);
  Window := SDL_CreateWindow(nil, SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 800, 600, SDL_WINDOW_SHOWN);

  CounterPerSecond := SDL_GetPerformanceFrequency();
  TicksLast        := SDL_GetTicks64() div (1000 div 4);

  repeat
    CounterBegin := SDL_GetPerformanceCounter();

      SDL_PumpEvents();

      EventsNum := SDL_PeepEvents(@EventsBuff, 256, SDL_GETEVENT, SDL_FIRSTEVENT, SDL_LASTEVENT);
      Event := @EventsBuff[0];

      while EventsNum > 0 do
      begin
        if Event^.Type_ = SDL_QUITEV then
        begin
          Stopped := True;
          Break;
        end;

        Event      += 1;
        EventsAccu += 1;
        EventsNum  -= 1;
      end;

    CounterEnd    := SDL_GetPerformanceCounter();
    CounterSample := CounterEnd - CounterBegin;

    TicksCurrent := SDL_GetTicks64() div (1000 div 4);

    if TicksCurrent <> TicksLast then
    begin
      EventsTime := CounterSample * 1000 / CounterPerSecond;

      WriteLn('time: ', EventsTime:1:3, 'ms | sample: ', CounterSample:5, ' | events:',  EventsAccu:3);

      TicksLast  := TicksCurrent;
      EventsAccu := 0;
    end;

    SDL_Delay(1);
  until Stopped;

  SDL_DestroyWindow(Window);
  SDL_Quit();
end.

As you can see, the measurement is about calling SDL_PumpEvents and SDL_PeepEvents and then processing the retrieved events in a loop (which are now in the local buffer). Example output in the console for the above code:

time: 0.029ms | sample:   294 | events: 13
time: 0.008ms | sample:    83 | events:  3
time: 0.075ms | sample:   754 | events:  0
time: 0.335ms | sample:  3349 | events: 16 // too long
time: 0.040ms | sample:   400 | events: 31
time: 0.008ms | sample:    84 | events: 31
time: 0.168ms | sample:  1679 | events: 32 // too long
time: 0.163ms | sample:  1631 | events: 31 // too long
time: 0.191ms | sample:  1905 | events: 31 // too long
time: 0.009ms | sample:    87 | events: 31
time: 0.349ms | sample:  3491 | events: 32 // too long
time: 0.162ms | sample:  1624 | events: 31 // too long
time: 0.040ms | sample:   398 | events: 31
time: 0.008ms | sample:    82 | events: 31
time: 0.174ms | sample:  1735 | events: 32 // too long
time: 0.008ms | sample:    84 | events: 14
time: 0.045ms | sample:   449 | events:  0
time: 0.009ms | sample:    86 | events:  0
time: 0.085ms | sample:   851 | events:  2
time: 0.008ms | sample:    83 | events:  0
time: 0.044ms | sample:   438 | events:  0

If I exclude the SDL_PumpEvents call from the measurement:

  repeat
    SDL_PumpEvents(); // before starting the measurement

    CounterBegin := SDL_GetPerformanceCounter();

      EventsNum := SDL_PeepEvents(@EventsBuff, 256, SDL_GETEVENT, SDL_FIRSTEVENT, SDL_LASTEVENT);
      Event := @EventsBuff[0];

      while EventsNum > 0 do
      begin
        if Event^.Type_ = SDL_QUITEV then
        begin
          Stopped := True;
          Break;
        end;

        Event      += 1;
        EventsAccu += 1;
        EventsNum  -= 1;
      end;

    CounterEnd    := SDL_GetPerformanceCounter();
    CounterSample := CounterEnd - CounterBegin;

  {...}

the times are stable and very low (usually 1μs, very rarely 2μs) even though I am generating tons of events:

time: 0.001ms | sample:     5 | events: 30
time: 0.001ms | sample:     5 | events: 20
time: 0.001ms | sample:     6 | events:  0
time: 0.002ms | sample:    21 | events:  2
time: 0.001ms | sample:     6 | events:  0
time: 0.001ms | sample:     5 | events: 30
time: 0.001ms | sample:     5 | events: 38
time: 0.001ms | sample:     6 | events: 40
time: 0.001ms | sample:     5 | events: 42
time: 0.001ms | sample:    14 | events: 50
time: 0.001ms | sample:     5 | events: 43
time: 0.001ms | sample:     6 | events: 53
time: 0.001ms | sample:     5 | events: 47
time: 0.001ms | sample:     6 | events: 47
time: 0.001ms | sample:     5 | events: 45
time: 0.001ms | sample:     6 | events: 53
time: 0.001ms | sample:     6 | events: 47
time: 0.001ms | sample:     9 | events: 48
time: 0.001ms | sample:     6 | events: 40
time: 0.001ms | sample:     6 | events: 51
time: 0.001ms | sample:     6 | events: 47
time: 0.001ms | sample:    14 | events: 52
time: 0.001ms | sample:     6 | events: 25
time: 0.001ms | sample:     5 | events:  7
time: 0.001ms | sample:     5 | events:  0
time: 0.001ms | sample:     6 | events:  0
time: 0.001ms | sample:     5 | events:  6
time: 0.001ms | sample:     5 | events: 40
time: 0.001ms | sample:     6 | events: 47
time: 0.001ms | sample:     6 | events: 48
time: 0.001ms | sample:     6 | events: 47
time: 0.000ms | sample:     4 | events: 44
time: 0.001ms | sample:     5 | events: 43
time: 0.001ms | sample:     5 | events: 52
time: 0.001ms | sample:     7 | events: 44
time: 0.000ms | sample:     4 | events: 46
time: 0.000ms | sample:     4 | events: 46
time: 0.001ms | sample:     5 | events: 29
time: 0.001ms | sample:     6 | events: 10
time: 0.001ms | sample:     6 | events:  6

Conclusion — SDL_PeepEvents works very efficiently, while SDL_PumpEvents is incredibly slow and this function call takes so long (irrelevant whether called manually or internally by SDL_PollEvent). Now measuring only the SDL_PumpEvents calls:

repeat
  CounterBegin := SDL_GetPerformanceCounter();

  SDL_PumpEvents();

  CounterEnd    := SDL_GetPerformanceCounter();
  CounterSample := CounterEnd - CounterBegin;

  {...}

the results are unambiguous:

time: 0.193ms | sample:  1934 | events: 26 // too long
time: 0.008ms | sample:    82 | events: 31
time: 0.039ms | sample:   392 | events: 31
time: 0.008ms | sample:    79 | events: 31
time: 0.206ms | sample:  2062 | events: 32 // too long
time: 0.008ms | sample:    79 | events: 31
time: 0.039ms | sample:   390 | events: 31
time: 0.178ms | sample:  1777 | events: 31 // too long
time: 0.232ms | sample:  2316 | events: 32 // too long
time: 0.055ms | sample:   546 | events: 25
time: 0.039ms | sample:   391 | events: 30
time: 0.008ms | sample:    82 | events: 30
time: 0.952ms | sample:  9519 | events: 43 // too long
time: 1.323ms | sample: 13228 | events: 47 // too long
time: 0.040ms | sample:   396 | events: 42
time: 0.008ms | sample:    82 | events: 44
time: 0.219ms | sample:  2190 | events: 51 // too long
time: 0.008ms | sample:    80 | events: 43
time: 0.039ms | sample:   391 | events: 48
time: 0.007ms | sample:    67 | events: 45
time: 0.097ms | sample:   968 | events: 43
time: 0.008ms | sample:    79 | events: 49
time: 1.818ms | sample: 18183 | events: 47 // too looooooooong!
time: 0.008ms | sample:    78 | events: 44
time: 0.045ms | sample:   445 | events: 18
time: 0.008ms | sample:    80 | events:  0
time: 0.045ms | sample:   449 | events:  0
time: 0.008ms | sample:    80 | events:  0
time: 0.045ms | sample:   447 | events:  0
time: 0.008ms | sample:    79 | events:  0
time: 0.045ms | sample:   445 | events:  0
time: 0.005ms | sample:    52 | events:  0
time: 0.008ms | sample:    81 | events:  0
time: 0.008ms | sample:    84 | events:  0
time: 0.040ms | sample:   402 | events:  9
time: 0.269ms | sample:  2687 | events: 32 // too long
time: 0.272ms | sample:  2717 | events: 13 // too long
time: 0.041ms | sample:   408 | events: 31

Sorry guys, but I don't have any evidence to think that the problem lies elsewhere than with the SDL_PumpEvents function. I have exhausted all the possibilities to improve my code — I even tried to set the highest priority and change the process affinity (use only one CPU logical core), but it does absolutely nothing and the performance still is the same as before.

And the situation is the same as before, i.e. the greater the interval for SDL_Delay called between frames/measurements, the longer it takes to call the SDL_PumpEvents function (even without events generation).

@peppy
Copy link
Contributor

peppy commented Nov 9, 2022

If you are looking for the most efficient processing (and have benchmarked PeekEvents to perform as expected), ignoring the fact that PumpEvents may have an issue PeekEvents does look like an amicable path forward for your use case, correct?

@pionere
Copy link
Contributor

pionere commented Nov 9, 2022

@peppy: even if PeekEvents works for their case, I think it is worth to check why PumpEvents is slow. At some point PumpEvents must be called either directly or indirectly. It does not make sense to expect the application to change its main loop.

@furious-programming : do you have some special sensor or what kind of videocard are you using? I think it would help to find out the bottleneck. Have you tried to set SDL_JOYSTICK_DISABLED or SDL_SENSOR_DISABLED?

@peppy
Copy link
Contributor

peppy commented Nov 9, 2022

Yep, I don't disagree but was curious if there is a specific reason PumpEvents is preferred here.

@meyraud705
Copy link
Contributor

You implemented the SDL_PeepEvents version incorrectly.SDL_PeepEvents need to be inside the while EventsNum > 0 do loop so you can read all events that you just pumped. This should not affect your profiling result as you always get less than 256 events.

Since 8bf32e1 , calling SDL_PollEvents should be as fast as the SDL_PumpEvents + SDL_PeepEvents version.

We don't know where the time is spent […]

I know where this time is spent — this goddamn SDL_PumpEvents eats it. ;)

No, to read event from devices, SDL_PumpEvents calls system library. Most of its time is spent in system library. Here an example on Linux:
pumpevent_profile

You can see that:

  • SDL_PumpEvents take 1% of cpu time but 60% of this is in libX11 function XCheckIfEvent.
  • I moved my mouse a lot but it took only 5% of SDL_PumpEvents time.
  • joystick update took 10% of SDL_PumpEvents time with no joystick connected, that could be improved.

Unless you show more in depth profiling I going to say your system library are slow and the spike are because your system let another process run.

@furious-programming
Copy link
Author

furious-programming commented Nov 9, 2022

@furious-programming : do you have some special sensor […]

As far as I know, I don't have any sensors and I don't really know what they are and what they are for. Anyway, SDL_NumSensors returns 0 if I use the SDL_INIT_EVERYTHING flag in the SDL_Init function (it returns 0, so no SDL initialization errors). So I don't have any sensors.

[…] or what kind of videocard are you using?

The graphics card is a typical integrated Intel® HD Graphic. If you need more information on the processor see documentation — Intel® Core™ i7-640LM. My laptop is Lenovo X201 Tablet, all features enabled (Turbo Boost, Hyper-Threading and so on) + 8GB RAM and SSD installed.

Have you tried to set SDL_JOYSTICK_DISABLED or SDL_SENSOR_DISABLED?

I also tried to initialize only SDL_INIT_VIDEO, so only videos and events subsystems, but it doesn't change anything.


@meyraud705: I pointed out that SDL_PumpEvents is slow, because this is how it comes out of the measurements that are unambiguous. I'm not saying this function has bugs, but that it works slowly at random times. I will keep looking for the reason.

However, I would like to point out right away that the problem does not only concern my laptop, because similar results were obtained on another PC, with an AMD processor — on Windows and Linux (via Wine). And for this I have prepared packages with compiled test programs, so that you can also perform the tests yourself.

You implemented the SDL_PeepEvents version incorrectly.SDL_PeepEvents need to be inside the while EventsNum > 0 do loop so you can read all events that you just pumped. This should not affect your profiling result as you always get less than 256 events.

And this is intentional and expected, because SDL_PeepEvents is to copy all events from the queue to the local buffer (as one operation), and then the loop is to iterate over all return ones. If there are no events, the loop will not iterate once. The event buffer is large enough to always hold all queued events, so it's enough for testing purposes. Besides, I always know that SDL_PeepEvents will be called once per frame, so the measurement is always for one call.

@furious-programming
Copy link
Author

furious-programming commented Nov 9, 2022

Somehow I made a report. Looks like Windows-related bullshit — considerable spin time:

image

The problem is SleepEx and WaitForMultipleObjectsEx.

@meyraud705
Copy link
Contributor

The problem is SleepEx and WaitForMultipleObjectsEx.

Sleep is probably called by SDL_Delay, so that not really a problem.

WaitForMultipleObjectsEx is not called from the main thread. It would be nice to have full call stack to see if SDL can do something about it.

You should also check which functions take the most "effective time" to see if they could be optimised.

@slouken
Copy link
Collaborator

slouken commented Nov 10, 2022

Here is a set of x86 and x64 DLLs with symbols that you can drop in and get better profiling information:
https://www.libsdl.org/tmp/SDL2-x86-2022-11-10.zip
https://www.libsdl.org/tmp/SDL2-x64-2022-11-10.zip

@furious-programming
Copy link
Author

I haven't found any additional information to suggest that SDL is at fault. I checked those dll files with symbols and found nothing new. Unfortunately for me, fortunately for SDL. If you don't see anything in the SDL sources that could slow down event pumping, there's probably nothing you can do about it and everything is in the hands of the operating system.

@slouken
Copy link
Collaborator

slouken commented Nov 15, 2022

I'll go ahead and close this, but if you find new information, please let us know!

@slouken slouken closed this as completed Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants