Kivy clock experiments

matham edited this page Jun 19, 2016 · 5 revisions
Clone this wiki locally

The Kivy clock works as follows; when a user schedules a callback, or calls a previously created a trigger event, the event gets added to a collection of events that the kivy thread executes.

To be useful, the clock needs the ability to schedule and unschedule events. When unscheduling, we need to be able to unschedule from the event, or the callabck, the latter requires a lot more processing, since we need to search all the events to find the one associated with the callback. Of course, if an event is unscheduled before it's executed it should not be called, increasing the computational cost of the kivy thread since it has to check every event before executing that it is still scheduled.

The code towards the end was used to profile the various containers used to store the events, each with their own issues. Their profile data is listed under each of them.

Note, for testing purposes, it's important to schedule and unschedule callables in random order, because some containers perform better with some order. The following test code randomizes the callbacks. Also note, that unschedule_by_callback only seems faster because it was run 1/10th the number of times as the other tests, so it's in fact typically slower than unschedule_by_event.

  1. The obvious one to try is a list:

    schedule_once: 5.87352774516
    trigger: 11.3946813228
    unschedule_by_callback: 50.218935854
    unschedule_by_event: 254.751554172
    tick: 16.543492186
    

    As is obvious, to unschedule, we have to search the list which is quite prohibitive.

  2. Next we try blist which is supposed to have better insertion/deletion performance for longer lists:

    schedule_once: 4.53894710476
    trigger: 11.3415820118
    unschedule_by_callback: 49.5523988043
    unschedule_by_event: 212.49588195
    tick: 13.8035579663
    

    It only does a little better, but is still prohibitively slow, probably because it's membership testing that is so expensive here.

  3. Next we try a set:

    schedule_once: 5.18649122519
    trigger: 11.0784686579
    unschedule_by_callback: 51.8641645246
    unschedule_by_event: 12.2605310054
    tick: 13.1667899876
    

    As is expected, unscheduling by event is very fast now. However, unscheduling by the callback is still slow because we have to iterate over all the events in the set to find the event associated with the callback.

    As a side note, when iterating over a set, e.g. events = set(range(1000)), it is faster to do for x in list(events) than for x in events.

    Further issues with sets, is that they are unordered. Using a ordered set, such as the OrderedDict would only reduce performance, both in memory and speed.

  4. Next we try like it used to be in Kivy master. For each callable we compute its hash, the name of the callable (callable.__name__) in this case. We then construct a dict whose keys are the hash and values is a list. The list contains the events scheduled with this hash. The effect is that when searching for a callback, we only have to look within the events with that hash:

    schedule_once: 5.31102250983
    trigger: 11.7890550961
    unschedule_by_callback: 0.68023717038
    unschedule_by_event: 15.8265765507
    tick: 15.0017295042
    

    As can be seen, unscheduling by event is comparable to a set, while unscheduling by callback is faster than all of them. There is one issue, though. By using the name for the hash, over time the dict of the hashes will likely grow into many empty lists from previous events, requiring periodic cleanup, which cannot be done in a thread safe manner without locks. Also, all lambdas share the same name, as are likely many layout callbacks.

  5. Before moving on, there's one more improvement to be done, but not much, and that is to use defaultdict with list, which in general is a lot faster than otherwise:

    schedule_once: 5.05421659043
    trigger: 11.581389352
    unschedule_by_callback: 0.443185868324
    unschedule_by_event: 14.0717239199
    tick: 13.8969306638
    
  6. The final method as implemented now in master (https://github.com/kivy/kivy/pull/2310) is to use a more random hash. For this we use the address of the event (id(event) in CPython) as follows: (id(cb) & 0xFF00) >> 8. This creates an 8 bit number from the second byte of the address (for class methods, we use the id of the class itself and not the id of the method because class methods get regenerated).

    Now, we can statically create a list of 256 items and index it with the hash. The randomness of the hash ensures a fairly equal distribution of events among the 256 slots. Furthermore, instead of a tree depth of 2, we can increase the depth by splitting the 8 bits in two 4 bits for another layer etc.

    Note, the reason for using the second byte is that many python callable objects may have similar size or be similarly aligned, therefore reducing the number of slots from 256 to a much lower than number:

    schedule_once: 5.13333771321
    trigger: 11.7416551011
    unschedule_by_callback: 0.381202112196
    unschedule_by_event: 13.9372451116
    tick: 13.8185367376
    
  7. June 16 The most recent attempt in PR https://github.com/kivy/kivy/pull/4412 is to use a doubly liked list which ensures determinism of the callbacks in the order they were scheduled. It also fully protects against multi-threading.

    The benchmarks for master was:

    schedule_once: 3.04238663305
    trigger:  5.99657834953
    unschedule_by_callback: 0.28418420994
    unschedule_by_event: 7.20933188983
    tick: 7.33034039193
    

    For the PR it's:

    schedule_once: 2.14062707372
    trigger:  4.78726804852
    unschedule_by_callback: 48.8890041079
    unschedule_by_event: 6.2745237454
    tick: 7.08543473329
    

Test code:

from random import shuffle
import timeit
from kivy.clock import Clock
count = 10000

funcs = [None, ] * count
for i in range(count):
    funcs[i] = f = lambda dt: None
    f.__name__ += str(i / 50)
shuffle(funcs)


def clear_events():
    Clock._root_event = None
    Clock._last_event = None
    Clock._events = [[] for i in range(256)]


def test_schedule():
    schedule = Clock.schedule_once
    clear_events()
    shuffle(funcs)
    for f in funcs:
        schedule(f)


def test_trigger():
    schedule = Clock.create_trigger
    clear_events()
    events = [None, ] * count
    f = funcs
    shuffle(f)
    for i in range(count):
        events[i] = schedule(f[i])
    for event in events:
        event()


def test_unschedule_by_callback():
    schedule = Clock.create_trigger
    unschedule = Clock.unschedule
    clear_events()
    events = [None, ] * count
    f = funcs
    for i in range(count):
        events[i] = schedule(f[i])
    for event in events:
        event()
    shuffle(f)
    for callback in f:
        unschedule(callback)


def test_unschedule_by_event():
    schedule = Clock.create_trigger
    unschedule = Clock.unschedule
    clear_events()
    events = [None, ] * count
    f = funcs
    for i in range(count):
        events[i] = schedule(f[i])
    for event in events:
        event()
    shuffle(events)
    for event in events:
        unschedule(event)


def test_tick():
    schedule = Clock.schedule_once
    clear_events()
    for f in funcs:
        schedule(f)
    Clock.tick()

t0 = timeit.Timer(test_schedule).repeat(1, 100)[0]
print 'schedule_once:', t0
t1 = timeit.Timer(test_trigger).repeat(1, 100)[0]
print 'trigger: ',  t1
t2 = timeit.Timer(test_unschedule_by_callback).repeat(1, 1)[0]
print 'unschedule_by_callback: {}'.format(t2)
t3 = timeit.Timer(test_unschedule_by_event).repeat(1, 100)[0]
print 'unschedule_by_event: {}'.format(t3)

for i in range(10):
    Clock.tick()
t4 = timeit.Timer(test_tick).repeat(1, 100)[0]
print 'tick: {}'.format(t4)