
Research: how much overhead does the n=1 time limit have? #1679

Closed
simonw opened this issue Mar 21, 2022 · 11 comments
simonw commented Mar 21, 2022

@contextmanager
def sqlite_timelimit(conn, ms):
    deadline = time.perf_counter() + (ms / 1000)
    # n is the number of SQLite virtual machine instructions that will be
    # executed between each check. It's hard to know what to pick here.
    # After some experimentation, I've decided to go with 1000 by default and
    # 1 for time limits that are less than 50ms
    n = 1000
    if ms < 50:
        n = 1

    def handler():
        if time.perf_counter() >= deadline:
            return 1

    conn.set_progress_handler(handler, n)
    try:
        yield
    finally:
        conn.set_progress_handler(None, n)

How often do I set a time limit of 50 or less? How much slower does it go thanks to this code?
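The mechanism this relies on can be demonstrated in isolation. A minimal sketch (names like `abort_after_five_calls` are just for illustration): SQLite calls the progress handler every `n` virtual machine instructions, and a non-zero return value makes the running statement fail with an OperationalError.

```python
import sqlite3

db = sqlite3.connect(":memory:")
calls = 0

def abort_after_five_calls():
    # Called every 1000 virtual machine instructions; returning a
    # non-zero value tells SQLite to abort the running statement
    global calls
    calls += 1
    if calls >= 5:
        return 1

db.set_progress_handler(abort_after_five_calls, 1000)

try:
    db.execute("""
    with recursive counter(x) as (
      select 0 union select x + 1 from counter
    )
    select * from counter limit 1000000
    """).fetchall()
    result = "completed"
except sqlite3.OperationalError:
    result = "aborted"

print(result)
```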

simonw commented Mar 21, 2022

datasette/datasette/app.py

Lines 118 to 122 in 1a7750e

This sets it to 50ms for facet suggestion, but a value of 50 won't pass the ms < 50 check:

    Setting(
        "facet_suggest_time_limit_ms",
        50,
        "Time limit for calculating a suggested facet",
    ),

simonw commented Mar 21, 2022

So it looks like n=1000 is always used, even for facet suggestion - it's never reduced to n=1.

simonw commented Mar 21, 2022

How long does it take for SQLite to execute 1000 opcodes anyway?

simonw commented Mar 21, 2022

This is weird:

import sqlite3

db = sqlite3.connect(":memory:")

i = 0

def count():
    global i
    i += 1


db.set_progress_handler(count, 1)

db.execute("""
with recursive counter(x) as (
  select 0
    union
  select x + 1 from counter
)
select * from counter limit 10000;
""")

print(i)

Outputs 24. But if you try the same thing in the SQLite console:

sqlite> .stats vmstep
sqlite> with recursive counter(x) as (
   ...>   select 0
   ...>     union
   ...>   select x + 1 from counter
   ...> )
   ...> select * from counter limit 10000;
...
VM-steps: 200007

@simonw simonw transferred this issue from simonw/temp Mar 21, 2022
@simonw simonw removed the datasette label Mar 21, 2022
simonw commented Mar 21, 2022

Here's the C implementation of set_progress_handler in CPython's sqlite3 module: https://github.com/python/cpython/blob/4674fd4e938eb4a29ccd5b12c15455bd2a41c335/Modules/_sqlite/connection.c#L1177-L1201

It calls sqlite3_progress_handler(self->db, n, progress_callback, ctx);

https://www.sqlite.org/c3ref/progress_handler.html says:

The parameter N is the approximate number of virtual machine instructions that are evaluated between successive invocations of the callback X

So maybe VM-steps and virtual machine instructions are different things?

simonw commented Mar 21, 2022

Posed a question about that here: https://sqlite.org/forum/forumpost/de9ff10fa7

simonw commented Mar 21, 2022

David Raymond solved it there: https://sqlite.org/forum/forumpost/330c8532d8a88bcd

Don't forget to step through the results. All .execute() has done is prepared it.

db.execute(query).fetchall()

Sure enough, adding that gets the VM steps number up to 190,007 which is close enough that I'm happy.
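The difference can be made concrete with a small sketch (the helper name `count_steps` is just for illustration): count progress-handler invocations with n=1, once after only preparing the statement and once after consuming all the rows.

```python
import sqlite3

def count_steps(consume_rows):
    # Count how many times the progress handler fires (n=1, so once
    # per virtual machine instruction)
    db = sqlite3.connect(":memory:")
    steps = 0

    def handler():
        nonlocal steps
        steps += 1

    db.set_progress_handler(handler, 1)
    cursor = db.execute("""
    with recursive counter(x) as (
      select 0 union select x + 1 from counter
    )
    select * from counter limit 10000
    """)
    if consume_rows:
        # Stepping through the results is what actually executes the query
        cursor.fetchall()
    return steps

prepared_only = count_steps(False)
fully_stepped = count_steps(True)
print(prepared_only, fully_stepped)
```

The first number stays tiny because .execute() has only prepared the statement (and taken the first step); the second is in the hundreds of thousands.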

simonw commented Mar 21, 2022

OK here's a microbenchmark script:

import sqlite3
import timeit

db = sqlite3.connect(":memory:")
db_with_progress_handler_1 = sqlite3.connect(":memory:")
db_with_progress_handler_1000 = sqlite3.connect(":memory:")

db_with_progress_handler_1.set_progress_handler(lambda: None, 1)
db_with_progress_handler_1000.set_progress_handler(lambda: None, 1000)

def execute_query(db):
    cursor = db.execute("""
    with recursive counter(x) as (
    select 0
        union
    select x + 1 from counter
    )
    select * from counter limit 10000;
    """)
    list(cursor.fetchall())


print("Without progress_handler")
print(timeit.timeit(lambda: execute_query(db), number=100))

print("progress_handler every 1000 ops")
print(timeit.timeit(lambda: execute_query(db_with_progress_handler_1000), number=100))

print("progress_handler every 1 op")
print(timeit.timeit(lambda: execute_query(db_with_progress_handler_1), number=100))

Results:

% python3 bench.py
Without progress_handler
0.8789225700311363
progress_handler every 1000 ops
0.8829826560104266
progress_handler every 1 op
2.8892734259716235

So running every 1000 ops makes almost no difference at all, but running every single op is a 3.2x performance degradation.
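For context, a rough conversion of those timeit totals into per-query overhead (each total above covers 100 executions of the query):

```python
# Timings copied from the benchmark output above (seconds per 100 runs)
baseline = 0.8789225700311363
n1000 = 0.8829826560104266
n1 = 2.8892734259716235

# Extra cost per single query, in milliseconds
overhead_n1000_ms = (n1000 - baseline) / 100 * 1000
overhead_n1_ms = (n1 - baseline) / 100 * 1000
print(round(overhead_n1000_ms, 3))  # roughly 0.04ms per query
print(round(overhead_n1_ms, 1))     # roughly 20ms per query
```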

simonw commented Mar 21, 2022

Here's another microbenchmark that measures how many nanoseconds it takes to run 1,000 vmops:

import sqlite3
import time

db = sqlite3.connect(":memory:")

i = 0
out = []

def count():
    global i
    i += 1000
    out.append((i, time.perf_counter_ns()))

db.set_progress_handler(count, 1000)

print("Start:", time.perf_counter_ns())
all = db.execute("""
with recursive counter(x) as (
  select 0
    union
  select x + 1 from counter
)
select * from counter limit 10000;
""").fetchall()
print("End:", time.perf_counter_ns())

print()
print("So how long does it take to execute 1000 ops?")

prev_time_ns = None
for i, time_ns in out:
    if prev_time_ns is not None:
        print(time_ns - prev_time_ns, "ns")
    prev_time_ns = time_ns

Running it:

% python nanobench.py
Start: 330877620374821
End: 330877632515822

So how long does it take to execute 1000 ops?
47290 ns
49573 ns
48226 ns
45674 ns
53238 ns
47313 ns
52346 ns
48689 ns
47092 ns
87596 ns
69999 ns
52522 ns
52809 ns
53259 ns
52478 ns
53478 ns
65812 ns

87596ns is 0.087596ms - so even a measurement rate of every 1000 ops is easily fine-grained enough to capture differences of less than 0.1ms.

If anything I could bump that default 1000 up - and I can definitely eliminate the if ms < 50 branch entirely.

@simonw simonw added this to the Datasette 1.0 milestone Mar 21, 2022
simonw commented Mar 21, 2022

Oh interesting, it turns out there is ONE place in the code that sets the ms as low as 20 - this test fixture:

datasette/tests/fixtures.py

Lines 224 to 226 in 4e47a2d

def app_client_shorter_time_limit():
    with make_app_client(20) as client:
        yield client

@simonw
Copy link
Owner Author

simonw commented Mar 21, 2022

I'm going to change the original logic to set n=1 for times that are <= 20ms - and update the comments to make it more obvious what is happening.
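A sketch of what that updated logic could look like (the threshold and comments are assumptions based on this thread, not the exact committed code), followed by a quick check that a short limit really does interrupt a slow query:

```python
import sqlite3
import time
from contextlib import contextmanager

@contextmanager
def sqlite_timelimit(conn, ms):
    deadline = time.perf_counter() + (ms / 1000)
    # n is the number of SQLite virtual machine instructions executed
    # between progress handler checks. 1000 instructions take roughly
    # 0.05ms, so checking every 1000 is fine-grained enough for any
    # limit above 20ms; only very short (test-only) limits need n=1
    n = 1000
    if ms <= 20:
        n = 1

    def handler():
        if time.perf_counter() >= deadline:
            return 1  # non-zero return interrupts the query

    conn.set_progress_handler(handler, n)
    try:
        yield
    finally:
        conn.set_progress_handler(None, n)

# Usage: a 5ms limit should interrupt this deliberately slow query
conn = sqlite3.connect(":memory:")
try:
    with sqlite_timelimit(conn, 5):
        conn.execute("""
        with recursive counter(x) as (
          select 0 union select x + 1 from counter
        )
        select * from counter limit 100000000
        """).fetchall()
    outcome = "completed"
except sqlite3.OperationalError:
    outcome = "interrupted"
print(outcome)
```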
