# Benchmark load_by_run_spec

Loading a dataset with `load_by_run_spec` takes about two orders of magnitude longer than `load_by_guid` or `load_by_id` on a large database (about 10 GB in this case).

Once the database is vacuumed (`VACUUM`), the time to retrieve a dataset without passing a connection drops from about 31 s to about 7.6 s, roughly a factor of 4. However, each `VACUUM` takes around 10 minutes, even when the database has just been vacuumed.

Once the proper index is in place, `load_by_run_spec` is on par with the other methods. The database upgrade that creates the index takes only about 10 s in total.

The search time could not be lowered below about 300 ms because, as Jens has mentioned below, opening a connection is significantly slower for a larger database. In this case the connection time is around 250 ms, so the actual search takes only a few ms.

In [1]:
from qcodes import initialise_or_create_database_at, load_by_id, load_by_guid, load_by_run_spec

Logging hadn't been started.
Activating auto-logging. Current session state plus future input saved.
Filename       : C:\Users\a-halakh\.qcodes\logs\command_history.log
Mode           : append
Output logging : True
Raw input log  : False
Timestamping   : True
State          : active
Qcodes Logfile : C:\Users\a-halakh\.qcodes\logs\200120-14524-qcodes.log


In [2]:
initialise_or_create_database_at(r"C:\Users\a-halakh\qt5-experiments-2019-08-29.db")

In [3]:
%timeit load_by_id(1234)

276 ms ± 30.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [4]:
%timeit load_by_guid('aaaaaaaa-0000-0000-0000-01673807fd53')

274 ms ± 23.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [5]:
%time load_by_run_spec(captured_run_id=1234)

Wall time: 31 s


results #1234@C:\Users\a-halakh\qt5-experiments-2019-08-29.db
-------------------------------------------------------------
scanner_b - array
scanner_pl - array
s_x - array
s_y - array

# Observation: the initial connection takes most of the time

In [6]:
from qcodes.dataset.data_set import get_DB_location, connect

In [7]:
conn = connect(get_DB_location())

In [8]:
%time load_by_run_spec(captured_run_id=1234, conn = conn)

Wall time: 7.21 s


results #1234@C:\Users\a-halakh\qt5-experiments-2019-08-29.db
-------------------------------------------------------------
scanner_b - array
scanner_pl - array
s_x - array
s_y - array

In [9]:
%timeit load_by_id(run_id = 1234, conn = conn)

965 µs ± 39.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [10]:
%timeit load_by_guid(guid = 'aaaaaaaa-0000-0000-0000-01673807fd53', conn = conn)

1.02 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
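The comparison above (a fresh connection per call versus one shared connection) can be reproduced in isolation with the standard-library `sqlite3` module. This is only a minimal sketch under stated assumptions: the toy `runs` table, the `guid-<n>` values, and the helper names `build_toy_db`, `lookup_fresh`, `lookup_reused`, and `timed` are hypothetical and not part of QCoDeS. On such a small file the absolute numbers will be far below the roughly 250 ms seen here; the point is only the pattern of timing the two cases separately.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical query against a toy table; not the QCoDeS schema.
QUERY = "SELECT guid FROM runs WHERE run_id = ?"


def build_toy_db(path, n_runs=1000):
    """Create a small stand-in database with a toy `runs` table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE runs (run_id INTEGER PRIMARY KEY, guid TEXT)")
    conn.executemany(
        "INSERT INTO runs (run_id, guid) VALUES (?, ?)",
        ((i, f"guid-{i}") for i in range(1, n_runs + 1)),
    )
    conn.commit()
    conn.close()


def lookup_fresh(path, run_id):
    """Open a new connection, run one lookup, and close the connection."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(QUERY, (run_id,)).fetchone()
    finally:
        conn.close()


def lookup_reused(conn, run_id):
    """Run one lookup on an already open connection."""
    return conn.execute(QUERY, (run_id,)).fetchone()


def timed(func, *args, repeats=100):
    """Return (mean seconds per call, result of the last call)."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = func(*args)
    return (time.perf_counter() - start) / repeats, result


with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "toy.db")
    build_toy_db(db_path)

    fresh_time, fresh_row = timed(lookup_fresh, db_path, 123)

    shared = sqlite3.connect(db_path)
    reused_time, reused_row = timed(lookup_reused, shared, 123)
    shared.close()

print(f"fresh connection per call: {fresh_time * 1e6:.1f} µs")
print(f"shared connection:         {reused_time * 1e6:.1f} µs")
```

Both variants return the same row; only the per-call overhead differs, which is why passing `conn` explicitly changes the timings so much in the cells above.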


In [11]:
curs = conn.cursor()

# Check if SQL VACUUM helps

In [11]:
%time conn.execute("VACUUM")

Wall time: 11min 12s


<sqlite3.Cursor at 0x28bb437bdc0>

In [12]:
%time conn.execute("VACUUM")

Wall time: 9min 57s


<sqlite3.Cursor at 0x28bb437b960>

In [13]:
%time conn.execute("VACUUM")

Wall time: 12min 10s


<sqlite3.Cursor at 0x28bb437b260>
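The three runs above each take about ten minutes. This is consistent with how SQLite documents `VACUUM`: it rebuilds the database by copying its contents into a temporary file and writing them back, so the cost follows the file size rather than how recently the database was vacuumed. The following is a minimal sketch on a toy file, assuming a hypothetical `results` table with dummy payloads, that shows what `VACUUM` actually reclaims.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "toy.db")
    conn = sqlite3.connect(db_path)

    # Hypothetical table filled with about 2 MB of dummy payload.
    conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, payload BLOB)")
    conn.executemany(
        "INSERT INTO results (payload) VALUES (?)",
        ((b"x" * 1024,) for _ in range(2000)),
    )
    conn.commit()
    size_full = os.path.getsize(db_path)

    # Deleting rows frees pages inside the file; with SQLite's default
    # settings the file itself is not expected to shrink at this point.
    conn.execute("DELETE FROM results")
    conn.commit()
    size_after_delete = os.path.getsize(db_path)

    # VACUUM must run outside a transaction, hence the commit above.
    conn.execute("VACUUM")
    size_after_vacuum = os.path.getsize(db_path)
    conn.close()

print(f"full: {size_full} B")
print(f"after DELETE: {size_after_delete} B")
print(f"after VACUUM: {size_after_vacuum} B")
```

`VACUUM` compacts the file but does not add any index, so it is not expected to change how a lookup on an unindexed column is executed.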

In [16]:
%time load_by_run_spec(captured_run_id=1234, conn = conn)

Wall time: 7.51 s


results #1234@C:\Users\a-halakh\qt5-experiments-2019-08-29.db
-------------------------------------------------------------
scanner_b - array
scanner_pl - array
s_x - array
s_y - array

In [17]:
%time load_by_run_spec(captured_run_id=1234)

Wall time: 7.64 s


results #1234@C:\Users\a-halakh\qt5-experiments-2019-08-29.db
-------------------------------------------------------------
scanner_b - array
scanner_pl - array
s_x - array
s_y - array

In [18]:
%timeit load_by_id(1234)

263 ms ± 25.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [19]:
%timeit load_by_guid('aaaaaaaa-0000-0000-0000-01673807fd53')

277 ms ± 49.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [20]:
%timeit load_by_id(run_id = 1234, conn = conn)

913 µs ± 67.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Running `VACUUM` does improve the load time. Still, the time cannot drop below a few hundred ms, because opening a new connection alone takes that long.

In [21]:
%timeit load_by_guid(guid = 'aaaaaaaa-0000-0000-0000-01673807fd53', conn = conn)

936 µs ± 102 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# Profiling load_by_run_spec

In [22]:
import snakeviz

In [23]:
%load_ext snakeviz

In [24]:
%snakeviz load_by_run_spec(captured_run_id=1234)

 
*** Profile stats marshalled to file 'C:\\Users\\a-halakh\\AppData\\Local\\Temp\\tmp_um6vd8p'. 
Embedding SnakeViz in this document...
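The SnakeViz view is interactive and does not survive in a static export. As a rough text-only alternative, the same kind of profile can be collected with the standard-library `cProfile` and `pstats` modules. This is a minimal sketch, not the profile recorded above: `slow_lookup` and the toy `runs` table are hypothetical stand-ins for `load_by_run_spec` and the real database.

```python
import cProfile
import io
import pstats
import sqlite3


def slow_lookup(conn, captured_run_id):
    """Stand-in for the profiled call: a lookup on an unindexed column."""
    return conn.execute(
        "SELECT run_id FROM runs WHERE captured_run_id = ?", (captured_run_id,)
    ).fetchall()


# Toy in-memory table; run_id and captured_run_id both run from 1 to 10000.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (run_id INTEGER PRIMARY KEY, captured_run_id INTEGER)"
)
conn.executemany(
    "INSERT INTO runs (captured_run_id) VALUES (?)",
    ((i,) for i in range(1, 10001)),
)

# Profile a single call and keep its return value.
profiler = cProfile.Profile()
rows = profiler.runcall(slow_lookup, conn, 1234)

# Render the ten most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

In the printed table, the `cumtime` column shows where the time accumulates; for a query-bound call the `execute` method of the connection is expected to dominate.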


In [24]:
rows = curs.fetchall()

In [25]:
len(rows)

0

# DB upgrade to index captured_run_id

In [25]:
from qcodes.dataset.sqlite.connection import atomic_transaction, atomic, transaction
from tqdm import tqdm
import sys

In [26]:

def perform_db_upgrade_8_to_9(conn) -> None:
    """
    Perform the upgrade from version 8 to version 9

    Add an index on the runs table for captured_run_id
    """

    sql = "SELECT name FROM sqlite_master WHERE type='table' AND name='runs'"
    cur = atomic_transaction(conn, sql)
    n_run_tables = len(cur.fetchall())

    pbar = tqdm(range(1), file=sys.stdout)
    pbar.set_description("Upgrading database; v1 -> v2")

    if n_run_tables == 1:
        _IX_runs_capture_id = """
                          CREATE INDEX
                          IF NOT EXISTS IX_runs_capture_id
                          ON runs (captured_run_id DESC)
                          """
        with atomic(conn) as conn:
            # iterate through the pbar for the sake of the side effect; it
            # prints that the database is being upgraded
            for _ in pbar:
                transaction(conn, _IX_runs_capture_id)
    else:
        raise RuntimeError(f"found {n_run_tables} runs tables expected 1")

In [27]:
perform_db_upgrade_8_to_9(conn) 

Upgrading database; v1 -> v2: 100%|██████████| 1/1 [00:07<00:00,  7.93s/it]


In [28]:
%time load_by_run_spec(captured_run_id=1234, conn = conn)

Wall time: 14 ms


results #1234@C:\Users\a-halakh\qt5-experiments-2019-08-29.db
-------------------------------------------------------------
scanner_b - array
scanner_pl - array
s_x - array
s_y - array
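The drop from about 7 s to 14 ms is what one would expect if the lookup changes from a full scan of `runs` to an index search. One way to check this kind of change is SQLite's `EXPLAIN QUERY PLAN`. The sketch below uses a toy in-memory `runs` table with hypothetical columns and data; only the index definition mirrors the one created by the upgrade function above, and the helper `query_plan` is introduced here for illustration.

```python
import sqlite3

# Toy stand-in for the runs table; columns and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs ("
    "run_id INTEGER PRIMARY KEY, guid TEXT, captured_run_id INTEGER)"
)
conn.executemany(
    "INSERT INTO runs (guid, captured_run_id) VALUES (?, ?)",
    ((f"guid-{i}", i) for i in range(1, 5001)),
)

QUERY = "SELECT run_id, guid FROM runs WHERE captured_run_id = ?"


def query_plan(connection, sql, params):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail text is the last column of each row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, (1234,))

# Same index definition as in the upgrade function above.
conn.execute(
    "CREATE INDEX IF NOT EXISTS IX_runs_capture_id "
    "ON runs (captured_run_id DESC)"
)
plan_after = query_plan(conn, QUERY, (1234,))

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan reports a scan of the table; afterwards it names `IX_runs_capture_id`. The exact wording of the plan text varies between SQLite versions.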

In [35]:
%time load_by_run_spec(captured_run_id=1234)

Wall time: 352 ms


results #1234@C:\Users\a-halakh\qt5-experiments-2019-08-29.db
-------------------------------------------------------------
scanner_b - array
scanner_pl - array
s_x - array
s_y - array

In [36]:
%time load_by_run_spec(captured_counter=1234)

Wall time: 7.35 s


results #2287@C:\Users\a-halakh\qt5-experiments-2019-08-29.db
-------------------------------------------------------------
scanner_b - array
scanner_pl - array
s_x - array
s_y - array

In [37]:
%timeit load_by_id(1234)

272 ms ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [39]:
%timeit load_by_guid('aaaaaaaa-0000-0000-0000-01673807fd53')

233 ms ± 4.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
