# Using indexes in databases

If we want queries that filter on particular columns to run fast, we can create **indexes** on those columns.

In [1]:
import sqlite3
from urllib.request import urlretrieve

Let us look at two databases containing data on slightly more than 200,000 graphs from the [*discrete*ZOO](https://discretezoo.xyz/) project. The first contains indexes and is 426 MB in size. The second holds the same data, but its indexes have been removed (except for those backing keys and `UNIQUE` constraints), so its size is 385 MB.

In [None]:
_ = urlretrieve("http://baza.fmf.uni-lj.si/discretezoo.db", "discretezoo.db")
_ = urlretrieve("http://baza.fmf.uni-lj.si/discretezoo-noindex.db", "discretezoo-noindex.db")

We open a connection to each database and write a function that compares how long the same query takes on each of them.

In [2]:
c1 = sqlite3.connect("discretezoo.db")
c2 = sqlite3.connect("discretezoo-noindex.db")

In [3]:
def primerjaj(*largs):
    # Time the same query on both connections (%timeit requires IPython/Jupyter).
    %timeit c1.execute(*largs).fetchall()
    %timeit c2.execute(*largs).fetchall()
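
`%timeit` is an IPython magic, so `primerjaj` runs only inside IPython or Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard `timeit` module can be used instead. The helper name `compare`, its parameters, and the small in-memory tables are assumptions made for illustration; they stand in for `c1` and `c2` and are not part of the original notebook.

```python
import sqlite3
import timeit


def compare(connections, sql, params=(), repeat=5, number=10):
    """Return the best per-execution time in seconds for each connection."""
    results = []
    for conn in connections:
        # timeit.repeat returns one total time per run of `number` executions.
        totals = timeit.repeat(
            lambda: conn.execute(sql, params).fetchall(),
            repeat=repeat,
            number=number,
        )
        results.append(min(totals) / number)
    return results


# Stand-in data (assumption): two in-memory databases instead of the downloaded files.
indexed = sqlite3.connect(":memory:")
plain = sqlite3.connect(":memory:")
for conn in (indexed, plain):
    conn.execute('CREATE TABLE graph ("order" INTEGER)')
    conn.executemany("INSERT INTO graph VALUES (?)", [(i % 1000,) for i in range(20000)])
indexed.execute('CREATE INDEX idx_graph_order ON graph("order")')

timings = compare([indexed, plain], 'SELECT COUNT(*) FROM graph WHERE "order" = ?', [512])
print(timings)
```

Unlike `%timeit`, which reports a mean and standard deviation, this sketch reports the fastest run for each connection, so the numbers are not directly comparable with the outputs shown in this notebook.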

Let us see which indexes exist in each database.

In [4]:
c1.execute("SELECT sql FROM sqlite_master WHERE type = 'index'").fetchall()

[(None,),
 ('CREATE INDEX "idx_object_alias_alias" ON "object_alias"("alias")',),
 ('CREATE INDEX "idx_object_unique_id_unique_id" ON "object_unique_id"("unique_id")',),
 ('CREATE INDEX "idx_graph_order" ON "graph"("order")',),
 ('CREATE INDEX "idx_graph_average_degree" ON "graph"("average_degree")',),
 ('CREATE INDEX "idx_graph_vt_vt_index" ON "graph_vt"("vt_index")',),
 ('CREATE INDEX "idx_graph_cvt_cvt_index" ON "graph_cvt"("cvt_index")',),
 ('CREATE INDEX "idx_graph_cvt_symcubic_index" ON "graph_cvt"("symcubic_index")',),
 ('CREATE UNIQUE INDEX "idx_object_alias_object_id_alias_unique" ON "object_alias"("object_id", "alias")',),
 ('CREATE UNIQUE INDEX "idx_object_unique_id_object_id_algorithm_unique" ON "object_unique_id"("object_id", "algorithm")',),
 ('CREATE UNIQUE INDEX "idx_graph_spx_spx_r_spx_s_unique" ON "graph_spx"("spx_r", "spx_s")',),
 ('CREATE UNIQUE INDEX "idx_change_zooid_table_column_commit_unique" ON "change"("zooid", "table", "column", "commit")',)]

In [5]:
c2.execute("SELECT sql FROM sqlite_master WHERE type = 'index'").fetchall()

[(None,),
 ('CREATE UNIQUE INDEX "idx_object_alias_object_id_alias_unique" ON "object_alias"("object_id", "alias")',),
 ('CREATE UNIQUE INDEX "idx_object_unique_id_object_id_algorithm_unique" ON "object_unique_id"("object_id", "algorithm")',),
 ('CREATE UNIQUE INDEX "idx_graph_spx_spx_r_spx_s_unique" ON "graph_spx"("spx_r", "spx_s")',),
 ('CREATE UNIQUE INDEX "idx_change_zooid_table_column_commit_unique" ON "change"("zooid", "table", "column", "commit")',)]
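
The `(None,)` entry at the top of both listings is worth a note. SQLite stores no SQL text for indexes that it creates automatically to enforce `PRIMARY KEY` or `UNIQUE` constraints declared inside `CREATE TABLE`, so their `sql` column is `NULL`. The sketch below reproduces this on a hypothetical in-memory table; the names `demo` and `idx_demo_val` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint makes SQLite create an index on its own.
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, alias TEXT UNIQUE, val INTEGER)")
# An index created explicitly keeps the text of its CREATE INDEX statement.
conn.execute("CREATE INDEX idx_demo_val ON demo(val)")

# Map each index name to its stored SQL text (None for automatic indexes).
indexes = dict(conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'index'"))
for name, sql in indexes.items():
    print(name, "->", sql)
```

Selecting `name` alongside `sql`, as done here, makes it easier to tell which index an empty `sql` entry belongs to.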

Let us now try the first query: counting the rows in the `graph` table that have a given value in the `order` column.

In [6]:
sql = """
    SELECT COUNT(*) FROM graph WHERE `order` = ?
"""
primerjaj(sql, [512])

50.7 µs ± 6.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
156 ms ± 4.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
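
To see where such a gap can come from, one can ask SQLite how it intends to run the query. The sketch below uses `EXPLAIN QUERY PLAN` on a small in-memory table that only imitates `graph`; the table contents and the helper `plan` are assumptions for illustration, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE graph (id INTEGER PRIMARY KEY, "order" INTEGER, diameter INTEGER)')
conn.executemany(
    'INSERT INTO graph ("order", diameter) VALUES (?, ?)',
    [(n % 100, n % 7) for n in range(1000)],
)

query = 'SELECT COUNT(*) FROM graph WHERE "order" = ?'


def plan(connection, sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(conn, query, [12])
conn.execute('CREATE INDEX idx_graph_order ON graph("order")')
after = plan(conn, query, [12])

print(before)  # expected to describe a full scan of the table
print(after)   # expected to mention idx_graph_order
```

Without the index every row has to be examined, while with it SQLite can jump straight to the entries matching the requested `order`.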


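The query on the indexed database is roughly three orders of magnitude faster than on the database without indexes (about 51 µs versus 156 ms). Repeated runs on the database without indexes can still benefit from cached pages, but SQLite does not keep a temporary index between queries: any automatic index it builds lasts only for the duration of a single statement. Let us now try reading entire rows.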

In [7]:
sql = """
    SELECT * FROM graph WHERE `order` = ?
"""
primerjaj(sql, [512])

11.4 ms ± 597 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
174 ms ± 7.25 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


The difference is still clear, though smaller (about 11 ms versus 174 ms, roughly a factor of 15). Let us also try grouping.

In [8]:
sql = """
    SELECT `order`, AVG(diameter) FROM graph GROUP BY `order`
"""
primerjaj(sql)

482 ms ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
491 ms ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
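
Here the two timings are practically the same. One plausible reading is that the query needs the `diameter` of every row, so the whole table has to be read whether or not `order` is indexed. As a hedged sketch, the plan can be inspected on a small in-memory stand-in for `graph`. The compound index `idx_graph_order_diameter` is hypothetical and does not exist in either database used in this notebook; it only illustrates that an index containing both columns could answer the query without touching the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE graph (id INTEGER PRIMARY KEY, "order" INTEGER, diameter INTEGER, name TEXT)'
)
conn.executemany(
    'INSERT INTO graph ("order", diameter, name) VALUES (?, ?, ?)',
    [(n % 100, n % 7, f"g{n}") for n in range(1000)],
)

query = 'SELECT "order", AVG(diameter) FROM graph GROUP BY "order"'


def plan(connection, sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


without_index = plan(conn, query)
# Hypothetical compound index; the notebook's databases do not contain it.
conn.execute('CREATE INDEX idx_graph_order_diameter ON graph("order", diameter)')
with_covering = plan(conn, query)

print(without_index)
print(with_covering)
```

Whether such an index would pay off on the real data would have to be measured, since it also adds to the size of the database file.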
