# Single Column Indexing

To speed up SQL queries, we need to learn how to use indexes properly. An index lets SQLite locate matching rows with a binary-search-style lookup (SQLite stores its indexes as B-trees) rather than scanning the entire table.
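The difference between a scan and an indexed lookup can be sketched on a throwaway table. This is a minimal sketch, not part of the factbook analysis: the in-memory database, the `demo_facts` table, its made-up rows, and the `plan_details` helper are all assumptions introduced for illustration.

```python
import sqlite3

# Hypothetical in-memory table standing in for facts; the rows are made up.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE demo_facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)"
)
demo.executemany(
    "INSERT INTO demo_facts (name, population) VALUES (?, ?)",
    [(f"country_{i}", i * 1000) for i in range(500)],
)

def plan_details(connection, sql):
    """Return the readable description (last field) of each query plan row."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]

lookup = "SELECT * FROM demo_facts WHERE name = 'country_42'"

before = plan_details(demo, lookup)  # no index on name yet, so the table is scanned
demo.execute("CREATE INDEX demo_name_idx ON demo_facts(name)")
after = plan_details(demo, lookup)   # the planner can now search demo_name_idx

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the `SCAN` or `SEARCH` keyword and the index name than to compare whole strings.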

In [6]:
import sqlite3
conn = sqlite3.connect('factbook.db')
schema = conn.execute("pragma table_info(facts);").fetchall()
for s in schema:
    print(s)

(0, 'id', 'INTEGER', 1, None, 1)
(1, 'code', 'varchar(255)', 1, None, 0)
(2, 'name', 'varchar(255)', 1, None, 0)
(3, 'area', 'integer', 0, None, 0)
(4, 'area_land', 'integer', 0, None, 0)
(5, 'area_water', 'integer', 0, None, 0)
(6, 'population', 'integer', 0, None, 0)
(7, 'population_growth', 'float', 0, None, 0)
(8, 'birth_rate', 'float', 0, None, 0)
(9, 'death_rate', 'float', 0, None, 0)
(10, 'migration_rate', 'float', 0, None, 0)


### Viewing Query Plans to Check for Efficiency

In [7]:
conn.execute("CREATE INDEX IF NOT EXISTS name_idx ON facts(name)")

<sqlite3.Cursor at 0x13f7592a500>

In [15]:
query_plan_one = conn.execute('''
EXPLAIN QUERY PLAN 
    SELECT * 
    FROM facts 
    WHERE area > 40000
;''').fetchall()
print(query_plan_one)

query_plan_two = conn.execute('''
EXPLAIN QUERY PLAN 
    SELECT area 
    FROM facts 
    WHERE area > 40000
;''').fetchall()
print(query_plan_two)

query_plan_three = conn.execute('''
EXPLAIN QUERY PLAN 
    SELECT * 
    FROM facts 
    WHERE name = 'Czech Republic'
;''').fetchall()
print(query_plan_three)

query_plan_four = conn.execute('''
EXPLAIN QUERY PLAN 
    SELECT * 
    FROM facts 
    WHERE id=20
;''').fetchall()
print(query_plan_four)


[(0, 0, 0, 'SCAN TABLE facts')]
[(0, 0, 0, 'SCAN TABLE facts')]
[(0, 0, 0, 'SEARCH TABLE facts USING INDEX name_idx (name=?)')]
[(0, 0, 0, 'SEARCH TABLE facts USING INTEGER PRIMARY KEY (rowid=?)')]


More query plans, this time for the same query before and after creating an index on population. You may wonder where query_plan_five went. I wonder the same thing.

In [9]:
query_plan_six = conn.execute('''
EXPLAIN QUERY PLAN 
    SELECT * 
    FROM facts 
    WHERE population > 10000
;''').fetchall()
print(query_plan_six)

conn.execute("CREATE INDEX IF NOT EXISTS pop_idx ON facts(population)")

query_plan_seven = conn.execute('''
EXPLAIN QUERY PLAN 
    SELECT * 
    FROM facts 
    WHERE population > 10000
;''').fetchall()
print(query_plan_seven)

[(0, 0, 0, 'SEARCH TABLE facts USING INDEX pop_idx (population>?)')]
[(0, 0, 0, 'SEARCH TABLE facts USING INDEX pop_idx (population>?)')]


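Note that both plans above already use pop_idx. The index must have existed in the database before the first query ran, presumably left over from an earlier run of this cell, since CREATE INDEX IF NOT EXISTS does nothing when the index is already there. On a database without that index, the first plan would be a table scan.

Reading query plans tells us what needs to be optimized. If a query scans the entire table, an index on the column being filtered is usually the first thing to try.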
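Checking plans by eye gets tedious, so the same idea can be wrapped in a small helper. The sketch below is an assumption-laden illustration: `full_scan_steps`, the in-memory `demo_facts` table, and `demo_pop_idx` are hypothetical names, and the check relies only on the plan description starting with `SCAN`.

```python
import sqlite3

def full_scan_steps(connection, sql):
    """Return the plan steps that scan rather than search."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last field of each row is the readable description, e.g. 'SCAN TABLE facts'.
    return [row[-1] for row in rows if row[-1].startswith("SCAN")]

# Hypothetical stand-in for facts, kept in memory so factbook.db is not touched.
scratch = sqlite3.connect(":memory:")
scratch.execute(
    "CREATE TABLE demo_facts (id INTEGER PRIMARY KEY, area INTEGER, population INTEGER)"
)
scratch.execute("CREATE INDEX demo_pop_idx ON demo_facts(population)")

unindexed = full_scan_steps(scratch, "SELECT * FROM demo_facts WHERE area = 40000")
indexed = full_scan_steps(scratch, "SELECT * FROM demo_facts WHERE population = 10000")

print(unindexed)  # non-empty: area has no index, so the table is scanned
print(indexed)    # empty: the lookup goes through demo_pop_idx
```

A non-empty result is a hint, not a verdict: on a small table a scan can be perfectly acceptable.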

# Multi-Column Indexing

In [10]:
query_plan_one = conn.execute('''
EXPLAIN QUERY PLAN 
SELECT * 
FROM facts 
WHERE 
    population_growth < 0.05 
    AND 
    population > 1000000
;''').fetchall()
print(query_plan_one)

[(0, 0, 0, 'SEARCH TABLE facts USING INDEX pop_idx (population>?)')]


To speed up a query that filters on two columns, a first idea is to create a separate index on each column.

In [11]:
conn.execute("CREATE INDEX IF NOT EXISTS pop_idx ON facts(population)")
conn.execute("CREATE INDEX IF NOT EXISTS pop_growth_idx ON facts(population_growth)")

query_plan_two = conn.execute('''
EXPLAIN QUERY PLAN 
    SELECT * 
    FROM facts 
    WHERE 
        population > 1000000 
        AND 
        population_growth < 0.05
;''').fetchall()
print(query_plan_two)

[(0, 0, 0, 'SEARCH TABLE facts USING INDEX pop_growth_idx (population_growth<?)')]


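**The attempt above has an issue: only one of the two indexes was used.** SQLite generally uses at most one index per table in a query, so it searches with one index and then checks the other condition against each row it finds.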
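The one-index behavior can be reproduced on a scratch table. This is a sketch under stated assumptions: the in-memory `demo_facts` table and the `demo_pop_idx` and `demo_growth_idx` indexes are hypothetical stand-ins, and which of the two indexes the planner picks may depend on the SQLite version.

```python
import sqlite3

# Hypothetical stand-in for facts with one single-column index per filtered column.
two_idx = sqlite3.connect(":memory:")
two_idx.execute(
    "CREATE TABLE demo_facts ("
    "id INTEGER PRIMARY KEY, name TEXT, population INTEGER, population_growth FLOAT)"
)
two_idx.execute("CREATE INDEX demo_pop_idx ON demo_facts(population)")
two_idx.execute("CREATE INDEX demo_growth_idx ON demo_facts(population_growth)")

plan = two_idx.execute('''
EXPLAIN QUERY PLAN
    SELECT *
    FROM demo_facts
    WHERE population > 1000000 AND population_growth < 0.05
;''').fetchall()

# Keep only the readable description of each plan step.
details = [row[-1] for row in plan]
# Record which of the two index names appear anywhere in the plan.
used = [
    idx for idx in ("demo_pop_idx", "demo_growth_idx")
    if any(idx in d for d in details)
]

print(details)
print(used)  # the index (or indexes) named in the plan
```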

Our next attempt creates a single index that spans both columns.

In [14]:
conn.execute("CREATE INDEX IF NOT EXISTS pop_pop_growth_idx ON facts(population, population_growth);")

query_plan_three =  conn.execute('''
EXPLAIN QUERY PLAN 
    SELECT * 
    FROM facts 
    WHERE 
        population > 1000000 
        AND 
        population_growth < 0.05;
''').fetchall()
print(query_plan_three)

[(0, 0, 0, 'SEARCH TABLE facts USING INDEX pop_pop_growth_idx (population>?)')]


In [13]:
query_plan_four = conn.execute('''
EXPLAIN QUERY PLAN 
SELECT 
    population, 
    population_growth 
FROM facts 
WHERE 
    population > 1000000 
    AND 
    population_growth < 0.05
;''').fetchall()
print(query_plan_four)

[(0, 0, 0, 'SEARCH TABLE facts USING COVERING INDEX pop_pop_growth_idx (population>?)')]
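The only difference between the last two plans is the word COVERING. When every column the query needs is stored in the index, SQLite can answer from the index alone and skip the lookup into the table. The sketch below reproduces that contrast on a scratch table; the in-memory `demo_facts` table, the `demo_pair_idx` index, and the `first_detail` helper are assumptions made for illustration.

```python
import sqlite3

# Hypothetical stand-in for facts with a single two-column index.
cover = sqlite3.connect(":memory:")
cover.execute(
    "CREATE TABLE demo_facts ("
    "id INTEGER PRIMARY KEY, name TEXT, population INTEGER, population_growth FLOAT)"
)
cover.execute(
    "CREATE INDEX demo_pair_idx ON demo_facts(population, population_growth)"
)

def first_detail(connection, sql):
    """Return the readable description of the first query plan step."""
    return connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

where = " FROM demo_facts WHERE population > 1000000 AND population_growth < 0.05"

# SELECT * needs name, which is not in the index, so the table must still be read.
all_columns = first_detail(cover, "SELECT *" + where)
# Both selected columns live in demo_pair_idx, so the index covers the query.
indexed_only = first_detail(cover, "SELECT population, population_growth" + where)

print(all_columns)
print(indexed_only)
```

Selecting only the columns you need is therefore more than tidiness: it can let an existing index cover the query.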
