# Example of a DB-API - Using `sqlite3` with SQL and Python

`sqlite3` is an example of a DB-API module, i.e. a low-level interface for interacting with a DB. A different concept is the ORM (Object Relational Mapper). An example is shown in the [db-sqalchemy-notebook](db-sqalchemy.ipynb).
`sqlite3` executes plain SQL queries directly. Connections and results have to be handled and managed manually. With such a low-level DB-API there is no mapping between the DB tables and Python objects.
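
To illustrate the missing mapping, here is a minimal sketch on a throwaway in-memory DB (the `row_connection` name and the one-row `teams` table are only for illustration). By default a query returns plain tuples. The built-in `sqlite3.Row` row factory adds access by column name, but the result is still not a full Python object in the ORM sense.

```python
import sqlite3

# ":memory:" creates a temporary DB that lives only as long as the connection
row_connection = sqlite3.connect(":memory:")
row_connection.execute("CREATE TABLE teams (id INTEGER PRIMARY KEY, shoe_color TEXT)")
row_connection.execute("INSERT INTO teams (id, shoe_color) VALUES (3, 'Red')")

# Default behaviour: every row is a plain tuple, values are addressed by position
plain_row = row_connection.execute("SELECT id, shoe_color FROM teams").fetchone()

# With sqlite3.Row, values can also be addressed by column name
row_connection.row_factory = sqlite3.Row
named_row = row_connection.execute("SELECT id, shoe_color FROM teams").fetchone()
named_color = named_row["shoe_color"]

row_connection.close()
```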

**Use it** if you have:
- Small use cases, demos or tutorials
- Static use cases, i.e. if you create a table once and never touch the schema again
- Solid SQL skills
- (A need for full control over DB connections)

**Don't use it** if you have:
- A complex DB schema
- An evolving DB
- A need for migrations and version control
- A need for Python objects

## What do we want to do now?

We organise a company running event ("Firmenlauf") and want to know the participants including their shoe size and team shoe color, so that we can order the shoes in the correct color and size. We started with the following very "inefficient" single table, which grew with every new piece of information and every new idea of what to keep track of.

![start_table](start_table.png)

The table includes a lot of duplicated information. We know that a **relational database** is a collection of structured data. Thus, we identified logical units in our single table, which we now want to insert into a database.

We want to create a DB containing the tables `teams`, `runners` and `trainings` like:

![RDB_example](RDB_example.png)

Note that this code only works with SQLite DBs, since we are using the Python API for SQLite: https://docs.python.org/3/library/sqlite3.html#

For PostgreSQL, use `psycopg2`: https://www.psycopg.org/docs/index.html

# Set up a DB connection
First, we create a simple SQLite DB and open a connection to it.

In [None]:
import sqlite3

In [None]:
# remove a DB file left over from a previous run (-f: no error if the file does not exist)
!rm -f firmenlauf_demo.db

In [None]:
database = "firmenlauf_demo.db"
connection = sqlite3.connect(database)

In [None]:
# create a DB cursor to be able to execute SQL statements
cur = connection.cursor()
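
Since the connection has to be managed manually, it also has to be closed at some point. One possible pattern, sketched here on a separate in-memory DB so that it does not interfere with `firmenlauf_demo.db`, combines `contextlib.closing` with the connection's own context manager. The names `demo_connection` and `demo` are only for illustration.

```python
import sqlite3
from contextlib import closing

# closing(...) makes sure the connection is closed when the outer block ends
with closing(sqlite3.connect(":memory:")) as demo_connection:
    # "with demo_connection" handles the transaction only:
    # commit if the block succeeds, rollback if it raises. It does not close the connection.
    with demo_connection:
        demo_connection.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT)")
        demo_connection.execute("INSERT INTO demo (name) VALUES (?)", ("Anna",))
    demo_rows = demo_connection.execute("SELECT id, name FROM demo").fetchall()
```

In this notebook we keep the connection open across cells and call `connection.commit()` explicitly instead.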

# Create tables

To define our tables, we have to write plain SQL statements.
Each table has a `PRIMARY KEY` that identifies every row, and a `UNIQUE` constraint that defines which combinations of values have to be unique in this table. This prevents us from inserting duplicates into the DB.

We also define relationships between the tables via the `FOREIGN KEY ... REFERENCES` clause.
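
One detail worth knowing: SQLite does not enforce foreign keys unless this is switched on per connection with `PRAGMA foreign_keys = ON`. The following sketch uses a separate in-memory DB with shortened versions of the `teams` and `runners` tables. It assumes an SQLite build with foreign key support, which is the usual case. The runner `'Ghost'` and the team id `99` are made up for the demonstration.

```python
import sqlite3

fk_connection = sqlite3.connect(":memory:")
# Enforcement is off by default and has to be enabled for every new connection
fk_connection.execute("PRAGMA foreign_keys = ON")

fk_connection.execute(
    "CREATE TABLE teams (id INTEGER PRIMARY KEY, shoe_color VARCHAR(20) NOT NULL UNIQUE)"
)
fk_connection.execute("""
    CREATE TABLE runners (
        id INTEGER PRIMARY KEY,
        first_name VARCHAR(30) NOT NULL,
        team_id INTEGER,
        FOREIGN KEY(team_id) REFERENCES teams (id)
    )
""")
fk_connection.execute("INSERT INTO teams (id, shoe_color) VALUES (3, 'Red')")
fk_connection.execute("INSERT INTO runners (first_name, team_id) VALUES ('Anna', 3)")

# Team 99 does not exist, so this insert should be rejected
try:
    fk_connection.execute("INSERT INTO runners (first_name, team_id) VALUES ('Ghost', 99)")
    fk_violation_rejected = False
except sqlite3.IntegrityError:
    fk_violation_rejected = True

fk_connection.close()
```

Without the `PRAGMA` line the last insert would be accepted and the runner would point to a team that does not exist.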

Learn more about SQL: https://www.w3schools.com/sql/default.asp

In [None]:
# Define queries to create new tables in plain SQL

sql_create_table_teams = """
    CREATE TABLE IF NOT EXISTS teams (
    	id INTEGER NOT NULL, 
    	size INTEGER NOT NULL, 
    	shoe_color VARCHAR(20) NOT NULL, 
    	PRIMARY KEY (id), 
    	UNIQUE (shoe_color)
    )
"""

sql_create_table_runners = """
    CREATE TABLE IF NOT EXISTS runners (
    	id INTEGER NOT NULL, 
    	first_name VARCHAR(30) NOT NULL, 
    	last_name VARCHAR(30) NOT NULL, 
    	shoe_size INTEGER NOT NULL, 
    	shirt_size INTEGER NOT NULL, 
    	distance FLOAT, 
    	team_id INTEGER, 
    	PRIMARY KEY (id), 
    	UNIQUE (first_name, last_name, team_id), 
    	FOREIGN KEY(team_id) REFERENCES teams (id)
    )
"""

sql_create_table_trainings = """
    CREATE TABLE IF NOT EXISTS trainings (
    	id INTEGER NOT NULL, 
    	date DATETIME NOT NULL, 
    	time DATETIME NOT NULL, 
    	distance FLOAT NOT NULL, 
    	runner_id INTEGER, 
    	PRIMARY KEY (id), 
    	UNIQUE (runner_id, date), 
    	FOREIGN KEY(runner_id) REFERENCES runners (id)
    )
"""

### Now, we execute these SQL statements and commit them to the DB.

In [None]:
cur.execute(sql_create_table_teams)

In [None]:
cur.execute(sql_create_table_runners)

In [None]:
cur.execute(sql_create_table_trainings)

In [None]:
connection.commit()

### We can now query the tables, but they will be empty

In [None]:
cur.execute("SELECT * FROM teams").fetchall()

# Insert data into the DB

Again, we first have to write our `INSERT` statements in plain SQL, then execute them and commit the changes.

Note that if we execute the same `INSERT` twice (for example by running the cell twice), we will get an `IntegrityError` for trying to insert duplicates into the DB. This is caused by the `UNIQUE` constraint we set in our `CREATE TABLE` statement.
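
A minimal sketch of how such a duplicate could be handled, again on a separate in-memory DB (the `dup_` names are only for illustration): we catch `sqlite3.IntegrityError` and roll back, so the table keeps its single row.

```python
import sqlite3

dup_connection = sqlite3.connect(":memory:")
dup_cur = dup_connection.cursor()
dup_cur.execute("""
    CREATE TABLE teams (
        id INTEGER NOT NULL,
        size INTEGER NOT NULL,
        shoe_color VARCHAR(20) NOT NULL,
        PRIMARY KEY (id),
        UNIQUE (shoe_color)
    )
""")

insert_red = "INSERT INTO teams (id, size, shoe_color) VALUES (3, 16, 'Red')"
dup_cur.execute(insert_red)
dup_connection.commit()

duplicate_error = None
try:
    # Same statement again: violates the constraints of the table
    dup_cur.execute(insert_red)
except sqlite3.IntegrityError as error:
    duplicate_error = str(error)
    dup_connection.rollback()

team_count = dup_cur.execute("SELECT COUNT(*) FROM teams").fetchone()[0]
dup_connection.close()
```

If duplicates should simply be skipped without an exception, SQLite also offers `INSERT OR IGNORE`. Whether that is appropriate depends on the use case.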

In [None]:
sql_insert_teams = [
    "INSERT INTO teams (id, size, shoe_color) VALUES (3, 16, 'Red')",
    "INSERT INTO teams (id, size, shoe_color) VALUES (5, 15, 'Green')",
    "INSERT INTO teams (id, size, shoe_color) VALUES (8, 11, 'Purple')",
]

In [None]:
for insert_team in sql_insert_teams:
    cur.execute(insert_team)
connection.commit()

### Now, we can query the inserted data

In [None]:
cur.execute("SELECT * FROM teams").fetchall()

### We will insert some more data into the DB

In [None]:
sql_insert_runners = """
    INSERT INTO runners (
        first_name, last_name, shoe_size, shirt_size, distance, team_id
    )
    VALUES (?, ?, ?, ?, ?, ?)
"""

runners = [
    ('Anna', 'Einstein', 38, 38, 5.0, 3),
    ('Marius', 'Fermi', 44, 60, 2.0, 5),
    ('James', 'Pauli', 44, 42, 10.0, 8),
    ('Selma', 'Meitner', 41, 40, 10.0, 3),
]

In [None]:
for runner in runners:
    cur.execute(sql_insert_runners, runner)
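
The `?` placeholders used above let `sqlite3` bind the values itself instead of pasting them into the SQL string. As a sketch of two related points, the loop can also be written with `executemany`, and bound values may contain quotes without breaking the statement. The example runs on a separate in-memory DB with a shortened `runners` table. The runner `Pat O'Neil` is made up for the demonstration.

```python
import sqlite3

many_connection = sqlite3.connect(":memory:")
many_connection.execute("""
    CREATE TABLE runners (
        id INTEGER PRIMARY KEY,
        first_name VARCHAR(30) NOT NULL,
        last_name VARCHAR(30) NOT NULL,
        shoe_size INTEGER NOT NULL
    )
""")

insert_runner = "INSERT INTO runners (first_name, last_name, shoe_size) VALUES (?, ?, ?)"

# executemany runs the same statement once per tuple in the list
demo_runners = [("Anna", "Einstein", 38), ("Marius", "Fermi", 44)]
many_connection.executemany(insert_runner, demo_runners)

# The apostrophe is stored as data; no manual escaping is needed
many_connection.execute(insert_runner, ("Pat", "O'Neil", 42))
many_connection.commit()

inserted_names = many_connection.execute(
    "SELECT first_name, last_name FROM runners ORDER BY id"
).fetchall()
many_connection.close()
```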

In [None]:
cur.execute("SELECT * FROM runners").fetchall()

In [None]:
sql_insert_trainings = """
    INSERT INTO trainings (
        date, time, distance, runner_id
    )
    VALUES (?, ?, ?, ?)
"""

trainings = [
    ("2023-07-15", "39:00", 4.500, 1),
    ("2023-08-05", "58:00", 3.000, 2),
    ("2023-08-07", "34:45", 1.600, 3),
    ("2023-07-08", "32:00", 4.050, 4),
    ("2023-07-18", "35:00", 4.500, 1),
    ("2023-07-25", "30:00", 4.500, 1),
    ("2023-09-07", "37:00", 5.456, 4),
    ("2023-07-19", "41:51", 2.240, 3),
    ("2023-07-28", "32:06", 1.600, 3),
]

In [None]:
for training in trainings:
    cur.execute(sql_insert_trainings, training)
connection.commit()

In [None]:
cur.execute("SELECT * FROM trainings").fetchall()

# `JOIN` tables

We can now join tables and retrieve a list of shoes we have to order for the runners. 

In [None]:
sql_query_runner_shoe_color = """
    SELECT runners.first_name, runners.shoe_size, teams.shoe_color 
    FROM runners
    JOIN teams
    ON runners.team_id = teams.id
"""

In [None]:
cur.execute(sql_query_runner_shoe_color).fetchall()
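
For the actual shoe order we might prefer the number of pairs per color and size instead of one row per runner. A possible variation of the join with `GROUP BY` is sketched below on a separate in-memory DB. The tables are shortened and the `pairs` alias is our own choice.

```python
import sqlite3

join_connection = sqlite3.connect(":memory:")
# executescript runs several SQL statements in one call
join_connection.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY, shoe_color VARCHAR(20) NOT NULL UNIQUE);
    CREATE TABLE runners (
        id INTEGER PRIMARY KEY,
        first_name VARCHAR(30) NOT NULL,
        shoe_size INTEGER NOT NULL,
        team_id INTEGER REFERENCES teams (id)
    );
    INSERT INTO teams (id, shoe_color) VALUES (3, 'Red'), (5, 'Green'), (8, 'Purple');
    INSERT INTO runners (first_name, shoe_size, team_id) VALUES
        ('Anna', 38, 3), ('Marius', 44, 5), ('James', 44, 8), ('Selma', 41, 3);
""")

# One row per (color, size) combination with the number of pairs to order
shoe_order = join_connection.execute("""
    SELECT teams.shoe_color, runners.shoe_size, COUNT(*) AS pairs
    FROM runners
    JOIN teams ON runners.team_id = teams.id
    GROUP BY teams.shoe_color, runners.shoe_size
    ORDER BY teams.shoe_color, runners.shoe_size
""").fetchall()
join_connection.close()
```

With our four runners every combination occurs exactly once, so `pairs` is 1 in each row. The query becomes more useful as soon as several runners of a team share a shoe size.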

# Show DB schema information

The `sqlite_master` table contains the schema information of the DB, with one row per table, index, view and trigger.

In [None]:
cur.execute("SELECT * FROM sqlite_master;")
print(cur.fetchall())

Filter available tables:

In [None]:
cur.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(cur.fetchall())
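
To look at the columns of a single table, SQLite also provides `PRAGMA table_info`. The sketch below runs on a separate in-memory DB with the `teams` table from above. The names `schema_connection` and `column_types` are only for illustration.

```python
import sqlite3

schema_connection = sqlite3.connect(":memory:")
schema_connection.execute("""
    CREATE TABLE teams (
        id INTEGER NOT NULL,
        size INTEGER NOT NULL,
        shoe_color VARCHAR(20) NOT NULL,
        PRIMARY KEY (id),
        UNIQUE (shoe_color)
    )
""")

# Each result row has the form (cid, name, type, notnull, dflt_value, pk)
columns = schema_connection.execute("PRAGMA table_info(teams)").fetchall()

# Map every column name to its declared type
column_types = {name: col_type for _, name, col_type, *_ in columns}
schema_connection.close()
```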

---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © [Point 8 GmbH](https://point-8.de)_