<div style="position: relative;">
<img src="https://user-images.githubusercontent.com/7065401/98728503-5ab82f80-2378-11eb-9c79-adeb308fc647.png"></img>

<h1 style="color: white; position: absolute; top:27%; left:10%;">
    MySQL and MariaDB for Python Developers
</h1>

<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:55%; left:10%;">
    David Mertz, Ph.D.
</h3>

<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:62%; left:10%;">
    Data Scientist
</h3>
</div>

# Python DB-API

Python defines a standard interface for all SQL relational database systems, called the DB-API.  Most database drivers within the Python ecosystem follow this API standard; any features specific to a particular Relational Database Management System (RDBMS), such as MySQL, are communicated at the SQL level rather than with special Python methods.

The Python Enhancement Proposal (PEP) 249 describes the requirements of the DB-API 2.0.  Details of the degree of support and choice among optional features are exposed in module interfaces.

## Adapter capabilities

For comparison, let us inspect adapters to an SQLite database and a MySQL database.  Some capabilities are reported as compact codes, which the two tables below explain.

---

| threadsafety | Meaning
|-------------:|:--------------------------------------
| 0            | Threads may not share the module.
| 1            | Threads may share the module, but not connections.
| 2            | Threads may share the module and connections.
| 3            | Threads may share the module, connections and cursors.

---

| paramstyle | Meaning
|-----------:|:----------------------------------------
| qmark      | Question mark style, e.g. ...WHERE name=?
| numeric    | Numeric, positional style, e.g. ...WHERE name=:1
| named      | Named style, e.g. ...WHERE name=:name
| format     | ANSI C printf format codes, e.g. ...WHERE name=%s
| pyformat   | Python extended format codes, e.g. ...WHERE name=%(name)s

In [1]:
import sqlite3
print(f"API level       | {sqlite3.apilevel}")
print(f"Parameter style | {sqlite3.paramstyle}")
print(f"Thread safety   | {sqlite3.threadsafety}")

API level       | 2.0
Parameter style | qmark
Thread safety   | 1


In [2]:
import mysql.connector
print(f"API level       | {mysql.connector.apilevel}")
print(f"Parameter style | {mysql.connector.paramstyle}")
print(f"Thread safety   | {mysql.connector.threadsafety}")

API level       | 2.0
Parameter style | pyformat
Thread safety   | 1
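
Because every DB-API module exposes the same three attributes, the comparison can be wrapped in a small helper.  The following is only a sketch: `describe_adapter` and the `THREADSAFETY` lookup are illustrative names invented here, not part of any driver.

```python
import sqlite3

# Meanings copied from the threadsafety table above (PEP 249)
THREADSAFETY = {
    0: "Threads may not share the module.",
    1: "Threads may share the module, but not connections.",
    2: "Threads may share the module and connections.",
    3: "Threads may share the module, connections and cursors.",
}

def describe_adapter(module):
    """Return the DB-API capability attributes of a driver module.

    Hypothetical helper for illustration only.
    """
    level = module.threadsafety
    return {
        "apilevel": module.apilevel,
        "paramstyle": module.paramstyle,
        "threadsafety": level,
        "meaning": THREADSAFETY.get(level, "unknown"),
    }

info = describe_adapter(sqlite3)
for key, value in info.items():
    print(f"{key:<12} | {value}")
```

Passing `mysql.connector` instead of `sqlite3` should report the values shown in the cell above, assuming the driver is installed.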


## Preparing the server

Before you get started with the steps in the lesson, you will need to configure your MySQL or MariaDB server as an administrator.  Specifically, you need to run these SQL commands to create the database and the user we will work with:

```sql
CREATE DATABASE ine;
CREATE USER 'ine_student'@'localhost' IDENTIFIED BY 'ine-password';
GRANT ALL PRIVILEGES ON ine.* TO 'ine_student'@'localhost';
```

If the server is on a remote machine, a host address other than `localhost` will be needed.

## A connection and cursors

The threadsafety level of 1 that our `mysql.connector` adapter reports tells us that threads may share the module, but each thread needs its own connection (and therefore its own cursors).  This lesson does not cover Python threading in depth, which is a separate course, but within the main thread we can create multiple cursors on a single connection, if we wish.  For this lesson, we simply assume that a database called `ine` exists, and that the MySQL user and password configured above will work.

In [3]:
user = 'ine_student'
pwd = 'ine-password'
host = 'localhost'
port = '3306'
db = 'ine'
conn = mysql.connector.connect(database=db, host=host, user=user, password=pwd, port=port)
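
Hard-coding credentials is fine for a lesson, but in other settings they are often read from the environment.  The sketch below assumes hypothetical `INE_DB_*` variable names (they are not defined anywhere in this lesson) and falls back to the values used above.

```python
import os

def connection_settings(env=os.environ):
    """Build keyword arguments for a DB-API connect() call.

    The INE_DB_* variable names are hypothetical; the defaults mirror
    the values used in this lesson.
    """
    return {
        "user": env.get("INE_DB_USER", "ine_student"),
        "password": env.get("INE_DB_PASSWORD", "ine-password"),
        "host": env.get("INE_DB_HOST", "localhost"),
        "port": int(env.get("INE_DB_PORT", "3306")),
        "database": env.get("INE_DB_NAME", "ine"),
    }

# Pass a plain dict to try it out without touching the real environment
settings = connection_settings({"INE_DB_HOST": "db.example.com"})
print(settings["host"], settings["port"])
# conn = mysql.connector.connect(**settings)   # requires a reachable server
```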

If it is convenient, we can work with multiple cursors.  Keep in mind, however, that a commit or a rollback happens at the connection level, and so affects the work of every cursor on that connection.  Multiple cursors can still be useful, for example, to create temporary cursors within a function while only passing around a connection object.

In [4]:
cur = conn.cursor()
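
As a minimal sketch of the "temporary cursor inside a function" idea, the hypothetical helper below receives only a connection and closes its cursor when done.  It uses the standard library `sqlite3` as a stand-in so that it runs without a server; the same pattern should apply to any DB-API connection whose cursors have a `.close()` method.

```python
import sqlite3
from contextlib import closing

def count_rows(conn, table):
    """Count rows using a short-lived cursor; only `conn` is passed in."""
    # Table names cannot be bound as parameters, so only call this
    # with trusted, hard-coded names.
    with closing(conn.cursor()) as cursor:
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        return cursor.fetchone()[0]

demo = sqlite3.connect(":memory:")   # stand-in for the MySQL connection
demo.execute("CREATE TABLE t (x INTEGER)")
demo.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
print(count_rows(demo, "t"))
```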

The main action we perform with a cursor is to `.execute()` SQL statements.

In [5]:
# Create the table with cursor#1
sql_create = """
CREATE TABLE IF NOT EXISTS users (
  user_id SERIAL PRIMARY KEY,
  username VARCHAR(50) UNIQUE NOT NULL,
  password VARCHAR(50) NOT NULL,
  age SMALLINT,
  created_on TIMESTAMP NOT NULL
);
"""
cur.execute(sql_create)
# Remove any rows if table had existed
cur.execute("DELETE FROM users;")

MySQL supports `IF NOT EXISTS` as an extension in `CREATE TABLE` statements.  The table may or may not have existed initially, but the statement will not fail if it did.  However, if the table already exists, a second `CREATE TABLE` with this option ignores the field names and data types in the new SQL statement.

Unlike some other RDBMSs, MySQL does not make `CREATE TABLE` part of a transaction: data definition statements cause an implicit commit, so the table exists as soon as the statement succeeds.  The `DELETE`, however, is an ordinary data change (on a table using the default InnoDB storage engine) and remains pending until we commit.  It is *possible* that some action by another connection would be inconsistent with ours, and the transaction would be rolled back.  In this case, and most cases, a commit will succeed.

In [6]:
conn.commit()

As described, a `CREATE TABLE IF NOT EXISTS` can succeed at the query level, but not alter a table.

In [7]:
sql_bad_create = """
CREATE TABLE IF NOT EXISTS users (
  not_an_id SERIAL PRIMARY KEY,
  not_a_user INTEGER UNIQUE NOT NULL,
  not_a_password VARCHAR(30) NOT NULL
);
"""
cur.execute(sql_bad_create)
conn.commit()

We can check the table structure using a query, and verify which version exists in the database.

In [8]:
sql_schema = """
SELECT column_name, data_type, character_maximum_length, 
       column_default, is_nullable
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE table_name = 'users';
"""
cur.execute(sql_schema)
cur.fetchall()

[('user_id', 'bigint', None, None, 'NO'),
 ('username', 'varchar', 50, None, 'NO'),
 ('password', 'varchar', 50, None, 'NO'),
 ('age', 'smallint', None, None, 'YES'),
 ('created_on', 'timestamp', None, None, 'NO')]
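
SQLite also accepts `IF NOT EXISTS`, so the same behavior can be reproduced without a server.  This is only a sketch using the standard library `sqlite3` with a simplified table; `PRAGMA table_info` is SQLite's rough counterpart to the `INFORMATION_SCHEMA.COLUMNS` query above.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE IF NOT EXISTS users "
    "(user_id INTEGER PRIMARY KEY, username TEXT)")
# The second definition is silently skipped because the table exists
conn_demo.execute(
    "CREATE TABLE IF NOT EXISTS users (not_an_id INTEGER PRIMARY KEY)")

# Each row of table_info is (cid, name, type, notnull, default, pk)
columns = [row[1] for row in conn_demo.execute("PRAGMA table_info(users)")]
print(columns)
```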

## Working with data

With the table we created above, let us write some data to it.  Remember that the `mysql.connector` adapter uses the `pyformat` parameter style.  Note also that the adapter is only thread-safe at the level of sharing the module, not connections.  To handle that, we might create a *connection pool* for threads to draw from.

In [9]:
from queue import Queue
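pool = Queue(maxsize=5)  # Keep around 5 connection objects
for _ in range(5):
    # Note: this rebinds the global name `conn` on each pass, so afterwards
    # `conn` refers to the most recently created pooled connection
    conn = mysql.connector.connect(
                database=db, host=host, user=user, password=pwd, port=port)
    pool.put(conn)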
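
One way to make sure a borrowed connection always returns to the queue is a small context manager.  This is a sketch, not part of `mysql.connector`: the name `borrowed` is invented here, and placeholder strings stand in for real connections so that the example runs without a server.

```python
from contextlib import contextmanager
from queue import Queue

@contextmanager
def borrowed(pool):
    """Take a connection from the pool and always give it back."""
    conn = pool.get()        # blocks until a connection is available
    try:
        yield conn
    finally:
        pool.put(conn)       # runs even if the body raised

# Demonstration with placeholder strings instead of real connections
demo_pool = Queue(maxsize=2)
for name in ("conn-a", "conn-b"):
    demo_pool.put(name)

try:
    with borrowed(demo_pool) as c:
        in_use = demo_pool.qsize()    # one connection is checked out
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
print(in_use, demo_pool.qsize())
```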

A thread-safe function utilizing the pool can add users.

In [10]:
from datetime import datetime
from threading import Thread

def add_user(pool, user):
    # Need to get a connection from the pool
    conn = pool.get()
    try:
        cursor = conn.cursor()
        user['now'] = datetime.now().isoformat()
        user['age'] = user.get('age')   # Default to None if no age given
        sql = """INSERT INTO users (username, password, age, created_on)
                 VALUES (%(username)s, %(password)s, %(age)s, %(now)s)"""
        cursor.execute(sql, user)
    finally:
        # When we are done with connection, put it back in the pool,
        # even if the insert raised an exception
        pool.put(conn)
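
Passing the values as a separate argument, rather than formatting them into the SQL string, lets the driver take care of quoting.  The sketch below illustrates this with the standard library `sqlite3`, whose `named` style writes `:username` where `mysql.connector` writes `%(username)s`; the hostile input is a made-up example.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE users (username TEXT, age INTEGER)")

# A made-up value that would be dangerous if pasted into the SQL text
hostile = {"username": "x'); DROP TABLE users; --", "age": None}

# Bound parameters: the driver quotes the values for us, so the
# string is stored literally instead of being interpreted as SQL
conn_demo.execute(
    "INSERT INTO users (username, age) VALUES (:username, :age)", hostile)

stored = conn_demo.execute("SELECT username, age FROM users").fetchall()
print(stored)
```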

We can call this function with user data a few times, each in a separate thread.

In [11]:
users_info = [
  dict(username='Alice', password='bad_pw_1', age=37),
  dict(username='Bob', password='bad_pw_2'),
  dict(username='Carlos', password='bad_pw_3', age=62)
]
for user_info in users_info:
    t = Thread(target=add_user, args=(pool, user_info))
    t.start()
    t.join()
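
In the loop above, `t.join()` directly follows `t.start()`, so the three inserts run one after another.  To let threads overlap, start them all before joining any of them.  The following is a minimal sketch of that pattern; placeholder strings stand in for connection objects, and `worker` is a hypothetical function that only records which "connection" it borrowed.

```python
from queue import Queue
from threading import Lock, Thread

demo_pool = Queue(maxsize=2)
for n in range(2):
    demo_pool.put(f"conn-{n}")       # placeholders for connection objects

handled = []
handled_lock = Lock()

def worker(pool, item):
    conn = pool.get()                # waits here when both are in use
    try:
        with handled_lock:           # protect the shared list
            handled.append((item, conn))
    finally:
        pool.put(conn)

threads = [Thread(target=worker, args=(demo_pool, i)) for i in range(6)]
for t in threads:
    t.start()                        # launch every thread first
for t in threads:
    t.join()                         # then wait for all of them

print(len(handled), demo_pool.qsize())
```

With six tasks and two pooled objects, at most two workers hold a "connection" at any moment; the rest block in `pool.get()` until one is returned.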

So far, so good.  However, these data have not actually been stored in the database; they are pending in uncommitted transactions on the connections that performed the inserts.  A cursor on any other connection, such as the one `conn` refers to here, cannot see them.

In [12]:
cur = conn.cursor()
cur.execute("SELECT * FROM users;")
print([f[0] for f in cur.description])
print('-----\nTuples:')
for row in cur:
    print(row)

['user_id', 'username', 'password', 'age', 'created_on']
-----
Tuples:


To make the data available to all connections, we need to commit it.  Since the inserts were spread over a connection pool, we want every pooled connection to commit.  This would be simpler if we knew we would only ever use a single thread, but the approach here can scale to heavier load, if we need it.  For writing a small number of rows at a time, the threaded approach is overhead without gain; in other scenarios it can be a huge improvement.

In [13]:
def commit_all(pool):
    for _ in range(pool.qsize()):
        conn = pool.get()
        conn.commit()
        pool.put(conn)

commit_all(pool)

Checking again, we see our committed data.

In [14]:
cur = conn.cursor()
cur.execute("SELECT * FROM users;")
print([f[0] for f in cur.description])
print('-----\nTuples:')
for row in cur:
    print(row)

['user_id', 'username', 'password', 'age', 'created_on']
-----
Tuples:
(1, 'Alice', 'bad_pw_1', 37, datetime.datetime(2021, 1, 9, 22, 19, 3))
(2, 'Bob', 'bad_pw_2', None, datetime.datetime(2021, 1, 9, 22, 19, 3))
(3, 'Carlos', 'bad_pw_3', 62, datetime.datetime(2021, 1, 9, 22, 19, 3))
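
The visibility rule seen in the last few cells can be reproduced with two connections to a throwaway SQLite file.  This is a sketch using the standard library `sqlite3` as a stand-in, with a simplified one-column table; the names `writer`, `reader`, and `visible_rows` are illustrative.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE users (username TEXT)")
writer.commit()

def visible_rows(conn):
    """Number of rows in `users` as seen by this connection."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchall()[0][0]

writer.execute("INSERT INTO users VALUES ('Alice')")   # pending, not committed
before = visible_rows(reader)    # the other connection cannot see the row
writer.commit()
after = visible_rows(reader)     # once committed, it can

print(before, after)
writer.close()
reader.close()
```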


## Uncommitted data

A batch of SQL statements may not succeed.  In such a case, we may not wish for *any* of them to be recorded, so we call `.rollback()` on the connection to tell the server to discard the pending transaction.  We might roll back because of a problem the server reports, or because of something we determine at the application level.

In [15]:
def add_many(pool, users_info):
    sql = """INSERT INTO users (username, password, age, created_on)
         VALUES (%(username)s, %(password)s, %(age)s, %(now)s)"""
    # Get the connection before the try block, so the except and
    # finally clauses can always refer to it
    conn = pool.get()
    try:
        cur = conn.cursor()
        for user in users_info:
            if 'password' in user['password']:
                raise ValueError(f"Terrible password for {user['username']}")
            user['age'] = user.get('age')   # Default to None if no age given
            user['now'] = datetime.now().isoformat()
        # Insert batch all at once
        cur.executemany(sql, users_info)
    except Exception as err:
        conn.rollback()
        print("Transaction rolled back because of:", type(err).__name__)
        print(err)
        return False
    else:
        conn.commit()
        return True
    finally:
        pool.put(conn)

Perhaps the datatypes are wrong:

In [16]:
users_bad_data = [
    dict(username='Dave', password='insecure_1'),
    dict(username='Erin', password='insecure_2'),
    dict(username='Faythe', password='insecure_3', age="ABC")
]
add_many(pool, users_bad_data)

Transaction rolled back because of: DatabaseError
1366 (HY000): Incorrect integer value: 'ABC' for column 'age' at row 3


False

Or perhaps a uniqueness constraint is violated:

In [17]:
users_dup_data = [
    dict(username='Dave', password='insecure_1'),
    dict(username='Erin', password='insecure_2'),
    dict(username='Carlos', password='bad_pw_4')
]
add_many(pool, users_dup_data)

Transaction rolled back because of: IntegrityError
1062 (23000): Duplicate entry 'Carlos' for key 'users.username'


False

Or it might be that the application itself is able to exclude some data:

In [18]:
users_app_rules = [
    dict(username='Grace', password='insecure_77'),
    dict(username='Heidi', password='insecure_88'),
    dict(username='Ivan', password='password_55')
]
add_many(pool, users_app_rules)

Transaction rolled back because of: ValueError
Terrible password for Ivan


False
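
The all-or-nothing behavior of these batches can also be tried without a MySQL server.  The sketch below uses the standard library `sqlite3` as a stand-in, with a simplified table and a hypothetical `add_batch` helper; two of the three rows are inserted before the duplicate fails, and the rollback discards them.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE users (username TEXT UNIQUE NOT NULL)")
conn_demo.execute("INSERT INTO users VALUES ('Carlos')")
conn_demo.commit()

def add_batch(conn, names):
    """Insert every name or none of them."""
    try:
        conn.executemany(
            "INSERT INTO users VALUES (?)", [(n,) for n in names])
    except sqlite3.Error as err:
        conn.rollback()              # discards the rows inserted so far
        print("Rolled back:", type(err).__name__)
        return False
    conn.commit()
    return True

ok = add_batch(conn_demo, ["Dave", "Erin", "Carlos"])
remaining = [r[0] for r in conn_demo.execute("SELECT username FROM users")]
print(ok, remaining)
```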

## Working in batches

For the last few cells, we will configure the connection to AUTOCOMMIT.  That is, every time a statement such as an insertion is executed, a COMMIT is implicitly performed afterwards.

In [19]:
conn.autocommit = True

There are several ways to fetch the results of a query.  We can use `.fetchone()`, `.fetchmany()`, or `.fetchall()`.  We can also loop over the cursor object to bind each row or, using the same Python iterator protocol, call `next(cursor)`.
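
The fetch styles can be mixed on one result set, each picking up where the previous one stopped.  A minimal sketch, using the standard library `sqlite3` as a stand-in and a made-up table of integers:

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE n (x INTEGER)")
conn_demo.executemany("INSERT INTO n VALUES (?)", [(i,) for i in range(1, 7)])

cursor = conn_demo.cursor()
cursor.execute("SELECT x FROM n ORDER BY x")

first = cursor.fetchone()        # a single row, or None when exhausted
pair = cursor.fetchmany(2)       # a list of up to 2 rows
via_next = next(cursor)          # iterator protocol, same as one loop step
rest = cursor.fetchall()         # everything that is left
print(first, pair, via_next, rest)
```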

A similar batch capability is available for executing statements, namely `.executemany()`.  In concept, this could be many SELECT queries, but more commonly, it is many INSERT or UPDATE commands.

In [20]:
now = datetime.now().isoformat()
users_more = [
    dict(username='Sybil', password='M7c&sd31&0hA', age=44, created_on=now),
    dict(username='Trudy', password='y9bD6SA2O%$t', age=22, created_on=now),
    dict(username='Vanna', password='9$Ts9HK*3!tR', age=55, created_on=now)
]
sql = """
INSERT INTO users (username, password, age, created_on) 
VALUES (%(username)s, %(password)s, %(age)s, %(created_on)s)
"""
cur.executemany(sql, users_more)

In practice, you probably want to catch exceptions and do conditional rollbacks and remediation around your `.executemany()` calls.  But we assume it succeeded, and was automatically committed.

Querying the table again, `cur.description` gives details on each column returned by the query; its length is the number of columns.

In [21]:
cur.execute('SELECT user_id, username, age FROM users;')
for item in cur.description:
    print(item)
cur.fetchall()

('user_id', 8, None, None, None, None, 0, 49703)
('username', 253, None, None, None, None, 0, 20485)
('age', 2, None, None, None, None, 1, 32768)


[(1, 'Alice', 37),
 (2, 'Bob', None),
 (3, 'Carlos', 62),
 (10, 'Sybil', 44),
 (11, 'Trudy', 22),
 (12, 'Vanna', 55)]
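
Since the first item of each `description` entry is the column name, rows can be paired with their names.  The helper `rows_as_dicts` below is a hypothetical sketch, demonstrated with the standard library `sqlite3` and a simplified table rather than the MySQL connection.

```python
import sqlite3

def rows_as_dicts(cursor):
    """Pair each fetched row with the column names in cursor.description."""
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE users (user_id INTEGER, username TEXT, age INTEGER)")
conn_demo.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Alice", 37), (2, "Bob", None)])

cursor = conn_demo.cursor()
cursor.execute("SELECT user_id, username, age FROM users ORDER BY user_id")
records = rows_as_dicts(cursor)
print(len(cursor.description), records)
```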

One thing to notice is that the SERIAL `user_id` values jump from 3 to 10: the counter was advanced by the failed insertions that were never committed.  This makes sense since a unique sequential number has to be assigned before the server can know whether that transaction will be committed.

Depending on the particular adapter, the `cursor.rowcount` attribute either reports in advance how many rows a query produced or, as with `mysql.connector` here, only counts the rows fetched so far.  E.g.:

In [22]:
cur.execute('SELECT username FROM users;')
for name in cur:
    print(cur.rowcount, name[0])

1 Alice
2 Bob
3 Carlos
4 Sybil
5 Trudy
6 Vanna
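
As a point of comparison, here is a sketch of how the standard library `sqlite3` adapter reports `rowcount`, using a simplified table: it counts rows modified by a batch of inserts, but gives no count at all for a SELECT.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE users (username TEXT)")

cursor = conn_demo.cursor()
cursor.executemany(
    "INSERT INTO users VALUES (?)", [("Alice",), ("Bob",), ("Carlos",)])
inserted = cursor.rowcount       # rows modified by the whole batch

cursor.execute("SELECT username FROM users")
selected = cursor.rowcount       # sqlite3 does not count SELECT results
cursor.fetchall()
print(inserted, selected)
```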


## Summary

Using adapters that follow the DB-API requires learning only a few fairly simple APIs, while offering flexibility at the Python level.  Once you have mastered those, everything else you really need to know is specific to MySQL as an RDBMS, and is accessed via SQL interfaces rather than Python functions or methods.