The goal here is to create the **Sakila database**. The files needed are in `sakila-db/`:

* `sakila-schema.sql` - defines the database schema.
* `sakila-data.sql` - contains the data inserts.
* `sakila.mwb` - a MySQL Workbench model of the Sakila database (I don't use this).

Those files were downloaded from https://dev.mysql.com/doc/sakila/en/sakila-installation.html, which also lists the steps required for installation. Instead of using the `mysql` CLI, though, I'll be creating the database directly from this notebook.

To make that work, I had to make some changes to `sakila-schema.sql`, which I saved as `sakila-schema-corrected.sql` (it just removes the delimiters, which aren't allowed with the `mysql-connector-python` driver I'm using). I also changed the Docker Compose file to include a MySQL database service, which is the one I'll use. There's also a MariaDB service in there, if you want to test with that instead (in my experience, the scripts won't work directly, and you'll need to disable some key constraints).
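
The delimiter clean-up mentioned above could be sketched roughly as follows. This is only a guess at one way to produce such a file, not necessarily how `sakila-schema-corrected.sql` was actually made: the helper name `strip_delimiters` is hypothetical, and it assumes the script switches terminators with `DELIMITER` lines (for example `DELIMITER ;;`) and ends the affected statements with that custom terminator at the end of a line.

```python
import re

# Matches lines such as "DELIMITER ;;" or "DELIMITER //" (a mysql client command)
DELIMITER_LINE = re.compile(r"^\s*DELIMITER\s+(\S+)\s*$", re.IGNORECASE)


def strip_delimiters(sql_text: str) -> str:
    """Drop DELIMITER lines and turn custom terminators back into ';'."""
    current = ";"  # terminator currently in effect
    cleaned = []
    for line in sql_text.splitlines():
        match = DELIMITER_LINE.match(line)
        if match:
            # Remember the new terminator and drop the directive itself
            current = match.group(1)
            continue
        stripped = line.rstrip()
        if current != ";" and stripped.endswith(current):
            # e.g. "END;;" or "END //" becomes "END;"
            line = stripped[: -len(current)].rstrip() + ";"
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```

Whether the cleaned script then runs correctly still depends on how the driver splits multi-statement input, so the result is worth checking against the database afterwards.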

In [1]:
# install packages on the fly
# this shouldn't be necessary
# %pip install ipykernel ipython numpy pandas pyspark==3.3.4 jupyter findspark mysql-connector-python

In [2]:
import mysql.connector

In [3]:
# test the MySQL hostname resolution

# this will (probably?) fail
import socket
try:
    print(socket.gethostbyname("mysql"))
except Exception as e:
    print(f"Hostname resolution error: {e}")

Hostname resolution error: [Errno -3] Temporary failure in name resolution


For some reason, the hostname resolution does not work.

The workaround is to use the resolved IP address directly.

<div style="color:red;">You have to look this value up.</div> In my case it was 172.18.0.2.
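
To avoid editing the host by hand in every cell, the workaround could be wrapped in a small helper that tries the service name first and only falls back to a hard-coded address. This is a minimal sketch: `resolve_host` and its `resolver` parameter are hypothetical names, and the fallback IP is simply the one from this run (yours will likely differ; `docker inspect` on the MySQL container is one way to find it).

```python
import socket


def resolve_host(hostname, fallback_ip, resolver=socket.gethostbyname):
    """Return the IP for `hostname`, or `fallback_ip` if resolution fails."""
    try:
        return resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        print(f"Could not resolve {hostname!r} ({exc}); using {fallback_ip}")
        return fallback_ip
```

Something like `resolve_host("mysql", "172.18.0.2")` could then supply the `host` value for the connection settings.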

In [7]:
try:
    connection = mysql.connector.connect(
        # host='mysql',
        host='172.18.0.2',  # Use the resolved IP address
        user='root',
        password='rootpassword',
        database='sakila'
    )
    print("Connected to MySQL successfully.")
except mysql.connector.Error as err:
    print(f"Error: {err}")

Connected to MySQL successfully.


In [8]:
# Connect to the MySQL database
db_config = {
    # 'host': 'mysql',  # The name of the MySQL service in your Docker Compose file
    'host': '172.18.0.2',
    'user': 'root',
    'password': 'rootpassword',
    'database': 'sakila'
}

In [9]:
try:
    connection = mysql.connector.connect(**db_config)
    cursor = connection.cursor()

    # Run your SQL queries as needed...
    cursor.execute("SELECT DATABASE();")
    print(cursor.fetchone())

except mysql.connector.Error as err:
    print(f"Error: {err}")

finally:
    if connection.is_connected():
        cursor.close()
        connection.close()

('sakila',)


This is the useful part of the notebook: the next cell runs the schema script and then the data script.

In [10]:
try:
    connection = mysql.connector.connect(**db_config)
    cursor = connection.cursor()

    # Run the corrected schema script (sakila-schema-corrected.sql)
    with open('./sakila-db/sakila-schema-corrected.sql', 'r') as schema_file:
        schema_sql = schema_file.read()
        for result in cursor.execute(schema_sql, multi=True):
            # Consume all results
            print(result)

    connection.commit()  # Commit schema changes
    
    # Run sakila-data.sql
    with open('./sakila-db/sakila-data.sql', 'r') as data_file:
        data_sql = data_file.read()
        for result in cursor.execute(data_sql, multi=True):
            # Consume all results
            print(result)

    connection.commit()  # Commit data changes
    print("Sakila database loaded successfully.")

except mysql.connector.Error as err:
    print(f"Error: {err}")

finally:
    if connection.is_connected():
        cursor.close()
        connection.close()

CMySQLCursor: LOSS OF USE, DATA, OR
-- PROFITS
CMySQLCursor: OR BUSINESS INTERRUPTION) HOWEVER CAUSED..
CMySQLCursor: SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, ..
CMySQLCursor: SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KE..
CMySQLCursor: SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='..
CMySQLCursor: DROP SCHEMA IF EXISTS sakila
CMySQLCursor: CREATE SCHEMA sakila
CMySQLCursor: USE sakila
CMySQLCursor: SET @@default_storage_engine = 'MyISAM'
CMySQLCursor: /*!50610 SET @@default_storage_engine = ..
CMySQLCursor: CREATE TABLE film_text (
  film_id SMALL..
CMySQLCursor: SET @@default_storage_engine = @old_defa..
CMySQLCursor: END
CMySQLCursor: CREATE TRIGGER `upd_film` AFTER UPDATE O..
CMySQLCursor: END IF
CMySQLCursor: END
CMySQLCursor: CREATE TRIGGER `del_film` AFTER DELETE O..
CMySQLCursor: END
CMySQLCursor: DECLARE last_month_end DATE
CMySQLCursor: /* Some sanity checks... */
    IF min_m..
CMySQLCursor: LEAVE proc
CMySQLCursor: END IF
CMySQLCursor: IF min_dollar_amount_purchased = 0.00 TH..
CMySQL
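
Since the log above is long and hard to scan, a quick sanity check after loading is to count the rows in a few tables. The helper below is a sketch, not part of the original notebook: `count_rows` is a hypothetical name, it only relies on the generic DB-API `execute`/`fetchone` calls, and it is demonstrated against an in-memory SQLite table as a stand-in so it can run without the MySQL container.

```python
import sqlite3


def count_rows(cursor, tables):
    """Return {table: row_count} using any DB-API cursor."""
    counts = {}
    for table in tables:
        # Table names cannot be bound as parameters, so validate them first
        if not table.isidentifier():
            raise ValueError(f"Unexpected table name: {table!r}")
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cursor.fetchone()[0]
    return counts


# Stand-in demo: a throwaway SQLite table instead of the real MySQL database
demo = sqlite3.connect(":memory:")
demo_cursor = demo.cursor()
demo_cursor.execute("CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT)")
demo_cursor.executemany(
    "INSERT INTO actor (first_name) VALUES (?)", [("first",), ("second",)]
)
print(count_rows(demo_cursor, ["actor"]))  # prints {'actor': 2}
demo.close()
```

With the MySQL connection from this notebook, the same function should work on the `cursor` object, for example `count_rows(cursor, ["actor", "film"])`.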

In [11]:
try:
    connection = mysql.connector.connect(**db_config)
    cursor = connection.cursor()

    # Query to get table schema
    query = """
    SELECT
        COLUMNS.TABLE_NAME AS `Table Name`,
        COLUMNS.COLUMN_NAME AS `Column Name`,
        COLUMNS.COLUMN_TYPE AS `Data Type`,
        COLUMNS.IS_NULLABLE AS `Nullable`,
        COLUMNS.COLUMN_DEFAULT AS `Default`,
        COLUMNS.EXTRA AS `Extra`,
        KEY_COLUMN_USAGE.CONSTRAINT_NAME AS `Constraint Name`,
        KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME AS `Referenced Table`,
        KEY_COLUMN_USAGE.REFERENCED_COLUMN_NAME AS `Referenced Column`
    FROM
        INFORMATION_SCHEMA.COLUMNS
    LEFT JOIN
        INFORMATION_SCHEMA.KEY_COLUMN_USAGE
        ON COLUMNS.TABLE_SCHEMA = KEY_COLUMN_USAGE.TABLE_SCHEMA
        AND COLUMNS.TABLE_NAME = KEY_COLUMN_USAGE.TABLE_NAME
        AND COLUMNS.COLUMN_NAME = KEY_COLUMN_USAGE.COLUMN_NAME
    WHERE
        COLUMNS.TABLE_SCHEMA = 'sakila'  -- Replace with your database name
    ORDER BY
        COLUMNS.TABLE_NAME, COLUMNS.COLUMN_NAME;
    """

    cursor.execute(query)

    # Fetch and print results
    for row in cursor.fetchall():
        print(row)

except mysql.connector.Error as err:
    print(f"Error: {err}")

finally:
    if connection.is_connected():
        cursor.close()
        connection.close()

('actor', 'actor_id', 'smallint unsigned', 'NO', None, 'auto_increment', 'PRIMARY', None, None)
('actor', 'first_name', 'varchar(45)', 'NO', None, '', None, None, None)
('actor', 'last_name', 'varchar(45)', 'NO', None, '', None, None, None)
('actor', 'last_update', 'timestamp', 'NO', 'CURRENT_TIMESTAMP', 'DEFAULT_GENERATED on update CURRENT_TIMESTAMP', None, None, None)
('actor_info', 'actor_id', 'smallint unsigned', 'NO', '0', '', None, None, None)
('actor_info', 'film_info', 'text', 'YES', None, '', None, None, None)
('actor_info', 'first_name', 'varchar(45)', 'NO', None, '', None, None, None)
('actor_info', 'last_name', 'varchar(45)', 'NO', None, '', None, None, None)
('address', 'address', 'varchar(50)', 'NO', None, '', None, None, None)
('address', 'address2', 'varchar(50)', 'YES', None, '', None, None, None)
('address', 'address_id', 'smallint unsigned', 'NO', None, 'auto_increment', 'PRIMARY', None, None)
('address', 'city_id', 'smallint unsigned', 'NO', None, '', 'fk_address_ci

In [12]:
try:
    connection = mysql.connector.connect(**db_config)
    cursor = connection.cursor()

    # Query to get simplified table schema
    query = """
    SELECT
        COLUMNS.TABLE_NAME AS `Table Name`,
        COLUMNS.COLUMN_NAME AS `Column Name`,
        COLUMNS.COLUMN_TYPE AS `Data Type`
    FROM
        INFORMATION_SCHEMA.COLUMNS
    WHERE
        COLUMNS.TABLE_SCHEMA = 'sakila'  -- Replace with your database name
    ORDER BY
        COLUMNS.TABLE_NAME, COLUMNS.ORDINAL_POSITION;
    """

    cursor.execute(query)

    # Group the results by table and format the output
    current_table = None
    for row in cursor.fetchall():
        table_name = row[0]
        column_name = row[1]
        data_type = row[2]

        # Check if we've moved to a new table
        if table_name != current_table:
            if current_table is not None:
                print()  # Print a blank line between tables
            print(f"Table: {table_name}")
            current_table = table_name

        print(f"  Column: {column_name} Type: {data_type}")

except mysql.connector.Error as err:
    print(f"Error: {err}")

finally:
    if connection.is_connected():
        cursor.close()
        connection.close()

Table: actor
  Column: actor_id Type: smallint unsigned
  Column: first_name Type: varchar(45)
  Column: last_name Type: varchar(45)
  Column: last_update Type: timestamp

Table: actor_info
  Column: actor_id Type: smallint unsigned
  Column: first_name Type: varchar(45)
  Column: last_name Type: varchar(45)
  Column: film_info Type: text

Table: address
  Column: address_id Type: smallint unsigned
  Column: address Type: varchar(50)
  Column: address2 Type: varchar(50)
  Column: district Type: varchar(20)
  Column: city_id Type: smallint unsigned
  Column: postal_code Type: varchar(10)
  Column: phone Type: varchar(20)
  Column: location Type: geometry
  Column: last_update Type: timestamp

Table: category
  Column: category_id Type: tinyint unsigned
  Column: name Type: varchar(25)
  Column: last_update Type: timestamp

Table: city
  Column: city_id Type: smallint unsigned
  Column: city Type: varchar(50)
  Column: country_id Type: smallint unsigned
  Column: last_update Type: timest