# 📊 Automating the Collection of EXPLAIN Plans and Runtime Metrics in Db2 LUW
This notebook automates running a list of SQL queries in Db2 and capturing each query's execution details using EXPLAIN tables and an activity event monitor. The workflow includes:

1. **Setup of Activity Event Monitor:** Configures an event monitor in Db2 to collect runtime metrics such as execution time and resource usage during query execution.
2. **Creation of EXPLAIN Tables:** Sets up EXPLAIN tables in Db2 to store detailed execution plans for SQL queries.
3. **Query Execution:** Runs each SQL query so its runtime metrics can be collected and its performance analyzed.
4. **EXPLAIN Plan Generation:** Captures each query's access plan to show how Db2 executes it.
5. **Exporting Data:** Exports the collected EXPLAIN and activity monitor data as CSV files for external analysis or reporting.

For detailed setup instructions and guidance on running this notebook, refer to the [README.md](./README.md) file in the same directory.

The result is a repeatable, hands-off way to collect plans and runtime metrics for a whole batch of queries, which database administrators and developers can then use to tune those queries.


In [1]:
import os
from dotenv import dotenv_values
import json
import shutil
import pandas as pd
import papermill as pm
import re

In [2]:
# Set display options to show the full content in the 'sql' column
pd.set_option('display.max_colwidth', None)  # Show full column width without truncation
pd.set_option('display.expand_frame_repr', False)  # Prevent DataFrame from wrapping across lines

# 🔍 Enter the Name of a File Containing SQL Queries (Each Query Must Be on a Single Line)

In [3]:
# Define the input SQL file
input_file = "queries.sql"  # Replace with your actual file name

with open(input_file, "r") as file:
    sql_queries = []
    for line in file:
        # Strip whitespace and remove trailing semicolons
        query = line.strip().rstrip(';')
        if query:  # Skip empty lines
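            # Remove schema qualifiers (pattern: schema_name.table_name).
            # Only an identifier followed by another identifier is stripped, so
            # numeric literals such as 1.5 stay intact; table-alias prefixes
            # (a.col) are still removed.
            query = re.sub(r'\b[A-Za-z_]\w*\.(?=[A-Za-z_])', '', query)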
            sql_queries.append(query)

df_queries = pd.DataFrame({
    "queryid": range(1, len(sql_queries) + 1),
    "sql": sql_queries
})
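The parsing rules in this cell (one statement per line, trailing semicolon dropped, blank lines skipped, qualifiers stripped) can be pulled into a small function so they are easy to test on their own. This is a sketch: `parse_query_lines` is a hypothetical name, and its regular expression only strips a qualifier that starts with a letter or underscore and is followed by another identifier, so numeric literals such as `1.5` are left alone. It still removes table-alias prefixes (`a.col` becomes `col`) and does not skip text inside string literals.

```python
import re
from typing import Iterable, List

# Matches "QUALIFIER." when the qualifier starts with a letter/underscore and
# is immediately followed by another identifier (e.g. "DB2INST1.STORE_SALES").
QUALIFIER = re.compile(r'\b[A-Za-z_]\w*\.(?=[A-Za-z_])')


def parse_query_lines(lines: Iterable[str]) -> List[str]:
    """Return one cleaned SQL statement per non-empty input line."""
    queries = []
    for line in lines:
        # Drop surrounding whitespace and any trailing semicolon(s)
        query = line.strip().rstrip(';')
        if not query:
            continue  # skip blank lines
        # Remove schema (or alias) qualifiers, keeping numeric literals intact
        queries.append(QUALIFIER.sub('', query))
    return queries


sample = [
    "SELECT SS_QUANTITY FROM DB2INST1.STORE_SALES WHERE SS_NET_PAID > 1.5;\n",
    "\n",
    "SELECT COUNT(*) FROM STORE_RETURNS;",
]
print(parse_query_lines(sample))
```

In the notebook, `parse_query_lines(open(input_file))` would produce the same list that the cell builds inline.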

In [4]:
df_queries

|   | queryid | sql |
|---|---|---|
| 0 | 1 | SELECT SS_WHOLESALE_COST , SR_NET_LOSS FROM STORE_SALES INNER JOIN STORE_RETURNS ON SR_TICKET_NUMBER = SS_TICKET_NUMBER AND SR_ITEM_SK = SS_ITEM_SK WHERE SS_STORE_SK = 2 AND SR_RETURNED_DATE_SK = 2451680 |
| 1 | 2 | SELECT CR_RETURNING_ADDR_SK , CS_SALES_PRICE FROM CATALOG_RETURNS INNER JOIN CATALOG_SALES ON CS_ORDER_NUMBER = CR_ORDER_NUMBER AND CS_ITEM_SK = CR_ITEM_SK WHERE CS_NET_PAID <= +68 AND CR_REFUNDED_HDEMO_SK <= 939 |
| 2 | 3 | SELECT SS_EXT_LIST_PRICE , SR_STORE_CREDIT FROM STORE_SALES INNER JOIN STORE_RETURNS ON SR_CUSTOMER_SK = SS_CUSTOMER_SK AND SR_ITEM_SK = SS_ITEM_SK WHERE SS_ADDR_SK >= 8970 AND SR_CUSTOMER_SK <= 93014 |
| 3 | 4 | SELECT SR_ADDR_SK, SR_CDEMO_SK, SR_RETURN_AMT, SR_RETURN_AMT_INC_TAX, SR_RETURN_TIME_SK, SR_REVERSED_CHARGE, SR_STORE_CREDIT, SS_EXT_DISCOUNT_AMT, SS_SOLD_DATE_SK, SS_LIST_PRICE, SS_QUANTITY, SS_SALES_PRICE, SS_SOLD_TIME_SK, SS_WHOLESALE_COST FROM STORE_RETURNS JOIN STORE_SALES ON SR_ITEM_SK = SS_ITEM_SK AND SR_CUSTOMER_SK = SS_CUSTOMER_SK AND SS_TICKET_NUMBER = SR_TICKET_NUMBER |


In [5]:
# Define the top-level output directory
top_level_dir = os.path.join(os.getcwd(), "output")

# If the top-level directory exists, delete it and recreate
if os.path.exists(top_level_dir):
    shutil.rmtree(top_level_dir)
os.makedirs(top_level_dir)

In [6]:
# Define the output notebooks directory
output_notebooks_dir = os.path.join(os.getcwd(), "output_notebooks")

# If the directory exists, delete and recreate it
if os.path.exists(output_notebooks_dir):
    shutil.rmtree(output_notebooks_dir)
os.makedirs(output_notebooks_dir)
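The two cells above repeat the same delete-and-recreate pattern. A small helper keeps that logic in one place; `reset_dir` is a hypothetical name, and this is only a sketch of one way to factor it.

```python
import os
import shutil


def reset_dir(path: str) -> str:
    """Delete `path` (and everything in it) if it exists, then recreate it empty."""
    if os.path.exists(path):
        shutil.rmtree(path)
    os.makedirs(path)
    return path


# Hypothetical usage mirroring the two cells above:
# top_level_dir = reset_dir(os.path.join(os.getcwd(), "output"))
# output_notebooks_dir = reset_dir(os.path.join(os.getcwd(), "output_notebooks"))
```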

Loading Db2 Magic Commands Notebook Extension

In [7]:
# Enable Db2 Magic Commands Extensions for Jupyter Notebook
if not os.path.isfile('db2.ipynb'):
    os.system('wget https://raw.githubusercontent.com/IBM/db2-jupyter/master/db2.ipynb')
%run db2.ipynb


Db2 Extensions Loaded. Version: 2024-09-16
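The download step shells out to `wget`, which is not installed on every system, and the cell ignores the command's exit status. A standard-library alternative is sketched below using `urllib.request.urlretrieve`; `ensure_file` is a hypothetical helper, and the `fetch` parameter exists only so the download can be swapped out for testing.

```python
import os
import urllib.request
from typing import Callable

DB2_MAGIC_URL = "https://raw.githubusercontent.com/IBM/db2-jupyter/master/db2.ipynb"


def ensure_file(path: str, url: str,
                fetch: Callable[[str, str], object] = urllib.request.urlretrieve) -> bool:
    """Download `url` to `path` only if `path` is missing.

    Returns True if a download was performed, False if the file already existed.
    Errors from `fetch` (e.g. network failures) propagate instead of being ignored.
    """
    if os.path.isfile(path):
        return False
    fetch(url, path)  # urlretrieve(url, filename) writes the response to `path`
    return True


# In the notebook this could replace the wget call before `%run db2.ipynb`:
# ensure_file("db2.ipynb", DB2_MAGIC_URL)
```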


Connect to Db2

In [8]:
db2creds = dotenv_values('.env')
%sql CONNECT CREDENTIALS db2creds

Connection successful. tpcds @ localhost 


In [9]:
%sql CALL SYSPROC.ADMIN_CMD('UPDATE DATABASE CONFIGURATION USING SECTION_ACTUALS BASE')
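`ADMIN_CMD` takes the command as an SQL character string, and SQL string literals are delimited by single quotes (double quotes delimit identifiers). A small helper can build the `CALL` text and double any embedded single quotes. This is a sketch: `admin_cmd` is a hypothetical name, and how the resulting string is handed to the `%sql` magic is not shown.

```python
def admin_cmd(command: str) -> str:
    """Wrap a CLP-style command in a CALL SYSPROC.ADMIN_CMD statement."""
    # Escape embedded single quotes by doubling them, per SQL literal rules
    literal = command.replace("'", "''")
    return f"CALL SYSPROC.ADMIN_CMD('{literal}')"


stmt = admin_cmd("UPDATE DATABASE CONFIGURATION USING SECTION_ACTUALS BASE")
print(stmt)
```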

a. Deactivate the Event Monitor `ACTEVMON`

In [10]:
%sql SET EVENT MONITOR ACTEVMON STATE 0

Command completed.


b. Drop Existing Tables for event monitor `ACTEVMON`

In [11]:
%%sql -q
DROP TABLE ACTIVITYMETRICS_ACTEVMON;
DROP TABLE ACTIVITYSTMT_ACTEVMON;
DROP TABLE ACTIVITYVALS_ACTEVMON;
DROP TABLE ACTIVITY_ACTEVMON;
DROP TABLE CONTROL_ACTEVMON;
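The five tables dropped above follow the default naming Db2 uses for an activities event monitor that writes to tables: the logical data group name, an underscore, and the monitor name. Generating the statements from the monitor name keeps this cell in sync if the monitor is renamed. A sketch; `drop_evmon_tables` is a hypothetical helper.

```python
from typing import List

# Logical data groups written by a FOR ACTIVITIES ... WRITE TO TABLE monitor,
# matching the tables dropped in the cell above.
ACTIVITY_GROUPS = ("ACTIVITYMETRICS", "ACTIVITYSTMT", "ACTIVITYVALS",
                   "ACTIVITY", "CONTROL")


def drop_evmon_tables(evmon: str) -> List[str]:
    """Return DROP TABLE statements for an activities event monitor's tables."""
    # Unquoted identifiers are folded to upper case by Db2
    return [f"DROP TABLE {group}_{evmon.upper()};" for group in ACTIVITY_GROUPS]


print("\n".join(drop_evmon_tables("ACTEVMON")))
```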

c. Install Explain Tables

In [12]:
%%sql
CALL SYSINSTALLOBJECTS('EXPLAIN', 'D', NULL, 'DB2INST1');
CALL SYSINSTALLOBJECTS('EXPLAIN', 'C', NULL, 'DB2INST1');

Command completed.


In [13]:
%sql CREATE THRESHOLD MAXDBACTIVITYTIME FOR DATABASE ACTIVITIES ENFORCEMENT DATABASE WHEN ACTIVITYTOTALTIME > 10 SECONDS STOP EXECUTION

Command completed.
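The threshold stops any database activity that runs longer than 10 seconds, so a runaway query ends with an error instead of stalling the batch. If the limit needs tuning, the DDL can be generated from a parameter. A sketch; `activity_time_threshold` is a hypothetical helper that reproduces the statement above.

```python
def activity_time_threshold(name: str, seconds: int) -> str:
    """Build a CREATE THRESHOLD statement that stops activities running too long."""
    if not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    return (
        f"CREATE THRESHOLD {name} FOR DATABASE ACTIVITIES "
        f"ENFORCEMENT DATABASE WHEN ACTIVITYTOTALTIME > {seconds} SECONDS "
        "STOP EXECUTION"
    )


print(activity_time_threshold("MAXDBACTIVITYTIME", 10))
```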


d. Alter Workload to Collect Activity Data


In [14]:
%sql ALTER WORKLOAD SYSDEFAULTUSERWORKLOAD COLLECT ACTIVITY DATA ON ALL WITH DETAILS, SECTION

Command completed.


e. Drop and Re-create the Event Monitor `ACTEVMON`

In [15]:
%%sql 
DROP EVENT MONITOR ACTEVMON;
CREATE EVENT MONITOR ACTEVMON FOR ACTIVITIES WRITE TO TABLE;

Command completed.


In [16]:
# Run `single-query.ipynb` once per query, passing the query ID and SQL text as papermill parameters
for _, row in df_queries.iterrows():
    query_id = f"query{row['queryid']}"  # Convert queryid to "query1", "query2", etc.
    sql = row['sql']

    # Define the output path for the executed notebook
    output_notebook_path = os.path.join(output_notebooks_dir, f"{query_id}.ipynb")

    # Execute the notebook with parameters
    pm.execute_notebook(
        'single-query.ipynb',       # Path to the target notebook
        output_notebook_path,       # Path for the output notebook
        parameters={
            'queryid': query_id,
            'sql_statement': sql
        }
    )

    print(f"Executed notebook for {query_id} and saved to {output_notebook_path}")


DB2 Universal Database Version 11.5, 5622-044 (c) Copyright IBM Corp. 1991, 2019
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connect to Database Successful.
Output is in /home/db2inst1/db2-labs/explain/batch-queries/output/2025-02-24-19.54.16.324073/explain.out.
Executing Connect Reset -- Connect Reset was Successful.


Connecting to the Database.
Executed notebook for query1 and saved to /home/db2inst1/db2-labs/explain/batch-queries/output_notebooks/query1.ipynb



DB2 Universal Database Version 11.5, 5622-044 (c) Copyright IBM Corp. 1991, 2019
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connect to Database Successful.
Output is in /home/db2inst1/db2-labs/explain/batch-queries/output/2025-02-24-19.54.31.672438/explain.out.
Executing Connect Reset -- Connect Reset was Successful.


Connecting to the Database.
Executed notebook for query2 and saved to /home/db2inst1/db2-labs/explain/batch-queries/output_notebooks/query2.ipynb



DB2 Universal Database Version 11.5, 5622-044 (c) Copyright IBM Corp. 1991, 2019
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connect to Database Successful.
Output is in /home/db2inst1/db2-labs/explain/batch-queries/output/2025-02-24-19.54.49.677516/explain.out.
Executing Connect Reset -- Connect Reset was Successful.


Connecting to the Database.
Executed notebook for query3 and saved to /home/db2inst1/db2-labs/explain/batch-queries/output_notebooks/query3.ipynb



DB2 Universal Database Version 11.5, 5622-044 (c) Copyright IBM Corp. 1991, 2019
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connect to Database Successful.
Output is in /home/db2inst1/db2-labs/explain/batch-queries/output/2025-02-24-19.55.07.396569/explain.out.
Executing Connect Reset -- Connect Reset was Successful.


Connecting to the Database.
Executed notebook for query4 and saved to /home/db2inst1/db2-labs/explain/batch-queries/output_notebooks/query4.ipynb
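As written, the loop stops at the first query whose notebook fails, because `pm.execute_notebook` raises when a cell in the executed notebook errors. A variant that records failures and keeps going could look like the sketch below. `run_batch` and its `execute` argument are hypothetical; in this notebook, `execute` would be a thin wrapper around `pm.execute_notebook`, and catching the broad `Exception` is a deliberate simplification.

```python
import os
from typing import Callable, Dict, Iterable, Tuple


def run_batch(
    queries: Iterable[Tuple[int, str]],
    out_dir: str,
    execute: Callable[[str, str, dict], None],
) -> Dict[str, str]:
    """Run one parameterized notebook per query; return {query_id: error message}."""
    failures = {}
    for queryid, sql in queries:
        query_id = f"query{queryid}"  # "query1", "query2", ...
        out_path = os.path.join(out_dir, f"{query_id}.ipynb")
        try:
            execute("single-query.ipynb", out_path,
                    {"queryid": query_id, "sql_statement": sql})
        except Exception as exc:  # record the failure and continue with the next query
            failures[query_id] = str(exc)
            print(f"FAILED {query_id}: {exc}")
        else:
            print(f"Executed notebook for {query_id} and saved to {out_path}")
    return failures


# Possible wiring in the notebook (papermill imported as pm):
# failures = run_batch(
#     df_queries[["queryid", "sql"]].itertuples(index=False, name=None),
#     output_notebooks_dir,
#     lambda src, dst, params: pm.execute_notebook(src, dst, parameters=params),
# )
```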


In [17]:
%sql DROP THRESHOLD MAXDBACTIVITYTIME

Command completed.


In [19]:
%sql CONNECT RESET

Connection closed.
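The log lines from the batch run show each query writing its formatted plan to `output/<timestamp>/explain.out`. Assuming that layout holds for every query, the plan files can be gathered afterwards with a short sketch like this (`list_explain_outputs` is a hypothetical name):

```python
from pathlib import Path
from typing import List


def list_explain_outputs(top_level_dir: str) -> List[Path]:
    """Return every <top_level_dir>/<run>/explain.out file, oldest run first."""
    # Run directories are named with timestamps such as
    # 2025-02-24-19.54.16.324073, so lexical order matches run order.
    return sorted(Path(top_level_dir).glob("*/explain.out"))


# Example: show the first lines of each plan file
# for path in list_explain_outputs("output"):
#     print(path.parent.name, path.read_text().splitlines()[:5])
```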
