# Running SQL with the Db2 Data Management Console

This Jupyter Notebook contains examples of how to use the Open APIs to run SQL and use Visual Explain in the Db2 Data Management Console.

### Import Helper Classes
For more information on these classes, see the lab on the Db2 Data Management Console Overview.

In [53]:
%run ./dmc_setup.ipynb

### Db2 Data Management Console Connection
To connect to the Db2 Data Management Console service, you need to provide the console URL, the service name (v4), the console user name and password, and the name of the connection profile that the console uses to connect to the database you want to work with. This lab assumes the following values for the connection:
* Userid: db2inst1
* Password: db2inst1
* Connection: sample

In [54]:
# Connect to the Db2 Data Management Console service
Console  = 'http://localhost:11080'
profile  = 'SAMPLE'
user     = 'DB2INST1'
password = 'db2inst1'

# Set up the required connection
profileURL = "?profile="+profile
databaseAPI = Db2(Console+'/dbapi/v4')
databaseAPI.authenticate(user, password, profile)
database = Console

### Confirm the connection
To confirm that your connection is working, you can check the status of the monitoring service. You can also check your console connection to get the details of the specific database connection you are working with. Because your console user ID and password may be limited in which databases they can access, you need to provide the connection profile name to drill down into detailed information for the database.

In [57]:
# List Monitoring Profile
r = databaseAPI.getProfile(profile)
json = databaseAPI.getJSON(r)
print(json)

{'name': 'SAMPLE', 'disableDataCollection': 'false', 'databaseVersion': '11.5.0', 'databaseName': 'SAMPLE', 'timeZone': '-50000', 'DB2Instance': 'db2inst1', 'db2license': 'AESE,DEC', 'isInstPureScale': 'false', 'databaseVersion_VRMF': '11.5.0.0', 'sslConnection': 'false', 'userProfileRole': 'OWNER', 'isCredentialsValid': 'true', 'timeZoneDiff': '0', 'host': 'localhost', '_PROFILE_INIT_': 'true', 'dataServerType': 'DB2LUW', 'port': '50000', 'URL': 'jdbc:db2://localhost:50000/SAMPLE', 'edition': 'AESE,DEC', 'isInstPartitionable': 'false', 'dataServerExternalType': 'DB2LUW', 'capabilities': '["DSM_ENTERPRISE_LUW"]', 'OSType': 'Linux', 'location': ''}
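
Every value in the profile response above is a string, including booleans such as `'isCredentialsValid': 'true'`, numbers such as `'port': '50000'`, and the `capabilities` list, which is JSON-encoded text. The sketch below shows one way to convert a few fields to native Python types. The `profile_info` dictionary is a hand-copied subset of the output above, and `to_bool` is a hypothetical helper, not part of the console API.

```python
import json

# Subset of the profile response shown above, copied by hand for illustration.
profile_info = {
    'name': 'SAMPLE',
    'isCredentialsValid': 'true',
    'sslConnection': 'false',
    'port': '50000',
    'capabilities': '["DSM_ENTERPRISE_LUW"]',
}

def to_bool(text):
    """Hypothetical helper: map the strings 'true'/'false' to Python booleans."""
    return text.strip().lower() == 'true'

credentials_valid = to_bool(profile_info['isCredentialsValid'])
uses_ssl = to_bool(profile_info['sslConnection'])
port = int(profile_info['port'])
capabilities = json.loads(profile_info['capabilities'])  # JSON text -> list

print(credentials_valid, uses_ssl, port, capabilities)
```

The notebook's own cells reuse the name `json` for response data, so re-import the module before calling `json.loads` in the same session.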


### Running SQL
You can use the console API to run a single SQL statement or a set of statements in a single call. The following example submits four statements, using one API call per statement. Each submission returns a run handle that we can use to access the results of the SQL execution. The example runs each statement five times.

In [58]:
#Run SQL Statements
runtimes = 5
sqlList = ['select SYSIBM.SYSTABLES.* from SYSIBM.SYSTABLES, SYSIBM.SYSCOLUMNS,SYSIBM.SYSINDEXES,SYSIBM.SYSVIEWS,SYSIBM.SYSVIEWDEP,SYSIBM.SYSPLAN,SYSIBM.SYSPLANDEP,SYSIBM.SYSSTMT,SYSIBM.SYSPLANAUTH',
           'select * from SYSIBM.SYSINDEXES','select * from syscat.tables', 
           'Select * from syscat.indexes']
runIndex = []
for x in range(0, runtimes):
    for sqlText in sqlList:
        r = databaseAPI.runSQL(sqlText)
        statusCode = databaseAPI.getStatusCode(r)
        # A 201 status means the statement was accepted and a run handle was created
        if statusCode == 201:
            runID = databaseAPI.getJSON(r)['id']
            print(runID+': running')
            runIndex.append(runID)
print('Done')

1574793472823_180266582: running
1574793473265_404614086: running
1574793473423_1795940241: running
1574793473582_1645737640: running
1574793473719_1153904568: running
1574793474367_186070770: running
1574793474551_75098522: running
1574793474722_1987475525: running
1574793474881_1185956597: running
1574793475331_780686608: running
1574793475496_1360113948: running
1574793475652_1830756861: running
1574793475836_1115375890: running
1574793476598_287610537: running
1574793476746_1849704723: running
1574793476913_477834377: running
1574793477145_973890715: running
1574793478592_1311322513: running
1574793478769_769393703: running
1574793478933_1848881579: running
Done
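
The loop above silently skips any statement whose submission does not return status 201. A minimal sketch of recording failures alongside the run handles follows. `StubConsole` is a hypothetical stand-in that mimics only the three helper methods used above (`runSQL`, `getStatusCode`, `getJSON`) so the example runs without a console; its response shape and the 400 status it returns for an empty statement are assumptions for illustration.

```python
import itertools

class StubConsole:
    """Hypothetical stand-in for the notebook's databaseAPI helper (no network)."""
    def __init__(self):
        self._ids = itertools.count(1)

    def runSQL(self, sql_text):
        # Pretend empty statements are rejected and everything else is accepted.
        if not sql_text.strip():
            return {'status': 400, 'body': {}}
        return {'status': 201, 'body': {'id': 'run_%d' % next(self._ids)}}

    def getStatusCode(self, response):
        return response['status']

    def getJSON(self, response):
        return response['body']

def submit_all(api, statements, repeat=1):
    """Submit each statement `repeat` times; return (run handles, failures)."""
    handles, failures = [], []
    for _ in range(repeat):
        for sql_text in statements:
            response = api.runSQL(sql_text)
            status = api.getStatusCode(response)
            if status == 201:
                handles.append(api.getJSON(response)['id'])
            else:
                failures.append((sql_text, status))
    return handles, failures

handles, failures = submit_all(
    StubConsole(), ['select 1 from sysibm.sysdummy1', ''], repeat=2)
print(handles)
print(failures)
```

In the notebook, passing `databaseAPI` and `sqlList` to a function like `submit_all` would give the same run handles as the loop above, plus a list of anything that was rejected.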


You can then cycle through the list of run handles and collect the results of each run.

In [59]:
#Collect the run results of each statement into an array
indexLength = len(runIndex)
jsonArray = []
for x in range(0, indexLength ):
    r = databaseAPI.getSQLJobResult(runIndex[x])
    jsonArray.append(databaseAPI.getJSON(r))

For each statement we collect the runtime and convert it to ms. We then build a dataframe to show how each statement performs over several iterations.

In [60]:
# Build one row per run, then create the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for x in range(0, indexLength):
    runtime = jsonArray[x]['results'][0]['runtime_seconds'] * 1000
    sql = jsonArray[x]['results'][0]['command']
    rows.append({'SQL Text': sql, 'Runtime': runtime})
resultsDF = pd.DataFrame(rows, columns=['SQL Text', 'Runtime'])
display(resultsDF.sort_values(by=['SQL Text','Runtime']))

Unnamed: 0,SQL Text,Runtime
7,Select * from syscat.indexes,20.0
3,Select * from syscat.indexes,26.000001
19,Select * from syscat.indexes,35.0
11,Select * from syscat.indexes,39.000001
15,Select * from syscat.indexes,41.999999
1,select * from SYSIBM.SYSINDEXES,6.0
5,select * from SYSIBM.SYSINDEXES,6.0
9,select * from SYSIBM.SYSINDEXES,7.0
13,select * from SYSIBM.SYSINDEXES,7.0
17,select * from SYSIBM.SYSINDEXES,15.0
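
With several runs per statement, a per-statement summary is often easier to read than the raw list. The sketch below groups runtimes by statement text and reports the count, mean, minimum and maximum. The sample values are copied from the output above and rounded to whole milliseconds; in the notebook you would group `resultsDF` directly.

```python
import pandas as pd

# Runtimes in ms copied (rounded) from the output above.
sample = pd.DataFrame({
    'SQL Text': ['Select * from syscat.indexes'] * 5
                + ['select * from SYSIBM.SYSINDEXES'] * 5,
    'Runtime': [20.0, 26.0, 35.0, 39.0, 42.0, 6.0, 6.0, 7.0, 7.0, 15.0],
})

# One row per statement with the number of runs and basic runtime statistics.
summary = (sample.groupby('SQL Text')['Runtime']
                 .agg(['count', 'mean', 'min', 'max']))
print(summary)
```

If the `Runtime` column ends up with dtype `object`, convert it first with `resultsDF['Runtime'].astype(float)`.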


We can then collect the most recent package cache information. 

In [61]:
# Retrieve the current package cache list 
# Show up to the first 100, sorted by statement execution time (longest first)
r = databaseAPI.getCurrentPackageCacheList("false")
if (databaseAPI.getStatusCode(r)==200):
    json = databaseAPI.getJSON(r)
    if json['count'] > 0:  
        df = pd.DataFrame(json_normalize(json['resources']))
        print('Available Data Frame Columns')
        print(', '.join(list(df)))
        df = df.sort_values(by='stmt_exec_time_ms', ascending=False)
        display(df[['stmt_text','stmt_exec_time_ms','stmtid']].head(100))
    else: 
        print('No data returned')  
else:
    print(databaseAPI.getStatusCode(r))

Available Data Frame Columns
timestamp, sql_hash_id, num_exec_with_metrics, stmtid, planid, semantic_env_id, stmt_text, estimated_sort_shrheap_top, sort_shrheap_top, sort_overflows, hash_join_overflows, hash_grpby_overflows, olap_func_overflows, col_vector_consumer_overflows, post_threshold_sorts, post_threshold_hash_joins, post_threshold_olap_funcs, post_threshold_hash_grpbys, post_threshold_col_vector_consumers, pool_writes, total_cpu_time_ms, rows_read, rows_returned, logical_reads, physical_reads, temp_reads, pool_data_l_reads, pool_index_l_reads, lock_escals, stmt_type_id, fed_rows_deleted, fed_rows_inserted, fed_rows_updated, fed_rows_read, fed_waits_total, lock_waits, rows_modified, ext_table_read_volume_kb, ext_table_write_volume_kb, ext_table_send_volume_kb, ext_table_recv_volume_kb, sql_text_summary, coord_stmt_exec_time_ms, estimated_runtime_ms, pool_read_time_ms, pool_write_time_ms, prefetch_wait_time_ms, stmt_exec_time_ms, total_act_wait_time_ms, lock_wait_time_ms, fed_wai

Unnamed: 0,stmt_text,stmt_exec_time_ms,stmtid
0,"call SYSIBM.SQLCAMESSAGECCSID(?,?,?,?,?,?,?,?,...",1,8110287721056492873


With a single statement we can filter the package cache down to the statements we ran, so that we see only their performance and none of the other cached statements.

In [62]:
display(df[['stmt_text','stmt_exec_time_ms','stmtid']].loc[df['stmt_text'].isin(sqlList)])

Unnamed: 0,stmt_text,stmt_exec_time_ms,stmtid
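
`isin` keeps only rows whose `stmt_text` matches an entry in `sqlList` character for character, so differences in case or whitespace produce an empty result like the one above. Whether that is the cause here is not certain; the statements may simply not be in the cache. A minimal sketch of a more forgiving comparison, using made-up rows in place of the package cache data:

```python
import pandas as pd

def normalize(sql_text):
    """Collapse runs of whitespace and lower-case the text for comparison only."""
    return ' '.join(sql_text.split()).lower()

sql_list = ['select * from syscat.tables', 'Select * from syscat.indexes']

# Made-up rows standing in for the package cache data frame.
cache = pd.DataFrame({
    'stmt_text': ['SELECT *  FROM syscat.tables',
                  'call SYSIBM.SQLCAMESSAGECCSID(?,?)',
                  'select * from SYSCAT.INDEXES '],
    'stmt_exec_time_ms': [12, 1, 30],
})

# Compare normalized text on both sides; the original text is left untouched.
wanted = {normalize(s) for s in sql_list}
matches = cache[cache['stmt_text'].map(normalize).isin(wanted)]
print(matches)
```

Lower-casing also changes string literals inside a statement, so use the normalized text only for matching and never for execution.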


You can also use the console user interface to see the latest statements in the package cache.

In [26]:
IFrame(database+'/console/?mode=compact#monitor/package_cache'+profileURL, width=1400, height=480)

### Analyzing Statements
You can use both the microservices built into the console and the monitoring APIs to analyze the performance of a single SQL statement or to identify statements that need your attention. You can visually explain any statement by calling the explain/create service and embedding the interactive interface in an IFrame. **Note:** This may take a minute to run.

In [None]:
# Visually explain the access plan for an SQL Statement
SQLStatement = 'select * from syscat.tables'
IFrame(database+'/console/?mode=compact#sql/explain/create/'+SQLStatement+profileURL, width=1400, height=480)
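
The statement above is concatenated into the URL as is, including its spaces and the `*` character. If the console or browser does not accept the raw text, percent-encoding the statement is one option. The sketch below uses the standard library's `urllib.parse.quote`; the base URL and profile are the values used earlier in this notebook, and whether the console requires or accepts an encoded statement is an assumption to verify against your installation.

```python
from urllib.parse import quote, unquote

console = 'http://localhost:11080'
profile_url = '?profile=SAMPLE'
sql_statement = 'select * from syscat.tables'

# safe='' means no reserved character is left unencoded
# (by default quote() would leave '/' as is).
encoded = quote(sql_statement, safe='')
explain_url = (console + '/console/?mode=compact#sql/explain/create/'
               + encoded + profile_url)
print(explain_url)
```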

You can also track which statements are running right now using either the microservice User Interface or a direct API call.

In [None]:
IFrame(database+'/console/?mode=compact#monitor/inflight_executions'+profileURL, width=1400, height=360)

In [None]:
# Retrieve the current statements running now
# Display the top 10 by execution time
r = databaseAPI.getInflightCurrentList()
if (databaseAPI.getStatusCode(r)==200):
    json = databaseAPI.getJSON(r)
    if json['count'] > 0:
        df = pd.DataFrame(json_normalize(json['resources']))
        print('Columns')
        print(', '.join(list(df)))
        df = df.sort_values(by='exec_time_ms', ascending=False)
        display(df[['application_name','stmt_text','exec_time_ms','estimated_runtime_ms']].head(10))
    else:
        print('No data returned')
else:
    code = databaseAPI.getStatusCode(r)
    databaseAPI.printResponse(r, code)

If you have individual statement monitoring enabled, you can see a list of all the statements that ran on your system, either through a microservice UI or an API call.

In [None]:
IFrame(database+'/console/?mode=compact#monitor/individual'+profileURL, width=1400, height=480)

In [None]:
# Retrieve the statements that ran over the last day.
# Retrieve the first 10000 statements
# Show the top ten by Total CPU TIME
startTimeMinusDay = endTime - oneDay
r = databaseAPI.getIndividualStatementExecution(startTimeMinusDay, endTime, 10000)
if (databaseAPI.getStatusCode(r)==200):
    json = databaseAPI.getJSON(r)
    if json['count'] > 0:     
        df = pd.DataFrame(json_normalize(json['resources']))
        print('Columns')
        print(', '.join(list(df)))
        df = df.sort_values(by='total_cpu_time', ascending=False)
        display(df[['sql_text','total_cpu_time','wlm_queue_time_total','stmt_exec_time']].head(10))
    else: 
        print('No data returned')  
else:
    code = databaseAPI.getStatusCode(r)
    databaseAPI.printResponse(r, code)
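
The cell above relies on `endTime` and `oneDay`, which are not defined in this notebook and presumably come from `dmc_setup.ipynb`. The sketch below shows one way such values could be computed, assuming the API expects timestamps as epoch milliseconds; check the setup notebook for the actual definitions.

```python
import time

# Assumption: the monitoring API takes start and end times in epoch milliseconds.
one_hour_ms = 60 * 60 * 1000
one_day_ms = 24 * one_hour_ms

end_time_ms = int(time.time() * 1000)      # now
start_time_ms = end_time_ms - one_day_ms   # 24 hours ago

print(start_time_ms, end_time_ms)
```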

As a last example, you can take a higher-level look at the applications connected to your database to see which application is driving the most work.

In [None]:
IFrame(database+'/console/?mode=compact#monitor/connections'+profileURL, width=1400, height=480)

In [None]:
# Display the 10 most recently started Database Connections
r = databaseAPI.getCurrentApplicationsConnections()
if (databaseAPI.getStatusCode(r)==200):
    json = databaseAPI.getJSON(r)
    if json['count'] > 0: 
        df = pd.DataFrame(json_normalize(json['resources']))
        print(', '.join(list(df)))
        df = df.sort_values(by='connection_start_time', ascending=False)
        df['connection_start_time'] = df['connection_start_time'].apply(epochtotimeseries)
        display(df[['application_name','application_handle','connection_start_time']].head(10))
    else: 
        print('No data returned')  
else:
    print(databaseAPI.getStatusCode(r))
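
`epochtotimeseries` is a helper from the setup notebook and its definition is not shown here. Assuming `connection_start_time` holds epoch milliseconds, a conversion of that kind could look like the following sketch; the function name `epoch_ms_to_text` and the output format are illustrative choices, not the helper's actual behavior.

```python
from datetime import datetime, timezone

def epoch_ms_to_text(epoch_ms):
    """Convert epoch milliseconds to a UTC timestamp string (illustrative format)."""
    moment = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return moment.strftime('%Y-%m-%d %H:%M:%S')

print(epoch_ms_to_text(0))
print(epoch_ms_to_text(86400000))
```

A function like this can be passed to `df['connection_start_time'].apply(...)` in the same way as the helper above.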

#### Credits: IBM 2019, Peter Kohlmann [kohlmann@ca.ibm.com]