# IntegratedML applied to biomedical data
## Using InterSystems IRIS DB-API Driver
This notebook demonstrates the following:
- Connecting to InterSystems IRIS via DB-API driver
- Creating, training, and executing (via the PREDICT() function) an IntegratedML machine learning model, applied to breast cancer tumor diagnoses
- Inserting machine learning predictions into a new SQL table
- Executing a relatively complex SQL query containing IntegratedML PREDICT() and PROBABILITY() functions, and flexibly using the results to filter and sort the output

### InterSystems IRIS Python Driver Resources
The `intersystems-irispython` package provides native Python connectivity to InterSystems IRIS.

Documentation:
- https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=BPYNAT_pyapi
- https://pypi.org/project/intersystems-irispython/

In [1]:
# make the notebook full screen
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

### 1. No additional system packages needed with DB-API driver
The `intersystems-irispython` driver is pre-installed in the container.

In [None]:
# No apt-get or pip installs needed - using DB-API driver

### Connection Setup
No ODBC configuration needed with DB-API driver.

In [None]:
# DB-API driver connection - no ODBC setup needed

### 2. Verify DB-API driver is available

In [None]:
import iris
print(f"InterSystems IRIS Python driver version: {getattr(iris, '__version__', 'unknown')}")

### 3. Get a connection using DB-API driver

In [None]:
import iris
import time

# Connection configuration
connection_string = "irisimlsvr:1972/USER"
username = "SUPERUSER"
password = "SYS"

# Establish connection using DB-API
cnxn = iris.connect(connection_string, username, password)
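
The connection settings above are hard-coded. As a minimal sketch, they could instead be read from environment variables, falling back to the notebook's values. The variable names (`IRIS_HOST`, `IRIS_PORT`, `IRIS_NAMESPACE`, `IRIS_USERNAME`, `IRIS_PASSWORD`) and both helper functions are hypothetical and are not part of the driver.

```python
import os


def build_connection_string(host, port, namespace):
    """Return the 'host:port/namespace' form used in the cell above."""
    return f"{host}:{port}/{namespace}"


def load_settings(env=os.environ):
    """Collect connection settings, defaulting to the notebook's values.

    The IRIS_* variable names are assumptions made for this sketch.
    """
    return {
        "connection_string": build_connection_string(
            env.get("IRIS_HOST", "irisimlsvr"),
            int(env.get("IRIS_PORT", "1972")),
            env.get("IRIS_NAMESPACE", "USER"),
        ),
        "username": env.get("IRIS_USERNAME", "SUPERUSER"),
        "password": env.get("IRIS_PASSWORD", "SYS"),
    }


# With an empty environment the defaults reproduce the values above.
settings = load_settings({})
print(settings["connection_string"])
```

The resulting values would then be passed to `iris.connect(...)` exactly as in the cell above.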

### 4. Get a cursor; start the timer

In [None]:
cursor = cnxn.cursor()
start = time.perf_counter()

### 5. Specify the training data, and give a model name

In [12]:
dataTable = 'Biomedical.BreastCancer'
dataTablePredict = 'Result02'
dataColumn =  'Diagnosis'
dataColumnPredict = "PredictedDiagnosis"
modelName = "bc" # choose a name; it must be unique on the server

### Cleaning before retrying

In [13]:
# When re-running the notebook, uncomment these lines to drop the existing model and table first
#cursor.execute("DROP MODEL %s" % modelName)
#cursor.execute("DROP TABLE %s" % dataTablePredict)

### 6. Train and predict

In [14]:
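# Define a model that predicts the Diagnosis column from the other columns of the training table
cursor.execute("CREATE MODEL %s PREDICTING (%s) FROM %s" % (modelName, dataColumn, dataTable))
# Train the model on the same table
cursor.execute("TRAIN MODEL %s FROM %s" % (modelName, dataTable))
# Create a results table and fill it with 20 predictions next to the actual diagnoses
cursor.execute("CREATE TABLE %s (%s VARCHAR(100), %s VARCHAR(100))" % (dataTablePredict, dataColumnPredict, dataColumn))
cursor.execute("INSERT INTO %s SELECT TOP 20 PREDICT(%s) AS %s, %s FROM %s" % (dataTablePredict, modelName, dataColumnPredict, dataColumn, dataTable))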
cnxn.commit()
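
The statements above are assembled with string formatting, so the model, table, and column names end up directly in the SQL text. A minimal sketch of checking those names before building the statements is shown below. The helpers `check_identifier` and `build_statements` are hypothetical, and the identifier pattern is a deliberately strict assumption (letters, digits, underscores, at most one schema qualifier), not the full set of names that IRIS SQL accepts.

```python
import re

# Assumption: plain identifiers with an optional single schema prefix.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def check_identifier(name):
    """Return name unchanged if it looks like a simple SQL identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Unexpected SQL identifier: {name!r}")
    return name


def build_statements(model, target, source, result_table, predicted_col):
    """Build the four statements used in this step, in execution order."""
    for name in (model, target, source, result_table, predicted_col):
        check_identifier(name)
    return [
        f"CREATE MODEL {model} PREDICTING ({target}) FROM {source}",
        f"TRAIN MODEL {model} FROM {source}",
        f"CREATE TABLE {result_table} ({predicted_col} VARCHAR(100), {target} VARCHAR(100))",
        f"INSERT INTO {result_table} SELECT TOP 20 PREDICT({model}) AS {predicted_col}, {target} FROM {source}",
    ]


statements = build_statements(
    "bc", "Diagnosis", "Biomedical.BreastCancer", "Result02", "PredictedDiagnosis"
)
for statement in statements:
    print(statement)
```

Each string in `statements` could then be passed to `cursor.execute(...)`, followed by `cnxn.commit()`.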

### 7. Show the predict result

In [15]:
import pandas as pd
from IPython.display import display

df1 = pd.read_sql("SELECT * FROM %s ORDER BY ID" % dataTablePredict, cnxn)
display(df1)

|   | PredictedDiagnosis | Diagnosis |
|---|---|---|
| 0 | M | M |
| 1 | M | M |
| 2 | M | M |
| 3 | M | M |
| 4 | M | M |
| 5 | M | M |
| 6 | M | M |
| 7 | M | M |
| 8 | M | M |
| 9 | M | M |
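
To summarize a result like the one above, the share of rows where the prediction matches the actual diagnosis can be computed from plain `(predicted, actual)` tuples, such as those returned by `cursor.fetchall()`. The helper `agreement_rate` is a hypothetical name used for this sketch. Since the predictions here are made on rows from the training table, the number describes agreement on seen data, not performance on new cases.

```python
def agreement_rate(rows):
    """Fraction of (predicted, actual) pairs whose two values are equal."""
    rows = list(rows)
    if not rows:
        raise ValueError("No rows to compare")
    matches = sum(1 for predicted, actual in rows if predicted == actual)
    return matches / len(rows)


# The ten rows displayed above, written out as (PredictedDiagnosis, Diagnosis) pairs.
shown_rows = [("M", "M")] * 10
print(agreement_rate(shown_rows))  # prints 1.0
```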


### 8. Show a complicated query
The IntegratedML functions PREDICT() and PROBABILITY() can appear almost anywhere in a SQL query.
The query below selects columns alongside the result of PROBABILITY(), filters on the result of PREDICT(), and uses the PROBABILITY() output in ORDER BY for sorting.

In [16]:
df2 = pd.read_sql("SELECT ID, PROBABILITY(%s FOR 'M') AS Probability, Diagnosis FROM %s \
                    WHERE Mean_Area BETWEEN 300 AND 600 AND Mean_Radius > 5 AND PREDICT(%s) = 'M' \
                    ORDER BY Probability" % (modelName, dataTable, modelName), cnxn)
display(df2)
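
The query above is easier to read and adjust when it is built by a small function with one clause per line. The sketch below does only that. The function `build_probability_query` and its parameters are hypothetical, and the values are interpolated as plain text, so it assumes trusted inputs.

```python
import textwrap


def build_probability_query(table, model, label="M",
                            area_range=(300, 600), min_radius=5):
    """Return the step 8 query as a multi-line string."""
    low, high = area_range
    return textwrap.dedent(f"""\
        SELECT ID, PROBABILITY({model} FOR '{label}') AS Probability, Diagnosis
        FROM {table}
        WHERE Mean_Area BETWEEN {low} AND {high}
          AND Mean_Radius > {min_radius}
          AND PREDICT({model}) = '{label}'
        ORDER BY Probability""")


query = build_probability_query("Biomedical.BreastCancer", "bc")
print(query)
```

The resulting string could be passed to `pd.read_sql(query, cnxn)` in place of the inline query.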

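|   | ID | Probability | diagnosis |
|---|---|---|---|
| 0 | 74 | 0.508227 | M |
| 1 | 298 | 0.675269 | M |
| 2 | 216 | 0.863261 | M |
| 3 | 42 | 0.955022 | M |
| 4 | 147 | 0.96117 | M |
| 5 | 101 | 0.994392 | M |
| 6 | 45 | 0.99522 | M |
| 7 | 6 | 0.995779 | M |
| 8 | 40 | 0.99636 | M |
| 9 | 194 | 0.998938 | M |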
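
Because the rows come back sorted by probability, the least confident predictions appear first. As a small illustration of working with such output in Python, the sketch below picks out the IDs whose probability falls under a cutoff. The helper `borderline_ids` and the cutoff of 0.9 are arbitrary choices for this example, not values taken from the notebook.

```python
def borderline_ids(rows, cutoff=0.9):
    """Return the IDs of (id, probability) rows whose probability is below cutoff."""
    return [row_id for row_id, probability in rows if probability < cutoff]


# The (ID, Probability) pairs displayed above.
shown_rows = [
    (74, 0.508227), (298, 0.675269), (216, 0.863261), (42, 0.955022),
    (147, 0.96117), (101, 0.994392), (45, 0.99522), (6, 0.995779),
    (40, 0.99636), (194, 0.998938),
]
print(borderline_ids(shown_rows))  # prints [74, 298, 216]
```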


### 9. Close and clean up

In [None]:
cnxn.close()
end = time.perf_counter()
print(f"Total elapsed time: {end - start:.3f} seconds")