# RTSMADlib Demo

## Logistic Regression Model Sample and Model Deployment

In this sample we demonstrate how to deploy an Apache MADlib model using RTSMADlib. We use the simple supervised-learning logistic regression example described at https://madlib.apache.org/docs/latest/group__grp__logreg.html.


### Prerequisites
1. A running instance of Greenplum with MADlib installed
2. Database connection parameters updated wherever placeholders appear in the notebook
3. A running Kubernetes environment

### The notebook performs the following tasks

1. Connect to Greenplum and set up the session
2. Create the schema and all required tables, then load the test data into Greenplum
3. Build and train the model
4. Batch score the model as a test
5. Operationalize the model with RTSMADlib
6. Test the model REST service
7. Undeploy the model container


### Create SQL Connection to Greenplum

In [None]:
import psycopg2               # Python-PostgreSQL Database Adapter - https://pypi.python.org/pypi/psycopg2
import pandas as pd           # Python Data Analysis Library - https://pandas.pydata.org/
import math  
import json

%load_ext sql

# PLEASE MODIFY THE BELOW AS PER YOUR GREENPLUM CLUSTER SETTINGS
database_host = '35.196.46.152'
database_databasename = 'gpadmin'
database_username = 'gpadmin'
database_password = 'qg7lGPyhxEQmj'
database_port = '5432'

try:
    connString = "host='{}' dbname='{}' user='{}' password='{}' port={}".format(database_host,database_databasename,database_username,database_password,database_port)
    # print connString
    conn = psycopg2.connect(connString)
    cur = conn.cursor()
    conn.autocommit = True
        
    connection = 'postgresql://{}:{}@{}:{}/{}'.format(database_username,database_password,database_host,database_port,database_databasename)
    %sql $connection

    message = "<span style='color:green'>**Connection successful!**</span>"
    print(message)
except Exception as e:
    message = "<span style='color:red'>**ERROR: Unable to connect to the database ({})**</span>".format(e)
    print(message) 
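
The cell above keeps the connection settings inline. As a minimal sketch of an alternative, the connection string could be assembled from environment variables so that the password stays out of the notebook. The variable names `GP_HOST`, `GP_DBNAME`, `GP_USER`, `GP_PASSWORD`, and `GP_PORT` and the helper `build_conn_string` are hypothetical and not part of RTSMADlib or psycopg2.

```python
import os


def build_conn_string(env=os.environ):
    """Build a libpq-style connection string from environment variables.

    The GP_* variable names are hypothetical; pick whatever fits your setup.
    """
    host = env.get("GP_HOST", "localhost")
    dbname = env.get("GP_DBNAME", "gpadmin")
    user = env.get("GP_USER", "gpadmin")
    password = env.get("GP_PASSWORD", "")
    port = env.get("GP_PORT", "5432")
    # Same format as the connString built in the cell above.
    return "host='{}' dbname='{}' user='{}' password='{}' port={}".format(
        host, dbname, user, password, port
    )
```

The result can be passed to `psycopg2.connect(...)` in place of `connString`.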

### Create Schema

In [None]:
%%sql

DO $$
BEGIN

    IF NOT EXISTS(
        SELECT schema_name
          FROM information_schema.schemata
          WHERE schema_name = 'madlib_demo'
      )
    THEN
      EXECUTE 'CREATE SCHEMA madlib_demo';
    END IF;

END
$$;


DROP TABLE IF EXISTS madlib_demo.patients;

CREATE TABLE madlib_demo.patients( id INTEGER NOT NULL,
                       second_attack INTEGER,
                       treatment INTEGER,
                       trait_anxiety INTEGER)
DISTRIBUTED RANDOMLY;
                      

### Generate some test data.

In [None]:
%%sql
INSERT INTO madlib_demo.patients VALUES
(1,  1, 1, 70),
(2,  1, 1, 80),
(3,  1, 1, 50),
(4,  1, 0, 60),
(5,  1, 0, 40),
(6,  1, 0, 65),
(7,  1, 0, 75),
(8,  1, 0, 80),
(9,  1, 0, 70),
(10, 1, 0, 60),
(11, 0, 1, 65),
(12, 0, 1, 50),
(13, 0, 1, 45),
(14, 0, 1, 35),
(15, 0, 1, 40),
(16, 0, 1, 50),
(17, 0, 0, 55),
(18, 0, 0, 45),
(19, 0, 0, 50),
(20, 0, 0, 60);

### Train a logistic regression model

In [None]:
%%sql

DROP TABLE IF EXISTS madlib_demo.patients_logregr;
DROP TABLE IF EXISTS madlib_demo.patients_logregr_summary;

SELECT madlib.logregr_train( 'madlib_demo.patients',                 -- Source table
                             'madlib_demo.patients_logregr',         -- Output table
                             'second_attack',                        -- Dependent variable
                             'ARRAY[1, treatment, trait_anxiety]',   -- Feature vector
                             NULL,                                   -- Grouping
                             20,                                     -- Max iterations
                             'irls'                                  -- Optimizer to use
                           );

SELECT * from madlib_demo.patients_logregr;

In [None]:
%%sql
SELECT unnest(array['intercept', 'treatment', 'trait_anxiety']) as attribute,
       unnest(coef) as coefficient,
       unnest(std_err) as standard_error,
       unnest(z_stats) as z_stat,
       unnest(p_values) as pvalue,
       unnest(odds_ratios) as odds_ratio
    FROM madlib_demo.patients_logregr;
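
To make the columns in this summary easier to read, here is a small sketch of how two of them relate to the others: each odds ratio is the exponential of its coefficient, and each Wald z-statistic is the coefficient divided by its standard error. The helper names are hypothetical, and the inputs are whatever arrays the query above returns.

```python
import math


def odds_ratios(coef):
    """Odds ratio for each coefficient: exp(coefficient)."""
    return [math.exp(c) for c in coef]


def z_stats(coef, std_err):
    """Wald z-statistic for each coefficient: coefficient / standard error."""
    return [c / s for c, s in zip(coef, std_err)]
```

A coefficient of 0 gives an odds ratio of 1, meaning the feature does not change the odds of a second attack.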

### Predict the dependent variable using the logistic regression model. 
(This example uses the original data table to perform the prediction. Typically a different test dataset with the same features as the original training dataset would be used for prediction.)

In [None]:
%%sql
-- Display prediction value along with the original value
SELECT p.id, madlib.logregr_predict(coef, ARRAY[1, treatment, trait_anxiety]),
       p.second_attack::BOOLEAN
FROM madlib_demo.patients p, madlib_demo.patients_logregr m
ORDER BY p.id;

### Predict the probability of the dependent variable being TRUE.

In [None]:
%%sql
SELECT p.id, madlib.logregr_predict_prob(coef, ARRAY[1, treatment, trait_anxiety]),
       p.second_attack::BOOLEAN
FROM madlib_demo.patients p, madlib_demo.patients_logregr m
ORDER BY p.id;
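
The two prediction queries above can be sketched in plain Python to show what they compute: the probability is the logistic (sigmoid) function applied to the dot product of the coefficients and the feature vector, and the boolean prediction is true when that probability reaches one half. This is an illustrative reimplementation, not MADlib's code; the function names are hypothetical, and the treatment of the exact boundary case (probability equal to 0.5) is an assumption.

```python
import math


def predict_prob(coef, features):
    """Probability that the dependent variable is TRUE."""
    z = sum(c * x for c, x in zip(coef, features))
    # Numerically stable sigmoid: avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(coef, features):
    """Boolean prediction; counting exactly 0.5 as TRUE is an assumption."""
    return predict_prob(coef, features) >= 0.5
```

The feature vector must follow the same order used for training, `[1, treatment, trait_anxiety]`, where the leading 1 multiplies the intercept.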

# RTSMADlib

## Operationalize the model

The MADlib model trained in Greenplum is containerized and deployed to a container management system, in this case a Kubernetes environment. The RTSMADlib tool (the `rts4madlib` command) handles bundling, deploying, and serving the model as a REST endpoint.

In [None]:
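# Each `!` line runs in its own subshell, so the profile is sourced on the same line as the command.
! source ~/.bash_profile && rts4madlib --help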

# Deployment manifest of Model

In [None]:
import json

# PLEASE MODIFY THE JDBC URL AND CREDENTIALS AS PER YOUR GREENPLUM CLUSTER SETTINGS
myconfig = json.dumps({
    "modeldb-datasource.jdbc-url": "jdbc:postgresql://35.196.46.152:5432/gpadmin",
    "modeldb-datasource.userName": "gpadmin",
    "modeldb-datasource.password": "qg7lGPyhxEQmj",
    "madlibrest.modelname": "patients_data_Logistic_Regression",
    "madlibrest.modeldescription": "Logistic Regression model predicting the patients' health.",
    "madlibrest.modelschema": "madlib_demo",
    "madlibrest.modeltables": ["patients_logregr"],
    "madlibrest.modelinputtable": "patients",
    "madlibrest.modelquery": "SELECT madlib.logregr_predict(coef, ARRAY[1, treatment, trait_anxiety]) ,  madlib.logregr_predict_prob(coef, ARRAY[1, treatment, trait_anxiety]) FROM madlib_demo.patients p, madlib_demo.patients_logregr"
})


with open("model-config.json", "w") as f:
    f.write(myconfig)
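
Before deploying, it can help to confirm that the manifest contains every key used in this demo. The check below is only a sketch: the key list is copied from the configuration above, treating all of these keys as required is an assumption rather than a documented RTSMADlib rule, and the helper names are hypothetical.

```python
import json

# Keys taken from the manifest in this notebook; "required" is an assumption.
EXPECTED_KEYS = [
    "modeldb-datasource.jdbc-url",
    "modeldb-datasource.userName",
    "modeldb-datasource.password",
    "madlibrest.modelname",
    "madlibrest.modeldescription",
    "madlibrest.modelschema",
    "madlibrest.modeltables",
    "madlibrest.modelinputtable",
    "madlibrest.modelquery",
]


def missing_keys(config):
    """Return the expected keys that are absent from the config dict."""
    return [key for key in EXPECTED_KEYS if key not in config]


def check_config_file(path="model-config.json"):
    """Load the manifest from disk and report any absent keys."""
    with open(path) as f:
        return missing_keys(json.load(f))
```

An empty list from `check_config_file()` means the file written above contains every key listed here.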
    

# Deploy

In [None]:
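# Each `!` line runs in its own subshell, so the profile is sourced on the same line as the command.
! source ~/.bash_profile && rts4madlib --name patientslrm --action deploy --type madlib-model --target kubernetes --inputJson model-config.json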

In [None]:
! kubectl get all

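# Testing - RTSMADlib container
The deployment logs should show the service endpoints of the model container. We use these endpoints for testing; replace the host and port below with the values from your own deployment. The first request calls the information endpoint of the model container, and the second sends feature values to the prediction endpoint.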

In [None]:
! curl -v -H "Content-Type:application/json" http://35.227.34.60:8085/actuator/info

In [None]:
! curl -v -H "Content-Type:application/json" http://35.227.34.60:8085/predict -d '{ "treatment": 1, "trait_anxiety": 70}'
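
The same prediction request can be issued from Python instead of `curl`. The sketch below uses only the standard library; the helper names are hypothetical, the base URL must be replaced with your own service endpoint, and parsing the response as JSON is an assumption about what the container returns.

```python
import json
import urllib.request


def build_predict_request(base_url, features):
    """Build a POST request for the /predict endpoint without sending it."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features, timeout=10):
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("http://35.227.34.60:8085", {"treatment": 1, "trait_anxiety": 70})` mirrors the `curl` call above.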

# Undeploy Model

In [None]:
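# Each `!` line runs in its own subshell, so the profile is sourced on the same line as the command.
! source ~/.bash_profile && rts4madlib --name patientslrm --action undeploy --type madlib-model --target kubernetes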

In [None]:
! kubectl get all 

# Thank You!