# Deployment

We will load some data, train a model, make predictions, save the model, and push the predictions to a Microsoft SQL Server.

A Microsoft SQL Server or SQLite database is required. Setup instructions can be found here: https://github.com/HealthCatalyst/healthcareai-py/blob/master/docs/deploy.md

If you are using Microsoft SQL Server, the Docker container must be up and running. You can check the status of the container with the following command: `sudo docker ps`

## Step 1: Create a model

In [1]:
import sys
import healthcareai

import pandas as pd

from healthcareai.supervised_model_trainer import SupervisedModelTrainer
import healthcareai.common.database_connections as hcai_db
import healthcareai.datasets as hcai_datasets
import healthcareai.common.file_io_utilities as file_io_utilities
import healthcareai.common.csv_loader as csv_loader

Load the diabetes dataset.

In [2]:
dataframe = hcai_datasets.load_diabetes()

We want to predict `SystolicBPNBR`. First inspect the data, then drop the `PatientID` column.

In [3]:
dataframe.head()

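|   | PatientEncounterID | PatientID | SystolicBPNBR | LDLNBR | A1CNBR | GenderFLG | ThirtyDayReadmitFLG |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 10001 | 167.0 | 195.0 | 4.2 | M | N |
| 1 | 2 | 10001 | 153.0 | 214.0 | 5.0 | M | N |
| 2 | 3 | 10001 | 170.0 | 191.0 | 4.0 | M | N |
| 3 | 4 | 10002 | 187.0 | 135.0 | 4.4 | M | N |
| 4 | 5 | 10002 | 188.0 | 125.0 | 4.3 | M | N |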


In [4]:
dataframe.drop(['PatientID'], axis=1, inplace=True)

To train a model (here, a regression model), use `SupervisedModelTrainer`. Its input parameters are documented at: http://healthcareai-py.readthedocs.io/en/v1.0/training/#what-is-supervisedmodeltrainer

In [5]:
regression_trainer = SupervisedModelTrainer(
    dataframe=dataframe,
    predicted_column='SystolicBPNBR',
    model_type='regression',
    grain_column='PatientEncounterID',
    impute=False,
    verbose=False)

The `SupervisedModelTrainer` has a pipeline that cleans the dataset, for example by imputing missing values (disabled here with `impute=False`). After the data is cleaned, you can inspect it through the `clean_dataframe` attribute, for example with `clean_dataframe.head()`.

In [6]:
print(regression_trainer.clean_dataframe.head())

   SystolicBPNBR  LDLNBR  A1CNBR  GenderFLG.M  ThirtyDayReadmitFLG.Y
0          167.0   195.0     4.2            1                      0
1          153.0   214.0     5.0            1                      0
2          170.0   191.0     4.0            1                      0
3          187.0   135.0     4.4            1                      0
4          188.0   125.0     4.3            1                      0
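In the output above, the categorical columns `GenderFLG` and `ThirtyDayReadmitFLG` have become the 0/1 indicator columns `GenderFLG.M` and `ThirtyDayReadmitFLG.Y`. The following is a minimal plain-Python sketch of that kind of dummy encoding. It is not the library's implementation: the `dummy_encode` helper is hypothetical, and the second sample row is made up for illustration.

```python
def dummy_encode(rows, categorical_columns):
    """Replace each categorical column with 0/1 indicator columns.

    The first level (in sorted order) is dropped as the baseline, so a
    two-level column such as GenderFLG becomes a single GenderFLG.M column.
    """
    # Collect the sorted levels observed in each categorical column.
    levels = {
        col: sorted({row[col] for row in rows}) for col in categorical_columns
    }
    encoded = []
    for row in rows:
        # Keep the non-categorical values unchanged.
        new_row = {k: v for k, v in row.items() if k not in categorical_columns}
        for col in categorical_columns:
            for level in levels[col][1:]:  # skip the baseline level
                new_row[f"{col}.{level}"] = 1 if row[col] == level else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"SystolicBPNBR": 167.0, "GenderFLG": "M", "ThirtyDayReadmitFLG": "N"},
    # Hypothetical row, added so that both levels of each column appear.
    {"SystolicBPNBR": 153.0, "GenderFLG": "F", "ThirtyDayReadmitFLG": "Y"},
]
encoded_rows = dummy_encode(rows, ["GenderFLG", "ThirtyDayReadmitFLG"])
print(encoded_rows[0])
```

Dropping one level per column avoids a redundant indicator: for a two-level column, a single 0/1 column already carries all the information.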


Train a linear regression model on the dataset.

In [7]:
trained_linear_model = regression_trainer.linear_regression()


Training Linear Regression
LinearRegression Training Results:
- Training time:
    Trained the LinearRegression model in 0.01 seconds
- Best hyperparameters found were:
    N/A: No hyperparameter search was performed
- LinearRegression performance metrics:
    Mean Squared Error (MSE): 650.0561148063488
    Mean Absolute Error (MAE): 21.69631005614234


You can make predictions on a dataframe that contains the same columns as the dataframe used to train the model.

In [8]:
trained_linear_model.make_predictions(dataframe).head()

|   | PatientEncounterID | Prediction |
|---|---|---|
| 0 | 1 | 150.160964 |
| 1 | 2 | 149.312638 |
| 2 | 3 | 150.366739 |
| 3 | 4 | 150.493354 |
| 4 | 5 | 150.663512 |
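Because predictions require the same columns that were used for training, it can help to check a new dataset before calling `make_predictions`. The sketch below uses a hypothetical `missing_columns` helper that is not part of healthcareai. With pandas you could pass `list(dataframe.columns)` for either argument.

```python
def missing_columns(training_columns, new_columns):
    """Return the training columns absent from the new data, in training order."""
    available = set(new_columns)
    return [col for col in training_columns if col not in available]


training_columns = [
    "PatientEncounterID", "SystolicBPNBR", "LDLNBR",
    "A1CNBR", "GenderFLG", "ThirtyDayReadmitFLG",
]
# Hypothetical new dataset that lacks two of the training columns.
new_columns = ["PatientEncounterID", "LDLNBR", "A1CNBR", "GenderFLG"]

missing = missing_columns(training_columns, new_columns)
print(missing)
```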


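The `create_catalyst_dataframe` method builds the dataframe that will be pushed to the Microsoft SQL Server. Alongside each prediction it includes the top three factor columns and load metadata (`BindingID`, `BindingNM`, `LastLoadDTS`).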

In [9]:
trained_linear_model.create_catalyst_dataframe(dataframe).head()

|   | PatientEncounterID | Factor1TXT | Factor2TXT | Factor3TXT | Prediction | BindingID | BindingNM | LastLoadDTS |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | A1CNBR | LDLNBR | GenderFLG.M | 150.160964 | 0 | Python | 2017-08-21 18:37:47.419 |
| 1 | 2 | A1CNBR | LDLNBR | GenderFLG.M | 149.312638 | 0 | Python | 2017-08-21 18:37:47.419 |
| 2 | 3 | A1CNBR | LDLNBR | GenderFLG.M | 150.366739 | 0 | Python | 2017-08-21 18:37:47.419 |
| 3 | 4 | A1CNBR | LDLNBR | GenderFLG.M | 150.493354 | 0 | Python | 2017-08-21 18:37:47.419 |
| 4 | 5 | A1CNBR | LDLNBR | GenderFLG.M | 150.663512 | 0 | Python | 2017-08-21 18:37:47.419 |


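You can save the model to a file and reload it later. Note that the file loaded below has a different timestamp from the one just saved, so it appears to come from a different run. When you follow along, pass the filename printed by `save()` to `load_saved_model`.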

In [10]:
trained_linear_model.save()

Trained LinearRegression model saved as 2017-08-21T12-37-47_regression_LinearRegression.pkl


In [11]:
trained_model = file_io_utilities.load_saved_model('2017-08-21T12-03-09_regression_LinearRegression.pkl')

Trained model loaded from file: 2017-08-21T12-03-09_regression_LinearRegression.pkl
    Type: <class 'healthcareai.trained_models.trained_supervised_model.TrainedSupervisedModel'>
    Model type: <class 'sklearn.linear_model.base.LinearRegression'>
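The `.pkl` extension suggests the model is stored with Python's `pickle` module, although that is an assumption about the library's internals. The sketch below shows the general save-and-reload pattern with a timestamped filename in the same style. The `save_model` and `load_model` helpers are hypothetical, and a plain dict stands in for a trained model.

```python
import pickle
import tempfile
from datetime import datetime
from pathlib import Path


def save_model(model, directory, model_type, algorithm):
    """Pickle `model` under a timestamped filename and return the path."""
    timestamp = datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    path = Path(directory) / f"{timestamp}_{model_type}_{algorithm}.pkl"
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    return path


def load_model(path):
    """Load a pickled object. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# A plain dict stands in for a trained model (hypothetical values).
stand_in_model = {"algorithm": "LinearRegression", "coefficients": [0.1, -0.2]}

with tempfile.TemporaryDirectory() as directory:
    saved_path = save_model(
        stand_in_model, directory, "regression", "LinearRegression")
    restored_model = load_model(saved_path)

print(saved_path.name)
```

Returning the path from the save step and passing it straight to the load step avoids retyping a timestamped filename by hand.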


## Step 2: Push the predictions to a database

You can push the predictions to a database with the `predict_to_catalyst_sam` method. If you set `secure_connection=True`, you do not need to provide a username and password, because the database authenticates you with your Windows credentials.

In [12]:
trained_model.predict_to_catalyst_sam(
    dataframe,
    server="192.168.86.123",
    database="SAM",
    table="HCAIPredictionRegressionBASE",
    userid="sa",
    password="yourStrongPassword1234",
    secure_connection=False)


Successfully inserted 1000 rows. Dataframe contained 1000 rows
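The difference between the two authentication modes can be illustrated with an ODBC-style connection string. This is only a sketch: how healthcareai builds its connection internally is not shown here, the `build_connection_string` helper is hypothetical, and the driver name is an assumption that depends on what is installed on your machine.

```python
def build_connection_string(server, database, userid=None, password=None,
                            secure_connection=True,
                            driver="ODBC Driver 17 for SQL Server"):
    """Build an ODBC connection string for SQL Server.

    With secure_connection=True the string requests Windows (trusted)
    authentication; otherwise it carries an explicit user id and password.
    """
    parts = [f"DRIVER={{{driver}}}", f"SERVER={server}", f"DATABASE={database}"]
    if secure_connection:
        parts.append("Trusted_Connection=yes")
    else:
        if not userid or not password:
            raise ValueError(
                "userid and password are required when secure_connection is False")
        parts.extend([f"UID={userid}", f"PWD={password}"])
    return ";".join(parts)


trusted = build_connection_string("localhost", "SAM")
sql_login = build_connection_string(
    "192.168.86.123", "SAM",
    userid="sa", password="yourStrongPassword1234",
    secure_connection=False)
print(trusted)
```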


## Verifying that the data was inserted

We will use DataGrip to inspect the database. These steps assume that you are running the Docker container for MSSQL.

### Step 1: Right-click the + sign, and create a new data source for SQL Server (Microsoft)

<img src="./img/example_deployment_01.png">

### Step 2: Enter the username and password, with the server set to localhost (or the IP address of the server)

The username and password are given at https://github.com/HealthCatalyst/healthcareai-py/blob/master/docs/deploy.md

<img src="./img/example_deployment_02.png">

### Step 3: Press OK, and you will see your database as localhost

<img src="./img/example_deployment_03.png">

### Step 4: Click on the Schemas tab

<img src="./img/example_deployment_04.png">

### Step 5: Check the SAM database

<img src="./img/example_deployment_05.png">

### Step 6: Expand the SAM database entry, and then check all schemas

<img src="./img/example_deployment_06.png">

### Step 7: Select the HCAIPredictionRegressionBASE table, and you can see the data that was pushed

<img src="./img/example_deployment_07.png">
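If you prefer a scripted check to a GUI, a row count query gives the same confirmation. The sketch below uses an in-memory SQLite database as a stand-in so that it runs anywhere. The column types are assumptions based on the dataframe shown earlier (the real table definition is in the deployment docs), and the `count_rows` helper is hypothetical.

```python
import sqlite3

TABLE = "HCAIPredictionRegressionBASE"


def count_rows(connection, table):
    """Return the number of rows in `table`.

    Table names cannot be bound as query parameters, so only pass
    trusted, hard-coded names here.
    """
    return connection.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


connection = sqlite3.connect(":memory:")
# Assumed schema, mirroring the columns of the prediction dataframe.
connection.execute(
    f'CREATE TABLE "{TABLE}" ('
    "PatientEncounterID INTEGER, Factor1TXT TEXT, Factor2TXT TEXT, "
    "Factor3TXT TEXT, Prediction REAL, BindingID INTEGER, "
    "BindingNM TEXT, LastLoadDTS TEXT)"
)
rows = [
    (1, "A1CNBR", "LDLNBR", "GenderFLG.M", 150.160964, 0, "Python",
     "2017-08-21 18:37:47.419"),
    (2, "A1CNBR", "LDLNBR", "GenderFLG.M", 149.312638, 0, "Python",
     "2017-08-21 18:37:47.419"),
]
connection.executemany(
    f'INSERT INTO "{TABLE}" VALUES (?, ?, ?, ?, ?, ?, ?, ?)', rows)
connection.commit()

inserted = count_rows(connection, TABLE)
first = connection.execute(
    f'SELECT PatientEncounterID, Prediction FROM "{TABLE}" '
    "ORDER BY PatientEncounterID LIMIT 1"
).fetchone()
connection.close()
print(inserted, first)
```

Against the real database, the same `SELECT COUNT(*)` query on the prediction table should return 1000 after the run above, matching the "Successfully inserted 1000 rows" message.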