# ketos Jupyter notebook machine learning example

Hi there, this is your personal model building environment. You have full access to your environment and can install any R packages you need and build your own models.

This is an example ketos Jupyter notebook that will guide you through the model development and deployment process.

TRAIN YOUR MODEL

To train your model you can install any R packages you might need, add any additional functions and build your own model.

Any model-building process needs data. Here is how to get it:

1. Go to your ketos user interface and request a new dataset. Once your dataset has been prepared, you can access it from within this notebook. To do this, replace the string
"paste your training data url from your ketos system here to get your training data"
in the "ketos_train" cell with the URL you received.
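As a minimal sketch of this substitution, the snippet below assumes that the dataset URL returns CSV (as the prediction URL used later in this notebook does with `output_type=csv`). The variable name `ketos_train_data` and the helper `is_url_set` are illustrative assumptions, not part of ketos.

```r
# Assumption: replace this placeholder with the URL from your ketos user interface
ketos_train_data <- "paste your training data url from your ketos system here to get your training data"

# Hypothetical helper (not part of ketos): TRUE once the placeholder
# has been replaced by something that looks like an http(s) URL
is_url_set <- function(x) grepl("^https?://", x)

if (is_url_set(ketos_train_data)) {
    # Assumes the URL serves CSV data
    train_data <- read.csv(url(ketos_train_data))
    str(train_data)
} else {
    message("Replace the placeholder string with your training data URL first.")
}
```

Checking the placeholder first avoids a confusing connection error when the string has not been replaced yet.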


DEPLOY YOUR MODEL

To deploy the notebook so that it can serve external predictions, do the following:

1. Tag the cells that should be executed for your prediction by putting the keyword "ketos_predict" in a comment on the first line of each cell.

2. Tag the cell that produces the output for the prediction with the keyword "ketos_predict_output". Please note that this cell will be executed and its output will be used as the return value for the prediction. The output is expected to be in JSON format, as produced by the "format_to_json" function provided here.
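The tagging convention can be sketched as follows. The two cells are shown in one block for brevity, and the names `toy_predict`, `toy_ids`, `toy_predictions` and `toy_json` are made up for illustration. Only the tag comments and the JSON layout (which mirrors what "format_to_json" in this notebook produces) are taken from this notebook.

```r
# --- cell 1: the tag is the first line of the cell ---
# ketos_predict
# Hypothetical helper: turns probabilities into 0/1 classes
toy_predict <- function(values) as.integer(values > 0.5)

# --- cell 2: the tag is the first line of the cell ---
# ketos_predict_output
toy_ids <- c(1, 2)
toy_predictions <- toy_predict(c(0.9, 0.2))

# One JSON object per patient, joined into a JSON array
toy_entries <- paste('{"patientId":"', toy_ids,
                     '", "prediction":"', toy_predictions, '"}', sep = "")
toy_json <- paste("[", paste(toy_entries, collapse = ","), "]", sep = "")

# The printed value of the output cell is what the prediction returns
print(toy_json, quote = FALSE)
```

Here `toy_json` holds the string `[{"patientId":"1", "prediction":"1"},{"patientId":"2", "prediction":"0"}]`.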


#### Please note: to test a prediction with this model on our demo server, you need to use one of the following patient IDs in the front end: https://ketos.ai/mlmodels

Requirement for patientid:

1 <= patientid <= 204
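If you want to check an ID before entering it, the range above could be tested with a small helper like the one below. The function `is_valid_patientid` is hypothetical and not part of ketos, and treating the ID as a whole number is an assumption on top of the stated range.

```r
# Hypothetical helper, not part of ketos: checks the demo-server ID range.
# Assumption: patient IDs are whole numbers.
is_valid_patientid <- function(patientid) {
    is.numeric(patientid) && length(patientid) == 1 && !is.na(patientid) &&
        patientid == floor(patientid) &&
        patientid >= 1 && patientid <= 204
}

is_valid_patientid(1)    # TRUE
is_valid_patientid(204)  # TRUE
is_valid_patientid(205)  # FALSE
```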


In [None]:
# ketos_init
# Load opal and datashield libraries
library(opal)
library(opaladmin)
library(dsBaseClient)
library(dsStatsClient)
library(dsGraphicsClient)
library(dsModellingClient)

In [None]:
# ketos_train
# Login to VMs

# To understand why these variables are assigned this way, see the
# documentation for the datashield.login function (part of the opal
# package)

# login details
server <- c("datashield_opal")
# note the datashield_opal only works from inside this docker container
#url <- c("http://localhost:8880")
url <- c("https://gruendner.imi.uni-erlangen.de:8443")
# ^^^ Note this specifies the port number
user <- "test"
password <- "test123"
table <- c("test.ds_example_bc")
# ^^^ note that this reflects the folder hierarchy that can be seen via the OPAL web interface

# Create a dataframe with all these details
logindata <- data.frame(server,url,user,password,table)

# Create an 'opals' object by passing the 'logindata' data frame to the
# datashield.login function
opals <- datashield.login(logins=logindata, assign = TRUE)

ds.asNumeric("D$Cl.thickness")
ds.asNumeric("D$Cell.size")
ds.asNumeric("D$Cell.shape")
ds.asFactor("D$Class", "Class_factor")

myvectors <- c('Cl.thickness_num','Cell.size_num','Cell.shape_num', 'Class_factor')
ds.dataframe(x=myvectors, "bcdf")

glm_model <- ds.glm(formula='bcdf$Class_factor~bcdf$Cl.thickness_num + bcdf$Cell.size_num + bcdf$Cell.shape_num', family='binomial')

modeltoSave = 'ds_ketos_model'
save(glm_model,file = modeltoSave)

datashield.logout(opals)
print("Congratulations!!! - a basic glm model was successfully generated")

In [None]:
#ketos_predict
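ds_prediction_function = function(thickness, size, shape ){
  # Load the saved model object 'glm_model' written by the ketos_train cell
  load("ds_ketos_model")
  coeff = glm_model$coefficients
  # Linear predictor: intercept plus one weighted term per feature
  pred_result = coeff[1] * 1 + coeff[2] * thickness + coeff[3] * size + coeff[4] * shape
  # Map the linear predictor to a probability with the logistic function
  pred_result = plogis(pred_result)

  # Classify as 1 when the predicted probability exceeds 0.5, otherwise 0
  if(pred_result > 0.5){
      return(1)
  }

  return(0)
}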

ds_generate_predictions = function(predict_data){

    predictions = c()

    # seq_len() gives an empty sequence for zero rows; 1:nrow() would iterate over c(1, 0)
    for(row in seq_len(nrow(predict_data))){
        prediction = ds_prediction_function(predict_data$CellThickness[row], predict_data$CellSize[row], predict_data$CellShape[row])
        predictions = c(predictions, prediction)
    }

    return(predictions)
}

In [None]:
#ketos_predict, ketos_predict_output

format_to_json = function(patientIds, predictions){
    # Start the JSON array with the first patient's entry
    ret_json = "["
    ret_json = paste(ret_json, '{"patientId":"', patientIds[1], '", "prediction":"', predictions[1], '"}', sep="")

    # Append the remaining entries, each preceded by a comma
    if(length(patientIds) > 1){
        for(i in 2:length(patientIds)){
            patientId = patientIds[i]
            prediction = predictions[i]

            ret_json = paste(ret_json, ',{"patientId":"', patientId, '", "prediction":"', prediction, '"}', sep="")
        }
    }
    ret_json = paste(ret_json, "]", sep="")

    return (ret_json)
}

ketos_predict = function(){
    ketos_predict_data = ("http://ketos_preproc:5000/aggregation/5d5af211c9dde910a875dafe?output_type=csv&aggregation_type=latest")
    load("ds_ketos_model")
    predict_data = read.csv(url(ketos_predict_data))
    
    predict_data$CellThickness = as.numeric(predict_data$CellThickness)
    predict_data$CellSize = as.numeric(predict_data$CellSize)
    predict_data$CellShape = as.numeric(predict_data$CellShape)
    predict_data$TumorClass = as.factor(predict_data$TumorClass)
    
    predictions = ds_generate_predictions(predict_data)
    prediction = format_to_json(predict_data$patient, predictions)
    return ( paste('{"ds_glm_predictions":', prediction,'}', sep=""))
}
print(ketos_predict(), quote=FALSE)