# Deployment and integration of Julia in production environments

### Sebastian Zając



The main goal of our part of the workshop is to show how a Julia application can be deployed in production environments.
 

We'll build two simple models (a linear regression model and a neural network) to predict the median house value in the Boston suburbs.


In the workshop we will use the dataset from the [UCI repository](https://archive.ics.uci.edu/ml/machine-learning-databases/housing/).

The data are stored in the `housing.csv` file.

Each record of this database has the following fields:

* `CRIM`: per capita crime rate by town
* `ZN`: proportion of residential land zoned for lots over 25,000 sq.ft.
* `INDUS`: proportion of non-retail business acres per town
* `CHAS`: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
* `NOX`: nitric oxides concentration (parts per 10 million)
* `RM`: average number of rooms per dwelling
* `AGE`: proportion of owner-occupied units built prior to 1940
* `DIS`: weighted distances to five Boston employment centres
* `RAD`: index of accessibility to radial highways
* `TAX`: full-value property-tax rate per \$10,000
* `PTRATIO`: pupil-teacher ratio by town
* `B`: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
* `LSTAT`: % lower status of the population
* **MEDV**: target feature, median value of owner-occupied homes in \$1000's

After training and evaluation, the model should be deployed to serve the scores and predictions.

The model is usually embedded into a bigger application or exposed through a web service. Both solutions need additional logic to prepare the input data properly and to return the prediction to the user in an appropriate form. In this workshop we use a web service:
* **JSON-based web service**: a JSON payload with the input observation is sent to the web service, and a JSON document with the prediction is returned.
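Independently of the web framework, such a service follows a simple contract: validate the incoming fields, build the feature vector in the order the model expects, and return either a prediction or an error. A minimal sketch, assuming the payload has already been parsed from JSON into a `Dict` (as Genie's `jsonpayload()` does later in this notebook); `prediction_response` and `toy_model` are hypothetical names used only for illustration:

```julia
# Hypothetical helper: validate a parsed JSON payload and build the response Dict.
# `features` lists the required keys; `predict_fn` maps a Vector{Float64} to a number.
function prediction_response(payload::AbstractDict, features::Vector{String}, predict_fn)
    missing_fields = [f for f in features if !haskey(payload, f)]
    if !isempty(missing_fields)
        return Dict("error" => "Missing fields: " * join(missing_fields, ", "))
    end
    # keep the feature order the model was trained with, not the payload order
    x = Float64[payload[f] for f in features]
    return Dict("input" => payload, "prediction" => predict_fn(x))
end

# Toy "model" for illustration only: the sum of the features.
toy_model(x) = sum(x)

ok = prediction_response(Dict{String,Any}("RM" => 6.5, "LSTAT" => 5.0), ["RM", "LSTAT"], toy_model)
bad = prediction_response(Dict{String,Any}("RM" => 6.5), ["RM", "LSTAT"], toy_model)
```

The web framework then only has to serialize the returned `Dict` back to JSON.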

## 1. Data preprocessing and model building

Model building proceeds in three steps:

1. Load data
2. Data Preprocessing (normalization)
3. Models training

#### 1.1 Load data

In [1]:
using CSV

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mPrecompiling CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b]


In [2]:
using DataFrames

In [3]:
houses = CSV.read("housing.csv", DataFrame) 

houses[1:5,:]

| Row | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 2 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 3 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 4 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 5 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |


In [4]:
describe(houses, :min, :mean, :max, :nmissing, :eltype)

| Row | variable | min | mean | max | nmissing | eltype |
|---|---|---|---|---|---|---|
| | Symbol | Float64 | Float64 | Float64 | Int64 | DataType |
| 1 | CRIM | 0.00632 | 3.61352 | 88.9762 | 0 | Float64 |
| 2 | ZN | 0.0 | 11.3636 | 100.0 | 0 | Float64 |
| 3 | INDUS | 0.46 | 11.1368 | 27.74 | 0 | Float64 |
| 4 | CHAS | 0.0 | 0.06917 | 1.0 | 0 | Float64 |
| 5 | NOX | 0.385 | 0.554695 | 0.871 | 0 | Float64 |
| 6 | RM | 3.561 | 6.28463 | 8.78 | 0 | Float64 |
| 7 | AGE | 2.9 | 68.5749 | 100.0 | 0 | Float64 |
| 8 | DIS | 1.1296 | 3.79504 | 12.1265 | 0 | Float64 |
| 9 | RAD | 1.0 | 9.54941 | 24.0 | 0 | Float64 |
| 10 | TAX | 187.0 | 408.237 | 711.0 | 0 | Float64 |


In [5]:
# check names of our features
names(houses)

14-element Vector{String}:
 "CRIM"
 "ZN"
 "INDUS"
 "CHAS"
 "NOX"
 "RM"
 "AGE"
 "DIS"
 "RAD"
 "TAX"
 "PTRATIO"
 "B"
 "LSTAT"
 "MEDV"

#### 1.2 Data pipeline

In [None]:
# TODO: the data pipeline (normalization) is not defined yet
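Step 2 of the plan (normalization) is still a placeholder. One common choice is z-score standardization: subtract each feature's mean and divide by its standard deviation. The sketch below is an assumption about how this step could look, not the workshop's pipeline, and `zscore_fit` / `zscore_apply` are hypothetical names. It works on a features × observations matrix, the layout used for Flux further down.

```julia
using Statistics

# Compute per-feature (per-row) statistics of a features × observations matrix.
function zscore_fit(X::AbstractMatrix{<:Real})
    μ = mean(X; dims = 2)
    σ = std(X; dims = 2)
    σ[σ .== 0] .= 1.0   # avoid division by zero for constant features
    return μ, σ
end

# Apply previously computed statistics (training data or a new request alike).
zscore_apply(X, μ, σ) = (X .- μ) ./ σ

X_demo = [1.0 2.0 3.0;
          10.0 20.0 30.0]
μ, σ = zscore_fit(X_demo)
Z = zscore_apply(X_demo, μ, σ)
```

The models in this notebook are trained on the raw data. If you do normalize, the stored `μ` and `σ` have to be saved together with the model and applied to every incoming request in the web service as well.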

### Let's create a simple linear regression model

You can find more information about linear regression in Day_2a_Classical_predictive_model.
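The `@formula` used in the next cell corresponds to the model below. GLM adds the intercept $\beta_0$ automatically, which is why the fitted model is printed as `MEDV ~ 1 + ...`, and `lm` estimates the coefficients by ordinary least squares:

$$
\text{MEDV} = \beta_0 + \beta_1\,\text{CRIM} + \beta_2\,\text{INDUS} + \beta_3\,\text{CHAS} + \beta_4\,\text{RM} + \beta_5\,\text{AGE} + \beta_6\,\text{DIS} + \beta_7\,\text{TAX} + \beta_8\,\text{LSTAT} + \varepsilon
$$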

In [6]:
using GLM

# simple linear regression model

model_specification = @formula(MEDV ~ CRIM + INDUS + CHAS + RM + AGE + DIS + TAX + LSTAT)
linear_model = lm(model_specification, houses)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int32}}}}, Matrix{Float64}}

MEDV ~ 1 + CRIM + INDUS + CHAS + RM + AGE + DIS + TAX + LSTAT

Coefficients:
─────────────────────────────────────────────────────────────────────────────────
                   Coef.  Std. Error       t  Pr(>|t|)    Lower 95%     Upper 95%
─────────────────────────────────────────────────────────────────────────────────
(Intercept)   8.93642     3.47595       2.57    0.0104   2.10704     15.7658
CRIM         -0.077255    0.0343622    -2.25    0.0250  -0.144768    -0.00974194
INDUS        -0.156351    0.0626433    -2.50    0.0129  -0.27943     -0.033273
CHAS          3.65623     0.934083      3.91    0.0001   1.82099      5.49147
RM            4.77929     0.436726     10.94    <1e-24   3.92124      5.63735
AGE          -0.023582    0.0138332    -1.70    0.0889  -0.0507608    0.00359671
DIS       

In [7]:
# take the first two rows of data
test_prediction = houses[1:2, [:CRIM, :INDUS, :CHAS, :RM, :AGE, :DIS, :TAX, :LSTAT]]

# predict values for these rows
predict(linear_model, test_prediction)

2-element Vector{Union{Missing, Float64}}:
 29.919737211857885
 25.13217586404559

In [9]:
test_dict = Dict("DIS" => 3.02,"CRIM" => 0.00532,"INDUS" => 1.51,"RM" => 4.53,"AGE" => 40.2,"CHAS" => 0.0,"TAX" => 296.0,"LSTAT" => 4.98)

Dict{String, Float64} with 8 entries:
  "DIS"   => 3.02
  "CHAS"  => 0.0
  "CRIM"  => 0.00532
  "INDUS" => 1.51
  "RM"    => 4.53
  "AGE"   => 40.2
  "TAX"   => 296.0
  "LSTAT" => 4.98

In [11]:
predict(linear_model, DataFrame(test_dict))

1-element Vector{Union{Missing, Float64}}:
 21.99126461065233

> Let's think: how can you use your great model on your n

In [None]:
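using BSON: @save
# save the fitted model to a file so that another process can load it later
@save "linear_regression.bson" linear_model

In [None]:
using LinearAlgebra
using BSON: @load

# clear the variable to show that the model really comes from the file
linear_model = nothing

@load "linear_regression.bson" linear_model

predict(linear_model, DataFrame(test_dict))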
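BSON is the format used in this workshop. For comparison, here is a minimal sketch of the same save/load round trip with Julia's built-in `Serialization` standard library (a swapped-in alternative, not the workshop's method). A plain `Dict` stands in for the model so the example runs on its own; in the notebook the call would be e.g. `serialize("linear_regression.jls", linear_model)`.

```julia
using Serialization

# Stand-in object for a trained model (values copied from the GLM output above).
stand_in_model = Dict("features" => ["(Intercept)", "CRIM"], "coef" => [8.93642, -0.077255])

path = joinpath(mktempdir(), "model.jls")
serialize(path, stand_in_model)   # write the object to disk

restored = deserialize(path)      # read it back in a compatible Julia session
```

The `Serialization` format is not guaranteed to be readable across Julia versions, so the serving environment should use the same Julia and package versions as the training one.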

### Neural network model with Flux

Now let's build a neural network model.

In [None]:
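# Flux expects features in rows and observations in columns
X = transpose(Matrix(houses[!,Not(:MEDV)]))   # 13 × 506 feature matrix
y = transpose(houses.MEDV);                   # 1 × 506 target row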

In [None]:
# attempt to use the DataFrame directly (does not work: Flux expects numeric arrays)
#X = houses[:, Not(:MEDV)]
#y = houses.MEDV

In [None]:
using Flux
using ProgressMeter

# Neural network model one dense hidden layer with ReLU activation function

# data
data = [(X, y)]
# model type
nn_model = Chain(Dense(13 => 8, relu), Dense(8 => 1))
# loss function definition
loss(x, y) = Flux.Losses.mse(nn_model(x), y)
# hyperparams
parameters = Flux.params(nn_model)
# optymalization algorithm type
opt = Flux.Adam(0.002)

@showprogress for epoch in 1:20_000
    Flux.train!(loss, parameters, data, opt)
end
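To see what `Chain(Dense(13 => 8, relu), Dense(8 => 1))` computes for each observation (column) $x$, namely $\hat{y} = W_2\,\mathrm{relu}(W_1 x + b_1) + b_2$, here is a sketch with plain arrays. The random weights are stand-ins for the trained ones, and `relu_sketch` and `forward` are hypothetical names (a new name is used because Flux already exports `relu`).

```julia
# ReLU activation: max(0, z), applied elementwise below.
relu_sketch(z) = max(zero(z), z)

W1, b1 = randn(8, 13), zeros(8)   # hidden layer: 13 features -> 8 units
W2, b2 = randn(1, 8), zeros(1)    # output layer: 8 units -> 1 value

# Forward pass for a features × observations matrix.
forward(X) = W2 * relu_sketch.(W1 * X .+ b1) .+ b2

X_demo = rand(13, 5)              # 13 features × 5 observations
ŷ = forward(X_demo)               # 1 × 5 row of predictions
```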

In [None]:
first_row_nn = X[:,1]
println("from NN model: ", nn_model(first_row_nn)[1])

In [None]:
# model evaluation 
using Statistics

RMSE(y, ŷ) = sqrt(mean((y - ŷ).^2));
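The `RMSE` function above implements the root mean squared error over $n$ observations. It is expressed in the same units as the target, here thousands of dollars of `MEDV`:

$$
\mathrm{RMSE}(y, \hat{y}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}
$$

For a single observation ($n = 1$) it reduces to the absolute error $|y_1 - \hat{y}_1|$.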

In [None]:
# for regression 
RMSE(y, transpose(predict(linear_model, houses)))

In [None]:
# for the neural network
RMSE(y, nn_model(X))

In [None]:
# error for the first observation only (linear regression)
RMSE(y[1], predict(linear_model, houses[1:1, :])[1])

In [None]:
# error for the first observation only (neural network)
RMSE(y[1], nn_model(first_row_nn)[1])

In [None]:
using BSON: @save
@save "nn_model.bson" nn_model

In [None]:
using BSON: @load
nn_model = nothing
@load "nn_model.bson" nn_model

println("from NN model: ", nn_model(first_row_nn)[1])

In [None]:
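using BSON
# How is this supposed to work? BSON.load returns a Dict keyed by the variable
# names used in @save, so the network saved above is stored under :nn_model
d = BSON.load("nn_model.bson")
model_nn = d[:nn_model]
model_nn(first_row_nn)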

### Prepare data for a POST request

Save the first observation from the training dataset to the `house.json` file.

In [None]:
using JSON

open("house.json","w") do f
    JSON.print(f, Dict(names(houses)[begin:end-1] .=> X[:,1]),4)
end

In [None]:
println(read("house.json", String))

In [None]:
;more house.json

## Simple REST API with Julia

[Genie](https://genieframework.com/docs/) is a full-stack web framework for the Julia programming language.

With Genie we can create a simple API that returns JSON responses.

The simplest way to send variables to the API is a GET request with query parameters.
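In a GET request the variables travel in the query string, e.g. `/getapi?val1=43&val2=3`, and Genie's `getpayload()` returns them as a dictionary. The sketch below only illustrates that key/value structure; `parse_query` is a hypothetical helper, and unlike a real framework it does not URL-decode percent-encoded values.

```julia
# Hypothetical helper: split a query string such as "val1=43&val2=3" into a Dict.
function parse_query(query::AbstractString)
    params = Dict{String,String}()
    for pair in split(query, '&'; keepempty = false)   # skip empty pieces like "&&"
        key, value = occursin('=', pair) ? split(pair, '='; limit = 2) : (pair, "")
        params[String(key)] = String(value)
    end
    return params
end

params = parse_query("val1=43&val2=3")
```

Note that all values arrive as strings, so a prediction endpoint built on GET would still have to convert them, e.g. with `parse(Float64, params["val1"])`.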


In [None]:
using Genie, Genie.Renderer.Json
using Genie.Requests # for method GET and POST

route("/") do 
  (:message => "Hello Julia!") |> Json.json
end

route("/getapi", method=GET) do
  vars = getpayload()
  (:variables => vars) |> Json.json
end

# start the server; async=true keeps it from blocking Jupyter
up(8000, async=true)

After starting the server, you can use `curl` or any other tool capable of sending and receiving HTTP requests to interact with the API.

In [None]:
;curl http://localhost:8000/

In [None]:
;curl "http://localhost:8000/getapi?val1=43&val2=3"

In [None]:
using HTTP
resp = HTTP.get("http://localhost:8000")
println(resp.status)
println(String(resp.body))

You can also write a simple client program in Python (here called from Julia through PyCall):

```julia
using PyCall
req = pyimport("requests")
r = req.get("http://localhost:8000")
print(r.status_code)
```

The server is running asynchronously in Jupyter. When you are finished, run the `down()` command to turn it off.

In [None]:
down()

In [None]:
using Genie, Genie.Requests, Genie.Renderer.Json
using Flux
using BSON: @load
using GLM
using DataFrames
using LinearAlgebra


@load "nn_model.bson" nn_model

@load "linear_regression.bson" linear_model

route("/flux") do
"""<div style="white-space:pre">To receive a prediction, send a POST request with a JSON payload.

Example:
>> curl -X POST -d @house.json -H "Content-Type: application/json" http://localhost:8000/flux
>> cat house.json
{
    "CRIM": 0.00632,
    "ZN": 18.0,
    "INDUS": 2.31,
    "CHAS": 0.0,
    "NOX": 0.538,
    "RM": 6.575,
    "AGE": 65.2,
    "DIS": 4.09,
    "RAD": 1.0,
    "TAX": 296.0,
    "PTRATIO": 15.3,
    "B": 396.9,
    "LSTAT": 4.98
}</div>"""
end

route("/flux", method = POST) do

    input_data = jsonpayload()
    keys_json = keys(input_data)
    # feature order must match the rows of X used for training
    columns = ["CRIM","ZN","INDUS","CHAS","NOX","RM","AGE","DIS","RAD","TAX","PTRATIO","B","LSTAT"]
    missing_fields = [k for k in columns if k ∉ keys_json]

    if length(missing_fields) != 0
        missing_str = join(missing_fields, ",")
        Json.json(:error => "The fields: $missing_str are missing from the JSON payload. "*
            "The prediction cannot be returned.")
    else
        try
            # build the numeric feature vector in training order and run the network
            x = Float64.([input_data[f] for f in columns])
            Json.json(Dict("input" => input_data,
                           "prediction" => nn_model(x)[1]))
        catch e
            Json.json(:error => "Oops! There was a problem while generating a prediction.")
        end
    end
end

route("/glm") do
"""<div style="white-space:pre">To receive a prediction from the GLM linear model, send a POST request with a JSON payload.

Example payload (first row of the dataset):
{
    "CRIM": 0.00632,
    "ZN": 18.0,
    "INDUS": 2.31,
    "CHAS": 0.0,
    "NOX": 0.538,
    "RM": 6.575,
    "AGE": 65.2,
    "DIS": 4.09,
    "RAD": 1.0,
    "TAX": 296.0,
    "PTRATIO": 15.3,
    "B": 396.9,
    "LSTAT": 4.98
}</div>"""
end

route("/glm", method = POST) do
    input_data = jsonpayload()
    try
        # one-row DataFrame; GLM picks the columns used in the model formula
        prediction = predict(linear_model, DataFrame(input_data))[1]
        Dict("input" => input_data, "prediction" => prediction) |> Json.json
    catch e
        (:error => "Oops! There was a problem while generating a prediction.") |> Json.json
    end
end


# start the server; async=true keeps it from blocking Jupyter
up(port=8000, async=true)

In [None]:
;cat house.json

In [None]:
;curl -X POST -d @house.json -H "Content-Type: application/json" http://localhost:8000/flux

In [None]:
;curl -X POST -d @house.json -H "Content-Type: application/json" http://localhost:8000/glm

In [None]:
down()

## Docker container 

In [None]:
] generate Docker

In [None]:
;cd Docker

In [None]:
;pwd

In [None]:
] activate .

### We will use only the simple GLM model

In [None]:
] add Genie BSON GLM DataFrames LinearAlgebra

In [None]:
;cd ..

Add your BSON file with the model to the project and create a new `app.jl` file with the Genie server.
Remember to change the async setting, so that the server blocks and keeps the process running:
```julia
 up(port=8000, async=false)
```
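A minimal sketch of such an `app.jl`, assuming `linear_regression.bson` sits next to the file and the packages added above are in the active project. It is not the workshop's reference solution: the route mirrors the `/glm` endpoint from the notebook, `FEATURES` is taken from the `@formula` used to fit the model, and binding to `0.0.0.0` (instead of localhost) is an assumption about how the container port will be published.

```julia
# app.jl: standalone Genie server for the GLM model (sketch).
using Genie, Genie.Requests, Genie.Renderer.Json
using BSON: @load
using GLM, DataFrames
using LinearAlgebra   # the saved GLM model references LinearAlgebra types

@load joinpath(@__DIR__, "linear_regression.bson") linear_model

# features from @formula(MEDV ~ CRIM + INDUS + CHAS + RM + AGE + DIS + TAX + LSTAT)
const FEATURES = ["CRIM", "INDUS", "CHAS", "RM", "AGE", "DIS", "TAX", "LSTAT"]

route("/glm", method = POST) do
    input_data = jsonpayload()
    missing_fields = [f for f in FEATURES if !haskey(input_data, f)]
    if !isempty(missing_fields)
        return Json.json(:error => "Missing fields: " * join(missing_fields, ", "))
    end
    # one-row DataFrame with the model's features converted to Float64
    row = DataFrame(Dict(f => Float64(input_data[f]) for f in FEATURES))
    Json.json(Dict("input" => input_data, "prediction" => predict(linear_model, row)[1]))
end

# async=false blocks here, which keeps the process (and the container) alive;
# 0.0.0.0 makes the port reachable from outside the container
up(8000, "0.0.0.0", async = false)
```

Start it from the project directory with `julia --project=. app.jl`.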

*Preparation of this workshop has been supported by the Polish National Agency for Academic Exchange under the Strategic Partnerships programme, grant number BPI/PST/2021/1/00069/U/00001.*

![SGH & NAWA](logo.png)