# How to deploy ML models into production

https://www.youtube.com/watch?v=-UYyyeYJAoQ
Talk by Sumit Goyal, Software Engineer at IBM

The real value of a data science project comes when the model is deployed as a web service, yet deployment is rarely covered.



The machine learning workflow starts with business understanding, followed by data understanding and data preparation. Then come modelling, evaluation and deployment.

But the work does not end with deployment.

The data for the demo comes from https://www.kaggle.com/mlg-ulb/creditcardfraud

#### Deployment

The really bad way to handle deployment is to create and train the model in Python, re-implement it in Java or C++, and then deploy it behind a REST API. It is bad from a time-to-value standpoint.

There is also PMML (Predictive Model Markup Language), but it is not very stable.

The third way is to serialize the model and deploy it in a Python web application that serves a REST API. The serialized object (a blob) is saved in a database, and alongside the blob the database stores the model's version, name, type, ML framework, evaluation metrics, etc. We also need some sort of web framework: we load the model, create a route (such as a predict route), prepare the data, run the predict step and serve the prediction. In an e-commerce application, when a transaction is created in the service, a call is made to the ML endpoint of the REST API, which asks whether the transaction is fraudulent.
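
The serialize-and-store step can be sketched roughly as follows. This is a minimal illustration using only the standard library. A plain dict of parameters stands in for a trained estimator (a scikit-learn model can be passed to `pickle.dumps` in the same way), and the record layout (`name`, `version`, `framework`, `metrics`, `blob`) is an assumption based on the metadata listed above, not the schema used in the talk.

```python
import base64
import pickle


def make_model_record(model, name, version, framework, metrics):
    """Serialize the model and wrap the blob with descriptive metadata."""
    blob = pickle.dumps(model)
    return {
        "name": name,
        "version": version,
        "framework": framework,
        "metrics": metrics,
        # base64 keeps the binary blob safe inside a JSON document store
        "blob": base64.b64encode(blob).decode("ascii"),
    }


def load_model(record):
    """Rebuild the model object from a stored record."""
    return pickle.loads(base64.b64decode(record["blob"]))


def predict(model, amounts):
    """Hypothetical scoring rule: flag amounts above the stored cutoff."""
    return [1 if amount > model["cutoff"] else 0 for amount in amounts]


# Hypothetical stand-in for a trained estimator: a plain dict of parameters
model = {"kind": "threshold", "cutoff": 1000.0}

record = make_model_record(
    model,
    name="fraud-detector",
    version="1",
    framework="example",
    metrics={"accuracy": None},  # placeholder; fill in real evaluation results
)
restored = load_model(record)
print(predict(restored, [20.0, 2500.0]))
```

Keeping the metadata next to the blob means the serving application can check the version and framework before it unpickles anything.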

Once the model is served in production, it is not a toy anymore. There are service requirements: maximum response time, availability, quality/confidence of predictions, maximum re-training time, and monitoring. If the model is re-training, does it mean that we can't use the model in the meantime?
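
One of these requirements, maximum response time, can be checked with a small wrapper around the prediction call. The sketch below is only an illustration: `timed_predict`, `dummy_predict` and the 0.5 second budget are hypothetical and do not come from the talk.

```python
import time


def timed_predict(predict_fn, features, max_seconds):
    """Run predict_fn and report whether it stayed within the latency budget."""
    start = time.perf_counter()
    result = predict_fn(features)
    elapsed = time.perf_counter() - start
    return {
        "prediction": result,
        "elapsed_seconds": elapsed,
        "within_budget": elapsed <= max_seconds,
    }


def dummy_predict(features):
    # Hypothetical model call; a real service would call model.predict here
    return [0 for _ in features]


report = timed_predict(dummy_predict, [[1.0, 2.0], [3.0, 4.0]], max_seconds=0.5)
print(report["within_budget"], report["prediction"])
```

In a real service the `elapsed_seconds` value would typically be sent to a monitoring system rather than printed.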

### Deployment with Cloud Foundry
Cloud Foundry (https://www.cloudfoundry.org/) automates the deployment and takes care of application lifecycle management. Cloud Foundry has the concept of a buildpack, which is like a Docker container. In a buildpack we say: connect to the database, load the model, import sklearn, import flask, create a route, prepare the score data, run the prediction, do post-processing, return the results. With Cloud Foundry you can scale an application from 5 calls/second to 1000 calls/second.

1. Develop the Cloud Foundry app
2. CLI: push
3. Set configuration
4. Predictions are served using POST
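
For the configuration step, the app.py in these notes reads two environment variables, `MODELS_DB_NAME` and `MODEL_ID`. On Cloud Foundry such variables can be set with `cf set-env`, for example. A minimal sketch of validating them at startup might look like this. The helper `load_settings` and the example values are hypothetical.

```python
import os

REQUIRED_SETTINGS = ("MODELS_DB_NAME", "MODEL_ID")


def load_settings(env=None):
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_SETTINGS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing configuration: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_SETTINGS}


# Example values are placeholders, not the names used in the talk
settings = load_settings({"MODELS_DB_NAME": "models", "MODEL_ID": "fraud-model-v1"})
print(settings)
```

Failing early with a clear message is easier to debug than a lookup error when the model document is fetched.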

##### The app.py that is deployed looks like this:

In [None]:
# import Cloudant library so the model can be downloaded into the application
from cloudant import Cloudant
from flask import Flask, render_template, request, jsonify
import atexit
import os
import json
import pickle
import pandas as pd
import numpy as np


app = Flask(__name__, static_url_path='')

db_name = None
model_id = None
client = None
db = None


# Credentials in Cloud Foundry are provided through the VCAP_SERVICES environment variable
if 'VCAP_SERVICES' in os.environ:
    vcap = json.loads(os.getenv('VCAP_SERVICES'))
    print('Found VCAP_SERVICES')
    if 'cloudantNoSQLDB' in vcap:
        creds = vcap['cloudantNoSQLDB'][0]['credentials']
        user = creds['username']
        password = creds['password']
        url = 'https://' + creds['host']
        client = Cloudant(user, password, url=url, connect=True)
        db_name = os.getenv('MODELS_DB_NAME')
        db = client.create_database(db_name, throw_on_exists=False)
        model_id = os.getenv('MODEL_ID')
# Fall back to a local credentials file when VCAP_SERVICES is not set
elif os.path.isfile('vcap-local.json'):
    with open('vcap-local.json') as f:
        vcap = json.load(f)
        print('Found local VCAP_SERVICES file')
        creds = vcap['services']['cloudantNoSQLDB'][0]['credentials']
        user = creds['username']
        password = creds['password']
        url = 'https://' + creds['host']
        client = Cloudant(user, password, url=url, connect=True)
        db_name = os.getenv('MODELS_DB_NAME')
        db = client.create_database(db_name, throw_on_exists=False)
        model_id = os.getenv('MODEL_ID')

            
doc = db[model_id]
model_binary = doc.get_attachment(attachment='model')
model = pickle.loads(model_binary)
print('Loaded the model')

# When running this app on local machine, default the port to 8000
port = int(os.getenv('PORT', 8000))

@app.route('/')
def root():
    return app.send_static_file('index.html')

@app.route('/api/predict', methods=['POST'])
def score():
    features = json.dumps(request.json['data'])
    print('Incoming request for scoring')
    print(features)
    # read the data from POST call
    df = pd.read_json(features, orient='index')
    predictions = model.predict(df)
    print('Responding with prediction: {}'.format(predictions))
    return jsonify(np.array2string(predictions))


@atexit.register
def shutdown():
    if client:
        client.disconnect()
        
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=port)

Typing the following in the CLI starts a web server locally:
python app.py
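
With the server running locally, a client can call the `/api/predict` route. The sketch below builds such a request with the standard library. It assumes the default port 8000 and the payload shape that app.py reads (`request.json['data']`, parsed with `orient='index'`, so each key is a row id). The helper names and the feature values are illustrative only.

```python
import json
import urllib.request


def build_predict_request(rows, base_url="http://localhost:8000"):
    """Build a POST request for /api/predict; rows maps a row id to its features."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send the request and decode the JSON reply (requires the app to be running)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Illustrative values only; the real model expects every feature it was trained on
rows = {"0": {"Time": 0.0, "V1": -1.36, "V2": -0.07, "Amount": 149.62}}
req = build_predict_request(rows)
print(req.get_method(), req.full_url)
# With the server running: print(send(req))
```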

##### manifest.yml contains necessary configuration details to deploy the model 

In [None]:
---
applications:
- name: FraudDectectionAPI
  host: fraud-detection-api-v1
  memory: 1024mb

The cf push command in the CLI actually deploys the model:
cf push 

### Deployment with Docker

To deploy the app with Docker and Kubernetes, the only difference is that a Dockerfile is used:

FROM python:3.6.3

WORKDIR /app/

COPY app.py setup.py vcap-local.json requirements.txt /app/

RUN pip install -r ./requirements.txt

COPY static /app/static

EXPOSE 8000

ENTRYPOINT python ./app.py

### Deployment with managed services 
Managed services such as Microsoft Azure ML Studio, Amazon SageMaker and IBM Watson Machine Learning offer one-click deployment through a dedicated SDK.

One more talk:
https://www.youtube.com/watch?v=BJ2QVzGmb2w