# Putting your model in production

- Python-focused
- Only an overview of the topic (but with some important best practices)
- You'll need to experiment on your own

![Services](img/services.png)

# Using cloud

- Google Cloud Platform
- https://cloud.google.com/ml-engine/docs/concepts/prediction-overview
- https://cloud.google.com/blog/big-data/2017/09/performing-prediction-with-tensorflow-object-detection-models-on-google-cloud-machine-learning-engine

# Web app 101

- Request / Response
- Request types - GET / POST

GET: http://example.com

GET with parameters: http://example.com?feature1=encoded_data&feature2=42
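A minimal sketch of how such a parameterized URL could be built and parsed in Python using only the standard library. The parameter values are made up for illustration; the point is that `urlencode` percent-encodes values so they are safe to place in a URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "http://example.com"
# Illustrative values only; note the space and the '&' that need encoding
params = {"feature1": "some text & symbols", "feature2": 42}

# urlencode escapes characters that would otherwise break the query string
url = f"{base_url}?{urlencode(params)}"
print(url)

# The receiving side reverses the process; every value comes back as a list of strings
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Note that everything arrives as text on the server side, so numeric features have to be converted back explicitly.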

### JSON format 

Sending/receiving data

In [None]:
{
    "head_element": {
        "just_string": "I'm happy string!",
        "array_example": [
            {"more": "inside"},
            {"even": "more"}
        ]
    }
}

Remember to set the `Content-Type` header of your Request / Response to ```application/json```

_(Example POST request - in Postman)_
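For readers without Postman, the same kind of POST request can be sketched with Python's standard library. This is only an illustration: the URL is a placeholder, the payload is made up, and the request is built but deliberately not sent.

```python
import json
from urllib.request import Request

# Illustrative payload; any JSON-serializable structure works
payload = {"just_string": "I'm happy string!", "array_example": [{"more": "inside"}]}
body = json.dumps(payload).encode("utf-8")

request = Request(
    "http://example.com",  # placeholder URL, not a real model API
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline

print(request.get_method())
print(request.get_header("Content-type"))  # urllib stores header names capitalized this way
print(json.loads(body) == payload)
```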

# Flask Application

- Base flask "template"
- installing flask related modules - ```pip install flask flask_restful ...```  _(all in pip)_
- virtual env / conda env
- running with default python development server

```
$ python app.py 
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [22/Sep/2017 15:31:49] "GET /classify_user HTTP/1.0" 200 -
```

# File: app.py

In [None]:
from flask import Flask
from flask_restful import Api
from resource.weibo_classification_resource import WeiboClassificationResource
from utils import set_logging

app = Flask(__name__)
api = Api(app)

# Handle our GETs and POSTs
api.add_resource(WeiboClassificationResource, '/classify_user')

if __name__ == '__main__':
    app.run(port=5000)

# File: weibo_classification_resource.py

In [None]:
from flask_restful import Resource
from flask import jsonify, request
from flask import Response

class WeiboClassificationResource(Resource):
    def get(self):
        return jsonify(hello='world!')
    
    def post(self):
        json_data = request.get_json(force=True)
        posts = json_data['user_posts']

        if posts and posts[0] == 'fake':  # guard against an empty list
            res_text = 'fake user!'
        else:
            res_text = 'real_user'

        return jsonify(classification_result=res_text)    

In [None]:
# 1.    
# GET http://localhost:5000/classify_user

# 2.
# POST http://localhost:5000/classify_user    
# JSON:     
#{
#    "user_posts": ["user post 1", "user post 2"]
#}

# Loading model in memory

## File: model.py

In [None]:
class Model:
    '''
    Model initialization process
    '''
    def __init__(self):
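        print("START LOADING MODEL")
        # Simulate a slow model load by reading a large file several times
        for _ in range(0, 3):
            with open('/Users/bart/Downloads/vocabulary_and_requests_2017_09_11.sql') as fh:
                self.f = fh.read()
        print('END LOADING MODEL')
        print("All operations required to run predict() are in worker's memory")

        # Counter used by predict() to show that state persists between calls
        self.len_f = 0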

    def predict(self):
        self.len_f = self.len_f + 1
        return self.len_f
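The idea above is that the expensive loading step runs once and every later request reuses the in-memory object. A minimal sketch of that pattern follows, assuming a hypothetical `DummyModel` and a hypothetical `get_model()` helper (neither is part of the original code); `functools.lru_cache` is used here as one simple way to get a load-once accessor.

```python
from functools import lru_cache


class DummyModel:
    """Hypothetical stand-in for a model that is expensive to load."""

    load_count = 0  # counts how many times the "expensive" load ran

    def __init__(self):
        DummyModel.load_count += 1

    def predict(self, posts):
        return ["fake" if "fake" in post else "real" for post in posts]


@lru_cache(maxsize=1)
def get_model():
    """Create the model on the first call, then return the same instance."""
    return DummyModel()


def handle_request(posts):
    # Each request reuses the already loaded model
    return get_model().predict(posts)


handle_request(["hello"])
handle_request(["fake news"])
print(DummyModel.load_count)
```

With several worker processes, each process would hold its own copy of the model, so memory use grows with the number of workers.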

# Logging

In [None]:
from flask import Flask
from flask_restful import Api
from resource.weibo_classification_resource import WeiboClassificationResource
from utils import set_logging

app = Flask(__name__)
api = Api(app)

# LOGGING
logging_config_file = 'config/logging_config.yml'
set_logging(logging_config_file)

api.add_resource(WeiboClassificationResource, '/classify_user')

if __name__ == '__main__':
    app.run(port=5000)

# Logging

## File: config/logging_config.yml

In [None]:
version: 1

formatters:
  simple:
    format: '[%(asctime)s] %(name)s:%(module)s:%(levelname)s - %(message)s'

handlers:
  console:
    class: logging.StreamHandler
    level: WARNING
    formatter: simple
    stream: ext://sys.stdout
  logfile:
    class: logging.FileHandler
    filename: logs/app.log
    formatter: simple
    level: DEBUG

root:
  level: DEBUG
  handlers: [logfile]
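The `set_logging` helper from `utils` is not shown in these slides. A plausible sketch, assuming it simply parses the YAML and hands the result to `logging.config.dictConfig`, is below. To keep it runnable with the standard library only, a Python dict stands in for the parsed YAML, the log file goes to a temporary directory, and the function name `set_logging_from_dict` is hypothetical.

```python
import logging
import logging.config
import os
import tempfile

# Temporary directory so the sketch runs anywhere (the slides use logs/app.log)
log_path = os.path.join(tempfile.mkdtemp(), "app.log")

# Dict equivalent of the logfile part of the YAML config above
config = {
    "version": 1,
    "formatters": {
        "simple": {
            "format": "[%(asctime)s] %(name)s:%(module)s:%(levelname)s - %(message)s"
        }
    },
    "handlers": {
        "logfile": {
            "class": "logging.FileHandler",
            "filename": log_path,
            "formatter": "simple",
            "level": "DEBUG",
        }
    },
    "root": {"level": "DEBUG", "handlers": ["logfile"]},
}


def set_logging_from_dict(cfg):
    """Hypothetical helper: apply an already parsed logging configuration."""
    logging.config.dictConfig(cfg)


set_logging_from_dict(config)
logging.getLogger("model_api").debug("model loaded")
logging.shutdown()  # flush and close handlers before reading the file back

with open(log_path) as fh:
    print(fh.read())
```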

# Batching

In [None]:
{
    "posts": [
        "user first post",
        "user next post",
        "and third post"
    ]
}
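Batching means the API accepts a list of inputs and the model is invoked once per group rather than once per item. A minimal sketch, assuming a hypothetical `predict_batch` function and a hypothetical maximum batch size of 2:

```python
def predict_one(post):
    """Hypothetical per-item prediction."""
    return "fake" if post.startswith("fake") else "real"


def predict_batch(posts):
    """Hypothetical batched prediction: one call handles a whole list."""
    return [predict_one(post) for post in posts]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


payload = {"posts": ["user first post", "user next post", "and third post"]}

results = []
for chunk in chunked(payload["posts"], size=2):
    results.extend(predict_batch(chunk))

print(results)
```

Capping the batch size keeps memory use predictable when a client sends a very long list.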

# Gunicorn

- installing - ```pip install gunicorn```
- invoking

### File: run.sh

In [None]:
#!/usr/bin/env bash

source activate py36 # or virtualenv ....
gunicorn -b 0.0.0.0:5000 app:app
        
# "app:app" means <module>:<variable>, i.e. in file app.py:
# app = Flask(__name__)

# Gunicorn

### Multiple workers

In [None]:
gunicorn -w4 -b 0.0.0.0:5000 app:app -k gevent

### Extended access/error logging

In [None]:
gunicorn -w4 -b 0.0.0.0:5000 app:app --access-logfile logs/access.log --error-logfile logs/error.log -k gevent

### Daemon mode

In [None]:
gunicorn -w4 -b 0.0.0.0:5000 app:app --daemon --access-logfile logs/access.log --error-logfile logs/error.log -k gevent
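As the command line grows, the same options can be kept in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the flags used above; the file name `gunicorn.conf.py` is an assumption, and it would be passed with `gunicorn -c gunicorn.conf.py app:app`.

```python
# gunicorn.conf.py (assumed file name)
bind = "0.0.0.0:5000"          # same as -b 0.0.0.0:5000
workers = 4                    # same as -w4
worker_class = "gevent"        # same as -k gevent; needs the gevent package installed
accesslog = "logs/access.log"  # same as --access-logfile
errorlog = "logs/error.log"    # same as --error-logfile
daemon = True                  # same as --daemon
```

Keeping the settings in a file makes them easy to version alongside `app.py`.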

## Nginx

A very basic config:

In [None]:
http {
    upstream my_app {
        server localhost:5000;
    }

    server {
        listen 5001;

        location / {
            proxy_pass http://my_app;
        }
    }
}

## Sending other data formats

### Image - encode/decode Base64

  1. (Sender) Take image and encode using Base64
  2. Send using JSON
  3. (Model API) decode Base64 back to image
  4. predict()
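The first three steps above can be sketched with the standard library. The bytes below are a stand-in for a real image file, and the JSON key `image` is an assumption chosen for illustration.

```python
import base64
import json

# Stand-in for the raw bytes of an image file (e.g. open('image.jpg', 'rb').read())
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

# 1. + 2. (Sender) encode to Base64 text and wrap it in JSON
encoded = base64.b64encode(image_bytes).decode("ascii")
message = json.dumps({"image": encoded})

# 3. (Model API) parse the JSON and decode Base64 back to bytes
received = json.loads(message)
decoded_bytes = base64.b64decode(received["image"])

print(decoded_bytes == image_bytes)
```

Base64 represents every 3 bytes as 4 characters, so the payload is roughly a third larger than the original image.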

## Sending other data formats

### Image -  multipart/form-data

In [None]:
>>> import requests
>>> url = 'http://model-api.com'
>>> files = {'file': open('image.jpg', 'rb')}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

# Security 

Header tokens

In [None]:
from flask import jsonify, request
from flask import Response
from flask_restful import Resource

class WeiboClassificationResource(Resource):

    def post(self):
        auth = request.headers.get('Authorization')
        if auth != 'super-secret-password':
            return Response("", status=401)  # HTTP 401 Unauthorized
        # Authorized: continue with classification as in the earlier post()
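One possible hardening of the token check, sketched outside Flask so it runs on its own: compare the header with `hmac.compare_digest`, which takes time independent of where the strings differ. The `is_authorized` helper and the plain dict of headers are assumptions for illustration; in a real service the token would come from an environment variable or a secret store rather than the source code.

```python
import hmac

# Hard-coded only for the sketch; load from configuration in practice
EXPECTED_TOKEN = "super-secret-password"


def is_authorized(headers):
    """Hypothetical helper: check the Authorization header in constant time."""
    supplied = headers.get("Authorization", "")
    return hmac.compare_digest(
        supplied.encode("utf-8"), EXPECTED_TOKEN.encode("utf-8")
    )


print(is_authorized({"Authorization": "super-secret-password"}))
print(is_authorized({"Authorization": "guess"}))
print(is_authorized({}))
```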

# Scaling

### nginx

In [None]:
http {
    upstream my_app {
        server 127.0.0.1:5000 weight=3;
        server 192.0.2.2:5000;
        server 192.0.2.3:5000;
    }

    server {
        listen 5001;

        location / {
            proxy_pass http://my_app;
        }
    }
}
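To see what `weight=3` implies, here is a small simulation of weighted round-robin. It is an illustrative sketch of one "smooth" variant of the algorithm, not nginx's actual code, and the server labels are hypothetical.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, n):
    """Pick n servers so that each is chosen in proportion to its weight."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every server gains its weight, the current leader is picked and penalized
        for server, weight in weights.items():
            current[server] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Same weights as the config above: the first server has weight 3, the others default to 1
weights = {"app1": 3, "app2": 1, "app3": 1}
picks = smooth_weighted_round_robin(weights, 10)
counts = Counter(picks)
print(counts)
```

Out of every five requests, three go to the weighted server and one to each of the others, which is useful when one machine is more powerful than the rest.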

# Scaling

### haproxy

![Haproxy](img/load-balancing-haproxy-nginx.png)

![Haproxy](img/haproxy_stats.png)

# GPU vs. many CPU

Every case is different: measure throughput and cost per prediction for each option, then do the math
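A sketch of what "doing the math" might look like. Every number below (target load, per-instance throughput, hourly prices) is hypothetical and only shows the shape of the calculation; real values have to come from your own benchmarks and your provider's price list.

```python
import math


def instances_needed(target_rps, rps_per_instance):
    """Number of machines required to sustain the target requests per second."""
    return math.ceil(target_rps / rps_per_instance)


def monthly_cost(target_rps, rps_per_instance, price_per_hour, hours=730):
    """Cost of running enough instances for a month (about 730 hours)."""
    return instances_needed(target_rps, rps_per_instance) * price_per_hour * hours


target_rps = 200  # hypothetical peak load

# Hypothetical benchmark results and prices
cpu = monthly_cost(target_rps, rps_per_instance=15, price_per_hour=0.20)
gpu = monthly_cost(target_rps, rps_per_instance=120, price_per_hour=0.90)

print(f"CPU fleet: {round(cpu)} per month")
print(f"GPU fleet: {round(gpu)} per month")
```

With different throughput or prices the comparison can easily flip, which is why measuring your own model matters.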

# Service level

- Zero-downtime deploy
- Zero-downtime reload

# TensorFlow

- TF Serving - https://www.tensorflow.org/serving/

![TF Serving](https://www.tensorflow.org/serving/images/tf_diagram.svg)