<a id='top'></a>
<a name="top"></a><!--Need for Colab-->
# Quick introduction to TensorFlow Serving

Using Docker, the subprocess module, and HTTP request logging with TensorFlow Serving.

1. [Setup](#setup)
2. [Introduction](#2.0)
3. [Using subprocess with TensorFlow Serving](#3.0)
4. [Set up the prebuilt half_plus_two model](#4.0)
5. [HTTP Request Logging for TensorFlow Serving](#5.0)
6. [Running a minimal Docker image with TensorFlow Serving](#6.0)
7. [Subprocess to write logs to file](#7.0)
8. [Health check](#8.0)
    * 8.1 [Verify response](#8.1)
    * 8.2 [Verify logs](#8.2)
9. [Predict requests via POST](#9.0)
10. [Misc properties](#10.0)
11. [End and clean up processes](#11.0)

---
<a name="setup"></a>
# 1. Setup
<a href="#top">[back to top]</a>

In [7]:
# stdlib imports
import asyncio
import json
import os
from pathlib import Path
import pprint
import shlex
import subprocess
import sys
import time

# third party imports
import tensorflow as tf
import requests

# For debugging, provides version & hardware info
try:
    %load_ext watermark
except ImportError:
    print("Installing watermark:")
    !pip install watermark -q
    %load_ext watermark
finally:
    %watermark --python --packages requests,tensorflow
    
pp = pprint.PrettyPrinter(indent=2)

def HR():
    print("-"*40)

print("Finished loading packages..")

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
Python implementation: CPython
Python version       : 3.8.12
IPython version      : 7.34.0

requests  : 2.27.1
tensorflow: 2.9.1

Finished loading packages..


---
<a id="2.0"></a><a name="2.0"></a>
# 2. Introduction
<a href="#top">[back to top]</a>

We explore these tasks here:

* Using subprocess with TensorFlow Serving.
* HTTP Request Logging for TensorFlow Serving.
* Set up the prebuilt model, half_plus_two.
* Running a minimal Docker image with TensorFlow Serving.
* Subprocess to write logs to file.

---
<a id="3.0"></a><a name="3.0"></a>
# 3. Using subprocess with TensorFlow Serving
<a href="#top">[back to top]</a>



A convenient way to use TensorFlow Serving is via Docker. We can operate the server and its logging with Docker commands on the CLI, but there are certain advantages (readability, maintenance, security, etc.) to wrapping them in Python.

The older method of doing this involved using either os.system or os.spawn*. Here, we will instead use the newer subprocess module. This allows us to spawn new processes, connect to their input/output/error pipes, and optionally obtain their return codes. This is a safer analog to os.system().

The underlying process creation and management in `subprocess` is done by the [Popen Constructor](https://docs.python.org/3/library/subprocess.html#popen-constructor), `subprocess.Popen`. The underlying Popen interface can be used directly and offers the most flexibility. By default it results in a non-blocking call.

Once you've created the Popen instance, some options are:
* `wait()`:  to pause until the subprocess has exited.
* `poll()`: check if it's exited without pausing.
* `communicate()`: interact with process
    - Send data to stdin. 
    - Read data from stdout and stderr, until end-of-file is reached. 
    - Wait for process to terminate and set the returncode attribute. 
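
The three options above can be sketched as follows. This is a minimal illustration rather than part of the serving setup: the child process is a Python one-liner (an assumption chosen so the example needs no external tools) that upper-cases whatever it receives on stdin.

```python
import subprocess
import sys

# Child process: reads all of stdin, then prints it upper-cased.
child_code = "import sys; print(sys.stdin.read().upper())"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,  # exchange str instead of bytes
)

# Popen returns immediately; poll() is None while the child is still running.
# Here the child blocks on stdin until communicate() closes it.
still_running = proc.poll() is None

# communicate() sends stdin, reads stdout/stderr to EOF, and waits for exit.
out, err = proc.communicate(input="hello")

print(f"still running right after Popen: {still_running}")
print(f"stdout: {out.strip()!r}, return code: {proc.returncode}")
```

After `communicate()` returns, `proc.returncode` is set, so a separate `wait()` call is not needed.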


**Popen Constructor:**

<sup>

```python
class subprocess.Popen(
    args,
    bufsize=-1,
    executable=None,
    stdin=None,
    stdout=None,
    stderr=None,
    preexec_fn=None,
    close_fds=True,
    shell=False,
    cwd=None,
    env=None,
    universal_newlines=None,
    startupinfo=None,
    creationflags=0,
    restore_signals=True,
    start_new_session=False,
    pass_fds=(),
    *,
    group=None,
    extra_groups=None,
    user=None,
    umask=-1,
    encoding=None,
    errors=None,
    text=None,
    pipesize=-1
)

```
<br>
</sup>
    
A convenience function built upon the underlying Popen interface is `subprocess.run`. This is a blocking call: it waits for the command to complete, then returns a `CompletedProcess` instance.
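
As a minimal sketch of the blocking behaviour, the example below runs two throwaway Python one-liners (assumptions chosen for portability, not part of the serving setup). With `check=True`, a non-zero exit code raises `subprocess.CalledProcessError`.

```python
import subprocess
import sys

# run() blocks until the command finishes and returns a CompletedProcess.
result = subprocess.run(
    [sys.executable, "-c", "print('done')"],
    capture_output=True,  # collect stdout and stderr
    text=True,            # decode output to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
print(result.returncode, result.stdout.strip())

# A failing command: check=True turns the exit code into an exception.
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
except subprocess.CalledProcessError as e:
    failed_code = e.returncode
    print(f"command failed with exit code {failed_code}")
```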

There are also older high-level APIs that predate Python 3.5. Their functionality has been superseded by `subprocess.Popen` and `subprocess.run`:

* `subprocess.call`
* `subprocess.check_call`
* `subprocess.check_output`


**Useful resources on subprocess:**

- https://peps.python.org/pep-0324/
- https://docs.python.org/3/whatsnew/2.4.html#pep-324-new-subprocess-module
- https://docs.python.org/3/library/subprocess.html
- https://www.bogotobogo.com/python/python_subprocess_module.php
- https://qiita.com/HidKamiya/items/e192a55371a2961ca8a4 (JP)
- https://www.programcreek.com/python/example/50/subprocess.Popen


---
<a id="4.0"></a><a name="4.0"></a>
# 4. Set up the prebuilt half_plus_two model
<a href="#top">[back to top]</a>

For this example, we will use Docker to deploy TensorFlow Serving and host the toy model **half_plus_two**, which computes f(x) = (x / 2) + 2. This model is found in the TensorFlow Serving GitHub repository.

* https://www.tensorflow.org/tfx/serving/tensorboard
* https://www.tensorflow.org/tfx/serving/docker
* https://github.com/tensorflow/serving/tree/master/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu/00000123


In [2]:
if not Path('serving').is_dir():
    !git clone https://github.com/tensorflow/serving
else:
    print("Saved TensorFlow models in 'serving/tensorflow_serving/servables/tensorflow/testdata'")

In [3]:
# Get absolute path of demo models
TESTDATA=Path("serving/tensorflow_serving/servables/tensorflow/testdata").resolve()
!du -hs {TESTDATA}

In [4]:
# Define model_name and model_dir, used to start the Docker container
model_saved = 'saved_model_half_plus_two_cpu'
model_dir = (Path() / TESTDATA / model_saved).resolve()
model_name = 'half_plus_two'

print(model_dir)

---
<a id="5.0"></a><a name="5.0"></a>
# 5. HTTP Request Logging for TensorFlow Serving
<a href="#top">[back to top]</a>

* The easiest way to enable logging for TensorFlow Serving is via environment variables. These are not specific to TensorFlow Serving, but general to TensorFlow.

* `TF_CPP_MIN_VLOG_LEVEL` enables logging of the main C++ backend. However, even the lowest setting of `TF_CPP_MIN_VLOG_LEVEL=1` results in too much noise.

* `TF_CPP_VMODULE` provides a way to constrain logging to specific modules or source files. The general format is `TF_CPP_VMODULE=<module_name>=1`, where the module name can be either the C++ or Python file name (without the extension).

* Here, we can activate logging individually for http_server.cc via 
`TF_CPP_VMODULE=http_server=1`. This will enable simple HTTP request logging and errors.

* To use this with Docker, we pass it as an environment variable: `--env TF_CPP_VMODULE=http_server=1`

---
<a name="6.0"></a>
# 6. Running a minimal Docker image with TensorFlow Serving
<a href="#top">[back to top]</a>

* We start from the plain Docker command-line form.

* `docker run` also pulls a docker image (or repository) from the docker registry, if it doesn't already exist locally.


The `tensorflow/serving` Docker image features:

* Port 8500 exposed for gRPC
* Port 8501 exposed for the REST API
* Optional environment variable MODEL_NAME (defaults to model)
* Optional environment variable MODEL_BASE_PATH (defaults to /models)
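
To make the two defaults concrete: as we understand the image's documented entrypoint, the server looks for the model under `MODEL_BASE_PATH/MODEL_NAME` inside the container, which is why the bind mount in this notebook targets `/models/half_plus_two`. The helper below is hypothetical (`container_model_path` is not part of TensorFlow Serving) and only mirrors that path rule.

```python
import os
from pathlib import PurePosixPath

def container_model_path(env=None):
    """Hypothetical helper: where the server is expected to look for the model."""
    env = os.environ if env is None else env
    base = env.get("MODEL_BASE_PATH", "/models")  # image default
    name = env.get("MODEL_NAME", "model")         # image default
    return str(PurePosixPath(base) / name)

default_path = container_model_path({})
demo_path = container_model_path({"MODEL_NAME": "half_plus_two"})
print(default_path)  # /models/model
print(demo_path)     # /models/half_plus_two
```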

---

It can be easier to first experiment with and set up the Docker CLI commands, then wrap them with `subprocess.Popen` in Python.
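
One detail to watch when moving a shell command into `shlex.split`: the shell treats a trailing backslash as a line continuation, but `shlex` does not. It keeps the escaped newline as part of the next token. A plain newline, on the other hand, is ordinary whitespace to `shlex`, and quotes are removed as in the shell. The short sketch below uses shortened, illustrative command strings to show all three cases.

```python
import shlex

# Backslash-newline: the newline is escaped and ends up inside the next token.
with_backslashes = shlex.split("docker run \\\n--rm --detach")

# Plain newline: treated as whitespace between tokens.
without_backslashes = shlex.split("docker run\n--rm --detach")

# Quotes are stripped, so docker receives TF_CPP_VMODULE=http_server=1.
quoted = shlex.split("--env TF_CPP_VMODULE='http_server=1'")

print(with_backslashes)
print(without_backslashes)
print(quoted)
```

This is why the command string passed to `shlex.split` in the Python wrapper is written without trailing backslashes.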

In [5]:
cmd_cli = f"""
docker run \\
--rm --tty -p 8500:8500 -p 8501:8501 \\
--name {model_name} \\
--mount type=bind,source={model_dir},target=/models/{model_name} \\
--env MODEL_NAME={model_name} \\
--env TF_CPP_VMODULE='http_server=1' \\
--detach \\
--log-driver=json-file \\
--log-opt=mode=non-blocking \\
tensorflow/serving:latest
"""

print(cmd_cli)

---
For more feedback, run these commands in separate terminals. `watch docker ps` can be started before instantiating the Docker container; `docker logs -f` only works once the container exists:

```bash
$ watch docker ps
$ docker logs -f half_plus_two
```

In [6]:
def server_docker():  
        
    cmd=f"""
docker run
--rm --tty -p 8500:8500 -p 8501:8501
--name {model_name}
--mount type=bind,source={model_dir},target=/models/{model_name}
--env MODEL_NAME={model_name}
--env TF_CPP_VMODULE='http_server=1'
--detach
--log-driver=json-file
--log-opt=mode=non-blocking
tensorflow/serving:latest
"""
           
    proc = None  # stays None if the process could not be started
    try:
        proc = subprocess.Popen(
            shlex.split(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE
        )

        # communicate() waits for `docker run --detach` to return and
        # gives back a tuple (stdout_data, stderr_data) as bytes.
        out, err = proc.communicate()
        out = out.decode()
        err = err.decode()
        print(f"out: {out.strip()}")  # container ID on success

        if err:
            print(f"err: {err}")

        HR()

        sleep_time = 0.5 # Small time delay for docker instance to start up
        time.sleep(sleep_time)

    except OSError as e:
        # Raised by Popen when the docker executable cannot be launched
        print(f"Subprocess error: {e}")
    except Exception as e:
        print(f"Error: {e}")

    return proc

In [7]:
proc = server_docker()
#print(f"docker process id: {proc.pid}")  

In [8]:
# If you need to immediately kill and remove unused containers/networks/images
# !docker kill {model_name} && docker system prune --force

---
<a name="7.0"></a>
# 7. Subprocess to write logs to file
<a href="#top">[back to top]</a>

We can always access logs via `docker logs <container name>` on the CLI:

In [9]:
!docker logs --tail 5 {model_name}

---
We can also create a subprocess that redirects the logs to a file:

In [21]:
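def log_docker_process():
    cmd = f"docker logs --follow {model_name}"
    proc = None  # stays None if the process could not be started
    try:
        # The log file stays open for the lifetime of the child process,
        # which typically exits on its own once the container stops.
        proc = subprocess.Popen(
            shlex.split(cmd),
            stdout=open('logger_tfs.log', 'w'),
            stderr=subprocess.STDOUT, # redirect to stdout
        )
    except Exception as e:
        print(f"Error: {e}")
    return proc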

In [22]:
log_docker_process()

!du -hs logger_tfs.log

---
<a name="8.0"></a>
# 8. Health check
<a href="#top">[back to top]</a>

Create a simple HTTP client and send requests.

* https://github.com/tensorflow/serving/blob/master/tensorflow_serving/model_servers/tensorflow_model_server_test.py#L520

In [11]:
def request_status_rest():
    try:
        resp_data = requests.get(f'http://localhost:8501/v1/models/{model_name}')
    except Exception as e:
        print(f"Error: {e}")
    else:
        return resp_data
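
A fixed `time.sleep` after starting the container may be too short on a slow machine. One alternative is to poll the status endpoint until the model reports `AVAILABLE`. The sketch below is an assumption-laden illustration: `wait_until_available` and `fake_fetch` are hypothetical names, and the server is replaced by a stand-in so the example runs without Docker.

```python
import time

def wait_until_available(fetch_status, timeout=30.0, interval=0.5):
    """Poll fetch_status() until every model version reports AVAILABLE.

    fetch_status is any zero-argument callable that returns the parsed JSON
    status (a dict) or raises if the server is not reachable yet.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            versions = fetch_status().get("model_version_status", [])
            if versions and all(v.get("state") == "AVAILABLE" for v in versions):
                return True
        except Exception:
            pass  # server not up yet; retry after a short pause
        time.sleep(interval)
    return False

# Stand-in for the real server: unreachable, then loading, then available.
responses = iter([
    ConnectionError("not up"),
    {"model_version_status": [{"version": "123", "state": "START"}]},
    {"model_version_status": [{"version": "123", "state": "AVAILABLE"}]},
])

def fake_fetch():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item

ready = wait_until_available(fake_fetch, timeout=5.0, interval=0.01)
print(ready)
```

Against the real server, the callable could be `lambda: requests.get(f'http://localhost:8501/v1/models/{model_name}', timeout=2).json()`.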

<a id='8.1'></a><a name='8.1'></a>
## 8.1 Verify response
<a href="#top">[back to top]</a>

In [12]:
request_result = request_status_rest()
pp.pprint(request_result.json())

assert request_result.json() == {
        'model_version_status': [{
            'version': '123',
            'state': 'AVAILABLE',
            'status': {
                'error_code': 'OK',
                'error_message': ''
            }
        }]
    }

<a id='8.2'></a><a name='8.2'></a>
## 8.2 Verify logs
<a href="#top">[back to top]</a>

In [13]:
for x in range(5):
    resp = request_status_rest()
    print(f"===> TFS Status : {resp.headers['Date']} {resp}")
    time.sleep(0.5)
    
HR()
    
!du -h logger_tfs.log
HR()
!tail -5 logger_tfs.log

---
<a name="9.0"></a>
# 9. Predict requests via POST
<a href="#top">[back to top]</a>

The TensorFlow model computes:

f(x) = (x / 2) + 2
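
For comparison with the server's predictions, the same formula can be evaluated locally. This is a plain-Python reference (the function name `half_plus_two` is ours), applied to the payload values used in the POST request in this section.

```python
def half_plus_two(x):
    """Local reference for the model: f(x) = (x / 2) + 2."""
    return x / 2 + 2

instances = [1.0, 2.0, 5.0, -9]
expected = [half_plus_two(x) for x in instances]
print(expected)  # [2.5, 3.0, 4.5, -2.5]
```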

In [14]:
!curl -d '{"instances": [1.0]}' \
    -X POST http://localhost:8501/v1/models/half_plus_two:predict

In [15]:
# Payload
data = json.dumps({
    "instances": [1.0, 2.0, 5.0, -9]
})
print(f"Payload:\t{data}")
HR()

headers = {
    "content-type": "application/json"
}

response = requests.post(
    f'http://localhost:8501/v1/models/{model_name}:predict',
    data=data,
    headers=headers
)

print(f"Predictions:\t{json.loads(response.text)['predictions']}")
HR()

print("Properties of response:\n")
pp.pprint(response.__dict__)

---
<a name="10.0"></a>
# 10. Misc properties
<a href="#top">[back to top]</a>

In [16]:
# Check environment variables of this container
!docker exec {model_name} env

In [17]:
# Using jq:
!docker inspect {model_name} | jq '.[] | .Config.Env'

In [18]:
# As a toy example, we run a command inside the container to check this environment variable:
!docker exec {model_name} printenv TF_CPP_VMODULE

In [19]:
def show_tfs_dict(proc):
    print("Properties of returned container process:\n")
    for key in proc.__dict__:
        if key == 'args':
            HR()
            print(key, '->', proc.__dict__[key])
            HR()
        else:
            print(key, '->', proc.__dict__[key])
 
show_tfs_dict(proc)

---
<a name="11.0"></a>
# 11. End and clean up processes
<a href="#top">[back to top]</a>

In [20]:
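# Kill the container, then remove unused containers, networks, and images.
# Note: `docker system prune` affects ALL stopped containers, unused networks,
# dangling images, and build cache on this host, not just this demo's.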
!docker kill {model_name} && docker system prune --force