# How to Deploy a TensorFlow Model in Production

We know how to:
- write models
- train models
- test models

But how do we deploy them for production use?

Let's create a simple web app that lets the user upload an image and runs the Inception model over it to classify it.

## TensorFlow Serving

![TensorFlow Serving illustration](https://cdn-images-1.medium.com/max/1800/0*O7yprjYDk2WTO3__.)

![Deep learning training vs. inference](https://blogs.nvidia.com/wp-content/uploads/2016/08/ai_difference_between_deep_learning_training_inference.jpg)

- Google's open-source serving system that accompanies TensorFlow
- Built for inference: it manages models and gives clients versioned access to them through a reference-counted lookup table, exposed over gRPC (an RPC framework). For background on RPC versus REST, see https://apihandyman.io/do-you-really-know-why-you-prefer-rest-over-rpc/
- Can serve multiple models simultaneously (great for A/B testing)
- Can serve multiple versions of the same model
- Written in C++
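The phrase "reference-counted lookup table" is easier to picture in code. Here's a minimal sketch in Python, assuming a toy in-process table; the class and method names are hypothetical and are not TensorFlow Serving's C++ API. Entries are keyed by (model name, version), so several models and versions live side by side, and a version can't be removed while a request still holds a reference to it.

```python
import threading
from contextlib import contextmanager


class ServableTable:
    """Toy, hypothetical version of a reference-counted lookup table.

    Keys are (model_name, version) pairs, so several models and several
    versions of each model can be served side by side.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # (name, version) -> servable object
        self._refs = {}     # (name, version) -> requests currently using it

    def add(self, name, version, servable):
        with self._lock:
            self._entries[(name, version)] = servable
            self._refs.setdefault((name, version), 0)

    def versions(self, name):
        with self._lock:
            return sorted(v for (n, v) in self._entries if n == name)

    @contextmanager
    def handle(self, name, version=None):
        """Borrow a servable for one request; version=None means the newest."""
        with self._lock:
            if version is None:
                available = [v for (n, v) in self._entries if n == name]
                if not available:
                    raise KeyError(name)
                version = max(available)
            key = (name, version)
            servable = self._entries[key]  # raises KeyError if not loaded
            self._refs[key] += 1
        try:
            yield servable
        finally:
            with self._lock:
                self._refs[key] -= 1

    def remove(self, name, version):
        """Unload a version, but only once no request holds a reference to it."""
        with self._lock:
            if self._refs.get((name, version), 0) > 0:
                return False
            self._entries.pop((name, version), None)
            self._refs.pop((name, version), None)
            return True
```

For A/B testing you would route some requests to `table.handle("inception", 1)` and others to `table.handle("inception", 2)`; the reference count is what makes it safe to retire version 1 while traffic is still in flight.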


## Architecture Overview

![TensorFlow Serving architecture](https://tensorflow.github.io/serving/images/serving_architecture.svg)

TensorFlow Serving is built from four major components:

### Servables

- The central abstraction in TensorFlow Serving. They are the objects that clients use to perform computation (for example, a lookup or inference).
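- Flexible granularity: a servable can be a single shard of a lookup table, a single model, or a tuple of models
- Multiple versions can be loaded concurrently, which supports gradual rollouts and A/B testing
- Keeping multiple versions of a servable in one server instance lets new configurations, weights, and other data be loaded over time
- A servable stream is the sequence of versions of a servable, sorted by increasing version number (a bit like a Git history)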
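A servable stream is just a name plus an ordered set of versions. A minimal Python sketch, assuming integer version numbers; `ServableStream` and its methods are hypothetical names, not part of TensorFlow Serving:

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class ServableStream:
    """Hypothetical model of a servable stream: one servable name plus its
    versions, always kept sorted by increasing version number."""

    name: str
    versions: list = field(default_factory=list)

    def add_version(self, version: int) -> None:
        # Versions may arrive in any order; insort keeps the list sorted.
        if version not in self.versions:
            bisect.insort(self.versions, version)

    def latest(self, n: int = 1) -> list:
        """The n newest versions, e.g. n=2 to serve a new model next to the current one."""
        return self.versions[-n:] if n > 0 else []
```

Adding versions 3, 1, 2 in that order still yields `[1, 2, 3]`, and `latest(2)` gives the pair you'd keep loaded during a rollout.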

### Loaders

- Manage a servable's life cycle. Because every Loader exposes the same standardized API for loading and unloading a servable, the surrounding infrastructure can be shared no matter what the servable actually is.
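To show what "standardizes the APIs for loading and unloading" buys you, here's a hypothetical Python analogue. The real Loader is a C++ interface; this sketch only mirrors the idea with an abstract base class, and `LabelTableLoader` is a made-up toy whose servable is a class-id to label table.

```python
from abc import ABC, abstractmethod


class Loader(ABC):
    """Hypothetical Python analogue of a Loader: one standard interface that
    the serving infrastructure calls, whatever the servable actually is."""

    @abstractmethod
    def estimate_resources(self) -> int:
        """Approximate memory (bytes) the servable will need once loaded."""

    @abstractmethod
    def load(self) -> None:
        """Bring the servable into memory."""

    @abstractmethod
    def unload(self) -> None:
        """Release the servable's memory."""

    @abstractmethod
    def servable(self):
        """Return the loaded object that clients compute with."""


class LabelTableLoader(Loader):
    """Toy loader whose servable is a class-id -> label lookup table."""

    def __init__(self, labels):
        self._labels = list(labels)
        self._table = None

    def estimate_resources(self) -> int:
        return sum(len(label) for label in self._labels)  # crude byte estimate

    def load(self) -> None:
        self._table = dict(enumerate(self._labels))

    def unload(self) -> None:
        self._table = None

    def servable(self):
        if self._table is None:
            raise RuntimeError("servable is not loaded")
        return self._table
```

Code that manages loaders only ever calls these four methods, so swapping the lookup table for a model is a matter of writing another subclass.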

### Sources

- Plugin modules that find and provide servables, originating zero or more servable streams. For each stream, a Source supplies one Loader instance for each version it wants to have loaded.
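A common kind of Source watches a directory on disk. The sketch below assumes each version of a model is exported into its own numerically named subdirectory of a base path; the function names are hypothetical, and the "loader" in the demo is just the version's directory name to keep things short.

```python
import os
import tempfile


def aspired_versions(base_path, make_loader, keep_latest=1):
    """Hypothetical file-system Source.

    Treats every all-digit subdirectory of base_path as one version of the
    servable, and returns a (version, loader) pair for each version it wants
    loaded: by default, only the newest one.
    """
    found = sorted(
        (int(entry), entry)
        for entry in os.listdir(base_path)
        if entry.isdecimal() and os.path.isdir(os.path.join(base_path, entry))
    )
    wanted = found[-keep_latest:] if keep_latest > 0 else []
    return [(version, make_loader(os.path.join(base_path, name)))
            for version, name in wanted]


def demo():
    """Build a throwaway export directory and see which versions get aspired."""
    with tempfile.TemporaryDirectory() as base:
        for name in ("1", "2", "tmp"):
            os.mkdir(os.path.join(base, name))
        open(os.path.join(base, "3"), "w").close()  # a file, not a directory: ignored
        # The "loader" here is just the version's directory name.
        newest = aspired_versions(base, make_loader=os.path.basename)
        both = aspired_versions(base, make_loader=os.path.basename, keep_latest=2)
    return newest, both
```

Note that the Source only decides *which* versions it would like served; actually loading them is left to the Manager.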

### Managers

- Handle the full lifecycle of Servables (loading, serving, unloading)
- Listen to Sources and track all versions
- Try to fulfill Sources' requests, but may refuse to load an aspired version if, say, the required resources aren't available
- May postpone an unload until a newer version finishes loading, based on a policy that guarantees at least one version is loaded at all times
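The last two bullets describe a trade-off between availability and resources. Here's a minimal sketch of the "load new, then unload old" policy for a single stream, assuming memory is the only resource and each version's cost is known up front; `SingleStreamManager` is a hypothetical name.

```python
class SingleStreamManager:
    """Hypothetical manager for one servable stream.

    Uses an availability-preserving order: the new version is loaded first and
    the old one is unloaded only afterwards. The price is that both must fit
    in memory at once; if they don't, the manager refuses the new version and
    keeps serving the old one.
    """

    def __init__(self, memory_budget):
        self.memory_budget = memory_budget
        self.loaded = {}  # version -> memory it occupies

    def used(self):
        return sum(self.loaded.values())

    def set_aspired(self, version, cost):
        """Make `version` the only loaded version. Returns False if refused."""
        if version not in self.loaded:
            if self.used() + cost > self.memory_budget:
                return False  # not enough room to load it next to the current one
            self.loaded[version] = cost  # 1. load the new version
        for other in [v for v in self.loaded if v != version]:
            del self.loaded[other]       # 2. then unload every other version
        return True

    def serving_version(self):
        return max(self.loaded) if self.loaded else None
```

With a budget of 100 and two 60-unit versions, the switch from version 1 to version 2 is refused and version 1 keeps serving; raise the budget and the same call succeeds.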

### Two-Step Process

1. Sources create Loaders for Servable Versions.
2. Loaders are sent as Aspired Versions to the Manager, which loads and serves them to client requests.
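The two steps can be sketched end to end in a few lines of Python. Everything here is a hypothetical toy (`VersionLoader`, `source`, `ToyManager`): "loading" a version just builds a function that reports which version answered.

```python
class VersionLoader:
    """Hypothetical loader: 'loading' builds a model function for one version."""

    def __init__(self, version):
        self.version = version

    def load(self):
        version = self.version
        return lambda image: f"v{version} classified {image}"


def source(versions_on_disk):
    """Step 1: the Source creates one Loader per version it wants served."""
    return [VersionLoader(v) for v in versions_on_disk]


class ToyManager:
    def __init__(self):
        self.servables = {}  # version -> loaded model function

    def aspire(self, loaders):
        """Step 2: receive aspired versions and load any that are new."""
        for loader in loaders:
            if loader.version not in self.servables:
                self.servables[loader.version] = loader.load()

    def handle(self, image, version=None):
        """Serve a client request, using the newest version unless one is named."""
        if version is None:
            version = max(self.servables)
        return self.servables[version](image)
```

For example, after `manager.aspire(source([1, 2]))`, a request without a version is answered by version 2, while `version=1` still reaches the older model.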



# Step 1 - Set Up the Development Environment

Manually install from source? No. Let's use Docker.


Docker is like a lightweight virtual machine: a container shares the host's kernel instead of running a full OS inside it, so it avoids most of a VM's overhead. Think of it as an app-specific VM. Because the dependencies live inside the container, there's no need to worry about conflicting versions and other entanglements with the rest of the OS.

![Docker containers vs. virtual machines](http://zdnet2.cbsistatic.com/hub/i/r/2017/05/08/af178c5a-64dd-4900-8447-3abd739757e3/resize/770xauto/78abd09a8d41c182a28118ac0465c914/docker-vm-container.png)

Installation instructions: https://docs.docker.com/engine/installation/

First, let's clone the TensorFlow Serving repo:

```
git clone --recursive https://github.com/tensorflow/serving
cd serving
```

Now we can build a Docker image with all the required dependencies (pip packages, Bazel, gRPC):

```
docker build --pull -t $USER/tensorflow-serving-devel -f tensorflow_serving/tools/docker/Dockerfile.devel .
```
If you use VirtualBox on Windows, `$USER` may not be set, so use a fixed name such as `user` instead (and use the same tag in the `docker run` command below):
```
docker build --pull -t user/tensorflow-serving-devel -f tensorflow_serving/tools/docker/Dockerfile.devel .
```


Now let's run the container locally. Once it's running, it gives us a terminal inside the container:

```
docker run --name=tensorflow_container -it $USER/tensorflow-serving-devel
```

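Now we can clone TensorFlow Serving into our dependency-ready container. Run these commands in the container's terminal, since the container can't see the copy we cloned on the host. `./configure` sets up the TensorFlow build and asks a few configuration questions.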

```
git clone --recursive https://github.com/tensorflow/serving
cd serving/tensorflow
./configure
```


Now we need to build it with Bazel, Google's build tool, from inside our container. Bazel manages third-party dependencies at the source-code level, downloading and building them, as long as they are also built with Bazel.

We need two dependencies:

* TensorFlow Serving
* A pre-trained Inception model


First, TensorFlow Serving (the build can take 20-50 minutes):

```
cd ..
bazel build -c opt tensorflow_serving/...
```

Once the build completes, we can test it by running the model server:

```
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
```

If the build was successful, the output should look like this:

```
Usage: model_server [--port=8500] [--enable_batching] [--model_name=my_name] --model_base_path=/path/to/export
```

Now for dependency 2 of 2, the Inception model. It's a deep convolutional neural network: the Inception architecture achieved state-of-the-art classification in the 2014 ImageNet competition (ILSVRC), and here we use its v3 variant, trained on the roughly 1.2 million images of the ImageNet training set.

```
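# Download and unpack the pre-trained Inception v3 checkpoint
curl -O http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz
tar xzf inception-v3-2016-03-01.tar.gz
# Export the checkpoint into a format the model server can load
bazel-bin/tensorflow_serving/example/inception_export --checkpoint_dir=inception-v3 --export_dir=inception-export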
```
![Inception model architecture](https://1.bp.blogspot.com/-O7AznVGY9js/V8cV_wKKsMI/AAAAAAAABKQ/maO7n2w3dT4Pkcmk7wgGqiSX5FUW2sfZgCLcB/s1600/image00.png)

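Now let's start the model server locally. It serves the exported model under the name `inception` over gRPC on port 9000, running in the background and logging to `inception_log`: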


```
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=inception-export &> inception_log &
```
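If you'd rather drive the server from a Python script, the same launch can be sketched with the standard `subprocess` module. This assumes you run it from the `serving` checkout inside the container, where the binary path used above exists; the helper names are hypothetical, and only the flags shown in the usage message above are passed.

```python
import subprocess

# Binary path from this tutorial; assumes the working directory is the `serving` checkout.
SERVER_BIN = "bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server"


def server_command(model_name, model_base_path, port=9000, binary=SERVER_BIN):
    """Build the same argument list as the shell command above."""
    return [
        binary,
        f"--port={port}",
        f"--model_name={model_name}",
        f"--model_base_path={model_base_path}",
    ]


def start_server(model_name, model_base_path, port=9000, log_path="inception_log"):
    """Start the server in the background, sending stdout and stderr to a log
    file (the Python counterpart of `&> inception_log &`)."""
    with open(log_path, "w") as log:
        # The child process gets its own copy of the file descriptor,
        # so closing ours when the `with` block ends is safe.
        return subprocess.Popen(
            server_command(model_name, model_base_path, port),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
```

For example, `proc = start_server("inception", "inception-export")` starts the server, and `proc.terminate()` stops it again.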


Now that the server is running locally, let's test it with our Python client app. We'll query it with a picture of a panda, and it will return a classification result:

```
wget https://upload.wikimedia.org/wikipedia/en/a/ac/Xiang_Xiang_panda.jpg
bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=./Xiang_Xiang_panda.jpg
```
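The client prints the raw response. If you want to post-process it yourself, here's a minimal sketch, assuming the class labels and scores have already been pulled out of the response into two plain Python lists. The helper names are hypothetical and any example numbers are made up, not real model output. Scores are printed as raw numbers rather than percentages, since depending on how the model was exported they may be logits rather than probabilities.

```python
def top_k(classes, scores, k=5):
    """Pair each class label with its score and return the k best, highest first."""
    if len(classes) != len(scores):
        raise ValueError("classes and scores must have the same length")
    ranked = sorted(zip(classes, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def format_predictions(predictions):
    """One line per prediction: score with 4 decimals, then the label."""
    return "\n".join(f"{score:.4f}  {label}" for label, score in predictions)
```

With made-up inputs `["koala", "giant panda", "indri"]` and `[1.5, 9.1, 0.2]`, `top_k(..., k=2)` returns `[("giant panda", 9.1), ("koala", 1.5)]`.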

If everything works, we'll see the panda's classification output printed to the terminal!

Wanna push this to the cloud? Using Google Cloud and Kubernetes (https://kubernetes.io/), an automated container management tool, we can. See part 2 of this tutorial to do that:

https://tensorflow.github.io/serving/serving_inception

And when we're ready to build and serve our own model:

https://tensorflow.github.io/serving/serving_basic