# NVIDIA Inference Server

## Set up an account on NGC

1. Go to http://ngc.nvidia.com and create an account.

2. Generate an API Key.


## NVIDIA Image (recommended)

```
export HOST_NAME=`whoami`-inference
export PROJECT=$DEVSHELL_PROJECT_ID
export ZONE=europe-west4-a

gcloud beta compute --project "$PROJECT" \
  instances create "$HOST_NAME" \
  --zone "$ZONE" \
  --machine-type "n1-standard-4" \
  --subnet "default" \
  --maintenance-policy "TERMINATE" \
  --accelerator type=nvidia-tesla-p100,count=2 \
  --min-cpu-platform "Automatic" \
  --image nvidia-gpu-cloud-image-20180717 \
  --image-project nvidia-ngc-public \
  --boot-disk-size "200GB" \
  --boot-disk-type "pd-standard" \
  --boot-disk-device-name "$HOST_NAME" \
  --scopes=https://www.googleapis.com/auth/cloud-platform
```
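
`$DEVSHELL_PROJECT_ID` is set automatically in Google Cloud Shell but is usually empty elsewhere, in which case `gcloud` would receive an empty project. A minimal guard, assuming a hypothetical helper named `require_vars` (not part of `gcloud`), could be run before creating the instance:

```shell
# require_vars is a hypothetical helper, not part of gcloud.
# It returns non-zero and names the first variable that is unset or empty.
require_vars() {
  for name in "$@"; do
    # Indirect lookup that works in POSIX sh as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      return 1
    fi
  done
}

# Usage: check the variables first, then run the gcloud command.
# require_vars HOST_NAME PROJECT ZONE && gcloud beta compute ...
```

Outside Cloud Shell, `PROJECT` can be set by hand to the ID of the target project before running the check.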

## Ubuntu 16.04

### Install docker-engine and nvidia-docker

This assumes Ubuntu 16.04.

    sudo apt-get update && sudo apt-get install apt-transport-https ca-certificates curl -y
    sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 \
        --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
    echo "deb https://apt.dockerproject.org/repo ubuntu-xenial main" | sudo tee /etc/apt/sources.list.d/docker.list
    sudo apt-get update
    sudo apt-get -y install docker-engine=1.12.6-0~ubuntu-xenial

    wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
    sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
    
    sudo usermod -a -G docker $USER

You may need to log out and back in before you can run `docker` commands without `sudo`.
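
Whether a new login is still needed can be checked from the groups of the current session, which `id -nG` prints. The sketch below assumes a hypothetical helper named `in_group`:

```shell
# in_group is a hypothetical helper: succeeds if group $1 appears in the
# space-separated group list $2 (defaults to the current session's groups).
in_group() {
  group=$1
  groups_list=${2-$(id -nG)}
  for g in $groups_list; do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}

# Usage:
# in_group docker || echo "log out and back in, or keep using sudo"
```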


### Start an Inference Server instance

1. Browse to [Inference Server images](https://ngc.nvidia.com/registry/nvidia-inferenceserver)

2. Log in to the NGC registry with Docker:

```
    docker login nvcr.io
```

You will be prompted for a username and password. Enter `$oauthtoken` as the username exactly as shown (it is a literal string, not a shell variable), and enter the NGC API key generated during account setup as the password:

```
    Username: $oauthtoken
    Password: <Your NGC API Key>
```
 
3. Pull the Inference Server container:

```
    docker pull nvcr.io/nvidia/inferenceserver:18.07-py3
```

4. Download the sample model store (ResNet50 implemented in Caffe2):

```
    git clone https://github.com/NVIDIA/dl-inference-server.git
    cd dl-inference-server/examples
    ./fetch_models.sh
    cd ../..
```

5. Start the server instance:

```
nvidia-docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  -p8000:8000 -p8001:8001 \
  -v"$(pwd)/dl-inference-server/examples/models/resnet50_netdef":"/tmp/models" \
  nvcr.io/nvidia/inferenceserver:18.07-py3 \
  /opt/inference_server/bin/inference_server --model-store=/tmp/models
```

The `-v` option mounts the host directory `./dl-inference-server/examples/models/resnet50_netdef` into the container at `/tmp/models`, and the Inference Server's `--model-store` option points to `/tmp/models` as the model store.
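
Docker's `-v` option expects an absolute host path for a bind mount; a value that is not absolute is not treated as a host directory. As a sketch, a hypothetical helper named `abs_dir` can turn the relative model store directory into an absolute path and fail early if the directory is missing:

```shell
# abs_dir is a hypothetical helper: prints the absolute path of directory $1,
# or returns non-zero if it does not exist.
abs_dir() {
  if [ ! -d "$1" ]; then
    echo "error: $1 is not a directory" >&2
    return 1
  fi
  # Subshell so the caller's working directory is unchanged.
  (cd "$1" && pwd)
}

# Usage (MODEL_STORE is an illustrative variable name):
# MODEL_STORE=$(abs_dir ./dl-inference-server/examples/models/resnet50_netdef) &&
#   echo "mounting $MODEL_STORE at /tmp/models"
```

The resulting `$MODEL_STORE` value can then be passed as the host side of the `-v` mapping.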