Feature: Add torchserve custom server bert sample #1185

121 changes: 121 additions & 0 deletions docs/samples/custom/torchserve/hugging-face-bert-sample.md
@@ -0,0 +1,121 @@
# TorchServe custom server with Hugging Face BERT sample

## Model archive file creation

[Torchserve Huggingface Transformers Bert example](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers)
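
If you do not already have the example files locally, a minimal way to get them (assuming the layout of the pytorch/serve repository linked above) is:

```bash
# Clone the TorchServe repo and switch to the Hugging Face Transformers example
git clone https://github.com/pytorch/serve.git
cd serve/examples/Huggingface_Transformers
```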

## Download and install transformer models

```bash
pip install transformers
```
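
The model download and archiving steps below also rely on PyTorch and the TorchServe model archiver. If they are not already installed locally, they are available on PyPI (this assumes a plain pip environment):

```bash
pip install torch torchserve torch-model-archiver
```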

Modify `setup_config.json`:

```json
{
"model_name":"bert-base-uncased",
"mode":"sequence_classification",
"do_lower_case":"True",
"num_labels":"2",
"save_mode":"torchscript",
"max_length":"150"
}
```

Download models

```bash
python Download_Transformer_models.py
```
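
After the script finishes, the traced model and its config should have been written to the `Transformer_model` directory; a quick sanity check is:

```bash
# pytorch_model.bin and config.json are referenced by the archiver command below
ls Transformer_model/
```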

Create a requirements.txt and add `transformers` to it.
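
One way to do that:

```bash
echo "transformers" > requirements.txt
```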

With all the files in place, run the command below to generate the `.mar` file.

```bash
torch-model-archiver --model-name BERTSeqClassification --version 1.0 --serialized-file Transformer_model/pytorch_model.bin --handler ./Transformer_handler_generalized.py --extra-files "Transformer_model/config.json,./setup_config.json,./Seq_classification_artifacts/index_to_name.json" -r requirements.txt
```
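
This writes `BERTSeqClassification.mar` to the current directory. The Dockerfile in this sample copies the archive from a model store directory; the directory name below is only an assumption, so adjust it to whatever your Dockerfile expects:

```bash
mkdir -p model-store
mv BERTSeqClassification.mar model-store/
```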

## Build and push the sample Docker Image

The custom TorchServe image bundles the model inside the container and serves it with KFServing.

In this example we use Docker to build a TorchServe image that packages the `.mar` file and `config.properties` into the container.

Add the setting below to `config.properties` so that the model's Python dependencies (including `transformers`) are installed inside the container:

```properties
install_py_dep_per_model=true
```
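
For reference, a minimal `config.properties` sketch containing this flag might look like the following; the addresses and model store path are illustrative assumptions and should match your Dockerfile and container ports:

```bash
cat <<'EOF' > config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
model_store=/home/model-server/model-store
install_py_dep_per_model=true
EOF
```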

To build and push with Docker Hub, run these commands replacing {username} with your Docker Hub username:

```bash
# Build the container on your local machine

# For CPU
DOCKER_BUILDKIT=1 docker build --file Dockerfile -t torchserve:latest .

# For GPU
DOCKER_BUILDKIT=1 docker build --file Dockerfile --build-arg BASE_IMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 -t torchserve:latest .

# Push the container to the Docker registry
docker tag torchserve:latest {username}/torchserve:latest
docker push {username}/torchserve:latest
```

## Create the InferenceService

In the `torchserve-custom.yaml` file edit the container image and replace {username} with your Docker Hub username.
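
For example, assuming the manifest references the image as `{username}/torchserve:latest`, the placeholder can be substituted with:

```bash
sed -i 's|{username}|<your-docker-hub-username>|' torchserve-custom.yaml
```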

Apply the InferenceService custom resource:

```bash
kubectl apply -f torchserve-custom.yaml
```

Expected Output

```bash
inferenceservice.serving.kubeflow.org/torchserve-custom created
```
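
Optionally, confirm that the InferenceService is ready before sending requests (the resource name matches the output above):

```bash
kubectl get inferenceservice torchserve-custom -n <namespace>
# Proceed once the READY column shows True
```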

## Run a prediction

The first step is to [determine the ingress IP and ports](../../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.
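
For a cluster with an Istio ingress gateway backed by an external load balancer, the variables can typically be set like this (adjust for your setup; see the linked README for other configurations):

```bash
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```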

```bash
MODEL_NAME=torchserve-custom
SERVICE_HOSTNAME=$(kubectl get route ${MODEL_NAME}-predictor-default -n <namespace> -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/predictions/BERTSeqClassification -T serve/examples/Huggingface_Transformers/sample_text.txt
```

Expected Output

```bash
* Trying 44.239.20.204...
* Connected to a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com (44.239.20.204) port 80 (#0)
> PUT /predictions/BERTSeqClassification HTTP/1.1
> Host: torchserve-custom-predictor-default.kfserving-test.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 79
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< cache-control: no-cache; no-store, must-revalidate, private
< content-length: 8
< date: Wed, 04 Nov 2020 10:54:49 GMT
< expires: Thu, 01 Jan 1970 00:00:00 UTC
< pragma: no-cache
< x-request-id: 4b54d3ac-185f-444c-b344-b8a785fdeb50
< x-envoy-upstream-service-time: 2085
< server: istio-envoy
<
* Connection #0 to host torchserve-custom-predictor-default.kfserving-test.example.com left intact
Accepted
```