Add TorchServe CPU Example (#2613)
* add basic ipex torchserve example

* update commands and proxy info

* Added advanced model archive info

* Added advanced model archive info

* Added advanced model archive info

* Update README.md

* grammar

* Update container versions

* Update README.md

---------

Co-authored-by: Pratool Bharti <pratool.bharti@intel.com>
tylertitsworth and pbharti0831 committed Mar 5, 2024
1 parent 5ac04d3 commit 1f6fe64
Showing 2 changed files with 170 additions and 0 deletions.
137 changes: 137 additions & 0 deletions examples/cpu/serving/torchserve/README.md
# Serving ResNet50 INT8 model with TorchServe and Intel® Extension for PyTorch optimizations

## Description
This sample provides code to integrate Intel® Extension for PyTorch with TorchServe. It quantizes a ResNet50 model to INT8 precision to improve inference performance on CPU.

## Preparation
You'll need to install Docker Engine on your development system. Note that while **Docker Engine** is free to use, **Docker Desktop** may require you to purchase a license. See the [Docker Engine Server installation instructions](https://docs.docker.com/engine/install/#server) for details.
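
You can verify the installation with Docker's standard hello-world check (an optional sanity test, not part of the original example):

```bash
docker run --rm hello-world
```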

## Quantize Model
Create and quantize a TorchScript model for INT8 precision using the Python environment found in the Intel® Optimized TorchServe Container. The command below outputs `rn50_int8_jit.pt`, which is used in the next step.

```bash
docker run \
--rm -it -u root \
--entrypoint='' \
-v $PWD:/home/model-server \
intel/intel-optimized-pytorch:2.2.0-serving-cpu \
python quantize_model.py
```

> [!NOTE]
> If you are working behind a corporate proxy, include `-e http_proxy=${http_proxy} -e https_proxy=${https_proxy}` in your `docker run` command.
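
For example, the quantization step with the proxy settings passed through:

```bash
docker run \
--rm -it -u root \
--entrypoint='' \
-e http_proxy=${http_proxy} \
-e https_proxy=${https_proxy} \
-v $PWD:/home/model-server \
intel/intel-optimized-pytorch:2.2.0-serving-cpu \
python quantize_model.py
```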
## Archive Model
The [TorchServe Model Archiver](https://github.com/pytorch/serve/blob/master/model-archiver/README.md) is a CLI tool included in the TorchServe container and also available on [PyPI](https://pypi.org/project/torch-model-archiver/). The process is very similar for the [TorchServe Workflow Archiver](https://github.com/pytorch/serve/tree/master/workflow-archiver).

Follow the instructions in the links above depending on whether you intend to archive a model or a workflow. Rather than installing the archiver locally, use the provided container, as in the example command below:

```bash
docker run \
--rm -it -u root \
--entrypoint='' \
-v $PWD:/home/model-server \
intel/intel-optimized-pytorch:2.2.0-serving-cpu \
torch-model-archiver \
--model-name ipex-resnet50 \
--version 1.0 \
--serialized-file rn50_int8_jit.pt \
--handler image_classifier \
--export-path /home/model-server/model-store
```
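
The archiver writes `ipex-resnet50.mar` (the `--model-name` plus the `.mar` extension) to the mounted `model-store` directory; you can confirm this on the host:

```bash
ls $PWD/model-store
```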

> [!NOTE]
> If you are working behind a corporate proxy, include `-e http_proxy=${http_proxy} -e https_proxy=${https_proxy}` in your `docker run` command.

### Advanced Model Archival
The `--handler` argument is an important component of serving, as it controls the inference pipeline. TorchServe provides several default handlers [built into the application](https://pytorch.org/serve/default_handlers.html#torchserve-default-inference-handlers) that cover most inference cases, but you may need a custom handler if your application requires additional preprocessing or postprocessing, or uses other variables to derive the final output.

To create a custom handler, inherit from `BaseHandler` or another built-in handler and override the necessary functionality. Usually you only need to override the preprocessing and postprocessing methods to meet an application's inference needs.

```python
from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
"""
A custom model handler implementation.
"""
```
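
As a minimal sketch, a handler that customizes only the pre- and post-processing around the base pipeline might look like the following (the method bodies are illustrative assumptions, not part of this example):

```python
from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
    """
    A custom handler that adds application-specific pre- and
    post-processing around the default inference pipeline.
    """

    def preprocess(self, data):
        # Add any extra input transformations here before falling
        # back to the base handler's image-to-tensor conversion.
        return super().preprocess(data)

    def postprocess(self, data):
        # Derive the final response from the raw model output, e.g.
        # map class indices to human-readable labels.
        return super().postprocess(data)
```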

> [!TIP]
> For more examples of how to write a custom handler, see the [TorchServe documentation](https://github.com/pytorch/serve/blob/master/docs/custom_service.md).

The `torch-model-archiver` tool also accepts additional parameters and files to handle more complex archiving scenarios:

```txt
--requirements-file Path to requirements.txt file containing
a list of model specific python packages
to be installed by TorchServe for seamless
model serving.
--extra-files Pass comma separated path to extra dependency
files required for inference and can be
accessed in handler script.
--config-file Path to model-config yaml files that can
contain information like threshold values,
any parameter values need to be passed from
training to inference.
```
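
For instance, a hypothetical archive command that bundles a custom handler, a label-mapping file, and a model config (the file names here are illustrative assumptions):

```bash
docker run \
--rm -it -u root \
--entrypoint='' \
-v $PWD:/home/model-server \
intel/intel-optimized-pytorch:2.2.0-serving-cpu \
torch-model-archiver \
--model-name ipex-resnet50 \
--version 1.0 \
--serialized-file rn50_int8_jit.pt \
--handler model_handler.py \
--extra-files index_to_name.json \
--config-file model-config.yaml \
--export-path /home/model-server/model-store
```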

> [!TIP]
> For more use-case examples, see the [TorchServe documentation](https://github.com/pytorch/serve/tree/master/examples).

## Start Server
Start the TorchServe server.

```bash
docker run \
-d --rm --name server \
-v $PWD/model-store:/home/model-server/model-store \
-v $PWD/wf-store:/home/model-server/wf-store \
--net=host \
intel/intel-optimized-pytorch:2.2.0-serving-cpu
```

> [!TIP]
> For more information about how to configure the TorchServe server, see the [Intel AI Containers Documentation](https://github.com/intel/ai-containers/tree/main/pytorch/serving).

> [!NOTE]
> If you are working behind a corporate proxy, include `-e http_proxy=${http_proxy} -e https_proxy=${https_proxy}` in your `docker run` command.

Check the server logs to verify that the server has started correctly.

```bash
docker logs server
```
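
The management API on port 8081 and the inference API on port 8080 used below are TorchServe defaults. As a minimal sketch, they could be set explicitly in a `config.properties` file mounted into the container (a hypothetical file, not part of this example):

```txt
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
model_store=/home/model-server/model-store
```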

Register the model using the HTTP/REST API and verify that it has been registered.

```bash
curl -v -X POST "http://localhost:8081/models?url=ipex-resnet50.mar&initial_workers=1"
curl -v -X GET "http://localhost:8081/models"
```

Download a [test image](https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg) and make an inference request using the HTTP/REST API.
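
For example, fetching the image with `curl`:

```bash
curl -o kitten_small.jpg https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
```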

```bash
curl -v -X POST "http://localhost:8080/v2/models/ipex-resnet50/infer" \
-T kitten_small.jpg
```
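
If the KServe v2 route shown above is not enabled in your configuration, TorchServe's default prediction endpoint typically works instead (an assumption about your container's configuration; adjust as needed):

```bash
curl -v -X POST "http://localhost:8080/predictions/ipex-resnet50" \
-T kitten_small.jpg
```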

Unregister the model:

```bash
curl -v -X DELETE "http://localhost:8081/models/ipex-resnet50"
```

## Stop Server
When finished with the example, stop the TorchServe server with the following command:

```bash
docker container stop server
```

## Trademark Information
Intel, the Intel logo and Intel Xeon are trademarks of Intel Corporation or its subsidiaries.
* Other names and brands may be claimed as the property of others.

&copy;Intel Corporation
33 changes: 33 additions & 0 deletions examples/cpu/serving/torchserve/quantize_model.py
import torch
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert
import torchvision.models as models

# load the model
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model = model.eval()

# define dummy input tensor to use for the model's forward call to record operations in the model for tracing
N, C, H, W = 1, 3, 224, 224
dummy_tensor = torch.randn(N, C, H, W)

# ipex supports two quantization schemes: static and dynamic
# default static qconfig
qconfig = ipex.quantization.default_static_qconfig_mapping

# prepare and calibrate
model = prepare(model, qconfig, example_inputs=dummy_tensor, inplace=False)

# run calibration iterations; no_grad avoids building autograd graphs during the forward passes
n_iter = 100
with torch.no_grad():
    for i in range(n_iter):
        model(dummy_tensor)

# convert and deploy
model = convert(model)

with torch.no_grad():
    model = torch.jit.trace(model, dummy_tensor)
    model = torch.jit.freeze(model)

torch.jit.save(model, './rn50_int8_jit.pt')
