From 7e861e03f4112443376f47e13ffea0321215cfd7 Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Fri, 1 Apr 2022 12:39:40 -0400 Subject: [PATCH 01/14] altered emoji and title font sizes to match other readmes --- src/deepsparse/benchmark_model/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/deepsparse/benchmark_model/README.md b/src/deepsparse/benchmark_model/README.md index 789a23255e..f721be7180 100644 --- a/src/deepsparse/benchmark_model/README.md +++ b/src/deepsparse/benchmark_model/README.md @@ -14,11 +14,11 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Benchmarking ONNX Models ๐Ÿ“œ +## ๐Ÿ“œ Benchmarking ONNX Models `deepsparse.benchmark` is a command-line (CLI) tool for benchmarking the DeepSparse Engine with ONNX models. The tool will parse the arguments, download/compile the network into the engine, generate input tensors, and execute the model depending on the chosen scenario. By default, it will choose a multi-stream or asynchronous mode to optimize for throughput. -## Quickstart +### Quickstart After `pip install deepsparse`, the benchmark tool is available on your CLI. For example, to benchmark a dense BERT ONNX model fine-tuned on the SST2 dataset where the model path is the minimum input required to get started, run: @@ -26,7 +26,7 @@ After `pip install deepsparse`, the benchmark tool is available on your CLI. For deepsparse.benchmark zoo:nlp/text_classification/bert-base/pytorch/huggingface/sst2/base-none ``` __ __ -## Usage +### Usage In most cases, good performance will be found in the default options so it can be as simple as running the command with a SparseZoo model stub or your local ONNX model. However, if you prefer to customize benchmarking for your personal use case, you can run `deepsparse.benchmark -h` or with `--help` to view your usage options: @@ -100,7 +100,7 @@ Output of the JSON file: ![alt text](./img/json_output.png) -## Sample CLI Argument Configurations +### Sample CLI Argument Configurations To run a sparse FP32 MobileNetV1 at batch size 16 for 10 seconds for throughput using 8 streams of requests: From 487493523594992c918aec1383b872f792984523 Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Fri, 1 Apr 2022 12:41:47 -0400 Subject: [PATCH 02/14] altered emoji and title font sizes to match other readmes --- src/deepsparse/benchmark_model/README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/deepsparse/benchmark_model/README.md b/src/deepsparse/benchmark_model/README.md index f721be7180..c67133744e 100644 --- a/src/deepsparse/benchmark_model/README.md +++ b/src/deepsparse/benchmark_model/README.md @@ -100,7 +100,7 @@ Output of the JSON file: ![alt text](./img/json_output.png) -### Sample CLI Argument Configurations +#### Sample CLI Argument Configurations To run a sparse FP32 MobileNetV1 at batch size 16 for 10 seconds for throughput using 8 streams of requests: @@ -114,21 +114,21 @@ To run a sparse quantized INT8 6-layer BERT at batch size 1 for latency: deepsparse.benchmark zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant_6layers-aggressive_96 --batch_size 1 --scenario sync ``` __ __ -## Inference Scenarios โšกโšก +### โšก Inference Scenarios -### Synchronous (Single-stream) Scenario +#### Synchronous (Single-stream) Scenario Set by the `--scenario sync` argument, the goal metric is latency per batch (ms/batch). 
This scenario submits a single inference request at a time to the engine, recording the time taken for a request to return an output. This mimics an edge deployment scenario. The latency value reported is the mean of all latencies recorded during the execution period for the given batch size. -### Asynchronous (Multi-stream) Scenario +#### Asynchronous (Multi-stream) Scenario Set by the `--scenario async` argument, the goal metric is throughput in items per second (i/s). This scenario submits `--num_streams` concurrent inference requests to the engine, recording the time taken for each request to return an output. This mimics a model server or bulk batch deployment scenario. The throughput value reported comes from measuring the number of finished inferences within the execution time and the batch size. -### Example Benchmarking Output of Synchronous vs. Asynchronous +#### Example Benchmarking Output of Synchronous vs. Asynchronous **BERT 3-layer FP32 Sparse Throughput** From 988f768c6c60156107ae471d3eee62873dc1fd86 Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Mon, 4 Apr 2022 13:33:45 -0400 Subject: [PATCH 03/14] fix yaml code block indentation --- README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index b719fd6094..91696a09cf 100644 --- a/README.md +++ b/README.md @@ -97,15 +97,17 @@ To look up arguments run: `deepsparse.server --help`. **โญ Multiple Models โญ** To serve multiple models in your deployment you can easily build a `config.yaml`. In the example below, we define two BERT models in our configuration for the question answering task: - models: +```yaml +models: - task: question_answering model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none batch_size: 1 - alias: question_answering/dense + alias: question_answering/base - task: question_answering model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95 batch_size: 1 - alias: question_answering/sparse_quantized + alias: question_answering/pruned_quant +``` Finally, after your `config.yaml` file is built, run the server with the config file path as an argument: ```bash From 15cebcacaa66a49b0aa7e88d25f3b037c67c18ed Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Mon, 4 Apr 2022 13:50:59 -0400 Subject: [PATCH 04/14] aligned indentation 2nd time --- README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 91696a09cf..54e8bfa51f 100644 --- a/README.md +++ b/README.md @@ -100,13 +100,13 @@ To serve multiple models in your deployment you can easily build a `config.yaml` ```yaml models: - task: question_answering - model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none - batch_size: 1 - alias: question_answering/base + model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none + batch_size: 1 + alias: question_answering/base - task: question_answering - model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95 - batch_size: 1 - alias: question_answering/pruned_quant + model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95 + batch_size: 1 + alias: question_answering/pruned_quant ``` Finally, after your `config.yaml` file is built, run the server with the config file path as an argument: From a6dbc066a752e3efa977df576e4077ae0b64a7ac Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Mon, 4 Apr 2022 
14:01:18 -0400 Subject: [PATCH 05/14] fix yaml identation --- src/deepsparse/server/README.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/deepsparse/server/README.md b/src/deepsparse/server/README.md index cd7dae595a..5904eef558 100644 --- a/src/deepsparse/server/README.md +++ b/src/deepsparse/server/README.md @@ -89,16 +89,16 @@ __ __ To serve multiple models you can build a `config.yaml` file. In the sample YAML file below, we are defining two BERT models to be served by the `deepsparse.server` for the **question answering** task: -``` +```yaml models: - task: question_answering - model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none - batch_size: 1 - alias: question_answering/base + model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none + batch_size: 1 + alias: question_answering/base - task: question_answering - model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95 - batch_size: 1 - alias: question_answering/pruned_quant + model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95 + batch_size: 1 + alias: question_answering/pruned_quant ``` You can now run the server with the config file path passed in the `--config_file` argument: From dd94a8eaee3256a4cc5da4241b17ed9b8a4f1155 Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Mon, 4 Apr 2022 18:11:51 -0400 Subject: [PATCH 06/14] edited tables to sync with docs, added urls for new readmes, and edited grammar --- README.md | 62 ++++++++++++++++++++++++------------------------------- 1 file changed, 27 insertions(+), 35 deletions(-) diff --git a/README.md b/README.md index 54e8bfa51f..5ef3f5b4a5 100644 --- a/README.md +++ b/README.md @@ -76,25 +76,25 @@ pip install deepsparse ## ๐Ÿ”Œ DeepSparse Server -The DeepSparse Server allows you to serve models and pipelines in deployment in CLI. The server runs on top of the popular FastAPI web framework and Uvicorn web server. Install the server using the following command: +The DeepSparse Server allows you to serve models and pipelines in CLI. The server runs on top of the popular FastAPI web framework and Uvicorn web server. Install the server using the following command: ```bash pip install deepsparse[server] ``` -**โญ Single Model โญ** +### Single Model Once installed, the following example CLI command is available for running inference with a single BERT model: ```bash deepsparse.server \ --task question_answering \ - --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none" + --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95" ``` To look up arguments run: `deepsparse.server --help`. -**โญ Multiple Models โญ** +### Multiple Models To serve multiple models in your deployment you can easily build a `config.yaml`. In the example below, we define two BERT models in our configuration for the question answering task: ```yaml @@ -113,6 +113,9 @@ Finally, after your `config.yaml` file is built, run the server with the config ```bash deepsparse.server --config_file config.yaml ``` + +[Getting Started with the DeepSparse Server](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server) for more info. + ## ๐Ÿ“œ DeepSparse Benchmark The benchmark tool is available on your CLI to run expressive model benchmarks on the DeepSparse Engine with minimal parameters. 
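For intuition, the synchronous timing the tool reports boils down to something like the following Python sketch, built only from the engine APIs shown later in this README. It is illustrative rather than the tool's actual implementation: `model.onnx` is a placeholder path, and the real tool adds warmup iterations, stream scheduling, and richer reporting.

```python
import time

from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_filepath = "model.onnx"  # placeholder; point at a local ONNX model
batch_size = 1

# Compile the model and build random input tensors matching its input shapes
engine = compile_model(onnx_filepath, batch_size)
inputs = generate_random_inputs(onnx_filepath, batch_size)

# Time repeated single-stream runs -- the essence of the `sync` scenario
latencies_ms = []
for _ in range(100):
    start = time.perf_counter()
    engine.run(inputs)
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"mean latency: {sum(latencies_ms) / len(latencies_ms):.2f} ms/batch")
```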
@@ -124,27 +127,26 @@ deepsparse.benchmark [-h] [-b BATCH_SIZE] [-shapes INPUT_SHAPES] [-ncores NUM_CORES] [-s {async,sync}] [-t TIME] [-nstreams NUM_STREAMS] [-pin {none,core,numa}] [-q] [-x EXPORT_PATH] - model_path + model_path ``` [Getting Started with CLI Benchmarking](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark_model) includes examples of select inference scenarios: - Synchronous (Single-stream) Scenario - Asynchronous (Multi-stream) Scenario -__ __ -## ๐Ÿ‘ฉโ€๐Ÿ’ป NLP Inference | Question Answering + +## ๐Ÿ‘ฉโ€๐Ÿ’ป NLP Inference Example ```python from deepsparse.transformers import pipeline # SparseZoo model stub or path to ONNX file -onnx_filepath="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98" +model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98" qa_pipeline = pipeline( task="question-answering", - model_path=onnx_filepath, - num_cores=None, # uses all available CPU cores by default + model_path=model_path, ) my_name = qa_pipeline(question="What's my name?", context="My name is Snorlax") @@ -158,16 +160,13 @@ Tasks Supported: - Question Answering - Masked Language Modeling (MLM) -__ __ - ## ๐Ÿฆ‰ SparseZoo ONNX vs. Custom ONNX Models DeepSparse can accept ONNX models from two sources: -1. `SparseZoo ONNX`: our open-source collection of sparse models available for download. [SparseZoo](https://github.com/neuralmagic/sparsezoo) hosts inference-optimized models, trained on repeatable sparsification recipes using state-of-the-art techniques from [SparseML.](https://github.com/neuralmagic/sparseml) - -2. `Custom ONNX`: Your own ONNX model, can be dense or sparse. Plug in your model to compare performance with other solutions. +- **SparseZoo ONNX**: our open-source collection of sparse models available for download. [SparseZoo](https://github.com/neuralmagic/sparsezoo) hosts inference-optimized models, trained on repeatable sparsification recipes using state-of-the-art techniques from [SparseML](https://github.com/neuralmagic/sparseml). +- **Custom ONNX**: your own ONNX model, can be dense or sparse. Plug in your model to compare performance with other solutions. ```bash > wget https://github.com/onnx/models/raw/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx @@ -188,15 +187,13 @@ inputs = generate_random_inputs(onnx_filepath, batch_size) engine = compile_model(onnx_filepath, batch_size) outputs = engine.run(inputs) ``` -Compatibility/Support Notes +Compatibility/Support Notes: - ONNX version 1.5-1.7 - ONNX opset version 11+ - ONNX IR version has not been tested at this time The [GitHub repository](https://github.com/neuralmagic/deepsparse) includes package APIs along with examples to quickly get started benchmarking and inferencing sparse models. -__ __ - ## Scheduling Single-Stream, Multi-Stream, and Elastic Inference The DeepSparse Engine offers up to three types of inferences based on your use case. Read more details here: [Inference Types](https://github.com/neuralmagic/deepsparse/blob/main/docs/source/scheduler.md). @@ -233,34 +230,29 @@ Here is a table detailing specific support for some algorithms over different mi ## Resources - - -
Documentation | Versions | Info
- -[DeepSparse](https://docs.neuralmagic.com/deepsparse/) - -[SparseML](https://docs.neuralmagic.com/sparseml/) -[SparseZoo](https://docs.neuralmagic.com/sparsezoo/) +### Libraries +- [DeepSparse](https://docs.neuralmagic.com/deepsparse/) -[Sparsify](https://docs.neuralmagic.com/sparsify/) +- [SparseML](https://docs.neuralmagic.com/sparseml/) - +- [SparseZoo](https://docs.neuralmagic.com/sparsezoo/) - stable : : [DeepSparse](https://pypi.org/project/deepsparse) +- [Sparsify](https://docs.neuralmagic.com/sparsify/) - nightly (dev) : : [DeepSparse-Nightly](https://pypi.org/project/deepsparse-nightly/) - releases : : [GitHub](https://github.com/neuralmagic/deepsparse/releases) +### Versions +- stable | [DeepSparse](https://pypi.org/project/deepsparse) - +- nightly (dev) | [DeepSparse-Nightly](https://pypi.org/project/deepsparse-nightly/) -[Blog](https://www.neuralmagic.com/blog/) +- releases | [GitHub](https://github.com/neuralmagic/deepsparse/releases) -[Resources](https://www.neuralmagic.com/resources/) +### Info -
+- [Blog](https://www.neuralmagic.com/blog/) +- [Resources](https://www.neuralmagic.com/resources/) ## Community From 8b271238f45c529a8d60dc889df13e9246cafb12 Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Mon, 4 Apr 2022 18:14:00 -0400 Subject: [PATCH 07/14] removed border --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 5ef3f5b4a5..7ce7028c70 100644 --- a/README.md +++ b/README.md @@ -213,7 +213,6 @@ PRO TIP: The most common use cases for the multi-stream scheduler are where para 3 โšก Elastic scheduling: requests execute in parallel, but not multiplexed on individual NUMA nodes. Use Case: A workload that might benefit from the elastic scheduler is one in which multiple requests need to be handled simultaneously, but where performance is hindered when those requests have to share an L3 cache. -__ __ ## ๐Ÿงฐ CPU Hardware Support From 5b6521367079663b29d5a30935d6e28901eccd54 Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Mon, 4 Apr 2022 18:17:14 -0400 Subject: [PATCH 08/14] fixed resources section --- README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 7ce7028c70..7a2a1a9131 100644 --- a/README.md +++ b/README.md @@ -230,7 +230,7 @@ Here is a table detailing specific support for some algorithms over different mi ## Resources -### Libraries +#### Libraries - [DeepSparse](https://docs.neuralmagic.com/deepsparse/) - [SparseML](https://docs.neuralmagic.com/sparseml/) @@ -240,14 +240,14 @@ Here is a table detailing specific support for some algorithms over different mi - [Sparsify](https://docs.neuralmagic.com/sparsify/) -### Versions -- stable | [DeepSparse](https://pypi.org/project/deepsparse) +#### Versions +- [DeepSparse](https://pypi.org/project/deepsparse) | stable -- nightly (dev) | [DeepSparse-Nightly](https://pypi.org/project/deepsparse-nightly/) +- [DeepSparse-Nightly](https://pypi.org/project/deepsparse-nightly/) | nightly (dev) -- releases | [GitHub](https://github.com/neuralmagic/deepsparse/releases) +- [GitHub](https://github.com/neuralmagic/deepsparse/releases) | releases -### Info +#### Info - [Blog](https://www.neuralmagic.com/blog/) From 11c649f93c712b31482c3ca57ef7a8fd0cc5d6d9 Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Wed, 6 Apr 2022 11:02:19 -0400 Subject: [PATCH 09/14] altered urls to tasks in the nlp inference section --- README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 7a2a1a9131..e58b01a6c7 100644 --- a/README.md +++ b/README.md @@ -156,9 +156,11 @@ NLP Tutorials: - [Getting Started with Hugging Face Transformers ๐Ÿค—](https://github.com/neuralmagic/deepsparse/tree/main/examples/huggingface-transformers) Tasks Supported: -- Text Classification (Sentiment Analysis) -- Question Answering -- Masked Language Modeling (MLM) +- [Named Entity Recognition](https://neuralmagic.com/use-cases/sparse-named-entity-recognition/) +- [Multi-Class Classification](https://neuralmagic.com/use-cases/sparse-multi-class-text-classification/) +- [Binary Text Classification](https://neuralmagic.com/use-cases/sparse-binary-text-classification/) +- [Text Classification (Sentiment Analysis)](https://neuralmagic.com/use-cases/sparse-sentiment-analysis/) +- [Question Answering](https://neuralmagic.com/use-cases/sparse-question-answering/) ## ๐Ÿฆ‰ SparseZoo ONNX vs. 
Custom ONNX Models From 6e0b2eff93ff27e6594f77011180c6b9e9385b47 Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Wed, 6 Apr 2022 15:02:31 -0400 Subject: [PATCH 10/14] edited grammar and URL issues --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index e58b01a6c7..a9e322ee4c 100644 --- a/README.md +++ b/README.md @@ -156,10 +156,10 @@ NLP Tutorials: - [Getting Started with Hugging Face Transformers ๐Ÿค—](https://github.com/neuralmagic/deepsparse/tree/main/examples/huggingface-transformers) Tasks Supported: -- [Named Entity Recognition](https://neuralmagic.com/use-cases/sparse-named-entity-recognition/) -- [Multi-Class Classification](https://neuralmagic.com/use-cases/sparse-multi-class-text-classification/) -- [Binary Text Classification](https://neuralmagic.com/use-cases/sparse-binary-text-classification/) -- [Text Classification (Sentiment Analysis)](https://neuralmagic.com/use-cases/sparse-sentiment-analysis/) +- [Token Classification: Named Entity Recognition](https://neuralmagic.com/use-cases/sparse-named-entity-recognition/) +- [Text Classification: Multi-Class](https://neuralmagic.com/use-cases/sparse-multi-class-text-classification/) +- [Text Classification: Binary](https://neuralmagic.com/use-cases/sparse-binary-text-classification/) +- [Text Classification: Sentiment Analysis](https://neuralmagic.com/use-cases/sparse-sentiment-analysis/) - [Question Answering](https://neuralmagic.com/use-cases/sparse-question-answering/) ## ๐Ÿฆ‰ SparseZoo ONNX vs. Custom ONNX Models @@ -263,7 +263,7 @@ Here is a table detailing specific support for some algorithms over different mi Contribute with code, examples, integrations, and documentation as well as bug reports and feature requests! [Learn how here.](https://github.com/neuralmagic/deepsparse/blob/main/CONTRIBUTING.md) -For user help or questions about DeepSparse, sign up or log in to our [**Deep Sparse Community Slack**](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). We are growing the community member by member and happy to see you there. Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/deepsparse/issues) You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by [subscribing](https://neuralmagic.com/subscribe/) to the Neural Magic community. +For user help or questions about DeepSparse, sign up or log in to our **[Deep Sparse Community Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)**. We are growing the community member by member and happy to see you there. Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/deepsparse/issues) You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by [subscribing](https://neuralmagic.com/subscribe/) to the Neural Magic community. 
For more general questions about Neural Magic, complete this [form.](http://neuralmagic.com/contact/) From 11d66257d76b394e00e8f8eb006cf768bac8c2f3 Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Wed, 6 Apr 2022 16:38:06 -0400 Subject: [PATCH 11/14] edited grammar --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a9e322ee4c..21ed21bde5 100644 --- a/README.md +++ b/README.md @@ -76,7 +76,7 @@ pip install deepsparse ## ๐Ÿ”Œ DeepSparse Server -The DeepSparse Server allows you to serve models and pipelines in CLI. The server runs on top of the popular FastAPI web framework and Uvicorn web server. Install the server using the following command: +The DeepSparse Server allows you to serve models and pipelines from terminal. The server runs on top of the popular FastAPI web framework and Uvicorn web server. Install the server using the following command: ```bash pip install deepsparse[server] From 761f45de168120f3840e9616c3f9c8f030d14c5f Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Thu, 7 Apr 2022 10:54:17 -0400 Subject: [PATCH 12/14] grammar --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 21ed21bde5..101b33bb3f 100644 --- a/README.md +++ b/README.md @@ -76,7 +76,7 @@ pip install deepsparse ## ๐Ÿ”Œ DeepSparse Server -The DeepSparse Server allows you to serve models and pipelines from terminal. The server runs on top of the popular FastAPI web framework and Uvicorn web server. Install the server using the following command: +The DeepSparse Server allows you to serve models and pipelines from the terminal. The server runs on top of the popular FastAPI web framework and Uvicorn web server. Install the server using the following command: ```bash pip install deepsparse[server] From 2c411afa04d6ae3264777889eedace3767688ccb Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Thu, 7 Apr 2022 14:53:43 -0400 Subject: [PATCH 13/14] updating squad model stubs --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 101b33bb3f..2e97850a80 100644 --- a/README.md +++ b/README.md @@ -89,7 +89,7 @@ Once installed, the following example CLI command is available for running infer ```bash deepsparse.server \ --task question_answering \ - --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95" + --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni" ``` To look up arguments run: `deepsparse.server --help`. 
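Once the server is running, it can be queried over HTTP. Below is a minimal client sketch using `requests`; the `localhost:5543` address and `/predict` route are assumptions about the server's defaults, so verify them against the URL printed in the server's startup logs.

```python
import requests

# Assumed default address and route for a single-model server;
# verify against what deepsparse.server prints on startup.
url = "http://localhost:5543/predict"

payload = {
    "question": "What's my name?",
    "context": "My name is Snorlax",
}

response = requests.post(url, json=payload)
print(response.json())
```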
@@ -142,7 +142,7 @@ deepsparse.benchmark [-h] [-b BATCH_SIZE] [-shapes INPUT_SHAPES] from deepsparse.transformers import pipeline # SparseZoo model stub or path to ONNX file -model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98" +model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni" qa_pipeline = pipeline( task="question-answering", From 017f17637fdee5d92e7191f840349bd4d13adfc9 Mon Sep 17 00:00:00 2001 From: Ricky Costa Date: Thu, 7 Apr 2022 14:56:18 -0400 Subject: [PATCH 14/14] added more changes to squad stubs --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 18ff5da320..91e4d21b22 100644 --- a/README.md +++ b/README.md @@ -104,7 +104,7 @@ models: batch_size: 1 alias: question_answering/base - task: question_answering - model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95 + model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni batch_size: 1 alias: question_answering/pruned_quant ```