
## 🔌 DeepSparse Server

The DeepSparse Server allows you to serve models and pipelines from the terminal. The server runs on top of the popular FastAPI web framework and Uvicorn web server. Install the server using the following command:

```bash
pip install deepsparse[server]
```

### Single Model

Once installed, the following example CLI command is available for running inference with a single BERT model:

```bash
deepsparse.server \
--task question_answering \
--model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none"
--model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"
```

To look up arguments run: `deepsparse.server --help`.
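
Once the server is up, you can send it HTTP requests from any client. Below is a minimal Python sketch; the host/port (`localhost:5543`) and the `/predict` route are assumptions about the server defaults, so confirm them against `deepsparse.server --help` and the interactive `/docs` page that FastAPI serves:

```python
import requests

# Assumed default host, port, and route; adjust to match your running server
url = "http://localhost:5543/predict"

payload = {
    "question": "What's my name?",
    "context": "My name is Snorlax",
}

response = requests.post(url, json=payload)
print(response.json())
```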

### Multiple Models

To serve multiple models from a single deployment, build a `config.yaml` file. In the example below, we define two BERT models for the question answering task:

```yaml
models:
    - task: question_answering
      model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none
      batch_size: 1
      alias: question_answering/base
    - task: question_answering
      model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni
      batch_size: 1
      alias: question_answering/pruned_quant
```

Finally, after your `config.yaml` file is built, run the server with the config file path passed as an argument:
```bash
deepsparse.server --config_file config.yaml
```

See [Getting Started with the DeepSparse Server](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server) for more info.
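
With multiple models configured, each one is served over HTTP as well. The sketch below assumes the server derives a route from each model's `alias` (for example `/predict/question_answering/base`); the exact route layout is described in the server guide linked above and on the running server's `/docs` page:

```python
import requests

payload = {"question": "What's my name?", "context": "My name is Snorlax"}

# Assumed alias-derived routes; confirm against the running server's /docs page
for alias in ["question_answering/base", "question_answering/pruned_quant"]:
    url = f"http://localhost:5543/predict/{alias}"
    print(alias, requests.post(url, json=payload).json())
```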

## 📜 DeepSparse Benchmark

The benchmark tool is available from the CLI for running expressive model benchmarks on the DeepSparse Engine with minimal parameters.
```
deepsparse.benchmark [-h] [-b BATCH_SIZE] [-shapes INPUT_SHAPES]
                     [-ncores NUM_CORES] [-s {async,sync}] [-t TIME]
                     [-nstreams NUM_STREAMS] [-pin {none,core,numa}]
                     [-q] [-x EXPORT_PATH]
                     model_path
```

[Getting Started with CLI Benchmarking](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark_model) includes examples of select inference scenarios:
- Synchronous (Single-stream) Scenario
- Asynchronous (Multi-stream) Scenario
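
To make the synchronous scenario concrete, here is a rough Python sketch of the per-request latency measurement that the single-stream case corresponds to. It uses the `compile_model` and `generate_random_inputs` helpers shown later in this README; the `mobilenetv2-7.onnx` file is downloaded in the Custom ONNX example below, and any ONNX model should work in its place:

```python
import time

from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

# Any local ONNX file works here; this one is downloaded in the example below
onnx_filepath = "mobilenetv2-7.onnx"
batch_size = 1

engine = compile_model(onnx_filepath, batch_size)
inputs = generate_random_inputs(onnx_filepath, batch_size)

# Single-stream (synchronous) scenario: requests are issued one at a time
# and per-request latency is measured
latencies = []
for _ in range(100):
    start = time.perf_counter()
    engine.run(inputs)
    latencies.append(time.perf_counter() - start)

print(f"mean latency: {1000 * sum(latencies) / len(latencies):.2f} ms")
```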


## 👩‍💻 NLP Inference Example

```python
from deepsparse.transformers import pipeline

# SparseZoo model stub or path to ONNX file
onnx_filepath="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98"
model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"

qa_pipeline = pipeline(
task="question-answering",
model_path=model_path,
)

my_name = qa_pipeline(question="What's my name?", context="My name is Snorlax")
```

NLP Tutorials:
- [Getting Started with Hugging Face Transformers 🤗](https://github.com/neuralmagic/deepsparse/tree/main/examples/huggingface-transformers)

Tasks Supported:
- [Token Classification: Named Entity Recognition](https://neuralmagic.com/use-cases/sparse-named-entity-recognition/)
- [Text Classification: Multi-Class](https://neuralmagic.com/use-cases/sparse-multi-class-text-classification/)
- [Text Classification: Binary](https://neuralmagic.com/use-cases/sparse-binary-text-classification/)
- [Text Classification: Sentiment Analysis](https://neuralmagic.com/use-cases/sparse-sentiment-analysis/)
- [Question Answering](https://neuralmagic.com/use-cases/sparse-question-answering/)
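
The same `pipeline` helper covers the other tasks listed above. A minimal sketch for sentiment analysis, assuming `"sentiment-analysis"` is an accepted task name; the model location below is a placeholder, so swap in a real SparseZoo stub or a local model directory:

```python
from deepsparse.transformers import pipeline

# Task name assumed; see the tutorials above for the supported task strings.
# The model location is a placeholder, not a real SparseZoo stub.
sentiment_pipeline = pipeline(
    task="sentiment-analysis",
    model_path="path/to/sentiment/model",
)

print(sentiment_pipeline("DeepSparse makes CPU inference fast"))
```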

## 🦉 SparseZoo ONNX vs. Custom ONNX Models

DeepSparse can accept ONNX models from two sources:

- **SparseZoo ONNX**: our open-source collection of sparse models available for download. [SparseZoo](https://github.com/neuralmagic/sparsezoo) hosts inference-optimized models, trained on repeatable sparsification recipes using state-of-the-art techniques from [SparseML](https://github.com/neuralmagic/sparseml).

- **Custom ONNX**: your own ONNX model, which can be dense or sparse. Plug in your model to compare performance with other solutions.

```bash
> wget https://github.com/onnx/models/raw/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx
```

```python
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_filepath = "mobilenetv2-7.onnx"
batch_size = 16

# Generate random sample input for the model
inputs = generate_random_inputs(onnx_filepath, batch_size)

# Compile the model and run inference
engine = compile_model(onnx_filepath, batch_size)
outputs = engine.run(inputs)
```
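
The same engine API can be pointed at a SparseZoo stub instead of a local file. A minimal sketch, assuming `compile_model` resolves and downloads SparseZoo stubs automatically (verify against the DeepSparse docs), using the question answering stub shown earlier in this README:

```python
from deepsparse import compile_model

# SparseZoo stub used elsewhere in this README; assumed to be resolved and
# downloaded automatically by compile_model
model_stub = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"
batch_size = 1

engine = compile_model(model_stub, batch_size)
print(engine)  # inspect the compiled engine
```
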
Compatibility/Support Notes:
- ONNX version 1.5-1.7
- ONNX opset version 11+
- ONNX IR version has not been tested at this time

The [GitHub repository](https://github.com/neuralmagic/deepsparse) includes package APIs along with examples to quickly get started benchmarking and inferencing sparse models.


## Scheduling Single-Stream, Multi-Stream, and Elastic Inference

The DeepSparse Engine offers up to three types of inferences based on your use case. Read more details here: [Inference Types](https://github.com/neuralmagic/deepsparse/blob/main/docs/source/scheduler.md).

1 ⚡ Single-stream scheduling: the latency/synchronous scenario, requests execute serially. [default]

Use Case: It's highly optimized for minimum per-request latency, using all of the system's resources provided to it on every request it gets.

2 ⚡ Multi-stream scheduling: the throughput/asynchronous scenario, requests execute in parallel.

PRO TIP: The most common use cases for the multi-stream scheduler are where parallelism is low with respect to core count and where requests need to be made asynchronously without time to batch them.
3 ⚡ Elastic scheduling: requests execute in parallel, but not multiplexed on individual NUMA nodes.

Use Case: A workload that might benefit from the elastic scheduler is one in which multiple requests need to be handled simultaneously, but where performance is hindered when those requests have to share an L3 cache.
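
The scheduler can also be chosen programmatically when compiling a model. A minimal sketch, assuming `compile_model` accepts a `scheduler` argument and that `"multi_stream"` is one of the accepted values (see the Inference Types doc linked above for the exact names):

```python
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_filepath = "mobilenetv2-7.onnx"  # downloaded in the Custom ONNX example above
batch_size = 1

# The scheduler argument and its value are assumptions; check the scheduler docs
engine = compile_model(onnx_filepath, batch_size, scheduler="multi_stream")

inputs = generate_random_inputs(onnx_filepath, batch_size)
outputs = engine.run(inputs)
```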

## 🧰 CPU Hardware Support

Specific support for some algorithms varies across microarchitectures; see the [DeepSparse documentation](https://docs.neuralmagic.com/deepsparse/) for a detailed support table.

## Resources

#### Libraries
- [DeepSparse](https://docs.neuralmagic.com/deepsparse/)
- [SparseML](https://docs.neuralmagic.com/sparseml/)
- [SparseZoo](https://docs.neuralmagic.com/sparsezoo/)
- [Sparsify](https://docs.neuralmagic.com/sparsify/)

#### Versions
- [DeepSparse](https://pypi.org/project/deepsparse) | stable
- [DeepSparse-Nightly](https://pypi.org/project/deepsparse-nightly/) | nightly (dev)
- [GitHub](https://github.com/neuralmagic/deepsparse/releases) | releases

#### Info
- [Blog](https://www.neuralmagic.com/blog/)
- [Resources](https://www.neuralmagic.com/resources/)


## Community

Contribute with code, examples, integrations, and documentation as well as bug reports and feature requests! [Learn how here.](https://github.com/neuralmagic/deepsparse/blob/main/CONTRIBUTING.md)

For user help or questions about DeepSparse, sign up or log in to our **[Deep Sparse Community Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)**. We are growing the community member by member and happy to see you there. Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/deepsparse/issues) You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by [subscribing](https://neuralmagic.com/subscribe/) to the Neural Magic community.

For more general questions about Neural Magic, complete this [form.](http://neuralmagic.com/contact/)
