# Hardware used: AWS c6i.8xlarge (32 vCPUs, 64 GB memory), running in RHOAI

#### Installing packages and activating the license

#### Supported hardware

In [43]:
# https://docs.neuralmagic.com/user-guides/deepsparse-engine/hardware-support
# Native macOS is not supported (see the link above for supported hardware); on a Mac, DeepSparse raises:


```
OSError: Native Mac is currently unsupported for DeepSparse. Please run on a Linux system or within a Linux container on Mac. More info can be found in our docs here: https://docs.neuralmagic.com/deepsparse/source/hardware.html
```
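Before installing, it can help to confirm the host is Linux and see which SIMD extensions the CPU advertises. The sketch below is a hypothetical helper, not part of DeepSparse: it parses `/proc/cpuinfo`-style text and reports notes. The flags it looks for (`avx2`, `avx512f`, `avx512_vnni`) are an assumption based on the `system=avx512_vnni` field in the license banner further down; the hardware-support page linked above is the authoritative list, and a missing flag is informational rather than necessarily fatal.

```python
import platform


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists one line per logical CPU, e.g. "flags\t\t: fpu vme ... avx2 ..."
        if line.startswith("flags"):
            _, _, values = line.partition(":")
            flags.update(values.split())
    return flags


def check_host(cpuinfo_text: str, system: str) -> list[str]:
    """Return human-readable notes; an empty list means nothing obvious was found."""
    notes = []
    if system != "Linux":
        notes.append(f"{system} is not Linux; run inside a Linux container or VM")
    flags = cpu_flags(cpuinfo_text)
    # Assumption: these are the extensions worth reporting; see the hardware-support page.
    for flag in ("avx2", "avx512f", "avx512_vnni"):
        if flag not in flags:
            notes.append(f"CPU flag {flag!r} not found")
    return notes


if __name__ == "__main__":
    text = ""
    if platform.system() == "Linux":
        with open("/proc/cpuinfo") as f:
            text = f.read()
    print(check_host(text, platform.system()) or "no obvious problems")
```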

In [14]:
#!pip install deepsparse-ent
# !pip install "deepsparse[transformers]"  # the transformers extra is needed for DeepSparse Pipelines

In [35]:
#!deepsparse.license license.txt

In [8]:
!deepsparse.validate_license

DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0 ENTERPRISE [90-DAY TRIAL LICENSE] | 128 Cores | Exp: 2024-04-15, 00:00:00 UTC | (72972e43) (release) (optimized) (system=avx512_vnni, binary=avx512)
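The trial license above carries an expiry timestamp (`Exp: 2024-04-15, 00:00:00 UTC`). A small sketch, using hypothetical helper names, that extracts that field from the banner text and computes the days remaining; it assumes the banner keeps the `Exp: YYYY-MM-DD, HH:MM:SS UTC` format shown here.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Banner copied from the `deepsparse.validate_license` output above.
BANNER = (
    "DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0 ENTERPRISE "
    "[90-DAY TRIAL LICENSE] | 128 Cores | Exp: 2024-04-15, 00:00:00 UTC | (72972e43)"
)


def license_expiry(banner: str) -> datetime:
    """Extract the 'Exp: YYYY-MM-DD, HH:MM:SS UTC' timestamp as an aware datetime."""
    match = re.search(r"Exp: (\d{4}-\d{2}-\d{2}), (\d{2}:\d{2}:\d{2}) UTC", banner)
    if match is None:
        raise ValueError("no expiry timestamp found in banner")
    stamp = datetime.strptime(" ".join(match.groups()), "%Y-%m-%d %H:%M:%S")
    return stamp.replace(tzinfo=timezone.utc)


def days_left(banner: str, now: Optional[datetime] = None) -> int:
    """Whole days between `now` (default: current UTC time) and the expiry."""
    now = now or datetime.now(timezone.utc)
    return (license_expiry(banner) - now).days


print(license_expiry(BANNER), days_left(BANNER))
```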


# Deployment APIs

### Engine

In [9]:
from deepsparse import Engine
from deepsparse.utils import generate_random_inputs, model_to_path

In [10]:
# download the ONNX model from the SparseZoo stub and compile it
zoo_stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"
batch_size = 1
compiled_model = Engine(model=zoo_stub, batch_size=batch_size)



In [11]:
# run inference (input is raw numpy tensors, output is raw scores)
inputs = generate_random_inputs(model_to_path(zoo_stub), batch_size)
output = compiled_model(inputs)
print(output)

2024-01-16 15:43:25 deepsparse.utils.onnx INFO     Generating input 'input_ids', type = int64, shape = [1, 128]
2024-01-16 15:43:25 deepsparse.utils.onnx INFO     Generating input 'attention_mask', type = int64, shape = [1, 128]
2024-01-16 15:43:25 deepsparse.utils.onnx INFO     Generating input 'token_type_ids', type = int64, shape = [1, 128]


[array([[-0.3380675 ,  0.09602544]], dtype=float32)]
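The Engine returns raw scores (logits), not probabilities. A minimal sketch that turns them into class probabilities with a numerically stable softmax. The label order (index 0 = negative, index 1 = positive) is an assumption based on the usual SST-2 convention; the model's own config in the downloaded deployment folder is the place to confirm it. Since the inputs above are random token IDs, the resulting "prediction" carries no meaning.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Assumption: SST-2 label order; verify against the model's config.
LABELS = ["negative", "positive"]

# Same values as the Engine output above; in the notebook use `output[0]` directly.
logits = np.array([[-0.3380675, 0.09602544]], dtype=np.float32)
probs = softmax(logits)
print(probs, [LABELS[i] for i in probs.argmax(axis=-1)])
```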


### Pipelines

In [15]:
from deepsparse import Pipeline

# download the ONNX model from the SparseZoo stub and set up the pipeline
zoo_stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"  
sentiment_analysis_pipeline = Pipeline.create(
  task="sentiment-analysis",    # name of the task
  model_path=zoo_stub,          # zoo stub or path to local onnx file
)

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


In [16]:
# run inference (input is a sentence, output is the prediction)
prediction = sentiment_analysis_pipeline("I love using DeepSparse Pipelines")
print(prediction)

labels=['positive'] scores=[0.9954759478569031]
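To get a rough feel for end-to-end pipeline latency (tokenization plus inference), a simple timing loop is enough. The helper below is a hypothetical sketch using only the standard library; the warm-up count, iteration count, and the simple nearest-rank p90 are arbitrary choices. For more rigorous numbers, DeepSparse also ships a `deepsparse.benchmark` command-line tool.

```python
import statistics
import time
from typing import Any, Callable


def measure_latency(fn: Callable[[Any], Any], arg: Any, warmup: int = 5, iters: int = 50) -> dict:
    """Time repeated calls to fn(arg); returns latency statistics in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn(arg)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(arg)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "p90_ms": samples[int(0.9 * (len(samples) - 1))],  # simple nearest-rank estimate
    }
```

Usage in the notebook: `measure_latency(sentiment_analysis_pipeline, "I love using DeepSparse Pipelines")`.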


### Server

In [21]:
# Commented out: run this command from a terminal, since it starts a long-running server that would block the notebook
# !deepsparse.server \
#   --task sentiment-analysis \
#   --model_path zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none
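If a separate terminal is inconvenient, the same command can be started as a background process from Python. This is a sketch using `subprocess.Popen` with only the two flags shown above; `server_command`, `start_server`, and the `server.log` path are hypothetical names. The launch itself is left to the reader so that defining the helpers does not start a server.

```python
import shlex
import subprocess


def server_command(task: str, model_path: str) -> list[str]:
    """Build the same deepsparse.server invocation as the terminal command above."""
    return ["deepsparse.server", "--task", task, "--model_path", model_path]


def start_server(cmd: list[str], log_path: str = "server.log") -> subprocess.Popen:
    """Launch the server in the background, sending stdout and stderr to a log file."""
    log = open(log_path, "w")  # kept open for the lifetime of the child process
    return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)


cmd = server_command(
    "sentiment-analysis",
    "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none",
)
print(shlex.join(cmd))
```

Usage: `server = start_server(cmd)`, then later `server.terminate(); server.wait()` to stop it.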

#### Sending a request

In [20]:
# Commented out: run this from a separate process while the server above is running
# import requests

# def test_request():
#     url = "http://localhost:5543/v2/models/sentiment_analysis/infer" # the server's port defaults to 5543
#     obj = {"sequences": "Snorlax loves my Tesla!"}

#     response = requests.post(url, json=obj)
#     print(response.text)
#     # {"labels":["positive"],"scores":[0.9965094327926636]}


# if __name__ == "__main__":
#     test_request()
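The same request can be sent with the standard library alone, which avoids the `requests` dependency and makes a timeout explicit. A sketch under the assumption that the endpoint and the `{"labels": [...], "scores": [...]}` response shape are as in the example above; `build_request`, `parse_response`, and `infer` are hypothetical helper names.

```python
import json
import urllib.request

URL = "http://localhost:5543/v2/models/sentiment_analysis/infer"


def build_request(sequences: str, url: str = URL) -> urllib.request.Request:
    """POST a JSON body of the form {"sequences": ...}, as in the requests example."""
    body = json.dumps({"sequences": sequences}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_response(text: str) -> list[tuple[str, float]]:
    """Pair each label with its score from a {"labels": [...], "scores": [...]} body."""
    payload = json.loads(text)
    return list(zip(payload["labels"], payload["scores"]))


def infer(sequences: str, url: str = URL, timeout: float = 10.0) -> list[tuple[str, float]]:
    """Send one request to a running server and return (label, score) pairs."""
    with urllib.request.urlopen(build_request(sequences, url), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

With the server running, `infer("Snorlax loves my Tesla!")` should return the label and score pairs from the server's JSON response.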

### ONNX

In [26]:
!wget https://github.com/onnx/models/raw/main/validated/vision/classification/mobilenet/model/mobilenetv2-7.onnx

--2024-01-16 16:20:35--  https://github.com/onnx/models/raw/main/validated/vision/classification/mobilenet/model/mobilenetv2-7.onnx
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


302 Found
Location: https://media.githubusercontent.com/media/onnx/models/main/validated/vision/classification/mobilenet/model/mobilenetv2-7.onnx [following]
--2024-01-16 16:20:36--  https://media.githubusercontent.com/media/onnx/models/main/validated/vision/classification/mobilenet/model/mobilenetv2-7.onnx
Resolving media.githubusercontent.com (media.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to media.githubusercontent.com (media.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14246826 (14M) [application/octet-stream]
Saving to: ‘mobilenetv2-7.onnx’


2024-01-16 16:20:36 (120 MB/s) - ‘mobilenetv2-7.onnx’ saved [14246826/14246826]
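The `huggingface/tokenizers` warning interleaved with the `wget` output most likely appears because `!` shell commands fork the kernel after the Pipeline's tokenizer has already run. Following the warning's own advice, a minimal sketch that sets `TOKENIZERS_PARALLELISM` explicitly; `"false"` is the conservative choice, and it should be set before the tokenizer is first used (for example at the top of the notebook).

```python
import os

# Set before the tokenizer is first used (e.g. before Pipeline.create);
# "false" disables tokenizer parallelism, as the warning suggests.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```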



In [27]:
from deepsparse import Engine
from deepsparse.utils import generate_random_inputs
onnx_filepath = "mobilenetv2-7.onnx"
batch_size = 16

In [32]:
# Generate random sample input
inputs = generate_random_inputs(onnx_filepath, batch_size)
print(inputs[0].shape)

2024-01-16 16:22:21 deepsparse.utils.onnx INFO     Generating input 'data', type = float32, shape = [16, 3, 224, 224]


(16, 3, 224, 224)


In [30]:
# Compile and run
compiled_model = Engine(model=onnx_filepath, batch_size=batch_size)
outputs = compiled_model(inputs)
print(outputs[0].shape)

(16, 1000)
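The `(16, 1000)` output holds one row of 1000 class scores per image, consistent with the ImageNet classes MobileNetV2 is trained on. A sketch of extracting the top-5 class indices per row with NumPy; mapping indices to ImageNet label names needs a separate label file, which is not included here.

```python
import numpy as np


def top_k(scores: np.ndarray, k: int = 5) -> tuple[np.ndarray, np.ndarray]:
    """Return (indices, scores) of the k highest scores in each row of a 2-D array, best first."""
    # argpartition finds the top k per row without a full sort; only those k are then sorted.
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    top = np.take_along_axis(scores, idx, axis=-1)
    order = np.argsort(-top, axis=-1)
    return np.take_along_axis(idx, order, axis=-1), np.take_along_axis(top, order, axis=-1)


scores = np.random.rand(16, 1000).astype(np.float32)  # stands in for outputs[0]
indices, values = top_k(scores, k=5)
print(indices.shape, values.shape)
```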


## Product usage analytics

In [36]:
# Disable product usage analytics.
# `!export` only sets the variable in a short-lived subshell, so it never reaches this kernel;
# `%env` sets it in the kernel process itself (ideally before `deepsparse` is imported).
%env NM_DISABLE_ANALYTICS=True
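To check that the variable is set in the kernel and inherited by child processes (such as a `deepsparse.server` launched with `!`), a pure-Python sketch using `os.environ` and `subprocess`. Setting it before `import deepsparse` is assumed to be the safe ordering, since the library may read it at import time.

```python
import os
import subprocess

# Pure-Python equivalent of `%env NM_DISABLE_ANALYTICS=True`.
os.environ["NM_DISABLE_ANALYTICS"] = "True"

# Child processes inherit the kernel's environment, so a shell started now sees the value.
child = subprocess.run(
    ["sh", "-c", 'echo "$NM_DISABLE_ANALYTICS"'], capture_output=True, text=True, check=True
)
print("kernel:", os.environ["NM_DISABLE_ANALYTICS"], "| child:", child.stdout.strip())
```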
