## Run Inference

First, compile the model. Don't worry about the warning about disabling the optimized runtime; it refers to an auxiliary engine used to handle any odd input shapes we get.

In [15]:
from deepsparse.sentence_transformers import DeepSparseSentenceTransformer
ds_model = DeepSparseSentenceTransformer("zeroshot/bge-base-en-v1.5-quant")



Run inference by calling `encode()`. Passing `normalize_embeddings=True` scales each embedding to unit length.

In [30]:
# Sentences we would like to encode
sentences = [
    'This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of string.',
    'The quick brown fox jumps over the lazy dog.'
]

# Sentences are encoded by calling model.encode()
embeddings = ds_model.encode(
    sentences,
    normalize_embeddings=True
)

# Print the embeddings
for sentence, embedding in zip(sentences, embeddings):
    print("Sentence:", sentence)
    print("Embedding Shape:", embedding.shape)
    print("")

Sentence: This framework generates embeddings for each input sentence
Embedding Shape: (768,)

Sentence: Sentences are passed as a list of string.
Embedding Shape: (768,)

Sentence: The quick brown fox jumps over the lazy dog.
Embedding Shape: (768,)



In [31]:
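import numpy as np

# `embedding` is the last vector from the loop above; with
# normalize_embeddings=True its L2 norm should be 1
np.linalg.norm(embedding)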

1.0
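
Because the embeddings have unit norm, the dot product of two embeddings equals their cosine similarity. The following is a minimal, dependency-free sketch of that property. The helper names and the toy 3-dimensional vectors are illustrative assumptions standing in for the real 768-dimensional embeddings.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity computed from the general definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors standing in for real embeddings (assumption for illustration)
u = l2_normalize([1.0, 2.0, 2.0])
v = l2_normalize([2.0, 1.0, 2.0])

# For unit-length vectors the dot product matches the cosine similarity
print(math.isclose(dot(u, v), cosine_similarity(u, v)))
```

With the NumPy array returned by `encode`, the same idea would give all pairwise similarities at once via `embeddings @ embeddings.T`.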

## Benchmark Performance

In the benchmark below, DeepSparse encodes 100 randomly generated sentences in about 6.3 seconds, versus about 25.3 seconds for the sentence-transformers baseline, a speedup of roughly 4x.

In [7]:
import time

def benchmark_model(model, sentences):
    """Return the wall-clock time (in seconds) to encode `sentences`."""
    # perf_counter() is a monotonic, high-resolution clock suited to timing
    start_time = time.perf_counter()
    _ = model.encode(sentences)
    elapsed_time = time.perf_counter() - start_time
    return elapsed_time
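
A single timed call can be noisy, since the first call may include one-off setup costs. One possible refinement, sketched below, runs a warm-up pass and reports the median of several runs. The function name `benchmark_model_repeated`, its `warmup` and `repeats` parameters, and the `DummyModel` class are hypothetical and not part of DeepSparse or sentence-transformers.

```python
import statistics
import time

def benchmark_model_repeated(model, sentences, warmup=1, repeats=3):
    """Return the median wall-clock time of model.encode over several runs."""
    # Warm-up calls are executed but not timed
    for _ in range(warmup):
        model.encode(sentences)

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        model.encode(sentences)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

class DummyModel:
    """Hypothetical stand-in exposing the same encode() method."""

    def __init__(self):
        self.calls = 0

    def encode(self, sentences):
        self.calls += 1
        return [[0.0] for _ in sentences]

dummy = DummyModel()
median_time = benchmark_model_repeated(dummy, ["a", "b"])
print(f"median time: {median_time:.6f}s over 3 runs")
```

Any object with an `encode` method can be passed in place of `DummyModel`, so the same helper would work for both models compared in this section.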

In [6]:
import sentence_transformers
st_model = sentence_transformers.SentenceTransformer("BAAI/bge-base-en-v1.5")


In [18]:
from deepsparse.sentence_transformers.benchmark_encoding import generate_random_sentence

num_sentences = 100
sentence_length = 700
sentences = [generate_random_sentence(sentence_length) for _ in range(num_sentences)]

In [9]:
st_time = benchmark_model(st_model, sentences)
print(f"sentence_transformer time: {st_time}")

sentence_transformer time: 25.30896258354187


In [19]:
ds_time = benchmark_model(ds_model, sentences)
print(f"deepsparse time: {ds_time}")

deepsparse time: 6.328191041946411
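
To put the two timings side by side, the short sketch below derives the speedup and throughput from the numbers printed above. The values are copied from this particular run and will vary with hardware and input.

```python
# Timings copied from the two benchmark runs above (seconds)
num_sentences = 100
st_time = 25.30896258354187
ds_time = 6.328191041946411

# Ratio of baseline time to DeepSparse time
speedup = st_time / ds_time

print(f"speedup: {speedup:.2f}x")
print(f"sentence-transformers: {num_sentences / st_time:.1f} sentences/s")
print(f"DeepSparse: {num_sentences / ds_time:.1f} sentences/s")
```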


## Run Evaluation

We provide a script, `run_eval.py`, that shows how to run an MTEB evaluation.
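
The contents of the script are not shown here. As a rough guess at its shape, the sketch below wires the command-line flags used later in this section (`--model-name`, `--task-names`, `--output-dir`) to the `MTEB` class from the `mteb` package. The function names and defaults are assumptions, and the actual script may differ.

```python
import argparse

def parse_args(argv=None):
    """Parse the flags used in this section (defaults are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Run MTEB tasks with a DeepSparse sentence-transformers model"
    )
    parser.add_argument("--model-name", required=True)
    parser.add_argument("--task-names", nargs="+", required=True)
    parser.add_argument("--output-dir", default="results")
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)

    # Imported lazily so argument parsing works without the heavy dependencies
    from deepsparse.sentence_transformers import DeepSparseSentenceTransformer
    from mteb import MTEB

    model = DeepSparseSentenceTransformer(args.model_name)
    evaluation = MTEB(tasks=args.task_names)
    # Runs each selected task and writes its results under output_folder
    results = evaluation.run(model, output_folder=args.output_dir)
    print(results)
```

To use this as a script, call `main()` at the bottom of the file.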

In [1]:
!mteb --available_tasks

INFO:mteb.cmd:Running with parameters: Namespace(model=None, task_types=None, task_categories=None, tasks=None, task_langs=None, device=None, batch_size=32, seed=42, output_folder='results', verbosity=2, eval_splits=None, k=None, n_experiments=None, samples_per_label=None, corpus_chunk_size=None, available_tasks=True)
───────────────────────────────── MTEB tasks ──────────────────────────────────
Classification
    - AmazonCounterfactualClassification, s2s, multilingual 4 / 4 langs
    - AmazonPolarityClassification, p2p
    - AmazonReviewsClassification, s2s, multilingual 6 / 6 langs
    - AngryTweetsClassification, s2s
    - Banking77Classification, s2s
    - DalajClassification, s2s
    - DanishPoliticalCommentsClassificatio

In [5]:
!python3 run_eval.py \
    --model-name zeroshot/bge-base-en-v1.5-quant \
    --task-names STS12 STS13 \
    --output-dir results/bge-base-quant

DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20231027 COMMUNITY | (2083fc4e) (release) (optimized) (system=avx512_vnni, binary=avx512)
─────────────────────────────── Selected tasks ────────────────────────────────
STS
    - STS12, s2s
    - STS13, s2s


1
Batches: 100%|██████████████████████████████| 3108/3108 [01:28<00:00, 35.11it/s]
1
Batches: 100%|██████████████████████████████| 3108/3108 [01:11<00:00, 43.57it/s]
1
Batches: 100%|██████████████████████████████| 1500/1500 [00:30<00:00, 49.33it/s]
1
Batches: 100%|██████████████████████████████| 1500/1500 [00:41<00:00, 35.85it/s]
{'STS12': {'mteb_version': '1.1.1', 'dataset_revision': 'a0d554a64d88156834ff5ae9920b964011b16384', 'mteb_dataset_name': 'STS12', 'test': {'cos_sim': {'pearson': 0.8605059236544976, 'spearman': 0.777379620275434}, 'manhattan': {'pearson': 0.8303339435275494, 'spearman': 0.7809286810596651}, 'euclidean': {'pearson': 0.830478943794
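
The result is a nested dictionary keyed by task name, then split, then similarity measure. STS tasks are commonly summarized by the Spearman correlation of cosine similarities, so a small helper can pull that number out. The sketch below uses only the two `cos_sim` values visible in the truncated output above. The helper name `sts_cos_sim_spearman` is hypothetical, and the other fields are omitted.

```python
# Values copied from the truncated STS12 output above; other fields omitted
results = {
    "STS12": {
        "test": {
            "cos_sim": {
                "pearson": 0.8605059236544976,
                "spearman": 0.777379620275434,
            },
        },
    },
}

def sts_cos_sim_spearman(results, task, split="test"):
    """Return the cosine-similarity Spearman correlation for one STS task."""
    return results[task][split]["cos_sim"]["spearman"]

score = sts_cos_sim_spearman(results, "STS12")
print(f"STS12 cos_sim spearman: {score:.4f}")
```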