
Qdrant refactor #20

Merged
prrao87 merged 13 commits into main from qdrant on Apr 24, 2023
Conversation


@prrao87 prrao87 commented Apr 24, 2023

Purpose of this PR

This PR refactors the Qdrant code base for better performance, with a structure that lets the user choose how vectorization is run depending on the available hardware and Python version.

  • Running the vectorization via ONNX-optimized SentenceBert showed a substantial performance jump, so the code is refactored to allow two modes of operation:
    • The user can directly use the sbert model (without any optimizations) -- the slower option on CPU
    • The user can opt to use a quantized ONNX model to vectorize the text prior to indexing -- the much faster option (with minimal loss in accuracy, as per the docs)
  • As of early 2023, ONNX only supports up to Python 3.10, so that is the recommended version until Python 3.11 is supported.
  • The docs are updated to include instructions for both methods
  • Fix bug: Include mean pooling for quantized models
    • Using a transformers pipeline with mean pooling prior to optimization allows us to generate similar quality embeddings as the original
    • The model is still the same size, but the similarities it predicts are now much more similar to the un-optimized model
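For reference, the mean pooling step mentioned in the bug fix above can be sketched as follows. This is a minimal NumPy sketch, not the PR's actual code: in the real pipeline the same averaging is applied to the token embeddings produced by the (quantized ONNX) model before optimization, so the pooled sentence embeddings match what the original sentence-transformers model produces.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings into one sentence embedding, ignoring padding.

    token_embeddings: (batch, seq_len, dim) model outputs
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid divide-by-zero on empty rows
    return summed / counts
```

Pooling only over unmasked positions is what keeps the quantized model's embeddings comparable to the vanilla model's.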

@prrao87 prrao87 added the enhancement New feature or request label Apr 24, 2023
@prrao87
Copy link
Owner Author

prrao87 commented Apr 24, 2023

Notes on ONNX performance

It looks like ONNX does utilize all available CPU cores when processing the text and generating the embeddings (the image below was generated from an AWS EC2 T2 ubuntu instance with a single 4-core CPU).

[image: CPU utilization across all four cores during vectorization]

On average, the entire wine reviews dataset of 129,971 reviews is vectorized and ingested into Qdrant in 34 minutes via the quantized ONNX model, as opposed to more than 1 hour for the regular sbert model downloaded from the sentence-transformers repo. The quantized ONNX model is also ~26% smaller than the original model.

  • sbert model: Processes roughly 51 items/sec
  • Quantized onnxruntime model: Processes roughly 92 items/sec

This amounts to a roughly 1.8x speedup in indexing, with a ~26% smaller (quantized) model that loads and processes results faster. To verify that the embeddings from the quantized model are of similar quality, some example cosine similarities are shown below.
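The speedup figure follows directly from the two throughput numbers:

```python
# Back-of-the-envelope check of the speedup from the throughput numbers above.
sbert_rate = 51  # items/sec, vanilla sbert model
onnx_rate = 92   # items/sec, quantized ONNX model
speedup = onnx_rate / sbert_rate
print(f"~{speedup:.1f}x faster")  # ~1.8x faster
```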

Example results:

The following results are for the sentence-transformers/multi-qa-MiniLM-L6-cos-v1 model that was built for semantic similarity tasks.

Vanilla model

---
Loading vanilla sentence transformer model
---
Similarity between 'I'm very happy' and 'I am so glad': [0.74601071]
Similarity between 'I'm very happy' and 'I'm so sad': [0.6456476]
Similarity between 'I'm very happy' and 'My dog is missing': [0.09541589]
Similarity between 'I'm very happy' and 'The universe is so vast!': [0.27607652]

Quantized ONNX model

---
Loading quantized ONNX model
---
The ONNX file model_optimized_quantized.onnx is not a regular name used in optimum.onnxruntime, the ORTModel might not behave as expected.
Similarity between 'I'm very happy' and 'I am so glad': [0.74153285]
Similarity between 'I'm very happy' and 'I'm so sad': [0.65299551]
Similarity between 'I'm very happy' and 'My dog is missing': [0.09312761]
Similarity between 'I'm very happy' and 'The universe is so vast!': [0.26112114]

As can be seen, the similarity scores are very close to those of the vanilla model, while the model is ~26% smaller and processes sentences much faster on the same CPU.
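The numbers above are plain cosine similarities between sentence embeddings. A minimal sketch of the comparison (NumPy only; `embed` below stands in for whichever encoding path is under test, vanilla or quantized, and is hypothetical):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In the actual comparison, each score is computed as, e.g.:
#   score = cosine_similarity(embed("I'm very happy"), embed("I am so glad"))
# Toy vectors for illustration:
v = np.array([1.0, 0.0, 1.0])
w = np.array([1.0, 1.0, 0.0])
print(cosine_similarity(v, w))  # 0.5
```

Because both models are run through the same pooling and comparison path, the score gap between them directly reflects any quality loss from quantization.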

prrao87 and others added 3 commits April 24, 2023 15:00
@prrao87 prrao87 merged commit 34d988a into main Apr 24, 2023
@prrao87 prrao87 deleted the qdrant branch July 14, 2023 20:55