pgvector improvements #98

ankane · 2024-01-31T02:11:29Z

Hi, thanks for creating this framework! This PR makes a number of improvements to pgvector performance:

- Separates uploading and indexing
- Removes an extra query from each search
- Sets parameters for the Postgres server
- Updates pgvector to 0.6.0
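The description does not show which query was removed from the search path. One common way to save a round trip per search, sketched here purely as an assumption, is to issue `SET hnsw.ef_search` once per connection and send only the k-NN query for each search. The operators `<->`, `<#>`, `<=>` and the `hnsw.ef_search` setting are pgvector's; the distance names (`l2`, `dot`, `cosine`) and the `items` table layout are assumptions.

```python
# Assumed benchmark distance names mapped to pgvector distance operators.
PGVECTOR_OPERATORS = {
    "l2": "<->",      # Euclidean distance
    "dot": "<#>",     # negative inner product
    "cosine": "<=>",  # cosine distance
}


def build_session_setup(ef_search: int) -> str:
    """Statement run once per connection instead of before every query."""
    return f"SET hnsw.ef_search = {int(ef_search)}"


def build_search_query(distance: str) -> str:
    """Parameterized k-NN query; %s placeholders follow psycopg's style."""
    try:
        op = PGVECTOR_OPERATORS[distance]
    except KeyError:
        raise ValueError(f"Unsupported distance metric: {distance}") from None
    return f"SELECT id FROM items ORDER BY embedding {op} %s LIMIT %s"
```

With this split, a client would execute `build_session_setup(...)` right after connecting and then run only `build_search_query(...)` with `(vector, top)` parameters per search.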

ankane · 2024-01-31T02:14:00Z

engine/clients/pgvector/upload.py

+            raise IncompatibilityError(f"Unsupported distance metric: {distance}")
+
+        cls.conn.execute("SET max_parallel_workers = 128")
+        cls.conn.execute("SET max_parallel_maintenance_workers = 128")


This will use all available cores to build the index (let me know if it should do something different)

If you want to utilize all cores, wouldn't it be better to get the number of cores from the OS rather than hardcoding it to 128?

Setting a high upper limit should be simpler than trying to get the exact number of cores from the server (and won't make a difference in the number of cores used).
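The two options in this exchange can be compared side by side. The sketch below builds the `SET` statements either from the hardcoded cap used in the diff or from `os.cpu_count()`. Note that `os.cpu_count()` reports the cores of the machine running the client script, which is not necessarily the Postgres server, and that Postgres documents `max_parallel_workers` as having no effect above `max_worker_processes`. The function name is hypothetical.

```python
import os

# Upper bound used in the diff.
HARDCODED_CAP = 128


def parallel_worker_settings(use_client_cores: bool = False) -> list[str]:
    """Return session-level SET statements for a parallel index build.

    With use_client_cores=True the value comes from os.cpu_count(), i.e. the
    cores of the machine running this script, not necessarily the server's.
    The server still bounds max_parallel_workers by max_worker_processes.
    """
    workers = (os.cpu_count() or 1) if use_client_cores else HARDCODED_CAP
    return [
        f"SET max_parallel_workers = {workers}",
        f"SET max_parallel_maintenance_workers = {workers}",
    ]
```

Because `SET` is session-scoped, the statements must run on the same connection that later issues `CREATE INDEX`.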

ankane · 2024-01-31T02:15:29Z

engine/servers/pgvector-single-node/docker-compose.yaml

+    # maintenance_work_mem should be ~65%
+    command: postgres -c shared_buffers=2GB -c maintenance_work_mem=5GB -c max_connections=200
+    # shm_size should be shared_buffers + maintenance_work_mem
+    shm_size: 7g


These settings should be updated based on the hardware used to run the benchmark

Edit: Updated the parameters for 25 GB of memory in a follow-up commit
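Since these values depend on the host, they can be derived from total memory. This sketch assumes `shared_buffers` at about 25% of memory (a common Postgres starting point) and reads the compose comment as "`maintenance_work_mem` at about 65% of memory"; `shm_size` is the sum, as the compose file states. The helper name and both fractions are assumptions, not the PR's method.

```python
def postgres_memory_settings(total_gb: int, max_connections: int = 200):
    """Derive the docker-compose command and shm_size from host memory (GB).

    Assumed fractions: shared_buffers ~25%, maintenance_work_mem ~65%.
    Integer arithmetic rounds each value down to whole gigabytes.
    """
    shared_buffers = total_gb * 25 // 100
    maintenance_work_mem = total_gb * 65 // 100
    command = (
        f"postgres -c shared_buffers={shared_buffers}GB"
        f" -c maintenance_work_mem={maintenance_work_mem}GB"
        f" -c max_connections={max_connections}"
    )
    # shm_size = shared_buffers + maintenance_work_mem, per the compose comment.
    shm_size = f"{shared_buffers + maintenance_work_mem}g"
    return command, shm_size
```

With `total_gb=8` this reproduces the values in the diff (2GB, 5GB, `7g`); whether the original was sized for an 8 GB host is an assumption. For 25 GB it yields 6GB and 16GB, which may differ from what the follow-up commit actually uses.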

ankane · 2024-01-31T02:17:17Z

experiments/configurations/pgvector-single-node.json

        "search_params": [
-          { "parallel": 1, "search_params": { "hnsw_ef": 128 } }
+          { "parallel": 8, "search_params": { "hnsw_ef": 128 } }


Updated the parallel options for pgvector-default to be the same as qdrant-default in qdrant-single-node.json

It's not used for running the experiments. But okay :)
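To illustrate what the `parallel` value expresses, the sketch below reads a minimal config entry and fans searches out over that many concurrent workers. The config shape beyond the `search_params` list shown in the diff, the `fake_search` placeholder, and the use of a thread pool (in place of whatever worker mechanism the benchmark actually uses) are all assumptions.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Minimal stand-in for one experiment entry (structure partly assumed).
CONFIG = json.loads("""
{
  "name": "pgvector-default",
  "search_params": [
    { "parallel": 8, "search_params": { "hnsw_ef": 128 } }
  ]
}
""")


def fake_search(query_id: int, hnsw_ef: int) -> tuple[int, int]:
    """Hypothetical placeholder for one k-NN request against the engine."""
    return query_id, hnsw_ef


def run_searches(config: dict, query_ids: list[int]) -> list[tuple[int, int]]:
    results = []
    for entry in config["search_params"]:
        ef = entry["search_params"]["hnsw_ef"]
        # "parallel" is read here as the number of concurrent search clients.
        with ThreadPoolExecutor(max_workers=entry["parallel"]) as pool:
            results.extend(pool.map(partial(fake_search, hnsw_ef=ef), query_ids))
    return results
```

`pool.map` preserves input order, so results line up with `query_ids` regardless of which worker handled each query.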

ankane · 2024-01-31T02:18:45Z

engine/servers/pgvector-single-node/docker-compose.yaml

@@ -3,13 +3,17 @@ version: '3.7'
 services:
  pgvector:
    container_name: pgvector
-    image: ankane/pgvector:v0.5.1
+    image: pgvector/pgvector:pg16


New Docker image as of v0.6.0

Edit: Updated to use the versioned tag 0.6.0-pg16

ankane · 2024-01-31T02:21:00Z

engine/clients/pgvector/upload.py

+        with cls.cur.copy(
+            "COPY items (id, embedding) FROM STDIN WITH (FORMAT BINARY)"
+        ) as copy:
+            copy.set_types(["integer", "vector"])


Binary format requires set_types in psycopg 3
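In binary `COPY`, psycopg 3 cannot infer how to encode each Python value, so `set_types` names the Postgres type of each column and psycopg picks the matching dumper (for `vector`, typically one registered with `pgvector.psycopg.register_vector`). To show what those types control, this sketch hand-builds the byte stream such a statement consumes for `items (id, embedding)`. The vector layout (uint16 dimension, uint16 unused, big-endian float4 values) mirrors pgvector's binary send format; the function names are hypothetical.

```python
import struct

# Fixed 11-byte signature that opens every PostgreSQL binary COPY stream.
PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_vector(values):
    """pgvector binary layout: uint16 dim, uint16 unused (0), big-endian float4s."""
    return struct.pack(f">HH{len(values)}f", len(values), 0, *values)


def encode_copy_binary(rows):
    """Build a binary COPY stream for rows of (integer id, vector embedding)."""
    out = bytearray(PGCOPY_SIGNATURE)
    out += struct.pack(">ii", 0, 0)  # flags field, header extension length
    for row_id, embedding in rows:
        out += struct.pack(">h", 2)  # number of fields in this tuple
        id_bytes = struct.pack(">i", row_id)  # "integer" -> 4-byte int4
        vec_bytes = encode_vector(embedding)  # "vector" -> pgvector layout
        for field in (id_bytes, vec_bytes):
            out += struct.pack(">i", len(field)) + field  # length-prefixed field
    out += struct.pack(">h", -1)  # file trailer
    return bytes(out)
```

Without the type information, psycopg would not know that a Python list should become this `vector` encoding rather than, say, a Postgres array, which is why the binary path needs `set_types`.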

ankane · 2024-01-31T02:22:30Z

engine/clients/pgvector/config.py

@@ -7,6 +7,7 @@
 def get_db_config(host, connection_params):
    return {
        "host": host or "localhost",
+        "port": PGVECTOR_PORT,


PGVECTOR_PORT was previously unused
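A minimal sketch of how the now-used constant feeds the connection config. It assumes `PGVECTOR_PORT` is read from an environment variable defaulting to 5432 (Postgres's default port) and that `connection_params` are merged in last; the real `config.py` may define both differently, and the remaining keys of the returned dict are not shown in the diff.

```python
import os

# Assumption: the constant comes from the environment, defaulting to 5432.
PGVECTOR_PORT = int(os.getenv("PGVECTOR_PORT", "5432"))


def get_db_config(host, connection_params):
    """Combine host/port defaults with per-experiment connection parameters."""
    return {
        "host": host or "localhost",
        "port": PGVECTOR_PORT,  # previously defined but never passed along
        **connection_params,  # assumed: experiment-specific overrides win
    }
```

The resulting dict could then be unpacked into a driver call such as `psycopg.connect(**config)`.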

joein · 2024-02-04T10:47:29Z

engine/clients/pgvector/upload.py

+        cls.conn.execute(
+            f"CREATE INDEX ON items USING hnsw (embedding {hnsw_distance_type}) WITH (m = {cls.upload_params['hnsw_config']['m']}, ef_construction = {cls.upload_params['hnsw_config']['ef_construct']})"
+        )


Hm, might be a bit unfair to the other competitors, which also support loading in a "split-mode", however everyone is welcome to come and improve the setup if they think it is not optimised 🤷‍♂️
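The separate index step can be expressed as a small SQL builder that runs once after the bulk load finishes. The operator class names are pgvector's; the `IncompatibilityError` class here is a local stand-in for the benchmark's exception of the same name, and the distance names are assumptions. Casting `m` and `ef_construct` to `int` keeps config values from injecting arbitrary SQL.

```python
class IncompatibilityError(Exception):
    """Local stand-in for the benchmark's exception of the same name."""


# Assumed benchmark distance names mapped to pgvector HNSW operator classes.
HNSW_OPCLASSES = {
    "l2": "vector_l2_ops",
    "dot": "vector_ip_ops",
    "cosine": "vector_cosine_ops",
}


def build_hnsw_index_sql(distance: str, hnsw_config: dict) -> str:
    """CREATE INDEX statement issued after all rows have been copied in."""
    try:
        opclass = HNSW_OPCLASSES[distance]
    except KeyError:
        raise IncompatibilityError(f"Unsupported distance metric: {distance}") from None
    m = int(hnsw_config["m"])
    ef_construction = int(hnsw_config["ef_construct"])
    return (
        f"CREATE INDEX ON items USING hnsw (embedding {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction})"
    )
```

Building the graph once over the loaded table lets the parallel maintenance workers configured earlier share the work, instead of updating the index row by row during upload.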

joein · 2024-02-04T10:50:08Z

@KShivendu could you make a couple of test runs to make sure that everything works fine?

KShivendu

It worked as expected. Thanks for the contribution!

Please make a few small changes as I suggested and I'd be happy to approve :)

KShivendu · 2024-03-07T13:02:52Z

engine/servers/pgvector-single-node/docker-compose.yaml

@@ -3,13 +3,17 @@ version: '3.7'
 services:
  pgvector:
    container_name: pgvector
-    image: ankane/pgvector:v0.5.1
+    image: pgvector/pgvector:0.6.0-pg16


Can you please use the official image?

This is the official image starting with 0.6.0. https://github.com/pgvector/pgvector#docker-1

KShivendu · 2024-03-07T13:03:49Z

experiments/configurations/pgvector-single-node.json

        "search_params": [
-          { "parallel": 1, "search_params": { "hnsw_ef": 128 } }
+          { "parallel": 8, "search_params": { "hnsw_ef": 128 } }


It's not used for running the experiments. But okay :)


ankane · 2024-03-07T17:46:47Z

Rebased to fix merge conflict from #100

KShivendu

LGTM. Thanks for contributing!

* pgvector improvements
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* Updated Postgres parameters
* Use versioned Docker image
* Updated pgvector to 0.6.2

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

ankane · 2024-04-11T21:09:08Z

Thanks

ankane commented Jan 31, 2024

joein reviewed Feb 4, 2024

KShivendu reviewed Mar 7, 2024

ankane and others added 4 commits March 7, 2024 09:44

- pgvector improvements (69068cd)
- [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci (6dcd9fa)
- Updated Postgres parameters (eb167f9)
- Use versioned Docker image (65c6005)

ankane force-pushed the pgvector branch from 85e10bb to 65c6005 March 7, 2024 17:45

- Updated pgvector to 0.6.2 (4eef95a)

KShivendu approved these changes Apr 11, 2024

KShivendu merged commit f4436e4 into qdrant:master Apr 11, 2024 (1 check passed)

KShivendu mentioned this pull request Apr 12, 2024: Is this support to test pgvector #121 (Closed)
