Increase vector lookup speeds by applying an HNSW index #21379
TheOtherBrian1 announced in Troubleshooting
-
This just gives ERROR: 42704: access method "hnsw" does not exist …
-
This is amazing, thank you @TheOtherBrian1. Do we need to run this often, or just once? I ask because I scaled my project to …
-
To speed up queries, it is ideal to create an HNSW index on your embedding column.
The general structure for creating an HNSW index follows this pattern:
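A minimal sketch, assuming a hypothetical table `documents` with a vector column `embedding` (swap in your own names):

```sql
-- Build an HNSW index over the embedding column.
-- The operator class (here vector_cosine_ops) determines which
-- search type the index accelerates; the three options are listed below.
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
```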
Search can be one of three types:
- `vector_l2_ops` for L2 (Euclidean) distance
- `vector_ip_ops` for inner product
- `vector_cosine_ops` for cosine distance

A query can only utilize the index if its distance operator matches the index's search type. If you are unsure which search type to prioritize, `vector_cosine_ops` is the most commonly used. You can check out our guide for more info. The folks at Crunchy Data also wrote an explainer that you may find useful.
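For example, a query that orders by cosine distance (the `<=>` operator) can use an index built with `vector_cosine_ops`, but not one built with `vector_l2_ops`. Using the same hypothetical table:

```sql
-- Finds the 5 nearest neighbors by cosine distance. The <=> operator
-- matches vector_cosine_ops, so the HNSW index above can be used.
-- The query vector here is a 3-dimensional placeholder.
SELECT id
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;
```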
Applying an index can be slow and computationally expensive, so there are a few preparations that should be made beforehand:
1. Make sure your pgvector is the latest available version on Supabase.
Versions 0.6 and later have accelerated HNSW build speeds. You can check your current version on the Dashboard's Extensions page and, if necessary, perform a software upgrade in the Infrastructure Settings.
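If you prefer SQL, the installed version can also be read from the standard `pg_extension` catalog:

```sql
-- Show the installed pgvector version.
SELECT extversion
FROM pg_extension
WHERE extname = 'vector';
```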
2. Set up an external connection
The Dashboard has an internal time limit of ~2 minutes for queries. Indexing a large table will almost always take longer, so you will need to run the command through an external interface, such as psql.
You can install psql on macOS and Windows by following the installation instructions for your platform.
For Linux (Debian) you can run the following:
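```bash
# Install the PostgreSQL client package, which provides psql.
sudo apt-get update
sudo apt-get install postgresql-client
```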
Once installed, you can find your connection string in the Database Settings and run it in the terminal to start a psql session.
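The exact string is project-specific; a Supavisor (pooler) string generally takes a shape like the following, with the bracketed placeholders standing in for your own values:

```bash
# Illustrative only; copy the real string from your Database Settings.
psql "postgresql://postgres.[PROJECT-REF]:[PASSWORD]@aws-0-[REGION].pooler.supabase.com:6543/postgres"
```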
If your network can use IPv6, consider using the direct connection string instead of Supavisor. It's not mandatory, but for long-running tasks it's best to reduce network complexity. To check whether your network is compatible, use the following cURL command to request your IPv6 address:
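```bash
# -6 forces an IPv6 connection; ifconfig.co is one of several
# services that simply echo your address back.
curl -6 https://ifconfig.co
```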
If an address is returned, you should be able to use your direct connection string found in the Database Settings:
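```bash
# Illustrative only; the bracketed placeholders come from your project.
psql "postgresql://postgres:[PASSWORD]@db.[PROJECT-REF].supabase.co:5432/postgres"
```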
3. Increase memory for index creation (optional)
The `maintenance_work_mem` variable limits the maximum amount of memory that can be used by maintenance operations, such as vacuuming, altering, and indexing tables. In your session, set it to a reasonably high value, then inspect the value to make sure it has been set:
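```sql
-- '2GB' is an illustrative value; size it to your compute tier.
SET maintenance_work_mem TO '2GB';

-- Confirm the session-level setting took effect.
SHOW maintenance_work_mem;
```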
4. Increase cores for index creation (optional)
The `max_parallel_maintenance_workers` variable limits the number of cores that can be used by maintenance operations, including indexing tables. In your session, set it to roughly 1/2 to 2/3 of your compute's core count, then inspect the value to make sure it has been set:
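```sql
-- 4 is an illustrative value, e.g. half the cores of an 8-core instance.
SET max_parallel_maintenance_workers TO 4;

-- Confirm the session-level setting took effect.
SHOW max_parallel_maintenance_workers;
```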
5. Set a custom timeout
Disable the query timeout for your connection, then inspect the value to make sure it has been set:
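```sql
-- A value of 0 disables the statement timeout for this session.
SET statement_timeout = 0;

-- Confirm the setting; expect 0 (disabled).
SHOW statement_timeout;
```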
6. Consider temporarily upgrading your compute size (optional)
If your task is particularly long, you can speed it up by boosting your computing power temporarily. Compute size is charged by the hour, so you can increase it for an hour or two to finish your task faster, then scale it back afterward. Here is a list of compute add-ons. If you want to temporarily upgrade, you can find the add-ons for your project in your Dashboard's Add-Ons Settings.
7. Consider increasing disk size (optional)
HNSW indexes can produce temporary files during their construction that may consume a few GB of disk. Consider increasing the disk size in the Database Settings to accommodate short-term disk stress.