feat: Introducing pg_sparse #418

Merged
rebasedming merged 78 commits into dev from feat/pg_sparse
Oct 31, 2023

Conversation

rebasedming
Collaborator

@rebasedming rebasedming commented Oct 20, 2023

Ticket(s) Closed

  • Closes #

What

pg_sparse is an extension that enables similarity search over sparse vectors in Postgres with HNSW. Think of it as pgvector for sparse vectors.

Why

pgvector only supports dense vectors of up to 2K dimensions; sparse vectors have much higher dimensionality and require a custom storage format and HNSW implementation.
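To make the storage point concrete, here is a minimal sketch of a sparse vector that keeps only its non-zero entries and compares two vectors by merging their sorted index lists. The `SparseVector` type, its field layout, and the method names are hypothetical and chosen for illustration; they are not taken from the pg_sparse source.

```rust
use std::cmp::Ordering;

/// A sparse vector stored as (dimension index, value) pairs sorted by index.
/// Hypothetical layout for illustration; not pg_sparse's actual storage format.
#[derive(Debug, Clone, PartialEq)]
pub struct SparseVector {
    entries: Vec<(u32, f32)>,
}

impl SparseVector {
    /// Build from a dense slice, keeping only the non-zero entries.
    /// Enumeration order guarantees the entries are sorted by index.
    pub fn from_dense(dense: &[f32]) -> Self {
        let entries = dense
            .iter()
            .enumerate()
            .filter(|(_, v)| **v != 0.0)
            .map(|(i, v)| (i as u32, *v))
            .collect();
        SparseVector { entries }
    }

    /// Number of stored (non-zero) entries.
    pub fn nnz(&self) -> usize {
        self.entries.len()
    }

    /// Dot product via a two-pointer merge over the sorted indices.
    /// Cost depends on the number of non-zeros, not on the dimensionality.
    pub fn dot(&self, other: &SparseVector) -> f32 {
        let (mut i, mut j) = (0, 0);
        let mut sum = 0.0;
        while i < self.entries.len() && j < other.entries.len() {
            let (ia, va) = self.entries[i];
            let (ib, vb) = other.entries[j];
            match ia.cmp(&ib) {
                Ordering::Less => i += 1,
                Ordering::Greater => j += 1,
                Ordering::Equal => {
                    sum += va * vb;
                    i += 1;
                    j += 1;
                }
            }
        }
        sum
    }

    /// Euclidean norm.
    pub fn norm(&self) -> f32 {
        self.dot(self).sqrt()
    }

    /// Cosine distance (1 - cosine similarity); 1.0 if either vector is all zeros.
    pub fn cosine_distance(&self, other: &SparseVector) -> f32 {
        let denom = self.norm() * other.norm();
        if denom == 0.0 {
            1.0
        } else {
            1.0 - self.dot(other) / denom
        }
    }
}
```

With this layout, a vector with tens of thousands of dimensions but only a few dozen non-zero values costs a few dozen pairs to store and compare, which is the property a fixed-width dense column cannot offer.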

Todo List

  • Option parsing (m, ef, etc.)
  • Import hnswlib rust bindings as crate
  • Create HNSW index on disk
  • Insert sparse vectors into HNSW index
  • Perform index scan
  • Delete vectors
  • Implement vacuum
  • Implement correct cost estimate function
  • Index resizing
  • Pass in ef_search

Tests

Wrote basic unit and regression tests. For testing instructions, see the extension README.

@rebasedming changed the title from "Introducing pg_sparse" to "feat: Introducing pg_sparse" on Oct 20, 2023
@philippemnoel
Collaborator

👀

pg_sparse/Cargo.lock (outdated, resolved)
pg_sparse/README.md (outdated, resolved)
pg_sparse/Cargo.toml (outdated, resolved)
pg_sparse/README.md (outdated, resolved)
pg_sparse/README.md (outdated, resolved)
@rebasedming rebasedming marked this pull request as ready for review October 27, 2023 16:14
@philippemnoel
Collaborator

philippemnoel commented Oct 30, 2023

@rebasedming sorry to keep making requests, but could you add the workflows to publish pg_sparse and to test pg_sparse? You can mimic them from our pg_bm25 workflows. I think once you do, you'll notice you need to add pg=16 in the places in code where you have pg=15 feature flags to make it supported, as it's going to fail the tests on pg16 otherwise (it's not enough to just add it to the Cargo.toml).
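The feature-flag point can be sketched in plain Rust. The snippet below assumes Cargo features named `pg15` and `pg16` (the comment above writes them as pg=15 and pg=16); the function names are hypothetical and not from the pg_sparse code. A path gated on `pg15` alone is compiled out of a pg16 build even if `pg16` is declared in Cargo.toml, so the gate itself has to name both versions.

```rust
/// True only when the crate is built with the (assumed) `pg15` feature.
/// A code path gated like this is absent from a pg16 build.
pub fn pg15_only_path_enabled() -> bool {
    cfg!(feature = "pg15")
}

/// True when the crate is built with either the (assumed) `pg15` or `pg16`
/// feature. Widening the gate this way is what makes the path available on pg16.
pub fn shared_path_enabled() -> bool {
    cfg!(any(feature = "pg15", feature = "pg16"))
}
```

The same predicates work as attributes, for example `#[cfg(any(feature = "pg15", feature = "pg16"))]` on an item that should exist for both versions.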

@rebasedming
Collaborator Author

@rebasedming sorry to keep making requests, but could you add the workflows to publish pg_sparse and to test pg_sparse? You can mimic them from our pg_bm25 workflows. I think once you do, you'll notice you need to add pg=16 in the places in code where you have pg=15 feature flags to make it supported, as it's going to fail the tests on pg16 otherwise

Sure!

Collaborator

@philippemnoel philippemnoel left a comment

high-level lgtm! Thank you for adding the test workflow. It's missing the deployment workflow, but I can handle that in a separate PR; this one has already grown so big.

Contributor

@sardination sardination left a comment

Looks good - have some clarifying questions.

pg_sparse/src/api/mod.rs (resolved)
pg_sparse/src/index_access/scan.rs (resolved)
@rebasedming rebasedming merged commit 0d54aad into dev Oct 31, 2023
15 checks passed
@rebasedming rebasedming deleted the feat/pg_sparse branch October 31, 2023 18:11
5 participants