Add integration for Timescale Vector(Postgres) #10650

Merged
merged 12 commits into from Sep 21, 2023

@cevian cevian commented Sep 15, 2023

Description:
This commit adds a vector store for the Postgres-based vector database (TimescaleVector).

Timescale Vector (https://www.timescale.com/ai) is PostgreSQL++ for AI applications. It enables you to efficiently store and query billions of vector embeddings in PostgreSQL:

  • Enhances pgvector with faster and more accurate similarity search on 1B+ vectors via a DiskANN-inspired indexing algorithm.
  • Enables fast time-based vector search via automatic time-based partitioning and indexing.
  • Provides a familiar SQL interface for querying vector embeddings and relational data.

Timescale Vector scales with you from POC to production:

  • Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database.
  • Benefits from a rock-solid PostgreSQL foundation with enterprise-grade features like streaming backups and replication, high availability, and row-level security.
  • Enables a worry-free experience with enterprise-grade security and compliance.

Timescale Vector is available on Timescale, the cloud PostgreSQL platform. (There is no self-hosted version at this time.) LangChain users get a 90-day free trial for Timescale Vector.

@dosubot dosubot bot added the "Ɑ: vector store" and "🤖:enhancement" labels Sep 15, 2023
@baskaryan baskaryan left a comment

looks awesome! few small comments

libs/langchain/langchain/vectorstores/timescalevector.py (outdated, resolved)
Collaborator

what's this for?

Contributor

@baskaryan it's a demo dataset that we use to illustrate similarity search with time-based filtering. The dataset is a JSON file of git commit entries. Each entry has a text component (describing the changes in that commit) plus metadata such as the author and, most importantly, the timestamp at which the entry was made. We use this dataset to show users how to use TimescaleVector's similarity search with time filtering.
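
To make the idea of similarity search with time-based filtering concrete, here is a minimal, purely in-memory sketch. It is not how TimescaleVector implements the feature and does not use its API; the commit entries, the two-dimensional "embeddings", and the `search` helper are all made up for illustration.

```python
import math
from datetime import datetime

# Hypothetical commit entries; the tiny 2-d embeddings are invented for this sketch.
COMMITS = [
    {"text": "Fix crash in compression job", "author": "alice",
     "date": datetime(2023, 8, 1), "embedding": [0.9, 0.1]},
    {"text": "Add compression policy docs", "author": "bob",
     "date": datetime(2023, 8, 20), "embedding": [0.8, 0.3]},
    {"text": "Refactor planner hooks", "author": "carol",
     "date": datetime(2023, 9, 5), "embedding": [0.1, 0.9]},
]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search(query_embedding, entries, start_date, end_date, k=2):
    """Keep entries inside [start_date, end_date], then rank by distance."""
    in_range = [e for e in entries if start_date <= e["date"] <= end_date]
    ranked = sorted(
        in_range, key=lambda e: cosine_distance(query_embedding, e["embedding"])
    )
    return ranked[:k]


results = search([1.0, 0.0], COMMITS, datetime(2023, 8, 15), datetime(2023, 9, 30))
for entry in results:
    print(entry["date"].date(), entry["author"], entry["text"])
```

The entry from Aug 1 is excluded by the time window before any distance is computed, which is the behaviour the demo dataset is meant to illustrate.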

Collaborator

would it be easy to host the file somewhere and load it in notebook? would be nice to avoid 30k new lines if it's easy 😅
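
A rough sketch of what loading a hosted dataset inside a notebook could look like, using only the standard library. The function name `load_json_records` and the example URL in the comment are hypothetical; the actual notebook may load the file differently.

```python
import json
from urllib.request import urlopen


def load_json_records(url: str) -> list:
    """Fetch a JSON document from `url` and return it as a list of records."""
    with urlopen(url) as response:
        data = json.loads(response.read().decode("utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


# Hypothetical usage; replace the placeholder URL with wherever the file is hosted:
# commits = load_json_records("https://example.com/commit_history.json")
```

Fetching the file at run time keeps the large JSON out of the repository, at the cost of the notebook needing network access.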

Contributor

Can do. I'll update the notebook with a link to the dataset and update the file loading instructions as well. Stand by.

Contributor

@baskaryan fixed in latest commit!

libs/langchain/langchain/vectorstores/timescalevector.py (outdated, resolved)
libs/langchain/langchain/vectorstores/timescalevector.py (outdated, resolved)
- Using the distance strategy from utils
- Changing naming of embedding_function -> embedding
- Fixing uses of ValueError when it should be ImportError
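
The last item in the commit message above concerns which exception is raised when an optional dependency is missing. A minimal sketch of that pattern follows; the helper name `require_module` and its message wording are hypothetical and are not taken from the PR.

```python
import importlib
from types import ModuleType


def require_module(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency, re-raising ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # ImportError (not ValueError) tells callers that the environment is
        # missing a package, rather than that an argument was invalid.
        raise ImportError(
            f"Could not import `{module_name}`. "
            f"Install it with `pip install {pip_name}`."
        ) from exc
```

Callers that already catch `ImportError` to handle missing extras keep working, which would not be the case if a `ValueError` were raised instead.
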
@property
def distance_strategy(self) -> Any:
    if self._distance_strategy == "l2":
        return self.EmbeddingStore.embedding.l2_distance
Collaborator

where is self.EmbeddingStore set?
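
For readers following this thread, here is a self-contained sketch of a distance-strategy property that dispatches on an enum instead of comparing raw strings. It is not the PR's code: the snippet under review returns an attribute of `self.EmbeddingStore.embedding`, whereas this sketch uses plain Python functions, and the names `DistanceStrategy`, `Store`, and `_DISTANCE_FUNCTIONS` are assumptions made for illustration.

```python
import math
from enum import Enum
from typing import Callable, Sequence


class DistanceStrategy(str, Enum):
    """Hypothetical enum of supported strategies, keyed by short string values."""
    EUCLIDEAN = "l2"
    COSINE = "cosine"
    MAX_INNER_PRODUCT = "inner"


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return -sum(x * y for x, y in zip(a, b))


_DISTANCE_FUNCTIONS = {
    DistanceStrategy.EUCLIDEAN: l2_distance,
    DistanceStrategy.COSINE: cosine_distance,
    DistanceStrategy.MAX_INNER_PRODUCT: negative_inner_product,
}


class Store:
    """Toy stand-in for a vector store; everything it needs is set in __init__."""

    def __init__(self, distance_strategy: str = "cosine") -> None:
        # Enum lookup by value raises ValueError for unknown strategy names.
        self._distance_strategy = DistanceStrategy(distance_strategy)

    @property
    def distance_strategy(self) -> Callable[[Sequence[float], Sequence[float]], float]:
        return _DISTANCE_FUNCTIONS[self._distance_strategy]
```

Validating the strategy once in the constructor means the property never has to fall through an `if`/`elif` chain with an unhandled case.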

@baskaryan baskaryan added the "lgtm" label Sep 21, 2023
@baskaryan baskaryan merged commit 6e02c45 into langchain-ai:master Sep 21, 2023
32 checks passed
3 participants