Add integration for Timescale Vector(Postgres) #10650
This commit adds a vector store for the Postgres-based vector database (`TimescaleVector`).

Timescale Vector (https://www.timescale.com/ai) is PostgreSQL++ for AI applications. It enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`:

- Enhances `pgvector` with faster and more accurate similarity search on 1B+ vectors via a DiskANN-inspired indexing algorithm.
- Enables fast time-based vector search via automatic time-based partitioning and indexing.
- Provides a familiar SQL interface for querying vector embeddings and relational data.

Timescale Vector scales with you from POC to production:

- Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database.
- Benefits from a rock-solid PostgreSQL foundation with enterprise-grade features like streaming backups and replication, high availability, and row-level security.
- Enables a worry-free experience with enterprise-grade security and compliance.

Timescale Vector is available on Timescale, the cloud PostgreSQL platform. (There is no self-hosted version at this time.) LangChain users get a 90-day free trial for Timescale Vector.
looks awesome! few small comments
docs/extras/modules/ts_git_log.json
what's this for?
@baskaryan it's a demo dataset that we use to illustrate the similarity search with time-based filtering. The dataset is a JSON of git commit entries. Each entry has a text component (describing the changes in that commit), but also metadata like the author and most importantly the timestamp that the entry was made. We use this dataset to illustrate to users how to use TimescaleVector's similarity search with time-filtering.
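To make the idea concrete, here is a minimal pure-Python sketch of similarity search restricted to a time window. It does not use the TimescaleVector API: the `CommitEntry` class, the `search_with_time_filter` helper, the sample entries, and the two-dimensional toy embeddings are all made up for illustration.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class CommitEntry:
    """Hypothetical stand-in for one git-log record in the demo dataset."""
    text: str
    author: str
    timestamp: datetime
    embedding: List[float]  # toy vector; a real store would hold model embeddings


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity (smaller means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search_with_time_filter(entries, query_embedding, start, end, k=2):
    """Keep entries with start <= timestamp < end, then rank them by distance."""
    in_range = [e for e in entries if start <= e.timestamp < end]
    in_range.sort(key=lambda e: cosine_distance(e.embedding, query_embedding))
    return in_range[:k]


entries = [
    CommitEntry("Fix index build", "alice", datetime(2023, 8, 1), [1.0, 0.0]),
    CommitEntry("Add time partitioning", "bob", datetime(2023, 8, 15), [0.9, 0.1]),
    CommitEntry("Update docs", "carol", datetime(2023, 9, 2), [0.0, 1.0]),
]

# The closest entry overall is alice's, but it falls outside the time window.
results = search_with_time_filter(
    entries, [1.0, 0.0], datetime(2023, 8, 10), datetime(2023, 9, 30), k=1
)
print([e.author for e in results])
```

The sketch filters in Python before ranking; a database-backed store would typically push both the time predicate and the distance ordering into the query.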
would it be easy to host the file somewhere and load it in notebook? would be nice to avoid 30k new lines if it's easy 😅
Can do. I'll update the notebook with a link to the dataset and update the file loading instructions as well. Standby.
@baskaryan fixed in latest commit!
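For readers following along, loading a hosted JSON file from a notebook instead of committing it could look roughly like the sketch below. This is not the code from the commit: `load_dataset` is a hypothetical helper, and the URL in the usage comment is a placeholder rather than the real dataset location.

```python
import json
import urllib.request
from pathlib import Path


def load_dataset(url: str, local_path: Path) -> list:
    """Download the JSON file once, then reuse the cached local copy."""
    if not local_path.exists():
        # urlretrieve saves the response body to the given file name.
        urllib.request.urlretrieve(url, str(local_path))
    with local_path.open(encoding="utf-8") as f:
        return json.load(f)


# Example call (placeholder URL, not the actual hosting location):
# entries = load_dataset("https://example.com/ts_git_log.json", Path("ts_git_log.json"))
```

Caching the file locally means the notebook only hits the network on the first run.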
- Using the distance strategy from utils
- Changing naming of embedding_function -> embedding
- Fixing uses of ValueError when it should be ImportError
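The `ValueError` to `ImportError` change follows a common pattern for optional dependencies: a missing package is an import problem, not a bad argument. A minimal sketch of that pattern follows, where `import_optional` and its parameters are hypothetical names and not the helper used in this PR.

```python
import importlib


def import_optional(module_name: str, pip_name: str):
    """Import an optional dependency, re-raising ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Raise ImportError (not ValueError) so callers can catch the right type.
        raise ImportError(
            f"Could not import `{module_name}`. "
            f"Install it with `pip install {pip_name}`."
        ) from exc
```

Callers that wrap optional imports in `try: ... except ImportError:` then behave the same way whether the failure comes from the integration or from Python itself.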
    @property
    def distance_strategy(self) -> Any:
        if self._distance_strategy == "l2":
            return self.EmbeddingStore.embedding.l2_distance
where is self.EmbeddingStore set?
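The question points at a general hazard: a property that reads an attribute which is only assigned elsewhere, or not at all. One way to sidestep it, sketched below, is to validate the strategy in `__init__` and dispatch through a lookup table. This is an illustration only and not the PR's implementation: `ToyVectorStore`, the local `DistanceStrategy` enum (a stand-in for the one in LangChain's utils), and the pure-Python distance functions are all assumptions.

```python
import math
from enum import Enum


class DistanceStrategy(str, Enum):
    """Local stand-in enum; member names and values are illustrative."""
    EUCLIDEAN = "l2"
    COSINE = "cosine"
    MAX_INNER_PRODUCT = "inner"


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def negative_inner_product(a, b):
    # Negated so that, as with the other functions, smaller means closer.
    return -sum(x * y for x, y in zip(a, b))


class ToyVectorStore:
    """Hypothetical store whose strategy is fully resolved at construction time."""

    _DISTANCE_FUNCS = {
        DistanceStrategy.EUCLIDEAN: l2_distance,
        DistanceStrategy.COSINE: cosine_distance,
        DistanceStrategy.MAX_INNER_PRODUCT: negative_inner_product,
    }

    def __init__(self, distance_strategy=DistanceStrategy.COSINE):
        # DistanceStrategy(...) raises ValueError for unknown values, so an
        # invalid strategy fails here rather than later inside the property.
        self._distance_strategy = DistanceStrategy(distance_strategy)

    @property
    def distance_strategy(self):
        return self._DISTANCE_FUNCS[self._distance_strategy]
```

With this shape, everything the property needs is set in `__init__`, so a reader never has to search for where an attribute is assigned.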
…rovided link to download file in timescale vector notebook