[Feature]: Add support for the vector data type #1505

Open
wayneshn opened this issue Aug 10, 2023 · 2 comments
Labels
priority: p1 This issue will be fixed in the next major/minor version serverity: major Major functionality but can wait type: feature

Comments

@wayneshn

Use Case
This issue proposes adding native support for a vector data type to OceanBase. Native support would make OceanBase faster and more scalable as a vector store in AI projects.

Background
While working on a project to integrate AI with OceanBase, I realized that OceanBase does not inherently support the vector data type. This limitation posed a challenge as the project involved transforming articles and user queries into vector representations for comparison and answer generation.

To work around this, I had to convert vectors into JSON for storage in OceanBase and then convert them back into vectors for comparison. This round-trip conversion slowed the system down and does not scale well.
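For illustration, the round trip described above might look like the following minimal sketch. The helper names `vector_to_json` and `json_to_vector` are hypothetical, and this is not the code from the project; it only shows the serialize-then-parse step that every read and write has to pay for.

```python
import json


def vector_to_json(vector):
    """Serialize a vector (a list of floats) to a JSON string for a JSON/text column."""
    return json.dumps(vector)


def json_to_vector(payload):
    """Parse the stored JSON string back into a list of floats."""
    return [float(x) for x in json.loads(payload)]


# Hypothetical embedding; real embeddings typically have hundreds of dimensions.
embedding = [0.25, -0.5, 1.0]
stored = vector_to_json(embedding)   # what would be written to the database
restored = json_to_vector(stored)    # what the application rebuilds before comparing
```

Every comparison needs the parse step first, so the cost grows with both the number of rows read and the vector dimension.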

Describe the solution you'd like
The proposed solution is to add native support for a vector data type to OceanBase, inspired by the pgvector extension for PostgreSQL.

Just as we store numbers, text, or dates, we need a way to store vectors, which are essentially lists of numbers. This change would allow OceanBase to store and retrieve vectors efficiently.

The proposal also affects query processing: OceanBase would need to understand and perform operations on vectors. For example, it should be able to calculate the similarity between two vectors, which is a common operation in AI and machine learning applications.
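As one example of such an operation, cosine similarity is a common choice for comparing embeddings. The sketch below is a plain-Python illustration, not a proposal for how OceanBase should implement it; the function name is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values close to 1 mean the vectors point in nearly the same direction, values near 0 mean they are unrelated, and values close to -1 mean they point in opposite directions.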

To make searching through vectors efficient, we need to implement a system that can quickly find the most similar vectors in the database. This is crucial for many AI applications where you need to find the most similar item to a given input.
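To show what such a search has to compute, here is a brute-force sketch that scans every stored vector and keeps the `k` closest by Euclidean distance. The function name and the `(row_id, vector)` row layout are assumptions made for this example. A native implementation would presumably use an index to avoid the full scan, which is the part this issue asks for.

```python
import heapq
import math


def nearest_neighbors(query, rows, k=3):
    """Return the k rows whose vectors are closest to `query`.

    `rows` is an iterable of (row_id, vector) pairs. This is a linear scan:
    every stored vector is compared against the query.
    """
    return heapq.nsmallest(k, rows, key=lambda row: math.dist(query, row[1]))


# Hypothetical two-dimensional data, small enough to check by hand.
rows = [("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [5.0, 5.0])]
closest = nearest_neighbors([0.9, 0.9], rows, k=2)
```

With the data above, `closest` lists row `b` first and row `a` second. The scan costs time proportional to the number of rows times the vector dimension, which is why an index matters at scale.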

Additional context
I have published an article detailing my experiment using OceanBase as a vector store in AI training. I will include the link in a comment to provide more context.

@hnwyllmm
Contributor

hnwyllmm commented Aug 11, 2023

It looks interesting.
We will have a discussion.

@wayneshn
Author

Here is the article mentioned in the issue: "Create a Langchain alternative from scratch using OceanBase".

hnwyllmm moved this from 🆕 New to 📋 Backlog in Feature Kanban on Nov 30, 2023
hnwyllmm added the "serverity: major" and "priority: p1" labels on Dec 1, 2023
Projects
Status: 📋 Backlog