@SetnameWang
Contributor

Description

Add openGauss as a new database integration.

Short description of openGauss: DataVec currently supports two index structures, IVFFlat and HNSW, to accelerate vector similarity queries. IVFFlat (Inverted File Flat) is an index structure based on inverted files, suitable for fast retrieval of large datasets. HNSW (Hierarchical Navigable Small World) is a graph-based index structure that enables efficient approximate nearest neighbor search in high-dimensional spaces.
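
To illustrate the IVFFlat idea described above, here is a minimal pure-Python sketch. It is not openGauss's or DataVec's implementation: vectors are grouped into inverted lists by nearest k-means centroid, and a query scans only the `n_probe` closest lists. The names `build_ivfflat`, `search`, `n_lists`, and `n_probe` are illustrative assumptions.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))


def build_ivfflat(vectors, n_lists, iters=10, seed=0):
    """Toy IVFFlat build: k-means centroids plus one inverted list per centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(iters):
        buckets = [[] for _ in range(n_lists)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if its bucket is empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    # Final assignment: each inverted list stores vector ids, not copies.
    lists = [[] for _ in range(n_lists)]
    for idx, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(idx)
    return centroids, lists


def search(vectors, centroids, lists, query, k=1, n_probe=1):
    """Scan only the n_probe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [idx for i in order[:n_probe] for idx in lists[i]]
    return sorted(candidates, key=lambda idx: l2(query, vectors[idx]))[:k]
```

With `n_probe` equal to the number of lists the scan is exhaustive and therefore exact; smaller values trade recall for speed.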

Dependencies:
opengauss_sqlalchemy
psycopg2-binary
asyncpg

Fixes # (issue)

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • [x] Yes
  • No

Type of Change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

@dosubot dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) May 12, 2025
@logan-markewich logan-markewich self-assigned this May 13, 2025
or f"opengauss+psycopg2://{user}:{password}@{host}:{port}/{database}"
)
async_conn_str = async_connection_string or (
f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{database}"
Collaborator

Is this correct?

Collaborator

@logan-markewich logan-markewich left a comment

This looks REALLY similar to the postgres vector store

So much so, that I'm convinced that the postgres vector store will work fine by just changing the connection string/engine you pass in

Very hesitant to merge 16 files and 2.5k lines of code just to replicate postgres

@SetnameWang
Contributor Author

SetnameWang commented May 20, 2025

> This looks REALLY similar to the postgres vector store
>
> So much so, that I'm convinced that the postgres vector store will work fine by just changing the connection string/engine you pass in
>
> Very hesitant to merge 16 files and 2.5k lines of code just to replicate postgres

To be honest, that's true. But there are a few differences:

  1. For openGauss, pgvector cannot be used.
  2. openGauss has its own version of SQLAlchemy, which makes compatibility challenging.
  3. Halfvec (half-precision) is not available in openGauss.

As for the connection string, it is basically compatible.
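
As a rough sketch of the connection-string point, the two URLs from the diff could be assembled as below. `build_connection_strings` is a hypothetical helper, not part of this PR or of llama-index; the driver prefixes `opengauss+psycopg2` and `postgresql+asyncpg` are taken from the diff excerpt above, and URL-escaping the credentials with `quote_plus` is an added assumption.

```python
from urllib.parse import quote_plus


def build_connection_strings(
    user,
    password,
    host,
    port,
    database,
    sync_driver="opengauss+psycopg2",   # prefix used for the sync URL in the diff
    async_driver="postgresql+asyncpg",  # prefix used for the async URL in the diff
):
    """Hypothetical helper: return (sync_url, async_url) for the same database."""
    # Escape characters such as '@' or ':' so they cannot break URL parsing.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    location = f"{host}:{port}/{database}"
    return (
        f"{sync_driver}://{credentials}@{location}",
        f"{async_driver}://{credentials}@{location}",
    )


sync_url, async_url = build_connection_strings(
    "gauss", "p@ss", "localhost", 5432, "postgres"
)
```

Whether the existing postgres vector store works against openGauss when given these strings is not verified here.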

I also have some doubts about this integration. If you have suggestions for a better implementation approach, please let me know, and I will make the necessary modifications.

@SetnameWang SetnameWang closed this Jun 7, 2025