Refine index configuration validation and typed helpers#173
Refine index configuration validation and typed helpers#173hnwyllmm merged 10 commits intooceanbase:developfrom
Conversation
Improve developer ergonomics with dedicated HNSW/fulltext config helpers, enforce documented parameter validation with clear warnings, and align SQL generation/exports/tests with the expanded configuration model. Made-with: Cursor
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughExpands and tightens configuration dataclasses/enums and validation, changes SparseVector defaults, makes VECTOR INDEX SQL use HNSW config fields, lazily exposes SeekdbEmbeddedClient from client package, updates tests and numpy constraint. Changes
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit_tests/test_configuration.py (1)
226-255: Consider adding deprecation warning assertion.The
Configurationclass emits aDeprecationWarningwhen instantiated, but none of these tests assert that the warning is raised. Consider wrapping theConfiguration()calls withpytest.warns(DeprecationWarning)to document and verify the expected deprecation behavior.Example for one test case
def test_configuration_empty(self): """Test Configuration with no parameters""" - config = Configuration() + with pytest.warns(DeprecationWarning, match="Configuration is deprecated"): + config = Configuration() assert config.hnsw is None assert config.fulltext_config is None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/test_configuration.py` around lines 226 - 255, Each test should assert the deprecation warning emitted by instantiating Configuration: wrap each instantiation of Configuration in the tests (test_configuration_with_hnsw_only, test_configuration_with_fulltext_only, test_configuration_with_both, test_configuration_empty) with pytest.warns(DeprecationWarning) so the tests verify the DeprecationWarning produced by the Configuration class upon construction.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/unit_tests/test_configuration.py`:
- Around line 226-255: Each test should assert the deprecation warning emitted
by instantiating Configuration: wrap each instantiation of Configuration in the
tests (test_configuration_with_hnsw_only, test_configuration_with_fulltext_only,
test_configuration_with_both, test_configuration_empty) with
pytest.warns(DeprecationWarning) so the tests verify the DeprecationWarning
produced by the Configuration class upon construction.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/pyseekdb/__init__.pysrc/pyseekdb/client/__init__.pysrc/pyseekdb/client/client_base.pysrc/pyseekdb/client/configuration.pytests/unit_tests/test_configuration.pytests/unit_tests/test_sparse_vector_index_config.py
Exclude yanked numpy 2.4.0 and regenerate uv.lock (including pylibseekdb upgrade) so `make check` passes consistently with --locked. Made-with: Cursor
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pyseekdb/__init__.py (1)
112-147:⚠️ Potential issue | 🟠 Major
SeekdbEmbeddedClientis exported in__all__but not actually importable from the top-level module.
__all__includes"SeekdbEmbeddedClient"(line 135), but this module does not import it from.clientand does not define__getattr__to resolve it lazily. This breaks the expected import pathfrom pyseekdb import SeekdbEmbeddedClientused in integration tests.The symbol is available in the
.clientsubmodule via its__getattr__, but the top-level module needs its own__getattr__to delegate to it.Proposed fix
from .client.collection import Collection try: __version__ = importlib.metadata.version("pyseekdb") except importlib.metadata.PackageNotFoundError: __version__ = "0.0.1.dev1" __author__ = "OceanBase <open_oceanbase@oceanbase.com>" + + +def __getattr__(name: str): + if name == "SeekdbEmbeddedClient": + from .client import SeekdbEmbeddedClient + return SeekdbEmbeddedClient + raise AttributeError(f"module '{__name__}' has no attribute '{name}'")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pyseekdb/__init__.py` around lines 112 - 147, The top-level __all__ exports "SeekdbEmbeddedClient" but the module doesn't expose it, causing imports like from pyseekdb import SeekdbEmbeddedClient to fail; add a top-level __getattr__ that delegates missing attribute lookups to the .client submodule (or explicitly import SeekdbEmbeddedClient from .client into this module) so the exported symbol is resolvable; reference the existing __all__ list, the symbol SeekdbEmbeddedClient, and implement __getattr__(name) to import .client and return getattr(client, name) (or perform a one-time import of SeekdbEmbeddedClient) to restore the expected top-level import behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit_tests/test_sparse_vector_sql.py`:
- Around line 55-58: The test test_no_optional_params_by_default currently
discards the result of _get_sparse_vector_index_sql(config); update it to assert
meaningful behavior by capturing the returned SQL and asserting either that it
equals the expected default SQL string or at minimum is non-empty and contains
expected tokens (e.g., "CREATE INDEX", the table/column names produced by
_make_config). Locate test_no_optional_params_by_default, call sql =
_get_sparse_vector_index_sql(config), and add assertions (e.g., assert sql
and/or assert "CREATE INDEX" in sql and/or assert the exact expected SQL) so
regressions in default SQL rendering are caught.
---
Outside diff comments:
In `@src/pyseekdb/__init__.py`:
- Around line 112-147: The top-level __all__ exports "SeekdbEmbeddedClient" but
the module doesn't expose it, causing imports like from pyseekdb import
SeekdbEmbeddedClient to fail; add a top-level __getattr__ that delegates missing
attribute lookups to the .client submodule (or explicitly import
SeekdbEmbeddedClient from .client into this module) so the exported symbol is
resolvable; reference the existing __all__ list, the symbol
SeekdbEmbeddedClient, and implement __getattr__(name) to import .client and
return getattr(client, name) (or perform a one-time import of
SeekdbEmbeddedClient) to restore the expected top-level import behavior.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/pyseekdb/__init__.pysrc/pyseekdb/client/__init__.pytests/unit_tests/test_sparse_vector_sql.py
There was a problem hiding this comment.
🧹 Nitpick comments (2)
.github/workflows/ci.yml (2)
113-115: The grep chain logic is confusing and potentially fragile.The expression
grep 'passed' | grep -q 'failed' && exit 1 || exit 0attempts to find lines containing both "passed" and "failed", but this is convoluted. Withset -o pipefail, pytest's exit code should already propagate correctly.If the intent is to catch summary lines like
5 passed, 2 failed, consider simplifying:♻️ Suggested simplification
set -o pipefail - uv run pytest tests/unit_tests/ -v --log-cli-level=${log_level} | tee pytest.log - tail -n 1 pytest.log | grep '=======' | grep 'passed' |grep -q 'failed' && exit 1 || exit 0 + uv run pytest tests/unit_tests/ -v --log-cli-level=${log_level} --tb=shortAlternatively, if log capture is needed, rely on pytest's exit code directly:
set -o pipefail - uv run pytest tests/unit_tests/ -v --log-cli-level=${log_level} | tee pytest.log - tail -n 1 pytest.log | grep '=======' | grep 'passed' |grep -q 'failed' && exit 1 || exit 0 + uv run pytest tests/unit_tests/ -v --log-cli-level=${log_level} 2>&1 | tee pytest.log🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci.yml around lines 113 - 115, The current CI step uses a fragile grep chain after "set -o pipefail" and "uv run pytest ... | tee pytest.log"—specifically the "tail -n 1 pytest.log | grep '=======' | grep 'passed' | grep -q 'failed' && exit 1 || exit 0" logic is confusing and unnecessary; replace it by relying on pytest's exit code (remove the tail/grep chain so the job simply exits with pytest's exit status when using set -o pipefail), or if you must inspect the logfile, simplify to a single explicit check like grepping pytest.log for "failed" and exiting non‑zero if found (refer to the CI step that runs "uv run pytest ... | tee pytest.log" and the use of "set -o pipefail").
124-126: Same fragile grep pattern as noted above.This duplicates the convoluted exit-code handling from line 115. Consider applying the same simplification here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci.yml around lines 124 - 126, The current pipeline uses a fragile grep chain on the last line of pytest.log after running "uv run pytest ..." (with set -o pipefail) to decide the job status; instead, capture and check pytest's actual exit code from the pipeline (the shell PIPESTATUS entry for the "uv run pytest" command) and exit with that non-zero code when tests fail. Update the step that runs "uv run pytest tests/integration_tests/ -v --log-cli-level=${log_level} -k ..." and replace the final "tail ... | grep ..." logic with a check of the pipeline exit status (PIPESTATUS[0]) so the workflow reliably fails when pytest fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.github/workflows/ci.yml:
- Around line 113-115: The current CI step uses a fragile grep chain after "set
-o pipefail" and "uv run pytest ... | tee pytest.log"—specifically the "tail -n
1 pytest.log | grep '=======' | grep 'passed' | grep -q 'failed' && exit 1 ||
exit 0" logic is confusing and unnecessary; replace it by relying on pytest's
exit code (remove the tail/grep chain so the job simply exits with pytest's exit
status when using set -o pipefail), or if you must inspect the logfile, simplify
to a single explicit check like grepping pytest.log for "failed" and exiting
non‑zero if found (refer to the CI step that runs "uv run pytest ... | tee
pytest.log" and the use of "set -o pipefail").
- Around line 124-126: The current pipeline uses a fragile grep chain on the
last line of pytest.log after running "uv run pytest ..." (with set -o pipefail)
to decide the job status; instead, capture and check pytest's actual exit code
from the pipeline (the shell PIPESTATUS entry for the "uv run pytest" command)
and exit with that non-zero code when tests fail. Update the step that runs "uv
run pytest tests/integration_tests/ -v --log-cli-level=${log_level} -k ..." and
replace the final "tail ... | grep ..." logic with a check of the pipeline exit
status (PIPESTATUS[0]) so the workflow reliably fails when pytest fails.
Introduce a workflow step that incrementally includes integration test modules in collection to identify import side-effect culprits behind intermittent exit 139 in the non-db subset. Made-with: Cursor
Remove direct SeekdbEmbeddedClient import from test module and infer embedded mode from runtime server metadata, preventing Python 3.14 collection-time native crashes. Made-with: Cursor
Summary
type/liband extend unit tests for new defaults and validation behaviorTest plan
PYTHONPATH=src pytest tests/unit_tests/test_configuration.py tests/unit_tests/test_sparse_vector_index_config.pyMade with Cursor
Summary by CodeRabbit
New Features
Improvements
Breaking Changes
Chores