add chunk_family_fields to okp enrichment config#1304
add chunk_family_fields to okp enrichment config#1304tisnik merged 2 commits intolightspeed-core:mainfrom
chunk_family_fields to okp enrichment config#1304Conversation
WalkthroughAdded a single configuration entry to the Solr Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/llama_stack_configuration.py (1)
402-454:⚠️ Potential issue | 🟠 MajorBackfill
chunk_family_fieldson existing Solr providers too.This only sets
chunk_family_fieldswhen the Solr provider is newly appended. Ifconstants.SOLR_PROVIDER_IDis already present in the input config, this branch is skipped and upgraded configs keep the missing field that this PR is trying to fix.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llama_stack_configuration.py` around lines 402 - 454, The code only sets chunk_family_fields when appending a new Solr provider, so existing Solr entries keep missing chunk_family_fields; update the branch that detects existing_providers (using existing_providers, ls_config, and constants.SOLR_PROVIDER_ID) to iterate over ls_config["providers"]["vector_io"] and, for any provider with provider_id == constants.SOLR_PROVIDER_ID, ensure its config.chunk_window_config includes chunk_family_fields (e.g., add ["headings"] if absent) and the other chunk_window_config defaults (use solr_config.get("chunk_filter_query", "is_chunk:true") as used when appending) so upgrades backfill the missing field for existing providers as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/llama_stack_configuration.py`:
- Line 445: The multiline dict entry for the key "chunk_family_fields" is
missing a trailing comma which causes Black to reformat the file; update the
dict literal containing the "chunk_family_fields": ["headings"] entry by adding
a trailing comma after the closing bracket so the line becomes
"chunk_family_fields": ["headings"], ensuring the surrounding dict (where this
key is defined) is syntactically correct and Black will pass.
---
Outside diff comments:
In `@src/llama_stack_configuration.py`:
- Around line 402-454: The code only sets chunk_family_fields when appending a
new Solr provider, so existing Solr entries keep missing chunk_family_fields;
update the branch that detects existing_providers (using existing_providers,
ls_config, and constants.SOLR_PROVIDER_ID) to iterate over
ls_config["providers"]["vector_io"] and, for any provider with provider_id ==
constants.SOLR_PROVIDER_ID, ensure its config.chunk_window_config includes
chunk_family_fields (e.g., add ["headings"] if absent) and the other
chunk_window_config defaults (use solr_config.get("chunk_filter_query",
"is_chunk:true") as used when appending) so upgrades backfill the missing field
for existing providers as well.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6b046e6c-fabd-47bb-9b6c-73ef9877e9c3
📒 Files selected for processing (1)
src/llama_stack_configuration.py
|
/lgtm |
ac3205f to
040ff3a
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/llama_stack_configuration.py (1)
445-445: Add a regression check forchunk_family_fields.The current Solr enrichment tests only verify provider/store/model IDs, so this new key can be removed or misspelled without any test failing. Please extend
tests/unit/test_llama_stack_configuration.pyto assertchunk_window_config["chunk_family_fields"] == ["headings"].🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llama_stack_configuration.py` at line 445, The tests don't assert the new chunk_family_fields key so regressions can remove or misspell it; add an assertion in tests/unit/test_llama_stack_configuration.py that verifies chunk_window_config["chunk_family_fields"] == ["headings"]. Locate where chunk_window_config is built or loaded in that test (search for chunk_window_config, chunk_window, or llama_stack_configuration usage) and add a single assertion checking the exact list value, ensuring the test fails if the key is missing or altered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/llama_stack_configuration.py`:
- Line 445: The tests don't assert the new chunk_family_fields key so
regressions can remove or misspell it; add an assertion in
tests/unit/test_llama_stack_configuration.py that verifies
chunk_window_config["chunk_family_fields"] == ["headings"]. Locate where
chunk_window_config is built or loaded in that test (search for
chunk_window_config, chunk_window, or llama_stack_configuration usage) and add a
single assertion checking the exact list value, ensuring the test fails if the
key is missing or altered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 750b1853-0051-4634-ae9c-6af0c46632f6
📒 Files selected for processing (1)
src/llama_stack_configuration.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/llama_stack_configuration.py (1)
437-446: LGTM! The trailing comma issue from the previous review has been addressed.The addition of
chunk_family_fieldstochunk_window_configis correct and aligns with the PR objective to fix the error when chunk expansion is enabled without this field.One consideration: the existing tests for
enrich_solr(seetests/unit/test_llama_stack_configuration.py:406-450) validate provider/store registration but do not assert on the internal structure ofchunk_window_config. Adding a test that verifieschunk_family_fieldsis set to["headings"]would help prevent regressions.,
Optional: Add test coverage for chunk_window_config
def test_enrich_solr_sets_chunk_family_fields() -> None: """Test enrich_solr configures chunk_family_fields in chunk_window_config.""" ls_config: dict[str, Any] = {} enrich_solr(ls_config, {"enabled": True}) solr_provider = next( p for p in ls_config["providers"]["vector_io"] if p["provider_id"] == "okp_solr" ) chunk_window_config = solr_provider["config"]["chunk_window_config"] assert chunk_window_config["chunk_family_fields"] == ["headings"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llama_stack_configuration.py` around lines 437 - 446, Add a unit test that asserts enrich_solr sets chunk_family_fields to ["headings"] in the Solr provider's chunk_window_config: in tests/unit/test_llama_stack_configuration.py create a test (e.g., test_enrich_solr_sets_chunk_family_fields) that calls enrich_solr(ls_config, {"enabled": True}), locates the Solr provider via ls_config["providers"]["vector_io"] where provider_id == "okp_solr", extracts provider["config"]["chunk_window_config"], and asserts chunk_window_config["chunk_family_fields"] == ["headings"] to prevent regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/llama_stack_configuration.py`:
- Around line 437-446: Add a unit test that asserts enrich_solr sets
chunk_family_fields to ["headings"] in the Solr provider's chunk_window_config:
in tests/unit/test_llama_stack_configuration.py create a test (e.g.,
test_enrich_solr_sets_chunk_family_fields) that calls enrich_solr(ls_config,
{"enabled": True}), locates the Solr provider via
ls_config["providers"]["vector_io"] where provider_id == "okp_solr", extracts
provider["config"]["chunk_window_config"], and asserts
chunk_window_config["chunk_family_fields"] == ["headings"] to prevent
regressions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3ff31f06-9189-4eb1-80d2-df61f981149e
📒 Files selected for processing (1)
src/llama_stack_configuration.py
Description
In the OKP enrichment config, this sets
chunk_family_fieldsto["headings"]which causes the Solr provider's chunk expansion algorithm to expand only within a shared heading, lowering the likelihood of irrelevant chunks being included. I'm marking it as a bug fix too, because currently if chunk expansion is enabled, omittingchunk_family_fieldscauses an error even though the field is meant to be optional.Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit