feat: enable MMR diversity for hybrid search#39
Conversation
MMR (Maximal Marginal Relevance) was artificially blocked for hybrid search in QQL, even though the Qdrant SDK supports it: Prefetch.query accepts NearestQuery, which carries an MMR field. - Remove the hybrid guard from _validate_search_mmr_usage - Wire _build_dense_query() into the dense prefetch of both flat and GROUP BY hybrid paths, so MMR params produce a NearestQuery(mmr=...) instead of a raw vector - Keep the sparse-only and recommend guards (MMR is a dense-space concept and RecommendInput has no MMR field in the Qdrant API) - Replace test_hybrid_search_with_mmr_raises with a new test that asserts NearestQuery + MMR is passed in the dense prefetch
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughHybrid searches now support Maximum Marginal Relevance (MMR) by updating validation logic to allow MMR configuration, applying MMR through constructed dense queries in both grouped and non-grouped paths, and verifying the behavior with updated test coverage. ChangesHybrid MMR Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_executor.py (1)
1245-1277: ⚡ Quick winAdd grouped-hybrid MMR coverage for the second changed path.
Line 1245 validates flat hybrid prefetch MMR, but the grouped hybrid branch (Line 1636 in
src/qql/executor.py) is also newly wired through_build_dense_queryand should get a direct regression assertion (query_points_groups+group_by+hybrid=True+NearestQuery.mmrchecks).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_executor.py` around lines 1245 - 1277, Add a new test that mirrors test_hybrid_search_with_mmr_uses_nearest_query_in_prefetch but exercises the grouped-hybrid path: create a SearchStmt with hybrid=True and a group_by value (and with_clause=SearchWith(mmr_diversity=..., mmr_candidates=...)), patch Embedder/SparseEmbedder and make mock_client.collection_exists/get_collection/ query_points_groups return a MagicMock response, call executor.execute(node), then inspect mock_client.query_points_groups.call_args.kwargs["prefetch"] (not query_points) to get the dense query and assert isinstance(..., NearestQuery) and that dense_query.mmr.diversity and dense_query.mmr.candidates_limit match the provided values; this ensures the _build_dense_query/grouped hybrid branch is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_executor.py`:
- Around line 1245-1277: Add a new test that mirrors
test_hybrid_search_with_mmr_uses_nearest_query_in_prefetch but exercises the
grouped-hybrid path: create a SearchStmt with hybrid=True and a group_by value
(and with_clause=SearchWith(mmr_diversity=..., mmr_candidates=...)), patch
Embedder/SparseEmbedder and make mock_client.collection_exists/get_collection/
query_points_groups return a MagicMock response, call executor.execute(node),
then inspect mock_client.query_points_groups.call_args.kwargs["prefetch"] (not
query_points) to get the dense query and assert isinstance(..., NearestQuery)
and that dense_query.mmr.diversity and dense_query.mmr.candidates_limit match
the provided values; this ensures the _build_dense_query/grouped hybrid branch
is covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 827162fb-8be5-4b15-b436-b08655272932
📒 Files selected for processing (2)
src/qql/executor.pytests/test_executor.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_executor.py (1)
1302-1305: ⚡ Quick winMake dense-prefetch assertion order-independent.
This test currently assumes the dense prefetch is always at index 0. That makes it brittle to harmless ordering changes. Select the dense prefetch by
usingbefore assertingNearestQueryand MMR fields.Suggested test adjustment
- prefetch = mock_client.query_points_groups.call_args.kwargs["prefetch"] - assert prefetch is not None - dense_query = prefetch[0].query + prefetch = mock_client.query_points_groups.call_args.kwargs["prefetch"] + assert prefetch is not None + dense_prefetch = next(p for p in prefetch if p.using == "dense") + dense_query = dense_prefetch.query assert isinstance(dense_query, NearestQuery)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_executor.py` around lines 1302 - 1305, The test assumes the dense prefetch is at index 0; change it to find the prefetch item by its using attribute instead of relying on order: retrieve prefetch = mock_client.query_points_groups.call_args.kwargs["prefetch"], then locate the item where item.using == "dense" (or the actual using string used in the diff) and assign that to dense_prefetch; assert isinstance(dense_prefetch.query, NearestQuery) and check the MMR-related fields on dense_prefetch/query accordingly so the assertions are order-independent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_executor.py`:
- Around line 1302-1305: The test assumes the dense prefetch is at index 0;
change it to find the prefetch item by its using attribute instead of relying on
order: retrieve prefetch =
mock_client.query_points_groups.call_args.kwargs["prefetch"], then locate the
item where item.using == "dense" (or the actual using string used in the diff)
and assign that to dense_prefetch; assert isinstance(dense_prefetch.query,
NearestQuery) and check the MMR-related fields on dense_prefetch/query
accordingly so the assertions are order-independent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: c40dec0e-90aa-495b-bae2-163151851b72
📒 Files selected for processing (1)
tests/test_executor.py
7022578 to
0140d97
Compare
MMR (Maximal Marginal Relevance) was artificially blocked for hybrid search in QQL, even though the Qdrant SDK supports it: Prefetch.query accepts NearestQuery, which carries an MMR field.
Summary by CodeRabbit
New Features
Tests