fix: Add filter_id support and v1 filter resolution#1666
fix: Add filter_id support and v1 filter resolution#1666edwinjosechittilappilly wants to merge 11 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds filter_id support for document deletion and result scoping across Python/TypeScript SDKs and FastAPI endpoints, implements a shared resolve_filter_id helper, updates chat/search/documents endpoints to use resolved filters, adjusts MCP exposure, and expands integration tests validating filter scoping and deletion behavior. ChangesFilter-based document deletion and filtering
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Adds first-class filter_id support across the v1 API surface (chat/search/documents) by resolving knowledge filters server-side, extends document deletion to support deleting by filter_id, and updates both SDKs + integration tests to exercise the new behavior.
Changes:
- Introduce
api.v1._filter_resolution.resolve_filter_id()and wirefilter_idinto v1 chat/search/documents behavior (including wildcard stripping and override rules). - Extend
DELETE /v1/documents+ Python/TypeScript SDKs to support deleting by eitherfilenameorfilter_id(mutually exclusive), returning richer deletion metadata. - Expand SDK integration tests (Python + TypeScript) to validate that
filter_idtruly scopes retrieval/deletion behavior (including streaming + not-found + validation cases).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/integration/sdk/test_filters.py | Adds end-to-end filter scoping tests for chat/search (including streaming and not-found). |
| tests/integration/sdk/test_documents.py | Adds delete-by-filter_id integration tests and SDK-side validation cases. |
| src/mcp_http/server.py | Updates MCP tool descriptions and route mapping behavior (GETs as tools; excludes ingest). |
| src/api/v1/search.py | Accepts filter_id and resolves filter values for v1 search requests. |
| src/api/v1/documents.py | Extends v1 documents DELETE to support deleting all filenames referenced by a filter. |
| src/api/v1/chat.py | Accepts filter_id and resolves filter values for v1 chat (streaming + non-streaming). |
| src/api/v1/_filter_resolution.py | New shared helper for resolving filter_id into concrete filters/limit/threshold with wildcard stripping. |
| sdks/typescript/tests/integration.test.ts | Strengthens TS integration tests to validate actual scoping and delete-by-filter behavior. |
| sdks/typescript/src/types.ts | Extends delete response typing and introduces delete options type. |
| sdks/typescript/src/documents.ts | Updates documents.delete to accept filename or { filterId } and preserve filename-idempotency. |
| sdks/python/openrag_sdk/models.py | Extends delete response model to include filter deletion metadata. |
| sdks/python/openrag_sdk/documents.py | Updates documents.delete to support filter_id and preserve filename-idempotency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| resolved_filters = body.filters | ||
| resolved_limit = body.limit | ||
| resolved_score_threshold = body.score_threshold | ||
| if body.filter_id: | ||
| resolved = await resolve_filter_id( | ||
| body.filter_id, | ||
| knowledge_filter_service, | ||
| user_id=user.user_id, | ||
| jwt_token=jwt_token, | ||
| ) | ||
| # Inline values override per-field; defaults (10 / 0) fall back to the filter. | ||
| if not body.filters: | ||
| resolved_filters = resolved["filters"] | ||
| if body.limit == 10: | ||
| resolved_limit = resolved["limit"] | ||
| if body.score_threshold == 0: | ||
| resolved_score_threshold = resolved["score_threshold"] |
| resolved_filters = body.filters | ||
| resolved_limit = body.limit | ||
| resolved_score_threshold = body.score_threshold | ||
| if body.filter_id: | ||
| resolved = await resolve_filter_id( | ||
| body.filter_id, | ||
| knowledge_filter_service, | ||
| user_id=user.user_id, | ||
| jwt_token=None, | ||
| ) | ||
| if not body.filters: | ||
| resolved_filters = resolved["filters"] | ||
| if body.limit == 10: | ||
| resolved_limit = resolved["limit"] | ||
| if body.score_threshold == 0: | ||
| resolved_score_threshold = resolved["score_threshold"] |
| total_deleted = 0 | ||
| for fname in filenames: | ||
| payload, _status = await delete_documents_by_filename_core( | ||
| filename=fname, | ||
| session_manager=session_manager, | ||
| user_id=user.user_id, | ||
| jwt_token=None, | ||
| ) | ||
| results.append(payload) | ||
| total_deleted += payload.get("deleted_chunks", 0) or 0 | ||
|
|
||
| return JSONResponse( | ||
| { | ||
| "success": True, | ||
| "deleted_chunks": total_deleted, | ||
| "filenames": filenames, | ||
| "filter_id": body.filter_id, | ||
| "per_file": results, | ||
| } |
| def _strip_wildcards(filters: dict[str, Any] | None) -> dict[str, list[str]]: | ||
| """Drop `["*"]` and empty lists from each filter dimension.""" | ||
| if not filters: | ||
| return {} | ||
| cleaned: dict[str, list[str]] = {} | ||
| for key in _FILTER_DIMENSIONS: | ||
| values = filters.get(key) | ||
| if not values or not isinstance(values, list): | ||
| continue | ||
| if "*" in values: | ||
| continue | ||
| cleaned[key] = values |
| if not result.get("success"): | ||
| raise HTTPException( | ||
| status_code=404, | ||
| detail={"error": f"Filter {filter_id} not found"}, |
| "description": ( | ||
| "Check the status of an ingestion task. " | ||
| "Use the task_id returned from openrag_ingest." | ||
| "Check the status of an ingestion task. Use the task_id returned from openrag_ingest." |
|
Actionable comments posted: 0 |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/api/v1/search.py (2)
53-68: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftMove filter merge/resolution orchestration out of the route handler.
This block is endpoint business logic and should live in a service/helper so the route stays as transport + validation only.
As per coding guidelines
src/api/**/*.py: “No business logic in route handlers.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/v1/search.py` around lines 53 - 68, Extract the filter-resolution/merge logic from the route handler in src/api/v1/search.py into a new service/helper function (e.g., resolve_and_merge_filters) that accepts the request body (or its fields), knowledge_filter_service, user_id and returns resolved_filters, resolved_limit, and resolved_score_threshold; inside that helper call the existing resolve_filter_id(...) and apply the same merge rules (use resolved["filters"] when body.filters is empty, resolved["limit"] when body.limit == 10, and resolved["score_threshold"] when body.score_threshold == 0). Replace the inline block in the route with a single call to this helper and ensure the route remains only transport/validation while resolve_filter_id, resolved_filters, resolved_limit, resolved_score_threshold, body, knowledge_filter_service, and user.user_id are used to locate and wire up the change.
63-68:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse presence-based override checks instead of truthiness/default-value checks.
Lines 63-68 treat
{},10, and0as “not provided”, so explicit inline overrides can be ignored.🔧 Minimal fix
- resolved_filters = body.filters - resolved_limit = body.limit - resolved_score_threshold = body.score_threshold + provided_fields = body.model_fields_set + resolved_filters = body.filters + resolved_limit = body.limit + resolved_score_threshold = body.score_threshold @@ - if not body.filters: + if "filters" not in provided_fields: resolved_filters = resolved["filters"] - if body.limit == 10: + if "limit" not in provided_fields: resolved_limit = resolved["limit"] - if body.score_threshold == 0: + if "score_threshold" not in provided_fields: resolved_score_threshold = resolved["score_threshold"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/v1/search.py` around lines 63 - 68, The override logic currently uses falsy/default-value checks (if not body.filters, if body.limit == 10, if body.score_threshold == 0) which treats valid values like {} , 0, or 10 as “not provided”; change these to presence-based checks (e.g. test for None / presence) so explicit values are honored: update the conditions around body.filters, body.limit, and body.score_threshold to check for None (or an explicit "is provided" flag) and only fall back to resolved["filters"], resolved["limit"], and resolved["score_threshold"] when the field is actually missing; look for the block that assigns resolved_filters, resolved_limit, and resolved_score_threshold in search.py and replace the falsy/default comparisons with presence checks.src/api/v1/chat.py (1)
17-19:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove the unused
get_api_key_user_asyncimport to unblock Ruff.CI is currently failing with F401 at Line 18.
🔧 Minimal fix
from dependencies import ( - get_api_key_user_async, get_chat_service, get_knowledge_filter_service, get_session_manager, require_api_key_permission, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/v1/chat.py` around lines 17 - 19, Remove the unused import get_api_key_user_async from the dependencies import list in src/api/v1/chat.py to satisfy Ruff F401; keep the needed get_chat_service import and ensure any references to get_api_key_user_async are not required elsewhere in this module before removing it.
♻️ Duplicate comments (2)
src/api/v1/search.py (1)
57-62:⚠️ Potential issue | 🟠 Major | ⚡ Quick winForward the caller JWT to
resolve_filter_id.Line 61 passes
jwt_token=None, which can fail filter resolution for JWT/OIDC-authenticated requests and is inconsistent with chat’s implementation.🔧 Minimal fix
resolved = await resolve_filter_id( body.filter_id, knowledge_filter_service, user_id=user.user_id, - jwt_token=None, + jwt_token=user.jwt_token, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/v1/search.py` around lines 57 - 62, The call to resolve_filter_id in the search handler currently passes jwt_token=None which breaks JWT/OIDC-backed filter resolution; change the call to forward the caller's JWT instead of None by replacing jwt_token=None with the request's JWT token (e.g., jwt_token variable from the handler or the auth context) so the call becomes resolve_filter_id(body.filter_id, knowledge_filter_service, user_id=user.user_id, jwt_token=caller_jwt); ensure you reference the same token variable used elsewhere in this module (the handler's JWT/auth token).src/api/v1/chat.py (1)
149-154:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse presence-based overrides so explicit
{},10, and0are honored.Current checks treat falsy/default-valued inputs as omitted, so caller-provided overrides can be dropped.
🔧 Minimal fix
- resolved_filters = body.filters - resolved_limit = body.limit - resolved_score_threshold = body.score_threshold + provided_fields = body.model_fields_set + resolved_filters = body.filters + resolved_limit = body.limit + resolved_score_threshold = body.score_threshold @@ - if not body.filters: + if "filters" not in provided_fields: resolved_filters = resolved["filters"] - if body.limit == 10: + if "limit" not in provided_fields: resolved_limit = resolved["limit"] - if body.score_threshold == 0: + if "score_threshold" not in provided_fields: resolved_score_threshold = resolved["score_threshold"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/v1/chat.py` around lines 149 - 154, The code is using falsy/value-equality checks that drop explicit overrides (e.g., {} for filters, 10 and 0 for numeric fields); update the presence checks to use "is None" semantics so explicit values are preserved: replace the conditions that reference body.filters, body.limit == 10, and body.score_threshold == 0 with presence checks that only fallback to resolved["filters"], resolved["limit"], and resolved["score_threshold"] when body.filters, body.limit, or body.score_threshold are None respectively (refer to body.filters, body.limit, body.score_threshold and resolved_filters/resolved_limit/resolved_score_threshold to locate the logic).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/api/v1/chat.py`:
- Around line 17-19: Remove the unused import get_api_key_user_async from the
dependencies import list in src/api/v1/chat.py to satisfy Ruff F401; keep the
needed get_chat_service import and ensure any references to
get_api_key_user_async are not required elsewhere in this module before removing
it.
In `@src/api/v1/search.py`:
- Around line 53-68: Extract the filter-resolution/merge logic from the route
handler in src/api/v1/search.py into a new service/helper function (e.g.,
resolve_and_merge_filters) that accepts the request body (or its fields),
knowledge_filter_service, user_id and returns resolved_filters, resolved_limit,
and resolved_score_threshold; inside that helper call the existing
resolve_filter_id(...) and apply the same merge rules (use resolved["filters"]
when body.filters is empty, resolved["limit"] when body.limit == 10, and
resolved["score_threshold"] when body.score_threshold == 0). Replace the inline
block in the route with a single call to this helper and ensure the route
remains only transport/validation while resolve_filter_id, resolved_filters,
resolved_limit, resolved_score_threshold, body, knowledge_filter_service, and
user.user_id are used to locate and wire up the change.
- Around line 63-68: The override logic currently uses falsy/default-value
checks (if not body.filters, if body.limit == 10, if body.score_threshold == 0)
which treats valid values like {} , 0, or 10 as “not provided”; change these to
presence-based checks (e.g. test for None / presence) so explicit values are
honored: update the conditions around body.filters, body.limit, and
body.score_threshold to check for None (or an explicit "is provided" flag) and
only fall back to resolved["filters"], resolved["limit"], and
resolved["score_threshold"] when the field is actually missing; look for the
block that assigns resolved_filters, resolved_limit, and
resolved_score_threshold in search.py and replace the falsy/default comparisons
with presence checks.
---
Duplicate comments:
In `@src/api/v1/chat.py`:
- Around line 149-154: The code is using falsy/value-equality checks that drop
explicit overrides (e.g., {} for filters, 10 and 0 for numeric fields); update
the presence checks to use "is None" semantics so explicit values are preserved:
replace the conditions that reference body.filters, body.limit == 10, and
body.score_threshold == 0 with presence checks that only fallback to
resolved["filters"], resolved["limit"], and resolved["score_threshold"] when
body.filters, body.limit, or body.score_threshold are None respectively (refer
to body.filters, body.limit, body.score_threshold and
resolved_filters/resolved_limit/resolved_score_threshold to locate the logic).
In `@src/api/v1/search.py`:
- Around line 57-62: The call to resolve_filter_id in the search handler
currently passes jwt_token=None which breaks JWT/OIDC-backed filter resolution;
change the call to forward the caller's JWT instead of None by replacing
jwt_token=None with the request's JWT token (e.g., jwt_token variable from the
handler or the auth context) so the call becomes
resolve_filter_id(body.filter_id, knowledge_filter_service,
user_id=user.user_id, jwt_token=caller_jwt); ensure you reference the same token
variable used elsewhere in this module (the handler's JWT/auth token).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 013eba90-89ca-487f-b96f-a5ce0d07400f
📒 Files selected for processing (3)
src/api/v1/chat.pysrc/api/v1/documents.pysrc/api/v1/search.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/api/v1/documents.py
Add server-side resolution for filter_id and extend document deletion to accept a filter_id. Introduces api.v1._filter_resolution to normalize a filter_id into concrete filters/limits/score_thresholds and to strip wildcards. v1 endpoints (chat, search, documents) now accept filter_id and merge resolved values with inline overrides; documents DELETE can delete all filenames referenced by a filter_id (with wildcard/empty data_sources rejected to avoid mass deletion). SDK updates (Python and TypeScript) allow DocumentsClient.delete to accept either filename or filter_id (mutually exclusive), include additional response fields (filenames, filter_id, per_file), and preserve idempotent semantics for filename deletes while surfacing filter-not-found errors. Tests updated/added to cover filter_id behavior, inline overrides, streaming, and validation. Additional minor logging/validation and type updates included.
Expand the DELETE /v1/documents component description to indicate it can delete single or multiple documents and requires exactly one of `filename` or `filter_id`. Also notes that wildcards are rejected for safety, improving API documentation and removing ambiguity about deletion semantics.
Ensure API-key authenticated searches set the auth context and pass the user's JWT to the search service so downstream tools can resolve the user. Adds import for set_auth_context, calls it when handling API-key requests (mirroring v1 chat), and passes user.jwt_token to search_service.search instead of None. Also cleans up an unused typing import.
Clarify search component docs to document `filter_id` and inline `filters` (inline filters override per-field). Document that /v1/documents/ingest is intentionally not customized/exposed via MCP because multipart/form-data uploads are not supported by FastMCP's from_fastapi conversion; clients should use the HTTP API or SDK to ingest. Add a RouteMap to explicitly exclude POST /v1/documents/ingest and consolidate route maps to expose all /v1/ routes (including GET) as MCP tools, with a note explaining that GETs are exposed as tools to better support LLM agent workflows.
6fc2eb7 to
8224191
Compare
Add server-side resolution for filter_id and extend document deletion to accept a filter_id. Introduces api.v1._filter_resolution to normalize a filter_id into concrete filters/limits/score_thresholds and to strip wildcards. v1 endpoints (chat, search, documents) now accept filter_id and merge resolved values with inline overrides; documents DELETE can delete all filenames referenced by a filter_id (with wildcard/empty data_sources rejected to avoid mass deletion). SDK updates (Python and TypeScript) allow DocumentsClient.delete to accept either filename or filter_id (mutually exclusive), include additional response fields (filenames, filter_id, per_file), and preserve idempotent semantics for filename deletes while surfacing filter-not-found errors. Tests updated/added to cover filter_id behavior, inline overrides, streaming, and validation. Additional minor logging/validation and type updates included.
Summary by CodeRabbit
New Features
Bug Fixes
Tests