fix: connector sync and override feature #1663
Conversation
Add end-to-end support for filename-based duplicate handling on connector ingests.
Frontend: send a new replace_duplicates flag with connector sync requests, perform a pre-sync duplicate check, and show a DuplicateHandlingDialog that lets users overwrite or skip duplicates when uploading from provider UI.
Backend: propagate replace_duplicates through connector_router, request models, and connector services into the file processors. ConnectorFileProcessor and LangflowConnectorFileProcessor now check whether a filename already exists in the index and either fail the file task or delete the existing document before ingesting when replace_duplicates is true.
Utilities/tests: clean_connector_filename now preserves original spacing/slashes and only enforces MIME-mapped extensions; get_filename_aliases adds underscore/sanitized variants so lookups match connector-indexed names. Add unit tests covering filename dedupe logic and filename alias behavior.
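The backend behavior described above reduces to a three-way decision per file. The following is a minimal sketch of that decision as a pure function, not the code in `src/models/processors.py`: `FilenameAction` and `decide_filename_action` are hypothetical names introduced for illustration, and the real processors perform the existence check against the index.

```python
from enum import Enum


class FilenameAction(Enum):
    """Hypothetical outcomes for a connector file checked against the index."""

    INGEST = "ingest"                          # no collision: ingest normally
    FAIL = "fail"                              # collision, overwrite not requested
    DELETE_THEN_INGEST = "delete_then_ingest"  # collision, overwrite requested


def decide_filename_action(filename_exists: bool, replace_duplicates: bool) -> FilenameAction:
    """Map the two inputs described in the PR onto one of three actions."""
    if not filename_exists:
        return FilenameAction.INGEST
    if replace_duplicates:
        return FilenameAction.DELETE_THEN_INGEST
    return FilenameAction.FAIL
```

Keeping the decision separate from the index calls makes the fail-versus-overwrite rule easy to unit test without a running search backend.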
Replace numeric duplicateCount with a duplicateNames string[] across upload and dropdown flows so the UI can show the actual file names that would be overwritten. The duplicate-handling dialog now accepts duplicateNames, derives an effective count, and lists up to 5 duplicate filenames with an "… and N more" indicator; message labels and button text use the effective count. Toast messages and pending state in upload/[provider]/page.tsx and knowledge-dropdown.tsx were updated to pass and consume duplicateNames and to use duplicateNames.length for counts.
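The "list up to 5 names, then summarize the rest" rule can be sketched independently of the React component. This is an illustration in Python rather than the dialog's actual TypeScript; `summarize_duplicates` and its `limit` parameter are hypothetical names, and the real dialog may derive its count differently.

```python
def summarize_duplicates(duplicate_names: list[str], limit: int = 5) -> list[str]:
    """Return the lines a dialog could show: the first `limit` names, then a remainder note."""
    visible = duplicate_names[:limit]
    hidden = len(duplicate_names) - len(visible)
    lines = list(visible)
    if hidden > 0:
        # Only add the indicator when some names were left out
        lines.append(f"… and {hidden} more")
    return lines
```

For example, seven duplicate names would yield five names followed by a single "… and 2 more" line.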
Walkthrough
This PR implements filename-based duplicate detection during connector file ingestion with optional replacement.
Changes: Duplicate-Aware Connector Sync
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pull request overview
Adds end-to-end support for filename-based duplicate handling during connector ingests, enabling a user-driven “overwrite vs skip” workflow and backend enforcement via OpenSearch filename lookups/deletions.
Changes:
- Backend: propagate `replace_duplicates` through connector sync APIs/services and enforce filename-based dedupe (fail vs delete+ingest) in connector processors.
- Frontend: run pre-sync duplicate checks for provider uploads, show an overwrite/skip dialog, and send `replace_duplicates` on sync requests.
- Utils/tests: adjust connector filename normalization + filename aliasing, and add unit tests for alias and dedupe behavior.
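The adjusted filename normalization can be illustrated with a short sketch: keep the name as given and append an extension only when the MIME type maps to one that the name lacks. The MIME map below is a made-up subset for illustration, and the function signatures are assumptions; the project's actual `get_file_extension` and `clean_connector_filename` live in `src/utils/file_utils.py`.

```python
from typing import Optional

# Hypothetical MIME-to-extension map, for illustration only
_MIME_TO_EXT = {"application/pdf": ".pdf", "text/plain": ".txt"}


def get_file_extension(mimetype: str) -> Optional[str]:
    """Return the mapped extension, or None for unknown MIME types."""
    return _MIME_TO_EXT.get(mimetype)


def clean_connector_filename(filename: str, mimetype: str) -> str:
    """Preserve spacing and slashes; only enforce the MIME-mapped extension."""
    suffix = get_file_extension(mimetype)
    if suffix is None:
        # Unknown type: keep whatever extension the file already has
        return filename
    if not filename.lower().endswith(suffix.lower()):
        return filename + suffix
    return filename
```

Because the extension comparison is case-insensitive, a name such as `a/b.PDF` is returned unchanged for `application/pdf`.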
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/unit/test_file_utils_filename_aliases.py | Adds tests for filename alias generation and connector filename normalization. |
| tests/unit/test_connector_processor_filename_dedupe.py | Adds async unit tests covering filename-collision behavior with/without overwrite. |
| src/utils/file_utils.py | Updates connector filename normalization behavior and expands filename alias matching. |
| src/models/processors.py | Adds filename-exists gating and overwrite deletion logic to connector processors. |
| src/connectors/service.py | Passes replace_duplicates into ConnectorFileProcessor for specific-file syncs. |
| src/connectors/langflow_connector_service.py | Passes replace_duplicates into LangflowConnectorFileProcessor for specific-file syncs. |
| src/api/connectors.py | Extends sync request model to accept replace_duplicates and forwards it to services. |
| src/api/connector_router.py | Threads replace_duplicates through the active connector service router. |
| frontend/components/knowledge-dropdown.tsx | Enhances duplicate dialog state to include duplicate filenames for folder uploads. |
| frontend/components/duplicate-handling-dialog.tsx | Displays duplicate filenames (up to a limit) and updates messaging/labels. |
| frontend/app/upload/[provider]/page.tsx | Adds pre-sync duplicate checks + overwrite/skip dialog and sends replace_duplicates. |
| frontend/app/api/mutations/useSyncConnector.ts | Extends sync mutation request type with replace_duplicates. |
Comments suppressed due to low confidence (3)
src/models/processors.py:571
- When `replace_duplicates` is true you delete by filename before computing/checking the incoming file hash. If the incoming hash already exists (so `process_document_standard` returns `{"status": "unchanged"}`), you can delete the old filename document without creating a replacement (data loss). Compute/check hash first and handle the hash-already-exists case explicitly before running `delete_document_by_filename`.
This issue also appears on line 707 of the same file.
with auto_cleanup_tempfile(suffix=suffix) as tmp_path:
# Write content to temp file
with open(tmp_path, "wb") as f:
f.write(document.content)
src/models/processors.py:702
- Same as `ConnectorFileProcessor`: filename lookup/delete uses `document.filename` even though the task filename is normalized via `clean_connector_filename(...)` above. If the MIME-mapped extension is enforced, the dedupe query can miss existing docs and the error message can show a different name than what is indexed. Use the normalized filename consistently for lookup/delete/error text (and for `original_filename` where applicable).
return
await self.delete_document_by_filename(document.filename, opensearch_client)
# Create temporary file and compute hash to check for duplicates
suffix = get_file_extension(document.mimetype)
src/models/processors.py:711
- `delete_document_by_filename(...)` runs before the hash duplicate check. If the new content's hash already exists elsewhere, the processor may return "unchanged" and skip ingest after deleting the prior filename document. Reorder so hash existence is handled before deletion (or otherwise guarantee the replacement will be indexed) to prevent deleting without replacement.
# Compute hash and check if already exists
file_hash = hash_id(tmp_path)
if await self.check_document_exists(file_hash, opensearch_client):
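The ordering concern raised in these suppressed comments can be made concrete with an in-memory stand-in for the index. This is a sketch of the reviewer's suggested ordering (hash check before filename deletion), not the PR's code; `FakeIndex` and `ingest` are hypothetical names, and the real processors are async and talk to OpenSearch.

```python
from dataclasses import dataclass, field


@dataclass
class FakeIndex:
    """In-memory stand-in for the search index (hypothetical, for illustration)."""

    by_hash: dict = field(default_factory=dict)  # file hash -> filename

    def hash_exists(self, file_hash: str) -> bool:
        return file_hash in self.by_hash

    def filename_exists(self, filename: str) -> bool:
        return filename in self.by_hash.values()

    def delete_by_filename(self, filename: str) -> None:
        self.by_hash = {h: n for h, n in self.by_hash.items() if n != filename}


def ingest(index: FakeIndex, filename: str, file_hash: str, replace_duplicates: bool) -> str:
    # 1. Hash check first: identical content is already indexed, so nothing is deleted.
    if index.hash_exists(file_hash):
        return "unchanged"
    # 2. Filename collision: fail, or delete the old document before indexing the new one.
    if index.filename_exists(filename):
        if not replace_duplicates:
            return "failed"
        index.delete_by_filename(filename)
    index.by_hash[file_hash] = filename
    return "indexed"
```

With this ordering, re-syncing an unchanged file with overwrite enabled returns "unchanged" and leaves the existing document in place, which is the data-loss case the comments describe.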
file_task.status = TaskStatus.FAILED
file_task.error = f"File with name '{document.filename}' already exists"
file_task.updated_at = time.time()
upload_task.failed_files += 1
return

"""
clean_name = filename.replace(" ", "_").replace("/", "_")
suffix = get_file_extension(mimetype)
if suffix is None:
# Unknown type — keep whatever extension the file already has
return clean_name
if not clean_name.lower().endswith(suffix.lower()):
return clean_name + suffix
return clean_name
return filename
if not filename.lower().endswith(suffix.lower()):

{visibleNames.map((name) => (
<li key={name} className="break-all">

isOverwriteConfirmedRef.current = true;
const { connector, allFiles } = pendingSync;
submitSync(connector, allFiles, true);
setPendingSync(null);

if (pendingFolderUpload) {
isFolderOverwriteConfirmedRef.current = true;
- const { allFiles, duplicateCount, unsupportedCount } =
+ const { allFiles, duplicateNames, unsupportedCount } =
pendingFolderUpload;
await uploadFolderBatches(allFiles, true);
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/models/processors.py (1)
711-716: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Missing `processed_files` increment on hash-based early return.
When the document hash already exists and the file is marked "unchanged", the method returns without incrementing `processed_files`.
🐛 Proposed fix
  if await self.check_document_exists(file_hash, opensearch_client):
      file_task.status = TaskStatus.COMPLETED
      file_task.result = {"status": "unchanged", "id": file_hash}
      file_task.updated_at = time.time()
      upload_task.successful_files += 1
+     upload_task.processed_files += 1
+     upload_task.updated_at = time.time()
      return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/processors.py` around lines 711 - 716, The early-return branch that handles existing document hashes (the if that calls self.check_document_exists(...)) marks file_task as COMPLETED and increments upload_task.successful_files but fails to increment upload_task.processed_files; update that branch in the same block (where file_task.status, file_task.result, file_task.updated_at are set) to also increment upload_task.processed_files before returning so processed_files reflects the handled file.
src/api/connectors.py (1)
452-458: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Forward `replace_duplicates` in the bucket-filter sync path.
`replace_duplicates` is passed for explicit file selection but dropped for bucket-filter-based syncs, so the same request flag behaves inconsistently across valid ingest paths.
Suggested fix
  task_id = await connector_service.sync_specific_files(
      working_connection.connection_id,
      user.user_id,
      all_file_ids,
      jwt_token=jwt_token,
      ingest_settings=body.settings,
+     replace_duplicates=body.replace_duplicates,
  )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/connectors.py` around lines 452 - 458, The bucket-filter sync call to connector_service.sync_specific_files is dropping the replace_duplicates flag, causing inconsistent behavior; modify the call in connectors.py to forward the flag (e.g., add replace_duplicates=body.replace_duplicates or the equivalent request field) alongside jwt_token and ingest_settings when invoking connector_service.sync_specific_files with working_connection.connection_id, user.user_id, and all_file_ids.
🧹 Nitpick comments (1)
src/utils/file_utils.py (1)
129-131: 💤 Low value
Stale comment references removed behavior.
The comment says "Mirror clean_connector_filename's space/slash -> underscore" but `clean_connector_filename` no longer performs this transformation (it now preserves the filename verbatim). The comment should explain that connector-ingested files may have been sanitized historically or by upstream systems, so aliases must include underscore variants for lookup matching.
📝 Suggested comment update
- # Mirror clean_connector_filename's space/slash -> underscore so lookups also
- # match files indexed through a connector ingestion path.
+ # Connector-ingested files may have spaces/slashes replaced with underscores
+ # by upstream systems. Include underscore variants so lookups match both forms.
  aliases.extend(name.replace(" ", "_").replace("/", "_") for name in list(aliases))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/file_utils.py` around lines 129 - 131, The existing comment above the aliases.extend(...) line is stale because clean_connector_filename no longer replaces spaces/slashes with underscores; update that comment to state that connector-ingested filenames may have been sanitized historically or by upstream systems so we still generate underscore variants for lookup compatibility, and reference the aliases.extend(name.replace(" ", "_").replace("/", "_") for name in list(aliases)) expression and clean_connector_filename to make clear why the alias variants are kept for matching.
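The underscore-variant aliasing discussed in this nitpick can be sketched in isolation. This is an illustration only: `add_underscore_variants` is a hypothetical helper, not the project's `get_filename_aliases`, and unlike the `aliases.extend(...)` expression quoted above it skips variants that are already present.

```python
def add_underscore_variants(aliases: list[str]) -> list[str]:
    """Return the aliases plus space/slash -> underscore variants, without duplicates."""
    result = list(aliases)
    for name in aliases:
        variant = name.replace(" ", "_").replace("/", "_")
        if variant not in result:
            result.append(variant)
    return result
```

A lookup for `my file.pdf` would then also match a document that was indexed as `my_file.pdf` by an older or upstream sanitizing path.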
const fakeFile = new File([], file.name);
const { exists } = await duplicateCheck(fakeFile);
return { file, isDuplicate: exists };
} catch (err) {
Preserve MIME type when building the duplicate-check probe file.
The duplicate pre-check currently builds a probe File without MIME type, which can diverge from backend filename normalization and miss real duplicates.
Suggested fix
- const fakeFile = new File([], file.name);
+ const fakeFile = new File([], file.name, {
+ type: file.mimeType || "application/octet-stream",
+ });
  const { exists } = await duplicateCheck(fakeFile);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/app/upload/[provider]/page.tsx` around lines 496 - 499, The probe
File used for duplicateCheck is created without preserving the original MIME
type (see fakeFile and duplicateCheck usage), which can cause backend filename
normalization mismatches; update the probe construction to include the original
file's MIME type (use the file.type when creating fakeFile) so the duplicate
pre-check matches real uploads and accurately detects duplicates.
if await self.check_filename_exists(document.filename, opensearch_client):
    if not self.replace_duplicates:
        file_task.status = TaskStatus.FAILED
        file_task.error = f"File with name '{document.filename}' already exists"
        file_task.updated_at = time.time()
        upload_task.failed_files += 1
        return
Missing processed_files increment on early return.
When the filename collision check fails (replace_duplicates=False), the method returns after incrementing failed_files but doesn't increment processed_files. Compare with DocumentFileProcessor.process_item which uses a finally block to ensure processed_files is always incremented. This inconsistency may cause progress tracking issues where processed_files never equals total_files.
🐛 Proposed fix
if await self.check_filename_exists(document.filename, opensearch_client):
if not self.replace_duplicates:
file_task.status = TaskStatus.FAILED
file_task.error = f"File with name '{document.filename}' already exists"
file_task.updated_at = time.time()
upload_task.failed_files += 1
+ upload_task.processed_files += 1
+ upload_task.updated_at = time.time()
return
  await self.delete_document_by_filename(document.filename, opensearch_client)
Alternatively, consider adding a finally block like DocumentFileProcessor to ensure processed_files is always incremented.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/models/processors.py` around lines 556 - 562, When a filename collision
occurs in the code path that checks await
self.check_filename_exists(document.filename, opensearch_client) and
replace_duplicates is False, the method returns after incrementing
upload_task.failed_files but never increments upload_task.processed_files; fix
this by ensuring upload_task.processed_files is incremented on that early return
(or refactor to use a finally block like DocumentFileProcessor.process_item so
processed_files is always incremented regardless of early exits), updating the
block that sets file_task.status/ error and returns to also increment
upload_task.processed_files (or move the increment into a finally that encloses
the entire processing flow).
if await self.check_filename_exists(document.filename, opensearch_client):
    if not self.replace_duplicates:
        file_task.status = TaskStatus.FAILED
        file_task.error = f"File with name '{document.filename}' already exists"
        file_task.updated_at = time.time()
        upload_task.failed_files += 1
        return
Missing processed_files increment on early return (same issue as ConnectorFileProcessor).
Same as the ConnectorFileProcessor issue—when the filename collision check fails, processed_files is not incremented.
🐛 Proposed fix
if await self.check_filename_exists(document.filename, opensearch_client):
if not self.replace_duplicates:
file_task.status = TaskStatus.FAILED
file_task.error = f"File with name '{document.filename}' already exists"
file_task.updated_at = time.time()
upload_task.failed_files += 1
+ upload_task.processed_files += 1
+ upload_task.updated_at = time.time()
return
  await self.delete_document_by_filename(document.filename, opensearch_client)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/models/processors.py` around lines 692 - 698, When filename collision is
detected in the block that calls await
self.check_filename_exists(document.filename, opensearch_client) and you take
the early return because not self.replace_duplicates, increment
upload_task.processed_files (same fix as in ConnectorFileProcessor) before
setting file_task.status/updated_at and returning; ensure you update
upload_task.processed_files and persist any state changes to upload_task in the
same branch so processed_files reflects the skipped file.
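The `finally`-based alternative mentioned in these comments can be sketched as follows. This is a simplified, synchronous illustration under assumed names (`UploadCounters`, `process_one`); it is not `DocumentFileProcessor.process_item`, and it assumes no other code path also increments `processed_files`, since that would double count.

```python
from dataclasses import dataclass


@dataclass
class UploadCounters:
    """Hypothetical stand-in for the progress fields on an upload task."""

    processed_files: int = 0
    successful_files: int = 0
    failed_files: int = 0


def process_one(counters: UploadCounters, filename_exists: bool, replace_duplicates: bool) -> str:
    try:
        if filename_exists and not replace_duplicates:
            counters.failed_files += 1
            return "failed"  # the early return still runs the finally block
        counters.successful_files += 1
        return "completed"
    finally:
        # Runs on every exit path, so processed_files can reach the total
        counters.processed_files += 1
```

Placing the increment in one `finally` block avoids having to remember it in each early-return branch, at the cost of wrapping the whole processing flow in a `try`.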
* Add filename-based duplicate handling for connectors
  Add end-to-end support for filename-based duplicate handling on connector ingests. Frontend: send a new replace_duplicates flag with connector sync requests, perform a pre-sync duplicate check, and show a DuplicateHandlingDialog that lets users overwrite or skip duplicates when uploading from provider UI. Backend: propagate replace_duplicates through connector_router, request models, and connector services into the file processors. ConnectorFileProcessor and LangflowConnectorFileProcessor now check whether a filename already exists in the index and either fail the file task or delete the existing document before ingesting when replace_duplicates is true. Utilities/tests: clean_connector_filename now preserves original spacing/slashes and only enforces MIME-mapped extensions; get_filename_aliases adds underscore/sanitized variants so lookups match connector-indexed names. Add unit tests covering filename dedupe logic and filename alias behavior.
* Use duplicateNames list and display names
  Replace numeric duplicateCount with a duplicateNames string[] across upload and dropdown flows so the UI can show the actual file names that would be overwritten. The duplicate-handling dialog now accepts duplicateNames, derives an effective count, and lists up to 5 duplicate filenames with an "… and N more" indicator; message labels and button text use the effective count. Toast messages and pending state in upload/[provider]/page.tsx and knowledge-dropdown.tsx were updated to pass and consume duplicateNames and to use duplicateNames.length for counts.
* Update page.tsx
* style: ruff autofix (auto)
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* fix: Ensure SUCCESS status requires fetchable result in DoclingPollingService * style: ruff autofix (auto) * fix: Catch specific DoclingServeError when fetching task result after SUCCESS status * feat: update style for oss of the failed task in the task panel (#1647) * update style for oss of the failed task in the task panel * keep logic on click, remove unecessary useeffect * fix padding * wip implementing Saas style * utils to reshape error until backend provide info we need * utils to reshape error until backend provide info we need * utils to reshape error until backend provide info we need and fixinf fallbacks of isTotalFailure * utils to reshape error until backend provide into * have Saas style for failed and complete labelstatus and width and border * few style adjustment to follow codebase pattern * adjust succeed and partially succeed case * adding comment for TODO implementation or more clarity * remove carbon icon package and replace carbon icon * add incident-reporter-icon --------- Co-authored-by: Olfa Maslah <olfamaslah@Olfas-MacBook-Pro.local> * fix: Encode IBM API key as Basic auth header (#1664) * Encode IBM API key as Basic auth header Add base64 encoding for the IBM auth path: import base64, construct a Basic auth token from X-Username and X-Api-Key (username:apikey), and store it in user.jwt_token and user.opensearch_credentials. Also set request.state.user before attaching the DB user ID so downstream code can access the created user object. 
* style: ruff autofix (auto) --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: restart deployment if env changes (#1665) * restart deployment if env changes * unit test * lint * fix: Ensure Langflow .env variable definitions from LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT (#1667) * Ensure we dynamically update the list of Langflow .env environment variables with default values when the comma separated list defined in LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT changes * fix tests * fix additional linting errors --------- Co-authored-by: rodageve <rodrigo.geve@datastax.com> * chore: Retire openrag-mcp; switch docs to streamable HTTP (#1668) * Retire openrag-mcp; switch docs to streamable HTTP Remove the stdio-based MCP server and all in-repo MCP tooling, and update README to mark the package as retired. Deleted module files include the MCP entrypoint, server, config, registry and individual tools (chat, search, documents, settings). The README was rewritten to announce that openrag-mcp is retired, explain migration to the built-in streamable-HTTP /mcp endpoint, update Cursor/Claude examples to use URL+headers auth, list the new v1 API tools, and note that the last PyPI release is final. This change consolidates MCP functionality into the OpenRAG core and removes the subprocess/stdio implementation and its source code. * Mark MCP SDK retired and clean package metadata Update package metadata to reflect retirement and integration into the OpenRAG backend. Bump version to 0.3.0 and replace the project description with a retirement/migration note. Set Development Status to Inactive, remove explicit Python version classifiers, and clear runtime dependencies and the CLI script entrypoint. Also remove the hatch env pip-args setting; build-system and wheel package target remain unchanged. 
* chore: update uv.lock files after version bump * Update uv.lock --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * fix: connector sync and override feature (#1663) * Add filename-based duplicate handling for connectors Add end-to-end support for filename-based duplicate handling on connector ingests. Frontend: send a new replace_duplicates flag with connector sync requests, perform a pre-sync duplicate check, and show a DuplicateHandlingDialog that lets users overwrite or skip duplicates when uploading from provider UI. Backend: propagate replace_duplicates through connector_router, request models, and connector services into the file processors. ConnectorFileProcessor and LangflowConnectorFileProcessor now check whether a filename already exists in the index and either fail the file task or delete the existing document before ingesting when replace_duplicates is true. Utilities/tests: clean_connector_filename now preserves original spacing/slashes and only enforces MIME-mapped extensions; get_filename_aliases adds underscore/sanitized variants so lookups match connector-indexed names. Add unit tests covering filename dedupe logic and filename alias behavior. * Use duplicateNames list and display names Replace numeric duplicateCount with a duplicateNames string[] across upload and dropdown flows so the UI can show the actual file names that would be overwritten. The duplicate-handling dialog now accepts duplicateNames, derives an effective count, and lists up to 5 duplicate filenames with an "… and N more" indicator; message labels and button text use the effective count. Toast messages and pending state in upload/[provider]/page.tsx and knowledge-dropdown.tsx were updated to pass and consume duplicateNames and to use duplicateNames.length for counts. 
* Update page.tsx * style: ruff autofix (auto) --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: update OAuth prompt to consent for connector mutation (#1657) * fix: implement transient error handling for Docling result fetch * style: ruff autofix (auto) * refactor: remove unused import of Optional in docling_polling_service.py * refactor: change PollOutcome to use StrEnum for better type safety * refactor: enhance task status endpoints with structured failure metadata * style: ruff autofix (auto) * revert "style: ruff autofix (auto)" This reverts commit bc8be33. * style: ruff autofix (auto) * fix: Ensure SUCCESS status requires fetchable result in DoclingPollingService * style: ruff autofix (auto) * fix: Catch specific DoclingServeError when fetching task result after SUCCESS status * fix: implement transient error handling for Docling result fetch * style: ruff autofix (auto) * refactor: remove unused import of Optional in docling_polling_service.py * refactor: change PollOutcome to use StrEnum for better type safety * refactor: enhance task status endpoints with structured failure metadata * style: ruff autofix (auto) * revert "style: ruff autofix (auto)" This reverts commit bc8be33. 
* style: ruff autofix (auto) * Update tests/unit/test_task_service_get_task_status2.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * style: ruff autofix (auto) * fix: handle timeout during Docling result fetch after SUCCESS status * fix: update task status checks to use enum values for consistency * fix: enhance failure metadata for duplicate file errors in ingestion * style: ruff autofix (auto) --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Wallgau <46035189+Wallgau@users.noreply.github.com> Co-authored-by: Olfa Maslah <olfamaslah@Olfas-MacBook-Pro.local> Co-authored-by: Edwin Jose <edwin.jose@datastax.com> Co-authored-by: ming <itestmycode@gmail.com> Co-authored-by: rodageve <78763007+rodageve@users.noreply.github.com> Co-authored-by: rodageve <rodrigo.geve@datastax.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary by CodeRabbit
Release Notes
New Features
- `replace_duplicates` flag to control duplicate handling behavior during file uploads and syncs.
Tests