fix: Jieba keyword extraction for hyphenated and underscore identifiers by mkumm14 · Pull Request #33701 · langgenius/dify

mkumm14 · 2026-03-18T21:51:30Z

Summary

Fixes a bug in Economy-mode (keyword-based) indexing where compound identifiers containing hyphens or underscores — such as st-771 or model_function_description — could not be recalled by keyword search.

Root cause: Jieba's tokeniser splits these identifiers into separate tokens (st, 771, function, description) before TF-IDF scoring. The whole term was never indexed as a unit, so an exact keyword search for st-771 returned no results.

Three-layer fix in JiebaKeywordTableHandler:

Raw-text scan — after TF-IDF runs, scan the original text with r"[a-zA-Z0-9]+(?:[_-][a-zA-Z0-9]+)+" to recover all hyphen- and underscore-joined terms directly, bypassing jieba's segmentation entirely.

Subtoken suppression — remove the loose parts (st, 771, function) from the keyword set when the whole compound term is already present, preventing index pollution with overly generic single-token matches.

Expand-tokens regex — change r"\w+" to r"\w+(?:-\w+)*" in _expand_tokens_with_subtokens so the expansion step never re-splits a hyphenated term that survived into the keyword set.
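
The first two layers can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `recover_compounds` is a hypothetical helper, the `segmented` set stands in for whatever jieba would emit, and dropping every part of a compound is an assumption (the description only names st, 771 and function as examples). Only the regular expression is taken from the description above.

```python
import re

# Pattern quoted in the PR description: two or more alphanumeric runs
# joined by hyphens or underscores.
COMPOUND_RE = re.compile(r"[a-zA-Z0-9]+(?:[_-][a-zA-Z0-9]+)+")


def recover_compounds(text: str, keywords: set[str]) -> set[str]:
    """Hypothetical helper: add compound terms and drop their loose parts."""
    # Layer 1: scan the raw text, independent of any segmenter output.
    compounds = set(COMPOUND_RE.findall(text))
    # Layer 2: collect the parts of each compound and remove them from the
    # keyword set (assumption: all parts are dropped, not just some).
    parts = {part for term in compounds for part in re.split(r"[_-]", term)}
    return (keywords - parts) | compounds


# Stand-in for keywords a segmenter might produce after splitting identifiers.
segmented = {"ticket", "st", "771", "model", "function", "description"}
text = "Ticket st-771 changes model_function_description."
result = recover_compounds(text, segmented)
print(sorted(result))
```

In this sketch the compound terms replace their fragments, so an exact search for st-771 has a keyword to match while a bare 771 no longer points at this text.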

Fixes #33621, fixes #28237, related to #11908

Checklist

This change requires a documentation update, included: Dify Document
I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.
I ran make lint and make type-check (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

gemini-code-assist · 2026-03-18T21:51:46Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the Economy-mode keyword extraction, specifically for compound identifiers containing hyphens or underscores. Previously, Jieba's tokenization would split these identifiers, making them unsearchable as whole units. The changes introduce a robust, multi-layered fix to ensure that such technical terms are correctly indexed and retrievable, significantly improving the accuracy of keyword-based search for these types of identifiers.

Highlights

Raw-text scan for compound identifiers: Implemented a raw-text scan using a regular expression to directly recover hyphenated and underscore-joined terms, bypassing Jieba's default segmentation.
Subtoken suppression: Added logic to remove individual sub-tokens (e.g., 'st', '771') from the keyword set when the complete compound term (e.g., 'st-771') is already present, preventing index pollution.
Expanded tokens regex update: Modified the regex in _expand_tokens_with_subtokens from r"\w+" to r"\w+(?:-\w+)*" to ensure that hyphenated terms are treated as atomic units and are not re-split during keyword expansion.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request fixes a bug where compound identifiers with hyphens or underscores were not correctly indexed, by implementing a three-part fix: adjusting regex for tokenization, explicitly extracting compound terms, and suppressing their subtokens. The changes are logical and well-tested. I've found a couple of areas for improvement: one is a performance issue related to stopword checking, and the other is a potential regression in Unicode support in one of the new regular expressions, which is critical for a library like Jieba.
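
The review above does not quote the regex it has in mind. Assuming it refers to the ASCII-only raw-text pattern from the PR description, the sketch below shows how that pattern skips non-ASCII compounds. `UNICODE_COMPOUND_RE` is a hypothetical alternative written for illustration, not something proposed in this thread.

```python
import re

# ASCII-only pattern quoted in the PR description.
ASCII_COMPOUND_RE = re.compile(r"[a-zA-Z0-9]+(?:[_-][a-zA-Z0-9]+)+")
# Hypothetical Unicode-aware variant: [^\W_] matches any Unicode letter or
# digit but not the underscore, so "_" still acts as a joiner.
UNICODE_COMPOUND_RE = re.compile(r"[^\W_]+(?:[_-][^\W_]+)+")

text = "See 订单-771, модель_v2 and st-771."

ascii_hits = ASCII_COMPOUND_RE.findall(text)
unicode_hits = UNICODE_COMPOUND_RE.findall(text)

print(ascii_hits)    # only the pure-ASCII compound
print(unicode_hits)  # all three compounds
```

Whether the wider pattern is desirable depends on how jieba already handles CJK text next to hyphens, which this sketch does not model.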

Copilot

Pull request overview

Fixes keyword-based (Economy-mode) indexing/recall for hyphenated/underscore compound identifiers by preserving/recovering them as atomic keywords throughout extraction and query expansion.

Changes:

Update fallback tokenization and token expansion regex to keep hyphenated identifiers intact.
Enhance extract_keywords() to recover _/- joined compound terms from raw text and suppress their split subtokens.
Add unit tests covering hyphenated/underscore identifier extraction and expansion behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py` | Preserve hyphenated tokens in fallback/expansion and add raw-text compound-term recovery + subtoken suppression. |
| `api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py` | New unit tests validating hyphen/underscore identifier behavior across extraction and expansion. |

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot

Pull request overview

Fixes Economy-mode keyword indexing/retrieval for hyphenated and underscore compound identifiers (e.g., st-771, model_function_description) by preserving compounds through TF-IDF fallback tokenization, raw-text compound recovery, and query/index token expansion.

Changes:

Update fallback tokenization and query expansion regex to keep hyphenated identifiers intact.
Post-process TF-IDF keywords by recovering _/- compound terms directly from raw text and suppressing their split subtokens.
Add unit tests covering hyphen/underscore compound preservation and recovery behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py` | Adjusts tokenization/expansion regex and adds raw-text compound recovery + subtoken suppression to improve exact compound recall. |
| `api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py` | Adds tests to validate compound preservation in expansion and compound recovery in extraction. |

Copilot

Pull request overview

Fixes Economy-mode keyword indexing/search for compound identifiers containing hyphens/underscores (e.g. st-771, model_function_description) by preserving them as atomic keywords during extraction and query expansion.

Changes:

Update fallback tokenization and token expansion regexes to keep hyphenated identifiers intact.
Extend extract_keywords() with a raw-text regex pass to add hyphen/underscore compound terms into the keyword set (plus an attempted subtoken cleanup step).
Add unit tests covering hyphen/underscore handling and query-expansion behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py` | Preserves hyphenated tokens during fallback/expansion; adds raw-text compound term recovery during keyword extraction. |
| `api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py` | New tests for hyphen/underscore keyword handling and expansion behavior. |

…eba keyword extraction

Jieba's tokeniser splits compound identifiers like 'st-771' (hyphen) and 'model_function_description' (underscore) into separate tokens before TF-IDF scoring. This caused Economy-mode keyword search to fail for any technical identifier containing these characters: the terms were never indexed as whole units and could not be recalled by exact keyword match.

Three-layer fix in JiebaKeywordTableHandler:

1. Raw-text scan in extract_keywords: after TF-IDF runs, scan the original text with r"[a-zA-Z0-9]+(?:[_-][a-zA-Z0-9]+)+" to recover all hyphen- and underscore-joined terms, bypassing jieba's segmentation.
2. Subtoken suppression: remove loose parts ('st', '771', 'function') from the keyword set when the whole compound term is already present.
3. Expand-tokens regex: change r"\w+" to r"\w+(?:-\w+)*" in _expand_tokens_with_subtokens so expansion never re-splits a hyphenated term that survived into the keyword set.

Fixes langgenius#33621, fixes langgenius#28237, related to langgenius#11908

Copilot

Pull request overview

Fixes Economy-mode keyword indexing/search for compound identifiers containing hyphens/underscores (e.g., st-771, model_function_description) by preventing unintended token splitting and explicitly recovering these terms from raw text so exact keyword recall works.

Changes:

Update regex tokenization to keep hyphenated tokens intact (\w+(?:-\w+)*) in both fallback TF-IDF and query/token expansion.
Add raw-text regex scan in extract_keywords() to recover hyphen/underscore-joined identifiers into the keyword set.
Add unit tests covering hyphen/underscore handling and expansion behavior.
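
The effect of the regex change in the first bullet can be checked in isolation. The snippet below is only an illustration using Python's `re` module; the query string and variable names are made up, and the surrounding handler code is not reproduced.

```python
import re

OLD_TOKEN_RE = re.compile(r"\w+")            # pattern before the change
NEW_TOKEN_RE = re.compile(r"\w+(?:-\w+)*")   # pattern after the change

# Hypothetical query mixing hyphenated and underscore identifiers.
query = "st-771 model_function_description gpt-4-turbo"

old_tokens = OLD_TOKEN_RE.findall(query)
new_tokens = NEW_TOKEN_RE.findall(query)

print(old_tokens)
print(new_tokens)
```

Because `\w` already includes the underscore, model_function_description stays whole under both patterns; only the hyphenated terms are affected by the new one.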

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py` | Adjusts tokenization/expansion and adds raw-text recovery for compound identifiers during keyword extraction. |
| `api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py` | Adds regression/unit tests ensuring hyphenated and underscore identifiers remain searchable and aren't re-split during expansion. |

github-actions · 2026-03-20T02:25:59Z

Pyrefly Diff

base → PR

--- /tmp/pyrefly_base.txt	2026-03-20 02:25:43.289769070 +0000
+++ /tmp/pyrefly_pr.txt	2026-03-20 02:25:34.043736758 +0000
@@ -115,11 +115,11 @@
 ERROR Pyrefly detected conflicting types while breaking a dependency cycle: `str | None` is not assignable to `None`. Adding explicit type annotations might possibly help. [bad-assignment]
   --> core/prompt/utils/extract_thread_messages.py:11:5
 ERROR Object of class `NoneType` has no attribute `data_source_type` [missing-attribute]
-   --> core/rag/datasource/keyword/jieba/jieba.py:143:36
+   --> core/rag/datasource/keyword/jieba/jieba.py:137:36
 ERROR Object of class `NoneType` has no attribute `keyword_table` [missing-attribute]
-   --> core/rag/datasource/keyword/jieba/jieba.py:145:13
+   --> core/rag/datasource/keyword/jieba/jieba.py:139:13
 ERROR Cannot index into `set[Any]` [bad-index]
-   --> core/rag/datasource/keyword/jieba/jieba.py:158:29
+   --> core/rag/datasource/keyword/jieba/jieba.py:152:29
 ERROR No matching overload found for function `list.__init__` called with arguments: (object) [no-matching-overload]
   --> core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py:88:34
 ERROR Class member `AnalyticdbVector.add_texts` overrides parent class `BaseVector` in an inconsistent manner [bad-override]
@@ -399,50 +399,50 @@
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
    --> core/rag/extractor/notion_extractor.py:368:21
 ERROR Cannot index into `Generator[Unknown]` [bad-index]
-   --> core/rag/extractor/watercrawl/client.py:210:54
+   --> core/rag/extractor/watercrawl/client.py:188:54
 ERROR Cannot index into `bytes` [bad-index]
-   --> core/rag/extractor/watercrawl/client.py:210:54
+   --> core/rag/extractor/watercrawl/client.py:188:54
 ERROR Cannot index into `list[Unknown]` [bad-index]
-   --> core/rag/extractor/watercrawl/client.py:210:54
+   --> core/rag/extractor/watercrawl/client.py:188:54
 ERROR `None` is not subscriptable [unsupported-operation]
-   --> core/rag/extractor/watercrawl/client.py:210:54
+   --> core/rag/extractor/watercrawl/client.py:188:54
 ERROR Object of class `Generator` has no attribute `get`
 ERROR Cannot index into `Generator[Unknown]` [bad-index]
-  --> core/rag/extractor/watercrawl/provider.py:70:12
+  --> core/rag/extractor/watercrawl/provider.py:47:12
 ERROR Cannot index into `bytes` [bad-index]
-  --> core/rag/extractor/watercrawl/provider.py:70:12
+  --> core/rag/extractor/watercrawl/provider.py:47:12
 ERROR Cannot index into `list[Unknown]` [bad-index]
-  --> core/rag/extractor/watercrawl/provider.py:70:12
+  --> core/rag/extractor/watercrawl/provider.py:47:12
 ERROR `None` is not subscriptable [unsupported-operation]
-  --> core/rag/extractor/watercrawl/provider.py:70:12
+  --> core/rag/extractor/watercrawl/provider.py:47:12
 ERROR Object of class `Generator` has no attribute `get`
 ERROR Object of class `Generator` has no attribute `get`
 ERROR Object of class `Generator` has no attribute `get`
 ERROR Object of class `Generator` has no attribute `get`
-ERROR Argument `Generator[Unknown] | bytes | dict[Unknown, Unknown] | list[Unknown] | Unknown | None` is not assignable to parameter `result_object` with type `dict[str, Any]` in function `WaterCrawlProvider._structure_data` [bad-argument-type]
-   --> core/rag/extractor/watercrawl/provider.py:110:37
+ERROR Argument `Generator[Unknown] | bytes | dict[Unknown, Unknown] | list[Unknown] | Unknown | None` is not assignable to parameter `result_object` with type `dict[Unknown, Unknown]` in function `WaterCrawlProvider._structure_data` [bad-argument-type]
+  --> core/rag/extractor/watercrawl/provider.py:87:37
 ERROR Cannot index into `Generator[Unknown]` [bad-index]
-   --> core/rag/extractor/watercrawl/provider.py:135:20
+   --> core/rag/extractor/watercrawl/provider.py:110:20
 ERROR Cannot index into `bytes` [bad-index]
-   --> core/rag/extractor/watercrawl/provider.py:135:20
+   --> core/rag/extractor/watercrawl/provider.py:110:20
 ERROR Cannot index into `list[Unknown]` [bad-index]
-   --> core/rag/extractor/watercrawl/provider.py:135:20
+   --> core/rag/extractor/watercrawl/provider.py:110:20
 ERROR `None` is not subscriptable [unsupported-operation]
-   --> core/rag/extractor/watercrawl/provider.py:135:20
+   --> core/rag/extractor/watercrawl/provider.py:110:20
 ERROR Cannot index into `Generator[Unknown]` [bad-index]
-   --> core/rag/extractor/watercrawl/provider.py:141:16
+   --> core/rag/extractor/watercrawl/provider.py:116:16
 ERROR Cannot index into `bytes` [bad-index]
-   --> core/rag/extractor/watercrawl/provider.py:141:16
+   --> core/rag/extractor/watercrawl/provider.py:116:16
 ERROR Cannot index into `list[Unknown]` [bad-index]
-   --> core/rag/extractor/watercrawl/provider.py:141:16
+   --> core/rag/extractor/watercrawl/provider.py:116:16
 ERROR `None` is not subscriptable [unsupported-operation]
-   --> core/rag/extractor/watercrawl/provider.py:141:16
+   --> core/rag/extractor/watercrawl/provider.py:116:16
 ERROR Pyrefly detected conflicting types while breaking a dependency cycle: `str | Any | None` is not assignable to `None`. Adding explicit type annotations might possibly help. [bad-assignment]
-   --> core/rag/extractor/word_extractor.py:372:13
+   --> core/rag/extractor/word_extractor.py:371:13
 ERROR Pyrefly detected conflicting types while breaking a dependency cycle: `str | Any | None` is not assignable to `None`. Adding explicit type annotations might possibly help. [bad-assignment]
-   --> core/rag/extractor/word_extractor.py:393:25
+   --> core/rag/extractor/word_extractor.py:392:25
 ERROR Object of class `BaseOxmlElement` has no attribute `body` [missing-attribute]
-   --> core/rag/extractor/word_extractor.py:426:24
+   --> core/rag/extractor/word_extractor.py:425:24
 ERROR Object of class `Document` has no attribute `score` [missing-attribute]
    --> core/rag/index_processor/processor/paragraph_index_processor.py:194:33
 ERROR Object of class `Document` has no attribute `score` [missing-attribute]
@@ -586,7 +586,7 @@
 ERROR Argument `Literal['normal']` is not assignable to parameter `value` with type `AppStatus | SQLCoreOperations[AppStatus]` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
   --> tests/integration_tests/controllers/console/app/test_chat_message_permissions.py:30:22
 ERROR Argument `Literal['normal']` is not assignable to parameter `value` with type `AppStatus | SQLCoreOperations[AppStatus]` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
-  --> tests/integration_tests/controllers/console/app/test_feedback_export_api.py:32:22
+  --> tests/integration_tests/controllers/console/app/test_feedback_export_api.py:31:22
 ERROR Argument `Literal['normal']` is not assignable to parameter `value` with type `AppStatus | SQLCoreOperations[AppStatus]` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
   --> tests/integration_tests/controllers/console/app/test_model_config_permissions.py:28:22
 ERROR Attribute `current_tenant_id` of class `Account` is a read-only property and cannot be set [read-only]
@@ -606,15 +606,15 @@
 ERROR `unpatch` may be uninitialized [unbound-name]
   --> tests/integration_tests/plugin/__mock/http.py:62:9
 ERROR Object of class `TestWorkflowDraftVariableServiceResetVariable` has no attribute `_test_user_id` [missing-attribute]
-   --> tests/integration_tests/services/test_workflow_draft_variable_service.py:610:21
+   --> tests/integration_tests/services/test_workflow_draft_variable_service.py:609:21
 ERROR Object of class `TestWorkflowDraftVariableServiceResetVariable` has no attribute `_test_user_id` [missing-attribute]
-   --> tests/integration_tests/services/test_workflow_draft_variable_service.py:620:21
+   --> tests/integration_tests/services/test_workflow_draft_variable_service.py:619:21
 ERROR Object of class `TestWorkflowDraftVariableServiceResetVariable` has no attribute `_test_user_id` [missing-attribute]
-   --> tests/integration_tests/services/test_workflow_draft_variable_service.py:631:21
+   --> tests/integration_tests/services/test_workflow_draft_variable_service.py:630:21
 ERROR Object of class `TestWorkflowDraftVariableServiceResetVariable` has no attribute `_test_user_id` [missing-attribute]
-   --> tests/integration_tests/services/test_workflow_draft_variable_service.py:640:21
+   --> tests/integration_tests/services/test_workflow_draft_variable_service.py:639:21
 ERROR Object of class `TestWorkflowDraftVariableServiceResetVariable` has no attribute `_test_user_id` [missing-attribute]
-   --> tests/integration_tests/services/test_workflow_draft_variable_service.py:806:21
+   --> tests/integration_tests/services/test_workflow_draft_variable_service.py:805:21
 ERROR Argument `Index | None` is not assignable to parameter `index` with type `Index` in function `tcvectordb.rpc.model.collection.RPCCollection.__init__` [bad-argument-type]
   --> tests/integration_tests/vdb/__mock/tcvectordb.py:90:13
 ERROR Argument `xinference_client.types.Embedding | None` is not assignable to parameter `embedding` with type `tcvectordb.model.collection_view.Embedding` in function `tcvectordb.rpc.model.collection.RPCCollection.__init__` [bad-argument-type]
@@ -940,31 +940,31 @@
 ERROR Argument `Literal['archive']` is not assignable to parameter `value` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
     --> tests/test_containers_integration_tests/services/test_account_service.py:3334:25
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:134:38
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:133:38
 ERROR `str` is not assignable to attribute `agent_mode` with type `Never` [bad-assignment]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:138:47
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:137:47
 ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:249:29
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:248:29
 ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:275:29
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:274:29
 ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:564:29
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:563:29
 ERROR Argument `str | None` is not assignable to parameter `created_by` with type `SQLCoreOperations[str] | str` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:565:24
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:564:24
 ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:799:29
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:798:29
 ERROR Argument `str | None` is not assignable to parameter `created_by` with type `SQLCoreOperations[str] | str` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:800:24
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:799:24
 ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:894:29
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:893:29
 ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:963:29
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:962:29
 ERROR Argument `str | None` is not assignable to parameter `created_by` with type `SQLCoreOperations[str] | str` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/services/test_agent_service.py:964:24
+   --> tests/test_containers_integration_tests/services/test_agent_service.py:963:24
 ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-    --> tests/test_containers_integration_tests/services/test_agent_service.py:1004:29
+    --> tests/test_containers_integration_tests/services/test_agent_service.py:1003:29
 ERROR Argument `str | None` is not assignable to parameter `created_by` with type `SQLCoreOperations[str] | str` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
-    --> tests/test_containers_integration_tests/services/test_agent_service.py:1005:24
+    --> tests/test_containers_integration_tests/services/test_agent_service.py:1004:24
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
    --> tests/test_containers_integration_tests/services/test_annotation_service.py:101:38
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
@@ -1162,19 +1162,19 @@
 ERROR Argument `Literal['owner']` is not assignable to parameter `role` with type `SQLCoreOperations[TenantAccountRole] | TenantAccountRole` in function `models.account.TenantAccountJoin.__init__` [bad-argument-type]
   --> tests/test_containers_integration_tests/services/test_end_user_service.py:37:18
 ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
-  --> tests/test_containers_integration_tests/services/test_file_service.py:69:20
+  --> tests/test_containers_integration_tests/services/test_file_service.py:68:20
 ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
-  --> tests/test_containers_integration_tests/services/test_file_service.py:78:20
+  --> tests/test_containers_integration_tests/services/test_file_service.py:77:20
 ERROR Argument `str` is not assignable to parameter `role` with type `SQLCoreOperations[TenantAccountRole] | TenantAccountRole` in function `models.account.TenantAccountJoin.__init__` [bad-argument-type]
   --> tests/test_containers_integration_tests/services/test_human_input_delivery_test.py:32:18
 ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
-  --> tests/test_containers_integration_tests/services/test_message_export_service.py:53:20
+  --> tests/test_containers_integration_tests/services/test_message_export_service.py:52:20
 ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
-  --> tests/test_containers_integration_tests/services/test_message_export_service.py:58:63
+  --> tests/test_containers_integration_tests/services/test_message_export_service.py:57:63
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
-   --> tests/test_containers_integration_tests/services/test_message_service.py:119:38
+   --> tests/test_containers_integration_tests/services/test_message_service.py:118:38
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
-   --> tests/test_containers_integration_tests/services/test_message_service.py:122:81
+   --> tests/test_containers_integration_tests/services/test_message_service.py:121:81
 ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
    --> tests/test_containers_integration_tests/services/test_messages_clean_service.py:112:20
 ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
@@ -1417,6 +1417,28 @@
   --> tests/test_containers_integration_tests/services/tools/test_mcp_tools_manage_service.py:63:20
 ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
   --> tests/test_containers_integration_tests/services/tools/test_mcp_tools_manage_service.py:72:20
+ERROR Unexpected keyword argument `icon_dark` in function `models.tools.ApiToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:51:17
+ERROR Unexpected keyword argument `credentials` in function `models.tools.ApiToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:54:17
+ERROR Unexpected keyword argument `provider_type` in function `models.tools.ApiToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:55:17
+ERROR Unexpected keyword argument `description` in function `models.tools.BuiltinToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:60:17
+ERROR Unexpected keyword argument `icon` in function `models.tools.BuiltinToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:61:17
+ERROR Unexpected keyword argument `icon_dark` in function `models.tools.BuiltinToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:62:17
+ERROR Unexpected keyword argument `credentials` in function `models.tools.BuiltinToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:66:17
+ERROR Unexpected keyword argument `icon_dark` in function `models.tools.WorkflowToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:73:17
+ERROR Unexpected keyword argument `workflow_id` in function `models.tools.WorkflowToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:76:17
+ERROR Unexpected keyword argument `description` in function `models.tools.MCPToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:81:17
+ERROR Unexpected keyword argument `provider_icon` in function `models.tools.MCPToolProvider.__init__` [unexpected-keyword]
+  --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:82:17
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
    --> tests/test_containers_integration_tests/services/tools/test_workflow_tools_manage_service.py:109:38
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
@@ -1512,57 +1534,57 @@
 ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
   --> tests/test_containers_integration_tests/tasks/test_add_document_to_index_task.py:62:20
 ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
-  --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:71:20
+  --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:70:20
 ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
-  --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:80:20
+  --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:79:20
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
-   --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:293:23
+   --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:292:23
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
-   --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:667:27
+   --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:666:27
 ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
-  --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:96:20
+  --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:95:20
 ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:105:20
+   --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:104:20
 ERROR `>` is not supported between `None` and `Literal[0]` [unsupported-operation]
-   --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:312:16
+   --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:311:16
 ERROR `>` is not supported between `None` and `Literal[0]` [unsupported-operation]
-   --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:704:16
+   --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:703:16
 ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:111:20
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:110:20
 ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:121:20
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:120:20
 ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:296:32
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:295:32
 ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:299:22
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:298:22
 ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:408:32
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:407:32
 ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:411:22
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:410:22
 ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:476:36
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:475:36
 ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:479:26
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:478:26
 ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:531:32
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:530:32
 ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:534:22
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:533:22
 ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:616:36
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:615:36
 ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:619:26
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:618:26
 ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:729:32
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:728:32
 ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:732:22
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:731:22
 ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:815:32
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:814:32
 ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:818:22
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:817:22
 ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:970:32
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:969:32
 ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
-   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:973:22
+   --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:972:22
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
   --> tests/test_containers_integration_tests/tasks/test_clean_notion_document_task.py:89:23
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
@@ -1944,7 +1966,7 @@
 ERROR Object of class `tuple` has no attribute `json` [missing-attribute]
    --> tests/unit_tests/controllers/console/auth/test_token_refresh.py:192:16
 ERROR Object of class `BadRequest` has no attribute `data` [missing-attribute]
-    --> tests/unit_tests/controllers/console/datasets/test_datasets.py:1635:16
+    --> tests/unit_tests/controllers/console/datasets/test_datasets.py:1634:16
 ERROR Module `services.errors.chunk` exists, but was not imported explicitly. You are relying on other modules to load it. [implicit-import]
    --> tests/unit_tests/controllers/console/datasets/test_datasets_segments.py:891:29
 ERROR Module `services.errors.chunk` exists, but was not imported explicitly. You are relying on other modules to load it. [implicit-import]
@@ -2076,29 +2098,29 @@
 ERROR `SimpleNamespace` is not assignable to attribute `request` with type `Request` [bad-assignment]
    --> tests/unit_tests/controllers/files/test_upload.py:170:26
 ERROR Missing argument `tenant_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:171:44
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:168:44
 ERROR Missing argument `user_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:171:44
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:168:44
 ERROR Missing argument `tenant_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:188:31
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:185:31
 ERROR Missing argument `user_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:188:31
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:185:31
 ERROR Missing argument `tenant_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:203:35
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:200:35
 ERROR Missing argument `user_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:203:35
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:200:35
 ERROR Missing argument `tenant_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:225:44
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:222:44
 ERROR Missing argument `user_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:225:44
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:222:44
 ERROR Argument `type[PluginTestPayload]` is not assignable to parameter `payload_type` with type `type[BaseModel]` in function `controllers.inner_api.plugin.wraps.plugin_data` [bad-argument-type]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:253:35
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:250:35
 ERROR Argument `type[PluginTestPayload]` is not assignable to parameter `payload_type` with type `type[BaseModel]` in function `controllers.inner_api.plugin.wraps.plugin_data` [bad-argument-type]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:268:35
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:265:35
 ERROR Argument `type[TestPluginData.test_should_raise_error_on_invalid_payload.InvalidPayload]` is not assignable to parameter `payload_type` with type `type[BaseModel]` in function `controllers.inner_api.plugin.wraps.plugin_data` [bad-argument-type]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:286:35
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:283:35
 ERROR Argument `type[PluginTestPayload]` is not assignable to parameter `payload_type` with type `type[BaseModel]` in function `controllers.inner_api.plugin.wraps.plugin_data` [bad-argument-type]
-   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:299:35
+   --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:296:35
 ERROR `SimpleNamespace` is not assignable to attribute `db` with type `SQLAlchemy` [bad-assignment]
   --> tests/unit_tests/controllers/mcp/test_mcp.py:19:17
 ERROR `SimpleNamespace` is not assignable to attribute `mcp_ns` with type `Namespace` [bad-assignment]
@@ -2122,17 +2144,17 @@
 ERROR Missing argument `session` in function `services.conversation_service.ConversationService.pagination_by_last_id` [missing-argument]
    --> tests/unit_tests/controllers/service_api/app/test_conversation.py:383:59
 ERROR Argument value `Literal[0]` violates Pydantic `ge` constraint `Literal[1]` for field `limit` [bad-argument-type]
-  --> tests/unit_tests/controllers/service_api/app/test_message.py:94:63
+  --> tests/unit_tests/controllers/service_api/app/test_message.py:93:63
 ERROR Argument value `Literal[101]` violates Pydantic `le` constraint `Literal[100]` for field `limit` [bad-argument-type]
-   --> tests/unit_tests/controllers/service_api/app/test_message.py:100:63
+  --> tests/unit_tests/controllers/service_api/app/test_message.py:99:63
 ERROR Argument `str | None` is not assignable to parameter `obj` with type `Sized` in function `len` [bad-argument-type]
-   --> tests/unit_tests/controllers/service_api/app/test_message.py:138:20
+   --> tests/unit_tests/controllers/service_api/app/test_message.py:137:20
 ERROR Argument value `Literal[0]` violates Pydantic `ge` constraint `Literal[1]` for field `page` [bad-argument-type]
-   --> tests/unit_tests/controllers/service_api/app/test_message.py:170:31
+   --> tests/unit_tests/controllers/service_api/app/test_message.py:169:31
 ERROR Argument value `Literal[0]` violates Pydantic `ge` constraint `Literal[1]` for field `limit` [bad-argument-type]
-   --> tests/unit_tests/controllers/service_api/app/test_message.py:183:31
+   --> tests/unit_tests/controllers/service_api/app/test_message.py:182:31
 ERROR Argument value `Literal[102]` violates Pydantic `le` constraint `Literal[101]` for field `limit` [bad-argument-type]
-   --> tests/unit_tests/controllers/service_api/app/test_message.py:188:31
+   --> tests/unit_tests/controllers/service_api/app/test_message.py:187:31
 ERROR Argument `list[dict[str, Any]] | None` is not assignable to parameter `obj` with type `Sized` in function `len` [bad-argument-type]
   --> tests/unit_tests/controllers/service_api/app/test_workflow.py:92:20
 ERROR Argument value `Literal[0]` violates Pydantic `ge` constraint `Literal[1]` for field `page` [bad-argument-type]
@@ -2802,7 +2824,7 @@
 ERROR Argument `SimpleNamespace` is not assignable to parameter `user` with type `Account | EndUser` in function `core.app.apps.common.workflow_response_converter.WorkflowResponseConverter.__init__` [bad-argument-type]
   --> tests/unit_tests/core/app/apps/common/test_workflow_response_converter_resumption.py:32:14
 ERROR Object of class `NoneType` has no attribute `data` [missing-attribute]
-   --> tests/unit_tests/core/app/apps/common/test_workflow_response_converter_truncation.py:588:16
+   --> tests/unit_tests/core/app/apps/common/test_workflow_response_converter_truncation.py:543:16
 ERROR Argument `dict[str, dict[str, str]]` is not assignable to parameter `override_config_dict` with type `AppModelConfigDict | None` in function `core.app.apps.completion.app_config_manager.CompletionAppConfigManager.get_app_config` [bad-argument-type]
   --> tests/unit_tests/core/app/apps/completion/test_completion_app_config_manager.py:29:34
 ERROR Object of class `FunctionType` has no attribute `assert_not_called` [missing-attribute]
@@ -3292,7 +3314,7 @@
 ERROR Object of class `BlobChunkMessage` has no attribute `text`
 ERROR Object of class `BlobChunkMessage` has no attribute `json_object`
 ERROR No matching overload found for function `list.__init__` called with arguments: (Generator[Unknown] | None) [no-matching-overload]
-   --> tests/unit_tests/core/datasource/test_datasource_file_manager.py:404:20
+   --> tests/unit_tests/core/datasource/test_datasource_file_manager.py:403:20
 ERROR Object of class `FunctionType` has no attribute `assert_called_once` [missing-attribute]
   --> tests/unit_tests/core/datasource/test_datasource_manager.py:52:5
 ERROR Argument `SimpleNamespace` is not assignable to parameter `datasource_type` with type `DatasourceProviderType` in function `core.datasource.datasource_manager.DatasourceManager.get_datasource_plugin_provider` [bad-argument-type]
@@ -3324,8 +3346,6 @@
    --> tests/unit_tests/core/datasource/test_datasource_manager.py:573:34
 ERROR Object of class `StreamChunkEvent` has no attribute `node_run_result` [missing-attribute]
    --> tests/unit_tests/core/datasource/test_datasource_manager.py:624:12
-ERROR `in` is not supported between `Literal['Single Page']` and `None` [not-iterable]
-   --> tests/unit_tests/core/datasource/test_website_crawl.py:989:16
 ERROR Argument `Iterator[DatasourceMessage]` is not assignable to parameter `messages` with type `Generator[DatasourceMessage]` in function `core.datasource.utils.message_transformer.DatasourceFileMessageTransformer.transform_datasource_invoke_messages` [bad-argument-type]
   --> tests/unit_tests/core/datasource/utils/test_message_transformer.py:28:26
 ERROR Object of class `BlobChunkMessage` has no attribute `text`
@@ -3956,6 +3976,20 @@
    --> tests/unit_tests/core/prompt/test_simple_prompt_transform.py:410:19
 ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `context_files` with type `list[File] | None` in function `core.prompt.simple_prompt_transform.SimplePromptTransform._get_last_user_message` [bad-argument-type]
    --> tests/unit_tests/core/prompt/test_simple_prompt_transform.py:411:27
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+  --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:10:60
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+  --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:13:60
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+  --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:18:56
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+  --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:23:54
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+  --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:29:43
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+   --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:167:41
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+   --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:175:43
 ERROR Argument `Literal['manhattan']` is not assignable to parameter `distance_function` with type `Literal['cosine', 'euclidean']` in function `core.rag.datasource.vdb.alibabacloud_mysql.alibabacloud_mysql_vector.AlibabaCloudMySQLVectorConfig.__init__` [bad-argument-type]
    --> tests/unit_tests/core/rag/datasource/vdb/alibabacloud_mysql/test_alibabacloud_mysql_vector.py:682:35
 ERROR Argument `list[str]` is not assignable to parameter `docs` with type `Sequence[Document]` in function `core.rag.docstore.dataset_docstore.DatasetDocumentStore.add_documents` [bad-argument-type]
@@ -4086,28 +4120,10 @@
     --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:3769:67
 ERROR `None` is not subscriptable [unsupported-operation]
     --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4009:16
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
-    --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4476:40
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
-    --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4528:40
 ERROR Argument `SimpleNamespace` is not assignable to parameter `metadata_condition` with type `MetadataCondition | None` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
     --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4533:40
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
-    --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4549:36
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
-    --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4579:36
 ERROR Argument `SimpleNamespace` is not assignable to parameter `metadata_condition` with type `MetadataCondition | None` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
     --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4584:36
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
-    --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4592:36
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
-    --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4628:36
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
-    --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4656:36
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
-    --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4714:40
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
-    --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4758:44
 ERROR Argument `Iterator[Any]` is not assignable to parameter `invoke_result` with type `Generator[Unknown]` in function `core.rag.retrieval.router.multi_dataset_react_route.ReactMultiDatasetRouter._handle_invoke_result` [bad-argument-type]
    --> tests/unit_tests/core/rag/retrieval/test_multi_dataset_react_route.py:184:52
 ERROR Argument `None` is not assignable to parameter `text` with type `str` in function `core.rag.splitter.text_splitter.RecursiveCharacterTextSplitter.split_text` [bad-argument-type]
@@ -4923,8 +4939,6 @@
    --> tests/unit_tests/core/workflow/graph_engine/test_table_runner.py:443:23
 ERROR `graph_runtime_state` is uninitialized [unbound-name]
    --> tests/unit_tests/core/workflow/graph_engine/test_table_runner.py:444:37
-ERROR Argument `SimpleNamespace` is not assignable to parameter `node` with type `Node[Unknown]` in function `dify_graph.graph_engine.worker.Worker._build_fallback_failure_event` [bad-argument-type]
-  --> tests/unit_tests/core/workflow/graph_engine/test_worker.py:29:50
 ERROR Argument `dict[str, dict[str, str] | str]` is not assignable to parameter `config` with type `NodeConfigDict` in function `dify_graph.nodes.base.node.Node.__init__` [bad-argument-type]
   --> tests/unit_tests/core/workflow/nodes/answer/test_answer.py:82:16
 ERROR Class member `_Version1._get_retry_config` overrides parent class `Node` in an inconsistent manner [bad-override]
@@ -5115,14 +5129,6 @@
    --> tests/unit_tests/core/workflow/nodes/iteration/iteration_node_spec.py:147:16
 ERROR `dict[str, list[str] | str]` is not assignable to TypedDict key `data` with type `BaseNodeData` [bad-typed-dict-key]
   --> tests/unit_tests/core/workflow/nodes/iteration/test_iteration_child_engine_errors.py:51:21
-ERROR `() -> nullcontext[IExecutionContext]` is not assignable to attribute `_capture_execution_context` with type `(self: IterationNode) -> IExecutionContext` [bad-assignment]
-  --> tests/unit_tests/core/workflow/nodes/iteration/test_parallel_iteration_duration.py:24:39
-ERROR No matching overload found for function `contextlib.nullcontext.__init__` called with arguments: () [no-matching-overload]
-  --> tests/unit_tests/core/workflow/nodes/iteration/test_parallel_iteration_duration.py:24:58
-ERROR `(current: LLMUsage, new: Unknown) -> LLMUsage | Unknown` is not assignable to attribute `_merge_usage` with type `(current: LLMUsage, new_usage: LLMUsage | None) -> LLMUsage` [bad-assignment]
-  --> tests/unit_tests/core/workflow/nodes/iteration/test_parallel_iteration_duration.py:26:25
-ERROR `(*, index: int, item: object, execution_context: object) -> tuple[float, list[NodeRunSucceededEvent], str, dict[Unknown, Unknown], LLMUsage]` is not assignable to attribute `_execute_single_iteration_parallel` with type `(self: IterationNode, index: int, item: object, execution_context: IExecutionContext) -> tuple[float, list[GraphNodeEventBase], object | None, dict[str, Variable], LLMUsage]` [bad-assignment]
-  --> tests/unit_tests/core/workflow/nodes/iteration/test_parallel_iteration_duration.py:44:47
 ERROR Argument `dict[str, dict[str, list[str] | str] | str]` is not assignable to parameter `config` with type `NodeConfigDict` in function `core.workflow.nodes.knowledge_index.knowledge_index_node.KnowledgeIndexNode.__init__` [bad-argument-type]
    --> tests/unit_tests/core/workflow/nodes/knowledge_index/test_knowledge_index_node.py:119:20
 ERROR Argument `dict[str, bool]` is not assignable to parameter `summary_index_setting` with type `SummaryIndexSettingDict | None` in function `core.workflow.nodes.knowledge_index.knowledge_index_node.KnowledgeIndexNode._invoke_knowledge_index` [bad-argument-type]

github-actions · 2026-03-20T02:46:32Z

Pyrefly Diff

No changes detected.

Copilot

Pull request overview

Fixes keyword extraction/expansion for hyphenated and underscore compound identifiers in Jieba-based (economy-mode) keyword indexing, so that exact searches for terms like st-771 or model_function_description return the matching segments.

Changes:

Update fallback tokenization regex to preserve hyphenated identifiers as single tokens.
Enhance extract_keywords with a raw-text regex pass to add hyphen/underscore compound identifiers into the keyword set.
Add unit tests covering hyphen/underscore keyword extraction and query-side expansion behavior.
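The raw-text regex pass and the subtoken suppression described in the PR summary can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name recover_compounds and the sample keyword set are hypothetical, and only the pattern itself is quoted from the summary. The real handler may treat edge cases differently, for example a subtoken that also occurs on its own elsewhere in the text.

```python
import re

# Pattern quoted in the PR summary for hyphen- and underscore-joined terms.
COMPOUND_RE = re.compile(r"[a-zA-Z0-9]+(?:[_-][a-zA-Z0-9]+)+")


def recover_compounds(text: str, keywords: set[str]) -> set[str]:
    """Hypothetical helper: add compound identifiers, drop their loose parts."""
    # Scan the original text directly, independent of any word segmentation.
    compounds = set(COMPOUND_RE.findall(text))
    # Collect the pieces of each compound, e.g. {"st", "771"} for "st-771".
    parts = {part for term in compounds for part in re.split(r"[_-]", term)}
    # Remove the pieces from the keyword set, then add the whole compounds.
    return (keywords - parts) | compounds


text = "Error st-771 is raised by model_function_description"
# Assumed stand-in for what a segmenter plus TF-IDF might have produced.
tfidf_keywords = {"st", "771", "function", "description", "error"}
result = recover_compounds(text, tfidf_keywords)
```

In this sketch the unrelated keyword "error" survives, while "st", "771", "function" and "description" are replaced by the two whole identifiers.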

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py	Adjusts tokenization/keyword extraction to preserve and recover hyphen/underscore compounds.
api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py	Adds tests to validate compound identifier handling in expansion and extraction paths.
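The tokenization adjustment listed for jieba_keyword_table_handler.py can be illustrated by comparing the two patterns quoted in the PR summary. This is a hedged sketch: the constant names and the tokenize helper are hypothetical and are not taken from the PR's files.

```python
import re

# Patterns quoted in the PR summary: before and after the change.
OLD_TOKEN_RE = re.compile(r"\w+")
NEW_TOKEN_RE = re.compile(r"\w+(?:-\w+)*")


def tokenize(pattern: re.Pattern[str], text: str) -> list[str]:
    """Hypothetical helper: return every non-overlapping match of pattern."""
    return pattern.findall(text)


old_tokens = tokenize(OLD_TOKEN_RE, "st-771")
new_tokens = tokenize(NEW_TOKEN_RE, "st-771")
# "\w" already matches "_", so underscore identifiers stay whole either way.
underscore_tokens = tokenize(OLD_TOKEN_RE, "model_function_description")
```

Here the old pattern splits "st-771" at the hyphen, while the new one keeps it as a single token. Underscore-joined identifiers are unaffected by this particular regex, which is consistent with the summary's note that the expansion change targets hyphenated terms.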


github-actions · 2026-03-23T10:04:04Z

Pyrefly Diff

No changes detected.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.


github-actions · 2026-03-31T02:01:32Z

Pyrefly Diff

No changes detected.

mkumm14 requested review from JohnJyong, QuantumGhost and laipz8200 as code owners March 18, 2026 21:51

Copilot AI review requested due to automatic review settings March 18, 2026 21:51

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Mar 18, 2026

gemini-code-assist bot reviewed Mar 18, 2026

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py Outdated

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py Outdated

Copilot AI reviewed Mar 18, 2026

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py

mkumm14 force-pushed the fix/jieba-hyphenated-keyword-splitting branch from 2ae6d1c to 5b36802 Compare March 18, 2026 22:12

Copilot AI review requested due to automatic review settings March 18, 2026 22:15

mkumm14 force-pushed the fix/jieba-hyphenated-keyword-splitting branch from 5b36802 to 650de2a Compare March 18, 2026 22:15

Copilot AI reviewed Mar 18, 2026

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py Outdated

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py

Comment thread api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py

mkumm14 force-pushed the fix/jieba-hyphenated-keyword-splitting branch from 650de2a to 413d7fb Compare March 18, 2026 22:38

mkumm14 requested a review from Copilot March 18, 2026 22:38

Copilot AI reviewed Mar 18, 2026

mkumm14 force-pushed the fix/jieba-hyphenated-keyword-splitting branch from 413d7fb to 73662cc Compare March 18, 2026 22:50

Copilot AI review requested due to automatic review settings March 18, 2026 23:02

mkumm14 force-pushed the fix/jieba-hyphenated-keyword-splitting branch from 73662cc to 54efab3 Compare March 18, 2026 23:02

Copilot AI reviewed Mar 18, 2026

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py

crazywoola previously approved these changes Mar 20, 2026

dosubot bot added the lgtm label (This PR has been approved by a maintainer) Mar 20, 2026

[autofix.ci] apply automated fixes

f8cef97

autofix-ci bot dismissed crazywoola’s stale review via f8cef97 March 20, 2026 02:26

github-actions bot added the needs revision for anti-slop label Mar 20, 2026

crazywoola removed the lgtm label (This PR has been approved by a maintainer) Mar 20, 2026

crazywoola changed the title from "Fix Jieba keyword extraction for hyphenated and underscore identifiers" to "fix: Jieba keyword extraction for hyphenated and underscore identifiers" Mar 20, 2026

Merge branch 'main' into fix/jieba-hyphenated-keyword-splitting

882aeb5

Copilot AI review requested due to automatic review settings March 23, 2026 09:55

Copilot started reviewing on behalf of crazywoola March 23, 2026 09:55 View session

Copilot AI reviewed Mar 23, 2026

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py

crazywoola added 2 commits March 23, 2026 20:19

Merge branch 'main' into fix/jieba-hyphenated-keyword-splitting

4ee96b6

Merge branch 'main' into fix/jieba-hyphenated-keyword-splitting

4b07814

Copilot AI review requested due to automatic review settings March 23, 2026 12:27

Copilot started reviewing on behalf of crazywoola March 23, 2026 12:28 View session

Copilot AI reviewed Mar 23, 2026

Comment thread api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py

crazywoola and others added 2 commits March 23, 2026 20:43

Merge branch 'main' into fix/jieba-hyphenated-keyword-splitting

7804d01

Merge branch 'main' into fix/jieba-hyphenated-keyword-splitting

31f4d36

Copilot AI review requested due to automatic review settings March 24, 2026 22:30

Copilot started reviewing on behalf of mkumm14 March 24, 2026 22:30 View session

Copilot AI reviewed Mar 24, 2026

Merge branch 'main' into fix/jieba-hyphenated-keyword-splitting

dc79b08

github-actions bot added the needs-revision label Mar 31, 2026

Conversation

mkumm14 commented Mar 18, 2026

Summary

Screenshots

Checklist

gemini-code-assist bot commented Mar 18, 2026

Summary of Changes

Highlights

Footnotes

gemini-code-assist bot left a comment

Code Review

Copilot AI left a comment

Pull request overview

Reviewed changes

Copilot AI left a comment

Pull request overview

Reviewed changes

Copilot AI left a comment

Pull request overview

Reviewed changes

Copilot AI left a comment

Pull request overview

Reviewed changes

github-actions bot commented Mar 20, 2026

Pyrefly Diff

github-actions bot commented Mar 20, 2026

Pyrefly Diff

Copilot AI left a comment

Pull request overview

Reviewed changes

github-actions bot commented Mar 23, 2026

Pyrefly Diff

Copilot AI left a comment

Pull request overview

Copilot AI left a comment

Pull request overview

github-actions bot commented Mar 31, 2026