fix: Jieba keyword extraction for hyphenated and underscore identifiers #33701
mkumm14 wants to merge 8 commits into langgenius:main from
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed.
This pull request addresses a critical bug in Economy-mode keyword extraction for compound identifiers that contain hyphens or underscores. Jieba's tokenization previously split these identifiers, so they could not be searched as whole units. The changes introduce a multi-layered fix so that such technical terms are indexed and retrieved as whole keywords, improving the accuracy of keyword-based search for these identifiers.
Code Review
This pull request fixes a bug where compound identifiers with hyphens or underscores were not correctly indexed, by implementing a three-part fix: adjusting regex for tokenization, explicitly extracting compound terms, and suppressing their subtokens. The changes are logical and well-tested. I've found a couple of areas for improvement: one is a performance issue related to stopword checking, and the other is a potential regression in Unicode support in one of the new regular expressions, which is critical for a library like Jieba.
Pull request overview
Fixes keyword-based (Economy-mode) indexing/recall for hyphenated/underscore compound identifiers by preserving/recovering them as atomic keywords throughout extraction and query expansion.
Changes:
- Update fallback tokenization and token expansion regex to keep hyphenated identifiers intact.
- Enhance `extract_keywords()` to recover compound terms joined by `_` or `-` from raw text and suppress their split subtokens.
- Add unit tests covering hyphenated/underscore identifier extraction and expansion behavior.
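The token-expansion change can be sketched by comparing the old and new patterns. `expand_tokens` below is a hypothetical stand-in that assumes expansion keeps each keyword and adds the sub-tokens the pattern finds inside it; the real `_expand_tokens_with_subtokens` may differ. Because `\w` already includes `_`, only the hyphen case changes.

```python
from __future__ import annotations

import re

# Pattern before the change: "\w" already matches "_", so only "-" splits.
OLD_TOKEN_RE = re.compile(r"\w+")
# Pattern after the change: a word optionally followed by "-word" groups.
NEW_TOKEN_RE = re.compile(r"\w+(?:-\w+)*")


def expand_tokens(keywords: list[str], token_re: re.Pattern[str]) -> set[str]:
    """Hypothetical stand-in for _expand_tokens_with_subtokens: keep each
    keyword and add every sub-token the pattern finds inside it."""
    expanded: set[str] = set()
    for keyword in keywords:
        expanded.add(keyword)
        expanded.update(token_re.findall(keyword))
    return expanded


print(OLD_TOKEN_RE.findall("st-771 model_function_description"))
# ['st', '771', 'model_function_description']
print(NEW_TOKEN_RE.findall("st-771 model_function_description"))
# ['st-771', 'model_function_description']
print(sorted(expand_tokens(["st-771"], OLD_TOKEN_RE)))
# ['771', 'st', 'st-771']
print(sorted(expand_tokens(["st-771"], NEW_TOKEN_RE)))
# ['st-771']
```

Under the old pattern, expansion re-introduces the loose parts `st` and `771`; under the new one, the hyphenated term passes through unchanged.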
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py | Preserve hyphenated tokens in fallback/expansion and add raw-text compound-term recovery + subtoken suppression. |
| api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py | New unit tests validating hyphen/underscore identifier behavior across extraction and expansion. |
Force-pushed from 2ae6d1c to 5b36802.
Force-pushed from 5b36802 to 650de2a.
Pull request overview
Fixes Economy-mode keyword indexing/retrieval for hyphenated and underscore compound identifiers (e.g., st-771, model_function_description) by preserving compounds through TF-IDF fallback tokenization, raw-text compound recovery, and query/index token expansion.
Changes:
- Update fallback tokenization and query expansion regex to keep hyphenated identifiers intact.
- Post-process TF-IDF keywords by recovering `_`/`-` compound terms directly from raw text and suppressing their split subtokens.
- Add unit tests covering hyphen/underscore compound preservation and recovery behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py | Adjusts tokenization/expansion regex and adds raw-text compound recovery + subtoken suppression to improve exact compound recall. |
| api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py | Adds tests to validate compound preservation in expansion and compound recovery in extraction. |
Force-pushed from 650de2a to 413d7fb.
Pull request overview
Fixes Economy-mode keyword indexing/search for compound identifiers containing hyphens/underscores (e.g. st-771, model_function_description) by preserving them as atomic keywords during extraction and query expansion.
Changes:
- Update fallback tokenization and token expansion regexes to keep hyphenated identifiers intact.
- Extend `extract_keywords()` with a raw-text regex pass that adds hyphen/underscore compound terms to the keyword set (plus an attempted subtoken cleanup step).
- Add unit tests covering hyphen/underscore handling and query-expansion behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py | Preserves hyphenated tokens during fallback/expansion; adds raw-text compound term recovery during keyword extraction. |
| api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py | New tests for hyphen/underscore keyword handling and expansion behavior. |
Force-pushed from 413d7fb to 73662cc.
…eba keyword extraction
Jieba's tokeniser splits compound identifiers like 'st-771' (hyphen) and
'model_function_description' (underscore) into separate tokens before
TF-IDF scoring. This caused Economy-mode keyword search to fail for any
technical identifier containing these characters — the terms were never
indexed as whole units and could not be recalled by exact keyword match.
Three-layer fix in JiebaKeywordTableHandler:
1. Raw-text scan in extract_keywords: after TF-IDF runs, scan the
original text with r"[a-zA-Z0-9]+(?:[_-][a-zA-Z0-9]+)+" to recover
all hyphen- and underscore-joined terms, bypassing jieba's segmentation.
2. Subtoken suppression: remove loose parts ('st', '771', 'function')
from the keyword set when the whole compound term is already present.
3. Expand-tokens regex: change r"\w+" to r"\w+(?:-\w+)*" in
   _expand_tokens_with_subtokens so expansion never re-splits a hyphenated
   term that survived into the keyword set.
Fixes langgenius#33621, fixes langgenius#28237, related to langgenius#11908
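Layers 1 and 2 of the commit message can be sketched as a post-processing step over whatever keywords TF-IDF returned. `recover_compounds` is a hypothetical helper, and the TF-IDF set in the usage is a hand-written stand-in rather than real Jieba output; only the regex is taken from the commit message.

```python
from __future__ import annotations

import re

# Raw-text pattern quoted in the commit message (layer 1).
COMPOUND_RE = re.compile(r"[a-zA-Z0-9]+(?:[_-][a-zA-Z0-9]+)+")
SEPARATOR_RE = re.compile(r"[_-]")


def recover_compounds(text: str, tfidf_keywords: set[str]) -> set[str]:
    """Hypothetical helper: merge raw-text compounds into the TF-IDF
    keywords (layer 1) and drop their loose parts (layer 2)."""
    compounds = set(COMPOUND_RE.findall(text))
    keywords = set(tfidf_keywords) | compounds
    # Every piece of a recovered compound, e.g. "st-771" -> {"st", "771"}.
    parts = {part for compound in compounds for part in SEPARATOR_RE.split(compound)}
    # Parts never contain "_" or "-", so whole compounds are never removed here.
    return keywords - parts


text = "Error st-771 raised in model_function_description"
# Stand-in for what segmentation plus TF-IDF might return; not real Jieba output.
tfidf = {"error", "raised", "st", "771", "model", "function", "description"}
print(sorted(recover_compounds(text, tfidf)))
# ['error', 'model_function_description', 'raised', 'st-771']
```

Subtracting every part globally is the simplest reading of layer 2. It also drops a part such as `function` when that word appears elsewhere in the text on its own, so a stricter variant might suppress only parts that never occur standalone.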
Force-pushed from 73662cc to 54efab3.
Pull request overview
Fixes Economy-mode keyword indexing/search for compound identifiers containing hyphens/underscores (e.g., st-771, model_function_description) by preventing unintended token splitting and explicitly recovering these terms from raw text so exact keyword recall works.
Changes:
- Update regex tokenization to keep hyphenated tokens intact (`\w+(?:-\w+)*`) in both fallback TF-IDF and query/token expansion.
- Add a raw-text regex scan in `extract_keywords()` to recover hyphen/underscore-joined identifiers into the keyword set.
- Add unit tests covering hyphen/underscore handling and expansion behavior.
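To see why the same pattern matters on the fallback path, here is a minimal sketch of a fallback extractor that tokenizes with `\w+(?:-\w+)*`. It is a hypothetical plain frequency count, not TF-IDF and not the PR's actual fallback code; `fallback_keywords` and its lowercasing are assumptions for illustration.

```python
from __future__ import annotations

import re
from collections import Counter

# Hyphen-preserving token pattern named in this review.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*")


def fallback_keywords(text: str, top_k: int = 10) -> list[str]:
    """Hypothetical frequency-based fallback: count regex tokens and return
    the most common ones (Counter keeps first-seen order for equal counts)."""
    tokens = [token.lower() for token in TOKEN_RE.findall(text)]
    return [token for token, _ in Counter(tokens).most_common(top_k)]


print(fallback_keywords("ST-771 fails; retry st-771 with model_function_description"))
# ['st-771', 'fails', 'retry', 'with', 'model_function_description']
```

Because `st-771` is counted as one token, both occurrences accumulate on the whole identifier instead of being spread across `st` and `771`.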
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py | Adjusts tokenization/expansion and adds raw-text recovery for compound identifiers during keyword extraction. |
| api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py | Adds regression/unit tests ensuring hyphenated and underscore identifiers remain searchable and aren’t re-split during expansion. |
Pyrefly Diff
base → PR
--- /tmp/pyrefly_base.txt 2026-03-20 02:25:43.289769070 +0000
+++ /tmp/pyrefly_pr.txt 2026-03-20 02:25:34.043736758 +0000
@@ -115,11 +115,11 @@
ERROR Pyrefly detected conflicting types while breaking a dependency cycle: `str | None` is not assignable to `None`. Adding explicit type annotations might possibly help. [bad-assignment]
--> core/prompt/utils/extract_thread_messages.py:11:5
ERROR Object of class `NoneType` has no attribute `data_source_type` [missing-attribute]
- --> core/rag/datasource/keyword/jieba/jieba.py:143:36
+ --> core/rag/datasource/keyword/jieba/jieba.py:137:36
ERROR Object of class `NoneType` has no attribute `keyword_table` [missing-attribute]
- --> core/rag/datasource/keyword/jieba/jieba.py:145:13
+ --> core/rag/datasource/keyword/jieba/jieba.py:139:13
ERROR Cannot index into `set[Any]` [bad-index]
- --> core/rag/datasource/keyword/jieba/jieba.py:158:29
+ --> core/rag/datasource/keyword/jieba/jieba.py:152:29
ERROR No matching overload found for function `list.__init__` called with arguments: (object) [no-matching-overload]
--> core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py:88:34
ERROR Class member `AnalyticdbVector.add_texts` overrides parent class `BaseVector` in an inconsistent manner [bad-override]
@@ -399,50 +399,50 @@
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
--> core/rag/extractor/notion_extractor.py:368:21
ERROR Cannot index into `Generator[Unknown]` [bad-index]
- --> core/rag/extractor/watercrawl/client.py:210:54
+ --> core/rag/extractor/watercrawl/client.py:188:54
ERROR Cannot index into `bytes` [bad-index]
- --> core/rag/extractor/watercrawl/client.py:210:54
+ --> core/rag/extractor/watercrawl/client.py:188:54
ERROR Cannot index into `list[Unknown]` [bad-index]
- --> core/rag/extractor/watercrawl/client.py:210:54
+ --> core/rag/extractor/watercrawl/client.py:188:54
ERROR `None` is not subscriptable [unsupported-operation]
- --> core/rag/extractor/watercrawl/client.py:210:54
+ --> core/rag/extractor/watercrawl/client.py:188:54
ERROR Object of class `Generator` has no attribute `get`
ERROR Cannot index into `Generator[Unknown]` [bad-index]
- --> core/rag/extractor/watercrawl/provider.py:70:12
+ --> core/rag/extractor/watercrawl/provider.py:47:12
ERROR Cannot index into `bytes` [bad-index]
- --> core/rag/extractor/watercrawl/provider.py:70:12
+ --> core/rag/extractor/watercrawl/provider.py:47:12
ERROR Cannot index into `list[Unknown]` [bad-index]
- --> core/rag/extractor/watercrawl/provider.py:70:12
+ --> core/rag/extractor/watercrawl/provider.py:47:12
ERROR `None` is not subscriptable [unsupported-operation]
- --> core/rag/extractor/watercrawl/provider.py:70:12
+ --> core/rag/extractor/watercrawl/provider.py:47:12
ERROR Object of class `Generator` has no attribute `get`
ERROR Object of class `Generator` has no attribute `get`
ERROR Object of class `Generator` has no attribute `get`
ERROR Object of class `Generator` has no attribute `get`
-ERROR Argument `Generator[Unknown] | bytes | dict[Unknown, Unknown] | list[Unknown] | Unknown | None` is not assignable to parameter `result_object` with type `dict[str, Any]` in function `WaterCrawlProvider._structure_data` [bad-argument-type]
- --> core/rag/extractor/watercrawl/provider.py:110:37
+ERROR Argument `Generator[Unknown] | bytes | dict[Unknown, Unknown] | list[Unknown] | Unknown | None` is not assignable to parameter `result_object` with type `dict[Unknown, Unknown]` in function `WaterCrawlProvider._structure_data` [bad-argument-type]
+ --> core/rag/extractor/watercrawl/provider.py:87:37
ERROR Cannot index into `Generator[Unknown]` [bad-index]
- --> core/rag/extractor/watercrawl/provider.py:135:20
+ --> core/rag/extractor/watercrawl/provider.py:110:20
ERROR Cannot index into `bytes` [bad-index]
- --> core/rag/extractor/watercrawl/provider.py:135:20
+ --> core/rag/extractor/watercrawl/provider.py:110:20
ERROR Cannot index into `list[Unknown]` [bad-index]
- --> core/rag/extractor/watercrawl/provider.py:135:20
+ --> core/rag/extractor/watercrawl/provider.py:110:20
ERROR `None` is not subscriptable [unsupported-operation]
- --> core/rag/extractor/watercrawl/provider.py:135:20
+ --> core/rag/extractor/watercrawl/provider.py:110:20
ERROR Cannot index into `Generator[Unknown]` [bad-index]
- --> core/rag/extractor/watercrawl/provider.py:141:16
+ --> core/rag/extractor/watercrawl/provider.py:116:16
ERROR Cannot index into `bytes` [bad-index]
- --> core/rag/extractor/watercrawl/provider.py:141:16
+ --> core/rag/extractor/watercrawl/provider.py:116:16
ERROR Cannot index into `list[Unknown]` [bad-index]
- --> core/rag/extractor/watercrawl/provider.py:141:16
+ --> core/rag/extractor/watercrawl/provider.py:116:16
ERROR `None` is not subscriptable [unsupported-operation]
- --> core/rag/extractor/watercrawl/provider.py:141:16
+ --> core/rag/extractor/watercrawl/provider.py:116:16
ERROR Pyrefly detected conflicting types while breaking a dependency cycle: `str | Any | None` is not assignable to `None`. Adding explicit type annotations might possibly help. [bad-assignment]
- --> core/rag/extractor/word_extractor.py:372:13
+ --> core/rag/extractor/word_extractor.py:371:13
ERROR Pyrefly detected conflicting types while breaking a dependency cycle: `str | Any | None` is not assignable to `None`. Adding explicit type annotations might possibly help. [bad-assignment]
- --> core/rag/extractor/word_extractor.py:393:25
+ --> core/rag/extractor/word_extractor.py:392:25
ERROR Object of class `BaseOxmlElement` has no attribute `body` [missing-attribute]
- --> core/rag/extractor/word_extractor.py:426:24
+ --> core/rag/extractor/word_extractor.py:425:24
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
--> core/rag/index_processor/processor/paragraph_index_processor.py:194:33
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
@@ -586,7 +586,7 @@
ERROR Argument `Literal['normal']` is not assignable to parameter `value` with type `AppStatus | SQLCoreOperations[AppStatus]` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
--> tests/integration_tests/controllers/console/app/test_chat_message_permissions.py:30:22
ERROR Argument `Literal['normal']` is not assignable to parameter `value` with type `AppStatus | SQLCoreOperations[AppStatus]` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
- --> tests/integration_tests/controllers/console/app/test_feedback_export_api.py:32:22
+ --> tests/integration_tests/controllers/console/app/test_feedback_export_api.py:31:22
ERROR Argument `Literal['normal']` is not assignable to parameter `value` with type `AppStatus | SQLCoreOperations[AppStatus]` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
--> tests/integration_tests/controllers/console/app/test_model_config_permissions.py:28:22
ERROR Attribute `current_tenant_id` of class `Account` is a read-only property and cannot be set [read-only]
@@ -606,15 +606,15 @@
ERROR `unpatch` may be uninitialized [unbound-name]
--> tests/integration_tests/plugin/__mock/http.py:62:9
ERROR Object of class `TestWorkflowDraftVariableServiceResetVariable` has no attribute `_test_user_id` [missing-attribute]
- --> tests/integration_tests/services/test_workflow_draft_variable_service.py:610:21
+ --> tests/integration_tests/services/test_workflow_draft_variable_service.py:609:21
ERROR Object of class `TestWorkflowDraftVariableServiceResetVariable` has no attribute `_test_user_id` [missing-attribute]
- --> tests/integration_tests/services/test_workflow_draft_variable_service.py:620:21
+ --> tests/integration_tests/services/test_workflow_draft_variable_service.py:619:21
ERROR Object of class `TestWorkflowDraftVariableServiceResetVariable` has no attribute `_test_user_id` [missing-attribute]
- --> tests/integration_tests/services/test_workflow_draft_variable_service.py:631:21
+ --> tests/integration_tests/services/test_workflow_draft_variable_service.py:630:21
ERROR Object of class `TestWorkflowDraftVariableServiceResetVariable` has no attribute `_test_user_id` [missing-attribute]
- --> tests/integration_tests/services/test_workflow_draft_variable_service.py:640:21
+ --> tests/integration_tests/services/test_workflow_draft_variable_service.py:639:21
ERROR Object of class `TestWorkflowDraftVariableServiceResetVariable` has no attribute `_test_user_id` [missing-attribute]
- --> tests/integration_tests/services/test_workflow_draft_variable_service.py:806:21
+ --> tests/integration_tests/services/test_workflow_draft_variable_service.py:805:21
ERROR Argument `Index | None` is not assignable to parameter `index` with type `Index` in function `tcvectordb.rpc.model.collection.RPCCollection.__init__` [bad-argument-type]
--> tests/integration_tests/vdb/__mock/tcvectordb.py:90:13
ERROR Argument `xinference_client.types.Embedding | None` is not assignable to parameter `embedding` with type `tcvectordb.model.collection_view.Embedding` in function `tcvectordb.rpc.model.collection.RPCCollection.__init__` [bad-argument-type]
@@ -940,31 +940,31 @@
ERROR Argument `Literal['archive']` is not assignable to parameter `value` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
--> tests/test_containers_integration_tests/services/test_account_service.py:3334:25
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:134:38
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:133:38
ERROR `str` is not assignable to attribute `agent_mode` with type `Never` [bad-assignment]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:138:47
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:137:47
ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:249:29
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:248:29
ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:275:29
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:274:29
ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:564:29
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:563:29
ERROR Argument `str | None` is not assignable to parameter `created_by` with type `SQLCoreOperations[str] | str` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:565:24
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:564:24
ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:799:29
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:798:29
ERROR Argument `str | None` is not assignable to parameter `created_by` with type `SQLCoreOperations[str] | str` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:800:24
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:799:24
ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:894:29
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:893:29
ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:963:29
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:962:29
ERROR Argument `str | None` is not assignable to parameter `created_by` with type `SQLCoreOperations[str] | str` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:964:24
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:963:24
ERROR Argument `Literal['account']` is not assignable to parameter `created_by_role` with type `CreatorUserRole | SQLCoreOperations[CreatorUserRole]` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:1004:29
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:1003:29
ERROR Argument `str | None` is not assignable to parameter `created_by` with type `SQLCoreOperations[str] | str` in function `models.model.MessageAgentThought.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_agent_service.py:1005:24
+ --> tests/test_containers_integration_tests/services/test_agent_service.py:1004:24
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
--> tests/test_containers_integration_tests/services/test_annotation_service.py:101:38
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
@@ -1162,19 +1162,19 @@
ERROR Argument `Literal['owner']` is not assignable to parameter `role` with type `SQLCoreOperations[TenantAccountRole] | TenantAccountRole` in function `models.account.TenantAccountJoin.__init__` [bad-argument-type]
--> tests/test_containers_integration_tests/services/test_end_user_service.py:37:18
ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_file_service.py:69:20
+ --> tests/test_containers_integration_tests/services/test_file_service.py:68:20
ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_file_service.py:78:20
+ --> tests/test_containers_integration_tests/services/test_file_service.py:77:20
ERROR Argument `str` is not assignable to parameter `role` with type `SQLCoreOperations[TenantAccountRole] | TenantAccountRole` in function `models.account.TenantAccountJoin.__init__` [bad-argument-type]
--> tests/test_containers_integration_tests/services/test_human_input_delivery_test.py:32:18
ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_message_export_service.py:53:20
+ --> tests/test_containers_integration_tests/services/test_message_export_service.py:52:20
ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_message_export_service.py:58:63
+ --> tests/test_containers_integration_tests/services/test_message_export_service.py:57:63
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
- --> tests/test_containers_integration_tests/services/test_message_service.py:119:38
+ --> tests/test_containers_integration_tests/services/test_message_service.py:118:38
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
- --> tests/test_containers_integration_tests/services/test_message_service.py:122:81
+ --> tests/test_containers_integration_tests/services/test_message_service.py:121:81
ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
--> tests/test_containers_integration_tests/services/test_messages_clean_service.py:112:20
ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
@@ -1417,6 +1417,28 @@
--> tests/test_containers_integration_tests/services/tools/test_mcp_tools_manage_service.py:63:20
ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
--> tests/test_containers_integration_tests/services/tools/test_mcp_tools_manage_service.py:72:20
+ERROR Unexpected keyword argument `icon_dark` in function `models.tools.ApiToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:51:17
+ERROR Unexpected keyword argument `credentials` in function `models.tools.ApiToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:54:17
+ERROR Unexpected keyword argument `provider_type` in function `models.tools.ApiToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:55:17
+ERROR Unexpected keyword argument `description` in function `models.tools.BuiltinToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:60:17
+ERROR Unexpected keyword argument `icon` in function `models.tools.BuiltinToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:61:17
+ERROR Unexpected keyword argument `icon_dark` in function `models.tools.BuiltinToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:62:17
+ERROR Unexpected keyword argument `credentials` in function `models.tools.BuiltinToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:66:17
+ERROR Unexpected keyword argument `icon_dark` in function `models.tools.WorkflowToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:73:17
+ERROR Unexpected keyword argument `workflow_id` in function `models.tools.WorkflowToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:76:17
+ERROR Unexpected keyword argument `description` in function `models.tools.MCPToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:81:17
+ERROR Unexpected keyword argument `provider_icon` in function `models.tools.MCPToolProvider.__init__` [unexpected-keyword]
+ --> tests/test_containers_integration_tests/services/tools/test_tools_transform_service.py:82:17
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
--> tests/test_containers_integration_tests/services/tools/test_workflow_tools_manage_service.py:109:38
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
@@ -1512,57 +1534,57 @@
ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
--> tests/test_containers_integration_tests/tasks/test_add_document_to_index_task.py:62:20
ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:71:20
+ --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:70:20
ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:80:20
+ --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:79:20
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
- --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:293:23
+ --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:292:23
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
- --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:667:27
+ --> tests/test_containers_integration_tests/tasks/test_batch_clean_document_task.py:666:27
ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:96:20
+ --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:95:20
ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:105:20
+ --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:104:20
ERROR `>` is not supported between `None` and `Literal[0]` [unsupported-operation]
- --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:312:16
+ --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:311:16
ERROR `>` is not supported between `None` and `Literal[0]` [unsupported-operation]
- --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:704:16
+ --> tests/test_containers_integration_tests/tasks/test_batch_create_segment_to_index_task.py:703:16
ERROR Argument `Literal['active']` is not assignable to parameter `status` with type `AccountStatus | SQLCoreOperations[AccountStatus]` in function `models.account.Account.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:111:20
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:110:20
ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:121:20
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:120:20
ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:296:32
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:295:32
ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:299:22
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:298:22
ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:408:32
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:407:32
ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:411:22
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:410:22
ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:476:36
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:475:36
ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:479:26
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:478:26
ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:531:32
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:530:32
ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:534:22
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:533:22
ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:616:36
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:615:36
ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:619:26
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:618:26
ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:729:32
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:728:32
ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:732:22
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:731:22
ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:815:32
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:814:32
ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:818:22
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:817:22
ERROR Argument `str | None` is not assignable to parameter `indexing_technique` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:970:32
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:969:32
ERROR Argument `str | None` is not assignable to parameter `doc_form` with type `str` in function `celery.app.task.Task.__call__` [bad-argument-type]
- --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:973:22
+ --> tests/test_containers_integration_tests/tasks/test_clean_dataset_task.py:972:22
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
--> tests/test_containers_integration_tests/tasks/test_clean_notion_document_task.py:89:23
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
@@ -1944,7 +1966,7 @@
ERROR Object of class `tuple` has no attribute `json` [missing-attribute]
--> tests/unit_tests/controllers/console/auth/test_token_refresh.py:192:16
ERROR Object of class `BadRequest` has no attribute `data` [missing-attribute]
- --> tests/unit_tests/controllers/console/datasets/test_datasets.py:1635:16
+ --> tests/unit_tests/controllers/console/datasets/test_datasets.py:1634:16
ERROR Module `services.errors.chunk` exists, but was not imported explicitly. You are relying on other modules to load it. [implicit-import]
--> tests/unit_tests/controllers/console/datasets/test_datasets_segments.py:891:29
ERROR Module `services.errors.chunk` exists, but was not imported explicitly. You are relying on other modules to load it. [implicit-import]
@@ -2076,29 +2098,29 @@
ERROR `SimpleNamespace` is not assignable to attribute `request` with type `Request` [bad-assignment]
--> tests/unit_tests/controllers/files/test_upload.py:170:26
ERROR Missing argument `tenant_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:171:44
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:168:44
ERROR Missing argument `user_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:171:44
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:168:44
ERROR Missing argument `tenant_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:188:31
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:185:31
ERROR Missing argument `user_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:188:31
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:185:31
ERROR Missing argument `tenant_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:203:35
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:200:35
ERROR Missing argument `user_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:203:35
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:200:35
ERROR Missing argument `tenant_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:225:44
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:222:44
ERROR Missing argument `user_model` in function `controllers.inner_api.plugin.wraps.decorated_view` [missing-argument]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:225:44
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:222:44
ERROR Argument `type[PluginTestPayload]` is not assignable to parameter `payload_type` with type `type[BaseModel]` in function `controllers.inner_api.plugin.wraps.plugin_data` [bad-argument-type]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:253:35
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:250:35
ERROR Argument `type[PluginTestPayload]` is not assignable to parameter `payload_type` with type `type[BaseModel]` in function `controllers.inner_api.plugin.wraps.plugin_data` [bad-argument-type]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:268:35
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:265:35
ERROR Argument `type[TestPluginData.test_should_raise_error_on_invalid_payload.InvalidPayload]` is not assignable to parameter `payload_type` with type `type[BaseModel]` in function `controllers.inner_api.plugin.wraps.plugin_data` [bad-argument-type]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:286:35
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:283:35
ERROR Argument `type[PluginTestPayload]` is not assignable to parameter `payload_type` with type `type[BaseModel]` in function `controllers.inner_api.plugin.wraps.plugin_data` [bad-argument-type]
- --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:299:35
+ --> tests/unit_tests/controllers/inner_api/plugin/test_plugin_wraps.py:296:35
ERROR `SimpleNamespace` is not assignable to attribute `db` with type `SQLAlchemy` [bad-assignment]
--> tests/unit_tests/controllers/mcp/test_mcp.py:19:17
ERROR `SimpleNamespace` is not assignable to attribute `mcp_ns` with type `Namespace` [bad-assignment]
@@ -2122,17 +2144,17 @@
ERROR Missing argument `session` in function `services.conversation_service.ConversationService.pagination_by_last_id` [missing-argument]
--> tests/unit_tests/controllers/service_api/app/test_conversation.py:383:59
ERROR Argument value `Literal[0]` violates Pydantic `ge` constraint `Literal[1]` for field `limit` [bad-argument-type]
- --> tests/unit_tests/controllers/service_api/app/test_message.py:94:63
+ --> tests/unit_tests/controllers/service_api/app/test_message.py:93:63
ERROR Argument value `Literal[101]` violates Pydantic `le` constraint `Literal[100]` for field `limit` [bad-argument-type]
- --> tests/unit_tests/controllers/service_api/app/test_message.py:100:63
+ --> tests/unit_tests/controllers/service_api/app/test_message.py:99:63
ERROR Argument `str | None` is not assignable to parameter `obj` with type `Sized` in function `len` [bad-argument-type]
- --> tests/unit_tests/controllers/service_api/app/test_message.py:138:20
+ --> tests/unit_tests/controllers/service_api/app/test_message.py:137:20
ERROR Argument value `Literal[0]` violates Pydantic `ge` constraint `Literal[1]` for field `page` [bad-argument-type]
- --> tests/unit_tests/controllers/service_api/app/test_message.py:170:31
+ --> tests/unit_tests/controllers/service_api/app/test_message.py:169:31
ERROR Argument value `Literal[0]` violates Pydantic `ge` constraint `Literal[1]` for field `limit` [bad-argument-type]
- --> tests/unit_tests/controllers/service_api/app/test_message.py:183:31
+ --> tests/unit_tests/controllers/service_api/app/test_message.py:182:31
ERROR Argument value `Literal[102]` violates Pydantic `le` constraint `Literal[101]` for field `limit` [bad-argument-type]
- --> tests/unit_tests/controllers/service_api/app/test_message.py:188:31
+ --> tests/unit_tests/controllers/service_api/app/test_message.py:187:31
ERROR Argument `list[dict[str, Any]] | None` is not assignable to parameter `obj` with type `Sized` in function `len` [bad-argument-type]
--> tests/unit_tests/controllers/service_api/app/test_workflow.py:92:20
ERROR Argument value `Literal[0]` violates Pydantic `ge` constraint `Literal[1]` for field `page` [bad-argument-type]
@@ -2802,7 +2824,7 @@
ERROR Argument `SimpleNamespace` is not assignable to parameter `user` with type `Account | EndUser` in function `core.app.apps.common.workflow_response_converter.WorkflowResponseConverter.__init__` [bad-argument-type]
--> tests/unit_tests/core/app/apps/common/test_workflow_response_converter_resumption.py:32:14
ERROR Object of class `NoneType` has no attribute `data` [missing-attribute]
- --> tests/unit_tests/core/app/apps/common/test_workflow_response_converter_truncation.py:588:16
+ --> tests/unit_tests/core/app/apps/common/test_workflow_response_converter_truncation.py:543:16
ERROR Argument `dict[str, dict[str, str]]` is not assignable to parameter `override_config_dict` with type `AppModelConfigDict | None` in function `core.app.apps.completion.app_config_manager.CompletionAppConfigManager.get_app_config` [bad-argument-type]
--> tests/unit_tests/core/app/apps/completion/test_completion_app_config_manager.py:29:34
ERROR Object of class `FunctionType` has no attribute `assert_not_called` [missing-attribute]
@@ -3292,7 +3314,7 @@
ERROR Object of class `BlobChunkMessage` has no attribute `text`
ERROR Object of class `BlobChunkMessage` has no attribute `json_object`
ERROR No matching overload found for function `list.__init__` called with arguments: (Generator[Unknown] | None) [no-matching-overload]
- --> tests/unit_tests/core/datasource/test_datasource_file_manager.py:404:20
+ --> tests/unit_tests/core/datasource/test_datasource_file_manager.py:403:20
ERROR Object of class `FunctionType` has no attribute `assert_called_once` [missing-attribute]
--> tests/unit_tests/core/datasource/test_datasource_manager.py:52:5
ERROR Argument `SimpleNamespace` is not assignable to parameter `datasource_type` with type `DatasourceProviderType` in function `core.datasource.datasource_manager.DatasourceManager.get_datasource_plugin_provider` [bad-argument-type]
@@ -3324,8 +3346,6 @@
--> tests/unit_tests/core/datasource/test_datasource_manager.py:573:34
ERROR Object of class `StreamChunkEvent` has no attribute `node_run_result` [missing-attribute]
--> tests/unit_tests/core/datasource/test_datasource_manager.py:624:12
-ERROR `in` is not supported between `Literal['Single Page']` and `None` [not-iterable]
- --> tests/unit_tests/core/datasource/test_website_crawl.py:989:16
ERROR Argument `Iterator[DatasourceMessage]` is not assignable to parameter `messages` with type `Generator[DatasourceMessage]` in function `core.datasource.utils.message_transformer.DatasourceFileMessageTransformer.transform_datasource_invoke_messages` [bad-argument-type]
--> tests/unit_tests/core/datasource/utils/test_message_transformer.py:28:26
ERROR Object of class `BlobChunkMessage` has no attribute `text`
@@ -3956,6 +3976,20 @@
--> tests/unit_tests/core/prompt/test_simple_prompt_transform.py:410:19
ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `context_files` with type `list[File] | None` in function `core.prompt.simple_prompt_transform.SimplePromptTransform._get_last_user_message` [bad-argument-type]
--> tests/unit_tests/core/prompt/test_simple_prompt_transform.py:411:27
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+ --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:10:60
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+ --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:13:60
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+ --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:18:56
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+ --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:23:54
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+ --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:29:43
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+ --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:167:41
+ERROR Argument `None` is not assignable to parameter `process_rule` with type `dict[Unknown, Unknown]` in function `core.rag.cleaner.clean_processor.CleanProcessor.clean` [bad-argument-type]
+ --> tests/unit_tests/core/rag/cleaner/test_clean_processor.py:175:43
ERROR Argument `Literal['manhattan']` is not assignable to parameter `distance_function` with type `Literal['cosine', 'euclidean']` in function `core.rag.datasource.vdb.alibabacloud_mysql.alibabacloud_mysql_vector.AlibabaCloudMySQLVectorConfig.__init__` [bad-argument-type]
--> tests/unit_tests/core/rag/datasource/vdb/alibabacloud_mysql/test_alibabacloud_mysql_vector.py:682:35
ERROR Argument `list[str]` is not assignable to parameter `docs` with type `Sequence[Document]` in function `core.rag.docstore.dataset_docstore.DatasetDocumentStore.add_documents` [bad-argument-type]
@@ -4086,28 +4120,10 @@
--> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:3769:67
ERROR `None` is not subscriptable [unsupported-operation]
--> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4009:16
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4476:40
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4528:40
ERROR Argument `SimpleNamespace` is not assignable to parameter `metadata_condition` with type `MetadataCondition | None` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
--> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4533:40
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4549:36
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4579:36
ERROR Argument `SimpleNamespace` is not assignable to parameter `metadata_condition` with type `MetadataCondition | None` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
--> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4584:36
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4592:36
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4628:36
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4656:36
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4714:40
-ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4758:44
ERROR Argument `Iterator[Any]` is not assignable to parameter `invoke_result` with type `Generator[Unknown]` in function `core.rag.retrieval.router.multi_dataset_react_route.ReactMultiDatasetRouter._handle_invoke_result` [bad-argument-type]
--> tests/unit_tests/core/rag/retrieval/test_multi_dataset_react_route.py:184:52
ERROR Argument `None` is not assignable to parameter `text` with type `str` in function `core.rag.splitter.text_splitter.RecursiveCharacterTextSplitter.split_text` [bad-argument-type]
@@ -4923,8 +4939,6 @@
--> tests/unit_tests/core/workflow/graph_engine/test_table_runner.py:443:23
ERROR `graph_runtime_state` is uninitialized [unbound-name]
--> tests/unit_tests/core/workflow/graph_engine/test_table_runner.py:444:37
-ERROR Argument `SimpleNamespace` is not assignable to parameter `node` with type `Node[Unknown]` in function `dify_graph.graph_engine.worker.Worker._build_fallback_failure_event` [bad-argument-type]
- --> tests/unit_tests/core/workflow/graph_engine/test_worker.py:29:50
ERROR Argument `dict[str, dict[str, str] | str]` is not assignable to parameter `config` with type `NodeConfigDict` in function `dify_graph.nodes.base.node.Node.__init__` [bad-argument-type]
--> tests/unit_tests/core/workflow/nodes/answer/test_answer.py:82:16
ERROR Class member `_Version1._get_retry_config` overrides parent class `Node` in an inconsistent manner [bad-override]
@@ -5115,14 +5129,6 @@
--> tests/unit_tests/core/workflow/nodes/iteration/iteration_node_spec.py:147:16
ERROR `dict[str, list[str] | str]` is not assignable to TypedDict key `data` with type `BaseNodeData` [bad-typed-dict-key]
--> tests/unit_tests/core/workflow/nodes/iteration/test_iteration_child_engine_errors.py:51:21
-ERROR `() -> nullcontext[IExecutionContext]` is not assignable to attribute `_capture_execution_context` with type `(self: IterationNode) -> IExecutionContext` [bad-assignment]
- --> tests/unit_tests/core/workflow/nodes/iteration/test_parallel_iteration_duration.py:24:39
-ERROR No matching overload found for function `contextlib.nullcontext.__init__` called with arguments: () [no-matching-overload]
- --> tests/unit_tests/core/workflow/nodes/iteration/test_parallel_iteration_duration.py:24:58
-ERROR `(current: LLMUsage, new: Unknown) -> LLMUsage | Unknown` is not assignable to attribute `_merge_usage` with type `(current: LLMUsage, new_usage: LLMUsage | None) -> LLMUsage` [bad-assignment]
- --> tests/unit_tests/core/workflow/nodes/iteration/test_parallel_iteration_duration.py:26:25
-ERROR `(*, index: int, item: object, execution_context: object) -> tuple[float, list[NodeRunSucceededEvent], str, dict[Unknown, Unknown], LLMUsage]` is not assignable to attribute `_execute_single_iteration_parallel` with type `(self: IterationNode, index: int, item: object, execution_context: IExecutionContext) -> tuple[float, list[GraphNodeEventBase], object | None, dict[str, Variable], LLMUsage]` [bad-assignment]
- --> tests/unit_tests/core/workflow/nodes/iteration/test_parallel_iteration_duration.py:44:47
ERROR Argument `dict[str, dict[str, list[str] | str] | str]` is not assignable to parameter `config` with type `NodeConfigDict` in function `core.workflow.nodes.knowledge_index.knowledge_index_node.KnowledgeIndexNode.__init__` [bad-argument-type]
--> tests/unit_tests/core/workflow/nodes/knowledge_index/test_knowledge_index_node.py:119:20
ERROR Argument `dict[str, bool]` is not assignable to parameter `summary_index_setting` with type `SummaryIndexSettingDict | None` in function `core.workflow.nodes.knowledge_index.knowledge_index_node.KnowledgeIndexNode._invoke_knowledge_index` [bad-argument-type]
Pyrefly Diff: No changes detected.
Pull request overview
Fixes keyword extraction/expansion for hyphenated and underscore compound identifiers in Jieba-based (economy-mode) keyword indexing, so exact searches like st-771 / model_function_description can be recalled.
Changes:
- Update fallback tokenization regex to preserve hyphenated identifiers as single tokens.
- Enhance extract_keywords with a raw-text regex pass to add hyphen/underscore compound identifiers into the keyword set.
- Add unit tests covering hyphen/underscore keyword extraction and query-side expansion behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py | Adjusts tokenization/keyword extraction to preserve and recover hyphen/underscore compounds. |
| api/tests/unit_tests/core/rag/datasource/keyword/test_jieba_keyword_table_handler.py | Adds tests to validate compound identifier handling in expansion and extraction paths. |
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
Summary
Fixes a bug in Economy-mode (keyword-based) indexing where compound identifiers containing hyphens or underscores — such as st-771 or model_function_description — could not be recalled by keyword search.
Root cause: Jieba's tokeniser splits these identifiers at the hyphen or underscore into separate tokens (st and 771; model, function and description) before TF-IDF scoring. The whole term was never indexed as a unit, so an exact keyword search for st-771 returned no results.
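The split can be reproduced roughly without jieba. The sketch below is an approximation, not jieba's segmenter: it assumes that, for ASCII text, jieba keeps runs of letters and digits and breaks at "-" and "_". The helper name approx_ascii_segments and the sample sentence are hypothetical.

```python
import re

# Approximation for illustration only (this does not call jieba): keep runs of
# ASCII letters/digits and treat every other character, including "-" and "_",
# as a boundary that is dropped.
ASCII_RUN_RE = re.compile(r"[a-zA-Z0-9]+")


def approx_ascii_segments(text: str) -> list[str]:
    """Hypothetical helper: split text into letter/digit runs."""
    return ASCII_RUN_RE.findall(text)


doc = "Error st-771 is raised by model_function_description"
# Simulate an index built only from the split tokens.
indexed = set(approx_ascii_segments(doc))

print(approx_ascii_segments("st-771"))  # → ['st', '771']
print("st-771" in indexed)  # → False: the exact compound was never indexed
print("model_function_description" in indexed)  # → False
```

An exact lookup for the compound misses because only its parts ever reached the index.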
Three-layer fix in JiebaKeywordTableHandler:
- Raw-text scan: after TF-IDF runs, scan the original text with r"[a-zA-Z0-9]+(?:[_-][a-zA-Z0-9]+)+" to recover all hyphen- and underscore-joined terms directly, bypassing jieba's segmentation entirely.
- Subtoken suppression: remove the loose parts (st, 771, function) from the keyword set when the whole compound term is already present, preventing index pollution with overly generic single-token matches.
- Expand-tokens regex: change r"\w+" to r"\w+(?:-\w+)*" in _expand_tokens_with_subtokens so the expansion step never re-splits a hyphenated term that survived into the keyword set.
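The three layers can be sketched with Python's standard re module. This is a minimal sketch under stated assumptions, not the code in JiebaKeywordTableHandler: the raw-text scan and expand-tokens regexes are the ones quoted above, while add_compound_keywords, the sample text, and the made-up TF-IDF keyword set are hypothetical.

```python
import re

# Layer 1 pattern (quoted from the PR description): alphanumeric runs joined by
# at least one "_" or "-".
COMPOUND_RE = re.compile(r"[a-zA-Z0-9]+(?:[_-][a-zA-Z0-9]+)+")
# Layer 3 patterns (quoted from the PR description): before and after the fix.
OLD_EXPAND_RE = re.compile(r"\w+")
NEW_EXPAND_RE = re.compile(r"\w+(?:-\w+)*")


def add_compound_keywords(text: str, keywords: set[str]) -> set[str]:
    """Hypothetical helper combining layers 1 and 2 (not the handler's real code)."""
    # Layer 1: recover whole compounds from the raw text, independent of any tokeniser output.
    compounds = set(COMPOUND_RE.findall(text))
    # Layer 2: every "_"/"-" separated part of a recovered compound counts as a loose sub-token.
    covered = {part for term in compounds for part in re.split(r"[_-]", term)}
    # Parts never contain "_" or "-", so subtracting them cannot remove a compound.
    return (keywords | compounds) - covered


text = "Error st-771 is raised by model_function_description"
tfidf_keywords = {"Error", "st", "771", "function", "description"}  # made-up TF-IDF output
print(sorted(add_compound_keywords(text, tfidf_keywords)))
# → ['Error', 'model_function_description', 'st-771']

# Layer 3: \w already matches "_", so only hyphens needed the new optional (?:-\w+) groups.
print(OLD_EXPAND_RE.findall("st-771 model_function_description"))
# → ['st', '771', 'model_function_description']
print(NEW_EXPAND_RE.findall("st-771 model_function_description"))
# → ['st-771', 'model_function_description']
```

One consequence worth checking against the real handler: in this sketch, layer 2 also drops a part such as function even if it appears on its own elsewhere in the text. The PR description does not say whether the actual implementation distinguishes that case.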
Fixes #33621, fixes #28237, related to #11908
Screenshots
Before:

After:

Checklist
make lint and make type-check (backend) and cd web && npx lint-staged (frontend) to appease the lint gods