refactor(api): replace dict with SummaryIndexSettingDict TypedDict in core/rag #33633

Merged
asukaminato0721 merged 1 commit into langgenius:main from bittoby:refactor/core-rag-summary-index-setting-typeddict
Mar 18, 2026

@bittoby bittoby commented Mar 18, 2026

Summary

Converts summary_index_setting: dict annotations to a SummaryIndexSettingDict TypedDict across all files that pass or consume this configuration.

This was prioritized first because summary_index_setting is the only remaining foundational (cross-file) dict pattern in core/rag — it propagates through 9 files across index_processor, summary_index, workflow/knowledge_index, and services/. All other remaining dict patterns in core/rag are file-local and can be converted independently.

Changes

  • Define SummaryIndexSettingDict (enable, model_name, model_provider_name, summary_prompt) in index_processor_base.py as the foundational shared type
  • Update all method signatures in the dependency chain:
    • core/rag/index_processor/index_processor_base.py: abstract method generate_summary_preview
    • core/rag/index_processor/index_processor.py: index_and_clean, get_preview_output
    • core/rag/index_processor/processor/{paragraph,parent_child,qa}_index_processor.py: generate_summary_preview, generate_summary
    • core/rag/summary_index/summary_index.py: generate_and_vectorize_summary
  • Update callers outside core/rag:
    • core/workflow/nodes/knowledge_index/entities.py: KnowledgeIndexNodeData.summary_index_setting field
    • core/workflow/nodes/knowledge_index/knowledge_index_node.py: _invoke_knowledge_index
    • services/summary_index_service.py: generate_summary, generate_and_vectorize_summary, _process_segment_summary

Test plan

  • make lint passes
  • make type-check passes (basedpyright + pyrefly + mypy — 0 errors)
  • uv run --project api pytest api/tests/unit_tests — all tests pass

Part of #32863 (core/rag summary_index_setting dict chain)

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@dosubot dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and refactor labels Mar 18, 2026
@github-actions

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-03-18 04:12:43.949118999 +0000
+++ /tmp/pyrefly_pr.txt	2026-03-18 04:12:35.018119443 +0000
@@ -458,7 +458,7 @@
 ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], tools=list[PromptMessageTool], stream=Literal[False], model_parameters=dict[str, float | int]) [no-matching-overload]
   --> core/rag/retrieval/router/multi_dataset_function_call_router.py:31:58
 ERROR Argument `Dataset | None` is not assignable to parameter `dataset` with type `Dataset` in function `services.summary_index_service.SummaryIndexService.generate_and_vectorize_summary` [bad-argument-type]
-  --> core/rag/summary_index/summary_index.py:74:85
+  --> core/rag/summary_index/summary_index.py:79:85
 ERROR Yielded type `Generator[ToolInvokeMessage] | ToolInvokeMessage | list[ToolInvokeMessage]` is not assignable to declared yield type `ToolInvokeMessage` [invalid-yield]
   --> core/tools/__base/tool.py:72:23
 ERROR yield from value must be iterable, got `Generator[ToolInvokeMessage] | ToolInvokeMessage | list[ToolInvokeMessage]` [invalid-yield]
@@ -552,25 +552,25 @@
 ERROR `handled_tenant_count` was assigned in the current scope before the nonlocal declaration [unknown-name]
   --> services/plugin/plugin_migration.py:72:34
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
-   --> services/summary_index_service.py:284:29
+   --> services/summary_index_service.py:285:29
 ERROR Object of class `NoneType` has no attribute `summary_index_node_id` [missing-attribute]
-   --> services/summary_index_service.py:363:21
-ERROR Object of class `NoneType` has no attribute `summary_index_node_hash` [missing-attribute]
    --> services/summary_index_service.py:364:21
-ERROR Object of class `NoneType` has no attribute `tokens` [missing-attribute]
+ERROR Object of class `NoneType` has no attribute `summary_index_node_hash` [missing-attribute]
    --> services/summary_index_service.py:365:21
-ERROR Object of class `NoneType` has no attribute `status` [missing-attribute]
+ERROR Object of class `NoneType` has no attribute `tokens` [missing-attribute]
    --> services/summary_index_service.py:366:21
+ERROR Object of class `NoneType` has no attribute `status` [missing-attribute]
+   --> services/summary_index_service.py:367:21
 ERROR Object of class `NoneType` has no attribute `summary_content` [missing-attribute]
-   --> services/summary_index_service.py:369:21
+   --> services/summary_index_service.py:370:21
 ERROR Object of class `NoneType` has no attribute `updated_at` [missing-attribute]
-   --> services/summary_index_service.py:371:21
+   --> services/summary_index_service.py:372:21
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
-   --> services/summary_index_service.py:397:25
+   --> services/summary_index_service.py:398:25
 ERROR Object of class `NoneType` has no attribute `updated_at` [missing-attribute]
-   --> services/summary_index_service.py:406:24
+   --> services/summary_index_service.py:407:24
 ERROR Argument `datetime | object` is not assignable to parameter `value` with type `SQLCoreOperations[datetime] | datetime` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
-   --> services/summary_index_service.py:407:53
+   --> services/summary_index_service.py:408:53
 ERROR Object of class `dict` has no attribute `encode`
 ERROR Object of class `dict` has no attribute `encode`
 ERROR Runtime checkable protocol `Generator` has an unsafe overlap with type `Generator[Mapping[str, Any] | str, Any] | Mapping[str, Any]` [unsafe-overlap]
@@ -4060,6 +4060,8 @@
    --> tests/unit_tests/core/rag/indexing/processor/test_parent_child_index_processor.py:339:63
 ERROR Argument `Literal['semantic_search']` is not assignable to parameter `retrieval_method` with type `RetrievalMethod` in function `core.rag.index_processor.processor.qa_index_processor.QAIndexProcessor.retrieve` [bad-argument-type]
    --> tests/unit_tests/core/rag/indexing/processor/test_qa_index_processor.py:266:39
+ERROR Missing required key `enable` for TypedDict `SummaryIndexSettingDict` [bad-typed-dict-key]
+   --> tests/unit_tests/core/rag/indexing/processor/test_qa_index_processor.py:332:78
 ERROR Method `extract` inherited from class `BaseIndexProcessor` has no implementation and cannot be accessed via `super()` [missing-attribute]
   --> tests/unit_tests/core/rag/indexing/test_index_processor_base.py:15:16
 ERROR Method `transform` inherited from class `BaseIndexProcessor` has no implementation and cannot be accessed via `super()` [missing-attribute]
@@ -5123,6 +5125,8 @@
   --> tests/unit_tests/core/workflow/nodes/iteration/test_iteration_child_engine_errors.py:51:21
 ERROR Argument `dict[str, dict[str, list[str] | str] | str]` is not assignable to parameter `config` with type `NodeConfigDict` in function `core.workflow.nodes.knowledge_index.knowledge_index_node.KnowledgeIndexNode.__init__` [bad-argument-type]
    --> tests/unit_tests/core/workflow/nodes/knowledge_index/test_knowledge_index_node.py:119:20
+ERROR Argument `dict[str, bool]` is not assignable to parameter `summary_index_setting` with type `SummaryIndexSettingDict | None` in function `core.workflow.nodes.knowledge_index.knowledge_index_node.KnowledgeIndexNode._invoke_knowledge_index` [bad-argument-type]
+   --> tests/unit_tests/core/workflow/nodes/knowledge_index/test_knowledge_index_node.py:640:35
 ERROR Argument `dict[str, dict[str, list[str] | str] | str]` is not assignable to parameter `config` with type `NodeConfigDict` in function `core.workflow.nodes.knowledge_retrieval.knowledge_retrieval_node.KnowledgeRetrievalNode.__init__` [bad-argument-type]
    --> tests/unit_tests/core/workflow/nodes/knowledge_retrieval/test_knowledge_retrieval_node.py:110:20
 ERROR Argument `dict[str, dict[str, Any] | str]` is not assignable to parameter `config` with type `NodeConfigDict` in function `core.workflow.nodes.knowledge_retrieval.knowledge_retrieval_node.KnowledgeRetrievalNode.__init__` [bad-argument-type]
@@ -5814,8 +5818,16 @@
    --> tests/unit_tests/services/test_recommended_app_service.py:440:20
 ERROR Argument `SimpleNamespace` is not assignable to parameter `value` with type `ModuleType` in function `_pytest.monkeypatch.MonkeyPatch.setitem` [bad-argument-type]
   --> tests/unit_tests/services/test_summary_index_service.py:90:9
+ERROR Missing required key `enable` for TypedDict `SummaryIndexSettingDict` [bad-typed-dict-key]
+  --> tests/unit_tests/services/test_summary_index_service.py:96:93
+ERROR Key `a` is not defined in TypedDict `SummaryIndexSettingDict` [bad-typed-dict-key]
+  --> tests/unit_tests/services/test_summary_index_service.py:96:94
 ERROR Argument `SimpleNamespace` is not assignable to parameter `value` with type `ModuleType` in function `_pytest.monkeypatch.MonkeyPatch.setitem` [bad-argument-type]
    --> tests/unit_tests/services/test_summary_index_service.py:112:9
+ERROR Missing required key `enable` for TypedDict `SummaryIndexSettingDict` [bad-typed-dict-key]
+   --> tests/unit_tests/services/test_summary_index_service.py:116:82
+ERROR Key `a` is not defined in TypedDict `SummaryIndexSettingDict` [bad-typed-dict-key]
+   --> tests/unit_tests/services/test_summary_index_service.py:116:83
 ERROR Argument `SimpleNamespace` is not assignable to parameter `value` with type `ModuleType` in function `_pytest.monkeypatch.MonkeyPatch.setitem` [bad-argument-type]
    --> tests/unit_tests/services/test_summary_index_service.py:750:45
 ERROR Argument `SimpleNamespace` is not assignable to parameter `value` with type `ModuleType` in function `_pytest.monkeypatch.MonkeyPatch.setitem` [bad-argument-type]

@dosubot dosubot bot added the lgtm (This PR has been approved by a maintainer) label Mar 18, 2026
@asukaminato0721 asukaminato0721 merged commit 3454224 into langgenius:main Mar 18, 2026
16 checks passed

Copilot AI left a comment

Pull request overview

This PR refactors the summary_index_setting configuration plumbing in core/rag (and its workflow/service callers) by replacing broad dict type annotations with a shared SummaryIndexSettingDict TypedDict, improving type safety across the summary-index generation chain.

Changes:

  • Introduces SummaryIndexSettingDict (TypedDict) as the shared type for summary_index_setting.
  • Updates method signatures across RAG index processors and summary-index generation to use SummaryIndexSettingDict (optionally nullable where applicable).
  • Updates workflow node data and invocation paths to propagate the TypedDict type.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| api/core/rag/index_processor/index_processor_base.py | Defines the shared SummaryIndexSettingDict and updates abstract processor interface typing. |
| api/core/rag/index_processor/index_processor.py | Types summary_index_setting in indexing + preview generation flows. |
| api/core/rag/index_processor/processor/paragraph_index_processor.py | Types summary preview + summary generation entrypoints with the shared TypedDict. |
| api/core/rag/index_processor/processor/parent_child_index_processor.py | Types summary preview signature for parent/child processor. |
| api/core/rag/index_processor/processor/qa_index_processor.py | Types summary preview signature for QA processor. |
| api/core/rag/summary_index/summary_index.py | Types summary_index_setting for summary generation/vectorization orchestrator. |
| api/services/summary_index_service.py | Types summary_index_setting across summary generation/vectorization service APIs. |
| api/core/workflow/nodes/knowledge_index/entities.py | Types workflow node summary_index_setting field using the shared TypedDict. |
| api/core/workflow/nodes/knowledge_index/knowledge_index_node.py | Types the node invocation path summary_index_setting parameter using the shared TypedDict. |

Comment on lines +40 to +41
class SummaryIndexSettingDict(TypedDict):
    enable: bool