refactor(api): migrate core RAG layer to SQLAlchemy 2.0 select() API #34965
Merged
asukaminato0721 merged 4 commits into langgenius:main on Apr 11, 2026
Contributor
Pyrefly Diff (base → PR)
--- /tmp/pyrefly_base.txt 2026-04-11 15:08:58.178007466 +0000
+++ /tmp/pyrefly_pr.txt 2026-04-11 15:08:49.556076277 +0000
@@ -290,13 +290,13 @@
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
--> core/rag/index_processor/processor/paragraph_index_processor.py:204:16
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
- --> core/rag/index_processor/processor/parent_child_index_processor.py:245:33
+ --> core/rag/index_processor/processor/parent_child_index_processor.py:243:33
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
- --> core/rag/index_processor/processor/parent_child_index_processor.py:246:16
+ --> core/rag/index_processor/processor/parent_child_index_processor.py:244:16
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
- --> core/rag/index_processor/processor/qa_index_processor.py:209:33
+ --> core/rag/index_processor/processor/qa_index_processor.py:208:33
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
- --> core/rag/index_processor/processor/qa_index_processor.py:210:16
+ --> core/rag/index_processor/processor/qa_index_processor.py:209:16
ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], tools=list[PromptMessageTool], stream=Literal[False], model_parameters=dict[str, float | int]) [no-matching-overload]
--> core/rag/retrieval/router/multi_dataset_function_call_router.py:32:58
ERROR Class member `MCPToolProviderController.entity` overrides parent class `ToolProviderController` in an inconsistent manner [bad-override]
Contributor
Pull request overview
Migrates the core RAG pipeline’s SQLAlchemy ORM usage from legacy session.query(...) patterns to SQLAlchemy 2.0-style select() / update() statements to improve compatibility and typing (per issue #22668).
Changes:
- Replaced multiple `session.query(...).filter(...)` / `.where(...)` reads ending in `.all()` / `.first()` with `session.scalar(select(...))` and `session.scalars(select(...)).all()`.
- Replaced a bulk `query.update(..., synchronize_session=False)` with `session.execute(update(...).values(...))` for hit-count updates.
- Refactored a grouped subquery + outer-join dataset-availability query to use `select(...).subquery()` + `session.scalars(...)`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| api/core/rag/index_processor/index_processor.py | Moves document/dataset lookups, aggregate sum query, and segment status updates to SA 2.0 select()/update() APIs. |
| api/core/rag/index_processor/processor/parent_child_index_processor.py | Updates segment lookup in clean() to session.scalars(select(...)). |
| api/core/rag/index_processor/processor/qa_index_processor.py | Updates segment lookup in clean() to session.scalars(select(...)). |
| api/core/rag/retrieval/dataset_retrieval.py | Converts dataset/document fetches, hit_count bulk update, and available-datasets subquery/join to SA 2.0 style. |
| api/core/rag/summary_index/summary_index.py | Converts preview-mode dataset/document/segment/summary queries to SA 2.0 select() + scalar/scalars accessors. |
Contributor
Author
Hi, @asukaminato0721
Contributor
Pyrefly Diff (base → PR)
--- /tmp/pyrefly_base.txt 2026-04-11 15:33:13.154649983 +0000
+++ /tmp/pyrefly_pr.txt 2026-04-11 15:33:06.351453686 +0000
@@ -290,13 +290,13 @@
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
--> core/rag/index_processor/processor/paragraph_index_processor.py:204:16
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
- --> core/rag/index_processor/processor/parent_child_index_processor.py:245:33
+ --> core/rag/index_processor/processor/parent_child_index_processor.py:243:33
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
- --> core/rag/index_processor/processor/parent_child_index_processor.py:246:16
+ --> core/rag/index_processor/processor/parent_child_index_processor.py:244:16
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
- --> core/rag/index_processor/processor/qa_index_processor.py:209:33
+ --> core/rag/index_processor/processor/qa_index_processor.py:208:33
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
- --> core/rag/index_processor/processor/qa_index_processor.py:210:16
+ --> core/rag/index_processor/processor/qa_index_processor.py:209:16
ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], tools=list[PromptMessageTool], stream=Literal[False], model_parameters=dict[str, float | int]) [no-matching-overload]
--> core/rag/retrieval/router/multi_dataset_function_call_router.py:32:58
ERROR Class member `MCPToolProviderController.entity` overrides parent class `ToolProviderController` in an inconsistent manner [bad-override]
@@ -4931,39 +4931,39 @@
ERROR Object of class `FunctionType` has no attribute `call_count` [missing-attribute]
--> tests/unit_tests/core/rag/rerank/test_reranker.py:1630:16
ERROR Argument `list[float] | None` is not assignable to parameter `obj` with type `Sized` in function `len` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:1949:20
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:1948:20
ERROR Could not find name `metadata_name` [unknown-name]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:2769:29
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:2768:29
ERROR Could not find name `metadata_name` [unknown-name]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:2770:29
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:2769:29
ERROR Argument `Iterator[Any | Unknown] | Iterator[Any]` is not assignable to parameter `invoke_result` with type `Generator[Unknown]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval._handle_invoke_result` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:3782:64
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:3781:64
ERROR Argument `Iterator[Any | Unknown] | Iterator[Any]` is not assignable to parameter `invoke_result` with type `Generator[Unknown]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval._handle_invoke_result` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:3786:67
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:3785:67
ERROR `None` is not subscriptable [unsupported-operation]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4025:16
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4024:16
ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4559:40
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4546:40
ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4611:40
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4598:40
ERROR Argument `SimpleNamespace` is not assignable to parameter `metadata_condition` with type `MetadataFilteringCondition | None` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4616:40
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4603:40
ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4632:36
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4619:36
ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4662:36
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4649:36
ERROR Argument `SimpleNamespace` is not assignable to parameter `metadata_condition` with type `MetadataFilteringCondition | None` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4667:36
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4654:36
ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.single_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4675:36
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4662:36
ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4711:36
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4698:36
ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4739:36
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4726:36
ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4797:40
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4784:40
ERROR Argument `list[SimpleNamespace]` is not assignable to parameter `available_datasets` with type `list[Dataset]` in function `core.rag.retrieval.dataset_retrieval.DatasetRetrieval.multiple_retrieve` [bad-argument-type]
- --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4841:44
+ --> tests/unit_tests/core/rag/retrieval/test_dataset_retrieval.py:4828:44
ERROR Argument `Iterator[Any | Unknown] | Iterator[Any]` is not assignable to parameter `invoke_result` with type `Generator[Unknown]` in function `core.rag.retrieval.router.multi_dataset_react_route.ReactMultiDatasetRouter._handle_invoke_result` [bad-argument-type]
--> tests/unit_tests/core/rag/retrieval/test_multi_dataset_react_route.py:199:52
ERROR Argument `None` is not assignable to parameter `text` with type `str` in function `core.rag.splitter.text_splitter.RecursiveCharacterTextSplitter.split_text` [bad-argument-type]
Contributor
Author
Okay, I will fix them perfectly.
asukaminato0721 approved these changes on Apr 11, 2026
HanqingZ pushed a commit to HanqingZ/dify that referenced this pull request on Apr 23, 2026
Summary
Migrates SQLAlchemy ORM calls in the core RAG pipeline from the legacy 1.x session.query() patterns to the modern 2.0 select() / update() API, as tracked in issue #22668.
Files changed:
- api/core/rag/index_processor/index_processor.py
- api/core/rag/index_processor/processor/parent_child_index_processor.py
- api/core/rag/index_processor/processor/qa_index_processor.py
- api/core/rag/retrieval/dataset_retrieval.py
- api/core/rag/summary_index/summary_index.py

Patterns replaced:

Before:
- `session.query(DocumentSegment).filter(...).update({...}, synchronize_session=False)`
- `session.query(Dataset).where(...).all()`

After:
- `session.execute(update(DocumentSegment).where(...).values(...))`
- `session.scalars(select(Dataset).where(...)).all()`

Also migrates a complex subquery + outer-join pattern from using session.query() as the subquery base to using select() as the base (fully SA 2.0 compatible).