
refactor: core/app pipeline, core/datasource, and core/indexing_runner#34359

Merged
asukaminato0721 merged 2 commits into langgenius:main from RenzoMXD:refactor/select-core-app-indexing
Apr 1, 2026

Conversation

@RenzoMXD
Contributor

@RenzoMXD RenzoMXD commented Apr 1, 2026

Summary

  • Migrate 25 db.session.query() calls to SQLAlchemy 2.0 select() style across 4 files: pipeline_runner.py, pipeline_generator.py, datasource_file_manager.py, and indexing_runner.py
  • Use session.get() for PK lookups, scalar(select(...).limit(1)) for non-PK filtered queries, scalars(select(...)) for multi-row queries, execute(delete(...)) for bulk deletes, execute(update(...).values(...)) for bulk updates, and scalar(select(func.count())) for count queries
  • Update test mock wiring in 4 test files (mock plumbing only — no test logic or assertion changes)

Test plan

  • make type-check passes for all 4 changed source files (basedpyright)
  • All 113 related unit tests pass (0 failures, matching clean main)

Part of #22668

@dosubot dosubot Bot added the size:L ("This PR changes 100-499 lines, ignoring generated files.") and refactor labels Apr 1, 2026
@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-01 00:26:17.932690376 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-01 00:26:07.360672106 +0000
@@ -3021,13 +3021,13 @@
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:65:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:148:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:143:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:173:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:168:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:203:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:194:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:266:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:253:37
 ERROR `Literal['generated-conversation-id']` is not assignable to attribute `id` with type `Never` [bad-assignment]
   --> tests/unit_tests/core/app/apps/test_advanced_chat_app_generator.py:53:22
 ERROR `Literal['generated-message-id']` is not assignable to attribute `id` with type `Never` [bad-assignment]
@@ -3873,7 +3873,7 @@
 ERROR Object of class `BlobChunkMessage` has no attribute `text`
 ERROR Object of class `BlobChunkMessage` has no attribute `json_object`
 ERROR No matching overload found for function `list.__init__` called with arguments: (Generator[Unknown] | None) [no-matching-overload]
-   --> tests/unit_tests/core/datasource/test_datasource_file_manager.py:404:20
+   --> tests/unit_tests/core/datasource/test_datasource_file_manager.py:390:20
 ERROR Object of class `FunctionType` has no attribute `assert_called_once` [missing-attribute]
   --> tests/unit_tests/core/datasource/test_datasource_manager.py:52:5
 ERROR Argument `SimpleNamespace` is not assignable to parameter `datasource_type` with type `DatasourceProviderType` in function `core.datasource.datasource_manager.DatasourceManager.get_datasource_plugin_provider` [bad-argument-type]
@@ -5184,13 +5184,13 @@
 ERROR Missing required key `enable` for TypedDict `SummaryIndexSettingDict` [bad-typed-dict-key]
    --> tests/unit_tests/core/rag/indexing/processor/test_qa_index_processor.py:333:78
 ERROR Argument `None` is not assignable to parameter `state` with type `InstanceState[Any]` in function `sqlalchemy.orm.exc.ObjectDeletedError.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:934:84
+   --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:945:84
 ERROR Argument `Literal['completed']` is not assignable to parameter `after_indexing_status` with type `IndexingStatus` in function `core.indexing_runner.IndexingRunner._update_document_index_status` [bad-argument-type]
-    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1045:13
+    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1063:13
 ERROR Argument `Literal['completed']` is not assignable to parameter `after_indexing_status` with type `IndexingStatus` in function `core.indexing_runner.IndexingRunner._update_document_index_status` [bad-argument-type]
-    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1060:71
+    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1078:71
 ERROR Argument `Literal['completed']` is not assignable to parameter `after_indexing_status` with type `IndexingStatus` in function `core.indexing_runner.IndexingRunner._update_document_index_status` [bad-argument-type]
-    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1071:71
+    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1089:71
 ERROR Object of class `FunctionType` has no attribute `assert_called_once` [missing-attribute]
     --> tests/unit_tests/core/rag/rerank/test_reranker.py:1578:9
 ERROR Object of class `FunctionType` has no attribute `call_args` [missing-attribute]

@RenzoMXD RenzoMXD force-pushed the refactor/select-core-app-indexing branch from 976bae1 to 680ec03 April 1, 2026 00:46
@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-01 00:47:53.147056397 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-01 00:47:42.788905975 +0000
@@ -3021,13 +3021,13 @@
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:65:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:148:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:143:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:173:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:168:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:203:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:194:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:266:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:253:37
 ERROR `Literal['generated-conversation-id']` is not assignable to attribute `id` with type `Never` [bad-assignment]
   --> tests/unit_tests/core/app/apps/test_advanced_chat_app_generator.py:53:22
 ERROR `Literal['generated-message-id']` is not assignable to attribute `id` with type `Never` [bad-assignment]
@@ -3873,7 +3873,7 @@
 ERROR Object of class `BlobChunkMessage` has no attribute `text`
 ERROR Object of class `BlobChunkMessage` has no attribute `json_object`
 ERROR No matching overload found for function `list.__init__` called with arguments: (Generator[Unknown] | None) [no-matching-overload]
-   --> tests/unit_tests/core/datasource/test_datasource_file_manager.py:404:20
+   --> tests/unit_tests/core/datasource/test_datasource_file_manager.py:390:20
 ERROR Object of class `FunctionType` has no attribute `assert_called_once` [missing-attribute]
   --> tests/unit_tests/core/datasource/test_datasource_manager.py:52:5
 ERROR Argument `SimpleNamespace` is not assignable to parameter `datasource_type` with type `DatasourceProviderType` in function `core.datasource.datasource_manager.DatasourceManager.get_datasource_plugin_provider` [bad-argument-type]
@@ -5184,13 +5184,13 @@
 ERROR Missing required key `enable` for TypedDict `SummaryIndexSettingDict` [bad-typed-dict-key]
    --> tests/unit_tests/core/rag/indexing/processor/test_qa_index_processor.py:333:78
 ERROR Argument `None` is not assignable to parameter `state` with type `InstanceState[Any]` in function `sqlalchemy.orm.exc.ObjectDeletedError.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:934:84
+   --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:924:84
 ERROR Argument `Literal['completed']` is not assignable to parameter `after_indexing_status` with type `IndexingStatus` in function `core.indexing_runner.IndexingRunner._update_document_index_status` [bad-argument-type]
-    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1045:13
+    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1038:13
 ERROR Argument `Literal['completed']` is not assignable to parameter `after_indexing_status` with type `IndexingStatus` in function `core.indexing_runner.IndexingRunner._update_document_index_status` [bad-argument-type]
-    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1060:71
+    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1053:71
 ERROR Argument `Literal['completed']` is not assignable to parameter `after_indexing_status` with type `IndexingStatus` in function `core.indexing_runner.IndexingRunner._update_document_index_status` [bad-argument-type]
-    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1071:71
+    --> tests/unit_tests/core/rag/indexing/test_indexing_runner.py:1064:71
 ERROR Object of class `FunctionType` has no attribute `assert_called_once` [missing-attribute]
     --> tests/unit_tests/core/rag/rerank/test_reranker.py:1578:9
 ERROR Object of class `FunctionType` has no attribute `call_args` [missing-attribute]

@RenzoMXD
Contributor Author

RenzoMXD commented Apr 1, 2026

@asukaminato0721 Please review. Thanks.

@dosubot dosubot Bot added the lgtm ("This PR has been approved by a maintainer") label Apr 1, 2026
@asukaminato0721 asukaminato0721 requested a review from Copilot April 1, 2026 02:20
@asukaminato0721 asukaminato0721 added this pull request to the merge queue Apr 1, 2026
Contributor

Copilot AI left a comment


Pull request overview

Refactors ORM usage in the core RAG indexing/pipeline/datasource paths to SQLAlchemy 2.0 “select()/session.get()” style, and updates unit-test mocks accordingly to improve typing and modernize query patterns.

Changes:

  • Replaced db.session.query(...) lookups with session.get(...), session.scalar(select(...).limit(1)), and session.scalars(select(...)) across core modules.
  • Migrated bulk delete/update/count patterns to session.execute(delete(...)), session.execute(update(...).values(...)), and select(func.count()).
  • Updated unit-test mock plumbing to match the new Session APIs.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Summary per file:

  • api/core/indexing_runner.py: Migrates dataset/user/document/segment queries and bulk updates/deletes to SQLAlchemy 2.0 style.
  • api/core/datasource/datasource_file_manager.py: Switches UploadFile/MessageFile/ToolFile retrieval to session.get() PK lookups.
  • api/core/app/apps/pipeline/pipeline_runner.py: Switches EndUser/Pipeline retrieval to session.get() and workflow/document fetches to scalar(select(...)).
  • api/core/app/apps/pipeline/pipeline_generator.py: Switches the workflow fetch to session.get() inside the generation path.
  • api/tests/unit_tests/core/rag/indexing/test_indexing_runner.py: Updates DB mocking to use session.get()/session.scalar() for IndexingRunner tests.
  • api/tests/unit_tests/core/datasource/test_datasource_file_manager.py: Updates DB mocking to use session.get() for file retrieval tests.
  • api/tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py: Updates some DB mocking to use session.get()/session.scalar() in PipelineRunner tests.
  • api/tests/unit_tests/core/app/apps/pipeline/test_pipeline_generator.py: Updates the workflow-not-found test to mock session.get() instead of query chaining.


db.session.scalar(
    select(func.count())
    .select_from(DatasetDocument)
    .where(DatasetDocument.id == document_id, DatasetDocument.is_paused == True)
)

Copilot AI Apr 1, 2026


For nullable boolean columns, prefer Document.is_paused.is_(True) instead of == True to avoid SQLAlchemy boolean-comparison warnings and to match the project’s existing pattern (e.g., api/services/dataset_service.py:1259).

Suggested change:
- .where(DatasetDocument.id == document_id, DatasetDocument.is_paused == True)
+ .where(DatasetDocument.id == document_id, DatasetDocument.is_paused.is_(True))

Comment on lines +187 to 191
end_user = MagicMock(session_id="sess")

session = MagicMock()
- session.query.side_effect = [query_end_user, query_pipeline]
+ session.get.side_effect = [end_user, pipeline]
mocker.patch.object(module.db, "session", session)

Copilot AI Apr 1, 2026


This test file still contains cases mocking db.session.query(...) (e.g., test_run_pipeline_not_found / test_run_workflow_not_initialized), but PipelineRunner.run() now uses db.session.get(...). Because MagicMock auto-creates missing attributes, those tests can accidentally pass without exercising the intended branch. Please update the remaining tests to mock session.get return values/side_effects so they fail for the right reason (pipeline missing vs workflow missing).
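The auto-created-attribute problem described here can be reproduced with the standard library alone. The sketch below uses a hypothetical `run_pipeline()` with placeholder `EndUser`/`Pipeline` classes (an illustration, not the real `PipelineRunner.run()`). It shows a stale `session.query` mock silently reaching the success branch, while a `session.get.side_effect` list drives the pipeline-missing branch:

```python
# Sketch: a stale `session.query` mock hides the missing-row branch once
# the code switches to `session.get`. All names here are hypothetical.
from types import SimpleNamespace
from unittest.mock import MagicMock


class EndUser:  # hypothetical model placeholder
    pass


class Pipeline:  # hypothetical model placeholder
    pass


class PipelineNotFoundError(Exception):
    pass


def run_pipeline(session, end_user_id: str, pipeline_id: str):
    """Hypothetical stand-in for a runner that now uses PK lookups."""
    end_user = session.get(EndUser, end_user_id)
    pipeline = session.get(Pipeline, pipeline_id)
    if pipeline is None:
        raise PipelineNotFoundError(pipeline_id)
    return end_user, pipeline


# Stale wiring: configures the old query chain, which the code never calls.
stale = MagicMock()
stale.query.return_value.where.return_value.first.return_value = None
# MagicMock auto-creates `stale.get`, which returns a truthy MagicMock,
# so the "pipeline not found" branch is silently skipped.
_, stale_pipeline = run_pipeline(stale, "u1", "p1")
stale_query_used = stale.query.called

# Updated wiring: one side_effect entry per session.get() call, in order.
fixed = MagicMock()
fixed.get.side_effect = [SimpleNamespace(session_id="sess"), None]
try:
    run_pipeline(fixed, "u1", "p1")
    raised = False
except PipelineNotFoundError:
    raised = True
```

Because `side_effect` is consumed one item per call, the list also pins the number and order of lookups: an unexpected extra `get()` call raises `StopIteration`.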

Comment on lines 347 to 349
session = MagicMock()
- session.query.return_value.where.return_value.first.return_value = None
+ session.get.return_value = None
mocker.patch.object(module.db, "session", session)

Copilot AI Apr 1, 2026


PipelineGenerator._generate() now uses db.session.get(Workflow, workflow_id), but this test module still has at least one test (test_generate_success_returns_converted) mocking session.query(...).where(...).first(). Since MagicMock will provide a truthy session.get() by default, that test can become a false positive. Update it to set session.get.return_value = workflow (and remove/avoid the unused query mock) so the test validates the correct behavior.
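For the success-path test, a similar sketch sets `session.get.return_value` explicitly and asserts on the call itself. The `generate()` function, the `Workflow` placeholder, and the error message are illustrative assumptions, not the real `PipelineGenerator._generate()`:

```python
# Sketch: wiring session.get for a single workflow lookup and asserting
# the call. All names here are hypothetical.
from types import SimpleNamespace
from unittest.mock import MagicMock


class Workflow:  # hypothetical model placeholder
    pass


def generate(session, workflow_id: str) -> str:
    """Hypothetical stand-in for a generator that fetches a workflow by PK."""
    workflow = session.get(Workflow, workflow_id)
    if workflow is None:
        raise ValueError("Workflow not initialized")
    return f"converted:{workflow.id}"


# Success path: return a concrete workflow instead of relying on the
# truthy MagicMock that an unconfigured session.get would produce.
session = MagicMock()
session.get.return_value = SimpleNamespace(id="wf-1")
result = generate(session, "wf-1")
session.get.assert_called_once_with(Workflow, "wf-1")
session.query.assert_not_called()  # the legacy query chain is unused

# Not-found path.
missing_session = MagicMock()
missing_session.get.return_value = None
try:
    generate(missing_session, "wf-2")
    error = None
except ValueError as exc:
    error = str(exc)
```

The `assert_called_once_with` and `query.assert_not_called()` checks make the test fail if the code path reverts to the old query chain, rather than passing on an auto-created mock.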

Merged via the queue into langgenius:main with commit 4bd3886 Apr 1, 2026
30 of 31 checks passed
@RenzoMXD RenzoMXD deleted the refactor/select-core-app-indexing branch April 1, 2026 11:33