refactor(services): migrate dataset_service and clear_free_plan_tenant_expired_logs to SQLAlchemy 2.0 select() API #34970

Merged
asukaminato0721 merged 5 commits into langgenius:main from wdeveloper16:refactor/sqlalchemy2-dataset-and-cleanup-services on Apr 12, 2026

@wdeveloper16 commented Apr 11, 2026

part of #22668

Files:

  • api/services/dataset_service.py
  • api/services/clear_free_plan_tenant_expired_logs.py

Description:
Migrates remaining service files:

  • dataset_service.py: filter_by(dataset_id=...) → .where(Model.dataset_id == ...)
  • clear_free_plan_tenant_expired_logs.py: session.query(Tenant.id).count() →
    session.scalar(select(func.count(Tenant.id))), plus scalars() reads and bulk deletes

…t_expired_logs to SQLAlchemy 2.0 select() API
@dosubot (bot) added the size:M (This PR changes 30-99 lines, ignoring generated files.) and refactor labels Apr 11, 2026
@github-actions

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-11 19:37:23.438409702 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-11 19:37:14.681333933 +0000
@@ -323,12 +323,14 @@
    --> services/audio_service.py:151:52
 ERROR Argument `str | None` is not assignable to parameter `value` with type `SQLCoreOperations[str] | str` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
    --> services/conversation_service.py:131:33
+ERROR Object of class `Result` has no attribute `rowcount` [missing-attribute]
+    --> services/dataset_service.py:1467:29
 ERROR `dict[str, bool | dict[str, Any] | str | None]` is not assignable to variable `data_source_info` with type `dict[str, bool | str]` [bad-assignment]
-    --> services/dataset_service.py:2117:56
+    --> services/dataset_service.py:2119:56
 ERROR `dict[str, bool | dict[str, Any] | str | None]` is not assignable to variable `data_source_info` with type `dict[str, bool | str]` [bad-assignment]
-    --> services/dataset_service.py:2630:44
+    --> services/dataset_service.py:2632:44
 ERROR `None` is not assignable to attribute `rules` with type `Never` [bad-assignment]
-    --> services/dataset_service.py:2805:51
+    --> services/dataset_service.py:2807:51
 ERROR Class member `DocumentIndexingTaskProxy.NORMAL_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]
   --> services/document_indexing_proxy/document_indexing_task_proxy.py:11:5
 ERROR Class member `DocumentIndexingTaskProxy.PRIORITY_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]

@github-actions

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-11 19:39:28.535290168 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-11 19:39:20.623281324 +0000
@@ -323,12 +323,14 @@
    --> services/audio_service.py:151:52
 ERROR Argument `str | None` is not assignable to parameter `value` with type `SQLCoreOperations[str] | str` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
    --> services/conversation_service.py:131:33
+ERROR Object of class `Result` has no attribute `rowcount` [missing-attribute]
+    --> services/dataset_service.py:1467:29
 ERROR `dict[str, bool | dict[str, Any] | str | None]` is not assignable to variable `data_source_info` with type `dict[str, bool | str]` [bad-assignment]
-    --> services/dataset_service.py:2117:56
+    --> services/dataset_service.py:2119:56
 ERROR `dict[str, bool | dict[str, Any] | str | None]` is not assignable to variable `data_source_info` with type `dict[str, bool | str]` [bad-assignment]
-    --> services/dataset_service.py:2630:44
+    --> services/dataset_service.py:2632:44
 ERROR `None` is not assignable to attribute `rules` with type `Never` [bad-assignment]
-    --> services/dataset_service.py:2805:51
+    --> services/dataset_service.py:2807:51
 ERROR Class member `DocumentIndexingTaskProxy.NORMAL_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]
   --> services/document_indexing_proxy/document_indexing_task_proxy.py:11:5
 ERROR Class member `DocumentIndexingTaskProxy.PRIORITY_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]

@wdeveloper16 requested a review from laipz8200 as a code owner April 11, 2026 20:02
@dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files.) label and removed the size:M (This PR changes 30-99 lines, ignoring generated files.) label Apr 11, 2026
@github-actions

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-11 20:03:24.864297158 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-11 20:03:16.973322642 +0000
@@ -324,11 +324,11 @@
 ERROR Argument `str | None` is not assignable to parameter `value` with type `SQLCoreOperations[str] | str` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
    --> services/conversation_service.py:131:33
 ERROR `dict[str, bool | dict[str, Any] | str | None]` is not assignable to variable `data_source_info` with type `dict[str, bool | str]` [bad-assignment]
-    --> services/dataset_service.py:2117:56
+    --> services/dataset_service.py:2119:56
 ERROR `dict[str, bool | dict[str, Any] | str | None]` is not assignable to variable `data_source_info` with type `dict[str, bool | str]` [bad-assignment]
-    --> services/dataset_service.py:2630:44
+    --> services/dataset_service.py:2632:44
 ERROR `None` is not assignable to attribute `rules` with type `Never` [bad-assignment]
-    --> services/dataset_service.py:2805:51
+    --> services/dataset_service.py:2807:51
 ERROR Class member `DocumentIndexingTaskProxy.NORMAL_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]
   --> services/document_indexing_proxy/document_indexing_task_proxy.py:11:5
 ERROR Class member `DocumentIndexingTaskProxy.PRIORITY_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]
@@ -6347,9 +6347,9 @@
 ERROR Argument `FakeRepo` is not assignable to parameter `workflow_run_repo` with type `APIWorkflowRunRepository | None` in function `services.retention.workflow_run.clear_free_plan_expired_workflow_run_logs.WorkflowRunCleanup.__init__` [bad-argument-type]
    --> tests/unit_tests/services/test_clear_free_plan_expired_workflow_run_logs.py:114:49
 ERROR Class member `FixedDateTime.now` overrides parent class `datetime` in an inconsistent manner [bad-override]
-   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:407:13
+   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:385:13
 ERROR Class member `FixedDateTime.now` overrides parent class `datetime` in an inconsistent manner [bad-override]
-   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:501:13
+   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:460:13
 ERROR Argument `SimpleNamespace` is not assignable to parameter `account` with type `Account` in function `services.dataset_service.DatasetService.create_empty_dataset` [bad-argument-type]
    --> tests/unit_tests/services/test_dataset_service_dataset.py:310:93
 ERROR Argument `SimpleNamespace` is not assignable to parameter `account` with type `Account` in function `services.dataset_service.DatasetService.create_empty_dataset` [bad-argument-type]

@github-actions

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-11 20:37:27.598601410 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-11 20:37:19.347408618 +0000
@@ -324,11 +324,11 @@
 ERROR Argument `str | None` is not assignable to parameter `value` with type `SQLCoreOperations[str] | str` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
    --> services/conversation_service.py:131:33
 ERROR `dict[str, bool | dict[str, Any] | str | None]` is not assignable to variable `data_source_info` with type `dict[str, bool | str]` [bad-assignment]
-    --> services/dataset_service.py:2117:56
+    --> services/dataset_service.py:2119:56
 ERROR `dict[str, bool | dict[str, Any] | str | None]` is not assignable to variable `data_source_info` with type `dict[str, bool | str]` [bad-assignment]
-    --> services/dataset_service.py:2630:44
+    --> services/dataset_service.py:2632:44
 ERROR `None` is not assignable to attribute `rules` with type `Never` [bad-assignment]
-    --> services/dataset_service.py:2805:51
+    --> services/dataset_service.py:2807:51
 ERROR Class member `DocumentIndexingTaskProxy.NORMAL_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]
   --> services/document_indexing_proxy/document_indexing_task_proxy.py:11:5
 ERROR Class member `DocumentIndexingTaskProxy.PRIORITY_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]
@@ -6347,9 +6347,9 @@
 ERROR Argument `FakeRepo` is not assignable to parameter `workflow_run_repo` with type `APIWorkflowRunRepository | None` in function `services.retention.workflow_run.clear_free_plan_expired_workflow_run_logs.WorkflowRunCleanup.__init__` [bad-argument-type]
    --> tests/unit_tests/services/test_clear_free_plan_expired_workflow_run_logs.py:114:49
 ERROR Class member `FixedDateTime.now` overrides parent class `datetime` in an inconsistent manner [bad-override]
-   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:407:13
+   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:385:13
 ERROR Class member `FixedDateTime.now` overrides parent class `datetime` in an inconsistent manner [bad-override]
-   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:501:13
+   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:460:13
 ERROR Argument `SimpleNamespace` is not assignable to parameter `account` with type `Account` in function `services.dataset_service.DatasetService.create_empty_dataset` [bad-argument-type]
    --> tests/unit_tests/services/test_dataset_service_dataset.py:310:93
 ERROR Argument `SimpleNamespace` is not assignable to parameter `account` with type `Account` in function `services.dataset_service.DatasetService.create_empty_dataset` [bad-argument-type]

@github-actions

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-11 21:17:32.364749197 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-11 21:17:22.743636530 +0000
@@ -324,11 +324,11 @@
 ERROR Argument `str | None` is not assignable to parameter `value` with type `SQLCoreOperations[str] | str` in function `sqlalchemy.orm.base.Mapped.__set__` [bad-argument-type]
    --> services/conversation_service.py:131:33
 ERROR `dict[str, bool | dict[str, Any] | str | None]` is not assignable to variable `data_source_info` with type `dict[str, bool | str]` [bad-assignment]
-    --> services/dataset_service.py:2117:56
+    --> services/dataset_service.py:2119:56
 ERROR `dict[str, bool | dict[str, Any] | str | None]` is not assignable to variable `data_source_info` with type `dict[str, bool | str]` [bad-assignment]
-    --> services/dataset_service.py:2630:44
+    --> services/dataset_service.py:2632:44
 ERROR `None` is not assignable to attribute `rules` with type `Never` [bad-assignment]
-    --> services/dataset_service.py:2805:51
+    --> services/dataset_service.py:2807:51
 ERROR Class member `DocumentIndexingTaskProxy.NORMAL_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]
   --> services/document_indexing_proxy/document_indexing_task_proxy.py:11:5
 ERROR Class member `DocumentIndexingTaskProxy.PRIORITY_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]
@@ -6347,9 +6347,9 @@
 ERROR Argument `FakeRepo` is not assignable to parameter `workflow_run_repo` with type `APIWorkflowRunRepository | None` in function `services.retention.workflow_run.clear_free_plan_expired_workflow_run_logs.WorkflowRunCleanup.__init__` [bad-argument-type]
    --> tests/unit_tests/services/test_clear_free_plan_expired_workflow_run_logs.py:114:49
 ERROR Class member `FixedDateTime.now` overrides parent class `datetime` in an inconsistent manner [bad-override]
-   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:407:13
+   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:385:13
 ERROR Class member `FixedDateTime.now` overrides parent class `datetime` in an inconsistent manner [bad-override]
-   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:501:13
+   --> tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py:460:13
 ERROR Argument `SimpleNamespace` is not assignable to parameter `account` with type `Account` in function `services.dataset_service.DatasetService.create_empty_dataset` [bad-argument-type]
    --> tests/unit_tests/services/test_dataset_service_dataset.py:310:93
 ERROR Argument `SimpleNamespace` is not assignable to parameter `account` with type `Account` in function `services.dataset_service.DatasetService.create_empty_dataset` [bad-argument-type]

@dosubot (bot) added the lgtm (This PR has been approved by a maintainer) label Apr 12, 2026
@asukaminato0721 added this pull request to the merge queue Apr 12, 2026
@asukaminato0721 requested a review from Copilot April 12, 2026 01:21

Copilot AI left a comment

Pull request overview

This PR continues the SQLAlchemy 2.0 migration work (issue #22668) by refactoring remaining service-layer ORM usage away from legacy Session.query(...) patterns and updating unit tests accordingly.

Changes:

  • Refactor DatasetService to use Session.scalar(select(...)) for single-row ORM retrieval and Session.execute(update(...)) for bulk updates.
  • Refactor ClearFreePlanTenantExpiredLogs to use select()/scalars() for reads, delete() + execute() for bulk deletes, and scalar(select(func.count(...))) for counts.
  • Update unit tests to mock scalar/scalars/execute results instead of query().filter().update()/delete()/count() chains.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| api/services/dataset_service.py | Migrates targeted dataset queries/updates to SQLAlchemy 2.0 select() / update() patterns. |
| api/services/clear_free_plan_tenant_expired_logs.py | Migrates reads/counts/deletes to select()/scalars()/scalar() and delete() with execute(). |
| api/tests/unit_tests/services/test_dataset_service_document.py | Updates mocks for update_documents_need_summary to use session.execute(...).rowcount. |
| api/tests/unit_tests/services/test_dataset_service_dataset.py | Updates mocks for external knowledge binding lookup to use session.scalar(...). |
| api/tests/unit_tests/services/test_clear_free_plan_tenant_expired_logs.py | Updates mocks/assertions to match scalars() reads and execute(delete(...)) deletes and scalar count queries. |
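
The test changes summarized above amount to mocking three session methods instead of a `query()` chain. A rough sketch using only `unittest.mock` follows; `count_and_purge` is a hypothetical function written for this illustration, and the string placeholders stand in for real `select()`/`delete()` statements.

```python
from unittest.mock import MagicMock


def count_and_purge(session, count_stmt, ids_stmt, delete_stmt):
    """Hypothetical service helper that uses the 2.0-style session calls."""
    total = session.scalar(count_stmt)              # scalar(select(func.count(...)))
    ids = session.scalars(ids_stmt).all()           # scalars(select(...)).all()
    deleted = session.execute(delete_stmt).rowcount  # execute(delete(...)).rowcount
    return total, ids, deleted


session = MagicMock()
session.scalar.return_value = 7
session.scalars.return_value.all.return_value = ["a", "b", "c"]
session.execute.return_value.rowcount = 3

total, ids, deleted = count_and_purge(session, "COUNT_STMT", "IDS_STMT", "DELETE_STMT")

# Each session method is asserted directly; no query().filter() chain is involved.
session.scalar.assert_called_once_with("COUNT_STMT")
session.scalars.assert_called_once_with("IDS_STMT")
session.execute.assert_called_once_with("DELETE_STMT")
```

Compared with the old `query().filter().count()` chains, each mock here maps to exactly one call, so a test fails loudly if the service code stops calling `scalar`, `scalars`, or `execute`.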

Merged via the queue into langgenius:main with commit 7515eee Apr 12, 2026
32 checks passed
HanqingZ pushed a commit to HanqingZ/dify that referenced this pull request Apr 23, 2026
…t_expired_logs to SQLAlchemy 2.0 select() API (langgenius#34970)

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
