refactor: select in external_knowledge_service by RenzoMXD · Pull Request #34493 · langgenius/dify

RenzoMXD · 2026-04-03T03:02:55Z

Summary

Migrate all 10 db.session.query() calls to SQLAlchemy 2.x select() style in external_knowledge_service.py
Replace .filter_by().first() with db.session.scalar(select().where().limit(1))
Replace .filter_by().count() with db.session.scalar(select(func.count()).where())
Update test mocks in test_external_dataset_service.py and external_dataset_service.py to match the new scalar()-based patterns (unavoidable)

Note: test_fetch_external_knowledge_retrieval_non_200_status_returns_empty_list is a pre-existing failure on main (test expects empty list but code raises ValueError).

Test plan

140 unit tests pass (1 pre-existing failure deselected)
Basedpyright type check passes (0 errors)

Part of #22668

github-actions · 2026-04-03T03:04:19Z

Pyrefly Diff

base → PR

--- /tmp/pyrefly_base.txt	2026-04-03 03:04:05.833578513 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-03 03:03:57.184574213 +0000
@@ -6262,7 +6262,7 @@
 ERROR Argument `None` is not assignable to parameter `response` with type `Response` in function `httpx._exceptions.HTTPStatusError.__init__` [bad-argument-type]
   --> tests/unit_tests/services/enterprise/test_plugin_manager_service.py:61:26
 ERROR Argument `SimpleNamespace` is not assignable to parameter `metadata_condition` with type `MetadataCondition | None` in function `services.external_knowledge_service.ExternalDatasetService.fetch_external_knowledge_retrieval` [bad-argument-type]
-   --> tests/unit_tests/services/external_dataset_service.py:849:36
+   --> tests/unit_tests/services/external_dataset_service.py:851:36
 ERROR Cannot index into `list[Unknown]` [bad-index]
    --> tests/unit_tests/services/hit_service.py:430:20
 ERROR Cannot index into `object` [bad-index]
@@ -6547,11 +6547,11 @@
 ERROR Argument `None` is not assignable to parameter `api_settings` with type `dict[Unknown, Unknown]` in function `services.external_knowledge_service.ExternalDatasetService.validate_api_list` [bad-argument-type]
    --> tests/unit_tests/services/test_external_dataset_service.py:401:54
 ERROR Argument `str | None` is not assignable to parameter `s` with type `bytearray | bytes | str` in function `json.loads` [bad-argument-type]
-   --> tests/unit_tests/services/test_external_dataset_service.py:893:31
+   --> tests/unit_tests/services/test_external_dataset_service.py:880:31
 ERROR `None` is not subscriptable [unsupported-operation]
-    --> tests/unit_tests/services/test_external_dataset_service.py:1478:16
+    --> tests/unit_tests/services/test_external_dataset_service.py:1417:16
 ERROR `None` is not subscriptable [unsupported-operation]
-    --> tests/unit_tests/services/test_external_dataset_service.py:1479:16
+    --> tests/unit_tests/services/test_external_dataset_service.py:1418:16
 ERROR Argument `Literal['invalid']` is not assignable to parameter `session_factory` with type `Engine | sessionmaker[Unknown] | None` in function `services.file_service.FileService.__init__` [bad-argument-type]
   --> tests/unit_tests/services/test_file_service.py:48:41
 ERROR `in` is not supported between `Literal['form_id=test-form']` and `None` [not-iterable]

github-actions · 2026-04-03T03:06:05Z

Pyrefly Diff

base → PR

--- /tmp/pyrefly_base.txt	2026-04-03 03:05:52.834213706 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-03 03:05:43.975152628 +0000
@@ -6262,7 +6262,7 @@
 ERROR Argument `None` is not assignable to parameter `response` with type `Response` in function `httpx._exceptions.HTTPStatusError.__init__` [bad-argument-type]
   --> tests/unit_tests/services/enterprise/test_plugin_manager_service.py:61:26
 ERROR Argument `SimpleNamespace` is not assignable to parameter `metadata_condition` with type `MetadataCondition | None` in function `services.external_knowledge_service.ExternalDatasetService.fetch_external_knowledge_retrieval` [bad-argument-type]
-   --> tests/unit_tests/services/external_dataset_service.py:849:36
+   --> tests/unit_tests/services/external_dataset_service.py:851:36
 ERROR Cannot index into `list[Unknown]` [bad-index]
    --> tests/unit_tests/services/hit_service.py:430:20
 ERROR Cannot index into `object` [bad-index]
@@ -6547,11 +6547,11 @@
 ERROR Argument `None` is not assignable to parameter `api_settings` with type `dict[Unknown, Unknown]` in function `services.external_knowledge_service.ExternalDatasetService.validate_api_list` [bad-argument-type]
    --> tests/unit_tests/services/test_external_dataset_service.py:401:54
 ERROR Argument `str | None` is not assignable to parameter `s` with type `bytearray | bytes | str` in function `json.loads` [bad-argument-type]
-   --> tests/unit_tests/services/test_external_dataset_service.py:893:31
+   --> tests/unit_tests/services/test_external_dataset_service.py:880:31
 ERROR `None` is not subscriptable [unsupported-operation]
-    --> tests/unit_tests/services/test_external_dataset_service.py:1478:16
+    --> tests/unit_tests/services/test_external_dataset_service.py:1417:16
 ERROR `None` is not subscriptable [unsupported-operation]
-    --> tests/unit_tests/services/test_external_dataset_service.py:1479:16
+    --> tests/unit_tests/services/test_external_dataset_service.py:1418:16
 ERROR Argument `Literal['invalid']` is not assignable to parameter `session_factory` with type `Engine | sessionmaker[Unknown] | None` in function `services.file_service.FileService.__init__` [bad-argument-type]
   --> tests/unit_tests/services/test_file_service.py:48:41
 ERROR `in` is not supported between `Literal['form_id=test-form']` and `None` [not-iterable]

Copilot

Pull request overview

Refactors ExternalDatasetService DB access in external_knowledge_service.py to use SQLAlchemy 2.x select() / Session.scalar() patterns, updating unit-test mocks accordingly as part of the repo-wide effort to improve typing and modernize ORM usage (Issue #22668).

Changes:

Migrated multiple db.session.query(...).filter_by(...).first()/count() usages to select(...).where(...) with db.session.scalar(...).
Updated count queries to select(func.count(...)) and ensured scalar results are handled safely.
Updated unit tests’ DB-session mocks to match the new scalar()-based query style.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
api/services/external_knowledge_service.py	Replaces legacy `Query` API usage with SQLAlchemy 2.x `select()` / `scalar()` patterns for external knowledge API/binding lookups.
api/tests/unit_tests/services/test_external_dataset_service.py	Updates mocks from `session.query()` chains to `session.scalar()` return values/side effects.
api/tests/unit_tests/services/external_dataset_service.py	Updates mocks to use `session.scalar()` and adapts multi-call scenarios via `side_effect`.
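
The mock change described in the table can be illustrated with the standard library alone. This is a sketch, not the project's test code: `binding` and `api` are hypothetical placeholder objects, and the string arguments passed to `scalar()` stand in for real `select()` statements.

```python
from unittest.mock import MagicMock

# Hypothetical placeholders for the ORM rows a real test would build.
binding = object()
api = object()

# Old style: every link of the query chain has to be mocked.
legacy_session = MagicMock()
legacy_session.query.return_value.filter_by.return_value.first.side_effect = [binding, api]
legacy_first = legacy_session.query("Model").filter_by(id=1).first()

# New style: the code under test calls session.scalar(stmt) directly,
# so one side_effect list covers consecutive lookups in call order.
session = MagicMock()
session.scalar.side_effect = [binding, api]

first = session.scalar("stmt-1")  # the mock ignores the statement argument
second = session.scalar("stmt-2")
call_count = session.scalar.call_count
```

One consequence of the flatter mock is that the order of `scalar()` calls in the service now determines which value each lookup receives, so a `side_effect` list has to follow the call order of the code under test.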

Copilot · 2026-04-03T03:47:15Z

+            select(ExternalKnowledgeApis)
+            .where(ExternalKnowledgeApis.id == external_knowledge_binding.external_knowledge_api_id)


In fetch_external_knowledge_retrieval, the API template lookup only filters by ExternalKnowledgeApis.id and does not constrain by tenant_id. Since there are no FK constraints between external_knowledge_bindings.external_knowledge_api_id and external_knowledge_apis.id, a malformed/corrupted binding could cause a cross-tenant API template read. Consider adding ExternalKnowledgeApis.tenant_id == tenant_id (or external_knowledge_binding.tenant_id) to the where() clause to enforce tenant isolation at query time.

Suggested change

-            select(ExternalKnowledgeApis)
-            .where(ExternalKnowledgeApis.id == external_knowledge_binding.external_knowledge_api_id)
+            select(ExternalKnowledgeApis).where(
+                ExternalKnowledgeApis.id == external_knowledge_binding.external_knowledge_api_id,
+                ExternalKnowledgeApis.tenant_id == tenant_id,
+            )

Copilot · 2026-04-03T03:47:16Z

    def test_fetch_external_knowledge_retrieval_non_200_status_returns_empty_list(self, mock_db_session: MagicMock):
        """
        Non‑200 responses should be treated as an empty result set.
        """

        binding = ExternalDatasetTestDataFactory.create_external_binding()
        api = Mock(spec=ExternalKnowledgeApis)
        api.settings = '{"endpoint":"https://example.com","api_key":"secret"}'

-        mock_db_session.query.return_value.filter_by.return_value.first.side_effect = [
+        mock_db_session.scalar.side_effect = [
            binding,
            api,
        ]


This test asserts that a non-200 response is treated as an empty result set, but ExternalDatasetService.fetch_external_knowledge_retrieval() currently raises ValueError(response.text) for non-200 (see api/services/external_knowledge_service.py:350-351). Either update the test to expect the exception, or update the service implementation to match the documented/tested behavior (alternatively mark the test xfail with a clear reason if it’s intentionally tracking a known issue).
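
The first option (updating the test to expect the exception) might look like the sketch below. `fetch_retrieval` is a hypothetical stand-in that only mirrors the behaviour described in the comment above (raise `ValueError(response.text)` on non-200); it is not the real service method, and the `"records"` key is an assumption made for the example.

```python
from unittest.mock import MagicMock


def fetch_retrieval(http_post):
    """Hypothetical stand-in: a non-200 response raises ValueError
    carrying the response text; a 200 response returns its records."""
    response = http_post()
    if response.status_code != 200:
        raise ValueError(response.text)
    return response.json().get("records", [])


def check_non_200_raises():
    # Build a fake non-200 response and confirm the error is surfaced.
    response = MagicMock(status_code=500, text="upstream error")
    try:
        fetch_retrieval(lambda: response)
    except ValueError as exc:
        return str(exc)
    raise AssertionError("expected ValueError for a non-200 response")


error_message = check_non_200_raises()

# The success path still returns the parsed records.
ok_response = MagicMock(status_code=200)
ok_response.json.return_value = {"records": [{"id": 1}]}
records = fetch_retrieval(lambda: ok_response)
```

In a pytest suite the try/except would normally be written as `with pytest.raises(ValueError):`; the plain form is used here only to keep the sketch free of third-party imports.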

RenzoMXD · 2026-04-03T03:57:53Z

Thanks. 😊

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

refactor: select in external_knowledge_service

b5bb016

RenzoMXD requested review from JohnJyong, QuantumGhost and laipz8200 as code owners April 3, 2026 03:02

dosubot Bot added the size:L (this PR changes 100-499 lines, ignoring generated files) and refactor labels Apr 3, 2026

[autofix.ci] apply automated fixes

59a37d9

asukaminato0721 enabled auto-merge April 3, 2026 03:41

asukaminato0721 approved these changes Apr 3, 2026

asukaminato0721 added this pull request to the merge queue Apr 3, 2026

dosubot Bot added the lgtm (this PR has been approved by a maintainer) label Apr 3, 2026

asukaminato0721 requested a review from Copilot April 3, 2026 03:42

Copilot started reviewing on behalf of asukaminato0721 April 3, 2026 03:43

Copilot AI reviewed Apr 3, 2026

Merged via the queue into langgenius:main with commit 608958d Apr 3, 2026
31 of 33 checks passed

RenzoMXD deleted the refactor/select-external-knowledge-service branch April 3, 2026 03:57

HanqingZ pushed a commit to HanqingZ/dify that referenced this pull request Apr 23, 2026

refactor: select in external_knowledge_service (langgenius#34493)

1afed38

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

asukaminato0721 merged 2 commits into langgenius:main from RenzoMXD:refactor/select-external-knowledge-service

3 participants
