Skip to content

refactor: select in external_knowledge_service#34493

Merged
asukaminato0721 merged 2 commits intolanggenius:mainfrom
RenzoMXD:refactor/select-external-knowledge-service
Apr 3, 2026
Merged

refactor: select in external_knowledge_service#34493
asukaminato0721 merged 2 commits intolanggenius:mainfrom
RenzoMXD:refactor/select-external-knowledge-service

Conversation

@RenzoMXD
Copy link
Copy Markdown
Contributor

@RenzoMXD RenzoMXD commented Apr 3, 2026

Summary

  • Migrate all 10 db.session.query() calls to SQLAlchemy 2.x select() style in external_knowledge_service.py
  • Replace .filter_by().first() with db.session.scalar(select().where().limit(1))
  • Replace .filter_by().count() with db.session.scalar(select(func.count()).where())
  • Update test mocks in test_external_dataset_service.py and external_dataset_service.py to match new patterns (unavoidable)

Note: test_fetch_external_knowledge_retrieval_non_200_status_returns_empty_list is a pre-existing failure on main (test expects empty list but code raises ValueError).

Test plan

  • 140 unit tests pass (1 pre-existing failure deselected)
  • Basedpyright type check passes (0 errors)

Part of #22668

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. refactor labels Apr 3, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-03 03:04:05.833578513 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-03 03:03:57.184574213 +0000
@@ -6262,7 +6262,7 @@
 ERROR Argument `None` is not assignable to parameter `response` with type `Response` in function `httpx._exceptions.HTTPStatusError.__init__` [bad-argument-type]
   --> tests/unit_tests/services/enterprise/test_plugin_manager_service.py:61:26
 ERROR Argument `SimpleNamespace` is not assignable to parameter `metadata_condition` with type `MetadataCondition | None` in function `services.external_knowledge_service.ExternalDatasetService.fetch_external_knowledge_retrieval` [bad-argument-type]
-   --> tests/unit_tests/services/external_dataset_service.py:849:36
+   --> tests/unit_tests/services/external_dataset_service.py:851:36
 ERROR Cannot index into `list[Unknown]` [bad-index]
    --> tests/unit_tests/services/hit_service.py:430:20
 ERROR Cannot index into `object` [bad-index]
@@ -6547,11 +6547,11 @@
 ERROR Argument `None` is not assignable to parameter `api_settings` with type `dict[Unknown, Unknown]` in function `services.external_knowledge_service.ExternalDatasetService.validate_api_list` [bad-argument-type]
    --> tests/unit_tests/services/test_external_dataset_service.py:401:54
 ERROR Argument `str | None` is not assignable to parameter `s` with type `bytearray | bytes | str` in function `json.loads` [bad-argument-type]
-   --> tests/unit_tests/services/test_external_dataset_service.py:893:31
+   --> tests/unit_tests/services/test_external_dataset_service.py:880:31
 ERROR `None` is not subscriptable [unsupported-operation]
-    --> tests/unit_tests/services/test_external_dataset_service.py:1478:16
+    --> tests/unit_tests/services/test_external_dataset_service.py:1417:16
 ERROR `None` is not subscriptable [unsupported-operation]
-    --> tests/unit_tests/services/test_external_dataset_service.py:1479:16
+    --> tests/unit_tests/services/test_external_dataset_service.py:1418:16
 ERROR Argument `Literal['invalid']` is not assignable to parameter `session_factory` with type `Engine | sessionmaker[Unknown] | None` in function `services.file_service.FileService.__init__` [bad-argument-type]
   --> tests/unit_tests/services/test_file_service.py:48:41
 ERROR `in` is not supported between `Literal['form_id=test-form']` and `None` [not-iterable]

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-03 03:05:52.834213706 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-03 03:05:43.975152628 +0000
@@ -6262,7 +6262,7 @@
 ERROR Argument `None` is not assignable to parameter `response` with type `Response` in function `httpx._exceptions.HTTPStatusError.__init__` [bad-argument-type]
   --> tests/unit_tests/services/enterprise/test_plugin_manager_service.py:61:26
 ERROR Argument `SimpleNamespace` is not assignable to parameter `metadata_condition` with type `MetadataCondition | None` in function `services.external_knowledge_service.ExternalDatasetService.fetch_external_knowledge_retrieval` [bad-argument-type]
-   --> tests/unit_tests/services/external_dataset_service.py:849:36
+   --> tests/unit_tests/services/external_dataset_service.py:851:36
 ERROR Cannot index into `list[Unknown]` [bad-index]
    --> tests/unit_tests/services/hit_service.py:430:20
 ERROR Cannot index into `object` [bad-index]
@@ -6547,11 +6547,11 @@
 ERROR Argument `None` is not assignable to parameter `api_settings` with type `dict[Unknown, Unknown]` in function `services.external_knowledge_service.ExternalDatasetService.validate_api_list` [bad-argument-type]
    --> tests/unit_tests/services/test_external_dataset_service.py:401:54
 ERROR Argument `str | None` is not assignable to parameter `s` with type `bytearray | bytes | str` in function `json.loads` [bad-argument-type]
-   --> tests/unit_tests/services/test_external_dataset_service.py:893:31
+   --> tests/unit_tests/services/test_external_dataset_service.py:880:31
 ERROR `None` is not subscriptable [unsupported-operation]
-    --> tests/unit_tests/services/test_external_dataset_service.py:1478:16
+    --> tests/unit_tests/services/test_external_dataset_service.py:1417:16
 ERROR `None` is not subscriptable [unsupported-operation]
-    --> tests/unit_tests/services/test_external_dataset_service.py:1479:16
+    --> tests/unit_tests/services/test_external_dataset_service.py:1418:16
 ERROR Argument `Literal['invalid']` is not assignable to parameter `session_factory` with type `Engine | sessionmaker[Unknown] | None` in function `services.file_service.FileService.__init__` [bad-argument-type]
   --> tests/unit_tests/services/test_file_service.py:48:41
 ERROR `in` is not supported between `Literal['form_id=test-form']` and `None` [not-iterable]

@asukaminato0721 asukaminato0721 added this pull request to the merge queue Apr 3, 2026
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Apr 3, 2026
@asukaminato0721 asukaminato0721 requested a review from Copilot April 3, 2026 03:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors ExternalDatasetService DB access in external_knowledge_service.py to use SQLAlchemy 2.x select() / Session.scalar() patterns, updating unit-test mocks accordingly as part of the repo-wide effort to improve typing and modernize ORM usage (Issue #22668).

Changes:

  • Migrated multiple db.session.query(...).filter_by(...).first()/count() usages to select(...).where(...) with db.session.scalar(...).
  • Updated count queries to select(func.count(...)) and ensured scalar results are handled safely.
  • Updated unit tests’ DB-session mocks to match the new scalar()-based query style.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
api/services/external_knowledge_service.py Replaces legacy Query API usage with SQLAlchemy 2.x select() / scalar() patterns for external knowledge API/binding lookups.
api/tests/unit_tests/services/test_external_dataset_service.py Updates mocks from session.query() chains to session.scalar() return values/side effects.
api/tests/unit_tests/services/external_dataset_service.py Updates mocks to use session.scalar() and adapts multi-call scenarios via side_effect.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +316 to +317
select(ExternalKnowledgeApis)
.where(ExternalKnowledgeApis.id == external_knowledge_binding.external_knowledge_api_id)
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fetch_external_knowledge_retrieval, the API template lookup only filters by ExternalKnowledgeApis.id and does not constrain by tenant_id. Since there are no FK constraints between external_knowledge_bindings.external_knowledge_api_id and external_knowledge_apis.id, a malformed/corrupted binding could cause a cross-tenant API template read. Consider adding ExternalKnowledgeApis.tenant_id == tenant_id (or external_knowledge_binding.tenant_id) to the where() clause to enforce tenant isolation at query time.

Suggested change
select(ExternalKnowledgeApis)
.where(ExternalKnowledgeApis.id == external_knowledge_binding.external_knowledge_api_id)
select(ExternalKnowledgeApis).where(
ExternalKnowledgeApis.id == external_knowledge_binding.external_knowledge_api_id,
ExternalKnowledgeApis.tenant_id == tenant_id,
)

Copilot uses AI. Check for mistakes.
Comment on lines 897 to 909
def test_fetch_external_knowledge_retrieval_non_200_status_returns_empty_list(self, mock_db_session: MagicMock):
"""
Non‑200 responses should be treated as an empty result set.
"""

binding = ExternalDatasetTestDataFactory.create_external_binding()
api = Mock(spec=ExternalKnowledgeApis)
api.settings = '{"endpoint":"https://example.com","api_key":"secret"}'

mock_db_session.query.return_value.filter_by.return_value.first.side_effect = [
mock_db_session.scalar.side_effect = [
binding,
api,
]
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test asserts that a non-200 response is treated as an empty result set, but ExternalDatasetService.fetch_external_knowledge_retrieval() currently raises ValueError(response.text) for non-200 (see api/services/external_knowledge_service.py:350-351). Either update the test to expect the exception, or update the service implementation to match the documented/tested behavior (alternatively mark the test xfail with a clear reason if it’s intentionally tracking a known issue).

Copilot uses AI. Check for mistakes.
Merged via the queue into langgenius:main with commit 608958d Apr 3, 2026
31 of 33 checks passed
@RenzoMXD
Copy link
Copy Markdown
Contributor Author

RenzoMXD commented Apr 3, 2026

Thanks. 😊

@RenzoMXD RenzoMXD deleted the refactor/select-external-knowledge-service branch April 3, 2026 03:57
HanqingZ pushed a commit to HanqingZ/dify that referenced this pull request Apr 23, 2026
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm This PR has been approved by a maintainer refactor size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants