
feat(workflow): pass project/public scope to knowledge retrieval #33855

Open
zt15242 wants to merge 6 commits into langgenius:main from zt15242:feature/project-space

Conversation

@zt15242 zt15242 commented Mar 21, 2026

Summary

Pass project_id and include_public through the workflow knowledge retrieval path so dataset retrieval can enforce project/public space scope correctly.

Changes

  • add project_id and include_public to workflow/app config dataset entity
  • propagate scope fields through workflow converter -> node data -> retrieval request
  • apply dataset availability filtering by project/public scope in dataset retrieval
  • add unit tests for:
    • workflow converter scope propagation
    • knowledge retrieval node request propagation
    • dataset retrieval scope handoff

Verification

Passed:

  • api/tests/unit_tests/services/workflow/test_workflow_converter.py
  • api/tests/unit_tests/core/workflow/nodes/knowledge_retrieval/test_knowledge_retrieval_node.py
  • api/tests/unit_tests/core/rag/retrieval/test_dataset_retrieval_scope_patch.py

Result:

  • 26 passed

Docker validation:

  • docker compose config
  • docker compose build api

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 21, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to knowledge retrieval by implementing project and public scope filtering for datasets. The changes enable more precise control over which datasets are accessible within different contexts, such as specific projects or publicly available data. This ensures that the knowledge retrieval process correctly enforces data visibility rules, improving data governance and relevance through schema updates, API modifications, and core logic adjustments.

Highlights

  • Enhanced Dataset Model with Scoping: The dataset model has been extended to include project_id and space_type fields, allowing for granular control over dataset visibility and access within personal, project, or public scopes. A new DatasetSpaceType enum was introduced to define these categories.
  • Workflow Knowledge Retrieval Scope Propagation: The workflow's knowledge retrieval path now correctly propagates project_id and include_public parameters from the app configuration through the workflow node data to the final retrieval request. This ensures that dataset retrieval respects the defined project and public space scopes.
  • API and Service Layer Updates for Scoping: Dataset listing, creation, and update APIs have been modified to accept and filter by the new project_id and space_type fields. The core DatasetService now includes logic to apply these filters when retrieving datasets.
  • Improved Tenant Isolation for Dataset Access: All dataset retrieval calls across console controllers have been refactored to utilize a new get_dataset_in_tenant method. This change enhances tenant isolation by ensuring that datasets are always retrieved within the context of the current tenant.
  • Database Migration for New Fields: A new Alembic migration script has been added to introduce the project_id and space_type columns to the datasets table in the database, along with appropriate indexes.
  • Comprehensive Unit Test Coverage: New unit tests have been added and existing ones updated to validate the correct propagation of scope parameters through the workflow and the accurate application of the new filtering logic in dataset retrieval.
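
Based on the highlights above, the new enum can be sketched like this (the member values are taken from the validation error message quoted later in the review; the exact class definition in the PR may differ):

```python
from enum import Enum

class DatasetSpaceType(Enum):
    # Values match those listed in the PR's space_type validation message:
    # "Allowed values: personal, project, public."
    PERSONAL = "personal"
    PROJECT = "project"
    PUBLIC = "public"

# Round-trip a raw string from an API request into the canonical value.
print(DatasetSpaceType("project").value)  # project
```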


@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request introduces project and public scoping for knowledge retrieval by adding project_id and space_type fields to datasets. The changes are propagated through various layers of the application, from API controllers to the core retrieval logic, and include necessary database migrations and unit tests. The implementation is mostly solid, but I've identified a logical flaw in the dataset scoping logic that could lead to incorrect retrieval results, and an opportunity to reduce code duplication for better maintainability. My feedback focuses on correcting the scoping bug and improving the code structure.

Comment on lines +1809 to +1820
```python
space_scope = []
if project_id:
    space_scope.append(
        and_(
            Dataset.space_type == DatasetSpaceType.PROJECT.value,
            Dataset.project_id == project_id,
        )
    )
if include_public:
    space_scope.append(Dataset.space_type == DatasetSpaceType.PUBLIC.value)
if not space_scope:
    space_scope.append(Dataset.space_type != DatasetSpaceType.PUBLIC.value)
```

Severity: high

There's a logical issue in how the space_scope is constructed. When project_id is None and include_public is True, the current logic incorrectly filters for only public datasets. The expected behavior should be to retrieve both public datasets and the user's accessible non-public (personal, project) datasets.

The proposed suggestion refactors this logic to be more explicit and correct, ensuring the right combination of scopes is applied in all cases.

```python
space_scope = []
if project_id:
    space_scope.append(
        and_(
            Dataset.space_type == DatasetSpaceType.PROJECT.value,
            Dataset.project_id == project_id,
        )
    )
else:
    # If no project is specified, user can access their non-public datasets.
    space_scope.append(Dataset.space_type != DatasetSpaceType.PUBLIC.value)

if include_public:
    space_scope.append(Dataset.space_type == DatasetSpaceType.PUBLIC.value)
```
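
The behavioral difference can be checked in isolation by restating both versions as plain-Python predicates over a dataset's `(space_type, project_id)` pair. This is a sketch for illustration only; the real code builds SQLAlchemy expressions that are combined with `or_()`:

```python
def visible_original(space_type, ds_project, project_id, include_public):
    # Mirrors the PR's branch structure; clauses are OR-ed together.
    clauses = []
    if project_id:
        clauses.append(space_type == "project" and ds_project == project_id)
    if include_public:
        clauses.append(space_type == "public")
    if not clauses:
        clauses.append(space_type != "public")
    return any(clauses)

def visible_suggested(space_type, ds_project, project_id, include_public):
    # Mirrors the suggested fix: the non-public fallback moves into an
    # else branch so it applies whenever no project scope is given.
    clauses = []
    if project_id:
        clauses.append(space_type == "project" and ds_project == project_id)
    else:
        clauses.append(space_type != "public")
    if include_public:
        clauses.append(space_type == "public")
    return any(clauses)

# The divergent case: no project scope, include_public=True.
# The original drops the user's personal datasets; the suggestion keeps them.
print(visible_original("personal", None, None, True))   # False
print(visible_suggested("personal", None, None, True))  # True
```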

Comment on lines +130 to +136
```python
def _validate_space_type(value: str | None) -> str | None:
    if value is None:
        return None
    try:
        return DatasetSpaceType(value).value
    except ValueError:
        raise ValueError("Invalid space_type. Allowed values: personal, project, public.")
```

Severity: medium

This validation logic for space_type is duplicated in api/controllers/service_api/dataset/dataset.py. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, this logic should be extracted into a shared utility function. This would ensure that any future changes to the validation only need to be made in one place.

