feat(workflow): pass project/public scope to knowledge retrieval#33855
feat(workflow): pass project/public scope to knowledge retrieval#33855zt15242 wants to merge 6 commits intolanggenius:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement to knowledge retrieval by implementing project and public scope filtering for datasets. The changes enable more precise control over which datasets are accessible within different contexts, such as specific projects or publicly available data. This ensures that the knowledge retrieval process correctly enforces data visibility rules, improving data governance and relevance through schema updates, API modifications, and core logic adjustments. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces project and public scoping for knowledge retrieval by adding project_id and space_type fields to datasets. The changes are propagated through various layers of the application, from API controllers to the core retrieval logic, and include necessary database migrations and unit tests. The implementation is mostly solid, but I've identified a logical flaw in the dataset scoping logic that could lead to incorrect retrieval results, and an opportunity to reduce code duplication for better maintainability. My feedback focuses on correcting the scoping bug and improving the code structure.
| space_scope = [] | ||
| if project_id: | ||
| space_scope.append( | ||
| and_( | ||
| Dataset.space_type == DatasetSpaceType.PROJECT.value, | ||
| Dataset.project_id == project_id, | ||
| ) | ||
| ) | ||
| if include_public: | ||
| space_scope.append(Dataset.space_type == DatasetSpaceType.PUBLIC.value) | ||
| if not space_scope: | ||
| space_scope.append(Dataset.space_type != DatasetSpaceType.PUBLIC.value) |
There was a problem hiding this comment.
There's a logical issue in how the space_scope is constructed. When project_id is None and include_public is True, the current logic incorrectly filters for only public datasets. The expected behavior should be to retrieve both public datasets and the user's accessible non-public (personal, project) datasets.
The proposed suggestion refactors this logic to be more explicit and correct, ensuring the right combination of scopes is applied in all cases.
space_scope = []
if project_id:
space_scope.append(
and_(
Dataset.space_type == DatasetSpaceType.PROJECT.value,
Dataset.project_id == project_id,
)
)
else:
# If no project is specified, user can access their non-public datasets.
space_scope.append(Dataset.space_type != DatasetSpaceType.PUBLIC.value)
if include_public:
space_scope.append(Dataset.space_type == DatasetSpaceType.PUBLIC.value)| def _validate_space_type(value: str | None) -> str | None: | ||
| if value is None: | ||
| return None | ||
| try: | ||
| return DatasetSpaceType(value).value | ||
| except ValueError: | ||
| raise ValueError("Invalid space_type. Allowed values: personal, project, public.") |
There was a problem hiding this comment.
This validation logic for space_type is duplicated in api/controllers/service_api/dataset/dataset.py. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, this logic should be extracted into a shared utility function. This would ensure that any future changes to the validation only need to be made in one place.
Summary
Pass
project_idandinclude_publicthrough the workflow knowledge retrieval path so dataset retrieval can enforce project/public space scope correctly.Changes
project_idandinclude_publicto workflow/app config dataset entityVerification
Passed:
Result:
Docker validation:
docker compose config✅docker compose build api✅