refactor: decouple workflow knowledge access via repository injection#32071

Open
shuv-amp wants to merge 2 commits into langgenius:main from shuv-amp:feature/graph-engine-decoupling

shuv-amp commented Feb 6, 2026

Summary

  • Introduce a knowledge repository protocol with a SQLAlchemy implementation that is tenant-scoped and uses per-call sessions.
  • Inject repositories into workflow and pipeline execution paths, including single-step runs and sub-graph execution.
  • Refactor KnowledgeRetrievalNode to use repository access and add focused unit coverage.
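
A minimal sketch of what a tenant-scoped repository protocol could look like. The `get_dataset` method, the `DatasetRecord` type, and the in-memory implementation are illustrative assumptions, not the PR's actual code; the PR's implementation is SQLAlchemy-backed.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical plain-data view of a dataset row."""
    id: str
    tenant_id: str
    name: str


class KnowledgeRepository(Protocol):
    """Hypothetical protocol: callers depend on this, not on an ORM."""

    def get_dataset(self, tenant_id: str, dataset_id: str) -> Optional[DatasetRecord]: ...


class InMemoryKnowledgeRepository:
    """Stand-in implementation; it satisfies the protocol structurally."""

    def __init__(self, rows: list[DatasetRecord]) -> None:
        self._rows = list(rows)

    def get_dataset(self, tenant_id: str, dataset_id: str) -> Optional[DatasetRecord]:
        # Every lookup filters on tenant_id so one tenant cannot read another's data.
        for row in self._rows:
            if row.id == dataset_id and row.tenant_id == tenant_id:
                return row
        return None


def dataset_name(repo: KnowledgeRepository, tenant_id: str, dataset_id: str) -> str:
    """Caller code only sees the protocol, never the concrete class."""
    dataset = repo.get_dataset(tenant_id, dataset_id)
    return dataset.name if dataset else "<not found>"


repo = InMemoryKnowledgeRepository([DatasetRecord("ds-1", "tenant-a", "Handbook")])
```

Because the protocol is structural, a test double or an alternative backend can be swapped in without the calling code importing any database module.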

Testing

  • make lint
  • uv run --directory api --dev -- basedpyright --threads 8
  • uv --directory api run mypy --exclude-gitignore --exclude 'tests/' --exclude 'migrations/' --check-untyped-defs --disable-error-code=import-untyped .
  • uv --directory api run ty check
  • uv run --project api pytest api/tests/unit_tests/core/workflow api/tests/unit_tests/core/app/apps -q

Related Issue

Part of #30269.
Implements the Repository pattern to decouple the Graph Engine from Flask-SQLAlchemy, setting the stage for the FastAPI migration.

Copilot AI review requested due to automatic review settings February 6, 2026 11:45
@dosubot bot added the size:XL (this PR changes 500-999 lines, ignoring generated files) and 👻 feat:rag (embedding-related issue, like qdrant, weaviate, milvus, vector database) labels Feb 6, 2026
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @shuv-amp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural improvement by implementing a repository pattern for knowledge base access within the workflow system. By decoupling data access logic from core workflow components, it enhances modularity, simplifies testing, and provides a more robust and maintainable structure for handling knowledge-related operations across different execution contexts. The changes ensure that database sessions are managed per-call, preventing shared state issues in concurrent environments.

Highlights

  • Introduced Knowledge Repository Pattern: A new KnowledgeRepository protocol and its SQLAlchemyKnowledgeRepository implementation have been added to abstract database access for knowledge-related operations. This decouples the KnowledgeRetrievalNode from direct database interactions.
  • Repository Injection into Workflow Execution: The newly created Repositories container, holding the KnowledgeRepository instance, is now injected into various workflow execution paths, including single-step runs, sub-graph executions, and pipeline runners. This ensures that nodes requiring data access receive their dependencies through a unified mechanism.
  • Refactored KnowledgeRetrievalNode: The KnowledgeRetrievalNode has been refactored to utilize the injected KnowledgeRepository for all its data access needs, removing direct SQLAlchemy queries and model imports. This significantly improves the node's modularity and testability.
  • Enhanced Test Coverage: New unit tests have been added specifically for the KnowledgeRetrievalNode to validate its interaction with the KnowledgeRepository and ensure proper behavior, including handling cases where the repository is not configured.
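
The injection and missing-repository behaviour described in the highlights can be sketched as follows. `_get_knowledge_repo` is named in the PR; everything else here (`RetrievalNode`, `RepositoryNotConfiguredError`, `list_dataset_names`, and the single-field `Repositories`) is a hypothetical simplification for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional
from unittest.mock import Mock


@dataclass(frozen=True)
class Repositories:
    """Hypothetical container; the PR's version may hold more fields."""
    knowledge: Optional[Any] = None


class RepositoryNotConfiguredError(RuntimeError):
    """Hypothetical error type for a missing dependency."""


class RetrievalNode:
    """Simplified stand-in for a node that needs knowledge access."""

    def __init__(self, repositories: Optional[Repositories]) -> None:
        self._repositories = repositories

    def _get_knowledge_repo(self) -> Any:
        # Fail loudly instead of silently falling back to a global session.
        repos = self._repositories
        if repos is None or repos.knowledge is None:
            raise RepositoryNotConfiguredError("knowledge repository is not configured")
        return repos.knowledge

    def run(self, tenant_id: str, dataset_ids: list[str]) -> list[str]:
        repo = self._get_knowledge_repo()
        return repo.list_dataset_names(tenant_id, dataset_ids)


# A mock stands in for the repository, so no database is needed.
fake_repo = Mock()
fake_repo.list_dataset_names.return_value = ["Handbook"]
node = RetrievalNode(Repositories(knowledge=fake_repo))
result = node.run("tenant-a", ["ds-1"])
```

With the dependency passed in, a unit test can assert on the calls made to the mock rather than on database state.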

Changelog
  • api/.importlinter
    • Removed direct database imports for knowledge_retrieval_node to enforce the new repository pattern.
  • api/core/app/apps/pipeline/pipeline_runner.py
    • Imported SQLAlchemyKnowledgeRepository and Repositories.
    • Instantiated Repositories with SQLAlchemyKnowledgeRepository.
    • Passed the repositories object to workflow execution methods.
  • api/core/app/apps/workflow/app_generator.py
    • Imported SQLAlchemyKnowledgeRepository and Repositories.
    • Refactored session factory instantiation and usage to align with repository injection.
    • Instantiated and passed Repositories to WorkflowAppRunner.
  • api/core/app/apps/workflow/app_runner.py
    • Removed unused logging import.
    • Imported Repositories and added it as an instance attribute.
    • Updated methods to pass the repositories object to graph initialization.
  • api/core/app/apps/workflow_app_runner.py
    • Imported Repositories.
    • Added repositories parameter to graph initialization and single-node execution preparation methods.
  • api/core/repositories/sqlalchemy_knowledge_repository.py
    • Added new file defining SQLAlchemyKnowledgeRepository for SQLAlchemy-based knowledge data access.
    • Implemented methods for fetching datasets, documents, metadata, and logging rate limits, using per-call sessions.
  • api/core/workflow/entities/graph_init_params.py
    • Added repositories field to GraphInitParams to allow injection of repository instances.
    • Configured Pydantic to allow arbitrary types for the repositories field.
  • api/core/workflow/entities/repositories.py
    • Added new file defining the Repositories dataclass as a container for various repository interfaces.
  • api/core/workflow/nodes/iteration/iteration_node.py
    • Modified iteration node to pass repositories from graph_init_params to sub-graph engine creation.
  • api/core/workflow/nodes/knowledge_retrieval/knowledge_retrieval_node.py
    • Removed direct database imports and replaced them with calls to the injected KnowledgeRepository.
    • Introduced _get_knowledge_repo and _close_repository_session methods for repository interaction.
    • Updated rate limit logging to use the repository.
    • Improved error logging for knowledge retrieval failures.
  • api/core/workflow/nodes/loop/loop_node.py
    • Modified loop node to pass repositories from graph_init_params to sub-graph engine creation.
  • api/core/workflow/repositories/__init__.py
    • Implemented lazy loading for OrderConfig and WorkflowNodeExecutionRepository to prevent potential import cycles.
  • api/core/workflow/repositories/knowledge_repository.py
    • Added new file defining the KnowledgeRepository protocol and associated entity protocols (DatasetEntity, DocumentEntity, DatasetMetadataEntity).
  • api/core/workflow/workflow_entry.py
    • Imported Repositories.
    • Added repositories parameter to single_step_run and passed it to GraphInitParams.
  • api/services/rag_pipeline/rag_pipeline.py
    • Imported SQLAlchemyKnowledgeRepository and Repositories.
    • Instantiated Repositories and passed it to WorkflowEntry.single_step_run for draft workflow nodes and datasource variable setting.
  • api/services/workflow_service.py
    • Imported SQLAlchemyKnowledgeRepository and Repositories.
    • Instantiated Repositories and passed it to WorkflowEntry.single_step_run for draft workflow node execution.
  • api/tests/unit_tests/core/app/apps/test_workflow_app_runner_single_node.py
    • Updated test calls to _prepare_single_node_execution to include the new repositories parameter.
  • api/tests/unit_tests/core/workflow/nodes/knowledge_retrieval/test_retrieval_node_mock.py
    • Added new unit tests to verify KnowledgeRetrievalNode's interaction with the KnowledgeRepository and its error handling when a repository is missing.
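
The changelog entry for the workflow repositories package mentions lazy loading to avoid import cycles. One common way to do this in Python is a module-level `__getattr__` hook (PEP 562); whether the PR uses exactly this mechanism is an assumption. The sketch below builds a throwaway module object so that it runs as a single file, and it uses standard-library names as stand-ins for the real exports. In a package, the hook would be a plain function in the package's init module.

```python
import importlib
import types

# Exported name -> module that defines it (stdlib stand-ins for illustration).
_LAZY_EXPORTS = {"Fraction": "fractions", "Decimal": "decimal"}


def make_lazy_package(name: str) -> types.ModuleType:
    """Build a module whose exports are imported on first attribute access."""
    package = types.ModuleType(name)

    def __getattr__(attr: str):
        # Called only when normal attribute lookup on the module fails.
        if attr not in _LAZY_EXPORTS:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(_LAZY_EXPORTS[attr]), attr)
        setattr(package, attr, value)  # cache so later lookups skip this hook
        return value

    package.__getattr__ = __getattr__
    return package


lazy_pkg = make_lazy_package("lazy_pkg")
```

Nothing from `fractions` or `decimal` is bound on `lazy_pkg` until an attribute such as `lazy_pkg.Fraction` is first read, which is what lets two packages reference each other without importing each other at load time.
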
Activity
  • The author has provided a comprehensive summary of the changes, including the introduction of a knowledge repository protocol and its SQLAlchemy implementation, injection into workflow paths, and refactoring of the KnowledgeRetrievalNode.
  • Testing instructions are included in the PR description, covering linting, type checking with basedpyright and mypy, and unit tests for core workflow components, indicating thorough local validation by the author.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a well-executed architectural refactoring that successfully decouples the KnowledgeRetrievalNode from direct database access. The introduction of the KnowledgeRepository protocol and its SQLAlchemy implementation, along with the use of dependency injection to provide repositories to the workflow context, significantly improves maintainability, testability, and separation of concerns. The changes are systematic and clean, and the new unit tests for the KnowledgeRetrievalNode are a great addition. I have one suggestion regarding a potential architectural layering improvement.

Copilot AI left a comment

Pull request overview

This pull request decouples the KnowledgeRetrievalNode from direct database access by introducing a repository pattern with dependency injection. The changes improve testability, maintainability, and architectural boundaries within the workflow execution system.

Changes:

  • Introduced a KnowledgeRepository protocol and SQLAlchemyKnowledgeRepository implementation with per-method session management to ensure thread safety
  • Added a Repositories container and injection mechanism through GraphInitParams to propagate repository instances through workflow execution paths
  • Refactored KnowledgeRetrievalNode to use repository methods instead of direct db.session access, and added comprehensive unit tests with mocks
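
The propagation path through init params into nested graph engines can be sketched like this. The classes are simplified stand-ins: the PR's `GraphInitParams` is a Pydantic model with more fields, and `GraphEngine.spawn_sub_graph` is a hypothetical name for whatever the iteration and loop nodes call to build a sub-graph engine.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class Repositories:
    """Hypothetical container for injected repositories."""
    knowledge: Optional[Any] = None


@dataclass(frozen=True)
class GraphInitParams:
    """Simplified stand-in for the real init params."""
    tenant_id: str
    repositories: Optional[Repositories] = None


@dataclass
class GraphEngine:
    """Toy engine that may spawn nested engines for iteration or loop bodies."""
    init_params: GraphInitParams
    children: list["GraphEngine"] = field(default_factory=list)

    def spawn_sub_graph(self) -> "GraphEngine":
        # Forward the parent's repositories unchanged so nested nodes
        # resolve the same dependencies as top-level nodes.
        child_params = GraphInitParams(
            tenant_id=self.init_params.tenant_id,
            repositories=self.init_params.repositories,
        )
        child = GraphEngine(child_params)
        self.children.append(child)
        return child


shared = Repositories(knowledge=object())
root = GraphEngine(GraphInitParams(tenant_id="tenant-a", repositories=shared))
nested = root.spawn_sub_graph().spawn_sub_graph()
```

If any level forgot to forward the field, nodes inside a loop or iteration body would see no repository even though the top-level graph was configured correctly.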

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| api/core/workflow/repositories/knowledge_repository.py | Defines Protocol interfaces for Dataset, Document, and KnowledgeRepository operations |
| api/core/repositories/sqlalchemy_knowledge_repository.py | Implements KnowledgeRepository with SQLAlchemy, using per-method sessions for thread safety |
| api/core/workflow/entities/repositories.py | Introduces Repositories container for dependency injection |
| api/core/workflow/entities/graph_init_params.py | Adds optional repositories field to GraphInitParams for propagation |
| api/core/workflow/nodes/knowledge_retrieval/knowledge_retrieval_node.py | Refactors node to use injected repository instead of direct db access |
| api/core/workflow/workflow_entry.py | Adds repositories parameter to single_step_run method |
| api/core/app/apps/workflow/app_generator.py | Creates and injects Repositories instance in workflow runner |
| api/core/app/apps/workflow/app_runner.py | Accepts and propagates repositories through workflow execution |
| api/core/app/apps/workflow_app_runner.py | Propagates repositories through graph initialization paths |
| api/core/app/apps/pipeline/pipeline_runner.py | Creates and injects repositories for RAG pipeline execution |
| api/core/workflow/nodes/iteration/iteration_node.py | Propagates repositories to nested graph engines |
| api/core/workflow/nodes/loop/loop_node.py | Propagates repositories to nested graph engines |
| api/services/workflow_service.py | Creates repository instances for draft workflow node execution |
| api/services/rag_pipeline/rag_pipeline.py | Creates repository instances for RAG pipeline draft node execution |
| api/tests/unit_tests/core/workflow/nodes/knowledge_retrieval/test_retrieval_node_mock.py | Adds comprehensive unit tests verifying repository usage and error handling |
| api/tests/unit_tests/core/app/apps/test_workflow_app_runner_single_node.py | Updates test to account for new repositories parameter |
| api/core/workflow/repositories/__init__.py | Refactors imports to use lazy loading pattern to avoid circular dependencies |
| api/.importlinter | Removes obsolete ignore rules for removed direct database dependencies |

Comment on lines 255 to 256
  # avoid blocking at retrieval
- db.session.close()
+ knowledge_repo.close_session()

Copilot AI Feb 6, 2026

The call to knowledge_repo.close_session() here is redundant. Since the SQLAlchemyKnowledgeRepository implementation uses per-method sessions (each repository method creates and closes its own session with context managers), this explicit call to close_session() does nothing - it's implemented as a no-op that simply returns. This line can be safely removed as it serves no purpose with the new repository design.
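
To illustrate why an explicit close is unnecessary under a per-method session design, here is a small sketch. `FakeSession`, `PerCallRepository`, and `count_documents` are hypothetical stand-ins, not the PR's SQLAlchemy code; only the `close_session` name comes from the review comment.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeSession:
    """Stand-in for a database session that records whether it was closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class PerCallRepository:
    """Each method opens its own session and closes it before returning."""

    def __init__(self) -> None:
        self.sessions: list[FakeSession] = []  # kept only for inspection

    @contextmanager
    def _session(self) -> Iterator[FakeSession]:
        session = FakeSession()
        self.sessions.append(session)
        try:
            yield session
        finally:
            session.close()  # runs on normal return and on exceptions

    def count_documents(self, dataset_id: str) -> int:
        with self._session() as session:
            assert not session.closed  # the session is live only inside this block
            return 0  # a real implementation would run a query here

    def close_session(self) -> None:
        # Nothing to release: no session outlives the method that created it.
        return None


repo = PerCallRepository()
repo.count_documents("ds-1")
repo.count_documents("ds-2")
```

After the two calls, every session the repository created has already been closed, so a later `close_session()` has nothing left to do.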

shuv-amp commented Feb 6, 2026

Addressed the review points: moved metadata filter translation into the SQLAlchemy knowledge repository to avoid the cross-layer dependency, and removed the redundant knowledge_repo.close_session call in the knowledge retrieval node. Reran: uv run --project api pytest api/tests/unit_tests/core/workflow api/tests/unit_tests/core/app/apps -q (625 passed, 3 skipped).

@crazywoola (Member) commented

Please link an issue in the description :)
Besides, it seems promising to me. cc @laipz8200 @JohnJyong

shuv-amp commented Feb 7, 2026

Thanks for the review @crazywoola! I've linked issue #30269 in the description as requested. Appreciate the feedback!

@shuv-amp changed the title from "Decouple workflow knowledge access via repository injection" to "refactor: decouple workflow knowledge access via repository injection" Feb 7, 2026

shuv-amp commented Feb 8, 2026

Thanks for the AI review feedback! I went ahead and moved that metadata filter logic into the repository to fix the layering issue. I also removed the redundant session close call.

Everything should be sorted now, just waiting on the CI to run.
