feat(workflow): add document metadata configuration for Knowledge Base node #31438

ZeroZ-lab · 2026-01-23T07:19:00Z

Implement comprehensive document metadata support in Knowledge Base workflow node, allowing users to configure metadata values through both constants and variables.

Backend changes:

Add DocMetadata model with support for constant values and variable selectors
Implement metadata processing in KnowledgeIndexNode with variable resolution
Add batch query optimization to prevent N+1 queries
Implement metadata validation and binding creation
Add comprehensive unit tests for node and service layers

Frontend changes:

Add MetadataSection component with type-aware input controls
- String type: text input
- Number type: number input with validation
- Time type: date picker (Unix timestamp)
Implement variable filtering based on metadata data type
- String metadata: only string variables
- Number metadata: only number/integer variables
- Time metadata: only time-related number variables (timestamp, time, date, at)
Add VarReferencePicker with 360px min-width for better UX
Standardize font size to text-[13px] across all inputs
Add i18n support for all user-facing strings

Technical improvements:

Use SQLAlchemy attributes.flag_modified() for JSON field updates
Optimize logging to follow project standards (warnings and exceptions only)
Add type safety with proper TypeScript definitions
Implement proper error handling with user-friendly messages

Important

Make sure you have read our contribution guidelines
Ensure there is an associated issue and you have been assigned to it
Use the correct syntax to link this PR: Fixes #<issue number>.

Summary

fix #31437

Screenshots

Before	After
...
...
...
...

Checklist

This change requires a documentation update, included: Dify Document
I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.
I ran make lint and make type-check (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

…e node Implement comprehensive document metadata support in Knowledge Base workflow node, allowing users to configure metadata values through both constants and variables. Backend changes: - Add DocMetadata model with support for constant values and variable selectors - Implement metadata processing in KnowledgeIndexNode with variable resolution - Add batch query optimization to prevent N+1 queries - Implement metadata validation and binding creation - Add comprehensive unit tests for node and service layers Frontend changes: - Add MetadataSection component with type-aware input controls - String type: text input - Number type: number input with validation - Time type: date picker (Unix timestamp) - Implement variable filtering based on metadata data type - String metadata: only string variables - Number metadata: only number/integer variables - Time metadata: only time-related number variables (timestamp, *time*, *date*, *at*) - Add VarReferencePicker with 360px min-width for better UX - Standardize font size to text-[13px] across all inputs - Add i18n support for all user-facing strings Technical improvements: - Use SQLAlchemy attributes.flag_modified() for JSON field updates - Optimize logging to follow project standards (warnings and exceptions only) - Add type safety with proper TypeScript definitions - Implement proper error handling with user-friendly messages Co-Authored-By: Claude <noreply@anthropic.com>

gemini-code-assist · 2026-01-23T07:19:32Z

Summary of Changes

Hello @ZeroZ-lab, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Knowledge Base workflow node by introducing robust document metadata configuration. Users can now define and assign custom metadata to documents, leveraging both static values and dynamic variables from the workflow. This feature provides greater flexibility for organizing and enriching knowledge base content, ultimately improving data management and retrieval within workflows.

Highlights

Document Metadata Model: Introduced a new DocMetadata model in the backend to define the structure for document metadata, supporting various data types (string, integer, float, list of strings).
Knowledge Base Node Integration: Implemented comprehensive metadata processing within the KnowledgeIndexNode to handle document metadata configuration, including variable resolution for dynamic values and batch query optimization to prevent N+1 issues.
Frontend Metadata Configuration UI: Developed a new MetadataSection component for the Knowledge Base node, offering a user-friendly interface to configure document metadata. This includes type-aware input controls (text, number, date picker) and intelligent variable filtering based on metadata data types.
Metadata Management Enhancements: Added functionality to the DatasetService to save documents with associated metadata and updated the MetadataService to prevent deletion of metadata currently in use by a workflow pipeline.
Technical Improvements & Testing: Incorporated SQLAlchemy's attributes.flag_modified() for efficient JSON field updates, optimized logging, enhanced type safety with TypeScript, improved error handling, and added dedicated unit tests for metadata processing in both workflow nodes and dataset services.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a comprehensive and well-implemented feature for document metadata in Knowledge Base nodes. The backend changes are robust, including optimizations like batch fetching to avoid N+1 queries. The frontend implementation provides a good user experience with type-aware controls and clear variable selection.

I've identified a few areas for improvement:

A critical issue in a new unit test that prevents it from testing the intended logic.
A high-severity issue where a check for in-use metadata only considers draft workflows, potentially allowing deletion of metadata used in published applications.
A few medium-severity suggestions to improve performance and code quality in the backend and test suite.

Overall, this is a great contribution. Addressing the identified issues will make it even more solid.

I am having trouble creating individual review comments. Click here to see my feedback.

api/tests/unit_tests/core/workflow/nodes/knowledge_index/test_knowledge_index_node.py (108)

This test is intended to verify custom metadata handling, but it mocks _invoke_knowledge_index, which is where the metadata processing logic actually resides. As a result, the test doesn't execute the code it's supposed to be testing, and the assertions will fail.

To fix this, you should remove the line node._invoke_knowledge_index = MagicMock(). The test already mocks IndexProcessorFactory, which should be sufficient to isolate the test from the actual indexing logic while allowing the metadata processing code to run.

api/services/metadata_service.py (113-117)

The check_metadata_used_in_pipeline function only checks the draft version of the workflow for metadata usage (version=Workflow.VERSION_DRAFT). This is risky because it doesn't account for metadata being used in a published workflow. A user could delete metadata that is essential for a live application, causing it to fail.

To prevent this, the check should be extended to include the published workflow as well. You can look at RagPipelineService for examples of how to query for both draft and published workflows.

api/core/workflow/nodes/knowledge_index/knowledge_index_node.py (172-177)

For better readability and conciseness, you can use a dictionary comprehension to create metadata_name_map. This is more idiomatic Python and can be slightly more performant.

                metadata_name_map: dict[str, str] = {
                    md.id: md.name
                    for md in db.session.scalars(
                        select(DatasetMetadata).where(DatasetMetadata.dataset_id == dataset.id)
                    ).all()
                }

api/services/dataset_service.py (1856-1867)

The current implementation for creating DatasetMetadataBinding records uses a nested loop, which can lead to a large number of individual INSERT statements and database calls, especially during batch document uploads. This could impact performance.

To improve efficiency, I recommend using db.session.bulk_insert_mappings to perform a single bulk insert operation.

                    if metadata_bindings_to_create and document_ids:
                        bindings_to_add = [
                            {
                                "tenant_id": dataset.tenant_id,
                                "dataset_id": dataset.id,
                                "document_id": doc_id,
                                "metadata_id": metadata_id,
                                "created_by": account.id,
                            }
                            for doc_id in document_ids
                            for metadata_id, _ in metadata_bindings_to_create
                        ]
                        db.session.bulk_insert_mappings(DatasetMetadataBinding, bindings_to_add)
                        db.session.commit()

api/tests/unit_tests/core/workflow/nodes/knowledge_index/test_knowledge_index_node.py (48-55)

This side_effect assignment for pool.get is immediately overridden by another side_effect function definition on line 77. This makes the code confusing and seems to be leftover from development. Please remove these lines to improve test clarity.

api/tests/unit_tests/services/test_dataset_service_metadata.py (137)

The assertion assert mock_dependencies["db"].add.call_count >= 1 is quite weak. It only confirms that something was added to the session, but doesn't verify that a DatasetMetadataBinding was created with the correct attributes.

To make this test more robust, I suggest adding a more specific assertion to check that a DatasetMetadataBinding instance is added to the session and that its attributes (like metadata_id and document_id) are correct. You can iterate through mock_dependencies["db"].add.call_args_list to find the DatasetMetadataBinding instance and assert its properties.

- Fix test_delete_metadata_not_found: use pytest.raises() since service raises ValueError instead of returning None - Fix test_run_with_custom_metadata: remove mock of _invoke_knowledge_index that bypassed metadata processing logic, add missing BATCH/ORIGINAL_DOCUMENT_ID variable mocks, remove redundant side_effect code - Fix test_save_document_with_dataset_id_ignores_lock_not_owned: add missing enable_built_in_metadata and doc_metadata attributes to knowledge_config - Fix test_save_document_with_metadata: add .filter().all() mock chain for DatasetMetadata query - Enhance check_metadata_used_in_pipeline to check both draft and current published workflows (via pipeline.workflow_id) to prevent deletion of metadata actively used in production Co-Authored-By: Claude <noreply@anthropic.com>

Add proper input styling (border, rounded corners, background) to the metadata value container in Knowledge Base node for visual consistency with other form inputs. Co-Authored-By: Claude <noreply@anthropic.com>

Rewrite workflow query conditions to avoid type checking error with list.append() having inconsistent types. Co-Authored-By: Claude <noreply@anthropic.com>

Mock attributes.flag_modified to avoid SQLAlchemy internal state requirements. Update assertion to verify flag_modified call instead of db.session.add call.

Resolve conflicts between doc_metadata feature and summary_index feature: - api/core/workflow/nodes/knowledge_index/entities.py: merge both fields - api/core/workflow/nodes/knowledge_index/knowledge_index_node.py: merge imports - web/app/components/workflow/nodes/knowledge-base/hooks/use-config.ts: merge handlers - web/app/components/workflow/nodes/knowledge-base/types.ts: merge type definitions - web/app/components/workflow/nodes/knowledge-base/panel.tsx: merge handler usage Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…lab/dify into feat-rag-pipline-metadata

- Add system-xs-regular and truncate to date-picker for consistent font size - Fix value container alignment with w-0 grow overflow-hidden pattern - Change delete button from × to trash icon with destructive hover - Align condition-date layout with other condition components Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ZeroZ-lab · 2026-02-09T03:19:44Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

…adataBinding` for new and duplicate documents and improving frontend metadata input handling.

…rkflow nodes, enhance data consistency for document metadata bindings, and improve metadata usage check robustness.

JohnJyong · 2026-02-09T09:10:59Z

the function disable built-in metadata also need to add metadata using check.
the others LGTM, thanks for your contribution @ZeroZ-lab

github-actions bot added the web This relates to changes on the web. label Jan 23, 2026

Merge branch 'main' into feat-rag-pipline-metadata

163a924

[autofix.ci] apply automated fixes

8a31d52

gemini-code-assist bot reviewed Jan 23, 2026

View reviewed changes

ZeroZ-lab and others added 2 commits January 23, 2026 16:29

🎨 style(web): add border and background to metadata value input

7997ef2

Add proper input styling (border, rounded corners, background) to the metadata value container in Knowledge Base node for visual consistency with other form inputs. Co-Authored-By: Claude <noreply@anthropic.com>

ZeroZ-lab force-pushed the feat-rag-pipline-metadata branch from dd457bb to 7997ef2 Compare January 23, 2026 08:40

ZeroZ-lab and others added 9 commits January 23, 2026 16:40

Merge branch 'main' into feat-rag-pipline-metadata

c85b57f

[autofix.ci] apply automated fixes

c0607d4

🐛 fix(api): fix type error in check_metadata_used_in_pipeline

15777a6

Rewrite workflow query conditions to avoid type checking error with list.append() having inconsistent types. Co-Authored-By: Claude <noreply@anthropic.com>

[autofix.ci] apply automated fixes

cf198ef

[autofix.ci] apply automated fixes (attempt 2/3)

f566da5

✅ test(api): fix mock setup in knowledge_index_node test

1df04fb

Mock attributes.flag_modified to avoid SQLAlchemy internal state requirements. Update assertion to verify flag_modified call instead of db.session.add call.

🎨 test(api): remove unused import

7759ab0

Merge branch 'main' into feat-rag-pipline-metadata

67016a7

[autofix.ci] apply automated fixes

96b7f2c

ZeroZ-lab marked this pull request as ready for review January 23, 2026 13:18

ZeroZ-lab requested review from JohnJyong, QuantumGhost, iamjoel and zxhlyh as code owners January 23, 2026 13:18

dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. 💪 enhancement New feature or request labels Jan 23, 2026

ZeroZ-lab and others added 5 commits January 26, 2026 10:01

Merge branch 'main' into feat-rag-pipline-metadata

8574d3e

Merge branch 'main' into feat-rag-pipline-metadata

c844453

refactor: format dataset model imports for readability

eba2f06

Merge branch 'main' into feat-rag-pipline-metadata

14bbb1d

ZeroZ-lab and others added 8 commits February 2, 2026 11:24

Merge branch 'main' into feat-rag-pipline-metadata

5886466

Merge branch 'main' into feat-rag-pipline-metadata

67a798f

style: remove fixed font size from date picker text

26248e3

Merge branch 'feat-rag-pipline-metadata' of https://github.com/ZeroZ-…

890755d

…lab/dify into feat-rag-pipline-metadata

Merge branch 'main' into feat-rag-pipline-metadata

72f2025

Merge branch 'main' into feat-rag-pipline-metadata

6d19d1d

Merge branch 'main' into feat-rag-pipline-metadata

d6cec2e

ZeroZ-lab requested a review from laipz8200 as a code owner February 5, 2026 08:18

autofix-ci bot and others added 2 commits February 5, 2026 08:22

[autofix.ci] apply automated fixes

8b97823

Merge branch 'main' into feat-rag-pipline-metadata

a391cc3

ZeroZ-lab added 2 commits February 9, 2026 12:33

Merge branch 'main' into feat-rag-pipline-metadata

5424ae2

Merge branch 'main' into feat-rag-pipline-metadata

dbbae7a

ZeroZ-lab marked this pull request as draft February 9, 2026 06:05

ZeroZ-lab and others added 5 commits February 9, 2026 14:06

Merge branch 'main' into feat-rag-pipline-metadata

c87483e

feat: Enhance document metadata management by introducing `DatasetMet…

c30bd6e

…adataBinding` for new and duplicate documents and improving frontend metadata input handling.

feat: Introduce ConstantValueInput for dynamic metadata input in wo…

42e36d0

…rkflow nodes, enhance data consistency for document metadata bindings, and improve metadata usage check robustness.

Merge branch 'main' into feat-rag-pipline-metadata

a7c2e1e

[autofix.ci] apply automated fixes

cc33385

ZeroZ-lab marked this pull request as ready for review February 9, 2026 06:33

dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Feb 9, 2026

ZeroZ-lab added 2 commits February 10, 2026 14:53

Merge branch 'main' into feat-rag-pipline-metadata

f0d9fe8

Merge branch 'main' into feat-rag-pipline-metadata

b6a636e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(workflow): add document metadata configuration for Knowledge Base node #31438

feat(workflow): add document metadata configuration for Knowledge Base node #31438

Uh oh!

ZeroZ-lab commented Jan 23, 2026

Uh oh!

gemini-code-assist bot commented Jan 23, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

ZeroZ-lab commented Feb 9, 2026

Uh oh!

JohnJyong commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat(workflow): add document metadata configuration for Knowledge Base node #31438

Are you sure you want to change the base?

feat(workflow): add document metadata configuration for Knowledge Base node #31438

Uh oh!

Conversation

ZeroZ-lab commented Jan 23, 2026

Summary

Screenshots

Checklist

Uh oh!

gemini-code-assist bot commented Jan 23, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

api/tests/unit_tests/core/workflow/nodes/knowledge_index/test_knowledge_index_node.py (108)

api/services/metadata_service.py (113-117)

api/core/workflow/nodes/knowledge_index/knowledge_index_node.py (172-177)

api/services/dataset_service.py (1856-1867)

api/tests/unit_tests/core/workflow/nodes/knowledge_index/test_knowledge_index_node.py (48-55)

api/tests/unit_tests/services/test_dataset_service_metadata.py (137)

Uh oh!

ZeroZ-lab commented Feb 9, 2026

Code review

Uh oh!

JohnJyong commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants