Skip to content

Conversation

@NeatGuyCoding
Copy link
Contributor

Important

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: Fixes #<issue number>.

Summary

Fix #30449

Screenshots

Before After
... ...

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

…ralized

utility function, ensuring consistent and secure handling of special characters
across all database queries.

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
Copilot AI review requested due to automatic review settings January 1, 2026 10:00
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 1, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @NeatGuyCoding, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the API's security by refactoring and centralizing the handling of SQL LIKE pattern escaping. A new utility function ensures that special characters in user-supplied search keywords are properly escaped, preventing them from being interpreted as SQL wildcards. This change is crucial for mitigating potential SQL injection risks and maintaining data integrity across various search and filtering functionalities within the application.

Highlights

  • Centralized SQL LIKE Escaping: Introduced a new escape_like_pattern utility function in libs/helper.py to consistently handle special characters (%, _, \) in SQL LIKE clauses.
  • Enhanced Security: Integrated the new escaping function across various API endpoints and services to prevent SQL injection vulnerabilities by ensuring user-provided input in LIKE queries is treated as literal text.
  • Comprehensive Integration: Applied the centralized escaping logic to conversation search, dataset segment retrieval, RAG metadata filtering, app name search, tag name search, and workflow app logs.
  • Robust Test Coverage: Added dedicated unit tests for the escape_like_pattern function and new integration tests for annotation, app, tag, and workflow app services to validate the correct behavior with special characters.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a centralized utility function, escape_like_pattern, to consistently and securely handle special characters (%, _, \) in SQL LIKE patterns. This function is applied across various parts of the API where user-provided keywords are used in database queries, mitigating potential SQL injection vulnerabilities through LIKE wildcards. The changes are well-implemented, replacing previous ad-hoc escaping logic with the new standardized function and correctly using the ESCAPE clause in SQLAlchemy queries. The addition of comprehensive unit and integration tests validates the new escaping logic and ensures its correctness across different scenarios. The overall change significantly improves the security and maintainability of the codebase.

from libs.helper import escape_like_pattern

escaped_keyword = escape_like_pattern(keyword)
query = query.where(sa.and_(Tag.name.ilike(f"%{escaped_keyword}%", escape="\\")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of sa.and_ is redundant here as there is only a single condition. You can simplify the code by removing it for better readability.

Suggested change
query = query.where(sa.and_(Tag.name.ilike(f"%{escaped_keyword}%", escape="\\")))
query = query.where(Tag.name.ilike(f"%{escaped_keyword}%", escape="\\"))

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors SQL LIKE pattern escaping logic by introducing a centralized utility function escape_like_pattern in libs/helper.py. The utility properly escapes special SQL LIKE wildcard characters (%, _, \) to prevent SQL injection vulnerabilities when user input is used in LIKE queries. The refactoring replaces inconsistent inline escaping logic across multiple services, controllers, and data access layers with a single, well-tested implementation.

Key changes:

  • Added escape_like_pattern() utility function with comprehensive unit tests
  • Updated all SQL LIKE queries across services to use the centralized escaping function with proper ESCAPE clause
  • Added integration tests to verify special character handling prevents wildcard exploitation

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
api/libs/helper.py Introduces the centralized escape_like_pattern() utility function with documentation
api/tests/unit_tests/libs/test_helper.py Adds comprehensive unit tests for the escape utility function
api/services/workflow_app_service.py Updates workflow app log search to use centralized escaping, replacing previous unicode_escape approach
api/services/tag_service.py Updates tag name search to use centralized escaping with ESCAPE clause
api/services/external_knowledge_service.py Updates external knowledge API search to use centralized escaping
api/services/dataset_service.py Updates dataset name, child chunk content, and segment content searches to use centralized escaping
api/services/conversation_service.py Replaces inline escaping with centralized utility for conversation variable search
api/services/app_service.py Updates app name search to use centralized escaping
api/services/annotation_service.py Updates annotation question/content search to use centralized escaping
api/core/rag/retrieval/dataset_retrieval.py Updates metadata filter LIKE operations to use centralized escaping
api/core/rag/datasource/vdb/iris/iris_vector.py Updates IRIS full-text LIKE search fallback to use centralized escaping
api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py Updates ClickZetta LIKE search to use centralized escaping
api/controllers/console/datasets/datasets_segments.py Updates segment content and keywords search to use centralized escaping
api/controllers/console/app/conversation.py Updates both completion and chat conversation searches to use centralized escaping
api/tests/test_containers_integration_tests/services/test_workflow_app_service.py Adds integration tests for special character handling in workflow logs
api/tests/test_containers_integration_tests/services/test_tag_service.py Adds integration tests for special character handling in tag search
api/tests/test_containers_integration_tests/services/test_app_service.py Adds integration tests for special character handling in app search
api/tests/test_containers_integration_tests/services/test_annotation_service.py Adds integration tests for special character handling in annotation search

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Escape special characters in a string for safe use in SQL LIKE patterns.
This function escapes the special characters used in SQL LIKE patterns:
- Backslash (\\) -> \\
Copy link

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation comment shows incorrect escaping representation. In the docstring:

  • Line 40 says "Backslash (\) -> \" but should say "Backslash (\) -> \\"
  • Line 41 says "Percent (%) -> \%" which is correct
  • Line 42 says "Underscore () -> \" which is correct

The actual implementation and examples are correct, but the description in lines 40-42 should clearly show that a single backslash becomes two backslashes (four in the Python string representation).

Suggested change
- Backslash (\\) -> \\
- Backslash (\\) -> \\\\

Copilot uses AI. Check for mistakes.
Comment on lines 365 to 375
# Test 4: Search with % should NOT match 100% (verifies escaping works)
workflow_run.inputs = json.dumps({"search_term": "100% different", "input2": "other_value"})
workflow_run.outputs = json.dumps({"result": "100% different result", "status": "success"})
db.session.commit()

result = service.get_paginate_workflow_app_logs(
session=db_session_with_containers, app_model=app, keyword="50%", page=1, limit=20
)
# Should not find the 100% entry when searching for 50%
assert result["total"] == 0

Copy link

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has a logic issue. It reuses the same workflow_run object for all four test cases, modifying its inputs and outputs each time. This means that when Test 4 runs, the workflow_run contains "100% different" data, not "50% discount" data. The test assertion on line 374 expects total == 0, but this is only correct because the data was overwritten, not because the escaping is working correctly.

To properly test the escaping functionality, each test case should create separate workflow runs with distinct data, or the test should verify that searching for "50%" finds exactly the record containing "50% discount" and not the one containing "100% different".

Copilot uses AI. Check for mistakes.
logger = logging.getLogger(__name__)


def escape_like_pattern(pattern: str) -> str:
Copy link

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type annotation should handle the case where the pattern is None. Currently the function signature shows def escape_like_pattern(pattern: str) -> str:, but the function can accept and return None (as shown in the test case and the if not pattern check). The type annotation should be def escape_like_pattern(pattern: str | None) -> str | None: to accurately reflect the function's behavior.

Suggested change
def escape_like_pattern(pattern: str) -> str:
def escape_like_pattern(pattern: str | None) -> str | None:

Copilot uses AI. Check for mistakes.
- Search treats special characters as literal characters, not wildcards
- SQL injection via LIKE wildcards is prevented
"""
fake = Faker()
Copy link

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable fake is not used.

Suggested change
fake = Faker()

Copilot uses AI. Check for mistakes.
app_service = AppService()

# Create apps with special characters in names
app_with_percent = app_service.create_app(
Copy link

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable app_with_percent is not used.

Suggested change
app_with_percent = app_service.create_app(
app_service.create_app(

Copilot uses AI. Check for mistakes.
account,
)

app_with_underscore = app_service.create_app(
Copy link

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable app_with_underscore is not used.

Copilot uses AI. Check for mistakes.
account,
)

app_with_backslash = app_service.create_app(
Copy link

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable app_with_backslash is not used.

Suggested change
app_with_backslash = app_service.create_app(
app_service.create_app(

Copilot uses AI. Check for mistakes.
)

# Create app that should NOT match
app_no_match = app_service.create_app(
Copy link

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable app_no_match is not used.

Copilot uses AI. Check for mistakes.
- SQL injection via LIKE wildcards is prevented
"""
# Arrange: Create test data
fake = Faker()
Copy link

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable fake is not used.

Suggested change
fake = Faker()

Copilot uses AI. Check for mistakes.
- SQL injection via LIKE wildcards is prevented
"""
# Arrange: Create test data
fake = Faker()
Copy link

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable fake is not used.

Suggested change
fake = Faker()

Copilot uses AI. Check for mistakes.
NeatGuyCoding and others added 2 commits January 1, 2026 22:37
…logic

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use any search/filter functionality that accepts user input for LIKE/ILIKE queries, input special characters that are SQL LIKE wildcards: %, _, or \

1 participant