-
Notifications
You must be signed in to change notification settings - Fork 19.3k
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries. #30450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ralized utility function, ensuring consistent and secure handling of special characters across all database queries. Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
Summary of ChangesHello @NeatGuyCoding, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the API's security by refactoring and centralizing the handling of SQL Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a centralized utility function, escape_like_pattern, to consistently and securely handle special characters (%, _, \) in SQL LIKE patterns. This function is applied across various parts of the API where user-provided keywords are used in database queries, mitigating potential SQL injection vulnerabilities through LIKE wildcards. The changes are well-implemented, replacing previous ad-hoc escaping logic with the new standardized function and correctly using the ESCAPE clause in SQLAlchemy queries. The addition of comprehensive unit and integration tests validates the new escaping logic and ensures its correctness across different scenarios. The overall change significantly improves the security and maintainability of the codebase.
| from libs.helper import escape_like_pattern | ||
|
|
||
| escaped_keyword = escape_like_pattern(keyword) | ||
| query = query.where(sa.and_(Tag.name.ilike(f"%{escaped_keyword}%", escape="\\"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of sa.and_ is redundant here as there is only a single condition. You can simplify the code by removing it for better readability.
| query = query.where(sa.and_(Tag.name.ilike(f"%{escaped_keyword}%", escape="\\"))) | |
| query = query.where(Tag.name.ilike(f"%{escaped_keyword}%", escape="\\")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors SQL LIKE pattern escaping logic by introducing a centralized utility function escape_like_pattern in libs/helper.py. The utility properly escapes special SQL LIKE wildcard characters (%, _, \) to prevent SQL injection vulnerabilities when user input is used in LIKE queries. The refactoring replaces inconsistent inline escaping logic across multiple services, controllers, and data access layers with a single, well-tested implementation.
Key changes:
- Added
escape_like_pattern()utility function with comprehensive unit tests - Updated all SQL LIKE queries across services to use the centralized escaping function with proper ESCAPE clause
- Added integration tests to verify special character handling prevents wildcard exploitation
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| api/libs/helper.py | Introduces the centralized escape_like_pattern() utility function with documentation |
| api/tests/unit_tests/libs/test_helper.py | Adds comprehensive unit tests for the escape utility function |
| api/services/workflow_app_service.py | Updates workflow app log search to use centralized escaping, replacing previous unicode_escape approach |
| api/services/tag_service.py | Updates tag name search to use centralized escaping with ESCAPE clause |
| api/services/external_knowledge_service.py | Updates external knowledge API search to use centralized escaping |
| api/services/dataset_service.py | Updates dataset name, child chunk content, and segment content searches to use centralized escaping |
| api/services/conversation_service.py | Replaces inline escaping with centralized utility for conversation variable search |
| api/services/app_service.py | Updates app name search to use centralized escaping |
| api/services/annotation_service.py | Updates annotation question/content search to use centralized escaping |
| api/core/rag/retrieval/dataset_retrieval.py | Updates metadata filter LIKE operations to use centralized escaping |
| api/core/rag/datasource/vdb/iris/iris_vector.py | Updates IRIS full-text LIKE search fallback to use centralized escaping |
| api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py | Updates ClickZetta LIKE search to use centralized escaping |
| api/controllers/console/datasets/datasets_segments.py | Updates segment content and keywords search to use centralized escaping |
| api/controllers/console/app/conversation.py | Updates both completion and chat conversation searches to use centralized escaping |
| api/tests/test_containers_integration_tests/services/test_workflow_app_service.py | Adds integration tests for special character handling in workflow logs |
| api/tests/test_containers_integration_tests/services/test_tag_service.py | Adds integration tests for special character handling in tag search |
| api/tests/test_containers_integration_tests/services/test_app_service.py | Adds integration tests for special character handling in app search |
| api/tests/test_containers_integration_tests/services/test_annotation_service.py | Adds integration tests for special character handling in annotation search |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Escape special characters in a string for safe use in SQL LIKE patterns. | ||
| This function escapes the special characters used in SQL LIKE patterns: | ||
| - Backslash (\\) -> \\ |
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation comment shows incorrect escaping representation. In the docstring:
- Line 40 says "Backslash (\) -> \" but should say "Backslash (\) -> \\"
- Line 41 says "Percent (%) -> \%" which is correct
- Line 42 says "Underscore () -> \" which is correct
The actual implementation and examples are correct, but the description in lines 40-42 should clearly show that a single backslash becomes two backslashes (four in the Python string representation).
| - Backslash (\\) -> \\ | |
| - Backslash (\\) -> \\\\ |
| # Test 4: Search with % should NOT match 100% (verifies escaping works) | ||
| workflow_run.inputs = json.dumps({"search_term": "100% different", "input2": "other_value"}) | ||
| workflow_run.outputs = json.dumps({"result": "100% different result", "status": "success"}) | ||
| db.session.commit() | ||
|
|
||
| result = service.get_paginate_workflow_app_logs( | ||
| session=db_session_with_containers, app_model=app, keyword="50%", page=1, limit=20 | ||
| ) | ||
| # Should not find the 100% entry when searching for 50% | ||
| assert result["total"] == 0 | ||
|
|
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test has a logic issue. It reuses the same workflow_run object for all four test cases, modifying its inputs and outputs each time. This means that when Test 4 runs, the workflow_run contains "100% different" data, not "50% discount" data. The test assertion on line 374 expects total == 0, but this is only correct because the data was overwritten, not because the escaping is working correctly.
To properly test the escaping functionality, each test case should create separate workflow runs with distinct data, or the test should verify that searching for "50%" finds exactly the record containing "50% discount" and not the one containing "100% different".
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def escape_like_pattern(pattern: str) -> str: |
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type annotation should handle the case where the pattern is None. Currently the function signature shows def escape_like_pattern(pattern: str) -> str:, but the function can accept and return None (as shown in the test case and the if not pattern check). The type annotation should be def escape_like_pattern(pattern: str | None) -> str | None: to accurately reflect the function's behavior.
| def escape_like_pattern(pattern: str) -> str: | |
| def escape_like_pattern(pattern: str | None) -> str | None: |
| - Search treats special characters as literal characters, not wildcards | ||
| - SQL injection via LIKE wildcards is prevented | ||
| """ | ||
| fake = Faker() |
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable fake is not used.
| fake = Faker() |
| app_service = AppService() | ||
|
|
||
| # Create apps with special characters in names | ||
| app_with_percent = app_service.create_app( |
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable app_with_percent is not used.
| app_with_percent = app_service.create_app( | |
| app_service.create_app( |
| account, | ||
| ) | ||
|
|
||
| app_with_underscore = app_service.create_app( |
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable app_with_underscore is not used.
| account, | ||
| ) | ||
|
|
||
| app_with_backslash = app_service.create_app( |
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable app_with_backslash is not used.
| app_with_backslash = app_service.create_app( | |
| app_service.create_app( |
| ) | ||
|
|
||
| # Create app that should NOT match | ||
| app_no_match = app_service.create_app( |
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable app_no_match is not used.
| - SQL injection via LIKE wildcards is prevented | ||
| """ | ||
| # Arrange: Create test data | ||
| fake = Faker() |
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable fake is not used.
| fake = Faker() |
| - SQL injection via LIKE wildcards is prevented | ||
| """ | ||
| # Arrange: Create test data | ||
| fake = Faker() |
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable fake is not used.
| fake = Faker() |
…logic Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
Important
Fixes #<issue number>.Summary
Fix #30449
Screenshots
Checklist
dev/reformat(backend) andcd web && npx lint-staged(frontend) to appease the lint gods