Skip to content

Conversation

@ea-rus
Copy link
Collaborator

@ea-rus ea-rus commented Oct 15, 2025

Quoting rules for string content of the parser:

  • ' => '' (in single quoted strings)
  • " => "" (in double quoted strings)

But standart render used json.dump. It added more qouted symbols that were parsed incorrectly, for example:

  • "" rendered into "\" and parsed into "\" (because at the moment there is no rule to replace \)

@ea-rus ea-rus requested a review from StpMax October 15, 2025 13:38
@github-actions
Copy link

github-actions bot commented Oct 15, 2025

Coverage

Coverage Report
FileStmtsMissCoverMissing
mindsdb_sql_parser
   __about__.py10100%1–10
   __init__.py1192381%44, 48, 53, 98, 115, 118, 139–158, 165–166
   lexer.py2832193%373, 375, 377, 389, 391, 393, 399–417
   logger.py19479%14, 17, 23, 26
   parser.py11143097%129, 133, 301, 326, 432, 619, 636, 660–661, 882, 936, 1013, 1114, 1168, 1178, 1247, 1258, 1341, 1417, 1456, 1492, 1688–1689, 1864–1865, 2041, 2049, 2102–2105
   utils.py791186%73–79, 116, 132, 134, 136, 142–145
mindsdb_sql_parser/ast
   base.py36586%13, 28, 31, 46, 51
   create.py82989%7–8, 23–31, 95
   drop.py52296%10, 13
   insert.py63494%39–41, 46
   show.py48198%18
   update.py53591%40–42, 75–76
mindsdb_sql_parser/ast/mindsdb
   knowledge_base.py118397%79, 118, 128
mindsdb_sql_parser/ast/select
   case.py37197%22
   constant.py36197%23
   data.py11464%10–12, 15, 19
   identifier.py831187%56, 104–112, 122
   native_query.py13192%25
   operation.py139497%57, 66, 178, 202
   parameter.py15287%17, 20
   select.py108397%160–165
   star.py12283%8–9
TOTAL345515795% 

Tests Skipped Failures Errors Time
313 0 💤 0 ❌ 0 🔥 13.453s ⏱️

if obj is None:
return "null"

if isinstance(obj, bool):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: The dump_json() function lacks circular reference detection, making it vulnerable to infinite recursion attacks that could crash the application.

📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
if isinstance(obj, bool):
def dump_json(obj, _seen=None, _depth=0, max_depth=100) -> str:
'''
Secure version of dump_json with circular reference detection and recursion limits.
Serializes Python objects to JSON with single quotes for strings.
'''
# Initialize seen set for first call
if _seen is None:
_seen = set()
# Check recursion depth
if _depth > max_depth:
raise RecursionError(f"Maximum recursion depth exceeded ({max_depth})")
# Handle None
if obj is None:
return 'null'
# Handle basic types
if isinstance(obj, (int, float, bool)):
return str(obj).lower()
# Handle strings
if isinstance(obj, str):
# Escape single quotes and backslashes
escaped = obj.replace("\\", "\\\\").replace("'", "\\'")
return f"'{escaped}'"
# Handle lists
if isinstance(obj, (list, tuple)):
items = []
for item in obj:
items.append(dump_json(item, _seen.copy(), _depth + 1, max_depth))
return f"[{', '.join(items)}]"
# Handle dictionaries
if isinstance(obj, dict):
# Check for circular references
obj_id = id(obj)
if obj_id in _seen:
raise ValueError("Circular reference detected in object")
_seen.add(obj_id)
items = []
for key, value in obj.items():
key_str = dump_json(key, _seen.copy(), _depth + 1, max_depth)
value_str = dump_json(value, _seen.copy(), _depth + 1, max_depth)
items.append(f"{key_str}: {value_str}")
return f"{{{', '.join(items)}}}"
# Handle other objects by converting to string
try:
# Limit string size to prevent DoS
obj_str = str(obj)
if len(obj_str) > 10000: # Reasonable limit
obj_str = obj_str[:10000] + "...(truncated)"
return f"'{obj_str}'"
except Exception as e:
return f"'<Object representation error: {str(e)}>'"

if obj is None:
return "null"

if isinstance(obj, bool):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: No limits on output size or input validation for malicious objects, allowing memory exhaustion attacks through objects with malicious str methods.

return content


def unquote(s, is_double_quoted=False):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: The new unquote() function processes escape sequences differently than the old inline implementation, potentially breaking existing code with backslash-containing strings.

if obj is None:
return "null"

if isinstance(obj, bool):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Inconsistent docstring format in dump_json() function

if obj is None:
return "null"

if isinstance(obj, bool):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: String concatenation in dump_json() could use list joining for better performance

return content


def unquote(s, is_double_quoted=False):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: Multiple string replacements in unquote() could be optimized

@ea-rus ea-rus merged commit eeb5613 into main Oct 31, 2025
14 checks passed
@ea-rus ea-rus deleted the fix-json-render branch October 31, 2025 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants