gh-146169: correctly handle re-entrant parsing calls in Expat handlers #146566

Open
picnixz wants to merge 2 commits into python:main from picnixz:fix/expat/parse-recursive-146169

Conversation

picnixz (Member, Author) commented Mar 28, 2026

@hartwork I'd like your feedback here to know whether there are other places that I need to consider. I will work on the encoding issue described in the GH issue afterwards unless they are interconnected. TiA!

picnixz added the "needs backport to 3.13" and "needs backport to 3.14" labels (both described as "bugs and security fixes") on Mar 28, 2026
hartwork (Contributor) left a comment

> @hartwork I'd like your feedback here to know whether there are other places that I need to consider. I will work on the encoding issue described in the GH issue afterwards unless they are interconnected. TiA!

@picnixz they seem interconnected, yes. Here's what I found:


def ExternalEntityRefHandler(*args):
    subparser.Parse(payload_extstr, True)
    return 1 # return an integer to indicate that parsing continues
Contributor

This comment seems to need fixing:

Suggested change
-    return 1 # return an integer to indicate that parsing continues
+    return 1 # return a non-zero integer to indicate that parsing continues
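
To illustrate why "non-zero" is the more precise wording: as far as I can tell, pyexpat treats a return value of 0 from ExternalEntityRefHandler as a failure and raises ExpatError, while any other integer lets parsing continue. A minimal sketch follows; the document `DOC` and the helper `parse_with_handler_result` are made up for illustration and are not taken from the PR.

```python
from xml.parsers import expat

# Hypothetical document: declares an external general entity and references it.
DOC = b"""<?xml version="1.0" standalone="no"?>
<!DOCTYPE root [<!ENTITY ext SYSTEM "ext.xml">]>
<root>&ext;</root>
"""


def parse_with_handler_result(result):
    """Parse DOC with a handler returning `result`.

    Returns the Expat error code if parsing fails, otherwise None.
    """
    parser = expat.ParserCreate()
    parser.ExternalEntityRefHandler = lambda *args: result
    try:
        parser.Parse(DOC, True)
    except expat.ExpatError as exc:
        return exc.code
    return None


# Numeric code for "error in processing external entity reference".
EXTERNAL_ENTITY_ERROR = expat.errors.codes[
    expat.errors.XML_ERROR_EXTERNAL_ENTITY_HANDLING
]
```

Under these assumptions, `parse_with_handler_result(0)` should report `EXTERNAL_ENTITY_ERROR`, whereas `parse_with_handler_result(1)` and `parse_with_handler_result(2)` should both succeed.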

Comment on lines +312 to +316
payload = f"""\
<?xml version="1.0" standalone="no"?>
<!DOCTYPE quotations SYSTEM "quotations.dtd" [{payload_extstr}]>
<root>&ext;</root>
""".encode(encoding)
Contributor

Just an idea: Use of dedent could help with readability here:

Suggested change
- payload = f"""\
- <?xml version="1.0" standalone="no"?>
- <!DOCTYPE quotations SYSTEM "quotations.dtd" [{payload_extstr}]>
- <root>&ext;</root>
- """.encode(encoding)
+ payload = dedent(f"""\
+     <?xml version="1.0" standalone="no"?>
+     <!DOCTYPE quotations SYSTEM "quotations.dtd" [{payload_extstr}]>
+     <root>&ext;</root>
+ """).encode(encoding)

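A small sketch of the dedent idea, to check that both spellings produce the same bytes. The values of `payload_extstr` and `encoding` below are hypothetical stand-ins (the PR's actual values are not shown here), and the two helper functions exist only so the indentation has something to be relative to.

```python
from textwrap import dedent

# Hypothetical stand-ins for the test's variables.
payload_extstr = '<!ENTITY ext SYSTEM "ext.xml">'
encoding = "utf-8"


def build_flush_left():
    # Original style: the string body must start at column 0.
    return f"""\
<?xml version="1.0" standalone="no"?>
<!DOCTYPE quotations SYSTEM "quotations.dtd" [{payload_extstr}]>
<root>&ext;</root>
""".encode(encoding)


def build_dedented():
    # dedent() runs after f-string interpolation, so a multi-line
    # payload_extstr containing unindented lines would defeat it.
    return dedent(f"""\
        <?xml version="1.0" standalone="no"?>
        <!DOCTYPE quotations SYSTEM "quotations.dtd" [{payload_extstr}]>
        <root>&ext;</root>
    """).encode(encoding)
```

With a single-line `payload_extstr` as assumed here, the two functions return identical bytes; the caveat in the comment only matters if the interpolated value spans several lines.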
Comment on lines +866 to +871
    if (self->in_callback) {
        PyErr_SetString(PyExc_RuntimeError,
                        "cannot call Parse() from within a handler");
        return NULL;
    }

hartwork (Contributor) commented Mar 28, 2026

Does ParseFile (that is, function pyexpat_xmlparser_ParseFile_impl) need something similar?

PS: Also, unlike pyexpat_xmlparser_Parse_impl, it does not seem to call XML_SetEncoding. That asymmetry seems unintended (but maybe I am missing something). Maybe such a call is missing.

Member Author

Oh maybe, I will check. As for XML_SetEncoding, I don't know. I'll have a look at it tomorrow!
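
For reference, the guard discussed in this thread can be modelled in pure Python. This is only a toy model under stated assumptions: `ToyParser`, its `handler` attribute, and `_check_not_in_callback` are hypothetical names, and nothing here reflects pyexpat's actual C implementation. It shows the same check shared by both entry points, which is what the ParseFile question is about.

```python
import io


class ToyParser:
    """Hypothetical stand-in for a parser object; not pyexpat's real code."""

    def __init__(self):
        self.in_callback = False
        self.handler = None
        self.seen = []

    def _check_not_in_callback(self, name):
        # Same idea as the C guard: refuse re-entrant parsing calls.
        if self.in_callback:
            raise RuntimeError(f"cannot call {name}() from within a handler")

    def _feed(self, data):
        self.seen.append(data)
        if self.handler is not None:
            self.in_callback = True
            try:
                self.handler(data)
            finally:
                # Always reset the flag, even if the handler raises.
                self.in_callback = False

    def Parse(self, data):
        self._check_not_in_callback("Parse")
        self._feed(data)

    def ParseFile(self, file):
        self._check_not_in_callback("ParseFile")
        self._feed(file.read())


parser = ToyParser()
errors = []


def reentrant_handler(data):
    # Try both entry points from inside the handler and record the errors.
    for call in (lambda: parser.Parse(b"nested"),
                 lambda: parser.ParseFile(io.BytesIO(b"nested"))):
        try:
            call()
        except RuntimeError as exc:
            errors.append(str(exc))


parser.handler = reentrant_handler
parser.Parse(b"outer")
```

In this model both nested calls are rejected and only the outer data is recorded; whether the real ParseFile needs the equivalent check is the open question above.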
