Parsing can hang on attributes with many references #9

Closed
rossj opened this issue Apr 20, 2020 · 0 comments · Fixed by #10

rossj commented Apr 20, 2020

I've come across a few XML documents that seem to stop my program in its tracks. I've traced the problem to the decoding of many references in a single attribute value.

For example, take the following XML with 35 references:

<a b="&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;"></a>

As is, it takes over 9 minutes to parse on my machine. With 34 &lt; references it takes 4.8 minutes, and with 36 it takes 17.5 minutes, so the parse time roughly doubles with each added reference. My guess is that this is due to "catastrophic backtracking" in a RegExp.
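
To illustrate the kind of behavior suspected here, the following is a minimal sketch of catastrophic backtracking, not the RegExp that parse-xml actually uses. Both patterns and the helper names `attrValue` and `timeMatch` are hypothetical. The first pattern nests one quantifier inside another, so a run of references can be split between the inner and outer `+` in exponentially many ways. When the overall match fails (here, because the closing quote is missing), a backtracking engine tries every split before giving up.

```javascript
// Hypothetical patterns for illustration only; NOT the actual parse-xml RegExp.
// Nested quantifiers: the same run of references can be grouped in
// exponentially many ways.
const ambiguous = /^"(?:(?:&lt;)+)+"$/;
// Accepts exactly the same strings, but each reference can match only one way.
const unambiguous = /^"(?:&lt;)+"$/;

// Build a quoted attribute value containing `count` references.
// Leaving off the closing quote forces the match to fail.
function attrValue(count, terminated) {
  return '"' + '&lt;'.repeat(count) + (terminated ? '"' : '');
}

// Return the elapsed time in milliseconds for a single `test()` call.
function timeMatch(pattern, input) {
  const start = process.hrtime.bigint();
  pattern.test(input);
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Counts are kept small so the sketch finishes quickly.
for (const count of [14, 16, 18, 20]) {
  const input = attrValue(count, false);
  console.log(
    count, 'references:',
    'ambiguous', timeMatch(ambiguous, input).toFixed(2), 'ms,',
    'unambiguous', timeMatch(unambiguous, input).toFixed(2), 'ms'
  );
}
```

In a backtracking engine, the time for the ambiguous pattern should grow by roughly a constant factor per added reference, while the unambiguous one should stay flat. That is the same shape as the timings reported above, although it does not by itself confirm where the problem lies in the parser.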

Interestingly, making the <a> tag self-closing lets the document parse at a normal / expected speed:

<a b="&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;"/>

rossj added a commit to rossj/parse-xml that referenced this issue Apr 20, 2020
rossj changed the title from "Parsing hangs attributes with many references" to "Parsing can hang on attributes with many references" Apr 20, 2020
rgrove added the bug label Apr 20, 2020
rgrove closed this as completed Apr 20, 2020