Recursion limit exceeded #850

Closed
jengelh opened this issue Dec 13, 2019 · 5 comments

Comments

@jengelh

jengelh commented Dec 13, 2019

libtidy is missing an API to set a recursion limit. For a bit of nesting of HTML tags, one can crash certain programs that make use of libtidy.

$ cat tr.c
#include <tidybuffio.h>
int main()
{
        TidyDoc tdoc = tidyCreate();
        tidyOptSetBool(tdoc, TidyHideComments, yes);
        tidyOptSetBool(tdoc, TidyReplaceColor, yes);
        tidyOptSetBool(tdoc, TidyPreserveEntities, yes);
        tidySetCharEncoding(tdoc, "utf8");
        tidyParseFile(tdoc, "evil.html");
}
$ gcc tr.c `pkg-config tidy --cflags --libs` -Wall -ggdb3
$ (for((i = 0; i < 16384; ++i)); do echo -en "<b>"; done; for ((i = 0; i < 16384; ++i)); do echo -en "</b>"; done) >evil.html
$ ulimit -Ss 2048
$ ./a.out 
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: inserting implicit <body>
...
line 1 column 39031 - Warning: nested emphasis <b>
Segmentation fault (core dumped)

2MB is the typical stack size for glibc-linux threads, and this evil.html is only 114KB in size, meaning it generally does not get held up by size limits of MTAs.
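Where bash is not available, the same test input can be produced from C. The following is a minimal sketch; `write_nested_bold` is a hypothetical helper name, not part of libtidy. With a depth of 16384 it writes 16384 × (3 + 4) = 114688 bytes, which matches the file size quoted above.

```c
#include <stdio.h>

/* Hypothetical helper: writes `depth` opening <b> tags followed by
 * `depth` closing </b> tags, mirroring the shell loop in the report.
 * Returns the number of bytes written, or -1 on a write error. */
long write_nested_bold(FILE *out, long depth)
{
    long written = 0;

    for (long i = 0; i < depth; ++i) {
        if (fputs("<b>", out) == EOF)
            return -1;
        written += 3;   /* strlen("<b>") */
    }
    for (long i = 0; i < depth; ++i) {
        if (fputs("</b>", out) == EOF)
            return -1;
        written += 4;   /* strlen("</b>") */
    }
    return written;
}
```

Calling `write_nested_bold(f, 16384)` on a file opened with `fopen("evil.html", "w")` should recreate the input used in the report.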

@balthisar
Member

We welcome pull requests, but to be honest, this issue is brought up every year or so. Given that it's not a security issue, we would either fail at a certain level of nesting because of an API limit, or fail with a segfault. In both cases, the result is failure.

@geoffmcl
Contributor

20201130: @jengelh, as expressed by @balthisar, this is a known issue for libtidy, given that the code uses recursion to nest html elements...

FWIW, in Ubuntu, I was not able to repeat the issue, using your tr.c, and evil.html generation... not sure why... but no problem...

Copying evil.html to Windows, tidy triggers a stack overflow abort dialog, after re-entering ParseInline some 5814 times... so it is certainly repeatable ;=))

Background: Read SF Bug 742, as far back as 2005-12-01, with some ideas on a fix, and #343, #633, and maybe others, here...

As with those earlier issues, marking this as a bug, adding Feature Request, and Won't Fix, and, given that there has been no further feedback for nearly a year, closing this...

But as always, look forward to further feedback, ideas, patches, PR, etc... to address this problem, if possible... thanks...
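To illustrate the trade-off discussed in this thread (a controlled failure at a nesting limit versus a segfault), here is a toy recursive-descent parser with a depth guard. It is a sketch only: it is not libtidy's ParseInline, it understands nothing but `<b>` and `</b>`, and the names `parse_bold`, `max_depth`, and the `PARSE_*` codes are invented for this example.

```c
#include <string.h>

enum { PARSE_OK = 0, PARSE_TOO_DEEP = -1, PARSE_SYNTAX = -2 };

/* Toy parser for nested <b>...</b> (NOT libtidy code).
 * *pos is advanced past everything consumed. Each open <b> costs one
 * stack frame, so the depth check bounds stack use. */
int parse_bold(const char *s, size_t *pos, int depth, int max_depth)
{
    if (depth > max_depth)
        return PARSE_TOO_DEEP;      /* report an error instead of overflowing */

    for (;;) {
        while (s[*pos] != '\0' && s[*pos] != '<')
            ++*pos;                 /* skip plain text */

        if (strncmp(s + *pos, "<b>", 3) != 0)
            return PARSE_OK;        /* end of input or a closing tag: caller decides */
        *pos += 3;

        int rc = parse_bold(s, pos, depth + 1, max_depth);
        if (rc != PARSE_OK)
            return rc;

        if (strncmp(s + *pos, "</b>", 4) != 0)
            return PARSE_SYNTAX;    /* the element was never closed */
        *pos += 4;
    }
}
```

A caller would start with `size_t pos = 0;` and `parse_bold(input, &pos, 0, limit)`, then treat `PARSE_TOO_DEEP` as a normal parse error. Either way the document is rejected, as noted above, but the process survives.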

@MonkeybreadSoftware

We have an HTML file here that seems to cause this for one of our clients: endless recursion in ParseInline, which crashes some computers, but not others.

Workaround: raise stack allocation for thread and hope it finds an end.

@beatrixwillius

The client would be me. I have some screwed up html which I treated with Tidy. Increasing the stack allocated for the thread fixed the crash.
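The workaround described in the two comments above (giving the thread that runs Tidy a larger stack) might look roughly like this with POSIX threads. This is a sketch under assumptions: `run_with_stack` and `parse_job` are hypothetical names, not libtidy API, and `parse_job` is only a stand-in for a function that would call `tidyParseFile`.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: runs fn(arg) on a new thread whose stack is
 * stack_bytes large and waits for it to finish. If result is not NULL,
 * the thread's return value is stored there.
 * Returns 0 on success or a pthread error code. */
int run_with_stack(void *(*fn)(void *), void *arg,
                   size_t stack_bytes, void **result)
{
    pthread_attr_t attr;
    pthread_t thread;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&thread, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(thread, result);
}

/* Stand-in for the real work; a real version would create a TidyDoc
 * and parse the document here. It only sets a flag. */
void *parse_job(void *arg)
{
    int *done = arg;
    *done = 1;
    return arg;
}
```

For example, `run_with_stack(parse_job, &done, (size_t)16 << 20, NULL)` requests a 16 MiB stack (an arbitrary figure). Compile with `-pthread`. A larger stack only moves the nesting depth at which the overflow occurs; it does not remove it.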

@balthisar
Member

5.9.9 fixes this.
