Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StackOverflowError: Element.data() is executed recursively without checks #1864

Closed
biecho opened this issue Dec 1, 2022 · 6 comments
Closed
Assignees
Labels
Milestone

Comments

@biecho
Copy link

biecho commented Dec 1, 2022

Hi there,

The implementation of the Element.data() function listed below is executed recursively in an unsafe manner.

    public String data() {
        StringBuilder sb = StringUtil.borrowBuilder();

        for (Node childNode : childNodes) {
            if (childNode instanceof DataNode) {
                DataNode data = (DataNode) childNode;
                sb.append(data.getWholeData());
            } else if (childNode instanceof Comment) {
                Comment comment = (Comment) childNode;
                sb.append(comment.getData());
            } else if (childNode instanceof Element) {
                Element element = (Element) childNode;
                String elementData = element.data();
                sb.append(elementData);
            } else if (childNode instanceof CDataNode) {
                // this shouldn't really happen because the html parser won't see the cdata as anything special when parsing script.
                // but incase another type gets through.
                CDataNode cDataNode = (CDataNode) childNode;
                sb.append(cDataNode.getWholeText());
            }
        }
        return StringUtil.releaseBuilder(sb);
    }

The recursion is not checked for depth and can lead to a StackOverflowError if there are enough nested children.

I suggest to handle this case similar to Element.text() where a NodeTraversor is used to compute the result iteratively.

Another approach could be something along the lines:

    public String data() {
		StringBuilder sb = StringUtil.borrowBuilder();

		var descendants = new ArrayDeque<>(childNodes);
		while (!descendants.isEmpty()) {
			var descendantNode = descendants.pollFirst();
			if (descendantNode instanceof DataNode) {
				DataNode data = (DataNode) descendantNode;
				sb.append(data.getWholeData());
			}
			else if (descendantNode instanceof Comment) {
				Comment comment = (Comment) descendantNode;
				sb.append(comment.getData());
			}
			else if (descendantNode instanceof Element) {
				Element element = (Element) descendantNode;
				// We must visit the first child on the list next.
				// If this, in turn, has children, his children are visited next, and so on.
				// Therefore, we have to append the child nodes of this parent backward to the front of the queue.
				var childNodes = element.childNodes();
				for (int i = childNodes.size() - 1; i > 0; i--) {
					descendants.addFirst(childNodes.get(i));
				}
			}
			else if (descendantNode instanceof CDataNode) {
				// this shouldn't really happen because the html parser won't see the cdata as anything special when parsing script.
				// but incase another type gets through.
				CDataNode cDataNode = (CDataNode) descendantNode;
				sb.append(cDataNode.getWholeText());
			}
		}
		return StringUtil.releaseBuilder(sb);
@mohite-abhi
Copy link

Hi @biecho, I tested Element.data() with an Element of about 100,000 depth and indeed I got StackOverflowError.

package org.jsoup.examples;
import org.jsoup.nodes.Element;

public class ElementTest {
    public static void main(String[] args) {
        Element element = new Element("elem");
        Element root = element;
        for (int i = 0; i < 100000; i++) {
            Element elem2 = new Element("elem"+i);
            element.appendChild(elem2);
            element = elem2;
            System.out.println(element.tagName());
        }
        System.out.println(root.data());;
    }
}

Exception in thread "main" java.lang.StackOverflowError
at java.base/java.lang.ref.Reference.refersToImpl(Reference.java:383)
at java.base/java.lang.ref.Reference.refersTo(Reference.java:374)
at java.base/java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:508)
at java.base/java.lang.ThreadLocal.get(ThreadLocal.java:187)
at java.base/java.lang.ThreadLocal.get(ThreadLocal.java:169)
at org.jsoup.internal.StringUtil.borrowBuilder(StringUtil.java:353)
at org.jsoup.nodes.Element.data(Element.java:1492)
at org.jsoup.nodes.Element.data(Element.java:1503)
...

I am not sure if this case will occur in real html, but I agree with you that iterative approach would be better.

@jhy
Copy link
Owner

jhy commented Jan 5, 2023

Makes sense, and would be happy to review a PR. Have you encountered this in the wild?

@jhy jhy closed this as completed in 075b0e6 Jan 23, 2023
@jhy jhy added this to the 1.15.4 milestone Jan 23, 2023
@jhy jhy added the fixed label Jan 23, 2023
@jhy jhy self-assigned this Jan 23, 2023
@biecho
Copy link
Author

biecho commented Jan 23, 2023

Thanks for fixing the issue!

Yes, this was encountered in the wild. It might be worth to add a test case just to be on the safe side.

jhy added a commit that referenced this issue Jan 23, 2023
Testcase for #1864 and related -- hasText, data, parents could all overflow.
@jhy
Copy link
Owner

jhy commented Jan 23, 2023

Yep, have added a testcase in 998f429. Also fixed three other potential overflows with e3e2c6b, 75bdf8e, and 3091b66.

jhy added a commit that referenced this issue Jan 23, 2023
Removes chance of overflow in wrap(html) and simplifies.

Related to #1864
@jhy
Copy link
Owner

jhy commented Jan 23, 2023

With the above changes, there are no more recursive calls that could conceivably overflow, that I can find. The two remaining are bound.

In HttpConnection, the execute method recurses to follow redirects, but that's bound already so we don't bounce between redirected URLs forever.

And in SafeList, the recursion in isSafeAttribute is only one level deep.

return !tagName.equals(All) && isSafeAttribute(All, el, attr);

jhy added a commit that referenced this issue Jan 24, 2023
@biecho
Copy link
Author

biecho commented Jan 24, 2023

Awesome! Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants