Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix OOM on malformed input #99

Merged
merged 2 commits into from
Jun 12, 2016
Merged

Conversation

shimamoto
Copy link
Contributor

Hi.

I tried in the latest master version since the bug (#35) have been merged.
But the survey found that there are still cases that OOM occur.
Specifically, CDATA sections, comment, DTD.

// broken CDATA
val r = new XMLEventReader(Source.fromString("<broken><![CDATA[A"))

// broken comment
val r = new XMLEventReader(Source.fromString("<!-- broken "))

// broken DTD
val r = new XMLEventReader(Source.fromString(
    """<?xml version="1.0" encoding="utf-8"?>
      |<!DOCTYPE broken [
      |  <!ELE
    """.stripMargin))

All of the above cases occur OOM.

[error] (XMLEventReader) java.lang.OutOfMemoryError: Java heap space

I am in trouble since OOM have a large impact on the production.

This PR is going to avoid the OOM caused by the above-mentioned.
This commit make up for the lack of #35. Specifically,
CDATA sections, comment, DTD.

This commit make up for the lack of scala#35. Specifically,
CDATA sections, comment, DTD.
@biswanaths
Copy link
Contributor

Hey @shimamoto ,

Thank you for taking the time report and submit a PR. Please give me some time to let me go through this issue and also some others to respond back. We are on the verge of a release. If everything works fine, I will merge this PR and do the release.

Thanks again.

@shimamoto
Copy link
Contributor Author

@biswanaths I am grateful for your prompt response. Then, I am looking forward to a good reply from you!

@@ -247,7 +247,7 @@ private[scala] trait MarkupParserCommon extends TokenTests {
while (true) {
if (ch == head && peek(rest))
return handler(positioner(), sb.toString)
else if (ch == SU)
else if (ch == SU || eof)
truncatedError("") // throws TruncatedXMLControl in compiler

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like adding the eof here triggers an OOM for any of your tests. I couldn't find an example that triggered the OOM for an XML processing instruction (PI) or a CDATA section -- they're the only ones that use xTakeUntil . I guess it's ok to cargo-cult an eof check everywhere, but it seems you were trying to avoid that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashawley Thank you for the review and comment! It is significant to add this eof. This is necessary in test of CDATA section <broken><![CDATA[A. Without it, the following OOM occur:

Exception in thread "XMLEventReader" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:622)
        at java.lang.StringBuilder.append(StringBuilder.java:202)
        at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:267)
        at scala.xml.parsing.MarkupParserCommon$class.xTakeUntil(MarkupParserCommon.scala:253)
        at scala.xml.pull.XMLEventReader$Parser.xTakeUntil(XMLEventReader.scala:60)
        at scala.xml.parsing.MarkupParser$class.xCharData(MarkupParser.scala:379)
        at scala.xml.pull.XMLEventReader$Parser.xCharData(XMLEventReader.scala:60)
        at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:424)
        at scala.xml.pull.XMLEventReader$Parser.content1(XMLEventReader.scala:60)
        at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:459)
        at scala.xml.pull.XMLEventReader$Parser.content(XMLEventReader.scala:60)
        at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:588)
        at scala.xml.pull.XMLEventReader$Parser.element1(XMLEventReader.scala:60)
        at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:433)
        at scala.xml.pull.XMLEventReader$Parser.content1(XMLEventReader.scala:60)
        at scala.xml.parsing.MarkupParser$class.document(MarkupParser.scala:247)
        at scala.xml.pull.XMLEventReader$Parser.document(XMLEventReader.scala:60)
        at scala.xml.pull.XMLEventReader$Parser$$anonfun$run$1.apply(XMLEventReader.scala:96)
        at scala.xml.pull.XMLEventReader$Parser$$anonfun$run$1.apply(XMLEventReader.scala:96)
        at scala.xml.pull.ProducerConsumerIterator$class.interruptibly(XMLEventReader.scala:125)
        at scala.xml.pull.XMLEventReader.interruptibly(XMLEventReader.scala:27)
        at scala.xml.pull.XMLEventReader$Parser.run(XMLEventReader.scala:96)
        at java.lang.Thread.run(Thread.java:745)

It turned out that error checking is rudimentary, however, you are quite right. I fix some tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shimamoto Ok, cool! I thought that was odd. Thanks for fixing up the unit tests!

@ashawley
Copy link
Member

Changes look good to me! Thanks for cleaning up and refactoring the tests while you were at it.

Verify to throw FatalError at a broken CDATA section and comment.
@shimamoto
Copy link
Contributor Author

Fix unit tests of a broken CDATA section and comment since not enough the verification. Without this time fix, I confirmed that OOM occur in these cases.

@biswanaths
Copy link
Contributor

This looks good to me. If there is nothing else I will merge this tomorrow.

@biswanaths biswanaths merged commit 55008bd into scala:master Jun 12, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants