Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression: IOException: Mark invalid after upgrade to 1.12.2 #1324

Closed
surli opened this issue Feb 14, 2020 · 10 comments
Closed

regression: IOException: Mark invalid after upgrade to 1.12.2 #1324

surli opened this issue Feb 14, 2020 · 10 comments
Assignees
Labels
bug Confirmed bug that we should fix fixed
Milestone

Comments

@surli
Copy link

surli commented Feb 14, 2020

Hi,

we are using jsoup inside XWiki to perform some page validation as part of our integrations tests and we noticed a regression after the upgrade from jsoup 1.12.1 to 1.12.2. When accessing a particular page, we obtained the following:

Exception in thread "main" java.io.IOException: Mark invalid
	at org.jsoup.parser.CharacterReader.rewindToMark(CharacterReader.java:148)
	at org.jsoup.parser.Tokeniser.consumeCharacterReference(Tokeniser.java:192)
	at org.jsoup.parser.TokeniserState$38.read(TokeniserState.java:759)
	at org.jsoup.parser.Tokeniser.read(Tokeniser.java:59)
	at org.jsoup.parser.TreeBuilder.runParser(TreeBuilder.java:55)
	at org.jsoup.parser.TreeBuilder.parse(TreeBuilder.java:47)
	at org.jsoup.parser.Parser.parseInput(Parser.java:35)
	at org.jsoup.helper.DataUtil.parseInputStream(DataUtil.java:170)
	at org.jsoup.helper.DataUtil.load(DataUtil.java:66)
	at org.jsoup.Jsoup.parse(Jsoup.java:118)
	at Main.main(Main.java:47)

I've been able to isolate and reproduce in a "small" setup, but it still needs an XWiki instance.
Here's the small Java code I'm using for reproduction:

GetMethod method = new GetMethod("http://127.0.0.1:8080/xwiki/bin/admin/XWiki/XWikiPreferences?editor=globaladmin&section=Groups");
method.setFollowRedirects(true);
method.getParams().setSoTimeout(30000);
method.setDoAuthentication(true);
method.addRequestHeader("Authorization","Basic " + new String(Base64.encodeBase64("Admin:admin".getBytes())));
new HttpClient().executeMethod(method);
byte[] response = method.getResponseBody();
method.releaseConnection();

Jsoup.parse(new ByteArrayInputStream(response), null, "");

Before running this, you need to download an instance of XWiki 12.0 (http://maven.xwiki.org/releases/org/xwiki/platform/xwiki-platform-distribution-flavor-jetty-hsqldb/12.0/xwiki-platform-distribution-flavor-jetty-hsqldb-12.0.zip), unzip it, start it using the bash script (./start_xwiki.sh) and open a browser with http://localhost:8080 to be sure everything's loaded.

I checked with 1.12.1 and I confirm I didn't reproduce with it.

surli added a commit to xwiki/xwiki-platform that referenced this issue Feb 14, 2020
This reverts commit 3ebfb99.

Reverting because of regression on jsoup, see: jhy/jsoup#1324
@jhy
Copy link
Owner

jhy commented Feb 14, 2020

Hi,

Thanks for the replication detail. Does this replicate if you save the response byte[] to a file and then open that? (Which I expect it should, as that's all jsoup is getting input from.) If so, can you attach that as a file. It will simplify the repro.

@surli
Copy link
Author

surli commented Feb 14, 2020

Does this replicate if you save the response byte[] to a file and then open that?

Indeed, here's the test file which reproduces the failure: test.html.txt

I now reproduce with:

Jsoup.parse(new FileInputStream("/tmp/test.html.txt"), null, "");

@jhy jhy closed this as completed in 62c0595 Feb 15, 2020
@jhy jhy added bug Confirmed bug that we should fix fixed labels Feb 15, 2020
@jhy jhy added this to the 1.13.1 milestone Feb 15, 2020
@jhy
Copy link
Owner

jhy commented Feb 15, 2020

Thanks for that file. I have replicated and fixed, and it will be in 1.13.1. The issue was that during bufferUp, if a mark existed it would get cleared, and then it couldn't be rewound. This instance happened to require a rewind, so it would blow up. The fix is just to keep the mark when shuffling in new data.

You can build from HEAD to verify.

Also, out of interest, is there a reason to not use Jsoup.connect(url).get() for this? The benefit of using that is that the whole HTTP response does not need to be buffered into memory, and parsing can occur while the network stream in coming in. Happy to take feedback on anything missing in Connect.

@surli
Copy link
Author

surli commented Feb 16, 2020

You can build from HEAD to verify.

Thanks, I'll check that.

is there a reason to not use Jsoup.connect(url).get() for this?

Well I did not write our validation tests where we use jsoup so I'd have to check properly but I can already imagine 2 reasons:

  1. we just don't only use jsoup for the validation, so we have a common method to return the input stream of a page that we use for other validations too
  2. we use HttpClient at several places in our code so we just reused it there again

@Nikos410
Copy link

Hi @jhy,

can you give an ETA on when a new release containing this fix will be published? If finishing version 1.13.1 is still going to take a while, we would really appreciate a hotfix release ❤️

@rdonuk
Copy link

rdonuk commented Feb 25, 2020

I faced the same issue with 1.12.1, are there any workaround or a hotfix planned? It would be perfect 🙏

@jhy
Copy link
Owner

jhy commented Feb 25, 2020

@Nikos410 @rdonuk you can help the release by testing and validating the fix. Can you confirm that the current HEAD build fixes your issue?

@surli
Copy link
Author

surli commented Feb 26, 2020

You can build from HEAD to verify.

Thanks, I'll check that.

Looks like I forgot to answer :) I confirm it's working well on my side (just checked again to be sure)

@Nikos410
Copy link

@jhy Yes I can! We've been using a version built on master in our staging environment for about day now and the error has not shown up again since. 👍 I cannot confirm that it is fixed 100% since this has always been an intermittent error, but it usually occured a few times per hour, so this definetly looks like it is fixed.

@jhy
Copy link
Owner

jhy commented Feb 27, 2020

OK that's great, thank you both for confirming. It looks like this is in good shape. I'm planning on spending some this weekend on wrapping things up and getting a release out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed bug that we should fix fixed
Projects
None yet
Development

No branches or pull requests

4 participants