Different output when parsing HTML #790

longcdf opened this issue Dec 31, 2018 · 3 comments


longcdf commented Dec 31, 2018

I'm seeing some strange behavior when parsing an HTML file with LibTidy.
The HTML file contains something like this:

<li class="ikonpunkt span8"><a href="" class="menuitem"><img alt="Ledige stillinger hvit" src=""/><h3>Ledige stillinger</h3></a></li>

I'm using LibTidy to parse this file,
and it randomly generates one of 2 outputs:
1 -

<li class="ikonpunkt span8"><a href="" class="menuitem"><img alt="Kontakt oss hvit" src="">
<h3>Kontakt oss</h3>

2 -

<li class="ikonpunkt span8"><a href="" class="menuitem"><img alt="Kontakt oss hvit" src=""></a>
<h3><a href="" class="menuitem">Kontakt oss</a></h3>

It happens at random.
Could you please help explain it and suggest a way to work around it?


@longcdf thank you for the issue... but at this time do not understand...

Using your input and a config of -w 0 on the current 5.7.17, I reproduce output 1, apart from the obviously different name <h3>Ledige stillinger</h3>... but that discrepancy aside...

What version of libTidy? Hopefully next source...

How are you using libTidy? In what container, app, lib, whatever... src...

Random behavior can only be explained by the configuration in effect at the moment libTidy runs... take care with multiple threads...

A library, like libtidy.a, has to produce the same output, given a config and an input, a gzillion times over... forever... no random change... not possible... code paths cannot change unless something else changes...

So, at this moment, I really do not see how libTidy could very randomly output 2...

Even the meaning has changed - the <h3> header is now also a link to the URL... another anchor <a ...>name</a> has been added... the input must have changed...

At this time do not understand the problem... more information needed... thanks...


longcdf commented Jan 1, 2019

Hi @geoffmcl, thank you for your response.
I'm using tidys.lib 5.6.0 downloaded from
This issue happens when I'm using multi-threading.
I have about 20 threads, each parsing a different HTML file.
I tested by calling tidyParseFile followed by tidySaveFile, to make sure I didn't touch the TidyDoc in between.

Original file:
Correct output file:
Randomly wrong output file:

One more note: it doesn't always fail exactly like the second output file. There may be fewer or more differences, but the problem is always at the <h3> tag.

So I wonder whether the multithreading causes the issue? When I use a single thread it is fine.
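
One way to narrow down the single-thread versus multi-thread difference is a small harness that takes a single-threaded baseline and then checks whether concurrent runs on the same input ever diverge from it. This is only a rough sketch, not anything from LibTidy itself: `tidy_fn` is a hypothetical stand-in for whatever wrapper calls the library (bytes in, bytes out), and `compare_runs` and its parameters are names made up for illustration.

```python
import hashlib
import threading
from collections import Counter
from typing import Callable


def digest(data: bytes) -> str:
    """Hash an output so results can be compared cheaply."""
    return hashlib.sha256(data).hexdigest()


def compare_runs(tidy_fn: Callable[[bytes], bytes], html: bytes,
                 threads: int = 20, rounds: int = 50) -> Counter:
    """Call tidy_fn on the same input from many threads and count how
    often the result matches the single-threaded baseline.

    tidy_fn is an assumed wrapper around the parser under test.
    """
    # Reference output, produced before any worker thread starts.
    baseline = digest(tidy_fn(html))
    seen: Counter = Counter()
    lock = threading.Lock()

    def worker() -> None:
        for _ in range(rounds):
            d = digest(tidy_fn(html))
            with lock:  # protect the shared tally
                seen["same" if d == baseline else "different"] += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return seen
```

If the `different` count stays at zero with one thread but not with 20, that would suggest state shared between threads rather than a change in the input.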



Looks like this has been addressed. Please feel free to reopen if I'm wrong.
