Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5.6.0 and breakage with php-tidy #780

Closed
thelounge-zz opened this issue Nov 27, 2018 · 22 comments
Closed

5.6.0 and breakage with php-tidy #780

thelounge-zz opened this issue Nov 27, 2018 · 22 comments

Comments

@thelounge-zz
Copy link

can someone take a look at this?

rebuild the old 5.4 src.rpm from Fedora 27 don't scale in a long term, ignoring the security issues completly

https://bugzilla.redhat.com/show_bug.cgi?id=1537853
https://bugzilla.redhat.com/show_bug.cgi?id=1609520

@thelounge-zz
Copy link
Author

@geoffmcl
Copy link
Contributor

geoffmcl commented Nov 28, 2018

@thelounge-zz thank you for raising the issue here...

It looks like the bug 1537853 may be related to a wrap => 0 problem in 5.6... see #673, #705, ..., fixed by 5.7.9, but not in any tidy release yet...

I will try to re-setup a windows testing, to be able to repeat this test... maybe the wrap is not the problem... further tests, and feedback by others would be most appreciated... especially those who can test with next... thanks...

Changing some html text 4 to a decimal 3.33333 as shown in 1471305 does not seem like an action of tidy... attributes are maintained as strings in libtidy...

@thelounge-zz
Copy link
Author

Changing some html text 4 to a decimal 3.33333 is an action of tidy - there is nothing else dealing in that autotest with html content and downgrade libtidy fixes this issue

@geoffmcl
Copy link
Contributor

@thelounge-zz as stated, can not see how tidy could input 4, and output 3.33333... does not handle text that way... but the impossible can happen ;=))

@thelounge-zz
Copy link
Author

thelounge-zz commented Nov 28, 2018

this smells like some float conversion deep insinde the code, fact is that i wrote PHP code, stored the output as base64 and the autotest is running the same code and making a diff - libtidy version makes the difference

@geoffmcl
Copy link
Contributor

@thelounge-zz I 100% agree this smells like some float conversion deep insinde the code, but which code? ... I try to understand...

IIRC, tidy stores a node input of <a href="/autotest.php?sid=0&amp;yi_id=1&amp;yi_page=4"> as a string... how can float conversion be the problem here? ... will try to test, debug, ... more...

There is no conversion possible... I think... but you have a diff, so just try to understand... and agree with -

cms-autotest diff
...
see attached diff - i won't care that much about whitespaces and rewrite test-hashes but 
detsroy a simple url-param 4 by spit out 3.3333333 in teh cleand HTML is unacceptable

So also seek a SOLUTION...

You store the tidy output as base64... sounds like some encode/decode, internal to PHP, and should be no problem... but...

Did this just show up in release 5.6... or did it exist before... does it still exist in next...

Lots of questions here, which I shall try to explore...

Further feedback really appreciated... thanks...

@thelounge-zz
Copy link
Author

i will try to extract the issue with 3.33333 in a single php file containing all functions and the original strings in base64 and when that is possible provide 2 rpms for Fedora 28 (the offical package and my sandybridge optimized rebuild of the F27 package)

encode/decode can't be a problem since it's pure latin1 HTML code

BTW: "i won't care that much about whitespaces and rewrite test-hashes" is only half the truth, in fact i care because tidy cleanup is an essential part of anything dealing with WYSIWG input and so a ton of tests are failing and need to be reviewed leading to makeing the testsuite depending on a specific library version and from that point on you can no longer sync the whole stuff bewteen different machines with different tidy/php/os versions which defates the purpose

@geoffmcl
Copy link
Contributor

@thelounge-zz thanks for trying to help look into this... do not exactly understand your provide 2 rpms... 28, 27... but...

I explored using a debug version of tidy on a html snippet <a href="/autotest.php?sid=0&amp;yi_id=1&amp;yi_page=4">autotest</a>... with a config -w 0 --show-body-only yes --show-info no and get the partial log output as -

All nodes BEFORE clean and repair
Root  
 StartTag html implicit 
  StartTag head implicit 
   StartTag title implicit 
  StartTag body implicit 
   StartTag a   href="/autotest.php?sid=0&yi_id=1&yi_page=4"
    Text   (8) 'autotest'
All nodes AFTER clean and repair
Root  
 DocType   PUBLIC
 StartTag html implicit 
  StartTag head implicit 
   StartTag title implicit 
  StartTag body implicit 
   StartTag a   href="/autotest.php?sid=0&yi_id=1&yi_page=4"
    Text   (8) 'autotest'
<a href="/autotest.php?sid=0&amp;yi_id=1&amp;yi_page=4">autotest</a>

Just can not see how 4 could morph to 3.33333... tag text is kept as a string... so look forward to further information on this...

I agree that certain types of tests, involving a known input, and an expected output, fixed config, and a diff, may need to be changed according to the library version - hopefully the library improves - and this is work... and our own regression tests are no exception... every effort to keep this stable is taken, but some changes take place... whitespaces, or more... but maybe I also do not quite understand your BTW: comment...

If you have specific examples, then for sure open an issue, and we can explore... thanks...

@thelounge-zz
Copy link
Author

do not exactly understand your provide 2 rpms... 28, 27

libtidy-5.4.0-4.fc28.20181003.rh.x86_64 is my rebuild of the Fedora 27 src.rpm which don't have the issue and so you can downgrade / upgrade and seee that only changing libtidy makes the difference

are you doing your tests with tidy or from PHP because the Fedora maintainer also said that he is unable to reproduce this and din't bother to just run php.tidy linked against the newer version

oriiginally i was at thursday about to extract a isolated reproducer and then daily buisness robbed my time followed by yesterday dealing the whole day with network infractructure issues - i hope people left and right leave me in piece next week at least on tuesday so that i can provide the needed infos and reproducers here :-(

@thelounge-zz
Copy link
Author

Hi

daywork or not, because i miss here the option for attachments please look at https://bugzilla.redhat.com/show_bug.cgi?id=1537853#c22 where you can download a ZIP with a simple "test.php" and the expected output and what you get with tidy 5.6

you simply can't write any useful tests with such huge differences because it's even not possible or at least error prone manually review if it's just a format change or the "meaning" of the resulting HTML has changed

sadly i can't reproduce the 3.3333333 currently in a standalone script and maybe that's a result of the changed output and other processing of it within the php application

@geoffmcl
Copy link
Contributor

geoffmcl commented Dec 4, 2018

@thelounge-zz thank you for the test.php... but when I try to run it in my ..\apache\htdocs I get -

Warning: file_put_contents(/tmp/tidy-test-expected.htm): failed to open stream: No such file or directory in C:\OSGeo4W\apache\htdocs\test.php on line 44
Warning: file_put_contents(/tmp/tidy-test-current.htm): failed to open stream: No such file or directory in C:\OSGeo4W\apache\htdocs\test.php on line 45
Warning: unlink(/tmp/tidy-test-expected.htm): No such file or directory in C:\OSGeo4W\apache\htdocs\test.php on line 47
Warning: unlink(/tmp/tidy-test-current.htm): No such file or directory in C:\OSGeo4W\apache\htdocs\test.php on line 48

I would guess this is because directory <current_disk:>/tmp does not exist... certainly do not have any C:\tmp in that Windows system...

In brief, my <?php phpinfo(); ?> gives -

PHP Version 7.1.14
System	Windows NT WIN7-PC 10.0 build 17134 (Windows 10) AMD64
Build Date	Jan 31 2018 00:10:01
Compiler	MSVC14 (Visual C++ 2015)
Architecture	x64
...
tidy
Tidy support	enabled
libTidy Version	5.4.0
libTidy Release	2017/03/01
Extension Version	7.1.14 ($Id: c6d6b80d55c5d351deaa745c3c540971e5583c24 $)
...

So I still have test setup problems in Windows, to be able to switch and test various libTidy versions... will try to work on that... but...

But this seems to be two (2) problems, probably unrelated...

Problem 1 - 5.6 has a wrap problem - regression - fixed in 5.7.9, onwards

I think I can see that problem in comparing libtidy-5.4.htm and libtidy-5.6.htm... in 5.6 all lines are wrapped, massively...

But that is not particularly clear in reading libtidy-54-56.diff... but thanks for including it...

This wrapping seems to be part of issue #673, mentioned in others, solved in PR #705, where I commented that maybe this is a problem with wrap => 0, now repeated here -

This seems related to #780 - looks like a prob. with wrap => 0, which seemed broken in 5.6, fixed in 5.7.9, and so is in current next...
If someone had the time to cherry pick this fix back into the 5.6, and issue a 5.6.1 release, that would be great...
But given that we have some difficulty in making a new release 5.8, or 6.0, given #743, that does not seem likely to happen...
I don't know how to fix that... see stalled #741...

Need someone to run test.php, using next, 5.7.17 plus, to at least prove this point...

That would demonstrate that this can be solved in 5.6 by commit 67eaeb6, then maybe the discussion on 5.6.1, 5.8, 6.0 releases can continue...

Problem 2 - tidy morphs attribute text 4 to 3.33333

Certainly presently can only agree with maybe ... changed output ... within the php application as the only explanation... especially since you advise it can not be reproduced easily, for testing...

Look forward to further feedback on this issue... we want libTidy to be compatible with php-tidy... thanks...

@thelounge-zz
Copy link
Author

belive it or not but depedning on the tidy-package installed the completly to tidy unrellated code comes up with the mysterious 3,3333333333333

libtidy-5.4.0-4.fc28.20181003.rh.x86_64.rpm
libtidy-5.6.0-2.fc28.x86_64.rpm

[harry@srv-rhsoft:]$ cms-autotest.sh youtube | grep 3333
[pages_summary] => 3,3333333333333
[harry@srv-rhsoft:
]$ cms-autotest.sh youtube | grep 3333
[harry@srv-rhsoft:~]$

$result2 = mysqli_query($this->db->conn, 'select SQL_NO_CACHE found_rows()');
$row2 = mysqli_fetch_row($result2);
$count_summary = $row2[0];
$pages_summary = (string)($count_summary / $this->per_page);
if($pages_summary > (int)$pages_summary)
{
$pages_summary = (int)$pages_summary + 1;
}
return
[
'count_summary' => $count_summary,
'count_filtered' => $count_filtered,
'pages_summary' => $pages_summary,
'data' => $arr
];

@thelounge-zz
Copy link
Author

thelounge-zz commented Dec 10, 2018

libtidy 5.6 is touching LOCALE somewhere without a proper reset - period
look at the comma instad dot after typecasting a float value

libtidy-5.6.0-2.fc28.x86_64.rpm
[harry@srv-rhsoft:~]$ php /downloads/tidy-debug.php
PAGES-SUMMARY: 3,3333333333333
PAGES-SUMMARY: 3,3333333333333
CORRUPTION!

libtidy-5.4.0-4.fc28.20181003.rh.x86_64.rpm
[harry@srv-rhsoft:~]$ php /downloads/tidy-debug.php
PAGES-SUMMARY: 3.3333333333333
PAGES-SUMMARY: 4
OK

/** comment out this line and everything is fine with libtidy-5.6.0-2.fc28.x86_64 too */
$tidy = tidy_parse_string('bla', [], 'latin1');

/** that code is completly unrelated to tidy and must not change it's behavior */
$conn = mysqli_init();
mysqli_real_connect($conn, $host, $user, $pwd, $db);
$result = mysqli_query($conn, "select SQL_CALC_FOUND_ROWS * from cl_autotest_youtube_items where yi_cid='1' and yi_aktiv='1' order by yi_sort asc limit 0, 3");
$count_summary = mysqli_fetch_row(mysqli_query($conn, 'select SQL_NO_CACHE found_rows()'))[0];
$pages_summary = (string)($count_summary / 3);
echo "PAGES-SUMMARY: $pages_summary\n";
if($pages_summary > (int)$pages_summary)
{
$pages_summary = (int)$pages_summary + 1;
}
echo "PAGES-SUMMARY: $pages_summary\n";
if((int)$pages_summary !== 4)
{
echo "CORRUPTION!\n";
}
else
{
echo "OK\n";
}

@thelounge-zz
Copy link
Author

thelounge-zz commented Dec 10, 2018

`<?php declare(strict_types=1);
$host = 'localhost';
$user = 'autotest';
$pwd = '****';
$db = 'autotest';

/** comment out this line and everything is fine with libtidy-5.6.0-2.fc28.x86_64 too */
$tidy = tidy_parse_string('bla', [], 'latin1');

/** that code is completly unrelated to tidy and must not change it's behavior */
$conn = mysqli_init();
mysqli_real_connect($conn, $host, $user, $pwd, $db);
mysqli_query($conn, "DROP TABLE IF EXISTS tidy_test;");
mysqli_query($conn, "CREATE TABLE IF NOT EXISTS tidy_test (tidy_id mediumint(7) UNSIGNED NOT NULL AUTO_INCREMENT, tidy_test char(2), PRIMARY KEY (tidy_id));");
for($i=0; $i<=10; $i++)
{
mysqli_query($conn, "insert into tidy_test (tidy_test) values ('$i');");
}
$result = mysqli_query($conn, "select SQL_CALC_FOUND_ROWS * from tidy_test");
$count_summary = mysqli_fetch_row(mysqli_query($conn, 'select SQL_NO_CACHE found_rows()'))[0];
$pages_summary = (string)($count_summary / 3);
echo "PAGES-SUMMARY: $pages_summary\n";
if($pages_summary > (int)$pages_summary)
{
$pages_summary = (int)$pages_summary + 1;
}
echo "PAGES-SUMMARY: $pages_summary\n";
if((int)$pages_summary !== 4)
{
echo "CORRUPTION!\n";
}
else
{
echo "OK\n";
}
mysqli_query($conn, "DROP TABLE IF EXISTS tidy_test;");
?>`

@geoffmcl
Copy link
Contributor

libtidy 5.6 is touching LOCALE somewhere without a proper reset - period
look at the comma instad dot after typecasting a float value

@thelounge-zz yes, I too had noted the comma, and wondered... you may have found it - see #770... and maybe #783, #785, ...

Agree the lib should not do setlocale(LC_ALL,""), at very least not without a proper reset... but think it can be avoided probably... ongoing...

@thelounge-zz
Copy link
Author

PHP upstream since to have some ideas but the root cause is libtidy
https://bugs.php.net/bug.php?id=77278

would you mind to do some regular release because 5.6 is unuseable crap given all the wrapping and other issues which in combination making it even impossible to look at test output and make a clear "only formatting changes with the same technical meaning" and "god knows what the result is" decision

@thelounge-zz
Copy link
Author

setlocale( LC_ALL, "") changes the locale for the entire application

hint: it is ALWAYS application wide
the wwarning is completly unrelated to PHP at all

http://php.net/manual/en/function.setlocale.php

Warning
The locale information is maintained per process, not per thread. If you are running PHP on a multithreaded server API like IIS, HHVM or Apache on Windows, you may experience sudden changes in locale settings while a script is running, though the script itself never called setlocale(). This happens due to other scripts running in different threads of the same process at the same time, changing the process-wide locale using setlocale().

@thelounge-zz
Copy link
Author

guys are there releases planned somehow in the future given that 5.4 has well known security isssues, 5.6 is unuseable and http://www.html-tidy.org/ is one of the worst websites i discovered over years and https://github.com/htacg/tidy-html5/ also don't show any release tag after 5.6

@thelounge-zz
Copy link
Author

ok, can someone explain me where the fedora maintainer grabbed according to https://bugzilla.redhat.com/show_bug.cgi?id=1728023 the tarball for 5.7.28 so that i can boomark it like all the other stuff i decided to package myself?

@geoffmcl
Copy link
Contributor

@thelounge-zz 5.7.28 tarball is available from https://github.com/htacg/tidy-html5/releases/tag/5.7.28

@thelounge-zz
Copy link
Author

FWIW:

5.7.28 solves all the issues but you guys need to learn about so-names because from our cms autotests one case still did procude completly different HTML missing whole parts

re-compile PHP against libtidy-devel 5.7.28 solved that too

with proper versioning rpm would have refused to install the tidy library compiled against the old API and so indicated which other software needs a rebuild and missing that is not funny for binary distributions shipping subtle broken updates to their users

@thelounge-zz
Copy link
Author

frankly can someone fix https://github.com/htacg/tidy-html5/releases/ where 5.7.28 is still just a tiny line on top and 5.6.0 featured as it would be a useful release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants