Release an updated HTML::Tidy perl library #562

Closed
geoffmcl opened this issue May 21, 2017 · 17 comments

@geoffmcl
Contributor

I use Perl a lot, and would like an updated, installable module that I can load with use HTML::Tidy...

I have opened a discussion here to address this...

Any help appreciated... thanks...

@geoffmcl geoffmcl added this to the 5.5 milestone May 21, 2017
@balthisar
Member

Great! Anything we need to do to make Tidy more compatible?

@petdance
Contributor

Hi, I maintain HTML::Tidy. I'm very interested in using the new tidy-html5.

@geoffmcl
Contributor Author

@petdance I am not too familiar with compiling using Makefile.PL, but gave it a go ;=))

Modified Makefile.PL to suit my Windows install of static tidys.lib, and made some small changes in Tidy.xs - header names, and using tidyLibraryVersion(), replacing tidyVersion()... full diff -

diff --git a/Makefile.PL b/Makefile.PL
index 1bca2ef..a2cb460 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -9,8 +9,8 @@ use ExtUtils::MakeMaker;
 use ExtUtils::Liblist;
 use Config;
 
-my $libs = '-ltidyp';
-my $inc = "-I. -I/usr/include/tidyp -I/usr/local/include/tidyp -I$Config{usrinc}/tidyp";
+my $libs = '-ltidys';
+my $inc = "-I. -IF:/Projects/software.x64/include";
 
 eval { require Alien::Tidyp; };
 
@@ -20,12 +20,12 @@ if ( !$@ ) {
     $inc = Alien::Tidyp->config('INC');
 }
 else {
-    print "Alien::Tidyp not found. Looking for for tidyp on your system.\n";
-    my @vars = ExtUtils::Liblist->ext( '-L/usr/lib -L/usr/local/lib -ltidyp', 0, 1 );
+    print "Alien::Tidyp not found. Looking for for tidys on your system.\n";
+    my @vars = ExtUtils::Liblist->ext( '-LF:/Projects/software.x64/lib -ltidys', 0, 1 );
     $libs = $vars[2];
 
     if ( !$libs ) {
-        $libs = '-ltidyp';
+        $libs = '-ltidys';
         print <<'EOF';
 
 It seems that you don't have tidyp installed.  HTML::Tidy does no
diff --git a/Tidy.xs b/Tidy.xs
index 2238b0b..c726b48 100644
--- a/Tidy.xs
+++ b/Tidy.xs
@@ -2,8 +2,8 @@
 #include "perl.h"
 #include "XSUB.h"
 
-#include <tidyp.h>
-#include <buffio.h>
+#include <tidy.h>
+#include <tidybuffio.h>
 #include <stdio.h>
 #include <errno.h>
 
@@ -196,7 +196,7 @@ _tidyp_version()
     PREINIT:
         const char* version;
     CODE:
-        version = tidyVersion();
+        version = tidyLibraryVersion();
         RETVAL = newSVpv(version,0); /* will be automatically "mortalized" */
     OUTPUT:
         RETVAL

Then in a MSVC14 x64 command prompt, ran perl -f Makefile.PL, and to my surprise it cleanly generated a Makefile, and Tidy.c... Woweee...

And when I ran nmake it nearly got there... a few warnings on the tidy.c compile, but no big problems, but really bombed on the link... some 49 unresolved externals... a few are given below -

tidys.lib(alloc.obj) : error LNK2001: unresolved external symbol __imp___acrt_iob_func
tidys.lib(tidylib.obj) : error LNK2001: unresolved external symbol __imp_fclose
msvcrt.lib(utility_desktop.obj) : error LNK2001: unresolved external symbol memset
oldnames.lib(strdup.obj) : error LNK2001: unresolved external symbol __imp_strdup
msvcrt.lib(dll_dllmain.obj) : error LNK2001: unresolved external symbol __C_specific_handler
msvcrt.lib(utility.obj) : error LNK2001: unresolved external symbol __C_specific_handler
msvcrt.lib(utility.obj) : error LNK2001: unresolved external symbol _seh_filter_dll

You will note they are not all related to tidys.lib... very strange... searched around and got some ideas...

This looks like the link is missing some specific runtime library, but the Makefile already includes a considerable list of LDLOADLIBS, so still trying to sort that out... what exactly is missing? And then how to add more libs to the list... something to do with ExtUtils::Liblist...

Another idea is to compile tidys.lib, using the static runtimes, ie /MT instead of /MD - just add -DUSE_STATIC_RUNTIME:BOOL=YES to the cmake command... this might help, as it really reduces runtime dependencies...

Also could try the DLL version of Tidy library, but that would mean somehow also adding it to the Perl install... and dealing with the name clash, since this is also a build of a tidy.dll...

Alternatively, could try a MinGW.x64 build of tidys.lib - namely libtidys.a... that might be better...

Anyway, not much time left tonight to play, but have pushed my changes to my HTML::Tidy fork, in the 'test1' branch, if you want to look at them, or try...

Also now anxious to try a linux build... but that would be tomorrow, or soonest...

@geoffmcl
Contributor Author

On Windows, today, did try -

  1. A MinGW-w64 build, but it produced the same unresolved externals as MSVC14, and I had to copy libtidys.a to tidys.lib for it to be found...
  2. A /MT build, and while this reduced the missing externals, it was now also missing, say, tidyLibraryVersion, and all the other tidy APIs used...
  3. Even a TIDY_CALL=__cdecl, like for pascal, but the tidy API calls were still missing, as well as some others... as if the only difference were the leading _...

Some more options, combinations to try... still work to do here... WIP in Windows...

Ok, seemed to do better in linux (Ubuntu 14.04 x64), maybe? - with perl v5.18.2...

First, some fixes -

diff --git a/Makefile.PL b/Makefile.PL
index 1bca2ef..9a1fb93 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -9,8 +9,8 @@ use ExtUtils::MakeMaker;
 use ExtUtils::Liblist;
 use Config;
 
-my $libs = '-ltidyp';
-my $inc = "-I. -I/usr/include/tidyp -I/usr/local/include/tidyp -I$Config{usrinc}/tidyp";
+my $libs = '-ltidys';
+my $inc = "-I. -I/usr/include -I$Config{usrinc}/tidy";
 
 eval { require Alien::Tidyp; };
 
@@ -20,15 +20,15 @@ if ( !$@ ) {
     $inc = Alien::Tidyp->config('INC');
 }
 else {
-    print "Alien::Tidyp not found. Looking for for tidyp on your system.\n";
-    my @vars = ExtUtils::Liblist->ext( '-L/usr/lib -L/usr/local/lib -ltidyp', 0, 1 );
+    print "Alien::Tidyp not found. Looking for for tidys on your system.\n";
+    my @vars = ExtUtils::Liblist->ext( '-L/usr/lib -L/usr/local/lib -ltidys', 0, 1 );
     $libs = $vars[2];
 
     if ( !$libs ) {
-        $libs = '-ltidyp';
+        $libs = '-ltidys';
         print <<'EOF';
 
-It seems that you don't have tidyp installed.  HTML::Tidy does no
+It seems that you don't have tidys installed.  HTML::Tidy does no
 real work on its own.  It's just a wrapper aound tidyp.
 
 Please read the README.markdown file for details on how to install.
diff --git a/Tidy.xs b/Tidy.xs
index 2238b0b..1091189 100644
--- a/Tidy.xs
+++ b/Tidy.xs
@@ -2,8 +2,8 @@
 #include "perl.h"
 #include "XSUB.h"
 
-#include <tidyp.h>
-#include <buffio.h>
+#include <tidy.h>
+#include <tidybuffio.h>
 #include <stdio.h>
 #include <errno.h>
 
@@ -196,7 +196,8 @@ _tidyp_version()
     PREINIT:
         const char* version;
     CODE:
-        version = tidyVersion();
+        version = tidyLibraryVersion();
         RETVAL = newSVpv(version,0); /* will be automatically "mortalized" */
     OUTPUT:
         RETVAL
+ 

The perl -f Makefile.PL seemed to run fine, and it seems Tidy.c is not generated at this stage... but in the next stage...

Then ran make, and got -

Skip blib/lib/HTML/Tidy/Message.pm (unchanged)
Skip blib/lib/HTML/Tidy.pm (unchanged)
cc -c  -I. -I/usr/include -I/usr/include/tidy -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g   -DVERSION=\"1.56\" -DXS_VERSION=\"1.56\" -fPIC "-I/usr/lib/perl/5.18/CORE"   Tidy.c
Running Mkbootstrap for HTML::Tidy ()
chmod 644 Tidy.bs
rm -f blib/arch/auto/HTML/Tidy/Tidy.so
cc  -shared -L/usr/local/lib -fstack-protector Tidy.o  -o blib/arch/auto/HTML/Tidy/Tidy.so 	\
	   -L/usr/lib -L/usr/local/lib -ltidys  	\
	  
/usr/bin/ld: /usr/lib/libtidys.a(buffio.c.o): relocation R_X86_64_32S against `prvTidyg_default_allocator' can not be used when making a shared object; recompile with -fPIC
/usr/lib/libtidys.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make: *** [blib/arch/auto/HTML/Tidy/Tidy.so] Error 1

Ok, seems to want the libtidys.a recompiled with -fPIC... does not seem right for a static library, but who am I to argue with ld ;=))

Will get around to trying this... as before pushed my changes to my fork, test2 branch... any feed back welcome... thanks...

So as you asked @balthisar, it does seem HTML Tidy may need some small changes to be able to produce a compatible library that can be used to build a Perl5 Tidy.dll... at least in linux and Windows...

Having already done this for his tidyp, maybe @petdance will have some ideas, clues, on this... thanks...

@petdance
Contributor

Note that my goal is to do away with tidyp and Alien::Tidyp.

@petdance
Contributor

I will try to take a look at the new tidy in the next couple of days. Right now I have a bunch of work with Perl::Critic and ack to deal with first.

@geoffmcl
Contributor Author

@petdance thanks for the feedback... very welcome, as always...

Note that my goal is to do away with tidyp and Alien::Tidyp.

Agree absolutely! Luckily on my working systems I have neither of these installed... and thus did not specifically remove their references... but they should eventually be removed...

... work with Perl::Critic and ack to deal with first.

Absolutely understand... we each have our priorities...

  • Re: Windows build of Tidy.dll

First I returned to the default static tidys.lib build, and noted one single, little, thing...

In reading the Makefile created by perl -f Makefile.PL, it adds a link option -nodefaultlib! This is BAD... very bad... the so called default runtime libraries are very important...

I searched for ways to convince ExtUtils::MakeMaker to not do this... but found none... but maybe I missed something... need to learn, understand more here...

But if I manually modify the Makefile to exclude this option, then a perl5 Tidy.dll is built... Yowee... a success...
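
The manual Makefile edit described above could be scripted. A minimal sketch, assuming the option appears as a literal -nodefaultlib token in the generated Makefile; the sub name strip_nodefaultlib and the sample line are hypothetical, and this only post-processes text rather than changing what ExtUtils::MakeMaker emits:

```perl
use strict;
use warnings;

# Hypothetical helper: remove every standalone -nodefaultlib token from
# Makefile text, leaving qualified forms such as -nodefaultlib:NAME alone.
sub strip_nodefaultlib {
    my ($makefile_text) = @_;
    $makefile_text =~ s/(?<!\S)-nodefaultlib(?!\S)[ \t]*//g;
    return $makefile_text;
}

# Illustrative line only, not copied from a real generated Makefile.
my $line = "LDDLFLAGS = -dll -nologo -nodefaultlib -debug\n";
print strip_nodefaultlib($line);
```

Run over the whole generated Makefile before nmake, this would reproduce the hand edit; whether MakeMaker offers a cleaner switch for this remains an open question here.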

But before installing this newly created Tidy.dll I ran the usual nmake test, and got some positive and negative results... still to be analysed fully...

F:\Projects\html-tidy-pet-fork>nmake test

Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
Copyright (C) Microsoft Corporation.  All rights reserved.

        C:\Perl64\bin\perl.exe "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_ha
rness(0, 'blib\lib', 'blib\arch')" t/*.t
t/00-load.t .......... # Testing HTML::Tidy 1.56, Perl 5.014002; tidyp 5.5.27
t/00-load.t .......... ok
t/cfg-for-parse.t .... ok
t/clean-crash.t ...... 1/2 Use of uninitialized value $newline in regexp compilation at F:\Projects\html-tidy-pet-fork\b
lib\lib/HTML/Tidy.pm line 243, <DATA> line 1.
Use of uninitialized value $errs in split at F:\Projects\html-tidy-pet-fork\blib\lib/HTML/Tidy.pm line 243, <DATA> line
1.
t/clean-crash.t ...... ok
t/extra-quote.t ...... 1/4
#   Failed test 'Should have exactly three messages'
#   at t/extra-quote.t line 31.
#          got: '4'
#     expected: '3'

#   Failed test 'Matching warnings'
#   at t/extra-quote.t line 35.
#     Structures begin differing at:
#          $got->[2] = '- (4:1) Warning: <img> illegal characters found in URI'
#     $expected->[2] = '- (4:1) Warning: <img> lacks "alt" attribute'
# Looks like you failed 2 tests of 4.
t/extra-quote.t ...... Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/4 subtests
t/ignore-text.t ...... 1/3
#   Failed test 'Matching warnings'
#   at t/ignore-text.t line 33.
#     Structures begin differing at:
#          $got->[0] = Does not exist
#     $expected->[0] = 'DATA (24:XX) Warning: unescaped & which should be written as &amp;'
# Looks like you failed 1 test of 3.
t/ignore-text.t ...... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests
t/ignore.t ........... 1/9
#   Failed test 'Matching warnings'
#   at t/ignore.t line 38.
#     Structures begin differing at:
#          $got->[2] = Does not exist
#     $expected->[2] = '- (24:XX) Warning: unescaped & which should be written as &amp;'
# Looks like you failed 1 test of 9.
t/ignore.t ........... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/9 subtests
t/levels.t ........... 1/3
#   Failed test 'Matching messages'
#   at t/levels.t line 28.
#     Structures begin differing at:
#          $got->[3] = Does not exist
#     $expected->[3] = '- (24:XX) Warning: unescaped & which should be written as &amp;'
# Looks like you failed 1 test of 3.
t/levels.t ........... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests
t/message.t .......... ok
t/opt-00.t ........... ok
 at t/parse-crash.t line 19HTML::Tidy: Unknown error type: Tidy found 3 warnings and 0 errors!
t/parse-crash.t ...... ok
t/perfect.t .......... ok
t/pod-coverage.t ..... skipped: Test::Pod::Coverage 1.04 required for testing POD coverage
t/pod.t .............. skipped: Test::Pod 1.14 required for testing POD
t/roundtrip.t ........ 1/3 HTML::Tidy: Unknown error type: Tidy found 4 warnings and 0 errors! at t/roundtrip.t line 18

#   Failed test 'Cleaned up properly'
#   at t/roundtrip.t line 31.
#          got: '<!DOCTYPE html>
# <html>
# <head>
# <meta name="generator" content=
# "HTML Tidy for HTML5 for Windows version 5.5.27">
# <title></title>
# </head>
# <body>
# <a href="http://www.example.com/"><em>This is a test.</em></a>
# </body>
# </html>
# '
#     expected: '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
# <html>
# <head>
# <meta name="generator" content="Tidy">
# <title></title>
# </head>
# <body>
# <a href="http://www.example.com/"><em>This is a test.</em></a>
# </body>
# </html>
# '
# Looks like you failed 1 test of 3.
t/roundtrip.t ........ Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests
t/segfault-form.t .... 1/3 Use of uninitialized value $newline in regexp compilation at F:\Projects\html-tidy-pet-fork\b
lib\lib/HTML/Tidy.pm line 243, <DATA> line 1.
Use of uninitialized value $errs in split at F:\Projects\html-tidy-pet-fork\blib\lib/HTML/Tidy.pm line 243, <DATA> line
1.
t/segfault-form.t .... ok
t/simple.t ........... 1/4
#   Failed test 'Right number of initial messages'
#   at t/simple.t line 20.
#          got: '6'
#     expected: '5'
# Looks like you failed 1 test of 4.
t/simple.t ........... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests
t/too-many-titles.t .. 1/3
#   Failed test 'Matching warnings'
#   at t/too-many-titles.t line 27.
#     Structures begin differing at:
#          $got->[1] = '- (4:9) Warning: too many title elements in <title>'
#     $expected->[1] = '- (4:9) Warning: too many title elements in <head>'
# Looks like you failed 1 test of 3.
t/too-many-titles.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests
 at t/unicode-nbsp.t line 19:Tidy: Unknown error type: No warnings or errors were found.

t/unicode-nbsp.t ..... 1/2 #   Failed test 'Perl chars OK'
#   at t/unicode-nbsp.t line 19.
#          got: '&nbsp;
# '
#     expected: '&nbsp;
# '
 at t/unicode-nbsp.t line 20pe: No warnings or errors were found.

#   Failed test 'Byte string OK'
#   at t/unicode-nbsp.t line 20.
#          got: '&nbsp;
# '
#     expected: '&nbsp;
# '
# Looks like you failed 2 tests of 2.
t/unicode-nbsp.t ..... Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/2 subtests
t/unicode.t .......... 1/9
#   Failed test 'Cleanup didn't break anything'
#   at t/unicode.t line 35.
Wide character in print at C:/Perl64/lib/Test/Builder.pm line 1759, <DATA> line 1.
#          got: '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
# <html>
# <head>
# <meta name="generator" content=
# "HTML Tidy for HTML5 for Windows version 5.5.27">
# <title>日本語のホムページ</title>
# </head>
# <body>
# <p>Unicodeが好きですか?</p>
# </body>
# </html>
# '
#     expected: '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
# <html>
# <head>
# <meta name="generator" content="Tidy">
# <title>日本語のホムページ</title>
# </head>
# <body>
# <p>Unicodeが好きですか?</p>
# </body>
# </html>
# '

    #   Failed test 'Cleanup didn't break anything'
    #   at t/unicode.t line 54.
Wide character in print at C:/Perl64/lib/Test/Builder.pm line 1759, <DATA> line 1.
    #          got: '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
    # <html>
    # <head>
    # <meta name="generator" content=
    # "HTML Tidy for HTML5 for Windows version 5.5.27">
    # <title>日本語のホムページ</title>
    # </head>
    # <body>
    # <p>Unicodeが好きですか?</p>
    # </body>
    # </html>
    # '
    #     expected: '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
    # <html>
    # <head>
    # <meta name="generator" content="Tidy">
    # <title>日本語のホムページ</title>
    # </head>
    # <body>
    # <p>Unicodeが好きですか?</p>
    # </body>
    # </html>
    # '
    # Looks like you failed 1 test of 3.

#   Failed test 'Try send bytes to clean method.'
#   at t/unicode.t line 55.
# Looks like you failed 2 tests of 9.
t/unicode.t .......... Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/9 subtests
t/venus.t ............ 1/2
#   Failed test 'Cooked stuff looks like what we expected'
#   at t/venus.t line 25.
#     Structures begin differing at:
#          $got->[5] = '  <body bgcolor="#FFFFFF" link="#5B3D23" alink="#8C6136"'
#     $expected->[5] = '  <body bgcolor="#FFFFFF" link="#5B3D23" alink="#8C6136" vlink="#BE844A" background="../../Wetla
ndGraphics/PaperBG.gif">'
# Looks like you failed 1 test of 2.
t/venus.t ............ Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/2 subtests
t/version.t ..........
#   Failed test 'Valid version string'
t/version.t .......... 1/4 #   at t/version.t line 11.
#                   '5.5.27'
#     doesn't match '(?^:^\d\.\d{2,}$)'
Argument "5.5.27" isn't numeric in numeric ge (>=) at (eval in cmp_ok) t/version.t line 12.

#   Failed test 'Valid version string'
#   at t/version.t line 11.
#                   '5.5.27'
#     doesn't match '(?^:^\d\.\d{2,}$)'
Argument "5.5.27" isn't numeric in numeric ge (>=) at (eval in cmp_ok) t/version.t line 12.
# Looks like you failed 2 tests of 4.
t/version.t .......... Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/4 subtests
t/wordwrap.t ......... HTML::Tidy: Unknown error type: Tidy found 5 warnings and 0 errors! at t/wordwrap.t line 34

t/wordwrap.t ......... 1/1 #   Failed test 'Cleaned stuff looks like what we expected'
#   at t/wordwrap.t line 36.
#     Structures begin differing at:
#          $got->[1] = 'html>'
#     $expected->[1] = 'html PUBLIC "-//W3C//DTD HTML 3.2//EN">'
# Looks like you failed 1 test of 1.
t/wordwrap.t ......... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests

Test Summary Report
-------------------
t/extra-quote.t    (Wstat: 512 Tests: 4 Failed: 2)
  Failed tests:  3-4
  Non-zero exit status: 2
t/ignore-text.t    (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t/ignore.t         (Wstat: 256 Tests: 9 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t/levels.t         (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t/roundtrip.t      (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t/simple.t         (Wstat: 256 Tests: 4 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t/too-many-titles.t (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t/unicode-nbsp.t   (Wstat: 512 Tests: 2 Failed: 2)
  Failed tests:  1-2
  Non-zero exit status: 2
t/unicode.t        (Wstat: 512 Tests: 9 Failed: 2)
  Failed tests:  4, 9
  Non-zero exit status: 2
t/venus.t          (Wstat: 256 Tests: 2 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
t/version.t        (Wstat: 512 Tests: 4 Failed: 2)
  Failed tests:  1, 3
  Non-zero exit status: 2
t/wordwrap.t       (Wstat: 256 Tests: 1 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
Files=22, Tests=66,  3 wallclock secs ( 0.08 usr +  0.09 sys =  0.17 CPU)
Result: FAIL
Failed 12/22 test programs. 16/66 subtests failed.
NMAKE : fatal error U1077: 'C:\Perl64\bin\perl.exe' : return code '0x1'
Stop.

So we get some success, and some failures...

Now, as I try to analyse the difference, it does seem some tests need to be adjusted to the current behavior of Tidy 5.5.??... That is maybe the test expected results need to also be adjusted... to what tidy5 will do...
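
As one example of such an adjustment, the t/version.t failure above comes from a pattern that only accepts the old N.NN style, plus a numeric >= on a string like 5.5.27. A minimal sketch of a more tolerant check, assuming the test only needs "looks like a version" and "is new enough"; the helper names and the 5.0.0 threshold are hypothetical, not taken from the HTML::Tidy test suite:

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: accept both the old "N.NN" style and the dotted
# "N.N.N" style reported by the newer library (e.g. 5.5.27).
sub looks_like_tidy_version {
    my ($v) = @_;
    return ( defined($v) && $v =~ /^\d+(?:\.\d+){1,2}$/ ) ? 1 : 0;
}

# Hypothetical helper: compare as version objects instead of using a
# numeric >=, which warns that "5.5.27" isn't numeric.
sub version_at_least {
    my ($have, $want) = @_;
    return ( version->parse($have) >= version->parse($want) ) ? 1 : 0;
}

print looks_like_tidy_version('5.5.27') ? "valid\n" : "invalid\n";
print version_at_least('5.5.27', '5.0.0') ? "new enough\n" : "too old\n";
```

The version module ships with Perl, so this adds no new dependency to the tests.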

Despite these negative test indications, I chose to do nmake install, and this successfully added C:\Perl64\site\lib\auto\HTML\Tidy\tidy.dll to my perl installation, and added C:\Perl64\site\lib\HTML\Tidy\Tidy.pm ... all seemed good...

But then when I tried Tidy03.pl, I get an error?

#!/usr/bin/perl -w
use strict;
use warnings;
use HTML::Tidy;
my $tidy = HTML::Tidy::Document->new();
if (! $tidy) {
    die "ERROR: Failed to load HTML::Tidy ... $! ...\n";
}
my $doc = <<"EOF;";
 <p>Hello HTML::Tidy!</p>
EOF;

Error message -

Can't locate object method "new" via package "HTML::Tidy::Document" (perhaps you forgot to load "HTML::Tidy::Document"?) at Tidy03.pl line 5.

So ok, the API has maybe changed from http://search.cpan.org/dist/HTML-Tidy/lib/HTML/Tidy.pm ...

But what is the new API? The tests seem to show just my $tidy = HTML::Tidy->new($args);, but I remain a bit confused...

Seems lots of good steps forward, and maybe my bad understanding of the Perl5 API is my problem...

Look forward to feedback, when you have the time... thanks...

@geoffmcl
Contributor Author

@petdance ok, it looks to me as if just about everything is working ok in Windows... my simple test, more just to understand than anything else -

#!/usr/bin/perl -w
use strict;
use warnings;
use HTML::Tidy;

my $use_conf_file = 1;
my ($tidy);
if ($use_conf_file) {
    #my $txt = "show-body-only: 1\ntidy-mark: 0\n";
    my $txt = "indent: 1\nwrap: 0\nshow-info: 0\n";
    my $file = 'temptidy.cfg';
    open WOF, ">$file" or die "ERROR: Unable to open $file! $!\n";
    print WOF $txt;
    close WOF;
    $tidy = HTML::Tidy->new( {'config_file' => $file} );
} else {
    #my $args = { show_body_only => 1,
    #    show_info => 0 };
    my $args = { indent => 1,
        wrap => 0,
        tidy_mark => 0,
        show_info => 0 };
    $tidy = HTML::Tidy->new( $args );
}
if (! $tidy) {
    die "ERROR: Failed to load HTML::Tidy ... $! ...\n";
}
my $doc = <<"EOF;";
 <p>Hello Tidy!</p>
EOF;
#my $rc = $tidy->parse( '-', $doc );
#print "$clean\n";
my $clean = $tidy->clean( $doc );
print "$clean\n";
my @msg = $tidy->messages();
if (@msg) {
    my $cnt = scalar @msg;
    print "Have $cnt messages...\n";
    print join("\n",@msg)."\n";
}
# eof

This produced the great output -

 at tidy04.pl line 33rror type: Tidy found 3 warnings and 0 errors!
<!DOCTYPE html>
<html>
  <head>
    <meta name="generator" content="HTML Tidy for HTML5 for Windows version 5.5.27">
    <title></title>
  </head>
  <body>
    <p>
      Hello Tidy!
    </p>
  </body>
</html>

Have 3 messages...
 (1:2) Warning: missing <!DOCTYPE> declaration
 (1:2) Warning: inserting implicit <body>
 (1:2) Warning: inserting missing 'title' element

Not sure why the first line looks to be an overlap of messages, but maybe a Windows CrLf was not used somewhere... looks like just Cr got out...
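
The overlap is consistent with that guess. A minimal sketch, assuming the summary message ends in a bare carriage return before the " at tidy04.pl line 33" suffix is appended; render_cr is a hypothetical helper that only imitates how a console overwrites a line after a CR:

```perl
use strict;
use warnings;

# Hypothetical helper: imitate a console, where a carriage return moves
# the cursor back to column 0 and later characters overwrite earlier ones.
sub render_cr {
    my ($text) = @_;
    my @cells;     # characters currently visible on the line
    my $col = 0;   # cursor column
    for my $ch ( split //, $text ) {
        if ( $ch eq "\r" ) { $col = 0; next; }
        $cells[ $col++ ] = $ch;
    }
    return join '', @cells;
}

# Assumed raw message: summary text, stray CR, then the location suffix.
my $raw = "HTML::Tidy: Unknown error type: "
        . "Tidy found 3 warnings and 0 errors!\r"
        . " at tidy04.pl line 33";
print render_cr($raw), "\n";
```

The rendered line matches the first line of the output above, so stripping a trailing CR from each message before it is reported would probably remove the overlap.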

Then I started looking at why the nmake test failed a few of the tests, and as suspected, some of them just need their expected output updated to what the current HTML Tidy 5.5.27 library generates...

However test t/clean-crash.t suggests a small problem in Tidy.xs, around line 163... in the _tidy_clean service... you had earlier made the prognosis /* XXX I think this cascade is a bug waiting to happen */... which comes true on this test case...

The input html of the test has an error, so no output will be generated! So I think output.bp will be zero. So no outputs will be pushed for the return...

void
_tidy_clean(input, configfile, tidy_options) 
    ...
        if ( rc >= 0 && output.bp && errbuf.bp ) {
            XPUSHs( sv_2mortal(newSVpvn((char *)output.bp, output.size)) );
            XPUSHs( sv_2mortal(newSVpvn((char *)errbuf.bp, errbuf.size)) );

So back in the Tidy.pm the appropriate variables will be uninitialized!

    my ($cleaned, $errbuf, $newline) = _tidy_clean( $text,
                                          $self->{config_file},
                                          $self->{tidy_options});
    utf8::decode($cleaned);
    utf8::decode($errbuf);

    $self->_parse_errors('', $errbuf, $newline);

Simply put, _tidy_clean must always return 3 params... or _parse_errors must deal with the not defined case, or something...
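
The second option, handling the undefined case on the Perl side, might look like the following. This is only a sketch, assuming a short or empty return list from _tidy_clean; normalize_clean_result is a hypothetical helper, and the "\n" fallback for the newline is an assumption, not what HTML::Tidy does today:

```perl
use strict;
use warnings;

# Hypothetical helper: pad a possibly short return list from
# _tidy_clean so callers always get three defined values.
sub normalize_clean_result {
    my ($cleaned, $errbuf, $newline) = @_;
    $cleaned = ''   unless defined $cleaned;
    $errbuf  = ''   unless defined $errbuf;
    $newline = "\n" unless defined $newline;   # assumed default
    return ($cleaned, $errbuf, $newline);
}

# Simulate the failing case: the XS call pushed nothing at all.
my ($cleaned, $errbuf, $newline) = normalize_clean_result();

# Splitting is now safe: no "uninitialized value" warnings.
my @lines = split /\Q$newline\E/, $errbuf;
print scalar(@lines), " error lines\n";
```

Fixing the XS side so that three values are always pushed would be the cleaner long-term change; this guard only silences the warnings in Tidy.pm.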

Of course for the test, we could probably add a { force_output => 1 }, config, but for some reason you have marked this option as unsupported. Not sure why?

I removed it from lib\HTML\Tidy.pm, rebuilt and added my $tidy = HTML::Tidy->new( { force_output => 1 } ); to this test, and it PASSES... but not sure why the Unknown error type: carp...

HTML::Tidy: Unknown error type: Tidy found 8 warnings and 1 error! at t/clean-crash.t line 20
t/clean-crash.t ...... ok

But of course it still should also work without this config option... to avoid the noise from line my @lines = split( /$newline/, $errs );...

Have pushed these changes to the test1 branch of my fork, and opened an Issue 1 there to mirror this issue, and track this Windows x64 build...

Will continue to work on other tests as time permits...

But really thank you for this great HTML::Tidy package... It is just what the doctor ordered ;=))

@petdance
Contributor

I don't understand why you're posting all this over here on tidy-html5, when it's html-tidy that is the Perl wrapper around it.

@geoffmcl
Contributor Author

@petdance well, until I got deep enough into building, testing, etc, I was still unsure if anything was needed here to make the tidys.lib library compatible with tidyp, and thus linkable into the html-tidy Tidy.dll! But so far it seems it is...

So, yes, sorry, will move these now purely perl issues back to my html-tidy fork, which I have started to do in Issue 1 there... thanks...

The aim there is to try to prepare a PR to merge into your master source... is this ok?

Or do you have some other preferred way I can offer help? Do you want a new issue for each? Directly in your repo? Patches maybe... please advise... only trying to help... thanks...

@petdance
Contributor

I welcome the help. Thank you.

I'll let you know as soon as I get something working for this new tidy-html5.

@benkasminbullock
Contributor

HTML::Valid was released using tidy-html5 in 2015:

https://metacpan.org/release/HTML-Valid

I've discussed this with @geoffmcl in the issues list several times.

HTML::Tidy seems to not be passing tests very successfully:

http://matrix.cpantesters.org/?dist=HTML-Tidy+1.58

@petdance
Contributor

@benkasminbullock That's fantastic. I didn't know that that had been released. How does it compare to HTML::Tidy? Should I abandon HTML::Tidy and switch to HTML::Valid?

@benkasminbullock
Contributor

@petdance The main use of the module at the moment is to get a hashset of HTML 5 tags, which is HTML::Valid::Tagset in the distribution. This is used by my other module HTML::Make as well as by some scripts I use to validate HTML. The HTML::Valid part of the distribution is not extensively tested at the moment so I don't recommend discarding anything. If you have some evaluations of the module then please do let me know, and I will try to improve it. It incorporates most of the code of this project except for some encoding related things.

@balthisar
Member

Any reason to keep this open here? Is there something that we can do to help enable the Perl module?

@petdance
Contributor

I see it as just a marker to the work that needs to be done on the Perl side.

@geoffmcl
Contributor Author

As this is not technically a HTML Tidy issue, closing this...
