Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

t/01_bad.t Failed test 'XML-Liberal-0.30/t/bad/BAD-chr-31.xml' at t/01_bad.t line 28 got: ' at position unknown location .. ' expected: '' #4

Closed
hakonhagland opened this issue Mar 4, 2022 · 26 comments

Comments

@hakonhagland
Copy link

I am trying to install on Ubuntu 21.10, perl version 5.34.0:

$ perl Makefile.PL 
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for XML::Liberal
Writing MYMETA.yml and MYMETA.json

$ make
cp lib/XML/Liberal/Remedy/NotUTF8.pm blib/lib/XML/Liberal/Remedy/NotUTF8.pm
cp lib/XML/Liberal/Remedy/Declaration.pm blib/lib/XML/Liberal/Remedy/Declaration.pm
cp lib/XML/Liberal/Remedy/StandaloneAttribute.pm blib/lib/XML/Liberal/Remedy/StandaloneAttribute.pm
cp lib/XML/Liberal/Remedy/InvalidEncoding.pm blib/lib/XML/Liberal/Remedy/InvalidEncoding.pm
cp lib/XML/Liberal/Remedy/LowAsciiChars.pm blib/lib/XML/Liberal/Remedy/LowAsciiChars.pm
cp lib/XML/Liberal/Remedy/TrailingElements.pm blib/lib/XML/Liberal/Remedy/TrailingElements.pm
cp lib/XML/Liberal/Remedy/TrailingDoctype.pm blib/lib/XML/Liberal/Remedy/TrailingDoctype.pm
cp lib/XML/Liberal/Remedy/UndeclaredNS.pm blib/lib/XML/Liberal/Remedy/UndeclaredNS.pm
cp lib/XML/Liberal/Remedy/EntityRef.pm blib/lib/XML/Liberal/Remedy/EntityRef.pm
cp lib/XML/Liberal/Remedy/XHTMLEmptyTag.pm blib/lib/XML/Liberal/Remedy/XHTMLEmptyTag.pm
cp lib/XML/Liberal/LibXML.pm blib/lib/XML/Liberal/LibXML.pm
cp lib/XML/Liberal/Error.pm blib/lib/XML/Liberal/Error.pm
cp lib/XML/Liberal/Remedy/DeprecatedDTD.pm blib/lib/XML/Liberal/Remedy/DeprecatedDTD.pm
cp lib/XML/Liberal.pm blib/lib/XML/Liberal.pm
cp lib/XML/Liberal/Remedy/ControlCode.pm blib/lib/XML/Liberal/Remedy/ControlCode.pm
cp lib/XML/Liberal/Remedy/HTMLEntity.pm blib/lib/XML/Liberal/Remedy/HTMLEntity.pm
cp lib/XML/Liberal/Remedy/UnquotedAttribute.pm blib/lib/XML/Liberal/Remedy/UnquotedAttribute.pm
cp lib/XML/Liberal/Remedy/NestedCDATA.pm blib/lib/XML/Liberal/Remedy/NestedCDATA.pm
cp lib/XML/Liberal/Remedy/UnclosedHTML.pm blib/lib/XML/Liberal/Remedy/UnclosedHTML.pm
Manifying 2 pod documents

$ make test
PERL_DL_NONLAZY=1 "/home/hakon/perlbrew/perls/perl-5.34.0/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'inc', 'blib/lib', 'blib/arch')" t/*.t
t/00_compile.t ......... Possible precedence issue with control flow operator at /home/hakon/test/perl/XML-Liberal-0.30/inc/Test/Builder.pm line 917.
t/00_compile.t ......... ok   
t/01_bad.t ............. Possible precedence issue with control flow operator at /home/hakon/test/perl/XML-Liberal-0.30/inc/Test/Builder.pm line 917.
t/01_bad.t ............. 1/? 
#   Failed test '/home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-chr-31.xml'
#   at t/01_bad.t line 28.
#          got: ' at position unknown location at t/01_bad.t line 27.
# '
#     expected: ''
 
#   Failed test 'created DOM node with /home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-chr-31.xml isa XML::LibXML::Document'
#   at t/01_bad.t line 29.
#     created DOM node with /home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-chr-31.xml isn't defined
Can't call method "toString" on an undefined value at t/01_bad.t line 33, <$fh> line 1.
# Looks like you failed 2 tests of 7.
# Looks like your test exited with 2 just after 7.
t/01_bad.t ............. Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/7 subtests 
t/02_global.t .......... Possible precedence issue with control flow operator at /home/hakon/test/perl/XML-Liberal-0.30/inc/Test/Builder.pm line 917.
t/02_global.t .......... 1/? 
#   Failed test '/home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml'
#   at t/02_global.t line 19.
#          got: 'parser error : Specification mandates value for attribute noshade at position 3:14 at t/02_global.t line 18.
# '
#     expected: ''
 
#   Failed test 'created DOM node with /home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml isa XML::LibXML::Document'
#   at t/02_global.t line 20.
#     created DOM node with /home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml isn't defined
 
#   Failed test '/home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml'
#   at t/02_global.t line 25.
#          got: 'parser error : Specification mandates value for attribute noshade at position 3:14 at t/02_global.t line 24.
# '
#     expected: ''
 
#   Failed test 'created DOM node with /home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml isa XML::LibXML::Document'
#   at t/02_global.t line 26.
#     created DOM node with /home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml isn't defined
 
#   Failed test '/home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml'
#   at t/02_global.t line 30.
#          got: 'parser error : Specification mandates value for attribute noshade at position 3:14 at t/02_global.t line 29.
# '
#     expected: ''
 
#   Failed test 'created DOM node with /home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml isa XML::LibXML::Document'
#   at t/02_global.t line 31.
#     created DOM node with /home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml isn't defined
# Looks like you failed 6 tests of 144.
t/02_global.t .......... Dubious, test returned 6 (wstat 1536, 0x600)
Failed 6/144 subtests 
t/03_global_destroy.t .. Possible precedence issue with control flow operator at /home/hakon/test/perl/XML-Liberal-0.30/inc/Test/Builder.pm line 917.
t/03_global_destroy.t .. 1/? 
#   Failed test '/home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml'
#   at t/03_global_destroy.t line 24.
#          got: 'parser error : Specification mandates value for attribute noshade at position 3:14 at t/03_global_destroy.t line 23.
# '
#     expected: ''
 
#   Failed test 'created DOM node with /home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml isa XML::LibXML::Document'
#   at t/03_global_destroy.t line 25.
#     created DOM node with /home/hakon/test/perl/XML-Liberal-0.30/t/bad/BAD-atttibute.xml isn't defined
# Looks like you failed 2 tests of 71.
t/03_global_destroy.t .. Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/71 subtests 
t/04_sanity.t .......... Possible precedence issue with control flow operator at /home/hakon/test/perl/XML-Liberal-0.30/inc/Test/Builder.pm line 917.
Redundant argument in sprintf at /home/hakon/test/perl/XML-Liberal-0.30/inc/Spiffy.pm line 232.
Redundant argument in sprintf at /home/hakon/test/perl/XML-Liberal-0.30/inc/Spiffy.pm line 232.
Redundant argument in sprintf at /home/hakon/test/perl/XML-Liberal-0.30/inc/Spiffy.pm line 232.
Redundant argument in sprintf at /home/hakon/test/perl/XML-Liberal-0.30/inc/Spiffy.pm line 232.
t/04_sanity.t .......... 1/? parser error : Specification mandates value for attribute nofoo at position 2:12 at t/04_sanity.t line 6.
# Looks like your test exited with 2 just after 12.
t/04_sanity.t .......... Dubious, test returned 2 (wstat 512, 0x200)
All 12 subtests passed 
 
Test Summary Report
-------------------
t/01_bad.t           (Wstat: 512 Tests: 7 Failed: 2)
  Failed tests:  6-7
  Non-zero exit status: 2
t/02_global.t        (Wstat: 1536 Tests: 144 Failed: 6)
  Failed tests:  31-36
  Non-zero exit status: 6
t/03_global_destroy.t (Wstat: 512 Tests: 71 Failed: 2)
  Failed tests:  16-17
  Non-zero exit status: 2
t/04_sanity.t        (Wstat: 512 Tests: 12 Failed: 0)
  Non-zero exit status: 2
Files=5, Tests=235,  1 wallclock secs ( 0.04 usr  0.00 sys +  0.35 cusr  0.04 csys =  0.43 CPU)
Result: FAIL
Failed 4/5 test programs. 10/235 subtests failed.
make: *** [Makefile:800: test_dynamic] Error 2
@hakonhagland
Copy link
Author

See also this question on stackoverflow.com

@ikegami
Copy link

ikegami commented Mar 4, 2022

In lib/XML/Liberal/Remedy/StandaloneAttribute.pm, we have

    my ($attr) = $error->message =~
        /^parser error : Specification mandate value for attribute (\w+)/
            or return 0;

This should be

    my ($attr) = $error->message =~
        /^parser error : Specification mandates? value for attribute (\w+)/
            or return 0;

(I'm assuming that some versions of libxml2 returned Specification mandate value. A web search seems to confirm this.)

@ikegami
Copy link

ikegami commented Mar 4, 2022

The other bug, at position unknown location at t/01_bad.t line 27., appears to be a problem with extract_error. It's returning the following for the BAD-chr-*.xml tests:

bless( {
    'message' => undef,
    'line' => undef,
    'column' => undef,
    'location' => undef
}, 'XML::Liberal::Error' );

Not a fix, but wouldn't it be better to get rid of XML::Liberal::Remedy::ControlCode and simply doing s/[\x00-\x08\x0b-\x0c\x0e-\x1f\x7f]+//g unconditionally up front?

@miyagawa
Copy link
Owner

miyagawa commented Mar 4, 2022

Fixed the regexp for Specification mandates value. I can't repro the other issue about control code though.

@hakonhagland
Copy link
Author

hakonhagland commented Mar 4, 2022

@miyagawa I now get an error when I run Makefile.PL

$ git clone git@github.com:miyagawa/XML-Liberal.git
$ cd XML-Liberal
$ perl Makefile.PL 
include /home/hakon/perl/test/XML-Liberal/inc/Module/Install.pm
Bareword "auto_set_repository" not allowed while "strict subs" in use at Makefile.PL line 5.
Bareword "use_test_base" not allowed while "strict subs" in use at Makefile.PL line 19.
Execution of Makefile.PL aborted due to compilation errors.

@miyagawa
Copy link
Owner

miyagawa commented Mar 4, 2022

that's because you don't have Module::Install::Repository and is not related to the test errors in this issue.

@miyagawa
Copy link
Owner

miyagawa commented Mar 4, 2022

I converted the repository to use Milla so you don't need to have Module::Install and its plugins.

@hakonhagland
Copy link
Author

Ok great, I retried now:

$ perl Makefile.PL 
Generating a Unix-style Makefile
Writing Makefile for XML::Liberal
Writing MYMETA.yml and MYMETA.json

$ make
cp lib/XML/Liberal/Remedy/UndeclaredNS.pm blib/lib/XML/Liberal/Remedy/UndeclaredNS.pm
cp lib/XML/Liberal/Remedy/XHTMLEmptyTag.pm blib/lib/XML/Liberal/Remedy/XHTMLEmptyTag.pm
cp lib/XML/Liberal/Remedy/InvalidEncoding.pm blib/lib/XML/Liberal/Remedy/InvalidEncoding.pm
cp lib/XML/Liberal/Remedy/DeprecatedDTD.pm blib/lib/XML/Liberal/Remedy/DeprecatedDTD.pm
cp lib/XML/Liberal/Remedy/EntityRef.pm blib/lib/XML/Liberal/Remedy/EntityRef.pm
cp lib/XML/Liberal/Remedy/NestedCDATA.pm blib/lib/XML/Liberal/Remedy/NestedCDATA.pm
cp lib/XML/Liberal/Remedy/LowAsciiChars.pm blib/lib/XML/Liberal/Remedy/LowAsciiChars.pm
cp lib/XML/Liberal/Remedy/ControlCode.pm blib/lib/XML/Liberal/Remedy/ControlCode.pm
cp lib/XML/Liberal/Remedy/Declaration.pm blib/lib/XML/Liberal/Remedy/Declaration.pm
cp lib/XML/Liberal/Remedy/StandaloneAttribute.pm blib/lib/XML/Liberal/Remedy/StandaloneAttribute.pm
cp lib/XML/Liberal/Remedy/NotUTF8.pm blib/lib/XML/Liberal/Remedy/NotUTF8.pm
cp lib/XML/Liberal/Remedy/UnclosedHTML.pm blib/lib/XML/Liberal/Remedy/UnclosedHTML.pm
cp lib/XML/Liberal/Remedy/UnquotedAttribute.pm blib/lib/XML/Liberal/Remedy/UnquotedAttribute.pm
cp lib/XML/Liberal/LibXML.pm blib/lib/XML/Liberal/LibXML.pm
cp lib/XML/Liberal/Error.pm blib/lib/XML/Liberal/Error.pm
cp lib/XML/Liberal/Remedy/HTMLEntity.pm blib/lib/XML/Liberal/Remedy/HTMLEntity.pm
cp lib/XML/Liberal.pm blib/lib/XML/Liberal.pm
cp lib/XML/Liberal/Remedy/TrailingElements.pm blib/lib/XML/Liberal/Remedy/TrailingElements.pm
cp lib/XML/Liberal/Remedy/TrailingDoctype.pm blib/lib/XML/Liberal/Remedy/TrailingDoctype.pm
Manifying 2 pod documents

$ make test
PERL_DL_NONLAZY=1 "/home/hakon/perlbrew/perls/perl-5.34.0/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00_compile.t ......... ok   
t/01_bad.t ............. 1/? 
#   Failed test '/home/hakon/perl/test/XML-Liberal/t/bad/BAD-chr-31.xml'
#   at t/01_bad.t line 28.
#          got: ' at position unknown location at t/01_bad.t line 27.
# '
#     expected: ''

#   Failed test ''created DOM node with /home/hakon/perl/test/XML-Liberal/t/bad/BAD-chr-31.xml' isa 'XML::LibXML::Document''
#   at t/01_bad.t line 29.
#     'created DOM node with /home/hakon/perl/test/XML-Liberal/t/bad/BAD-chr-31.xml' isn't defined
Can't call method "toString" on an undefined value at t/01_bad.t line 33, <$fh> line 1.
# Looks like your test exited with 2 just after 7.
t/01_bad.t ............. Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/7 subtests 
t/02_global.t .......... ok     
t/03_global_destroy.t .. ok    
t/04_sanity.t .......... ok    

Test Summary Report
-------------------
t/01_bad.t           (Wstat: 512 Tests: 7 Failed: 2)
  Failed tests:  6-7
  Non-zero exit status: 2
Files=5, Tests=241,  0 wallclock secs ( 0.04 usr  0.01 sys +  0.51 cusr  0.07 csys =  0.63 CPU)
Result: FAIL
Failed 1/5 test programs. 2/241 subtests failed.
make: *** [Makefile:887: test_dynamic] Error 255

@miyagawa
Copy link
Owner

miyagawa commented Mar 4, 2022

I can't reproduce the error: what version of perl and libxml2 do you have?

Mine is Perl 5.32.1, and

➜  xml2-config --version
2.9.4

@hakonhagland
Copy link
Author

$ xml2-config --version
2.9.12
$ perl --version
This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux

@ikegami
Copy link

ikegami commented Mar 4, 2022

The fix is to replace

my $error = $self->extract_error($@, \$xml);

with

my $error = $self->extract_error("$@", \$xml);

I'm guessing the AUTOLOAD mechanism is implicitly localizing $@. By using "$@" instead of $@, a new anonymous scalar is created, so changes to $@ no longer effect the scalar used as argument.

@miyagawa
Copy link
Owner

miyagawa commented Mar 4, 2022

Interesting. Is that a change in perl 5.34?

@miyagawa
Copy link
Owner

miyagawa commented Mar 4, 2022

Bizarre, still can't repro with perl 5.34.0 with my libxml2, so it must be libxml2 version somehow that doesn't materialize this condition.

➜  perl -v | grep This
This is perl 5, version 34, subversion 0 (v5.34.0) built for darwin-2level

➜  perl -MXML::LibXML -E 'say for  $XML::LibXML::VERSION, XML::LibXML::LIBXML_DOTTED_VERSION' 
2.0207
2.9.4

➜  prove -lbr t
t/00_compile.t ......... ok   
t/01_bad.t ............. ok     
t/02_global.t .......... ok     
t/03_global_destroy.t .. ok    
t/04_sanity.t .......... ok    
All tests successful.
Files=5, Tests=453,  0 wallclock secs ( 0.03 usr  0.01 sys +  0.29 cusr  0.04 csys =  0.37 CPU)
Result: PASS

@ikegami
Copy link

ikegami commented Mar 4, 2022

You're right. It has nothing to do with AUTLOAD.

$ perl -Mv5.10 -e'package AAA { sub AUTOLOAD { say $@ // undef } } $@ = "message"; AAA->foo;'
message

The problem is that you're hitting this: last if $exn->message =~/Unregistered error message/;. When I stringify the error object, it bypasses that logic. I'm not sure what's the point of that logic.

@miyagawa
Copy link
Owner

miyagawa commented Mar 4, 2022

Yeah, libxml2 might have changed the errors for the bad encoded characters.

@ikegami
Copy link

ikegami commented Mar 4, 2022

When I stringify $@, this is what I get:

:2: parser error : Unregistered error message
<char number='30'><![CDATA[ � ]]></char>
                            ^
:2: parser error : PCDATA invalid Char value 30
<char number='30'><![CDATA[ � ]]></char>
                            ^
:2: parser error : Sequence ']]>' not allowed in content
<char number='30'><![CDATA[ � ]]></char>

@ikegami
Copy link

ikegami commented Mar 4, 2022

In any case, it can be avoided entirely by doing s/[\x00-\x08\x0b-\x0c\x0e-\x1f\x7f]+//g up front. If the document doesn't contain these, no problem. If it does, it saves having to go through a parse-error-fix cycle.

@miyagawa
Copy link
Owner

miyagawa commented Mar 4, 2022

Well this is still weird, the logic still should have handled it correctly: if it encountered the Unregistered error message skips the first 3 lines to get the actual error, which sounds right from your stack trace.

Here's what I get if I add a debug output for my @errors = split /\n/, $exn for BAD-chr-17.xml:

t/01_bad.t .. 1/? $VAR1 = [
          ':2: parser error : Unregistered error message',
          '<char number=\'17\'><![CDATA[  ]]></char>',
          '                            ^',
          ':2: parser error : PCDATA invalid Char value 17',
          '<char number=\'17\'><![CDATA[  ]]></char>',
          '                            ^',
          ':2: parser error : Sequence \']]>\' not allowed in content',
          '<char number=\'17\'><![CDATA[  ]]></char>',
          '                             ^',
          ':2: parser error : Sequence \']]>\' not allowed in content',
          '<char number=\'17\'><![CDATA[  ]]></char>',
          '                              ^',
          ':2: parser error : internal error: detected an error in element content',
          '',
          '<char number=\'17\'><![CDATA[  ]]></char>',
          '                              ^'
        ];
t/01_bad.t .. ok   

I don't like the idea of fixing the XML before running it through the parser, since it will munge the input.

@miyagawa
Copy link
Owner

miyagawa commented Mar 4, 2022

I was able to reproduce with libxml 2.9.14, and applied the fix.

@hakonhagland
Copy link
Author

Great, the last version works fine here (no failed tests).

@ikegami
Copy link

ikegami commented Mar 22, 2022

I don't like the idea of fixing the XML before running it through the parser, since it will munge the input.

huh? You end up doing it anyway. You're just making the module slower and more error-prone. It's already broken once and you want to risk it happening again?

@miyagawa
Copy link
Owner

There's a difference between proactively processing the data and reactively patching the data as the parser emits errors, and this module by design only does the latter. And the performance is not a concern for this module.

Nothing stops you from doing the preprocessing before calling the module, though.

@ikegami
Copy link

ikegami commented Mar 22, 2022 via email

@miyagawa
Copy link
Owner

miyagawa commented Mar 22, 2022

I pretend not to know anything about incoming XML whether it's valid or not. It runs it through an existing parser, and fix them as errors happen. That's the design, whether you like it or not.

It's very error prone, but the entire concept of running it through another module and doing something based on its error message is, if that was not obvious. Changing how to handle one error message out of many doesn't change that.

@ikegami
Copy link

ikegami commented Mar 22, 2022 via email

@miyagawa
Copy link
Owner

It's already broken once and you want to risk it happening again?

I'm going to reply to your original question. The entire code is based on "run it through the parser, parse the error output and apply a patch". If you describe that is error-prone (which I do not disagree), there's nothing to do about it, other than rewriting the module to apply the known patches beforehand, which is also error prone, because then you have to be an XML parser on your own.

I don't understand what you're trying to get out of this conversation. Clearly you're not required to use this module (I suppose you don't anyway, and to be accurate, I don't either). Locking the issue.

Repository owner locked as off-topic and limited conversation to collaborators Mar 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants