Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NonStrict flag for CPAN::Meta::YAML interop #44

Open
rurban opened this issue Mar 2, 2016 · 6 comments
Open

NonStrict flag for CPAN::Meta::YAML interop #44

rurban opened this issue Mar 2, 2016 · 6 comments

Comments

@rurban
Copy link

rurban commented Mar 2, 2016

My plan is to use the best and fastest YAML module for YAML load and dump in core (as Cpanel::JSON::XS for json).
But LibYAML is too strict for the CPAN::Meta validation tests.
The root cause is that CPAN::Meta::YAML is broken, accepting too much illegal YAML.
I cannot load too many .yml files from CPAN prefs and build dirs.
So I am writing a NonStrict mode to workaround those quirks.

I'm ok with rejecting illegal structures, which would lead to
t/data-fixable/98042513-META.yml being not fixable but illegal.

    Class::Date:                   ^A^A^F

because the value for Class::Date is missing, and will not be assigned to undef (as CPAN::Meta::YAML, i.e. YAML::Tiny) does.

But this should be fixable with my new $NonStrict flag:

    Class::Date:                   0^A^A^F

The illegal control characters are ignored then.

$ cperl -MYAML::XS -e'$YAML::XS::NonStrict=0;$h=YAML::XS::LoadFile("t/data-fixable/98042513-META.yml"); print %$h'
YAML::XS::LoadFile Error: The problem

    control characters are not allowed

was found at document: 29, offset: 395
$ cperl -MYAML::XS -e'$YAML::XS::NonStrict=1;$h=YAML::XS::LoadFile("t/data-fixable/98042513-META.yml"); print %$h'
... (nyi)

The 2nd problem is an illegal utf8 codepoint conflicts with perl core.
There I'm not sure yet if its a libyaml bug or a CPAN::Meta bug or really an illegal utf8 codepoint:

YAML::XS::LoadFile Error: The problem

invalid trailing UTF-8 octet

was found at document: 25708, offset: 70

not ok 40 - t/data-test/unicode.yml validates
#   Failed test 't/data-test/unicode.yml validates'
#   at t/validator.t line 40.
# ERRORS:
# Missing mandatory field, 'version' (version) [Validation: 1.0]

Ok with this plan?

BTW: As you can see in the errmsgs I have already written a XS version for LoadFile,
which uses the better file iterator. Unfortunately I haven't yet wired the filename from XS down to libyaml, which only needs a FILE*. So the errmsg is not good enough yet. But still better than now. You'll see with my upcoming PR.

@karenetheridge
Copy link

The root cause is that CPAN::Meta::YAML is broken, accepting too much illegal YAML.

CPAN::Meta::YAML is a thin wrapper around YAML::Tiny. However, I wonder if it should instead be a wrapper around YAML::XS (if XS is available), falling back to YAML::Tiny otherwise?

(Maybe I should have opened a new ticket for that. sorry!)

@rurban
Copy link
Author

rurban commented Mar 3, 2016

@karenetheridge: This will not work because both YAML::XS and YAML are broken and inconsistent.
YAML::XS insists on writing seq elements without intentation, while YAML cannot read that.
Arguably those are bugs in both libraries. Ingy thinks not, but at least one is wrong then, and then the spec is inconsistent.

YAML and YAML::XS insists on failing hard on illegal structure or elements, the CPAN::Meta::YAML validator is useless with them, because the parser returns a fatal error and {}.
All other YAML loaders replace illegal elements with undef. That's what the new NonStrict mode is for.
All other YAML loaders can load the YAML files produced by any YAML dumper. They are already nonstrict, and accept seqs with and without indent. There were many more such inconsistencies historically.

p5p is right by insisting on their own YAML::Tiny based library, but I want some performance from a maintained YAML::XS library, which excludes YAML::Syck.

So I'm gonna fix YAML and YAML::XS.
YAML because it is the default reader in CPAN, and it fails to read YAML::XS files. Which is the worst bug for the start. See http://blogs.perl.org/users/rurban/2016/03/on-yaml-and-yamlxs-inconsistencies.html

@karenetheridge
Copy link

Having a nonstrict/lax parser option sounds totally fine to me if it's not the default (or it will break backwards compatibility).

@rurban
Copy link
Author

rurban commented Mar 8, 2016

@karenetheridge: It is already broken for many years. There's not much backcompat to keep.
That's why everybody recommends to ignore YAML and only use JSON for the perl toolchain.
The goal is to make YAML::XS (and eventually YAML) usable for the perl toolchain again.
So far only YAML::Tiny and YAML::Syck are usable.

@ingydotnet : NonStrict is off by default, only turned on by the perl toolchain YAML loaders. It's fine for me if you prefer to throw strict errors, even if it breaks CPAN. But then the toolchain folks need to come to a decision what to do with illegal yaml, and how to define illegal yaml. They accept illegal yaml as "fixable", so I came up with this NonStrict mode. See Perl-Toolchain-Gang/CPAN-Meta#106

rurban pushed a commit to rurban/CPAN-Meta that referenced this issue Mar 8, 2016
Enhance the documentation for JSON and YAML backends.

Add tests for some cperl builtin prereqs.

t/validator.t now tests all installed YAML and JSON backends.

Skip one fixable testfile for which YAML and YAML::XS do not agree.
See ingydotnet/yaml-libyaml-pm#44
and Perl-Toolchain-Gang#106
rurban pushed a commit to rurban/libyaml that referenced this issue Mar 9, 2016
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
rurban pushed a commit to rurban/libyaml that referenced this issue Mar 9, 2016
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
rurban pushed a commit to rurban/libyaml that referenced this issue Mar 9, 2016
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
rurban pushed a commit to perl11/cperl that referenced this issue Mar 18, 2016
LibYAML is too strict to be able to parse YAML/CPAN-Meta-YAML
generated code, and pass the CPAN-Meta validation tests.
Add a NonStrict mode, 2 tests still fail.
Also add a native LoadFile method while we are here.

See ingydotnet/yaml-libyaml-pm#44
rurban pushed a commit to rurban/libyaml that referenced this issue Dec 1, 2016
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
@perlpunk
Copy link
Collaborator

perlpunk commented Nov 9, 2017

@rurban

YAML::XS insists on writing seq elements without intentation, while YAML cannot read that.
Arguably those are bugs in both libraries. Ingy thinks not, but at least one is wrong then, and then the spec is inconsistent.

The dash is part of the indentation. That's how I understand the spec, and most others do. All yaml processors I know (the ones that are in yaml-editor) accept zero indented sequences, except YAML.pm. So this is a bug in YAML.pm. Feel free to fix it.

All other YAML loaders replace illegal elements with undef.

Can you please back that up with examples? What do you mean by "All other YAML loaders", and what do you mean by "illegal elements"?

Also see http://matrix.yaml.io/

perlpunk pushed a commit to perlpunk/libyaml that referenced this issue Dec 3, 2017
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
perlpunk pushed a commit to perlpunk/libyaml that referenced this issue Jan 27, 2018
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
@perlpunk
Copy link
Collaborator

Released YAML.pm 1.25_001 which supports zero indented block sequences

perlpunk pushed a commit to perlpunk/libyaml that referenced this issue Jul 17, 2018
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
perlpunk pushed a commit to perlpunk/libyaml that referenced this issue Jul 17, 2018
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
perlpunk pushed a commit to perlpunk/libyaml that referenced this issue Jul 17, 2018
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
rurban pushed a commit to rurban/libyaml that referenced this issue Jun 17, 2019
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
rurban pushed a commit to rurban/libyaml that referenced this issue Jun 21, 2019
optionally allow certian reader errors.
needed for perl YAML validation tests.
See ingydotnet/yaml-libyaml-pm#44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants