Improper rejection of plain scalar with first character not in printable set of \u0080-\u00FF #5

Closed
purinchu opened this issue Oct 13, 2018 · 3 comments

@purinchu

YAML::PP fails to load valid YAML files containing a plain scalar whose first character is a printable Unicode character in the range \u0080 through \u00FF. Printable ASCII characters work as the first character, as do characters at \u0100 or higher. The affected characters also work fine when they are not the first character.

I suspect the common Perl "Unicode bug" in the regexes that handle plain scalars, but I wasn't able to easily identify a fix within YAML::PP. Quoting the plain scalar is a sufficient workaround.
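For readers unfamiliar with the term, here is a minimal sketch of what the Perl "Unicode bug" looks like in general. It is only an illustration of the suspected class of problem, not code from YAML::PP; the helper names `matches_word_default` and `matches_word_unicode` are made up for this example.

```perl
use strict;
use warnings;
no feature 'unicode_strings';    # make sure the legacy semantics are in effect

# Hypothetical helper: match \w under the default (legacy) regex rules.
sub matches_word_default { my ($s) = @_; return $s =~ /\w/ ? 1 : 0 }

# Hypothetical helper: match \w with Unicode rules forced by /u (perl 5.14+).
sub matches_word_unicode { my ($s) = @_; return $s =~ /\w/u ? 1 : 0 }

my $native   = "\xC9";           # E-acute stored as one byte, UTF8 flag off
my $upgraded = "\xC9";
utf8::upgrade($upgraded);        # same character, stored with the UTF8 flag on

printf "strings are equal:       %d\n", ($native eq $upgraded ? 1 : 0);
printf "native,   default rules: %d\n", matches_word_default($native);
printf "upgraded, default rules: %d\n", matches_word_default($upgraded);
printf "native,   /u rules:      %d\n", matches_word_unicode($native);
```

The two strings compare equal, yet only the upgraded one matches `\w` under the default rules. Characters at \u0100 or higher always force the upgraded representation, which is why this kind of bug tends to affect only the \u0080-\u00FF range.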

YAML::PP is version 0.009, installed via App::cpanminus under perlbrew using perl 5.28.0.

Output

The bug manifests in a Perl exception that generates output similar to:

$ perl test2.pl 
Line      : 5
Column    : 15
Expected  : ALIAS DOUBLEQUOTE FLOWMAP_START FLOWSEQ_START FOLDED LITERAL PLAIN SINGLEQUOTE
Got       : Invalid plain scalar
Where     : perl-5.28.0/lib/site_perl/5.28.0/YAML/PP/Parser.pm line 516
YAML      : "\x{c9}ric Bischoff\n"
  at perl-5.28.0/lib/site_perl/5.28.0/YAML/PP/Loader.pm line 60.

The "\x{c9}ric Bischoff" is Éric Bischoff in the source YAML file (the É is \u00c9; its exact UTF-8 bytes in the source file are c3 89).
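The relationship between the code point and the bytes can be checked with the core Encode module. This is a small sketch; `utf8_hex` is a hypothetical helper name, not part of any library.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 encoding of a string as hex bytes.
sub utf8_hex {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

print utf8_hex("\x{c9}"),   "\n";    # c3 89    (E-acute, \u00c9)
print utf8_hex("\x{2603}"), "\n";    # e2 98 83 (snowman, \u2603)
```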

Troubleshooting Performed

I have confirmed the source file is both valid UTF-8 (using iconv) and valid YAML (using various online validators). The YAML 1.2 spec appears to say this should work with all printable Unicode characters that aren't "indicators" or otherwise confusable with other YAML syntax.
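The UTF-8 validity check can also be done from Perl itself rather than with iconv. The following is a rough equivalent using the core Encode module's strict decoder; `is_valid_utf8` is a hypothetical helper name introduced for this sketch.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: true if the byte string decodes as strict UTF-8.
# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps it from
# modifying the argument.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

print is_valid_utf8("\xC3\x89ric Bischoff"), "\n";    # well-formed: c3 89 is E-acute
print is_valid_utf8("\xC9ric Bischoff"),     "\n";    # malformed: lone Latin-1 byte
```

A file that passes this check but contains a raw c9 byte somewhere would instead be Latin-1, which is a different problem from the one reported here.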

I mentioned that higher Unicode code points are unaffected. In fact, a name of ☃ric Bischoff (snowman as first character, \u2603) works perfectly.

The workaround we identified for those who can't change their name so easily is to quote the name, which YAML::PP parses fine.
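If the YAML is generated by a script, the quoting workaround could be applied with something like the sketch below. `yaml_single_quote` is a hypothetical helper, and it assumes a single-line value made of printable characters; in a YAML single-quoted scalar the only escape is a doubled single quote.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: wrap a single-line value in YAML single quotes,
# doubling any embedded single quote.
sub yaml_single_quote {
    my ($text) = @_;
    (my $escaped = $text) =~ s/'/''/g;
    return "'$escaped'";
}

print "- displayname: ", yaml_single_quote("\x{c9}ric Bischoff"), "\n";
print "- displayname: ", yaml_single_quote("Conan O'Brien"),      "\n";
```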

Test case

This test case reproduces the bug and prints out characters which cause YAML::PP to fail to load in the range noted.

use 5.014;
use YAML::PP qw(Load);
use feature 'unicode_strings';

# Allow Perl to spit out UTF-8 to STDOUT
binmode STDOUT, ':encoding(UTF-8)';

my $base = "description: Foo\nmembers:\n- displayname: ";
# Toggle single-quoting or plain scalar testcase
$base .= @ARGV ? "'Xric Bischoff'" : "Xric Bischoff";
my $index = index ($base, 'X');
say "$base\n\n---------\nReplacing 'X' with other printable chars:";

for my $char (0x21 .. 0x110) {
    my $str = $base;

    # Unprintable chars are not valid parts of a plain scalar
    my $replacement = chr($char);
    next if $replacement !~ /[[:print:]]/;

    substr ($str, $index, 1, $replacement);

    my $data = eval { Load($str) };
    say sprintf ("\\x%X (%s)", $char, $replacement), " doesn't work" if ($@);
}

@perlpunk
Owner

Thanks, that was an easy fix, I had forgotten \xA0-\xFF in two of the regexes ;-)

Pushed to master
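To illustrate how a forgotten \xA0-\xFF range produces exactly the reported symptom, here is a deliberately simplified sketch. The two patterns are hypothetical and are not the regexes from YAML::PP; they also ignore the YAML indicator characters that a real "first character of a plain scalar" class has to exclude.

```perl
use strict;
use warnings;

# Hypothetical, simplified "allowed first character" classes.
# The first one omits \xA0-\xFF; the second one includes it.
my $FIRST_WITH_GAP = qr/\A[\x21-\x7E\x{100}-\x{10FFFF}]/;
my $FIRST_FIXED    = qr/\A[\x21-\x7E\xA0-\xFF\x{100}-\x{10FFFF}]/;

# Hypothetical helper: 1 if the text matches the given pattern, else 0.
sub accepts { my ($re, $text) = @_; return $text =~ $re ? 1 : 0 }

for my $name ("Eric", "\x{C9}ric", "\x{2603}ric") {
    printf "first char U+%04X: with gap=%d, fixed=%d\n",
        ord($name), accepts($FIRST_WITH_GAP, $name), accepts($FIRST_FIXED, $name);
}
```

With the gap, U+0045 and U+2603 are accepted while U+00C9 is rejected, which matches the behaviour described in the report: ASCII and \u0100-and-up work, \u00A0-\u00FF does not.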

@purinchu
Author

Thanks, I copied the fix to my local YAML::PP and can confirm it works here. Looking forward to seeing it in the next release!

@perlpunk
Owner

FYI: Released 0.009_001 today
