New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

round trip grows regexp #69

Closed
boldra opened this Issue Dec 5, 2017 · 10 comments

Comments

Projects
None yet
2 participants
@boldra

boldra commented Dec 5, 2017

Test program

perl -MYAML::XS -E 'my $y=Load($ARGV[0]); for(1..10) { $y=Load(Dump($y)) }; say Dump($y)' 'match: !!perl/regexp (?i-xsm:OK)'

input

match: !!perl/regexp (?i-xsm:OK)

output

match: !!perl/regexp (?^u:(?^u:(?^u:(?^u:(?^u:(?^u:(?^u:(?^u:(?^u:(?^u:(?^ui:OK)))))))))))

(expected output to be same as input).
Perl 5.24.3/Perl 5.22.4
YAML::XS 0.66

@perlpunk

This comment has been minimized.

Show comment
Hide comment
@perlpunk

perlpunk Dec 5, 2017

Collaborator

Unfortunately you can see this effect also simply with this code:

my $regex = qr{OK};
say $regex;
my $str = "$regex";
$regex = qr{$str};
say $regex;
__END__
(?^u:OK)
(?^u:(?^u:OK))

I remember Data::Dumper had the same problem years ago, so I sent a bugreport. We could look how it was solved.

I see no trivial way to solve this. One could look at the output of qr{} and try to remove that everytime when stringyfying a regex.

Collaborator

perlpunk commented Dec 5, 2017

Unfortunately you can see this effect also simply with this code:

my $regex = qr{OK};
say $regex;
my $str = "$regex";
$regex = qr{$str};
say $regex;
__END__
(?^u:OK)
(?^u:(?^u:OK))

I remember Data::Dumper had the same problem years ago, so I sent a bugreport. We could look how it was solved.

I see no trivial way to solve this. One could look at the output of qr{} and try to remove that everytime when stringyfying a regex.

@perlpunk

This comment has been minimized.

Show comment
Hide comment
@perlpunk

perlpunk Dec 5, 2017

Collaborator

Like this:

my $string = "(?i-xsm:OK)";
my ($pre, $post) = split m/PLACEHOLDER/, qr{PLACEHOLDER};
my $re;
for (1..5) {
    if ($string =~ s/^\Q$pre//) {  
        $string =~ s/\Q$post\E$//;
    }
    $re = qr{$string};
    say $re;
}
say $re;
__END__
(?^u:(?i-xsm:OK))

Only in XS instead...

Collaborator

perlpunk commented Dec 5, 2017

Like this:

my $string = "(?i-xsm:OK)";
my ($pre, $post) = split m/PLACEHOLDER/, qr{PLACEHOLDER};
my $re;
for (1..5) {
    if ($string =~ s/^\Q$pre//) {  
        $string =~ s/\Q$post\E$//;
    }
    $re = qr{$string};
    say $re;
}
say $re;
__END__
(?^u:(?i-xsm:OK))

Only in XS instead...

@perlpunk

This comment has been minimized.

Show comment
Hide comment
@perlpunk
Collaborator

perlpunk commented Dec 5, 2017

@perlpunk

This comment has been minimized.

Show comment
Hide comment
@perlpunk
Collaborator

perlpunk commented Dec 5, 2017

@boldra

This comment has been minimized.

Show comment
Hide comment
@boldra

boldra Dec 5, 2017

Anton Petrusevich just suggested this diff:


 

--- lib/YAML/XS.pm        2017-11-15 18:59:43.000000000 +0100

+++ /home/anton/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/x86_64-linux/YAML/XS.pm     2017-12-05 15:25:21.300019521 +0100

@@ -116,6 +116,7 @@

};

 sub __qr_loader {

+    1 while $_[0] =~  s/\A(\(\?\^u?:)(.*)\)\z/$2/;

     if ($_[0] =~ /\A  \(\?  ([ixsm]*)  (?:-  (?:[ixsm]*))?  : (.*) \)  \z/x) {

         my $sub = _QR_MAP->{$1} || _QR_MAP->{''};

         &$sub($2);

It certainly does the job for us, but I'm not sure whether all the superfluous quoting is going to start with the ^u?

boldra commented Dec 5, 2017

Anton Petrusevich just suggested this diff:


 

--- lib/YAML/XS.pm        2017-11-15 18:59:43.000000000 +0100

+++ /home/anton/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/x86_64-linux/YAML/XS.pm     2017-12-05 15:25:21.300019521 +0100

@@ -116,6 +116,7 @@

};

 sub __qr_loader {

+    1 while $_[0] =~  s/\A(\(\?\^u?:)(.*)\)\z/$2/;

     if ($_[0] =~ /\A  \(\?  ([ixsm]*)  (?:-  (?:[ixsm]*))?  : (.*) \)  \z/x) {

         my $sub = _QR_MAP->{$1} || _QR_MAP->{''};

         &$sub($2);

It certainly does the job for us, but I'm not sure whether all the superfluous quoting is going to start with the ^u?

@perlpunk

This comment has been minimized.

Show comment
Hide comment
@perlpunk

perlpunk Dec 6, 2017

Collaborator

I will have a look soon. In YAML.pm it works, so maybe we can get it working for YAML::XS too.

Collaborator

perlpunk commented Dec 6, 2017

I will have a look soon. In YAML.pm it works, so maybe we can get it working for YAML::XS too.

@perlpunk

This comment has been minimized.

Show comment
Hide comment
@perlpunk

perlpunk Dec 8, 2017

Collaborator

I made a quick fix for this in PR #70

Collaborator

perlpunk commented Dec 8, 2017

I made a quick fix for this in PR #70

@perlpunk

This comment has been minimized.

Show comment
Hide comment
@perlpunk

perlpunk Dec 8, 2017

Collaborator

btw, YAML.pm has the same problem because it also does not check the u:

use YAML;
use Encode;
my $yaml = decode_utf8 q{re : !!perl/regexp OK}; 
for (1..5) {
  $data = Load $yaml;
  $yaml = Dump $data;
}          
say $yaml;
__END__
---                    
re: !!perl/regexp (?^u:(?^u:(?^u:(?^u:(?^u:OK)))))

Collaborator

perlpunk commented Dec 8, 2017

btw, YAML.pm has the same problem because it also does not check the u:

use YAML;
use Encode;
my $yaml = decode_utf8 q{re : !!perl/regexp OK}; 
for (1..5) {
  $data = Load $yaml;
  $yaml = Dump $data;
}          
say $yaml;
__END__
---                    
re: !!perl/regexp (?^u:(?^u:(?^u:(?^u:(?^u:OK)))))

@perlpunk

This comment has been minimized.

Show comment
Hide comment
@perlpunk

perlpunk Dec 16, 2017

Collaborator

Thanks, released YAML::LibYAML 0.67_001

Collaborator

perlpunk commented Dec 16, 2017

Thanks, released YAML::LibYAML 0.67_001

@perlpunk

This comment has been minimized.

Show comment
Hide comment
@perlpunk

perlpunk Dec 16, 2017

Collaborator

Closing, created issue #72 for improving regex handling

Collaborator

perlpunk commented Dec 16, 2017

Closing, created issue #72 for improving regex handling

@perlpunk perlpunk closed this Dec 16, 2017

gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Apr 3, 2018

dev-perl/YAML-LibYAML: Bump to version 0.690.0
Upstream:
- (Security) Add $LoadBlessed option[1] to run on/off loading objects
- Fix regex roundtrip [2]
- Fix loading of many regex classes [3]
- Support tags !!str, !!map and !!seq [4]
- Support JSON::PP::Boolean via $YAML::XS::Boolean [5]
- Fix Dump modifying inspected values [6]

[1]
Bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861958
Bug: ingydotnet/yaml-libyaml-pm#45
Bug: ingydotnet/yaml-libyaml-pm#73
Bug: ingydotnet/yaml-libyaml-pm#74
[2]
Bug: ingydotnet/yaml-libyaml-pm#69
Bug: ingydotnet/yaml-libyaml-pm#70
[3]
Bug: ingydotnet/yaml-libyaml-pm#64
Bug: ingydotnet/yaml-libyaml-pm#71
[4]
Bug: ingydotnet/yaml-libyaml-pm#67
[5]
Bug: ingydotnet/yaml-libyaml-pm#66
[6]
Bug: ingydotnet/yaml-libyaml-pm#32
Bug: ingydotnet/yaml-libyaml-pm#55
Package-Manager: Portage-2.3.24, Repoman-2.3.6

madmartin added a commit to madmartin/gentoo that referenced this issue Apr 4, 2018

dev-perl/YAML-LibYAML: Bump to version 0.690.0
Upstream:
- (Security) Add $LoadBlessed option[1] to run on/off loading objects
- Fix regex roundtrip [2]
- Fix loading of many regex classes [3]
- Support tags !!str, !!map and !!seq [4]
- Support JSON::PP::Boolean via $YAML::XS::Boolean [5]
- Fix Dump modifying inspected values [6]

[1]
Bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861958
Bug: ingydotnet/yaml-libyaml-pm#45
Bug: ingydotnet/yaml-libyaml-pm#73
Bug: ingydotnet/yaml-libyaml-pm#74
[2]
Bug: ingydotnet/yaml-libyaml-pm#69
Bug: ingydotnet/yaml-libyaml-pm#70
[3]
Bug: ingydotnet/yaml-libyaml-pm#64
Bug: ingydotnet/yaml-libyaml-pm#71
[4]
Bug: ingydotnet/yaml-libyaml-pm#67
[5]
Bug: ingydotnet/yaml-libyaml-pm#66
[6]
Bug: ingydotnet/yaml-libyaml-pm#32
Bug: ingydotnet/yaml-libyaml-pm#55
Package-Manager: Portage-2.3.24, Repoman-2.3.6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment