split /\A/ works like /^/m, matches embedded newlines #14086

Closed
p5pRT opened this issue Sep 11, 2014 · 49 comments

@p5pRT p5pRT commented Sep 11, 2014

Migrated from rt.perl.org#122761 (status was 'resolved')

Searchable as RT122761$

@p5pRT p5pRT commented Sep 11, 2014

From @mauke

perldoc perlrebackslash:

  \A "\A" only matches at the beginning of the string.

perldoc -f split:

  Empty leading fields are produced when there are positive-width matches at the beginning of the string; a zero-width match at the beginning of the string does not produce an empty field.

Therefore split /\A/ should return the input string as is. \A can only match once (at offset 0), which (logically speaking) should turn "foo" into ("", "foo"), but because of the special case in split of not producing empty leading fields for zero-width matches at the beginning, we just get "foo" again.

What actually happens:

$ perl -wE 'say "[$_]" for split /\A/, "foo\nbar\nbaz"'
[foo
]
[bar
]
[baz]

Apparently split thinks /\A/ is the same as /^/m, matching after every embedded newline in the input string. I think this is a bug in split.

The test above was with:
This is perl 5, version 12, subversion 4 (v5.12.4) built for x86_64-linux

... but an IRC bot running 5.20.0 produces the same results so I assume it's still present in 5.20.
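
The report can be turned into a small self-check. The sketch below compares what split /\A/ returns on the running perl with the single field the quoted documentation implies. The helper name anchored_split_fields is made up for illustration, and which branch is taken depends on the perl version in use.

```perl
use strict;
use warnings;

# Hypothetical helper: split on \A and return the resulting fields.
sub anchored_split_fields {
    my ($string) = @_;
    return split /\A/, $string;
}

my $input  = "foo\nbar\nbaz";
my @fields = anchored_split_fields($input);

# Per the documentation quoted above, one field (the whole string) is expected.
if (@fields == 1 && $fields[0] eq $input) {
    print "split /\\A/ returned the input unchanged\n";
}
else {
    printf "split /\\A/ returned %d fields (behaves like /^/m here)\n",
        scalar @fields;
}
```

Either way the fields join back into the original string, since the pattern is zero-width.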

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

Yes this is still in blead.

I was party to breaking this in 7bd1e61 in 2007. (7 years to find the bug
in the logic!) But the story is, as usual, much more complicated than that.

This code does NOT use the regex engine for anything other than parsing the
pattern. The pattern produces an SBOL END regop sequence, which is the same
as would be produced for /^/, and triggers the RXf_START_ONLY optimisation
case for split.

Part of the problem is that way way way back in the history of Perl,
someone decided that split /^/, $string should behave the same as split
/^/m, $string.

To explain more: /^/m produces an MBOL op, "multi-beginning-of-line", and /^/
produces an SBOL op, "single-beginning-of-line".

And split does, and always has, treated both the same, as an MBOL, when the
pattern was JUST /^/.

Later on in history /\A/ was added as a synonym for /^/, and produces an
SBOL op.
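
Outside of split, the synonym relationship and its limit can be checked with ordinary matches. This is a minimal sketch; the helper name matches and the sample string are illustrative only.

```perl
use strict;
use warnings;

# Return 1 if $text matches the compiled $regex, else 0.
sub matches {
    my ($regex, $text) = @_;
    return $text =~ $regex ? 1 : 0;
}

my $text = "foo\nbar";

# Without /m, ^ and \A are interchangeable: both anchor at offset 0 only.
printf "/^bar/    => %d\n", matches(qr/^bar/,    $text);
printf "/\\Abar/   => %d\n", matches(qr/\Abar/,   $text);

# With /m, ^ also matches after an embedded newline; \A does not change.
printf "/^bar/m   => %d\n", matches(qr/^bar/m,   $text);
printf "/\\Abar/m  => %d\n", matches(qr/\Abar/m,  $text);
```

Only the /^bar/m case matches, which is the difference between the MBOL and SBOL ops described above.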

When I upgraded the logic in 7bd1e61 to look at the regop structure rather
than the pattern *string* (a much more reliable process), and to set flags
in the pattern which split would then use (something required to enable
regex engine plug-ins to trigger split optimisations), I inadvertently made
split /\A/ the same as split /^/, which was always the same thing as split
/^/m.

So now we have a problem. There is LOADS of code out there that assumes that

split /^/, $string;

is the correct way to split a string into lines.

However, that was only true because the optimisation in split did not pay
attention to the presence or absence of the /m flag.

So we are now in a jam.

I can do some kind of workaround that makes /\A/ not trigger this
optimisation, but basically split is broken by design for these kinds of
cases.

The naive obvious fix would be to document that split // operates with the
m flag set by default, which would explain the unusual behavior of split
/^/, but that would break other patterns.

For instance split /^x/ does not act as though there is an implicit /m flag
set.

perl -le'my $str="foo\nxbar\nxbaz\n"; print ">>$_<<" for split /^x/, $str'
>>foo
xbar
xbaz
<<

perl -le'my $str="foo\nxbar\nxbaz\n"; print ">>$_<<" for split /^x/m, $str'
>>foo
<<
>>bar
<<
>>baz
<<

Compare with just plain /^/:

perl -le'my $str="foo\nxbar\nxbaz\n"; print ">>$_<<" for split /^/m, $str'
>>foo
<<
>>xbar
<<
>>xbaz
<<

perl -le'my $str="foo\nxbar\nxbaz\n"; print ">>$_<<" for split /^/, $str'
>>foo
<<
>>xbar
<<
>>xbaz
<<

IOW, split /^/ and split /^/m do the same thing, which they definitely
shouldn't.
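
The four cases above can also be compared by field count in a single script. This is only a sketch; field_count is an illustrative helper, not something from the Perl core.

```perl
use strict;
use warnings;

my $str = "foo\nxbar\nxbaz\n";

# Print and return the number of fields a split produced.
sub field_count {
    my ($pattern_name, @fields) = @_;
    printf "%-6s => %d field(s)\n", $pattern_name, scalar @fields;
    return scalar @fields;
}

field_count('/^x/',  split(/^x/,  $str));   # anchored at string start only
field_count('/^x/m', split(/^x/m, $str));   # one match per line start
field_count('/^/',   split(/^/,   $str));   # the documented special case
field_count('/^/m',  split(/^/m,  $str));
```

Adding /m changes the count for /^x/ but not for a lone /^/, which is the inconsistency being described.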

Given all this I really can't decide what to do. I *could* change the code
so that SBOL is exempt from this optimization (or perhaps triggers a
different optimisation), but that would break split /^/. On one level I
wouldn't mind, as I could argue it is broken already, but in practice I
think this would break a lot of code, and at the very least we would
probably want a deprecation cycle. I *could* mess around and figure out a
way to distinguish /^/ from /\A/ even though the two should be identical,
or I could just say "yeah, split doesn't play by the rules, wont-fix".

A simple work around btw would be to write​: /\A|\A/. But that would suck.
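
A minimal sketch of that workaround, assuming (as described above) that the alternation keeps the pattern from compiling to a lone anchor and so avoids the split optimisation. The helper name split_at_string_start is made up for illustration.

```perl
use strict;
use warnings;

# The pattern is an alternation of two \A anchors rather than a bare \A,
# so it goes through the regex engine like any other pattern.
sub split_at_string_start {
    my ($string) = @_;
    return split /\A|\A/, $string;
}

my $input  = "foo\nbar\nbaz";
my @fields = split_at_string_start($input);

printf "%d field(s)\n", scalar @fields;
print "input returned unchanged\n" if @fields == 1 && $fields[0] eq $input;
```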

I really don't know what to do here. Basically, the root of this bug was
probably created in the very early history of Perl.

Another alternative would be to introduce a multiline version of \A, say \L
for this discussion, and then fix split /\A/ and split /\L/ to do the right
thing, and leave split /^/ broken (and document it is broken).

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 11, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT p5pRT commented Sep 11, 2014

From @Abigail

On Thu, Sep 11, 2014 at 04​:29​:02PM +0200, demerphq wrote​:

[SNIP]

So now we have a problem. There is LOADS of code out there that assumes that

split /^/, $string;

is the correct way to split a string into lines.

However it was only true because of the optimisation in split // did not
pay attention to the presence or absence of the /m flag.

I first thought having "split /^/" mean the same as "split /^/m" was
done intentionally, as it's documented by "perldoc -f split"​:

  A PATTERN of "/^/" is treated as if it were "/^/m", since it
  isn’t much use otherwise.

The third edition of "Programming Perl" documents this behaviour as well --
but the second edition does not.

But looking at some old commits, this may actually not be the case.

Commit 2cdd06f (Aug 4, 1999/Ilya Zakharevich) makes perl warn on "split /^/",
saying its usage to mean "split /^/m" is deprecated. Commit 46a8fef
(Aug 5, 1999/Paul Marquess) turns a "warn" into a "Perl_warner" (with the
same message). Then half an hour later, in commit 0e8f60d
(Aug 5, 1999/Jarkko Hietaniemi) the deprecation warning is no longer
on by default.

And then commit 1ec9456
(Jul 25, 2000/M. J. T. Guy) documents the current behaviour. The patch
says "with notes from tchrist and gbarr", and it was the summer of 2000
that people were working on the third edition of "Programming Perl".

I haven't checked the mail archives to see whether there was any discussion.

So we are now in a jam.

I can do some kind of workaround that makes /\A/ not trigger this
optimisation, but basically split is broken by design for these kind of
cases.

The naive obvious fix would be to document that split // operates with the
m flag set by default, which would explain the unusual behavior of split
/^/, but that would break other patterns.

As I said, /^/ implying a /m has already been documented for 14 years.

[SNIP]

Another alternative would be to introduce a multiline version of \A, say \L
for this discussion, and then fix split /\A/ and split /\L/ to do the right
thing, and leave split /^/ broken (and document it is broken).

My suggestion​: leave it as is, and document it. How useful is it to be
able to write​:

  split /\A/ => $foo;

when you could have written

  $foo;

instead?

Fixing it to do the "right thing" seems like a whole lot of work for little
benefit.

Abigail

@p5pRT p5pRT commented Sep 11, 2014

From @mauke

Am Do 11. Sep 2014, 07​:29​:35, demerphq schrieb​:

I really dont know what to do here. Basically the root of this bug was
created probably in the very early history of Perl.

My first idea would be to revert the opcode checking and go back to the pattern source; i.e. do the equivalent of $src eq "^". That would keep backwards compatibility with existing code and the letter of the documentation ("If PATTERN is /^/, ..."). It would also make \A "work" (i.e. do nothing) again.

Then I'd add a deprecation note to the documentation; something like​:

  If PATTERN is /^/, then it is treated as if it used the multiline
  modifier (/^/m). However, this special case is deprecated. Always
  use /^/m in new code.

That leaves the door open to actual deprecation warnings if we decide to remove this feature in a future release.
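
As a sketch of what "always use /^/m" looks like in practice, the helper below (split_lines is an illustrative name) spells out the modifier instead of relying on the special case:

```perl
use strict;
use warnings;

# Split text into lines, keeping each line's trailing newline, with the
# multiline modifier written explicitly.
sub split_lines {
    my ($text) = @_;
    return split /^/m, $text;
}

my $text  = "alpha\nbeta\ngamma\n";
my @lines = split_lines($text);

print scalar(@lines), " lines\n";
print "round trip ok\n" if join('', @lines) eq $text;
```

Because the pattern is zero-width, joining the fields reproduces the original text.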

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 18​:27, l.mai@​web.de via RT <perlbug-followup@​perl.org>
wrote​:

Am Do 11. Sep 2014, 07​:29​:35, demerphq schrieb​:

I really dont know what to do here. Basically the root of this bug was
created probably in the very early history of Perl.

My first idea would be to revert the opcode checking and go back to the
pattern source; i.e. do the equivalent of $src eq "^". That would keep
backwards compatibility with existing code and the letter of the
documentation ("If PATTERN is /^/, ..."). It would also make \A "work"
(i.e. do nothing) again.

FWIW, I am really against using the raw pattern.

For instance I expect:

split /(?:)^/

to be the same as

split /^(?:)/

to be the same as

split /^/

to be the same as

split /#this splits lines out without capturing the line break
^
#end of comment
/x

I fixed a bunch of issues like this when I redid this code. I am really
against changing back.

Yves

@p5pRT p5pRT commented Sep 11, 2014

From @ikegami

split already says:

If PATTERN is /^/ , then it is treated as if it used the multiline modifier
<http://perldoc.perl.org/perlreref.html#OPERATORS> (/^/m ), since it isn't
much use otherwise.

How about

If PATTERN is /^/ or /\A/, then it is treated as if it used the multiline
modifier <http://perldoc.perl.org/perlreref.html#OPERATORS> (/^/m ), since
it isn't much use otherwise. Prepending (?:) (e.g. /(?:)^/) sufficiently
alters the pattern to restore the normal regex behaviour.

@p5pRT p5pRT commented Sep 11, 2014

From @Abigail

On Thu, Sep 11, 2014 at 09​:27​:36AM -0700, l.mai@​web.de via RT wrote​:

Am Do 11. Sep 2014, 07​:29​:35, demerphq schrieb​:

I really dont know what to do here. Basically the root of this bug was
created probably in the very early history of Perl.

My first idea would be to revert the opcode checking and go back to the pattern source; i.e. do the equivalent of $src eq "^". That would keep backwards compatibility with existing code and the letter of the documentation ("If PATTERN is /^/, ..."). It would also make \A "work" (i.e. do nothing) again.

Then I'd add a deprecation note to the documentation; something like​:

  If PATTERN is /^/, then it is treated as if it used the multiline
  modifier (/^/m). However, this special case is deprecated. Always
  use /^/m in new code.

That leaves the door open to actual deprecation warnings if we decide to remove this feature in a future release.

As can be seen in my other post, we did this back in 1999. Then quickly
turned off the warning by default. And then a year later, just documented
the behaviour. It has been documented to work like this for 14 years now,
more than half the lifetime of Perl.

Considering the uselessness of splitting on just the beginning of the string
(which is effectively a noop), I do not think there's anything significant to
be gained by deprecating this.

Abigail

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 18​:59, Eric Brine <ikegami@​adaelis.com> wrote​:

split already says​:

If PATTERN is /^/ , then it is treated as if it used the multiline
modifier <http://perldoc.perl.org/perlreref.html#OPERATORS> (/^/m ),
since it isn't much use otherwise.

How about

If PATTERN is /^/ or /\A/, then it is treated as if it used the multiline
modifier <http://perldoc.perl.org/perlreref.html#OPERATORS> (/^/m ),
since it isn't much use otherwise. Prepending (?:) (e.g. /(?:)^/)
sufficiently alters the pattern to restore the normal regex behaviour.

I really really really hate the idea that prepending (?:) to the pattern
should change what it does. It is completely counterintuitive. Like

foo((),$thing);

being different from

foo($thing);

And like I said we had a bunch of bug reports along those lines.

The real issue here is that the /^/ implies /m in split thing was not
properly thought out and should never have been done. Why should
C<split /^/, $string> be different from C<split /^x/, $string>?

This is yet another example of how "ooh neat" features, especially in the
regex engine almost *always* cause trouble that is nearly impossible to
resolve after the fact.

Yves

@p5pRT p5pRT commented Sep 11, 2014

From @rjbs

* demerphq <demerphq@​gmail.com> [2014-09-11T10​:29​:02]

I *could* mess around and figure out a way to distinguish /^/ from /\A/ even
though the two should be identical, or I could just say "yeah, split doesnt
play by the rules, wont-fix".

First off​: thanks for this post, which was interesting and useful.

It seems to me that the above is a subset of the below​:

Another alternative would be to introduce a multiline version of \A, say \L
for this discussion, and then fix split /\A/ and split /\L/ to do the right
thing, and leave split /^/ broken (and document it is broken).

That is​: you need to distinguish ^ from \A, whether or not you add \L, for such
a fix. Before I get into anything else​: is that an accurate reading of the
situation?

--
rjbs

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 19​:54, Ricardo Signes <perl.p5p@​rjbs.manxome.org>
wrote​:

* demerphq <demerphq@​gmail.com> [2014-09-11T10​:29​:02]

I *could* mess around and figure out a way to distinguish /^/ from /\A/
even though the two should be identical, or I could just say "yeah, split
doesnt play by the rules, wont-fix".

First off​: thanks for this post, which was interesting and useful.

No problem. Especially as I was indirectly responsible for part of the mess.

It seems to me that the above is a subset of the below​:

Another alternative would be to introduce a multiline version of \A, say
\L for this discussion, and then fix split /\A/ and split /\L/ to do the
right thing, and leave split /^/ broken (and document it is broken).

That is: you need to distinguish ^ from \A, whether or not you add \L, for
such a fix. Before I get into anything else: is that an accurate reading of
the situation?

Er, sort of. What you describe is option 2 below.

Thinking about this more I think there are two reasonable options​:

1. document that all patterns to split are compiled under /m by default. At
the same time we would change the optimisation for /^/ to detect MBOL,
which is what is produced by split /^/m, $string. This would then mean that
only /^/ would trigger the optimisation as it would produce a MBOL, and \A
would be fine because it produces an SBOL.

2. use the flag field of the regop to store whether the SBOL comes from \A
or ^, and then only enable the /^/ optimisation when it was /^/.

Personally, the more I think about this, the more I think that option 1 is
better, even though it is probably the riskier of the two. Having said that,
I am speaking hypothetically: option 1 *might* break something, but I
struggle to think what, whereas option 2 would leave lots of things "broken"
that are already "broken" and would fix this single case only, without
breaking anything else.

Consider what option 1 would result in:

  split /^/, $string
  split /^x/, $string

would behave the same as far as the ^ operator goes. And it would mean that

  split /^/, $string
  split /$/, $string
  split /\Z/, $string

would behave similarly (that is, match at the beginning or end of every line
in the string). That they don't is IMO pretty wrong. The justification applied
to make split /^/ work IMO applies just as much to split /$/ or split /\z/.

And it would fix the problem with /\A/ behaving like /^/m (which is
uncontroversially wrong).

And when I think about what it would break I struggle to think of
something. Does anything come to mind to anyone else?

Also the other nice thing about option one is that it doesn't need an \L
metacharacter, which I proposed only because we would have no way to say "I
really want to split on the beginning of the string". Whereas if we
defaulted split compilation to enable /m then we could turn it off easily:

split /(?-m:^)/, $string

would disable the default /m flag. The reason I proposed the \L
metacharacter is there is no way to turn off a flag from the outside of the
pattern.

In fact, in the process of writing this email I have become sufficiently
convinced that option /m is the right thing to do and that I will start
writing the patch now so we can find out if it breaks anything.
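
The inline (?-m:...) construct mentioned above already works in ordinary matches, which gives a feel for how it would switch off a default /m. The sketch below shows only that existing behaviour, not the proposed change to split; matches_text is an illustrative helper name.

```perl
use strict;
use warnings;

# Return 1 if $text matches the compiled $regex, else 0.
sub matches_text {
    my ($regex, $text) = @_;
    return $text =~ $regex ? 1 : 0;
}

my $text = "foo\nbar";

# Under /m, a plain ^ matches at the start of the second line ...
print "/^bar/m       => ", matches_text(qr/^bar/m, $text), "\n";

# ... but (?-m:^) switches multiline off for just that anchor.
print "/(?-m:^)bar/m => ", matches_text(qr/(?-m:^)bar/m, $text), "\n";
```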

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 11, 2014

From @ap

* demerphq <demerphq@​gmail.com> [2014-09-11 20​:25]​:

Whereas if we defaulted split compilation to enable /m then we could
turn it off easily:

split /(?-m:^)/, $string

would disable the default /m flag.

Would writing it `split qr/^/, $string` also work? (I would hope yes.)

Regards,
--
Aristotle Pagaltzis // <http​://plasmasturm.org/>

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 22​:45, Aristotle Pagaltzis <pagaltzis@​gmx.de> wrote​:

* demerphq <demerphq@​gmail.com> [2014-09-11 20​:25]​:

Whereas if we defaulted split compilation to enable /m then we could
turn it off easily:

split /(?-m:^)/, $string

would disable the default /m flag.

Would writing it `split qr/^/, $string` also work? (I would hope yes.)

No, when Karl changed qr/^/ to reduce down to (?^:^) he changed the
semantics of such a case so we would not disable the /m flag, and this
would behave just the same as split /^/, $_. I think anyway.

IOW, (?^:^) means "match /^/ under whatever rules the pattern is compiled
in".

In the older perls it would turn into (?-msix:^) and then yes I think it
would have behaved as you expect.

Win-some, lose-some.
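
The stringified form of a qr// object, and how it behaves when interpolated into a larger pattern, can be checked directly rather than from memory. This is a sketch of ordinary matching only, not of split internals. The helper matches_second_line is hypothetical, and the printed stringification depends on the perl version. As far as perlre documents it, the caret in (?^:...) resets the flags to their defaults, so an outer /m is not expected to reach the interpolated anchor.

```perl
use strict;
use warnings;

my $bol = qr/^/;

# Stringification shows the flags the qr// object carries; the exact text
# depends on the perl version, e.g. (?^:^) on newer perls.
print "qr/^/ stringifies as: $bol\n";

# Hypothetical helper: interpolate the object into a larger /m pattern and
# report whether that pattern matches at the start of the second line.
sub matches_second_line {
    my ($anchor, $text) = @_;
    return $text =~ /${anchor}bar/m ? 1 : 0;
}

print "matches second line: ", matches_second_line($bol, "foo\nbar"), "\n";
```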

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 11, 2014

From @ap

* demerphq <demerphq@​gmail.com> [2014-09-11 22​:55]​:

No, when Karl changed qr/^/ to reduce down to (?^:^) he changed the
semantics of such a case so we would not disable the /m flag, and this
would behave just the same as split /^/, $_.

It does.

I am assuming that this special compilation context applies at the time
that split compiles the pattern, which implies that if split were made
to not stringify qr objects, then it would not apply to them. Correct?

If so – is that doable with reasonable effort?

It would go some ways toward regularising split’s behaviour further.

Regards,
--
Aristotle Pagaltzis // <http​://plasmasturm.org/>

@p5pRT p5pRT commented Sep 11, 2014

From @Abigail

On Thu, Sep 11, 2014 at 08​:20​:28PM +0200, demerphq wrote​:

Thinking about this more I think there are two reasonable options​:

1. document that all patterns to split are compiled under /m by default.

To do that, you would first have to change the behavior of split, as
it currently does *NOT* do this. Only for /^/. Witness​:

  $ perl -E 'say "[$_]" for split /^a/m => "foo\nabar\nabaz"'
  [foo
  ]
  [bar
  ]
  [baz]

  $ perl -E 'say "[$_]" for split /^a/ => "foo\nabar\nabaz"'
  [foo
  abar
  abaz]

Abigail

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

Yes, I have said exactly the same thing multiple times in this thread.

And to me it's actually exactly the reason we *should* do this. I consider
the inconsistency here to be *most* undesirable.

As I said elsewhere in this thread, why should split /$/ not have the same
"special" rule applied? I find the extreme differences in the following to
be *most* surprising.

$ perl -le'my $str="foo\nxbar\nxbaz\n"; print ">>$_<<" for split /^/, $str'
>>foo
<<
>>xbar
<<
>>xbaz
<<

$ perl -le'my $str="foo\nxbar\nxbaz\n"; print ">>$_<<" for split /$/, $str'
>>foo
xbar
xbaz<<
>>
<<

@p5pRT p5pRT commented Sep 11, 2014

From @Abigail

Because no one uses /$/m to split a multiline string into individual lines,
as that leaves you with strings starting with a newline. Giving /$/
a special rule just means an extra testcase, and another thing to use
in obfuscated code, but it won't be useful for most people (it will also
be harmless).
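
To illustrate the point about leading newlines, here is a small sketch with an explicit /m. The helper name split_at_line_ends is made up, and newlines are escaped on output so they are visible.

```perl
use strict;
use warnings;

# Split at end-of-line positions; every field after the first then starts
# with the newline that preceded it.
sub split_at_line_ends {
    my ($text) = @_;
    return split /$/m, $text;
}

for my $field (split_at_line_ends("foo\nbar\nbaz")) {
    (my $shown = $field) =~ s/\n/\\n/g;   # make newlines visible
    print "[$shown]\n";
}
```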

I'm still figuring out what problem needs solving. Is it really a problem
that

  split /\A/, "multiline string";

splits as /^/m? Is splitting on the beginning of the string, resulting in a
one-element list consisting of the string itself so useful we want to
overhaul how split is working?

Can't we just document this exception?

Abigail

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

I don't like the exceptions here, and I find the inconsistency to be very
confusing. Regexes are hard enough without them having weird
inconsistencies like the ones we have.

My intent is to make split default to /m enabled which I believe is the
right and complete way to have done this (mis)feature in the first place.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 12, 2014

From @Abigail

On Fri, Sep 12, 2014 at 05​:19​:49AM +0200, demerphq wrote​:

On 12 September 2014 01​:47, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 01​:09​:27AM +0200, demerphq wrote​:

On 12 September 2014 00​:58, Abigail <abigail@​abigail.be> wrote​:

On Thu, Sep 11, 2014 at 08​:20​:28PM +0200, demerphq wrote​:

Thinking about this more I think there are two reasonable options​:

1. document that all patterns to split are compiled under /m by
default.

To do that, you would first have to change the behavior of split, as
it currently does *NOT* do this. Only for /^/. Witness​:

$ perl -E 'say "[$_]" for split /^a/m => "foo\nabar\nabaz"'
[foo
]
[bar
]
[baz]

$ perl -E 'say "[$_]" for split /^a/ => "foo\nabar\nabaz"'
[foo
abar
abaz]

Yes, I have said exactly the same thing multiple times in this thread.

And to me it's actually exactly the reason we *should* do this. I consider
the inconsistency here to be *most* undesirable.

As I said elsewhere in this thread, why should split /$/ not have the
same
"special" rule applied? I find the extreme differences in the following
to
be *most* surprising.

Because no one uses /$/m to split a multiline string into individual lines,
as that leaves you with strings starting with a newline. Giving /$/
a special rule just means an extra testcase, and another thing to use
in obfuscated code, but it won't be useful for most people (it will also
be harmless).

I'm still figuring out what problem needs solving. Is it really a problem
that

  split /\A/, "multiline string";

splits as /^/m? Is splitting on the beginning of the string, resulting in a
one-element list consisting of the string itself so useful we want to
overhaul how split is working?

Can't we just document this exception?

I don't like the exceptions here, and I find the inconsistency to be very
confusing. Regexes are hard enough without them having weird
inconsistencies like the ones we have.

But split is already full of exceptions​:

  * Any pattern matching the empty string is special cased.
  * // is even doubly special cased.
  * " " is special cased (but / / isn't)

My intent is to make split default to /m enabled which I believe is the
right and complete way to have done this (mis)feature in the first place.

Really? For what purpose? You'd potentially break code, and it won't
fix the issue of /\A/ acting like /^/m, because even with /m, /\A/
isn't supposed to match any internal newlines anyway. That's the entire
point of /\A/.

Abigail

@p5pRT p5pRT commented Sep 12, 2014

From @Abigail

On Fri, Sep 12, 2014 at 10​:17​:03AM +0200, Abigail wrote​:

Really? For what purpose? You'd potentially break code, and it won't
fix the issue of /\A/ acting like /^/m, because even with /m, /\A/
isn't supposed to match any internal newlines anyway. That's the entire
point of /\A/.

Having said that, the only code affected by such a change is a split
on a multiline string, with a pattern which includes either /^/ or /$/,
other than a lone /^/. And since /^PAT/ is pretty useless without a /m,
I doubt there's a lot of code that's affected by such a change.

Abigail

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 13​:07, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 10​:17​:03AM +0200, Abigail wrote​:

Really? For what purpose? You'd potentially break code, and it won't
fix the issue of /\A/ acting like /^/m, because even with /m, /\A/
isn't supposed to match any internal newlines anyway. That's the entire
point of /\A/.

Having said that, the only code affected by such a change is a split
on a multiline string, with a pattern which includes either /^/ or /$/,
other than a lone /^/. And since /^PAT/ is pretty useless without a /m,
I doubt there's a lot of code that's affected by such a change.

Indeed. Exactly the same conclusion I came to as well.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 10​:17, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 05​:19​:49AM +0200, demerphq wrote​:

[snip]

I don't like the exceptions here, and I find the inconsistency to be very
confusing. Regexes are hard enough without them having weird
inconsistencies like the ones we have.

But split is already full of exceptions​:

* Any pattern matching the empty string is special cased.

I don't know if I agree here. Part of this behavior is the default
behaviour for how an empty pattern matches

perl -le'my $str="abcdef"; while($str=~//g) { print substr($str,$-[0],1) }'
a
b
c
d
e
f

* // is even doubly special cased.

Perhaps ENOTENOUGHCOFFEE, but can you expand on that, I don't recall what
you mean. (Which in itself is a good reason to eliminate as many of the
inconsistencies as possible.)

* " " is special cased (but / / isn't)

I consider the inability to simulate " " using a qr// or // a bug, and it
is on my todo list to fix. I don't consider one bug an excuse not to fix
other bugs.

My intent is to make split default to /m enabled which I believe is the
right and complete way to have done this (mis)feature in the first place.

Really? For what purpose?

Consistency in behaviour of things like split /^/ and split /^x/ at the
very least.

You'd potentially break code,

Let's find out if that is FUD or Fact. As far as I can tell the only code
that might be affected would be something like split /$/ which you are
already on record of saying nobody uses.

and it won't fix the issue of /\A/ acting like /^/m, because even with /m,
/\A/
isn't supposed to match any internal newlines anyway. That's the entire
point of /\A/.

Yes it does.

/^/ => SBOL
/^/m => MBOL
/\A/ => SBOL
/\A/m => SBOL

The equivalence of /^/ and /^/m is afforded by the following code​:

  else if (PL_regkind[fop] == BOL && nop == END)

if we change the default of split to /m and that code is changed to​:

  else if (fop == MBOL && nop == END)

then

split /^/, => MBOL
split /\A/, => SBOL

which fixes the bug in this thread, and makes split's behaviour consistent
with regular patterns.
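
For reference, the difference between the anchors can be observed from plain Perl with ordinary matches, independent of split. This is only a sketch: `count_matches` is a helper name invented here for illustration, not anything in perl itself.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often a pattern matches under /g.
sub count_matches {
    my ($re, $str) = @_;
    my $n = () = $str =~ /$re/g;    # list-context match, then count
    return $n;
}

my $text = "foo\nbar\nbaz";

print count_matches(qr/^/,   $text), "\n";  # ^ without /m: start of string only
print count_matches(qr/^/m,  $text), "\n";  # ^ with /m: start of every line
print count_matches(qr/\A/,  $text), "\n";  # \A: start of string only
print count_matches(qr/\A/m, $text), "\n";  # \A ignores /m
```

In ordinary matching only /^/m finds the embedded line starts; \A stays anchored to offset 0 whether or not /m is present.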

Yves
ps​: Abigail sorry for the dupe, I accidentally replied to you direct
instead of "to-all".

@p5pRT p5pRT commented Sep 12, 2014

From @Abigail

On Fri, Sep 12, 2014 at 02​:21​:56PM +0200, demerphq wrote​:

On 12 September 2014 10​:17, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 05​:19​:49AM +0200, demerphq wrote​:

[snip]

I don't like the exceptions here, and I find the inconsistency to be very
confusing. Regexes are hard enough without them having weird
inconsistencies like the ones we have.

But split is already full of exceptions​:

* Any pattern matching the empty string is special cased.

I don't know if I agree here. Part of this behavior is the default
behaviour for how an empty pattern matches

perl -le'my $str="abcdef"; while($str=~//g) { print substr($str,$-[0],1) }'
a
b
c
d
e
f

* // is even doubly special cased.

Perhaps ENOTENOUGHCOFFEE, but can you expand on that, I don't recall what
you mean. (Which in itself is a good reason to eliminate as many of the
inconsistencies as possible.)

From the split doc entry​:

  As a special case for "split", the empty pattern given in match
  operator syntax ("//") specifically matches the empty string,
  which is contrary to its usual interpretation as the last
  successful match.

So it's special cased to not mean the last successful match,
but to be the empty string.
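
A minimal sketch of the documented difference, assuming nothing beyond core Perl (the sub names are made up for this illustration): after a successful match, an empty m// reuses that pattern, while split // still splits between characters.

```perl
use strict;
use warnings;

# Hypothetical helper: an empty m// after a successful /o/ match
# behaves like /o/, not like "match the empty string".
sub empty_match_reuses_last {
    my ($str) = @_;
    "foo" =~ /o/ or die "setup match failed";
    return $str =~ m// ? 1 : 0;
}

# Hypothetical helper: split // is unaffected by the earlier match.
sub split_chars {
    my ($str) = @_;
    "foo" =~ /o/ or die "setup match failed";
    return split //, $str;
}

print empty_match_reuses_last("bar"), "\n";   # "bar" contains no "o"
print join(",", split_chars("bar")), "\n";
```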

* " " is special cased (but / / isn't)

I consider the inability to simulate " " using a qr// or // a bug, and it
is on my todo list to fix. I don't consider one bug an excuse not to fix
other bugs.

How do you propose to "fix" that? Both C<< split " " >> and C<< split / / >>
are quite frequent, and useful.
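
To make the two behaviours concrete, here is a small sketch using only core split; the sample string is arbitrary.

```perl
use strict;
use warnings;

my $line = "  foo  bar ";

# split " ": leading whitespace is skipped and runs of whitespace
# count as one separator.
my @awk = split " ", $line;

# split / /: every single space is a separator, so leading and
# interior empty fields are kept (trailing ones are dropped as usual).
my @space = split / /, $line;

print scalar(@awk),   "\n";
print scalar(@space), "\n";
```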

Abigail

@p5pRT p5pRT commented Sep 12, 2014

From @ap

* Abigail <abigail@​abigail.be> [2014-09-11 18​:15]​:

How useful is it to be able to write​:

  split /\A/ => $foo;

when you could have written

$foo;

instead?

There are quite a few APIs that use regexps as a sort of DSL. It’s not
hard to imagine a data munging function that does a split internally but
expects/allows you to specify the delimiter to split on. And on occasion
you may then have need to make the function not split the string at all,
in which case you require some kind of pattern that can turn split into
an identity function.

* Abigail <abigail@​abigail.be> [2014-09-12 10​:20]​:

On Fri, Sep 12, 2014 at 05​:19​:49AM +0200, demerphq wrote​:

My intent is to make split default to /m enabled which I believe is
the right and complete way to have done this (mis)feature in the
first place.

Really? For what purpose? You'd potentially break code, and it won't
fix the issue of /\A/ acting like /^/m, because even with /m, /\A/
isn't supposed to match any internal newlines anyway. That's the
entire point of /\A/.

Well yes, the entire point of this thread is the idea that split /\A/
should not behave like split /^/.

* demerphq <demerphq@​gmail.com> [2014-09-12 14​:25]​:

On 12 September 2014 10​:17, Abigail <abigail@​abigail.be> wrote​:

* // is even doubly special cased.

Perhaps ENOTENOUGHCOFFEE, but can you expand on that, I don't recall
what you mean. (Which in itself is a good reason to eliminate as many
of the inconsistencies as possible.)

  split //, "foobar" # yields qw( f o o b a r )

Normally an empty match reuses the last pattern but here it really means
an empty match. Maybe the sense in which Abigail is calling it doubly
special-cased is that it is normally special-cased in the RE engine, but
split, as a special-case in turn, removes that special-case treatment?

Regards,
--
Aristotle Pagaltzis // <http​://plasmasturm.org/>

@p5pRT p5pRT commented Sep 12, 2014

From @Abigail

On Fri, Sep 12, 2014 at 02​:57​:01PM +0200, Aristotle Pagaltzis wrote​:

* Abigail <abigail@​abigail.be> [2014-09-11 18​:15]​:

How useful is it to be able to write​:

  split /\A/ => $foo;

when you could have written

$foo;

instead?

There are quite a few APIs that use regexps as a sort of DSL. It’s not
hard to imagine a data munging function that does a split internally but
expects/allows you to specify the delimiter to split on. And on occasion
you may then have need to make the function not split the string at all,
in which case you require some kind of pattern that can turn split into
an identity function.

Sure, a niche case, and one for which /\A/ isn't the only option.
(I would use /(*FAIL)/ or /(?!)/, as that makes the intent more clear).
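
As a rough sketch of that niche case: `fields` below is a hypothetical delimiter-taking helper of the kind described above, and a never-matching pattern is passed in to switch the splitting off.

```perl
use strict;
use warnings;

# Hypothetical data-munging helper: the caller chooses the delimiter.
sub fields {
    my ($delim, $str) = @_;
    return split $delim, $str;
}

my $never = qr/(?!)/;    # an empty negative lookahead can never match

my @parts = fields(qr/,/,  "a,b\nc,d");   # split on commas
my @whole = fields($never, "a,b\nc,d");   # nothing matches: input comes back whole

print scalar(@parts), "\n";
print scalar(@whole), "\n";
```

One caveat with any such pattern: split returns an empty list for an empty input string, so the identity only holds for non-empty input.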

* Abigail <abigail@​abigail.be> [2014-09-12 10​:20]​:

On Fri, Sep 12, 2014 at 05​:19​:49AM +0200, demerphq wrote​:

My intent is to make split default to /m enabled which I believe is
the right and complete way to have done this (mis)feature in the
first place.

Really? For what purpose? You'd potentially break code, and it won't
fix the issue of /\A/ acting like /^/m, because even with /m, /\A/
isn't supposed to match any internal newlines anyway. That's the
entire point of /\A/.

Well yes, the entire point of this thread is the idea that split /\A/
should not behave like split /^/.

* demerphq <demerphq@​gmail.com> [2014-09-12 14​:25]​:

On 12 September 2014 10​:17, Abigail <abigail@​abigail.be> wrote​:

* // is even doubly special cased.

Perhaps ENOTENOUGHCOFFEE, but can you expand on that, I don't recall
what you mean. (Which in itself is a good reason to eliminate as many
of the inconsistencies as possible.)

  split //, "foobar" # yields qw( f o o b a r )

Normally an empty match reuses the last pattern but here it really means
an empty match. Maybe the sense in which Abigail is calling it doubly
special-cased is that it is normally special-cased in the RE engine, but
split, as a special-case in turn, removes that special-case treatment?

Double special cased as in "not acting like the normal //, but acting as
an empty string", and "patterns matching the empty string are special cased",
although Yves gives convincing evidence the latter isn't all that special
cased.

Abigail

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 14​:50, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 02​:21​:56PM +0200, demerphq wrote​:

On 12 September 2014 10​:17, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 05​:19​:49AM +0200, demerphq wrote​:

[snip]

I don't like the exceptions here, and I find the inconsistency to be
very
confusing. Regexes are hard enough without them having weird
inconsistencies like the ones we have.

But split is already full of exceptions​:

* Any pattern matching the empty string is special cased.

I don't know if I agree here. Part of this behavior is the default
behaviour for how an empty pattern matches

perl -le'my $str="abcdef"; while($str=~//g) { print substr($str,$-[0],1) }'
a
b
c
d
e
f

* // is even doubly special cased.

Perhaps ENOTENOUGHCOFFEE, but can you expand on that, I don't recall what
you mean. (Which in itself is a good reason to eliminate as many of the
inconsistencies as possible.)

From the split doc entry​:

  As a special case for "split", the empty pattern given in match
  operator syntax ("//") specifically matches the empty string,
  which is contrary to its usual interpretation as the last
  successful match.

So it's special cased to not mean the last successful match,
but to be the empty string.

Oh that. Right. That isn't a special case in split, it's a special case in
the m// and s/// operator that isn't present in split nor is it in qr//.

$ perl -le'"foo"=~/(.*)/ and print $1; print qr//'
foo
(?^​:)

Also I thought we decided that that feature wasn't very useful and were
going to deprecate it. :-)

* " " is special cased (but / / isn't)

I consider the inability to simulate " " using a qr// or // a bug, and
it
is on my todo list to fix. I don't consider one bug an excuse not to fix
other bugs.

How do you propose to "fix" that? Both C<< split " " >> and C<< split / / >>
are quite frequent, and useful.

split qr/(*SPLIT_WHITE)/, $string

is my working plan. I fixed one issue related to this, I think in 5.18,
where there was no way at all to parametrically get the split white
behavior. Now you can do this​:

./perl -Ilib -le'my $str=" "; my $foo="foo\n\n\nbar\n\n"; print ">$_<" for split $str, $foo'
>foo<
>bar<

But I think one should be able to do this with a qr// object as well.

Basically (*SPLIT_WHITE) would be semantically equivalent to \s+ in a
normal regex, and produce the split white special case behavior in a split
pattern.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 15​:52, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 02​:57​:01PM +0200, Aristotle Pagaltzis wrote​:

* Abigail <abigail@​abigail.be> [2014-09-11 18​:15]​:

How useful is it to be able to write​:

  split /\A/ => $foo;

when you could have written

$foo;

instead?

There are quite a few APIs that use regexps as a sort of DSL. It’s not
hard to imagine a data munging function that does a split internally but
expects/allows you to specify the delimiter to split on. And on occasion
you may then have need to make the function not split the string at all,
in which case you require some kind of pattern that can turn split into
an identity function.

Sure, a niche case, and one for which /\A/ isn't the only option.
(I would use /(*FAIL)/ or /(?!)/, as that makes the intent more clear).

FWIW, I am only moderately interested in fixing the split /\A/ behaviour in
and of itself. IOW, if it turns out the /m default "just won't work", then I
would be fine with saying "won't-fix". However since I believe the /m
default resolves a bunch of inconsistencies AND fixes the split /\A/ case I
am quite interested in getting that done.

* Abigail <abigail@​abigail.be> [2014-09-12 10​:20]​:

On Fri, Sep 12, 2014 at 05​:19​:49AM +0200, demerphq wrote​:

My intent is to make split default to /m enabled which I believe is
the right and complete way to have done this (mis)feature in the
first place.

Really? For what purpose? You'd potentially break code, and it won't
fix the issue of /\A/ acting like /^/m, because even with /m, /\A/
isn't supposed to match any internal newlines anyway. That's the
entire point of /\A/.

Well yes, the entire point of this thread is the idea that split /\A/
should not behave like split /^/.

* demerphq <demerphq@​gmail.com> [2014-09-12 14​:25]​:

On 12 September 2014 10​:17, Abigail <abigail@​abigail.be> wrote​:

* // is even doubly special cased.

Perhaps ENOTENOUGHCOFFEE, but can you expand on that, I don't recall
what you mean. (Which in itself is a good reason to eliminate as many
of the inconsistencies as possible.)

  split //, "foobar" # yields qw( f o o b a r )

Normally an empty match reuses the last pattern but here it really means
an empty match. Maybe the sense in which Abigail is calling it doubly
special-cased is that it is normally special-cased in the RE engine, but
split, as a special-case in turn, removes that special-case treatment?

Double special cased as in "not acting like the normal //, but acting as
an empty string", and "patterns matching the empty string are special
cased",
although Yves gives convincing evidence the latter isn't all that special
cased.

Sorry to repeat a previous mail, but IMO it is not that split // is special
cased, but rather that m// and s//.../ are special cased. That special case
also does not apply to qr//. Although similar to the idea of (*SPLIT_WHITE)
I have also contemplated a meta pattern (*LAST_SUCCESSFUL_MATCH_PATTERN),
which would embed the last successful match pattern in another pattern.
This would mean we could get rid of the special case of the empty pattern,
which I consider dangerous, and we would actually have a more useful
construct, imagine something like this​:

if ($str=~/$pat1/ or $str=~/$pat2/ or $str=~/$pat3/) {
  $str=~m/\((*LAST_SUCCESSFUL_MATCH_PATTERN)\)/;
}

which I admit I am not sure how it would be used, but I am pretty sure
someone, (Damian?) would use it. :-)

Yves

@p5pRT p5pRT commented Sep 12, 2014

From @Abigail

On Fri, Sep 12, 2014 at 04​:16​:13PM +0200, demerphq wrote​:

On 12 September 2014 14​:50, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 02​:21​:56PM +0200, demerphq wrote​:

On 12 September 2014 10​:17, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 05​:19​:49AM +0200, demerphq wrote​:

[snip]

I don't like the exceptions here, and I find the inconsistency to be
very
confusing. Regexes are hard enough without them having weird
inconsistencies like the ones we have.

But split is already full of exceptions​:

* Any pattern matching the empty string is special cased.

I don't know if I agree here. Part of this behavior is the default
behaviour for how an empty pattern matches

perl -le'my $str="abcdef"; while($str=~//g) { print substr($str,$-[0],1) }'
a
b
c
d
e
f

* // is even doubly special cased.

Perhaps ENOTENOUGHCOFFEE, but can you expand on that, I don't recall what
you mean. (Which in itself is a good reason to eliminate as many of the
inconsistencies as possible.)

From the split doc entry​:

  As a special case for "split", the empty pattern given in match
  operator syntax ("//") specifically matches the empty string,
  which is contrary to its usual interpretation as the last
  successful match.

So it's special cased to not mean the last successful match,
but to be the empty string.

Oh that. Right. That isn't a special case in split, it's a special case in
the m// and s/// operator that isn't present in split nor is it in qr//.

$ perl -le'"foo"=~/(.*)/ and print $1; print qr//'
foo
(?^​:)

Also I thought we decided that that feature wasn't very useful and were
going to deprecate it. :-)

* " " is special cased (but / / isn't)

I consider the inability to simulate " " using a qr// or // a bug, and
it
is on my todo list to fix. I don't consider one bug an excuse not to fix
other bugs.

How do you propose to "fix" that? Both C<< split " " >> and C<< split / / >>
are quite frequent, and useful.

split qr/(*SPLIT_WHITE)/, $string

is my working plan. I fixed one issue related to this, I think in 5.18,
where there was no way at all to parametrically get the split white
behavior. Now you can do this​:

./perl -Ilib -le'my $str=" "; my $foo="foo\n\n\nbar\n\n"; print ">$_<" for split $str, $foo'
>foo<
>bar<

But I think one should be able to do this with a qr// object as well.

Basically (*SPLIT_WHITE) would be semantically equivalent to \s+ in a
normal regex, and produce the split white special case behavior in a split
pattern.

Are you saying you want to change the meaning of

  split " ", "string";

and people should write

  split qr /(*SPLIT_WHITE)/, "string";

instead?

That would not be very programmer friendly.

Abigail

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 16​:27, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 04​:16​:13PM +0200, demerphq wrote​:

On 12 September 2014 14​:50, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 02​:21​:56PM +0200, demerphq wrote​:

On 12 September 2014 10​:17, Abigail <abigail@​abigail.be> wrote​:

[snip]

* " " is special cased (but / / isn't)

I consider the inability to simulate " " using a qr// or // a bug,
and
it
is on my todo list to fix. I don't consider one bug an excuse not to
fix
other bugs.

How do you propose to "fix" that? Both C<< split " " >> and C<< split / / >>
are quite frequent, and useful.

split qr/(*SPLIT_WHITE)/, $string

is my working plan. I fixed one issue related to this, I think in 5.18,
where there was no way at all to parametrically get the split white
behavior. Now you can do this​:

./perl -Ilib -le'my $str=" "; my $foo="foo\n\n\nbar\n\n"; print ">$_<" for split $str, $foo'
>foo<
>bar<

But I think one should be able to do this with a qr// object as well.

Basically (*SPLIT_WHITE) would be semantically equivalent to \s+ in a
normal regex, and produce the split white special case behavior in a
split
pattern.

Are you saying you want to change the meaning of

  split " ", "string";

and people should write

  split qr /(*SPLIT_WHITE)/, "string";

instead?

That would not be very programmer friendly.

No no. I mean that I think code like this​:

my $pat= qr/$user_pat/;

my @​things= split /$pat/, $input;

should be capable of producing split white semantics.

I have no intention of removing the split " ", $string semantics, and on
the contrary, the patch I mentioned for 5.18 means that I have made it even
easier to do this kind of thing. Eg​:

$ ./perl -Ilib -le'print $]; my $pat=" "; my $foo="foo\n\n\nbar\n\n"; print ">$_<" for split $pat, $foo'
5.021004
>foo<
>bar<

$ perl -le'print $]; my $pat=" "; my $foo="foo\n\n\nbar\n\n"; print ">$_<" for split $pat, $foo'
5.014002
>foo


bar

<

Previously the *only* way to get split white behavior was to write
*exactly* C<split " ", $string>, there was no other way to do it.

So before that patch if you wanted to parametrically control the split, you
would need something like​:

my @​things= $pat eq " " ? split " ", $input : split $pat, $input;

you couldn't even write this:

my @​things= split $pat eq " " ? " " : $pat, $input

To summarize, I would like to make it so you can use a qr// object to
obtain *every* special case behaviour of split. I have no intention of
changing how split behaves when its argument is not a qr// object.
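
The parametric case described above can be sketched as follows. The wrapper name `split_by` is an assumption made for this example, and the whitespace behaviour for a runtime " " relies on Perl 5.18 or later, as stated in the message.

```perl
use strict;
use warnings;

# Hypothetical wrapper: the delimiter arrives as data, not as a literal.
sub split_by {
    my ($pat, $input) = @_;
    return split $pat, $input;
}

my $input = "  foo\n\n\nbar\n\n";

# On 5.18+, a runtime value of " " selects the split-white special case,
# which also skips the leading whitespace.
my @white = split_by(" ", $input);

# A qr// object cannot select that special case: /\s+/ matches the
# leading whitespace and so produces an empty first field.
my @regex = split_by(qr/\s+/, $input);

print scalar(@white), "\n";
print scalar(@regex), "\n";
```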

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 12, 2014

From @Abigail

On Fri, Sep 12, 2014 at 04​:37​:21PM +0200, demerphq wrote​:

On 12 September 2014 16​:27, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 04​:16​:13PM +0200, demerphq wrote​:

On 12 September 2014 14​:50, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 02​:21​:56PM +0200, demerphq wrote​:

On 12 September 2014 10​:17, Abigail <abigail@​abigail.be> wrote​:

[snip]

* " " is special cased (but / / isn't)

I consider the inability to simulate " " using a qr// or // a bug,
and
it
is on my todo list to fix. I don't consider one bug an excuse not to
fix
other bugs.

How do you propose to "fix" that? Both C<< split " " >> and C<< split / / >>
are quite frequent, and useful.

split qr/(*SPLIT_WHITE)/, $string

is my working plan. I fixed one issue related to this, I think in 5.18,
where there was no way at all to parametrically get the split white
behavior. Now you can do this​:

./perl -Ilib -le'my $str=" "; my $foo="foo\n\n\nbar\n\n"; print ">$_<" for split $str, $foo'
>foo<
>bar<

But I think one should be able to do this with a qr// object as well.

Basically (*SPLIT_WHITE) would be semantically equivalent to \s+ in a
normal regex, and produce the split white special case behavior in a
split
pattern.

Are you saying you want to change the meaning of

  split " ", "string";

and people should write

  split qr /(*SPLIT_WHITE)/, "string";

instead?

That would not be very programmer friendly.

No no. I mean that I think code like this​:

my $pat= qr/$user_pat/;

my @​things= split /$pat/, $input;

should be capable of producing split white semantics.

I have no intention of removing the split " ", $string semantics, and on
the contrary, the patch I mentioned for 5.18 means that I have made it even
easier to do this kind of thing. Eg​:

$ ./perl -Ilib -le'print $]; my $pat=" "; my $foo="foo\n\n\nbar\n\n"; print ">$_<" for split $pat, $foo'
5.021004
>foo<
>bar<

$ perl -le'print $]; my $pat=" "; my $foo="foo\n\n\nbar\n\n"; print ">$_<" for split $pat, $foo'
5.014002
>foo


bar

<

Previously the *only* way to get split white behavior was to write
*exactly* C<split " ", $string>, there was no other way to do it.

So before that patch if you wanted to parametrically control the split, you
would need something like​:

my @​things= $pat eq " " ? split " ", $input : split $pat, $input;

you couldn't even write this:

my @​things= split $pat eq " " ? " " : $pat, $input

To summarize, I would like to make it so you can use a qr// object to
obtain *every* special case behaviour of split. I have no intention of
changing how split behaves when its argument is not a qr// object.

Excellent.

I think adding a qr /(*SPLIT_WHITE)/, while keeping the existing behaviour
of split, is a useful addition to the language.

I presume

  $str =~ s/(*SPLIT_WHITE)/.../;

and

  $str =~ /(*SPLIT_WHITE)/;

will be meaningless, just as

  $pat = qr /(*SPLIT_WHITE)/;
  $str =~ /$pat/;

Abigail

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 16​:45, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 04​:37​:21PM +0200, demerphq wrote​:

To summarize, I would like to make it so you can use a qr// object to
obtain *every* special case behaviour of split. I have no intention of
changing how split behaves when its argument is not a qr// object.

Excellent.

I think adding a qr /(*SPLIT_WHITE)/, while keeping the existing behaviour
of split, is a useful addition to the language.

I presume

  $str =~ s/(*SPLIT_WHITE)/.../;

and

  $str =~ /(*SPLIT_WHITE)/;

will be meaningless, just as

  $pat = qr /(*SPLIT_WHITE)/;
  $str =~ /$pat/;

Well, no, I think making them illegal in normal patterns would be nearly
impossible. The construct would need to do something, and I was thinking it
might behave the same as \n+ or something like that. I'm open to suggestions
on what it does however, and I could probably be convinced to make it warn
in a normal pattern, implementation permitting.

Yves

@p5pRT p5pRT commented Sep 12, 2014

From @cpansprout

On Fri Sep 12 05​:22​:28 2014, demerphq wrote​:

I consider the inability to simulate " " using a qr// or // a bug,
and it
is on my todo list to fix. I don't consider one bug an excuse not to
fix
other bugs.

Omitting initial empty fields is more a feature of split than of the regexp engine. Making a special pattern that does that makes as much sense to me as qr//c. If we were to consider the // to be part of the split operator (and I generally do), then we could introduce a m// modifier that only applies in split (and is an error otherwise).

--

Father Chrysostomos

@p5pRT p5pRT commented Sep 12, 2014

From @cpansprout

On Fri Sep 12 07​:16​:50 2014, demerphq wrote​:

Oh that. Right. That isn't a special case in split, it's a special
case in
the m// and s/// operator that isn't present in split nor is it in
qr//.

$ perl -le'"foo"=~/(.*)/ and print $1; print qr//'
foo
(?^​:)

Also I thought we decided that that feature wasn't very useful and
were
going to deprecate it. :-)

I use it.

split qr/(*SPLIT_WHITE)/, $string

is my working plan. I fixed one issue related to this, I think in
5.18,
where there was no way at all to parametrically get the split white
behavior. Now you can do this​:

./perl -Ilib -le'my $str=" "; my $foo="foo\n\n\nbar\n\n"; print ">$_<" for split $str, $foo'
>foo<
>bar<

But I think one should be able to do this with a qr// object as well.

Basically (*SPLIT_WHITE) would be semantically equivalent to \s+ in a
normal regex, and produce the split white special case behavior in a
split
pattern.

But if we are going to generalise it, it would be useful to skip initial null fields with other separators, such as /,/, too.
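
Until something like that exists, the effect can be approximated in user code. A minimal sketch, where `split_skip_leading` is a hypothetical helper and not a proposed core API:

```perl
use strict;
use warnings;

# Hypothetical helper: split on any pattern, then drop leading empty
# fields, mimicking what split " " does for whitespace.
sub split_skip_leading {
    my ($pat, $str) = @_;
    my @fields = split $pat, $str;
    shift @fields while @fields && $fields[0] eq '';
    return @fields;
}

my @plain   = split /,/, ",,a,,b";                   # leading empty fields kept
my @skipped = split_skip_leading(qr/,/, ",,a,,b");   # leading empty fields dropped

print scalar(@plain),   "\n";
print scalar(@skipped), "\n";
```

Interior empty fields are left alone here; only the leading ones are removed, which matches the "skip initial null fields" wording above.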

--

Father Chrysostomos

@p5pRT p5pRT commented Sep 12, 2014

From @cpansprout

On Fri Sep 12 07​:37​:43 2014, demerphq wrote​:

To summarize, I would like to make it so you can use a qr// object to
obtain *every* special case behaviour of split. I have no intention of
changing how split behaves when its argument is not a qr// object.

To my mind, that just doesn’t add up. How is that much different from having a way to specify the second half of s/// with qr//?

--

Father Chrysostomos

@p5pRT p5pRT commented Sep 12, 2014

From @cpansprout

On Fri Sep 12 07​:57​:02 2014, demerphq wrote​:

On 12 September 2014 16​:45, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 04​:37​:21PM +0200, demerphq wrote​:

To summarize, I would like to make it so you can use a qr// object to
obtain *every* special case behaviour of split. I have no intention of
changing how split behaves when its argument is not a qr// object.

Excellent.

I think adding a qr /(*SPLIT_WHITE)/, while keeping the existing behaviour
of split, is a useful addition to the language.

I presume

$str =~ s/(*SPLIT_WHITE)/.../;

and

$str =~ /(*SPLIT_WHITE)/;

will be meaningless, just as

$pat = qr /(*SPLIT_WHITE)/;
$str =~ /$pat/;

Well, no, I think making them illegal in normal patterns would be nearly
impossible. The construct would need to do something, and I was thinking it
might behave the same as \n+ or something like that. I'm open to suggestions
on what it does however, and I could probably be convinced to make it warn
in a normal pattern, implementation permitting.

Oh, and what would split /(,)(?(1)(*SPLIT_WHITE))/ do?

I just can’t wrap my mind around this \s+-and-a-split-flag construct.

Maybe what we want is qr//k, where the /k flag is ignored by m// and s///, but is taken by split to mean sKip initial null fields.

But then what would split /foo${that_qr}bar/ do?

--

Father Chrysostomos

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 17​:20, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Fri Sep 12 05​:22​:28 2014, demerphq wrote​:

I consider the inability to simulate " " using a qr// or // a bug, and it
is on my todo list to fix. I don't consider one bug an excuse not to fix
other bugs.

Omitting initial empty fields is more a feature of split than of the
regexp engine. Making a special pattern that does that makes as much sense
to me as qr//c.

qr//c is obviously useless. On the other hand two *very* experienced regex
people, Abigail and myself, both see the utility of a (*SPLIT_WHITE) meta
pattern that allows split to trigger the special case triggered by split
//, $foo. I think that is sufficient justification to overlook your
inability to see its utility.

If we were to consider the // to be part of the split operator (and I
generally do),

I consider that wrong. Split is a function which uses a pattern as an
argument, and changes its behaviour based on what that pattern is.

then we could introduce a m// modifier that only applies in split (and is
an error otherwise).

I don't think a modifier is required, or even a particularly elegant
solution to this.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 17​:22, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Fri Sep 12 07​:16​:50 2014, demerphq wrote​:

Oh that. Right. That isn't a special case in split, it's a special case in
the m// and s/// operators that isn't present in split, nor is it in qr//.

$ perl -le'"foo"=~/(.*)/ and print $1; print qr//'
foo
(?^:)

Also I thought we decided that that feature wasn't very useful and were
going to deprecate it. :-)

I use it.

Yes, I think I have used it once or twice in my career. However the fact
that a very small number of people use a feature that most consider
confusing and dangerous is not generally a reason not to deprecate it. If
it was widely used then it would be different.
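
For readers who have not met the feature being discussed, here is a minimal sketch of the empty-pattern rule in m// and how split differs from it. The variable names are illustrative only and are not taken from the thread.

```perl
use strict;
use warnings;

# A successful match makes /b/ the "last successfully matched" pattern.
my $first = "abc" =~ /b/ ? 1 : 0;

# In m//, an empty pattern reuses that last successful pattern (/b/ here),
# so this fails on "xyz" even though a truly empty regex would match anywhere.
my $reused = "xyz" =~ // ? 1 : 0;

# split does not apply that rule: an empty pattern splits between characters.
my @chars = split //, "abc";

print "first=$first reused=$reused chars=@chars\n";
```

The surprise is that the result of `"xyz" =~ //` depends on whichever match happened to succeed earlier, which is presumably why the feature is described above as confusing.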

split qr/(*SPLIT_WHITE)/, $string

is my working plan. I fixed one issue related to this, I think in 5.18,
where there was no way at all to parametrically get the split white
behavior. Now you can do this:

./perl -Ilib -le'my $str=" "; my $foo="foo\n\n\nbar\n\n"; print ">$_<" for split $str, $foo'

>foo<
>bar<

But I think one should be able to do this with a qr// object as well.

Basically (*SPLIT_WHITE) would be semantically equivalent to \s+ in a
normal regex, and produce the split white special case behavior in a split
pattern.

But if we are going to generalise it, it would be useful to skip initial
null fields with other separators, such as /,/, too.

Then we can create a pattern that does it. (*EAT_EMPTY) maybe.
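
As a rough illustration of the behaviour under discussion (not of the proposed (*SPLIT_WHITE) or (*EAT_EMPTY) constructs, which do not exist), the sketch below contrasts the literal single-space special case with ordinary patterns. `split_skip_leading` is a hypothetical helper written for this example, not a Perl builtin.

```perl
use strict;
use warnings;

my $line = "  foo bar";

# Literal single space: the special case, leading whitespace is skipped.
my @awk = split ' ', $line;

# /\s+/ is a positive-width match at the start, so a leading empty field is kept.
my @regex = split /\s+/, $line;

# The same happens with a non-whitespace separator.
my @csv = split /,/, ",a,b";

# Hypothetical helper: emulate "skip initial null fields" by hand.
sub split_skip_leading {
    my ($re, $str) = @_;
    my @fields = split $re, $str;
    shift @fields while @fields && $fields[0] eq '';
    return @fields;
}
my @trimmed = split_skip_leading(qr/,/, ",a,b");

printf "awk=%d regex=%d csv=%d trimmed=%d\n",
    scalar(@awk), scalar(@regex), scalar(@csv), scalar(@trimmed);
```

Today the only way to skip initial empty fields for an arbitrary separator is post-processing like the helper above; the proposals in this thread would move that into the pattern itself.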

Yves

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 17​:24, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Fri Sep 12 07​:37​:43 2014, demerphq wrote​:

To summarize, I would like to make it so you can use a qr// object to
obtain *every* special case behaviour of split. I have no intention of
changing how split behaves when its argument is not a qr// object.

To my mind, that just doesn’t add up. How is that much different from
having a way to specify the second half of s/// with qr//?

Completely different. As different as jet-planes and penguins.

Yves

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 17​:28, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Fri Sep 12 07​:57​:02 2014, demerphq wrote​:

On 12 September 2014 16​:45, Abigail <abigail@​abigail.be> wrote​:

On Fri, Sep 12, 2014 at 04​:37​:21PM +0200, demerphq wrote​:

To summarize, I would like to make it so you can use a qr// object to
obtain *every* special case behaviour of split. I have no intention of
changing how split behaves when its argument is not a qr// object.

Excellent.

I think adding a qr /(*SPLIT_WHITE)/, while keeping the existing behaviour
of split, is a useful addition to the language.

I presume

$str =~ s/(*SPLIT_WHITE)/.../;

and

$str =~ /(*SPLIT_WHITE)/;

will be meaningless, just as

$pat = qr /(*SPLIT_WHITE)/;
$str =~ /$pat/;

Well, no, I think making them illegal in normal patterns would be nearly
impossible. The construct would need to do something, and I was thinking it
might behave the same as \n+ or something like that. I'm open to suggestions
on what it does however, and I could probably be convinced to make it warn
in a normal pattern, implementation permitting.

Oh, and what would split /(,)(?(1)(*SPLIT_WHITE))/ do?

Not sure yet. Maybe nothing.

I just can’t wrap my mind around this \s+-and-a-split-flag construct.

I couldn't possibly comment on your inability to wrap your mind around this.

Maybe what we want is qr//k, where the /k flag is ignored by m// and s///,
but is taken by split to mean sKip initial null fields.

/k is unavailable to us due to Regexp​::Common.

Although I retract an earlier comment, *maybe* a modifier is appropriate
for some of these issues. It deserves more reflection than I gave it in a
previous mail.

But then what would split /foo${that_qr}bar/ do?

Probably just revert to its "normal" regex behaviour.

cheers,
Yves

@p5pRT p5pRT commented Sep 12, 2014

From @bulk88

I don't think this ticket is productive anymore. 20 posts in half a day between just 2, or maybe 3 people.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT p5pRT commented Sep 12, 2014

From @cpansprout

On Fri Sep 12 08​:33​:15 2014, demerphq wrote​:

qr//c is obviously useless. On the other hand two *very* experienced regex
people, Abigail and myself, both see the utility of a (*SPLIT_WHITE) meta
pattern that allows split to trigger the special case triggered by split
//, $foo. I think that is sufficient justification to overlook your
inability to see its utility.

It’s not that I do not see its utility. It just seems like too much of a special case, and I thought we were trying to get away from those. If it’s something that goes in a pattern, but affects the behaviour of one specific operator that acts on the pattern, then what is its scope? etc., etc.

Now, if we want to add a thingy that goes in a pattern and flags the pattern to tell split to skip initial null fields, then let’s make it general. E.g., your /(*SPLIT_WHITE)/ could be written /(*EAT_EMPTY)\s+/ or /(?q)\s+/ or /\s+/q (with q only because q is available).

--

Father Chrysostomos

@p5pRT p5pRT commented Sep 12, 2014

From @hvds

I've completely lost track of the bifurcating paths of the discussion,
so I hope we'll get a synopsis of a proposal at some point, ideally as
a separate thread.

Somewhere in there were references to making split patterns act as if
they had //m on by default. That sounds like choosing to exchange one
crazy set of behaviours in all previous versions of perl with a
differently crazy set of behaviours in all subsequent versions. I may
have got the proposal wrong though.

Not sure that I've used /^/ or /\A/ much in split patterns, but I've
almost certainly used /$/ or /\z/, probably with things like​:
  /($delimiter)(?=$field(?=$delimiter|$))/
.. where the patterns for $delimiter and $field were sufficiently
ambiguous.

I was never aware of an implied //m, so I've never knowingly used that.
I'd much rather deprecate the special case, and let people say what
they mean, than invite further breakage by extending the special case
further.

Hugo

@p5pRT p5pRT commented Sep 13, 2014

From @rjbs

* demerphq <demerphq@​gmail.com> [2014-09-11T14​:20​:28]

In fact, in the process of writing this email I have become sufficiently
convinced that option /m is the right thing to do and that I will start
writing the patch now so we can find out if it breaks anything.

I am sitting here making my "I am so nervous" face, but I also can't really
come up with much that I think will be affected. As a side note, I did find
this amusing line​:

  https://metacpan.org/source/ANDYA/TAP-Parser-0.54/t/040-parse.t#L630

Anyway, on one hand and in one way this is a big scary change that makes me
antsy, but on the other hand, I think it will trade one goofy special case for
another straightforward one.

This is not to say that I'm saying "do it!" But it sounds like you want to
write the patch, and if you do that, we can smoke CPAN and also look at
specific changes. So I think that's a decent step forward.

A lot of other stuff came up in this thread about /other/ changes to split and
patterns. I think they should be discussed on their own, rather than as part
of discussing whether/how/why to fix splitting on /\A/.

--
rjbs

@p5pRT p5pRT commented Sep 17, 2014

From @demerphq

On 11 September 2014 16​:29, demerphq <demerphq@​gmail.com> wrote​:

On 11 September 2014 14:28, l.mai@web.de <perlbug-followup@perl.org> wrote:

# New Ticket Created by l.mai@​web.de
# Please include the string​: [perl #122761]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=122761 >

perldoc perlrebackslash​:

\A "\A" only matches at the beginning of the string.

perldoc -f split​:

Empty leading fields are produced when there are positive-width
matches at the beginning of the string; a zero-width match at the beginning
of the string does not produce an empty field.

Therefore split /\A/ should return the input string as is. \A can only
match once (at offset 0), which (logically speaking) should turn "foo" into
("", "foo"), but because of the special case in split of not producing
empty leading fields for zero-width matches at the beginning, we just get
"foo" again.

[split]

I can do some kind of workaround that makes /\A/ not trigger this optimisation,

I have fixed this with​:

1645b83 Perl RT #122761 - split /\A/ should not behave like split /^/m
aa48e90 change NODE_ALIGN_FILL to set flags to 0
d3d47aa Eliminate the duplicative regops BOL and EOL

Note that this does NOT make split // default to /m enabled. It simply
allows the split optimisation involved to distinguish between /^/ and /\A/.

Related to this I did some cleanup, freeing up bits, reducing object size,
and other simplifications.

/me puts away the chainsaw.

I still plan to try the "default to /m in split" and see what happens, so
please do not close this ticket right away, even though 1645b83 does fix
the actual issue reported in this ticket.
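
A small sketch for checking the distinction on a given perl. It is an illustration only, not part of the patch; the variable names are made up here, and the field count for split /\A/ is deliberately not asserted because it depends on whether the perl in use contains the fix.

```perl
use strict;
use warnings;

my $text = "foo\nbar\nbaz";

# In an ordinary match, \A can match only at offset 0 ...
my $count_A = () = $text =~ /\A/g;

# ... while ^ under /m also matches after each embedded newline.
my $count_caret_m = () = $text =~ /^/mg;

# split documents /^/ as being treated like /^/m, so this yields one field per line.
my @by_caret = split /^/, $text;

# The case reported in this ticket: with the fix the string should come back
# whole; without it, split behaves as for /^/m.
my @by_A = split /\A/, $text;

printf "\\A matches: %d, ^/m matches: %d, split /^/: %d fields, split /\\A/: %d fields\n",
    $count_A, $count_caret_m, scalar(@by_caret), scalar(@by_A);
```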

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 17, 2014

From @cpansprout

On Tue Sep 16 20​:13​:15 2014, demerphq wrote​:

I have fixed this with​:

1645b83 Perl RT #122761 - split /\A/ should not behave like split /^/m
aa48e90 change NODE_ALIGN_FILL to set flags to 0
d3d47aa Eliminate the duplicative regops BOL and EOL

Did the porting tests fail before you ran make regen to regenerate the table in perldebguts.pod?

**Duck**

--

Father Chrysostomos

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Sep 17, 2014

From @demerphq

On 17 September 2014 06​:36, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Tue Sep 16 20​:13​:15 2014, demerphq wrote​:

I have fixed this with​:

1645b83 Perl RT #122761 - split /\A/ should not behave like split /^/m
aa48e90 change NODE_ALIGN_FILL to set flags to 0
d3d47aa Eliminate the duplicative regops BOL and EOL

Did the porting tests fail before you ran make regen to regenerate the
table in perldebguts.pod?

**Duck**

No. d3d47aa includes changes to regen/regcomp.pl and regcomp.sym which
necessitated a regen anyway.

I did however leak some warnings/diagnostics into the porting tests, which
should only be shown when they are run manually; I fixed that in 53e1903.

Smarty pants. :-)

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Feb 26, 2016

From @mauke

On Tue Sep 16 20​:13​:15 2014, demerphq wrote​:

I have fixed this with​:

1645b83 Perl RT #122761 - split /\A/ should not behave like split /^/m
aa48e90 change NODE_ALIGN_FILL to set flags to 0
d3d47aa Eliminate the duplicative regops BOL and EOL

Note that this does NOT make split // default to /m enabled. It simply
allows the split optimisation involved to distinguish between /^/ and /\A/.

Related to this I did some cleanup, freeing up bits, reducing object size,
and other simplifications.

/me puts away the chainsaw.

I still plan to try the "default to /m in split" and see what happens, so
please do not close this ticket right away, even though 1645b83 does fix
the actual issue reported in this ticket.

Shouldn't this be done in a new ticket then? (Also, is this still happening?)

@p5pRT p5pRT commented Feb 26, 2016

From @jkeenan

On Fri Feb 26 10​:53​:38 2016, mauke- wrote​:

On Tue Sep 16 20​:13​:15 2014, demerphq wrote​:

I have fixed this with​:

1645b83 Perl RT #122761 - split /\A/ should not behave like split /^/m
aa48e90 change NODE_ALIGN_FILL to set flags to 0
d3d47aa Eliminate the duplicative regops BOL and EOL

Note that this does NOT make split // default to /m enabled. It simply
allows the split optimisation involved to distinguish between /^/ and /\A/.

Related to this I did some cleanup, freeing up bits, reducing object size,
and other simplifications.

/me puts away the chainsaw.

I still plan to try the "default to /m in split" and see what happens, so
please do not close this ticket right away, even though 1645b83 does fix
the actual issue reported in this ticket.

Shouldn't this be done in a new ticket then? (Also, is this still happening?)

I recommend closing this ticket and having anyone pursuing this open a new ticket.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT p5pRT commented Feb 27, 2016

@mauke - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this Feb 27, 2016
@p5pRT p5pRT added the Severity Low label Oct 19, 2019