Range Operator inconsistency? #16770

Closed
p5pRT opened this issue Nov 28, 2018 · 14 comments

@p5pRT p5pRT commented Nov 28, 2018

Migrated from rt.perl.org#133695 (status was 'pending release')

Searchable as RT133695$

@p5pRT p5pRT commented Nov 28, 2018

From @haukex

Dear P5P,

As first reported on PerlMonks in this thread:
https://www.perlmonks.org/?node_id=1226434

perlop says: "The range operator (in list context) makes use of the
magical auto-increment algorithm if the operands are strings. ... If the
final value specified is not in the sequence that the magical increment
would produce, the sequence goes until the next value would be longer
than the final value specified."
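The perlop passage above describes two mechanisms: the magic string increment and the rule that stops the list once values become longer than the final operand. The C sketch below is a simplified model of the string branch of the list-context range, loosely following the string loop in pp_flop and the string case of Perl's sv_inc; it is not Perl's actual code. The helper names is_magic_string, magic_increment and string_range are hypothetical, and the model assumes plain ASCII strings short enough for a 64-byte buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Non-empty and matching /^[a-zA-Z]*[0-9]*\z/ (ASCII only). */
static int is_magic_string(const char *s)
{
    const char *d = s;

    if (*d == '\0')
        return 0;
    while (isalpha((unsigned char)*d))
        d++;
    while (isdigit((unsigned char)*d))
        d++;
    return *d == '\0';
}

/* Magic increment in place: '9' wraps to '0', 'z' to 'a', 'Z' to 'A',
   carrying to the left. If the carry falls off the front, the string grows
   by one character ("99" -> "100", "zz" -> "aaa"), so s needs that room. */
static void magic_increment(char *s)
{
    size_t i = strlen(s);

    assert(is_magic_string(s));
    while (i > 0) {
        char *c = &s[--i];
        if (isdigit((unsigned char)*c)) {
            if (*c != '9') { ++*c; return; }
            *c = '0';
        } else {
            if (*c != 'z' && *c != 'Z') { ++*c; return; }
            *c -= 'z' - 'a';          /* 'z' -> 'a', 'Z' -> 'A' */
        }
    }
    memmove(s + 1, s, strlen(s) + 1);
    s[0] = isdigit((unsigned char)s[1]) ? '1' : s[1];
}

/* Writes the elements of left..right (string branch only) into out,
   joined with ':' like join(":", ...). Stops after emitting a value equal
   to right, once the current value is longer than right, or after the
   first value if it cannot be magically incremented (Perl numifies such
   a value instead, which ends its loop). */
static void string_range(const char *left, const char *right,
                         char *out, size_t outsz)
{
    char cur[64];
    size_t rlen = strlen(right), used = 0;
    int first = 1;

    out[0] = '\0';
    if (strlen(left) >= sizeof cur || rlen + 2 > sizeof cur)
        return;                       /* sketch limit: short strings only */
    strcpy(cur, left);
    while (strlen(cur) <= rlen) {
        size_t n = strlen(cur);
        if (used + n + 2 > outsz)
            return;                   /* out of room: truncate the list */
        if (!first)
            out[used++] = ':';
        memcpy(out + used, cur, n + 1);
        used += n;
        first = 0;
        if (strcmp(cur, right) == 0)
            break;
        if (!is_magic_string(cur))
            break;
        magic_increment(cur);
    }
}
```

Under this model, string_range("0", "-1", ...) produces "0" through "99" because "-1" is two characters long and "100" would be three, and string_range("0.0", "-1.0", ...) produces only "0.0" because "0.0" cannot be magically incremented. Whether a given range reaches this string branch at all is decided separately, by the RANGE_IS_NUMERIC macro discussed below.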

And yet there are some really strange inconsistencies in the produced
ranges: sometimes the strings appear to be treated as integers, and
sometimes they don't. In particular, compare "0".."-1", which
produces "0" through "99", to "1".."-1", which produces the empty list.

Some more test cases from Perl 5.26.0 on Linux are below. (A note on the
output: unfortunately Data::Dump numifies strings that look like
integers - e.g. "0".."99" does in fact produce the *strings* "0" through
"99" and "0".." -1 " the *strings* "0" through "9999", despite them
being shown as numbers below.)

$ perl -wMstrict -MData::Dump -e' dd "0".."-1" '
(0 .. 99)
$ perl -wMstrict -MData::Dump -e' dd "1".."-1" '
()
$ perl -wMstrict -MData::Dump -e' dd "01".."-1" '
("01", "02", "03", "04", "05", "06", "07", "08", "09", 10 .. 99)
$ perl -wMstrict -MData::Dump -e' dd "90".."-1" '
()
$ perl -wMstrict -MData::Dump -e' dd "1".."xx" '
(1 .. 99)
$ perl -wMstrict -MData::Dump -e' dd "11".."xx" '
(11 .. 99)
$ perl -wMstrict -MData::Dump -e' dd "90".."xx" '
(90 .. 99)
$ perl -wMstrict -MData::Dump -e' dd "-1".."xx" '
-1
$ perl -wMstrict -MData::Dump -e' dd "0".." -1 " '
(0 .. 9999)
$ perl -wMstrict -MData::Dump -e' dd " 0 ".." -1 " '
()
$ perl -wMstrict -MData::Dump -e' dd " 11 ".." -1 " '
()
$ perl -wMstrict -MData::Dump -e' dd "0.0".."-1.0" '
"0.0"
$ perl -wMstrict -MData::Dump -e' dd " 0.0 ".." -1.0 " '
()
$ perl -wMstrict -MData::Dump -e' dd "0.0".." 1.0 " '
"0.0"
$ perl -wMstrict -MData::Dump -e' dd " 0.0 ".."1.0" '
(0, 1)

Thanks, Regards,
-- Hauke D

@p5pRT p5pRT commented Nov 28, 2018

From @haukex

Hi all,

Now with a test file attached.

Best,
-- Hauke D

@p5pRT p5pRT commented Nov 29, 2018

From @iabyn

On Wed, Nov 28, 2018 at 07:56:34AM -0800, Hauke D via RT wrote:

As first reported on PerlMonks in this thread:
https://www.perlmonks.org/?node_id=1226434

perlop says: "The range operator (in list context) makes use of the
magical auto-increment algorithm if the operands are strings. ... If the
final value specified is not in the sequence that the magical increment
would produce, the sequence goes until the next value would be longer
than the final value specified."

And yet there are some really strange inconsistencies in the produced
ranges: sometimes the strings appear to be treated as integers, and
sometimes they don't. In particular, compare "0".."-1", which
produces "0" through "99", to "1".."-1", which produces the empty list.

Perl internally tries very hard to treat the range args as numeric where
possible, and has a special exception for the string "0". The relevant
macro from pp_ctl.c (reformatted for clarity) is:

  /* This code tries to decide if "$left .. $right" should use the
     magical string increment, or if the range is numeric (we make
     an exception for .."0" [#18165]). AMS 20021031. */

  #define RANGE_IS_NUMERIC(left,right) ( \
      SvNIOKp(left) \
      || (SvOK(left) && !SvPOKp(left)) \
      || SvNIOKp(right) \
      || (SvOK(right) && !SvPOKp(right)) \
      || ( \
          ( \
              (!SvOK(left) && SvOK(right)) \
              || ( \
                  (!SvOK(left) || looks_like_number(left)) \
                  && SvPOKp(left) \
                  && *SvPVX_const(left) != '0') \
          ) \
          && (!SvOK(right) || looks_like_number(right)) \
      ) \
  )

Frankly I don't understand all those conditions; they are a lot more
specific than the docs.

--
A power surge on the Bridge is rapidly and correctly diagnosed as a faulty
capacitor by the highly-trained and competent engineering staff.
  -- Things That Never Happen in "Star Trek" #9

@p5pRT p5pRT commented Nov 29, 2018

@jkeenan - Status changed from 'new' to 'open'

@p5pRT p5pRT commented Nov 30, 2018

From @haukex

Hi,

Thanks for looking into this!

The comment in the code you showed [1] mentions #18165 [2], which references #18114 [3], where a reply by Slaven Rezic makes sense to me: 'There is a special handling for numeric strings beginning with a "0". This is to allow things like "01".."31" to preserve the leading zero for one-digit numbers.' The basic behavior appears to go all the way back to 5.000 [4].

  [1] https://perl5.git.perl.org/perl.git/blob/23665de87341f4f3452009759d4fc95ce30b8ced:/pp_ctl.c#l1179
  [2] https://rt.perl.org/Public/Bug/Display.html?id=18165
  [3] https://rt.perl.org/Public/Bug/Display.html?id=18114
  [4] https://perl5.git.perl.org/perl.git/blob/refs/tags/perl-5.000:/pp_ctl.c#l694

So my interpretation of the rules is this: If the left and right operands are strings, then check whether they pass looks_like_number. If they do, treat them as integers. However, make an exception when the left-hand side begins with "0", for the reason stated above.

The key word here is *begins* with zero; the condition *SvPVX_const(left)!='0' causes this inconsistency:

  -3..-1 and "-3".."-1" are (-3,-2,-1)
  -2..-1 and "-2".."-1" are (-2,-1)
  -1..-1 and "-1".."-1" are (-1)
  1..-1 and "1".."-1" are ()
  however:
  0..-1 is () but "0".."-1" is (0..99)

That latter behavior may be in line with "01".."-1", which is ("01","02","03",...), but IMO it's still surprising, and in any case the fact that strings that look like numbers are treated as such appears to be undocumented.

I have two alternative proposals: (A) leave the behavior as-is, but document it, or (B) change the behavior so that the above condition is 'if the LHS is a string that begins with 0, except for the string "0" itself' (and document it). Option B would keep the "01".."31" case working, but also make "0".."-1" act like 0..-1.

Patches for both A (just document) and B (change behavior) are attached, with tests included (a full build passes all tests on my end). My internals knowledge is quite limited so I hope my use of SvCUR in the second patch is correct.

My personal preference is option B, since it gets rid of the above inconsistency, but I understand that if there are worries about backwards compatibility, option A may be better in that respect. The way I've worded the documentation pretty much nails down the behavior and wouldn't allow for future changes; a third option might be to word the documentation more loosely and leave the door open for future changes.
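To make the difference between the current rule and option B concrete, the sketch below isolates only the leading-zero test from RANGE_IS_NUMERIC, once as it stands and once as patch B rewrites it. The function names are hypothetical. Comparing the two shows that, for left operands starting with '0', option B only changes the outcome for the one-character string "0": "01", "00" and even "0.0" still get the magic-increment treatment.

```c
#include <assert.h>
#include <string.h>

/* Current rule: any left operand whose first character is '0' is exempt
   from numeric handling (the negation of *SvPVX_const(left) != '0'). */
static int zero_exception_current(const char *left)
{
    assert(left != NULL);
    return left[0] == '0';
}

/* Option B: first character '0' AND longer than one character, mirroring
   *SvPVX_const(left) == '0' && SvCUR(left) > 1 from the patch. strlen()
   stands in for SvCUR(); they differ only for strings with embedded NULs. */
static int zero_exception_option_b(const char *left)
{
    assert(left != NULL);
    return left[0] == '0' && strlen(left) > 1;
}
```

Under both rules the exception only matters when the strings would otherwise look like numbers; a left operand such as " 0 " (leading space) is unaffected either way.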

Thanks, Regards,
-- Hauke D

P.S. The attachment "rt133695.pl" in my previous message contains an off-by-one error, but in an unused branch of code, so the output and conclusions produced by the script are still correct (as long as $inseq is always false, which it currently is).

@p5pRT p5pRT commented Nov 30, 2018

From @haukex

rt133695_rangeop_zero_A_doc_only.patch
From 52296ca221128e2ed89d2f9e39520dcb96801eb9 Mon Sep 17 00:00:00 2001
From: Hauke D <haukex@zero-g.net>
Date: Fri, 30 Nov 2018 13:56:10 +0100
Subject: [PATCH] (perl #133695) Document range op details

"-2".."-1" is the same as -2..-1 and "1".."-1" is the same as 1..-1, but
"0".."-1" is the same as "0".."99". This patch documents the rules for
the range operator in list context with both operands being strings more
explicitly.

See also #18165 and #18114.
---
 pod/perlop.pod | 85 +++++++++++++++++++++++++++++++++++++++-----------
 pp_ctl.c       |  3 +-
 t/op/range.t   | 24 +++++++++++++-
 3 files changed, 92 insertions(+), 20 deletions(-)

diff --git a/pod/perlop.pod b/pod/perlop.pod
index d6adbd11f2..9ff980e9b4 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1081,26 +1081,82 @@ And now some examples as a list operator:
     @foo = @foo[0 .. $#foo];        # an expensive no-op
     @foo = @foo[$#foo-4 .. $#foo];  # slice last 5 items
 
-The range operator (in list context) makes use of the magical
-auto-increment algorithm if the operands are strings.  You
-can say
+Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
+return two elements in list context.
 
-    @alphabet = ("A" .. "Z");
+    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
 
-to get all normal letters of the English alphabet, or
+The range operator in list context can make use of the magical
+auto-increment algorithm if both operands are strings, subject to the
+following rules:
 
-    $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
+=over
+
+=item *
+
+With one exception (below), if both strings look like numbers to Perl,
+the magic increment will not be applied, and the strings will be treated
+as numbers (more specifically, integers) instead.
+
+For example, C<"-2".."2"> is the same as C<-2..2>, C<"1".."-1"> is the
+same as C<1..-1> (producing the empty list), and C<"2.18".."3.14">
+produces C<2, 3>.
 
-to get a hexadecimal digit, or
+=item *
+
+The exception to the above rule is when the left-hand string begins with
+C<0>, including the string C<"0"> itself. In this case, the magic
+increment I<will> be applied, even though strings like C<"01"> would
+normally look like a number to Perl.
+
+For example, C<"01".."04"> produces C<"01", "02", "03", "04">, and
+C<"0".."-1"> produces C<"0"> through C<"99"> - this may seem
+surprising, but see the following rules for why it works this way.
+To get dates with leading zeros, you can say:
 
     @z2 = ("01" .. "31");
     print $z2[$mday];
 
-to get dates with leading zeros.
+If you want to force strings to be interpreted as numbers, you could say
+
+    @numbers = ( 0+$first .. 0+$last );
+
+=item *
+
+If the initial value specified isn't part of a magical increment
+sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
+only the initial value will be returned.
+
+For example, C<"ax".."az"> produces C<"ax", "ay", "az">, but
+C<"*x".."az"> produces only C<"*x">.
+
+=item *
+
+For other initial values that are strings that do follow the rules of the
+magical increment, the corresponding sequence will be returned.
+
+For example, you can say
+
+    @alphabet = ("A" .. "Z");
+
+to get all normal letters of the English alphabet, or
+
+    $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
+
+to get a hexadecimal digit.
+
+=item *
 
 If the final value specified is not in the sequence that the magical
 increment would produce, the sequence goes until the next value would
-be longer than the final value specified.
+be longer than the final value specified. If the length of the final
+string is shorter than the first, the empty list is returned.
+
+For example, C<"a".."--"> is the same as C<"a".."zz">, C<"0".."xx">
+produces C<"0"> through C<"99">, and C<"aaa".."--"> returns the empty
+list.
+
+=back
 
 As of Perl 5.26, the list-context range operator on strings works as expected
 in the scope of L<< S<C<"use feature 'unicode_strings">>|feature/The
@@ -1108,10 +1164,8 @@ in the scope of L<< S<C<"use feature 'unicode_strings">>|feature/The
 that feature, it exhibits L<perlunicode/The "Unicode Bug">: its behavior
 depends on the internal encoding of the range endpoint.
 
-If the initial value specified isn't part of a magical increment
-sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
-only the initial value will be returned.  So the following will only
-return an alpha:
+Because the magical increment only works on non-empty strings matching
+C</^[a-zA-Z]*[0-9]*\z/>, the following will only return an alpha:
 
     use charnames "greek";
     my @greek_small =  ("\N{alpha}" .. "\N{omega}");
@@ -1131,11 +1185,6 @@ you could use the pattern C</(?:(?=\p{Greek})\p{Lower})+/> (or the
 L<experimental feature|perlrecharclass/Extended Bracketed Character
 Classes> C<S</(?[ \p{Greek} & \p{Lower} ])+/>>).
 
-Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
-return two elements in list context.
-
-    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
-
 =head2 Conditional Operator
 X<operator, conditional> X<operator, ternary> X<ternary> X<?:>
 
diff --git a/pp_ctl.c b/pp_ctl.c
index 17d4f0d14a..2da942aa88 100644
--- a/pp_ctl.c
+++ b/pp_ctl.c
@@ -1178,7 +1178,8 @@ PP(pp_flip)
 
 /* This code tries to decide if "$left .. $right" should use the
    magical string increment, or if the range is numeric (we make
-   an exception for .."0" [#18165]). AMS 20021031. */
+   an exception for .."0" [#18165]). AMS 20021031.
+   See also [#133695] - the rules are now documented in perlop. */
 
 #define RANGE_IS_NUMERIC(left,right) ( \
 	SvNIOKp(left)  || (SvOK(left)  && !SvPOKp(left))  || \
diff --git a/t/op/range.t b/t/op/range.t
index 19ae1269ca..18eaa1fe0c 100644
--- a/t/op/range.t
+++ b/t/op/range.t
@@ -9,7 +9,7 @@ BEGIN {
 
 use Config;
 
-plan (146);
+plan (162);
 
 is(join(':',1..5), '1:2:3:4:5');
 
@@ -112,6 +112,28 @@ is(join(":","-4".."-0")    , "-4:-3:-2:-1:0");
 is(join(":","-4\n".."0\n") , "-4:-3:-2:-1:0");
 is(join(":","-4\n".."-0\n"), "-4:-3:-2:-1:0");
 
+# [#133695] document inconsistency between "0".."-1" and 0..-1
+is(join(":","-2".."-1")    , "-2:-1");
+is(join(":","-1".."-1")    , "-1");
+is(join(":", 0 .. -1 )     , "");
+is(join(":","0".."-1")     , "0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:69:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:86:87:88:89:90:91:92:93:94:95:96:97:98:99");
+is(join(":","1".."-1")     , "");
+
+# these test the statements made in the documentation
+# regarding the rules of string ranges
+is(join(":","-2".."2"),      join(":",-2..2));
+is(join(":","2.18".."3.14"), "2:3");
+is(join(":","01".."04"),     "01:02:03:04");
+# "0".."-1" tested above
+is(join(":","00".."31"),     "00:01:02:03:04:05:06:07:08:09:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31");
+is(join(":","ax".."az"),     "ax:ay:az");
+is(join(":","*x".."az"),     "*x");
+is(join(":","A".."Z"),       "A:B:C:D:E:F:G:H:I:J:K:L:M:N:O:P:Q:R:S:T:U:V:W:X:Y:Z");
+is(join(":", 0..9,"a".."f"), "0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f");
+is(join(":","a".."--"),      join(":","a".."zz"));
+is(join(":","0".."xx"),      "0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:69:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:86:87:88:89:90:91:92:93:94:95:96:97:98:99");
+is(join(":","aaa".."--"),    "");
+
 # undef should be treated as 0 for numerical range
 is(join(":",undef..2), '0:1:2');
 is(join(":",-2..undef), '-2:-1:0');
-- 
2.19.2

@p5pRT p5pRT commented Nov 30, 2018

From @haukex

rt133695_rangeop_zero_B_change.patch
From cd2b39ae22f1a9e2090cea546da9a2c3884bf22e Mon Sep 17 00:00:00 2001
From: Hauke D <haukex@zero-g.net>
Date: Fri, 30 Nov 2018 13:06:07 +0100
Subject: [PATCH] (perl #133695) "0".."-1" should act like 0..-1

Previously, *any* string beginning with 0, including the string "0"
itself, would be subject to the magic string auto-increment, instead of
being treated like a number. This meant that "-2".."-1" was the same as
-2..-1 and "1".."-1" was the same as 1..-1, but "0".."-1" was the same
as "0".."99".

This patch fixes that inconsistency, while still allowing ranges like
"01".."31" to produce the strings "01", "02", ... "31", which is what
the "begins with 0" exception was intended for.

This patch also expands the documentation in perlop and states the rules
for the range operator in list context with both operands being strings
more explicitly.

See also #18165 and #18114.
---
 pod/perlop.pod | 84 +++++++++++++++++++++++++++++++++++++++-----------
 pp_ctl.c       | 10 ++++--
 t/op/range.t   | 23 +++++++++++++-
 3 files changed, 95 insertions(+), 22 deletions(-)

diff --git a/pod/perlop.pod b/pod/perlop.pod
index d6adbd11f2..d4101ff544 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1081,26 +1081,81 @@ And now some examples as a list operator:
     @foo = @foo[0 .. $#foo];        # an expensive no-op
     @foo = @foo[$#foo-4 .. $#foo];  # slice last 5 items
 
-The range operator (in list context) makes use of the magical
-auto-increment algorithm if the operands are strings.  You
-can say
+Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
+return two elements in list context.
 
-    @alphabet = ("A" .. "Z");
+    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
 
-to get all normal letters of the English alphabet, or
+The range operator in list context can make use of the magical
+auto-increment algorithm if both operands are strings, subject to the
+following rules:
 
-    $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
+=over
+
+=item *
+
+With one exception (below), if both strings look like numbers to Perl,
+the magic increment will not be applied, and the strings will be treated
+as numbers (more specifically, integers) instead.
+
+For example, C<"-2".."2"> is the same as C<-2..2>, and
+C<"2.18".."3.14"> produces C<2, 3>.
 
-to get a hexadecimal digit, or
+=item *
+
+The exception to the above rule is when the left-hand string begins with
+C<0> and is longer than one character, in this case the magic increment
+I<will> be applied, even though strings like C<"01"> would normally look
+like a number to Perl.
+
+For example, C<"01".."04"> produces C<"01", "02", "03", "04">, and
+C<"00".."-1"> produces C<"00"> through C<"99"> - this may seem
+surprising, but see the following rules for why it works this way.
+To get dates with leading zeros, you can say:
 
     @z2 = ("01" .. "31");
     print $z2[$mday];
 
-to get dates with leading zeros.
+If you want to force strings to be interpreted as numbers, you could say
+
+    @numbers = ( 0+$first .. 0+$last );
+
+=item *
+
+If the initial value specified isn't part of a magical increment
+sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
+only the initial value will be returned.
+
+For example, C<"ax".."az"> produces C<"ax", "ay", "az">, but
+C<"*x".."az"> produces only C<"*x">.
+
+=item *
+
+For other initial values that are strings that do follow the rules of the
+magical increment, the corresponding sequence will be returned.
+
+For example, you can say
+
+    @alphabet = ("A" .. "Z");
+
+to get all normal letters of the English alphabet, or
+
+    $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
+
+to get a hexadecimal digit.
+
+=item *
 
 If the final value specified is not in the sequence that the magical
 increment would produce, the sequence goes until the next value would
-be longer than the final value specified.
+be longer than the final value specified. If the length of the final
+string is shorter than the first, the empty list is returned.
+
+For example, C<"a".."--"> is the same as C<"a".."zz">, C<"0".."xx">
+produces C<"0"> through C<"99">, and C<"aaa".."--"> returns the empty
+list.
+
+=back
 
 As of Perl 5.26, the list-context range operator on strings works as expected
 in the scope of L<< S<C<"use feature 'unicode_strings">>|feature/The
@@ -1108,10 +1163,8 @@ in the scope of L<< S<C<"use feature 'unicode_strings">>|feature/The
 that feature, it exhibits L<perlunicode/The "Unicode Bug">: its behavior
 depends on the internal encoding of the range endpoint.
 
-If the initial value specified isn't part of a magical increment
-sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
-only the initial value will be returned.  So the following will only
-return an alpha:
+Because the magical increment only works on non-empty strings matching
+C</^[a-zA-Z]*[0-9]*\z/>, the following will only return an alpha:
 
     use charnames "greek";
     my @greek_small =  ("\N{alpha}" .. "\N{omega}");
@@ -1131,11 +1184,6 @@ you could use the pattern C</(?:(?=\p{Greek})\p{Lower})+/> (or the
 L<experimental feature|perlrecharclass/Extended Bracketed Character
 Classes> C<S</(?[ \p{Greek} & \p{Lower} ])+/>>).
 
-Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
-return two elements in list context.
-
-    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
-
 =head2 Conditional Operator
 X<operator, conditional> X<operator, ternary> X<ternary> X<?:>
 
diff --git a/pp_ctl.c b/pp_ctl.c
index 17d4f0d14a..e820a9df02 100644
--- a/pp_ctl.c
+++ b/pp_ctl.c
@@ -1177,14 +1177,18 @@ PP(pp_flip)
 }
 
 /* This code tries to decide if "$left .. $right" should use the
-   magical string increment, or if the range is numeric (we make
-   an exception for .."0" [#18165]). AMS 20021031. */
+   magical string increment, or if the range is numeric. Initially,
+   an exception was made for *any* string beginning with "0" (see
+   [#18165], AMS 20021031), but now that is only applied when the
+   string's length is also >1 - see the rules now documented in
+   perlop [#133695] */
 
 #define RANGE_IS_NUMERIC(left,right) ( \
 	SvNIOKp(left)  || (SvOK(left)  && !SvPOKp(left))  || \
 	SvNIOKp(right) || (SvOK(right) && !SvPOKp(right)) || \
 	(((!SvOK(left) && SvOK(right)) || ((!SvOK(left) || \
-          looks_like_number(left)) && SvPOKp(left) && *SvPVX_const(left) != '0')) \
+          looks_like_number(left)) && SvPOKp(left) \
+          && !(*SvPVX_const(left) == '0' && SvCUR(left)>1 ) )) \
          && (!SvOK(right) || looks_like_number(right))))
 
 PP(pp_flop)
diff --git a/t/op/range.t b/t/op/range.t
index 19ae1269ca..2deefc61cf 100644
--- a/t/op/range.t
+++ b/t/op/range.t
@@ -9,7 +9,7 @@ BEGIN {
 
 use Config;
 
-plan (146);
+plan (162);
 
 is(join(':',1..5), '1:2:3:4:5');
 
@@ -112,6 +112,27 @@ is(join(":","-4".."-0")    , "-4:-3:-2:-1:0");
 is(join(":","-4\n".."0\n") , "-4:-3:-2:-1:0");
 is(join(":","-4\n".."-0\n"), "-4:-3:-2:-1:0");
 
+# [#133695] "0".."-1" should be the same as 0..-1
+is(join(":","-2".."-1")    , "-2:-1");
+is(join(":","-1".."-1")    , "-1");
+is(join(":","0".."-1")     , "");
+is(join(":","1".."-1")     , "");
+
+# these test the statements made in the documentation
+# regarding the rules of string ranges
+is(join(":","-2".."2"),      join(":",-2..2));
+is(join(":","2.18".."3.14"), "2:3");
+is(join(":","01".."04"),     "01:02:03:04");
+is(join(":","00".."-1"),     "00:01:02:03:04:05:06:07:08:09:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:69:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:86:87:88:89:90:91:92:93:94:95:96:97:98:99");
+is(join(":","00".."31"),     "00:01:02:03:04:05:06:07:08:09:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31");
+is(join(":","ax".."az"),     "ax:ay:az");
+is(join(":","*x".."az"),     "*x");
+is(join(":","A".."Z"),       "A:B:C:D:E:F:G:H:I:J:K:L:M:N:O:P:Q:R:S:T:U:V:W:X:Y:Z");
+is(join(":", 0..9,"a".."f"), "0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f");
+is(join(":","a".."--"),      join(":","a".."zz"));
+is(join(":","0".."xx"),      "0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:69:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:86:87:88:89:90:91:92:93:94:95:96:97:98:99");
+is(join(":","aaa".."--"),    "");
+
 # undef should be treated as 0 for numerical range
 is(join(":",undef..2), '0:1:2');
 is(join(":",-2..undef), '-2:-1:0');
-- 
2.19.2

@p5pRT p5pRT commented Feb 13, 2019

From @tonycoz

On Fri, 30 Nov 2018 06​:09​:07 -0800, haukex@​zero-g.net wrote​:

Hi,

Thanks for looking into this!

The code comment in the code you showed [1] mentions #18165 [2] which
references #18114 [3] where a reply by Slaven Rezic makes sense to me​:
'There is a special handling for numeric strings beginning with a "0".
This is to allow things like "01".."31" to preserve the leading zero
for one-digit numbers.' The basic behavior appears to go all the way
back to 5.000 [4].

[1]
https://perl5.git.perl.org/perl.git/blob/23665de87341f4f3452009759d4fc95ce30b8ced:/pp_ctl.c#l1179
[2] https://rt.perl.org/Public/Bug/Display.html?id=18165
[3] https://rt.perl.org/Public/Bug/Display.html?id=18114
[4] https://perl5.git.perl.org/perl.git/blob/refs/tags/perl-
5.000​:/pp_ctl.c#l694

So my interpretation of the rules is this​: If the left and right
operands are strings, then check if they looks_like_number. If they
do, treat them as integers. However, make an exception when the left-
hand side begins with "0", for the reason stated above.

The key word here is *begins* with zero; the condition
*SvPVX_const(left)!='0' causes this inconsistency​:

-3..-1 and "-3".."-1" are (-3,-2,-1)
-2..-1 and "-2".."-1" are (-2,-1)
-1..-1 and "-1".."-1" are (-1)
1..-1 and "1".."-1" are ()
however:
0..-1 is () but "0".."-1" is (0..99)

That latter behavior may be in line with "01".."-1", which is
("01","02","03",...), but IMO it's still surprising, and in any case
the fact that strings that look like numbers are treated as such
appears to be undocumented.

I have two alternative proposals​: (A) leave the behavior as-is, but
document it, or (B) change the behavior so that the above condition is
'if the LHS is a string that begins with 0, except for the string "0"
itself' (and document it) - this would cause the "01".."31" case to
still work, but also cause "0".."-1" to act like 0..-1.
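Under the same simplifying assumptions as above, option B only narrows the leading-zero exception. It applies when the left operand starts with "0" and is longer than one character, which is the kind of length check `SvCUR` would express. The functions below are hypothetical illustrations of that condition, not the attached patch. `simple_looks_like_number` is again a simplified stand-in for Perl's `looks_like_number`.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Perl's looks_like_number():
 * an optional sign followed by one or more ASCII digits only. */
static bool simple_looks_like_number(const char *s)
{
    if (*s == '+' || *s == '-')
        s++;
    if (*s < '0' || *s > '9')
        return false;
    while (*s >= '0' && *s <= '9')
        s++;
    return *s == '\0';
}

/* Option B: the leading-zero exception only applies when the left operand
 * is longer than one character, so "0" itself counts as a number while
 * "01" still keeps its zero padding via the string range. */
bool range_is_numeric_new(const char *left, const char *right)
{
    return simple_looks_like_number(left)
        && simple_looks_like_number(right)
        && !(left[0] == '0' && strlen(left) > 1);
}
```

In this model "0".."-1" now takes the numeric path, and therefore produces the same empty list as 0..-1. Meanwhile "01".."31" still uses the magical string increment.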

Patches for both A (just document) and B (change behavior) are
attached, with tests included (a full build passes all tests on my
end). My internals knowledge is quite limited so I hope my use of
SvCUR in the second patch is correct.

My personal preference is option B, since it gets rid of the above
inconsistency, but I understand that if there are worries about
backwards compatibility, option A may be better in that respect. The
way I've worded the documentation pretty much nails down the behavior
and wouldn't allow for future changes; a third option might be to word
the documentation more loosely and leave the door open for future
changes.

I think I prefer B too. It would be nice to find out what anyone else thinks.

Unfortunately I don't think I'd want to put a change in behaviour into core at this point in the release cycle.

Tony

@p5pRT

Collaborator Author

@p5pRT p5pRT commented Aug 7, 2019

From @xenu

On Wed, 13 Feb 2019 15:59:02 -0800, tonyc wrote:

Unfortunately I don't think I'd want to put a change in behaviour into
core at this point in the release cycle.

Tony

Now we're in a brand new release cycle, so I think it's time to revisit this ticket.

Personally, I think option B is better; it's unlikely that anything relies on the current (broken) behaviour.

@p5pRT

Collaborator Author

@p5pRT p5pRT commented Aug 8, 2019

From @tonycoz

On Tue, 06 Aug 2019 23:58:10 -0700, me@xenu.pl wrote:

Now we're in a brand new release cycle, so I think it's time to
revisit this ticket.

Personally, I think option B is better; it's unlikely that
anything relies on the current (broken) behaviour.

I've applied the B version to blead, so we should find out whether anything depends on the old behaviour.

Leaving open for now.

Tony

@p5pRT

Collaborator Author

@p5pRT p5pRT commented Aug 27, 2019

From @tonycoz

On Wed, 07 Aug 2019 18:19:54 -0700, tonyc wrote:

I've applied the B version to blead, so we should find out whether
anything depends on the old behaviour.

Leaving open for now.

Closing.

Tony

@p5pRT

Collaborator Author

@p5pRT p5pRT commented Aug 27, 2019

@tonycoz - Status changed from 'open' to 'pending release'

@p5pRT p5pRT closed this Aug 27, 2019
@p5pRT p5pRT added the Severity Low label Oct 19, 2019
Projects
None yet
1 participant