Document use of pos() and /\G/
Subject: Re: resetting pos broken in _20 

On Mon, 13 Jan 1997 12:49:24 EST, Ilya Zakharevich wrote:
>Gurusamy Sarathy writes:
>>  What's wrong with saying
>> C<pos $foo = length $foo> after /g fails, to get the behavior
>> you want?
>
>Since this has different semantics. You need to get `pos' before each
>match, and reset it after each failing match.
>
> /=/g; /;/g; /=/g; /;/g;
>
>may give you non-monotonic movement of `pos' over the string, which
>is a bad thing.

Ahh, of course.

>But I still do not understand what you mean by "having pos at
>end". The bug was that position is reset at failing match, probably
>you have some other case in mind?

Never mind, I was missing the possibility of chaining //g matches
with the \G escape :-(

>I did not realize that pos was available at perl 4.?, bug-for-bug
>compatibility may be a reason if this was so for so many years...

The bug fix seems to make a lot of sense (to me) now.  \G was essentially
useless without the new "incompatibility", eh?

Here's a pod update that documents current behavior in all the
places I could think of.

 - Sarathy.
   gsar@engin.umich.edu

p5p-msgid: <199701132013.PAA26606@aatma.engin.umich.edu>
Gurusamy Sarathy authored and Chip Salzenberg committed Jan 15, 1997
1 parent 7c36043 commit b2a07c1
Showing 5 changed files with 67 additions and 4 deletions.
4 changes: 3 additions & 1 deletion pod/perlfunc.pod
@@ -2132,7 +2132,9 @@ like shift().

Returns the offset of where the last C<m//g> search left off for the variable
in question ($_ is used when the variable is not specified). May be
modified to change that offset.
modified to change that offset. Such modification will also influence
the C<\G> zero-width assertion in regular expressions. See L<perlre> and
L<perlop>.
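
The interaction between an assigned C<pos()> and C<\G> can be sketched as follows. This is a minimal illustration, not part of the patch; the string and the variable names C<$str>, C<$hit> and C<$miss> are made up for the example.

```perl
use strict;
use warnings;

my $str = "aXbXc";

# Point the match position at offset 2, where the "b" sits.
pos($str) = 2;
my $hit = ($str =~ /\G(b)/) ? $1 : undef;   # \G anchors the match at offset 2

# Move the position to offset 1 (an "X"); the same pattern cannot match there.
pos($str) = 1;
my $miss = ($str =~ /\G(b)/) ? 1 : 0;

print defined $hit ? "hit: $hit\n" : "no hit\n";
print "second attempt matched: $miss\n";
```

Because C<\G> only matches at the stored offset, moving C<pos()> by hand decides where the next anchored match may begin.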

=item print FILEHANDLE LIST

13 changes: 12 additions & 1 deletion pod/perlnews.pod
@@ -23,7 +23,8 @@ file in the distribution for details.
There is a new Configure question that asks if you want to maintain
binary compatibility with Perl 5.003. If you choose binary
compatibility, you do not have to recompile your extensions, but you
might have symbol conflicts if you embed Perl in another application.
might have symbol conflicts if you embed Perl in another application,
just as in the 5.003 release.

=head2 New Opcode Module and Revised Safe Module

@@ -186,6 +187,16 @@ function whose prototype you want to retrieve.
Functions documented in the Camel to default to $_ now in
fact do, and all those that do are so documented in L<perlfunc>.

=head2 C<m//g> does not trigger a pos() reset on failure

The C<m//g> match iteration construct used to reset the iteration
when it failed to match (so that the next C<m//g> match would start at
the beginning of the string). You now have to explicitly do a
C<pos $str = 0;> to reset the "last match" position, or modify the
string in some way. This change makes it practical to chain C<m//g>
matches together in conjunction with ordinary matches using the C<\G>
zero-width assertion. See L<perlop> and L<perlre>.

=back
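
The chaining of C<m//g> matches with C<\G> described above might look like the following sketch of a tiny tokenizer. It is an illustration under stated assumptions, not code from the patch: the input string and the token names are invented, and the C</c> modifier is used because, in the Perl releases in general use today, a failed C<m//g> resets C<pos()> unless C</c> is given. The behaviour documented in this commit corresponds to what C<m//gc> does there.

```perl
use strict;
use warnings;

my $input = "x=1;y=22;";
my @tokens;

pos($input) = 0;    # start explicitly at the beginning of the string
while (pos($input) < length $input) {
    # Each alternative is anchored with \G at the current position.
    # On failure, /c leaves pos() untouched so the next branch can try.
    if    ($input =~ /\G(\w+)/gc) { push @tokens, "WORD:$1" }
    elsif ($input =~ /\G=/gc)     { push @tokens, "EQ" }
    elsif ($input =~ /\G;/gc)     { push @tokens, "SEMI" }
    else  { die "unexpected character at offset " . pos($input) . "\n" }
}

print join(" ", @tokens), "\n";
```

Every successful branch advances C<pos()> past the token it consumed, so the position only ever moves forward through the string.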

=head2 New Built-in Methods
29 changes: 28 additions & 1 deletion pod/perlop.pod
@@ -695,7 +695,10 @@ In a scalar context, C<m//g> iterates through the string, returning TRUE
each time it matches, and FALSE when it eventually runs out of
matches. (In other words, it remembers where it left off last time and
restarts the search at that point. You can actually find the current
match position of a string using the pos() function--see L<perlfunc>.)
match position of a string or set it using the pos() function--see
L<perlfunc/pos>.) Note that you can use this feature to stack C<m//g>
matches or intermix C<m//g> matches with C<m/\G.../>.

If you modify the string in any way, the match position is reset to the
beginning. Examples:

@@ -711,6 +714,30 @@ beginning. Examples:
    }
    print "$sentences\n";

    # using m//g with \G
    $_ = "ppooqppq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/g; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/; print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/g; print "', pos=", pos, "\n";
    }

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=4
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=7
    3: '', pos=7

Note how C<m//g> matches change the value reported by C<pos()>, but the
non-global match doesn't.


=item q/STRING/

=item C<'STRING'>
5 changes: 4 additions & 1 deletion pod/perlre.pod
@@ -174,7 +174,10 @@ represents backspace rather than a word boundary.) The C<\A> and C<\Z> are
just like "^" and "$" except that they won't match multiple times when the
C</m> modifier is used, while "^" and "$" will match at every internal line
boundary. To match the actual end of the string, not ignoring newline,
you can use C<\Z(?!\n)>.
you can use C<\Z(?!\n)>. The C<\G> assertion can be used to mix global
matches (using C<m//g>) and non-global ones, as described in L<perlop>.
The actual location where C<\G> will match can also be influenced
by using C<pos()> as an lvalue. See L<perlfunc/pos>.

When the bracketing construct C<( ... )> is used, \E<lt>digitE<gt> matches the
digit'th substring. Outside of the pattern, always use "$" instead of "\"
20 changes: 20 additions & 0 deletions pod/perltrap.pod
@@ -1108,6 +1108,26 @@ repeatedly, like C</x/> or C<m!x!>.
# perl5 prints: perl5


=item * Regular Expression

Under perl4 and up to version 5.003, a failed C<m//g> match used to
reset the internal iterator, so that subsequent C<m//g> match attempts
began from the beginning of the string. In perl version 5.004 and later,
failed C<m//g> matches do not reset the iterator position (which can be
found using the C<pos()> function--see L<perlfunc/pos>).

    $test = "foop";
    for (1..3) {
        print $1 while ($test =~ /(o)/g);
        # pos $test = 0; # to get old behavior
    }

    # perl4 prints: oooooo
    # perl5.004 prints: oo

You may always reset the iterator yourself as shown in the commented line
to get the old behavior.

=back
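
The two iterator behaviours contrasted in the C<m//g> trap above can be compared side by side with a small sketch. It assumes a present-day Perl, where a failed C<m//g> resets C<pos()> unless the C</c> modifier is given; plain C</g> therefore shows the resetting behaviour and C</gc> the non-resetting one. The helper C<collect> and its flag are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# Run the loop from the trap entry three times over "foop" and
# return everything that was captured.
sub collect {
    my ($keep_pos) = @_;
    my $test = "foop";
    my $out  = "";
    for (1 .. 3) {
        if ($keep_pos) {
            # /c: a failed match leaves pos($test) where it was
            $out .= $1 while $test =~ /(o)/gc;
        }
        else {
            # no /c: a failed match resets pos($test) to the start
            $out .= $1 while $test =~ /(o)/g;
        }
    }
    return $out;
}

my $with_reset    = collect(0);
my $without_reset = collect(1);

print "iterator reset on failure: $with_reset\n";
print "iterator kept on failure:  $without_reset\n";
```

Assigning C<pos($test) = 0> inside the loop, as in the commented line of the trap entry, would make the C</gc> variant behave like the resetting one.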

=head2 Subroutine, Signal, Sorting Traps