Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Fix each() to be attached to the operator, not the data #142

Closed
schwern opened this Issue · 11 comments

2 participants

@schwern
Owner

each() is dangerous because the iterator is attached to the data being operated on. For example...

while( my($k,$v) = each %h ) {
    foo(\%h);
}

sub foo {
    my $h = shift;

    while( my($k, $v) = each %$h ) {
        ...
    }
}

Both calls to each() share the same iterator. They should not. The iterator should be attached to the operator. How do you do this? One possibility is to use Devel::Declare to replace "each %hash" with "safe_each 1234, \%hash" where 1234 is a unique identifier for that call to safe each. The iterator is stored by %hash->mo->id and each ident.

@cowens

This should be trivial, he said with the air of man about to try something really difficult. I see two cases to deal with: /each \s* (?: % | @ )/x and /each \s* \( \s* (?: % | @ )/. Does Devel::Declare grab methods (i.e. $foo->each)?

@schwern
Owner

No, I don't think it does. You should be able to just catch the keyword, replace it with "perl5i::2::safe_each $identifier, " where perl5i::2::safe_each has a prototype of $\%.

Problem is, I don't think Devel::Declare will pick up a keyword.

@cowens

Ah, because it picks up keywords that haven't been defined yet. Yes, I think I remember something about that as well. Maybe the Pluggable Keywords will allow it. Must spend some time investigating this.

@cowens

Ah, caller may be the answer, override CORE::each and use caller to determine the location of the call to each. Use the location as the index to the iterator. How does that sound? Unfortunately, caller can only return file and line location information not character position, so two eachs on the same line would have the same iterator. Of course, we don't have to limit ourselves to just what Perl can do; we can use XS. Do you know how warn gets the character offset in the messages that have them?

@cowens

Stumbling block: keys and values are documented as reseting the each iterator. How will they know which one to reset? If we document that they don't reset the each iterator, how will can you reset it? You don't know the magic number being used to find the iterator. Maybe it would be better to just use %h->each(ID). This would also have the benefit of letting you still doing things like iterate up to point A in the hash in one loop and from point A to the end in another like you can do today. Backwards compatibility would be maintained by making a call to each without an ID use a default ID. We would also need to change keys and values to take the same ID. I think this puts it well within what I can do, I am going to try to have a working prototype by Monday.

@schwern
Owner

I'd say change keys and values to not reset the iterator. Once the iterator is tied to the position and reference, most of the use cases for resetting it goes away. I can think of one edge case. Let's say you have a subroutine which contains...

while( my($k, $v) = each $hashref ) {
    last if $k eq 'foo';
}

and the same hash ref is passed to it twice. The second time through it will have an unreset iterator. Its a shame there can't be an implied reset before the loop, I think that requires making too many assumptions.

I don't like %h->each(ID) because how do you pick an ID that isn't going to clash with someone else? And its ugly. I like your idea of using the file and line number better.

Its also worth considering that the major use case for each goes away once we have nfor and can write nfor my($k, $v) (%hash), so maybe this isn't worthwhile.

I think what each exposes is that iteration needs a way to know when its stopped. That implies a block syntax. %hash->each { ... } Then when the loop terminates, either from running out of items or via last, the iterator can be reset.

@cowens

One of each's strengths is that I can break up the iteration. For instance, I use code like this when I write benchmarks:

my ($first_name, $sub) = each %subs;
my $result = $sub->();
while (my ($name, $sub) = each %subs) {
    my $cur_result = $sub->();
    unless ($result ~~ $cur_result) {
        print "$name doesn't agree with $first_name\n";
    }
} 

This relies on being able to use the same iterator for both of the eaches; if they are not, then I run one of the functions twice. Fixing each would break this code.

Perhaps we could return the iterator number in list context:

my ($first_name, $sub, $iter) = %subs->each;
my $result = $sub->();
while (my ($name, $sub) = %subs->each($iter)) {
    my $cur_result = $sub->();
    unless ($result ~~ $cur_result) {
        print "$name doesn't agree with $first_name\n";
    }
} 
@schwern
Owner

I think we should split the use cases. An iterator associated with the data structure for doing Clever Things and one that's associated with the data + call for Mere Mortals.

@cowens

Here is a proof of concept. There are at least three problems with it: we have to use "0e0" to get around the while loop, the hash implementation is incredibly inefficient and pretty much defeats the purpose of each, and if you call each twice on the same line with the same variable you get the same iterator. We also need to go back and determine exactly what calling delete does to the iterator and what happens when we add keys to hashes or splice onto an array. There may also be problems related to using refaddr as part of the key (I can't remember if variables reuse addresses) and if we can't use refaddr then there can only be one call to each per line.

edit: minor fix to use 0 when in list context

#!/usr/bin/perl

use 5.012;
use warnings;
use Scalar::Util qw/refaddr reftype/;

use subs "each";

sub each(\[@%]) {
    goto &safe_each_hash  if "HASH" eq reftype $_[0];
    goto &safe_each_array;
}

sub safe_each_hash {
    my $h    = shift;
    my $key  = join "/", (caller)[1,2], refaddr $h;
    state %iter;

    unless (exists $iter{$key}) {
        $iter{$key} = [ keys %$h ];
    }

    unless (@{$iter{$key}}) {
        delete $iter{$key};
        return;
    }

    my $cur = shift @{$iter{$key}};

    return wantarray ? ($cur, $h->{$cur}) : $cur;
}

sub safe_each_array {
    my $array = shift;
    my $key   = join "/", (caller)[1,2], refaddr $array;
    state %iter;

    if (exists $iter{$key}) {
        $iter{$key}++;
    } else {
        $iter{$key} = "0e0";
    }

    if ($iter{$key} >= @$array) {
        delete $iter{$key};
        return;
    }

    return wantarray ? ($iter{$key}, 0+$array->[$iter{$key}]) : $iter{$key};
}

my %h = (a => 1, b => 2);
while (my $k = each %h) {
    say "key $k";
    while (my ($k, $v) = each %h) {
        say "\t$k => $v";
    }
}

my @a = qw/a b c/;

while (my $index = each @a) {
    say "index $index";
    while (my ($index, $value) = each @a) {
        say "\t$index $value";
    }
}
@schwern
Owner

After using jQuery, thinking about subroutine signatures and seeing the complexity of fixing each() I think this will be much better served by a method with a callback.

%hash->each( def($k, $v) {
    say "$k => $v";
});

Its simple and clear. We just need anonymous functions with signatures which are on the way.

@schwern
Owner

That should do it, I'm going to call this done.

@schwern schwern closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.