each() can go bad if the callback calls keys or values #210

Closed
schwern opened this Issue Nov 9, 2011 · 15 comments

Comments

Projects
None yet
2 participants
@schwern
Contributor

schwern commented Nov 9, 2011

%hash->each can go into an infinite loop if keys or values is called on %hash inside the callback.

use perl5i::2;
use Test::More;

my %hash = (
    foo => 23,
    bar => 42,
    baz => 99
);

my $i = 0;
note sprintf "This should run %d times", scalar keys %hash;
%hash->each(func($k,$v) {
    pass();
    keys %hash;
    die "We're in an infinite loop" if $i++ > 10;
});

done_testing(3);

The only solution I can see is to have our each to do its own hash iteration. This likely requires XS. Ideally it would be its own CPAN module.

@schwern

This comment has been minimized.

Show comment Hide comment
@schwern

schwern Jul 19, 2012

Contributor

It also goes bad if you nest %hash->each inside %hash->each.

Contributor

schwern commented Jul 19, 2012

It also goes bad if you nest %hash->each inside %hash->each.

@exodist

This comment has been minimized.

Show comment Hide comment
@exodist

exodist Mar 4, 2013

Contributor

This shouldn't need to be hard. I am assuming that the reason ti goes bad is because it uses http://perldoc.perl.org/functions/each.html under the hood which uses an iterator that is part of the hash, and re-uses the same iterator anywhere it is called.

However a pure perl implementation of each() can solve this problem easily:

  sub each(...) {
    my ( $hashref, $callback ) = @_;
    my @keys = keys @$hashref;
    for my $key ( @keys ) {
      $callback->( $key, $hashref->{$key} );
    }
  }

Whatever magic currently attaches each() to the hash could attach that pure perl snippet instead of the builtin and easily solve the problem.

There are of course a lot of assumptions to my comment, I have no idea how autobox actually works, or how hard it would be to put this snippet into place. I also don't know what performance difference there is in the PP solution verse the built-in iterator.

Contributor

exodist commented Mar 4, 2013

This shouldn't need to be hard. I am assuming that the reason ti goes bad is because it uses http://perldoc.perl.org/functions/each.html under the hood which uses an iterator that is part of the hash, and re-uses the same iterator anywhere it is called.

However a pure perl implementation of each() can solve this problem easily:

  sub each(...) {
    my ( $hashref, $callback ) = @_;
    my @keys = keys @$hashref;
    for my $key ( @keys ) {
      $callback->( $key, $hashref->{$key} );
    }
  }

Whatever magic currently attaches each() to the hash could attach that pure perl snippet instead of the builtin and easily solve the problem.

There are of course a lot of assumptions to my comment, I have no idea how autobox actually works, or how hard it would be to put this snippet into place. I also don't know what performance difference there is in the PP solution verse the built-in iterator.

@schwern

This comment has been minimized.

Show comment Hide comment
@schwern

schwern Mar 4, 2013

Contributor

@exodist You're right that your code would be easy to implement, autobox means you're basically writing a method and what you wrote would basically work. See perl5i::2::HASH for example.

The main feature of each is it does not make a copy of all the keys in the hash, avoiding consuming an extra O(N) memory. If our each makes a copy, it defeats the point.

Contributor

schwern commented Mar 4, 2013

@exodist You're right that your code would be easy to implement, autobox means you're basically writing a method and what you wrote would basically work. See perl5i::2::HASH for example.

The main feature of each is it does not make a copy of all the keys in the hash, avoiding consuming an extra O(N) memory. If our each makes a copy, it defeats the point.

@exodist

This comment has been minimized.

Show comment Hide comment
@exodist

exodist Mar 4, 2013

Contributor

http://cpansearch.perl.org/src/AMS/Storable-2.30/Storable.xs

grep the code for 'itera' seems they store the iterator, do their thing, then restore the old value. This would be a good example for writing XS extensions that let you get (as a scalar) and set (as a scalar) the iterator value of a hash. With these methods on a hash one could use this:

sub each() {
  my ( $hash, $callback ) = @_;
  my $old_iter = $hash->iterator();
  my $i = 0;
  while ( my $key = $hash->iter_advance( \$i )) {
    $callback->( $key, $hash->{$key} );
  }
}

iter_advance would be code that sets the iterator to the value in the ref passed in, advances it, sets the iterator to the next value, then returns the key for the iterator that was passed in. Since $i tracks the iterator and stores the value after each step, state is not corrupted by other uses of keys and each.

There are still a lot of unknowns here, it is very hand-wavy... but hopefully it is a good launch point for someone that wants to grab this. If I have any free time in the near future I may try,never done XS before.

Contributor

exodist commented Mar 4, 2013

http://cpansearch.perl.org/src/AMS/Storable-2.30/Storable.xs

grep the code for 'itera' seems they store the iterator, do their thing, then restore the old value. This would be a good example for writing XS extensions that let you get (as a scalar) and set (as a scalar) the iterator value of a hash. With these methods on a hash one could use this:

sub each() {
  my ( $hash, $callback ) = @_;
  my $old_iter = $hash->iterator();
  my $i = 0;
  while ( my $key = $hash->iter_advance( \$i )) {
    $callback->( $key, $hash->{$key} );
  }
}

iter_advance would be code that sets the iterator to the value in the ref passed in, advances it, sets the iterator to the next value, then returns the key for the iterator that was passed in. Since $i tracks the iterator and stores the value after each step, state is not corrupted by other uses of keys and each.

There are still a lot of unknowns here, it is very hand-wavy... but hopefully it is a good launch point for someone that wants to grab this. If I have any free time in the near future I may try,never done XS before.

@exodist

This comment has been minimized.

Show comment Hide comment
@exodist

exodist Mar 4, 2013

Contributor

we might be able to make it safer simply by having the each, keys, values, and as_list methods all store and restore the iterator value after each use, then it is safe to nest them all, so long as you do not combine them with non-autoboxed versions.

Contributor

exodist commented Mar 4, 2013

we might be able to make it safer simply by having the each, keys, values, and as_list methods all store and restore the iterator value after each use, then it is safe to nest them all, so long as you do not combine them with non-autoboxed versions.

@schwern

This comment has been minimized.

Show comment Hide comment
@schwern

schwern Mar 5, 2013

Contributor

Good sleuthing! Progress!

Here's the magic @exodist referenced from Storable.xs...

/*
 * Save possible iteration state via each() on that table.
 */
riter = HvRITER_get(hv);
eiter = HvEITER_get(hv);
hv_iterinit(hv);

/* Restore hash iterator state */
HvRITER_set(hv, riter);
HvEITER_set(hv, eiter);

I checked hv.h and those macros are available in 5.10.0.

I like the idea of wrapping hv_iter_save() and hv_iter_restore() around that XS code and saving and restoring. This avoids having to write all of each in XS and, as @exodist pointed out, makes it available for other methods and should work with nested hash iteration method calls.

The caveat about this not mixing with keys/values is unfortunate, but already true. Its a shame there's not a way to work with the iterator directly.

Contributor

schwern commented Mar 5, 2013

Good sleuthing! Progress!

Here's the magic @exodist referenced from Storable.xs...

/*
 * Save possible iteration state via each() on that table.
 */
riter = HvRITER_get(hv);
eiter = HvEITER_get(hv);
hv_iterinit(hv);

/* Restore hash iterator state */
HvRITER_set(hv, riter);
HvEITER_set(hv, eiter);

I checked hv.h and those macros are available in 5.10.0.

I like the idea of wrapping hv_iter_save() and hv_iter_restore() around that XS code and saving and restoring. This avoids having to write all of each in XS and, as @exodist pointed out, makes it available for other methods and should work with nested hash iteration method calls.

The caveat about this not mixing with keys/values is unfortunate, but already true. Its a shame there's not a way to work with the iterator directly.

@exodist

This comment has been minimized.

Show comment Hide comment
@exodist

exodist Mar 5, 2013

Contributor

The questions is, do we want:

$val = hash->iter_get()
and hash->iter_set($val);
and hash->iter_init;

or do we want:
$hash->iter_push
$hash->iter_pop

where push stores the current iterator in a stack internal to the hash, re-initializes it, then pop restores the last value from the stack?

The benefit of the first one is that you have more control, and if something happens in code down from yours you do not have to worry about mismatched push and pop. The benefit of the second is you don't have to worry about the iterator value stored in the scalar getting corrupted (that is assuming the iterator value can work as a scalar at all, looks like a 32-bit unsigned integer, but I am guessing it stores a hash value, not an index, if someone tried to use ++ on the value what would happen, would it be valid?

Contributor

exodist commented Mar 5, 2013

The questions is, do we want:

$val = hash->iter_get()
and hash->iter_set($val);
and hash->iter_init;

or do we want:
$hash->iter_push
$hash->iter_pop

where push stores the current iterator in a stack internal to the hash, re-initializes it, then pop restores the last value from the stack?

The benefit of the first one is that you have more control, and if something happens in code down from yours you do not have to worry about mismatched push and pop. The benefit of the second is you don't have to worry about the iterator value stored in the scalar getting corrupted (that is assuming the iterator value can work as a scalar at all, looks like a 32-bit unsigned integer, but I am guessing it stores a hash value, not an index, if someone tried to use ++ on the value what would happen, would it be valid?

@schwern

This comment has been minimized.

Show comment Hide comment
@schwern

schwern Mar 5, 2013

Contributor

I think we want the latter. It's safer, easier to manage and associates the
iterator stack with the hash.

Fortunately, we have a mechanism to store meta data on anything without
putting it in that thing. So we don't need to store the iterator stack in
the hash. This is currently used to store signatures with code references.
Long story short, we're using the field hash mechanism for inside out
objects.

If @exodist can take care of writing the XS functions to get and set a hash
iterator, I can take care of the autobox and meta data magic.
On Mar 5, 2013 12:29 PM, "Chad Granum" notifications@github.com wrote:

The questions is, do we want:

$val = hash->iter_get()
and hash->iter_set($val);
and hash->iter_init;

or do we want:
$hash->iter_push
$hash->iter_pop

where push stores the current iterator in a stack internal to the hash,
re-initializes it, then pop restores the last value from the stack?

The benefit of the first one is that you have more control, and if
something happens in code down from yours you do not have to worry about
mismatched push and pop. The benefit of the second is you don't have to
worry about the iterator value stored in the scalar getting corrupted (that
is assuming the iterator value can work as a scalar at all, looks like a
32-bit unsigned integer, but I am guessing it stores a hash value, not an
index, if someone tried to use ++ on the value what would happen, would it
be valid?


Reply to this email directly or view it on GitHubhttps://github.com/schwern/perl5i/issues/210#issuecomment-14463264
.

Contributor

schwern commented Mar 5, 2013

I think we want the latter. It's safer, easier to manage and associates the
iterator stack with the hash.

Fortunately, we have a mechanism to store meta data on anything without
putting it in that thing. So we don't need to store the iterator stack in
the hash. This is currently used to store signatures with code references.
Long story short, we're using the field hash mechanism for inside out
objects.

If @exodist can take care of writing the XS functions to get and set a hash
iterator, I can take care of the autobox and meta data magic.
On Mar 5, 2013 12:29 PM, "Chad Granum" notifications@github.com wrote:

The questions is, do we want:

$val = hash->iter_get()
and hash->iter_set($val);
and hash->iter_init;

or do we want:
$hash->iter_push
$hash->iter_pop

where push stores the current iterator in a stack internal to the hash,
re-initializes it, then pop restores the last value from the stack?

The benefit of the first one is that you have more control, and if
something happens in code down from yours you do not have to worry about
mismatched push and pop. The benefit of the second is you don't have to
worry about the iterator value stored in the scalar getting corrupted (that
is assuming the iterator value can work as a scalar at all, looks like a
32-bit unsigned integer, but I am guessing it stores a hash value, not an
index, if someone tried to use ++ on the value what would happen, would it
be valid?


Reply to this email directly or view it on GitHubhttps://github.com/schwern/perl5i/issues/210#issuecomment-14463264
.

@exodist

This comment has been minimized.

Show comment Hide comment
@exodist

exodist Mar 5, 2013

Contributor

It is doubtful I will have time to do this before the weekend, and I have never touched xs before, but I will see what I can do.

Contributor

exodist commented Mar 5, 2013

It is doubtful I will have time to do this before the weekend, and I have never touched xs before, but I will see what I can do.

@exodist

This comment has been minimized.

Show comment Hide comment
@exodist

exodist Mar 7, 2013

Contributor

I played around with this last night to great success. Tonight (after work) I will finish it into a library I will put on cpan that does what we need for this, then you can pop it into perl5i's autoboxing.

Contributor

exodist commented Mar 7, 2013

I played around with this last night to great success. Tonight (after work) I will finish it into a library I will put on cpan that does what we need for this, then you can pop it into perl5i's autoboxing.

@exodist

This comment has been minimized.

Show comment Hide comment
@exodist

exodist Mar 8, 2013

Contributor

each() is fixed in 6a744ea however it is dependent on Hash::StoredIterator which I just now uploaded to cpan, it might take a bit for it to filter in. You can download it here in the meantime: https://github.com/exodist/Hash-StoredIterator

Contributor

exodist commented Mar 8, 2013

each() is fixed in 6a744ea however it is dependent on Hash::StoredIterator which I just now uploaded to cpan, it might take a bit for it to filter in. You can download it here in the meantime: https://github.com/exodist/Hash-StoredIterator

@exodist

This comment has been minimized.

Show comment Hide comment
@exodist

exodist Mar 8, 2013

Contributor

I am not closing the ticket because:

Contributor

exodist commented Mar 8, 2013

I am not closing the ticket because:

@exodist

This comment has been minimized.

Show comment Hide comment
@exodist

exodist Mar 8, 2013

Contributor

To be clear, here are my docs of the edge case:

sort() edge case
For some reason "[sort hkeys %hash]" and "[sort hkeys(%hash)]" both
result in a list that has all the keys and values (and strangly not
in sorted order). However "[sort(hkeys(%hash))]" works fine.

Contributor

exodist commented Mar 8, 2013

To be clear, here are my docs of the edge case:

sort() edge case
For some reason "[sort hkeys %hash]" and "[sort hkeys(%hash)]" both
result in a list that has all the keys and values (and strangly not
in sorted order). However "[sort(hkeys(%hash))]" works fine.

@schwern

This comment has been minimized.

Show comment Hide comment
@schwern

schwern Mar 8, 2013

Contributor

Great work! Looking forward to nipping this problem once and for all.

Contributor

schwern commented Mar 8, 2013

Great work! Looking forward to nipping this problem once and for all.

@exodist

This comment has been minimized.

Show comment Hide comment
@exodist

exodist Mar 8, 2013

Contributor

It suddenly occured to me why the edge case occurs. sort doesn't just take a block as a special use-case, it can also be sort FUNC_NAME items... or sort FUNC_NAME (items), so the special case is trying to use hkeys as the sort function, and %hash as items. The parser does not do this for sort keys %hash because keys is a keyword, not a function. I don't think my edge case is a bug to be fixed (not sure it could be fixed either)

I will close this ticket because perl5i doesn't suffer from this edge case since it uses each as a method.

Contributor

exodist commented Mar 8, 2013

It suddenly occured to me why the edge case occurs. sort doesn't just take a block as a special use-case, it can also be sort FUNC_NAME items... or sort FUNC_NAME (items), so the special case is trying to use hkeys as the sort function, and %hash as items. The parser does not do this for sort keys %hash because keys is a keyword, not a function. I don't think my edge case is a bug to be fixed (not sure it could be fixed either)

I will close this ticket because perl5i doesn't suffer from this edge case since it uses each as a method.

@exodist exodist closed this Mar 8, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment