Skip to content

each() can go bad if the callback calls keys or values #210

schwern opened this Issue Nov 9, 2011 · 15 comments

2 participants

schwern commented Nov 9, 2011

%hash->each can go into an infinite loop if keys or values is called on %hash inside the callback.

use perl5i::2;
use Test::More;

my %hash = (
    foo => 23,
    bar => 42,
    baz => 99

my $i = 0;
note sprintf "This should run %d times", scalar keys %hash;
%hash->each(func($k,$v) {
    keys %hash;
    die "We're in an infinite loop" if $i++ > 10;


The only solution I can see is to have our each to do its own hash iteration. This likely requires XS. Ideally it would be its own CPAN module.

schwern commented Jul 19, 2012

It also goes bad if you nest %hash->each inside %hash->each.

exodist commented Mar 4, 2013

This shouldn't need to be hard. I am assuming that the reason ti goes bad is because it uses under the hood which uses an iterator that is part of the hash, and re-uses the same iterator anywhere it is called.

However a pure perl implementation of each() can solve this problem easily:

  sub each(...) {
    my ( $hashref, $callback ) = @_;
    my @keys = keys @$hashref;
    for my $key ( @keys ) {
      $callback->( $key, $hashref->{$key} );

Whatever magic currently attaches each() to the hash could attach that pure perl snippet instead of the builtin and easily solve the problem.

There are of course a lot of assumptions to my comment, I have no idea how autobox actually works, or how hard it would be to put this snippet into place. I also don't know what performance difference there is in the PP solution verse the built-in iterator.

schwern commented Mar 4, 2013

@exodist You're right that your code would be easy to implement, autobox means you're basically writing a method and what you wrote would basically work. See perl5i::2::HASH for example.

The main feature of each is it does not make a copy of all the keys in the hash, avoiding consuming an extra O(N) memory. If our each makes a copy, it defeats the point.

exodist commented Mar 4, 2013

grep the code for 'itera' seems they store the iterator, do their thing, then restore the old value. This would be a good example for writing XS extensions that let you get (as a scalar) and set (as a scalar) the iterator value of a hash. With these methods on a hash one could use this:

sub each() {
  my ( $hash, $callback ) = @_;
  my $old_iter = $hash->iterator();
  my $i = 0;
  while ( my $key = $hash->iter_advance( \$i )) {
    $callback->( $key, $hash->{$key} );

iter_advance would be code that sets the iterator to the value in the ref passed in, advances it, sets the iterator to the next value, then returns the key for the iterator that was passed in. Since $i tracks the iterator and stores the value after each step, state is not corrupted by other uses of keys and each.

There are still a lot of unknowns here, it is very hand-wavy... but hopefully it is a good launch point for someone that wants to grab this. If I have any free time in the near future I may try,never done XS before.

exodist commented Mar 4, 2013

we might be able to make it safer simply by having the each, keys, values, and as_list methods all store and restore the iterator value after each use, then it is safe to nest them all, so long as you do not combine them with non-autoboxed versions.

schwern commented Mar 5, 2013

Good sleuthing! Progress!

Here's the magic @exodist referenced from Storable.xs...

 * Save possible iteration state via each() on that table.
riter = HvRITER_get(hv);
eiter = HvEITER_get(hv);

/* Restore hash iterator state */
HvRITER_set(hv, riter);
HvEITER_set(hv, eiter);

I checked hv.h and those macros are available in 5.10.0.

I like the idea of wrapping hv_iter_save() and hv_iter_restore() around that XS code and saving and restoring. This avoids having to write all of each in XS and, as @exodist pointed out, makes it available for other methods and should work with nested hash iteration method calls.

The caveat about this not mixing with keys/values is unfortunate, but already true. Its a shame there's not a way to work with the iterator directly.

exodist commented Mar 5, 2013

The questions is, do we want:

$val = hash->iter_get()
and hash->iter_set($val);
and hash->iter_init;

or do we want:

where push stores the current iterator in a stack internal to the hash, re-initializes it, then pop restores the last value from the stack?

The benefit of the first one is that you have more control, and if something happens in code down from yours you do not have to worry about mismatched push and pop. The benefit of the second is you don't have to worry about the iterator value stored in the scalar getting corrupted (that is assuming the iterator value can work as a scalar at all, looks like a 32-bit unsigned integer, but I am guessing it stores a hash value, not an index, if someone tried to use ++ on the value what would happen, would it be valid?

exodist commented Mar 5, 2013

It is doubtful I will have time to do this before the weekend, and I have never touched xs before, but I will see what I can do.

exodist commented Mar 7, 2013

I played around with this last night to great success. Tonight (after work) I will finish it into a library I will put on cpan that does what we need for this, then you can pop it into perl5i's autoboxing.

exodist commented Mar 8, 2013

each() is fixed in 6a744ea however it is dependent on Hash::StoredIterator which I just now uploaded to cpan, it might take a bit for it to filter in. You can download it here in the meantime:

exodist commented Mar 8, 2013

I am not closing the ticket because:

exodist commented Mar 8, 2013

To be clear, here are my docs of the edge case:

sort() edge case
For some reason "[sort hkeys %hash]" and "[sort hkeys(%hash)]" both
result in a list that has all the keys and values (and strangly not
in sorted order). However "[sort(hkeys(%hash))]" works fine.

exodist commented Mar 8, 2013

It suddenly occured to me why the edge case occurs. sort doesn't just take a block as a special use-case, it can also be sort FUNC_NAME items... or sort FUNC_NAME (items), so the special case is trying to use hkeys as the sort function, and %hash as items. The parser does not do this for sort keys %hash because keys is a keyword, not a function. I don't think my edge case is a bug to be fixed (not sure it could be fixed either)

I will close this ticket because perl5i doesn't suffer from this edge case since it uses each as a method.

@exodist exodist closed this Mar 8, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.