Issue/145 #216

Merged
merged 14 commits into from Dec 7, 2011

Projects

None yet

3 participants

Contributor
qrazhan commented Nov 29, 2011

Updated the picking algorithm.

Using as_hash is not safe, you cannot trust that values will be set to 1. It's only documented to have true values.

Because the values in %arr_hash happen to be set to 1 by as_hash, you've got one too many.

This is all better done in a single for loop.

for my $val (@$array) { $arr_hash{$val}++ }

Very clever, but overkill. Your tests below wouldn't work if it returned anything but a scalar.

That is makes the mistake of assuming that a true value will be 1. It may not be so. Use ok.

You can cut out a lot of those temporary variables.

my $elem = $array->pick_one;
ok $array->as_hash->{$elem};

This is taking advantage of as_hash returning a hash reference in scalar context. We can chain off of that without having to assign the hash ref to its own variable, just like you can chain method calls that return objects.

Cuddled elses are discouraged in Perl. It crams things together and makes it harder to see the structure.

my @arr = (undef, undef, undef) works without the temporary variables.

That isn't right. pick() shouldn't care at all about the contents of the array.

You can just call it "pick()". The rest can be figured out from context.

What's with the commented out bit?

Owner

That's my unfinished code, I just wanted to commit what I had so far before the deadline on GCI.

@qrazhan As long as things are proceeding, don't sweat the deadline. Just let us know when you need an extension.

Use vertical whitespace to give visual breaks between tests.

my @array = (1,2,3);
pick_one_ok(\@array);

@array = qw(a b c);
pick_one_ok(\@array);

If you're only gong to use the array once, you don't need the variable. Sometimes this reads better.

pick_ok( [1, 1, 2, 2, 3, 3] => 6 );

I also used the "fat comma" instead of the regular comma for visual clarity, its otherwise the same in this case. This is handy when there's pairs of things or you want more visual clarity between arguments.

However, it is important to note that '=>' will quote a value on the left in many cases:

1 => 2 # will quote 1 so that it is "1" (not a big deal in perl)
foo => 'bar' #will quote 'foo' for you, which may not be what you want if foo is actually a function call (easily fixed by adding parens)

In most cases, specially [...] => ... the => will act just like a comma, but the quoting can bite you if you use a keyword or function, or special thing on the left. But I agree with schwern that the visual difference is well worth it.

schwern commented on 99e6b80 Dec 1, 2011

This test fails for me on the "undefined elements" tests.

This does not count the number of times an element is seen in @rand, just that it was seen at all. It eliminates duplicates.

This algorithm doesn't work to detect duplicates, @rand->as_hash has already eliminated them.

A better algorithm iterates through the picks subtracting from the count hash and checking immediately.

for my $val (@picks) {
    $array_counts{$val}--;
    return 0 if $array_counts{$val} < 0;
}
return 1;

It's important to make sure your tests fail. You could do this by introducing a bug into pick(). You can also test your test functions. If you structure pick_ok() it becomes testable (good chunk of testing is making things testable).

Essentially what we're doing is a set operation. We want to determine if @rand is a subset of $array. So write an is_subset routine just for that. It looks like pick_ok(), but it takes two arrays and returns true or false. Then you write pick_ok() around that.

func pick_ok($array, $num) {
    my @picks = $array->pick($num);

    my $num_picks = $num > @$array ? @$array : $num;
    is @picks, $num_picks;

    ok is_subset($array, \@picks), "is_subset" or diag explain \@picks;
}

The advantage is you can test is_subset() directly. ok is_subset( [1,2,3,4], [1,1,1] ) and ok is_subset( [1,1,2,2,3,3], [1,1,2] );

The ok here is redundant. pick_ok already checks that is_subtest is ok.

If is_subtest returns false, the tester doesn't know why. Failure diagnostics are important, especially in tests involving random numbers where you might not be able to generate the same result twice. Because we're using ok, you have to provide diagnostics. ok returns false on failure, so we can chain an or diag on the end if it fails.

    ok( is_subtest($array, \@rand) ) or diag sprintf <<END, explain $array, explain \@rand;
set: %s
subset: %s
END

explain is a Test::More function which dumps data structures. <<END is called a "here-doc" and lets you write out a multi-line string in a wysiwyg way. The end result is when the function fails the tester will see the contents of the set and subset and be able to debug what went wrong.

Similar issue with the diagnostics as with pick_one.

In addition, there's a subtle issue with writing your own test functions as wrappers around Test::More functions. Test::More reports where the test fails from by looking up the call stack a fixed number of times. It stores this in $Test::Builder::Level. If you wrap an extra layer of test functions around ok you have to increment this, but only do it in this localized context. To do this you use local. local temporarily assigns a new value to a global, and then restores the old one at the end of the block. Because of the way local works, you can't just increment. You have to local $Test::Builder::Level = $Test::Builder::Level + 1; before the ok. pick_one needs this treatment as well.

This allows failures to be reported at the point where pick_one_ok was called, and not at the ok inside pick_one_ok. This helps testers debug the failure by knowing which test failed.

A bit better to just convert $val once. $val = safe_key($val). Then you won't forget later.

This isn't all the redundant ok's.

An optimization is to calculate safe_key($elem) outside the grep. This is a test, so optimizing isn't important, but its good practice to look for things you can pull outside a loop.

No $studlyCaps please. $use_this_style, it's easier to read. And $number is sufficient.

No discussion of the underlying algorithm in the docs. Documentation is a contract, and it's good to be as unspecific as possible in a contract to give us the most wiggle room in the future.

@schwern schwern closed this Dec 6, 2011
@schwern schwern reopened this Dec 6, 2011
@schwern schwern merged commit 96b9b05 into evalEmpire:master Dec 7, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment