Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Issue/145 #216

Merged
merged 14 commits into from

3 participants

@qrazhan

Updated the picking algorithm.

@schwern
Owner

Using as_hash is not safe, you cannot trust that values will be set to 1. It's only documented to have true values.

Because the values in %arr_hash happen to be set to 1 by as_hash, you've got one too many.

This is all better done in a single for loop.

for my $val (@$array) { $arr_hash{$val}++ }
@schwern
Owner

Very clever, but overkill. Your tests below wouldn't work if it returned anything but a scalar.

@schwern
Owner

That is makes the mistake of assuming that a true value will be 1. It may not be so. Use ok.

You can cut out a lot of those temporary variables.

my $elem = $array->pick_one;
ok $array->as_hash->{$elem};

This is taking advantage of as_hash returning a hash reference in scalar context. We can chain off of that without having to assign the hash ref to its own variable, just like you can chain method calls that return objects.

@schwern
Owner

Cuddled elses are discouraged in Perl. It crams things together and makes it harder to see the structure.

@schwern
Owner

my @arr = (undef, undef, undef) works without the temporary variables.

@schwern
Owner

That isn't right. pick() shouldn't care at all about the contents of the array.

@schwern
Owner

You can just call it "pick()". The rest can be figured out from context.

@schwern
Owner

What's with the commented out bit?

That's my unfinished code, I just wanted to commit what I had so far before the deadline on GCI.

Owner

@qrazhan As long as things are proceeding, don't sweat the deadline. Just let us know when you need an extension.

@schwern
Owner

Use vertical whitespace to give visual breaks between tests.

my @array = (1,2,3);
pick_one_ok(\@array);

@array = qw(a b c);
pick_one_ok(\@array);
@schwern
Owner

If you're only gong to use the array once, you don't need the variable. Sometimes this reads better.

pick_ok( [1, 1, 2, 2, 3, 3] => 6 );

I also used the "fat comma" instead of the regular comma for visual clarity, its otherwise the same in this case. This is handy when there's pairs of things or you want more visual clarity between arguments.

However, it is important to note that '=>' will quote a value on the left in many cases:

1 => 2 # will quote 1 so that it is "1" (not a big deal in perl)
foo => 'bar' #will quote 'foo' for you, which may not be what you want if foo is actually a function call (easily fixed by adding parens)

In most cases, specially [...] => ... the => will act just like a comma, but the quoting can bite you if you use a keyword or function, or special thing on the left. But I agree with schwern that the visual difference is well worth it.

@schwern
Owner

This test fails for me on the "undefined elements" tests.

@schwern
Owner

This does not count the number of times an element is seen in @rand, just that it was seen at all. It eliminates duplicates.

@schwern
Owner

This algorithm doesn't work to detect duplicates, @rand->as_hash has already eliminated them.

A better algorithm iterates through the picks subtracting from the count hash and checking immediately.

for my $val (@picks) {
    $array_counts{$val}--;
    return 0 if $array_counts{$val} < 0;
}
return 1;

It's important to make sure your tests fail. You could do this by introducing a bug into pick(). You can also test your test functions. If you structure pick_ok() it becomes testable (good chunk of testing is making things testable).

Essentially what we're doing is a set operation. We want to determine if @rand is a subset of $array. So write an is_subset routine just for that. It looks like pick_ok(), but it takes two arrays and returns true or false. Then you write pick_ok() around that.

func pick_ok($array, $num) {
    my @picks = $array->pick($num);

    my $num_picks = $num > @$array ? @$array : $num;
    is @picks, $num_picks;

    ok is_subset($array, \@picks), "is_subset" or diag explain \@picks;
}

The advantage is you can test is_subset() directly. ok is_subset( [1,2,3,4], [1,1,1] ) and ok is_subset( [1,1,2,2,3,3], [1,1,2] );

@schwern
Owner

The ok here is redundant. pick_ok already checks that is_subtest is ok.

@schwern
Owner

If is_subtest returns false, the tester doesn't know why. Failure diagnostics are important, especially in tests involving random numbers where you might not be able to generate the same result twice. Because we're using ok, you have to provide diagnostics. ok returns false on failure, so we can chain an or diag on the end if it fails.

    ok( is_subtest($array, \@rand) ) or diag sprintf <<END, explain $array, explain \@rand;
set: %s
subset: %s
END

explain is a Test::More function which dumps data structures. <<END is called a "here-doc" and lets you write out a multi-line string in a wysiwyg way. The end result is when the function fails the tester will see the contents of the set and subset and be able to debug what went wrong.

@schwern
Owner

Similar issue with the diagnostics as with pick_one.

In addition, there's a subtle issue with writing your own test functions as wrappers around Test::More functions. Test::More reports where the test fails from by looking up the call stack a fixed number of times. It stores this in $Test::Builder::Level. If you wrap an extra layer of test functions around ok you have to increment this, but only do it in this localized context. To do this you use local. local temporarily assigns a new value to a global, and then restores the old one at the end of the block. Because of the way local works, you can't just increment. You have to local $Test::Builder::Level = $Test::Builder::Level + 1; before the ok. pick_one needs this treatment as well.

This allows failures to be reported at the point where pick_one_ok was called, and not at the ok inside pick_one_ok. This helps testers debug the failure by knowing which test failed.

@schwern
Owner

A bit better to just convert $val once. $val = safe_key($val). Then you won't forget later.

@schwern
Owner

This isn't all the redundant ok's.

@schwern
Owner

An optimization is to calculate safe_key($elem) outside the grep. This is a test, so optimizing isn't important, but its good practice to look for things you can pull outside a loop.

@schwern

No $studlyCaps please. $use_this_style, it's easier to read. And $number is sufficient.

@schwern

No discussion of the underlying algorithm in the docs. Documentation is a contract, and it's good to be as unspecific as possible in a contract to give us the most wiggle room in the future.

@schwern schwern closed this
@schwern schwern reopened this
@schwern schwern merged commit 96b9b05 into evalEmpire:master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Nov 29, 2011
  1. @qrazhan @schwern

    Add @array->pick() and @array->pick_one().

    qrazhan authored schwern committed
    For #145
  2. @schwern

    Whitespace and style fixups.

    schwern authored
    For #145
  3. @qrazhan

    Updating picking algorithm #145

    qrazhan authored
Commits on Nov 30, 2011
  1. @qrazhan

    Fixed style problems in pick method. Also updated pick.t to check if …

    qrazhan authored
    …an element occurs more than once.
  2. @qrazhan

    Fixed a typo in pick.t

    qrazhan authored
  3. @qrazhan
Commits on Dec 1, 2011
  1. @qrazhan
Commits on Dec 4, 2011
  1. @qrazhan

    Fixed errors in pick.t #145

    qrazhan authored
Commits on Dec 5, 2011
  1. @qrazhan
  2. @qrazhan

    fixed pick_one_ok() #145

    qrazhan authored
  3. @qrazhan

    added documentation

    qrazhan authored
  4. @qrazhan
Commits on Dec 7, 2011
  1. @qrazhan

    Fixed word wrap in comments

    qrazhan authored
  2. @qrazhan

    Made pick more readable

    qrazhan authored
This page is out of date. Refresh to see the latest.
Showing with 162 additions and 0 deletions.
  1. +24 −0 lib/perl5i.pm
  2. +38 −0 lib/perl5i/2/ARRAY.pm
  3. +100 −0 t/pick.t
View
24 lib/perl5i.pm
@@ -635,6 +635,30 @@ Example usage:
my %hash = @array->as_hash;
say q[@array contains 'a'] if $hash{"a"};
+=head3 pick
+
+ my @rand = @array->pick($number);
+
+The pick() method returns a list of $number elements in @array.
+If $number is larger than the size of the list, it returns the entire list shuffled.
+
+Example usage:
+
+ my @array = (1, 2, 3, 4);
+ my @rand = @array->pick(2);
+
+=head3 pick_one
+
+ my $rand = @array->pick_one;
+
+The pick_one() method returns a random element in @array.
+It is similar to @array->pick(1), except that it does not return a list.
+
+Example usage:
+
+ my @array = (1,2,3,4);
+ my $rand = @array->pick_one;
+
=head3 diff
Calculate the difference between two (or more) arrays:
View
38 lib/perl5i/2/ARRAY.pm
@@ -49,6 +49,44 @@ method as_hash{
return wantarray ? %result : \%result;
}
+
+method pick ( $num ){
+ Carp::croak("pick() takes the number of elements to pick")
+ unless defined $num;
+ Carp::croak("pick() takes a positive integer or zero, not '$num'")
+ unless $num->is_integer && ($num->is_positive or $num == 0);
+
+ if($num >= @$self){
+ my @result = List::Util::shuffle(@$self);
+ return wantarray ? @result : \@result;
+ }
+
+ # for the first position in the array, generate a random number that gives
+ # that element an n/N chance of being picked (where n is the number of elements to pick and N is the total array size);
+ # repeat for the rest of the array, each time altering the probability of
+ # the element being picked to reflect the number of elements picked so far and the number left.
+ my $num_left = @$self;
+ my @result;
+ my $i=0;
+ while($num > 0){
+ my $rand = int(rand($num_left));
+ if($rand < $num){
+ push(@result, $self->[$i]);
+ $num--;
+ }
+ $num_left--;
+ $i++;
+ }
+
+ return wantarray ? @result : \@result;
+}
+
+
+method pick_one() {
+ return @$self[int rand @$self];
+}
+
+
method grep($filter) {
my @result = CORE::grep { $_ ~~ $filter } @$self;
View
100 t/pick.t
@@ -0,0 +1,100 @@
+ #!/usr/bin/perl
+
+use lib 't/lib';
+use perl5i::latest;
+use Test::More;
+use Test::perl5i;
+
+func pick_ok($array, $num) {
+ local $Test::Builder::Level = $Test::Builder::Level +1;
+ my @rand = $array->pick($num);
+ if($num <= @$array){
+ is @rand, $num;
+ }
+ else{
+ is @rand, @$array;
+ }
+ ok (is_subset($array, \@rand)) or diag sprintf <<END, explain $array, explain \@rand;
+set: %s
+subset: %s
+END
+}
+
+func is_subset($array, $sub){
+ my %arr_hash;
+ for my $val (@$array){
+ $val = safe_key($val);
+ $arr_hash{$val}++;
+ }
+ for my $val (@$sub){
+ $val = safe_key($val);
+ $arr_hash{$val}--;
+ return 0 if $arr_hash{$val} < 0;
+ }
+ return 1;
+}
+
+func pick_one_ok($array){
+ local $Test::Builder::Level = $Test::Builder::Level +1;
+ my $elem = @$array->pick_one;
+ ok grep safe_key($_) eq safe_key($elem), @$array;
+}
+
+func safe_key($val){
+ return defined $val ? $val : "__UNDEFINED__";
+}
+
+note 'is_subset';{
+ ok !(is_subset([1,2,3,4] , [1,1,1]));
+
+ ok !(is_subset([1,1,1,1] , [1,2]));
+
+ ok is_subset([1,2,3,4] , [1,2]);
+}
+note 'pick()'; {
+ my @arr = qw(a b c d e f g h i);
+ pick_ok(\@arr, 5);
+
+ pick_ok(\@arr, 9);
+
+ pick_ok(\@arr, 100);
+
+ pick_ok(\@arr, 0);
+}
+
+note 'pick with undefined elements';{
+ pick_ok([undef,undef,undef] => 2);
+
+}
+
+note 'pick method with duplicate elements';{
+ pick_ok([1,1,2,2,3,3] => 6);
+}
+
+note "pick with no args"; {
+ my @array = (1, 2, 3);
+ throws_ok { @array->pick(); }
+ qr{^\Qpick() takes the number of elements to pick at $0 line };
+}
+
+note "pick with negative arg"; {
+ my @array = (1, 2, 3);
+ throws_ok { @array->pick(-20); }
+ qr{^\Qpick() takes a positive integer or zero, not '-20' at $0 line };
+}
+
+note "pick with non-numerical argument"; {
+ my @array = (1, 2, 3);
+ throws_ok { @array->pick("rawr"); }
+ qr{^\Qpick() takes a positive integer or zero, not 'rawr' at $0 line };
+}
+
+note "pick_one method";{
+ pick_one_ok([1,2,3,4,4]);
+
+ pick_one_ok(["a","b","c","d","e"]);
+
+ pick_one_ok([undef, undef, undef, undef]);
+}
+
+done_testing;
Something went wrong with that request. Please try again.