Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#296] Added chunked(long), chunked(Predicate<T>) #320

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
116 changes: 116 additions & 0 deletions src/main/java/org/jooq/lambda/Iterators.java
@@ -0,0 +1,116 @@
package org.jooq.lambda;

import java.util.Iterator;
import java.util.function.Predicate;

/**
* Created by billoneil on 7/26/17.
*/
public class Iterators {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please make this class package-private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to make these package-private oops.

private enum PredicateCheckState { NOT_TESTED, MORE, ONE_MORE, NO_MORE }

static <E> PeekingIterator<E> peeking(Iterator<E> iterator) {
return new PeekingIteratorImpl<E>(iterator);
}

static <E> Iterator<E> takeWhileIterator(Iterator<E> iterator, Predicate<E> predicate) {
return takeWhileIterator(peeking(iterator), predicate);
}

static <E> Iterator<E> takeWhileIterator(PeekingIterator<E> iterator, Predicate<E> predicate) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling that we keep writing this iterator :) Perhaps, can this be refactored with existing code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to use the Seq.take* but it closed the stream after the predicate matched and we need it to stay open. I will look into a better way to refactor this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to say: Perhaps Seq.take should use this new iterator. In that case, we should open up several tickets and do things one step at a time...

return new Iterator<E>() {

private final PeekingIterator<E> peeking = iterator;
private PredicateCheckState state = PredicateCheckState.NOT_TESTED;

@Override
public boolean hasNext() {
// Short circuit since its not possible
if (!peeking.hasNext()) {
state = PredicateCheckState.NOT_TESTED;
return false;
}

// We want the predicate test to be deterministic per element
// so we need to cache the result until we advance the iterator.
// An example predicate would be a counting predicate.
if (PredicateCheckState.NOT_TESTED == state) {
boolean stop = predicate.test(peeking.peek());
if (stop) {
state = PredicateCheckState.ONE_MORE;
} else {
state = PredicateCheckState.MORE;
}
}

if (state == PredicateCheckState.NO_MORE) {
return false;
}
return true;
}

@Override
public E next() {
if (PredicateCheckState.ONE_MORE == state) {
state = PredicateCheckState.NO_MORE;
} else {
state = PredicateCheckState.NOT_TESTED;
}
return peeking.next();
}
};
}

static class PeekingIteratorImpl<E> implements PeekingIterator<E> {
private final Iterator<E> delegate;
private E headElement;
private boolean hasPeeked;

PeekingIteratorImpl(Iterator<E> delegate) {
this.delegate = delegate;
this.headElement = null;
this.hasPeeked = false;
}

@Override
public E peek() {
if (hasPeeked) {
return headElement;
}
headElement = delegate.next();
hasPeeked = true;
return headElement;
}

@Override
public boolean hasNext() {
return hasPeeked || delegate.hasNext();
}

@Override
public E next() {
if (hasPeeked) {
hasPeeked = false;
return headElement;
}
return delegate.next();
}
}

static <T> Predicate<T> countingPredicate(long count) {
if (count < 1) {
throw new IllegalArgumentException("count must be greater than 1");
}
long[] countArray = new long[1];
countArray[0] = 0L;
long iterations = count -1;
return (T t) -> {
if (iterations == countArray[0]) {
countArray[0] = 0L;
return true;
}
countArray[0] = countArray[0] + 1;
return false;
};
}
}
13 changes: 13 additions & 0 deletions src/main/java/org/jooq/lambda/PeekingIterator.java
@@ -0,0 +1,13 @@
package org.jooq.lambda;

import java.util.Iterator;

public interface PeekingIterator<E> extends Iterator<E> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This is an internal type and shouldn't be public, in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I wonder if this needs to be an interface, given we only have a single implementation so far...

/**
* Returns the current head of the iterator without consuming it.
* Multiple sequential calls to {@code peek()} will return the same element.
* @return The current head of the iterator
* @throws java.util.NoSuchElementException if there are no more elements
*/
E peek();
}
54 changes: 54 additions & 0 deletions src/main/java/org/jooq/lambda/Seq.java
Expand Up @@ -3396,6 +3396,30 @@ default Tuple2<Optional<T>, Seq<T>> splitAtHead() {
return splitAtHead(this);
}

/**
* Chunk a Stream based on a predicate
* <p>
* <code><pre>
* // ((1,1,2), (2), (3,4))
* Seq.of(1, 1, 2, 2, 3, 4).chunked(n -> n %2 == 0)
Copy link
Member

@lukaseder lukaseder Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think we should exclude the chunk boundaries by default if using a predicate, and include them only explicitly. When including, the question is whether the boundary should belong to:

  1. The ending chunk ((1, 1, 2), (2), (3, 4), ())
  2. The beginning chunk ((1, 1), (2), (2, 3), (4))
  3. Both chunks (?)
  4. No chunk

Given that there is no clear preference between 1/2 and maybe even 3, the default must be 4. So, an enum might be a reasonable additional argument for this, where "no chunk" would be the default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking an enum would be the appropriate solution for inclusion / exclusion. It makes sense to work for all four.

  1. The beginning chunk ((1, 1), (2), (2, 3), (4))

Should we assume the beginning chunk starts true? or should this actually return ((2), (2, 3), (4)) skipping anything before the first chunk.

  1. No chunk

Would this be excluding the matched element? ((1, 1), (), (3))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we assume the beginning chunk starts true? or should this actually return ((2), (2, 3), (4)) skipping anything before the first chunk.

Egh... :) OK, so in the no chunk case, we get:

  • 1 chunk for 0 true evaluations of the predicate
  • 2 chunks for 1 true evaluation of the predicate
  • 3 chunks for 2 true evaluations of the predicate

For example:

// ((1, 1, 2, 2, 3, 4))
Seq.of(1, 1, 2, 2, 3, 4).chunked(i -> false);

// ((),(),(),(),(),(),())
Seq.of(1, 1, 2, 2, 3, 4).chunked(i -> true);

I think that options 1-3 should adhere to this logic. My example 1 was wrong, there should be a trailing empty chunk. Will fix the comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hence:

// ((1),(1),(2),(2),(3),(4),())
Seq.of(1, 1, 2, 2, 3, 4).chunked(i -> true, BEFORE);
// ((),(1),(1),(2),(2),(3),(4))
Seq.of(1, 1, 2, 2, 3, 4).chunked(i -> true, AFTER);
// ((1),(1, 1),(1, 2),(2, 2),(2, 3),(3, 4),(4))
Seq.of(1, 1, 2, 2, 3, 4).chunked(i -> true, BOTH);

Still a bit undecided about the BOTH part, although it could be useful and it would be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could always ignore BOTH for now and easily add it later if requested since it will be an enum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's do that.

* </pre></code>
*/
default Seq<Seq<T>> chunked(Predicate<T> predicate) {
return chunked(this, predicate);
}

/**
* Chunk a Stream based on a number of elements
* <p>
* <code><pre>
* // ((0,1,2), (3,4,5), (6,7,8), (9))
* Seq.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9).chunked(3)
* </pre></code>
*/
default Seq<Seq<T>> chunked(long chunkSize) {
return chunked(this, Iterators.countingPredicate(chunkSize));
}

/**
* Returns a limited interval from a given Stream.
* <p>
Expand Down Expand Up @@ -9712,6 +9736,36 @@ static <T> Tuple2<Optional<T>, Seq<T>> splitAtHead(Stream<T> stream) {
Optional.empty(), seq(it));
}

/**
* Chunk a Stream based on a predicate
* <p>
* <code><pre>
* // ((1,1,2), (2), (3,4))
* Seq.of(1, 1, 2, 2, 3, 4).chunked(n -> n %2 == 0)
* </pre></code>
*/
static <T> Seq<Seq<T>> chunked(Stream<T> stream, Predicate<T> predicate) {
// This iterator can get in a bad state if you call next
// multiple times without consuming the intermediate Seq.
// It should be safe since its not exposed. A fix could
// be to force has next to make sure the current Seq was iterated.
Iterator<Seq<T>> it = new Iterator<Seq<T>>() {
private final PeekingIterator<T> peeking = Iterators.peeking(stream.iterator());

@Override
public boolean hasNext() {
return peeking.hasNext();
}

@Override
public Seq<T> next() {
return Seq.seq(Iterators.takeWhileIterator(peeking, predicate));
}
};

return Seq.seq(it);
}

// Methods taken from LINQ
// -----------------------

Expand Down
91 changes: 91 additions & 0 deletions src/test/java/org/jooq/lambda/IteratorsTest.java
@@ -0,0 +1,91 @@
package org.jooq.lambda;

import org.junit.Test;

import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

import static org.junit.Assert.*;
import static org.junit.Assert.assertTrue;

/**
* Created by billoneil on 7/26/17.
*/
public class IteratorsTest {

@Test(expected = NoSuchElementException.class)
public void testPeekingThrows() {
Seq<Integer> seq1 = Seq.of();
PeekingIterator<Integer> peeking1 = new Iterators.PeekingIteratorImpl<>(seq1.iterator());
peeking1.next();
}

@Test
public void testPeeking() {
Seq<Integer> seq1 = Seq.of();
PeekingIterator<Integer> peeking1 = new Iterators.PeekingIteratorImpl<>(seq1.iterator());
assertFalse(peeking1.hasNext());

Seq<Integer> seq2 = Seq.of(1, 2, 3);
PeekingIterator<Integer> peeking2 = new Iterators.PeekingIteratorImpl<>(seq2.iterator());
int i = 0;
while (peeking2.hasNext()) {
assertEquals(++i, (int) peeking2.next());
}

Seq<Integer> seq3 = Seq.of(1, 2, 3);
PeekingIterator<Integer> peeking3 = new Iterators.PeekingIteratorImpl<>(seq3.iterator());
assertEquals(1, (int) peeking3.peek());
assertEquals(1, (int) peeking3.peek());
assertEquals(1, (int) peeking3.next());
assertEquals(2, (int) peeking3.next());
assertEquals(3, (int) peeking3.peek());
assertEquals(3, (int) peeking3.peek());
assertEquals(3, (int) peeking3.next());
}

@Test(expected = IllegalArgumentException.class)
public void testCountingPredicateThrowsIllegalArg() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this is probably the reason why you made the class public...

  1. If the tests are placed in the same package, you can access package-private classes.
  2. I'm not too fan of testing internals with unit tests. The internals will be tested transitively by testing the chunked API. With tests on internals, refactoring those internals will be much harder...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they are in the same package so it should have been package-private. The first implementation was must have missed it when I cleaned it up.

  1. I'm not too fan of testing internals with unit tests. The internals will be tested transitively by testing the chunked API. With tests on internals, refactoring those internals will be much harder...

👍 totally agree wasn't sure of your preference so opted for more testing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all in for more testing, but high level tests will never break because the API won't change again. Low level tests are a bit of a maintenance burden.

Iterators.countingPredicate(0L);
}

@Test
public void testCountingPredicate() {
Predicate<Integer> oneCount = Iterators.countingPredicate(1L);
assertTrue(oneCount.test(1));

Predicate<Integer> tenCount = Iterators.countingPredicate(10L);
Seq.range(0, 9).forEach(n -> {
assertFalse(tenCount.test(n));
});
assertTrue(tenCount.test(1));
}

@Test
public void testPredicateIterator() {
Predicate<Integer> tenCount = Iterators.countingPredicate(10L);
Seq<Integer> seq = Seq.range(0, 10).cycle();
Iterator<Integer> it = Iterators.takeWhileIterator(seq.iterator(), tenCount);
int i = 0;
while (it.hasNext()) {
// It should be safe to call hasNext multiple times.
it.hasNext();
it.next();
i++;
}
assertEquals(10, i);


Predicate<Integer> isEven = n -> n %2 == 0;
Seq<Integer> seq2 = Seq.of(1,1,2,3).cycle();
Iterator<Integer> it2 = Iterators.takeWhileIterator(seq2.iterator(), isEven);
assertTrue(it2.hasNext());
assertTrue(it2.hasNext());
assertEquals(1, (int) it2.next());
assertEquals(1, (int) it2.next());
assertTrue(it2.hasNext());
assertEquals(2, (int) it2.next());
assertFalse(it2.hasNext());
}
}
38 changes: 38 additions & 0 deletions src/test/java/org/jooq/lambda/SeqTest.java
Expand Up @@ -1285,6 +1285,44 @@ public void testSplitAtHead() {
assertEquals(asList(), Seq.of(1, 2, 3).splitAtHead().v2.splitAtHead().v2.splitAtHead().v2.toList());
}

@Test
public void testChunked() {
Seq<Seq<Integer>> seq1 = Seq.range(0, 10).chunked(5);
List<List<Integer>> expected1 = Seq.of(
Seq.of(0, 1, 2, 3, 4),
Seq.of(5, 6, 7, 8, 9)
).map(s -> s.toList()).toList();
assertEquals(expected1, seq1.map(s -> s.toList()).toList());

Seq<Seq<Integer>> seq2 = Seq.range(0, 10).chunked(3);
List<List<Integer>> expected2 = Seq.of(
Seq.of(0, 1, 2),
Seq.of(3, 4, 5),
Seq.of(6, 7, 8),
Seq.of(9)
).map(s -> s.toList()).toList();
assertEquals(expected2, seq2.map(s -> s.toList()).toList());

// Show that it is lazy
Seq<Seq<Integer>> seq3 = Seq.range(0, 10).cycle().chunked(5);
List<List<Integer>> expected3 = Seq.of(
Seq.of(0, 1, 2, 3, 4),
Seq.of(5, 6, 7, 8, 9),
Seq.of(0, 1, 2, 3, 4)
).map(s -> s.toList()).toList();
assertEquals(expected3, seq3.limit(3).map(s -> s.toList()).toList());

Predicate<Integer> isEven = n -> n % 2 == 0;
Seq<Seq<Integer>> seq4 = Seq.of(1,1,2,2,3,3,3,4).cycle().chunked(isEven);
List<List<Integer>> expected4 = Seq.of(
Seq.of(1, 1, 2),
Seq.of(2),
Seq.of(3, 3, 3, 4),
Seq.of(1, 1, 2)
).map(s -> s.toList()).toList();
assertEquals(expected4, seq4.limit(4).map(s -> s.toList()).toList());
}

@Test
public void testMinByMaxBy() {
Supplier<Seq<Integer>> s = () -> Seq.of(1, 2, 3, 4, 5, 6);
Expand Down