improve CursorIterator #7

phraktle · 2016-07-10T13:08:15Z

Consider making CursorIterator more extensible. It would be reasonable to be able to subclass to provide a range iterator (i.e. a forward iterator that checks an upper bound key, or a reverse iterator checking a lower bound). Since the class is final and tryToComputeNext is private, this is not currently feasible.

Another minor point is that the state machine should probably include a CLOSED state (in which hasNext returns false, and repeated calls to close are idempotent).

The text was updated successfully, but these errors were encountered:

krisskross · 2016-07-14T12:40:39Z

The specific case of an "end" key is that you need to know what buffer you're dealing in order to implement a compare method.

Even if we make tryToComputeNext public, the iterator is instantiated by Dbi so you cannot force a certain implementation at the moment. We could open up the API a little, make the class public and let users implement their own?

benalexau · 2017-07-05T04:52:06Z

I recently added a CursorIterator.KeyValPopulator without being aware of this ticket and this offers a simple mechanism for client code to modify the key or value being returned. My use case was iterating an index table where the keys were the index keys and values were the primary key of a different table (so a custom KeyValPopulator would return the value from the data table, not the value from the index table).

One idea is to rename KeyValPopulator and allow users to override whether the iteration is complete (eg boolean isComplete(T key, T val). We could leave a default method on KeyValPopulator which just returns true. Would this suit the use case @phraktle had in mind or are there other suggestions?

krisskross · 2017-07-06T08:54:16Z

I think we should be careful about introducing domain specific concepts into the API and somehow I feel that CursorIterator.KeyValPopulator is a bit too specific. Wouldn't it be cleaner to implement it with a wrapper class outside of lmdbjava?

For the range iterator, apart from knowing when iteration is complete (stop key), we also need to know where to begin (start key). Both of keys should be optional and can be either inclusive (closed) or exclusive (open). Ideally, we should also let users decide if iteration order is reversed or not.

One suggestion is to introduce a separate method Dbi.iterate(Range) similar to how ranges work in Guava. A Comparator would need to be provided by the user, or using a default implementation per BufferProxy (though implementing a performant Comparator is non-trivial).

krisskross · 2017-07-07T17:46:08Z

I implemented something similar in RxLMDB a while back which might serve as inspiration.

https://github.com/deephacks/RxLMDB/blob/master/rxlmdb/src/main/java/org/deephacks/rxlmdb/KeyRange.java

benalexau · 2017-07-08T01:44:38Z

I think we should be careful about introducing domain specific concepts into the API and somehow I feel that CursorIterator.KeyValPopulator is a bit too specific. Wouldn't it be cleaner to implement it with a wrapper class outside of lmdbjava?

I did consider this, but felt a wrapper class would be too verbose. I've pasted the two approaches below.

What can be done with LmdbJava 0.5.0 with this use case:

package au.com.acegi.core.lmdb;

import org.agrona.DirectBuffer;
import static org.apache.commons.lang3.Validate.notNull;
import org.lmdbjava.Cursor;
import org.lmdbjava.CursorIterator.KeyValPopulator;
import org.lmdbjava.Dbi;
import org.lmdbjava.Txn;

/**
 * LmdbJava <code>KeyValPopulator</code> used with index tables.
 *
 * <p>
 * Index tables have their values as the primary key of the data table.
 * When the index table cursor is presented, its value is therefore read to
 * obtain the primary key and the iterator value is fetched from the data
 * table.
 */
public final class IndexKeyValPop implements KeyValPopulator<DirectBuffer> {

  private final Dbi<DirectBuffer> dbi;
  private final Txn<DirectBuffer> txn;

  public IndexKeyValPop(final Txn<DirectBuffer> txn,
                        final Dbi<DirectBuffer> dbi) {
    notNull(txn, "Txn required");
    notNull(dbi, "Dbi required");
    this.txn = txn;
    this.dbi = dbi;
  }

  @Override
  public DirectBuffer getKey(final Cursor<DirectBuffer> cursor) {
    return cursor.key();
  }

  @Override
  public DirectBuffer getVal(final Cursor<DirectBuffer> cursor) {
    final DirectBuffer pk = cursor.val();
    return dbi.get(txn, pk);
  }

}

What would be required to do this without KeyValPopulator:

package au.com.acegi.core.lmdb;

import java.util.Iterator;
import java.util.NoSuchElementException;
import static org.apache.commons.lang3.Validate.notNull;
import org.lmdbjava.CursorIterator;
import org.lmdbjava.Dbi;
import org.lmdbjava.Txn;

/**
 * {@link CursorIterator} that supports transparent use with index tables.
 *
 * <p>
 * Index tables have their values as the primary key of the data table. When an
 * index table cursor is iterated, its value is therefore read to obtain the
 * primary key. The iterator value is then fetched from the primary key table.
 *
 * <p>
 * This class also supports primary key tables for iterator return type
 * consistency within DAOs. In this case the cursor value is simply returned
 * without a lookup.
 *
 * @param <T> buffer type
 */
public final class DaoIterator<T> implements
    Iterator<DaoIterator.KeyVal<T>>, AutoCloseable {

  private final Dbi<T> dbi;
  private final CursorIterator<T> delegate;
  private final KeyVal<T> entry = new KeyVal<>();
  private final Txn<T> txn;

  /**
   * Create a new instance (should be called from generated code only).
   *
   * <p>
   * The generated iterator will delegate all calls to the passed delegate.
   *
   * <p>
   * If this instance will wrap a cursor for an index table, the passed
   * {@link Dbi} must be non-null and represent the primary key table. When
   * {@link #next()} is called, the index table value will be replaced with the
   * corresponding primary key table value.
   *
   * <p>
   * If this instance will wrap a cursor for the primary key table, the passed
   * {@link Dbi} must be null. This avoids the aforementioned replacement of
   * values as would occur with an index table.
   *
   *
   * @param txn      transaction used within the delegate (required)
   * @param delegate to pass all calls to (required)
   * @param dbi      primary key table (must be null if this is the PK table)
   */
  public DaoIterator(final Txn<T> txn, final CursorIterator<T> delegate,
                     final Dbi<T> dbi) {
    notNull(txn, "Transaction required");
    notNull(delegate, "Delegate required");
    this.txn = txn;
    this.delegate = delegate;
    this.dbi = dbi;
  }

  @Override
  public void close() {
    delegate.close();
  }

  @Override
  public boolean hasNext() {
    return delegate.hasNext();
  }

  /**
   * Obtain an iterator.
   *
   * @return an iterator (never null)
   */
  public Iterable<KeyVal<T>> iterable() {
    return () -> DaoIterator.this;
  }

  @Override
  public KeyVal<T> next() throws NoSuchElementException {
    final CursorIterator.KeyVal<T> fetched = delegate.next();
    entry.setKey(fetched.key());
    if (dbi == null) {
      entry.setVal(fetched.val());
    } else {
      final T pk = fetched.val();
      entry.setVal(dbi.get(txn, pk));
    }
    return entry;
  }

  @Override
  public void remove() {
    delegate.remove();
  }

  /**
   * Holder for a key and value pair.
   *
   * <p>
   * The same holder instance will always be returned for a given iterator.
   * The returned keys and values may change or point to different memory
   * locations following changes in the iterator, cursor or transaction.
   *
   * @param <T> buffer type
   */
  public static final class KeyVal<T> {

    private T k;
    private T v;

    /**
     * The key.
     *
     * @return key
     */
    public T key() {
      return k;
    }

    /**
     * The value.
     *
     * @return value
     */
    public T val() {
      return v;
    }

    void setKey(final T key) {
      this.k = key;
    }

    void setVal(final T val) {
      this.v = val;
    }

  }

}

As such the KeyValPopulator is less than one third the lines of code and does not introduce largely duplicate types (DaoIterator = CursorIterator, DaoIterator.KeyVal = CursorIterator.KeyVal).

So while I totally agree we shouldn't introduce domain concepts, I've encountered many occasions where I needed to fetch data using more than one key. I assumed it was a common enough requirement that others would be doing so as well. I'm happy to revert KeyValPopulator if you like though, as DaoIterator solves the problem adequately enough and I might simply be wrong on this being a common requirement.

I implemented something similar in RxLMDB a while back which might serve as inspiration.

Looks good. I'm assuming we could make it buffer agnostic? It would seem we'll need to make BufferProxy<T> implements Comparator<T> to make this work though? I'd only add there would be value in making the KeyRangeType and KeyRange constructor public so that generated DAOs can express the desired behaviour in terms of their domain objects (not the buffer type, which is handled behind-the-scenes in the DAO implementation). For example:

  CursorIterator<DirectBuffer> findPeopleByName(Txn<DirectBuffer> txn, KeyRangeType type, String firstName, String lastName);

In the above the firstName and lastName are converted to a fixed-length (or other delimited) key, with the passed type governing which of the arguments (if any) are actually used. This would suit code generators fine and allow them to abstract away how the key is actually constructed. Certainly the CursorIterator<DirectBuffer> is still returned, but the value is likely the user's domain object (eg a SBE-encoded payload, or other direct input to a deserializer).

benalexau · 2017-07-12T09:19:34Z

A Comparator would need to be provided by the user, or using a default implementation per BufferProxy (though implementing a performant Comparator is non-trivial).

I've just pushed a commit that contains tested comparators for all current BufferProxys as prerequisite work for range support. Can @krisskross please take a look? Also would anyone else like to write the range code are you guys happy for me to get started with it?

@krisskross

Netty already provides Comparable<ByteBuf> so we delegate there. The work shared by @krisskross on DirectBuffer comparable support at https://github.com/deephacks/RxLMDB/blob/master/rxlmdb/src/main/java/org/deephacks/rxlmdb/DirectBufferComparator.java was updated to Java 8. This in turn serves as the basis of the other BufferProxy comparable implementations. Public static methods are provided for end user convenience in case they wish to compare buffers. ComparatorTest was developed using String.compareTo(String) as the reference of what is expected for varying-length strings. ComparatorTest therefore ensures all implementations return the same comparable integer result for the same inputs. This gives some confidence the implementations are correct, as they align across Java String, Netty and the hand-written comparable methods. No attempt has been made to benchmark or optimise the hand-written comparable methods at this stage.

This commit captures the available key ranges and unit tests that confirm the JavaDoc specification is correct. The KeyRange class provides package protected methods that encapsulate the iteration and control logic such that the eventual Iterator class will not have any awareness of the various KeyRangeTypes and it should be possible to add new KeyRangeTypes without modifying the actual Iterator.

This reverts commit da78abd.

This is possible because KeyRangeType.BACKWARD and KeyRangeType.FORWARD do not use start/stop buffers and thus can be static singletons to reduce allocation and GC pressure.

Existing code should continue to work without modification, as confirmed by the unmodified CursorIteratorTest.

benalexau · 2017-07-14T03:33:51Z

All done. Please take a look and let me know if this meets the CursorIterator flexibility goals.

I reverted KeyValPopulator, as the flexibility now offered by KeyRange requires extended scope for start and stop keys with certain KeyRangeTypes (eg FORWARD_RANGE). Previously CursorIterator only required the passed key for the initial get, so one could trigger this via hasNext() and then deallocate the key buffer before returning control flow. With extended start and stop key scopes now making this infeasible, an iterator wrapper such as the aforementioned DaoIterator is necessary to deallocate those buffers.

krisskross · 2017-07-14T13:03:27Z

Nice work! I think the changes look good.

The only thing that feels a bit unclear is whether keys are inclusive or exclusive. Ideally we should let users choose for themselves.

I suppose that we intend to let users provide their own comparator by overriding the BufferProxy.compare method, right? I think this is good since there are quite a few comparators out there that users might want to use for performance reasons. Maybe we could add a test using (for example) the lexicographical comparators in UnsignedBytes and SignedBytes?

benalexau · 2017-07-14T13:26:07Z

The only thing that feels a bit unclear is whether keys are inclusive or exclusive. Ideally we should let users choose for themselves.

I too found the names unclear, which is why I put examples in the KeyRangeType JavaDocs. I intend to add some more KeyRangeType values to cover the exclusive cases.

I suppose that we intend to let users provide their own comparator by overriding the BufferProxy.compare method, right? I think this is good since there are quite a few comparators out there that users might want to use for performance reasons.

Users can provide their own Comparator by invoking CursorIterator<T> Dbi.iterate(Txn<T>, KeyRange<T>, Comparator<T>). Is that sufficient?

Maybe we could add a test using (for example) the lexicographical comparators in UnsignedBytes and SignedBytes?

Good idea. I'll tackle that tomorrow.

krisskross · 2017-07-14T13:35:46Z

Sounds all good!

Deferring to a widely-known external source simplifies adoption and allows us to more easily identify unsupported range types.

This simplifies KeyRange, which is the user-facing class. It also colocates the control logic with the detailed descriptions and examples of what each KeyRangeType is supposed to do, making it easier to understand and maintain.

This is an example of how users could pass their own Comparator, subject of course to the comparator reflecting the key order returned by LMDB by Cursor previous / next invocations (if a comparator would result in a different key order than LMDB will return from its B+ tree index, we cannot efficiently use LMDB to iterate the key space).

benalexau · 2017-07-15T03:39:00Z

I just pushed the final commits. LmdbJava now supports every range type offered by Guava (in both forward and backward cursor directions) and method names now reflect its terminology. There is also an example of using a Guava comparator, plus I added Guava to the ComparatorTest to further validate the project's comparators against third-party references. Are we happy to close this ticket?

krisskross · 2017-07-15T15:00:17Z

Awesome work again! Yes, the API looks great now.

I noticed the switch for the iterator. Not sure, but the code path might be more performant with separate implementation/function for each KeyRange type? Anyway, it doesn't really matter since we can always refactor it later without affecting the API.

A release might be appropriate?

benalexau · 2017-07-15T23:57:54Z

Not sure, but the code path might be more performant with separate implementation/function for each KeyRange type? Anyway, it doesn't really matter since we can always refactor it later without affecting the API.

I found the KeyRangeType switch approach provided the easiest way to see the difference between each range type, which is generally what to do with a just-fetched key. Having all of those those decisions beside one another in the same method was helpful when writing them (and I suspect reasoning about them should we receive related bug reports down the track). As the logic is fairly simple and mostly deals with enum values or final fields, I am hoping it will be optimised away or immaterial versus the broader buffer, JNI and LMDB native overhead. We have to benchmark it but I'd like to see if we receive bug reports or usability suggestions first.

A release might be appropriate?

Absolutely. I will explore #56 and generate a release.

benalexau · 2017-07-16T01:37:04Z

Commit 8ec9550 just pushed for ticket #56 may also be of interest, as it shows a user-defined comparator being passed to LMDB itself (via mdb_set_compare) and this user-defined comparator automatically being used with the resulting CursorIterator.

benalexau assigned krisskross Jul 21, 2016

benalexau unassigned krisskross Jul 5, 2017

benalexau added a commit that referenced this issue Jul 14, 2017

Revert "Add KeyValPopulator to CursorIterator" (discussed in #7)

f363cca

This reverts commit da78abd.

benalexau added a commit that referenced this issue Jul 14, 2017

Make KeyRange forward and backward static (#7)

6ab87b7

This is possible because KeyRangeType.BACKWARD and KeyRangeType.FORWARD do not use start/stop buffers and thus can be static singletons to reduce allocation and GC pressure.

benalexau added a commit that referenced this issue Jul 14, 2017

Dbi and CursorIterator changes to use KeyRange (#7)

4d6a2a7

Existing code should continue to work without modification, as confirmed by the unmodified CursorIteratorTest.

benalexau added a commit that referenced this issue Jul 14, 2017

Expand CursorIteratorTest to match KeyRangeTest cases (#7)

7cb3ca2

benalexau added a commit that referenced this issue Jul 14, 2017

Update tutorial to reflect CursorIterator changes (#7)

bb6d2a6

benalexau added a commit that referenced this issue Jul 15, 2017

Rename range descriptors to reflect Guava range terminology (#7)

7881199

Deferring to a widely-known external source simplifies adoption and allows us to more easily identify unsupported range types.

benalexau added a commit that referenced this issue Jul 15, 2017

Eliminate raw type KeyRange warnings (#7)

80f5308

benalexau added a commit that referenced this issue Jul 15, 2017

Improve KeyRangeTest assertion reporting (#7)

68f1310

benalexau added a commit that referenced this issue Jul 15, 2017

Add range support for bidirectional greaterThan (#7)

8b1e5fa

benalexau added a commit that referenced this issue Jul 15, 2017

Add range support for bidirectional lessThan (#7)

98d46bf

benalexau added a commit that referenced this issue Jul 15, 2017

Add range support for bidirectional open (#7)

51671c5

benalexau added a commit that referenced this issue Jul 15, 2017

Add range support for bidirectional closedOpen (#7)

f45addd

benalexau added a commit that referenced this issue Jul 15, 2017

Add range support for bidirectional openClosed (#7)

d268660

benalexau added a commit that referenced this issue Jul 15, 2017

Add Guava comparators to further verify project's comparators (#7)

e6a863f

benalexau added a commit that referenced this issue Jul 15, 2017

Fallback to buffer default comparator if none passed to Dbi (#7)

e806a1c

benalexau closed this as completed Jul 15, 2017

benalexau mentioned this issue Jul 16, 2017

Vote: Release 0.6.0 #60

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve CursorIterator #7

improve CursorIterator #7

phraktle commented Jul 10, 2016

krisskross commented Jul 14, 2016

benalexau commented Jul 5, 2017

krisskross commented Jul 6, 2017

krisskross commented Jul 7, 2017 •

edited

benalexau commented Jul 8, 2017 •

edited

benalexau commented Jul 12, 2017

benalexau commented Jul 14, 2017

krisskross commented Jul 14, 2017

benalexau commented Jul 14, 2017

krisskross commented Jul 14, 2017

benalexau commented Jul 15, 2017

krisskross commented Jul 15, 2017

benalexau commented Jul 15, 2017

benalexau commented Jul 16, 2017

improve CursorIterator #7

improve CursorIterator #7

Comments

phraktle commented Jul 10, 2016

krisskross commented Jul 14, 2016

benalexau commented Jul 5, 2017

krisskross commented Jul 6, 2017

krisskross commented Jul 7, 2017 • edited

benalexau commented Jul 8, 2017 • edited

benalexau commented Jul 12, 2017

benalexau commented Jul 14, 2017

krisskross commented Jul 14, 2017

benalexau commented Jul 14, 2017

krisskross commented Jul 14, 2017

benalexau commented Jul 15, 2017

krisskross commented Jul 15, 2017

benalexau commented Jul 15, 2017

benalexau commented Jul 16, 2017

krisskross commented Jul 7, 2017 •

edited

benalexau commented Jul 8, 2017 •

edited