New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve CursorIterator #7
Comments
The specific case of an "end" key is that you need to know what buffer you're dealing in order to implement a compare method. Even if we make |
I recently added a One idea is to rename |
I think we should be careful about introducing domain specific concepts into the API and somehow I feel that For the range iterator, apart from knowing when iteration is complete (stop key), we also need to know where to begin (start key). Both of keys should be optional and can be either inclusive (closed) or exclusive (open). Ideally, we should also let users decide if iteration order is reversed or not. One suggestion is to introduce a separate method |
I implemented something similar in RxLMDB a while back which might serve as inspiration. |
I did consider this, but felt a wrapper class would be too verbose. I've pasted the two approaches below.
package au.com.acegi.core.lmdb;
import org.agrona.DirectBuffer;
import static org.apache.commons.lang3.Validate.notNull;
import org.lmdbjava.Cursor;
import org.lmdbjava.CursorIterator.KeyValPopulator;
import org.lmdbjava.Dbi;
import org.lmdbjava.Txn;
/**
* LmdbJava <code>KeyValPopulator</code> used with index tables.
*
* <p>
* Index tables have their values as the primary key of the data table.
* When the index table cursor is presented, its value is therefore read to
* obtain the primary key and the iterator value is fetched from the data
* table.
*/
public final class IndexKeyValPop implements KeyValPopulator<DirectBuffer> {
private final Dbi<DirectBuffer> dbi;
private final Txn<DirectBuffer> txn;
public IndexKeyValPop(final Txn<DirectBuffer> txn,
final Dbi<DirectBuffer> dbi) {
notNull(txn, "Txn required");
notNull(dbi, "Dbi required");
this.txn = txn;
this.dbi = dbi;
}
@Override
public DirectBuffer getKey(final Cursor<DirectBuffer> cursor) {
return cursor.key();
}
@Override
public DirectBuffer getVal(final Cursor<DirectBuffer> cursor) {
final DirectBuffer pk = cursor.val();
return dbi.get(txn, pk);
}
}
package au.com.acegi.core.lmdb;
import java.util.Iterator;
import java.util.NoSuchElementException;
import static org.apache.commons.lang3.Validate.notNull;
import org.lmdbjava.CursorIterator;
import org.lmdbjava.Dbi;
import org.lmdbjava.Txn;
/**
* {@link CursorIterator} that supports transparent use with index tables.
*
* <p>
* Index tables have their values as the primary key of the data table. When an
* index table cursor is iterated, its value is therefore read to obtain the
* primary key. The iterator value is then fetched from the primary key table.
*
* <p>
* This class also supports primary key tables for iterator return type
* consistency within DAOs. In this case the cursor value is simply returned
* without a lookup.
*
* @param <T> buffer type
*/
public final class DaoIterator<T> implements
Iterator<DaoIterator.KeyVal<T>>, AutoCloseable {
private final Dbi<T> dbi;
private final CursorIterator<T> delegate;
private final KeyVal<T> entry = new KeyVal<>();
private final Txn<T> txn;
/**
* Create a new instance (should be called from generated code only).
*
* <p>
* The generated iterator will delegate all calls to the passed delegate.
*
* <p>
* If this instance will wrap a cursor for an index table, the passed
* {@link Dbi} must be non-null and represent the primary key table. When
* {@link #next()} is called, the index table value will be replaced with the
* corresponding primary key table value.
*
* <p>
* If this instance will wrap a cursor for the primary key table, the passed
* {@link Dbi} must be null. This avoids the aforementioned replacement of
* values as would occur with an index table.
*
*
* @param txn transaction used within the delegate (required)
* @param delegate to pass all calls to (required)
* @param dbi primary key table (must be null if this is the PK table)
*/
public DaoIterator(final Txn<T> txn, final CursorIterator<T> delegate,
final Dbi<T> dbi) {
notNull(txn, "Transaction required");
notNull(delegate, "Delegate required");
this.txn = txn;
this.delegate = delegate;
this.dbi = dbi;
}
@Override
public void close() {
delegate.close();
}
@Override
public boolean hasNext() {
return delegate.hasNext();
}
/**
* Obtain an iterator.
*
* @return an iterator (never null)
*/
public Iterable<KeyVal<T>> iterable() {
return () -> DaoIterator.this;
}
@Override
public KeyVal<T> next() throws NoSuchElementException {
final CursorIterator.KeyVal<T> fetched = delegate.next();
entry.setKey(fetched.key());
if (dbi == null) {
entry.setVal(fetched.val());
} else {
final T pk = fetched.val();
entry.setVal(dbi.get(txn, pk));
}
return entry;
}
@Override
public void remove() {
delegate.remove();
}
/**
* Holder for a key and value pair.
*
* <p>
* The same holder instance will always be returned for a given iterator.
* The returned keys and values may change or point to different memory
* locations following changes in the iterator, cursor or transaction.
*
* @param <T> buffer type
*/
public static final class KeyVal<T> {
private T k;
private T v;
/**
* The key.
*
* @return key
*/
public T key() {
return k;
}
/**
* The value.
*
* @return value
*/
public T val() {
return v;
}
void setKey(final T key) {
this.k = key;
}
void setVal(final T val) {
this.v = val;
}
}
} As such the So while I totally agree we shouldn't introduce domain concepts, I've encountered many occasions where I needed to fetch data using more than one key. I assumed it was a common enough requirement that others would be doing so as well. I'm happy to revert
Looks good. I'm assuming we could make it buffer agnostic? It would seem we'll need to make CursorIterator<DirectBuffer> findPeopleByName(Txn<DirectBuffer> txn, KeyRangeType type, String firstName, String lastName); In the above the |
I've just pushed a commit that contains tested comparators for all current |
Netty already provides Comparable<ByteBuf> so we delegate there. The work shared by @krisskross on DirectBuffer comparable support at https://github.com/deephacks/RxLMDB/blob/master/rxlmdb/src/main/java/org/deephacks/rxlmdb/DirectBufferComparator.java was updated to Java 8. This in turn serves as the basis of the other BufferProxy comparable implementations. Public static methods are provided for end user convenience in case they wish to compare buffers. ComparatorTest was developed using String.compareTo(String) as the reference of what is expected for varying-length strings. ComparatorTest therefore ensures all implementations return the same comparable integer result for the same inputs. This gives some confidence the implementations are correct, as they align across Java String, Netty and the hand-written comparable methods. No attempt has been made to benchmark or optimise the hand-written comparable methods at this stage.
This commit captures the available key ranges and unit tests that confirm the JavaDoc specification is correct. The KeyRange class provides package protected methods that encapsulate the iteration and control logic such that the eventual Iterator class will not have any awareness of the various KeyRangeTypes and it should be possible to add new KeyRangeTypes without modifying the actual Iterator.
This is possible because KeyRangeType.BACKWARD and KeyRangeType.FORWARD do not use start/stop buffers and thus can be static singletons to reduce allocation and GC pressure.
Existing code should continue to work without modification, as confirmed by the unmodified CursorIteratorTest.
All done. Please take a look and let me know if this meets the I reverted |
Nice work! I think the changes look good. The only thing that feels a bit unclear is whether keys are inclusive or exclusive. Ideally we should let users choose for themselves. I suppose that we intend to let users provide their own comparator by overriding the |
I too found the names unclear, which is why I put examples in the
Users can provide their own
Good idea. I'll tackle that tomorrow. |
Sounds all good! |
Deferring to a widely-known external source simplifies adoption and allows us to more easily identify unsupported range types.
This simplifies KeyRange, which is the user-facing class. It also colocates the control logic with the detailed descriptions and examples of what each KeyRangeType is supposed to do, making it easier to understand and maintain.
This is an example of how users could pass their own Comparator, subject of course to the comparator reflecting the key order returned by LMDB by Cursor previous / next invocations (if a comparator would result in a different key order than LMDB will return from its B+ tree index, we cannot efficiently use LMDB to iterate the key space).
I just pushed the final commits. LmdbJava now supports every range type offered by Guava (in both forward and backward cursor directions) and method names now reflect its terminology. There is also an example of using a Guava comparator, plus I added Guava to the |
Awesome work again! Yes, the API looks great now. I noticed the switch for the iterator. Not sure, but the code path might be more performant with separate implementation/function for each KeyRange type? Anyway, it doesn't really matter since we can always refactor it later without affecting the API. A release might be appropriate? |
I found the
Absolutely. I will explore #56 and generate a release. |
Consider making
CursorIterator
more extensible. It would be reasonable to be able to subclass to provide a range iterator (i.e. a forward iterator that checks an upper bound key, or a reverse iterator checking a lower bound). Since the class is final andtryToComputeNext
is private, this is not currently feasible.Another minor point is that the state machine should probably include a
CLOSED
state (in whichhasNext
returns false, and repeated calls toclose
are idempotent).The text was updated successfully, but these errors were encountered: