add Dbi<byte[]> support #3
The main issue is Given legacy migration is the main concern here, I've added some static helper methods that take care of
Hi @benalexau, thanks for looking into this. I was able to implement the wrap/unwrap methods myself easily (in fact your implementation of I'm also curious about how resizing works in general if the buffers are final? What are the safety considerations if you alter
I feel the convenience methods do not belong in the API. We can show users how to do it through examples instead. Ben and I discussed the address/size altering offline, and our conclusion is to never modify buffers that are provided by users. The key here is ownership of the buffer. We only provide buffers that are owned by lmdbjava, and these buffers last only as long as a transaction / operation; users should never use them outside the scope of a transaction. Since we never let the Cleaner kick in, we can "safely" modify the address without interfering with GC.
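To illustrate the ownership rule, here is a minimal, JDK-only sketch (it does not use lmdbjava). The `ScopedStore` class is hypothetical: it stands in for a transaction that owns one direct buffer and overwrites it on every read, so a caller who needs the bytes after that scope must copy them into a `byte[]` it owns.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a transaction-scoped read: the store owns a
// single direct buffer and reuses it for every read.
final class ScopedStore {
    private final ByteBuffer owned = ByteBuffer.allocateDirect(64);

    // Returns a read-only view of the store-owned buffer. The view shares
    // memory with the store, so its contents change on the next read().
    ByteBuffer read(String value) {
        owned.clear();
        owned.put(value.getBytes(StandardCharsets.UTF_8));
        owned.flip();
        return owned.asReadOnlyBuffer();
    }

    // Copies the readable bytes of a scoped buffer onto the heap; the
    // caller owns the resulting array and may keep it indefinitely.
    static byte[] copyOut(ByteBuffer scoped) {
        byte[] copy = new byte[scoped.remaining()];
        scoped.duplicate().get(copy); // duplicate() leaves scoped's position untouched
        return copy;
    }
}
```

After a second `read()`, an array obtained via `copyOut` still holds the old bytes, while the earlier view now exposes the new ones. That is the hazard of letting library-owned buffers escape their scope.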
Agreed that it shouldn't be convenience methods. I would suggest changing the API design / implementation so it allows for byte[] and heap-allocated buffers (and can work from those directly).
Do you mean like a typed
From an API usage standpoint, a
Yes, but this is problematic since a byte array cannot be re-positioned to any memory address. All interaction with LMDB goes through memory addresses and we designed the API for off-heap memory access. Is this really a problem that lmdbjava needs to deal with? If your application is on the heap then you always need that extra copy anyway?
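As a rough illustration of that extra copy, the sketch below (plain JDK, hypothetical `HeapToDirect` helper) copies an on-heap array into a direct buffer, the kind of buffer whose native address can be handed to native code. A buffer obtained from `ByteBuffer.wrap` stays on the heap and reports `isDirect() == false`.

```java
import java.nio.ByteBuffer;

// Hypothetical helper showing the heap-to-off-heap copy.
final class HeapToDirect {
    // Copies an on-heap array into a newly allocated direct buffer. Later
    // changes to the array are not reflected in the returned buffer.
    static ByteBuffer toDirect(byte[] heap) {
        ByteBuffer direct = ByteBuffer.allocateDirect(heap.length);
        direct.put(heap);
        direct.flip(); // position 0, limit heap.length: ready to be read
        return direct;
    }
}
```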
Aren't there two extra copies on a simple put here? on-heap
In the case of lmdbjni, I don't think the heap buffer will be copied directly into LMDB memory. You can get the address of a byte array, but I'm not sure if that's how it works. If you want good performance, you really should use direct memory, maybe by reusing large pre-allocated buffers or something similar.
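One way to reuse a large pre-allocated buffer, sketched here as an assumption rather than anything lmdbjava provides, is a bump allocator: copy each array into the next free region of a single direct buffer and hand out slices of it. The `DirectArena` name and its methods are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical bump allocator over one large, pre-allocated direct buffer.
final class DirectArena {
    private final ByteBuffer arena;

    DirectArena(int capacity) {
        arena = ByteBuffer.allocateDirect(capacity);
    }

    // Copies src into the next free region and returns a direct slice
    // covering exactly those bytes, positioned for reading.
    ByteBuffer stage(byte[] src) {
        if (src.length > arena.remaining()) {
            throw new IllegalStateException("arena full");
        }
        ByteBuffer region = arena.slice(); // starts at the arena's current position
        region.limit(src.length);
        region.put(src);
        region.flip();
        arena.position(arena.position() + src.length);
        return region;
    }

    // Makes the whole arena reusable; earlier slices will see new data
    // once their regions are overwritten.
    void reset() {
        arena.clear();
    }

    int used() {
        return arena.position();
    }
}
```

Resetting once per batch or transaction keeps allocation out of the hot path, at the cost of the same scoping rule as any reused buffer.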
You can also use the
My point is that it's not just laziness that leads one to want to keep using on-heap arrays and expecting performant native calls at the same time :)
I hear you and have sympathy with this use case, but I'm not sure how to move forward. What's your opinion @benalexau?
First up, thanks to @phraktle for his interest in the project and taking the time to explain his use case and possible solutions. I have removed the I have no philosophical objection against Having said that, the JNI calls do appear in JNR-FFI's JNINativeInterface. However, they are labelled as
There are two somewhat orthogonal issues here:
I'm not yet familiar enough with the current design to comment on how to go about this. (Also, running into #10 seems to indicate that even the current buffer handling could use some love.)
I have logged jnr/jnr-ffi#68 seeking suggestions on how we might be able to use a
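For reference, here's what I ended up doing in the client code for now. There's a reserved buffer for keys reused for all operations, such as Overall, this yields a pretty good performance profile. These tricks could be encapsulated within a Note: a

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicReference;

import org.lmdbjava.Dbi;
import org.lmdbjava.Env;
import org.lmdbjava.Txn;

// Hypothetical wrapper class; env and db are assumed to be opened elsewhere.
class ByteArrayStore {
  private final Env<ByteBuffer> env;
  private final Dbi<ByteBuffer> db;
  // Single-slot cache holding one reusable direct buffer for keys.
  private final AtomicReference<ByteBuffer> bufferCache = new AtomicReference<>();

  ByteArrayStore(Env<ByteBuffer> env, Dbi<ByteBuffer> db) {
    this.env = env;
    this.db = db;
  }

  // Copies the key into the cached direct buffer, or into a new one if
  // another thread currently holds the cached buffer.
  private ByteBuffer acquireKeyBuffer(byte[] key) {
    if (key.length > env.getMaxKeySize()) {
      throw new IllegalArgumentException("Key too long " + key.length);
    }
    ByteBuffer buf = bufferCache.getAndSet(null);
    if (buf == null) {
      buf = ByteBuffer.allocateDirect(env.getMaxKeySize());
    }
    buf.put(key);
    buf.flip();
    return buf;
  }

  // Returns the key buffer to the cache for reuse.
  private void releaseKeyBuffer(ByteBuffer bb) {
    bb.clear();
    bufferCache.lazySet(bb);
  }

  // Reserves value.length bytes inside LMDB and writes the value straight
  // into them, so no separate direct buffer is needed for the value.
  public void put(byte[] key, byte[] value) {
    ByteBuffer k = acquireKeyBuffer(key);
    try (Txn<ByteBuffer> tx = env.txnWrite()) {
      ByteBuffer v = db.reserve(tx, k, value.length);
      v.put(value);
      tx.commit();
    } finally {
      releaseKeyBuffer(k);
    }
  }
}
```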
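The acquire/release trick above can be pulled out of the client code into a small reusable class. This is a hypothetical sketch (`DirectBufferSlot` is not part of lmdbjava) that depends only on the JDK, so the caching behaviour can be tested without an LMDB environment.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical single-slot cache of one direct buffer. Concurrent callers
// that find the slot empty simply allocate their own buffer.
final class DirectBufferSlot {
    private final AtomicReference<ByteBuffer> cache = new AtomicReference<>();
    private final int capacity;

    DirectBufferSlot(int capacity) {
        this.capacity = capacity;
    }

    // Copies src into the cached (or a new) direct buffer, flipped for reading.
    ByteBuffer acquire(byte[] src) {
        if (src.length > capacity) {
            throw new IllegalArgumentException("Too long: " + src.length);
        }
        ByteBuffer buf = cache.getAndSet(null);
        if (buf == null) {
            buf = ByteBuffer.allocateDirect(capacity);
        }
        buf.put(src);
        buf.flip();
        return buf;
    }

    // Puts the buffer back in the slot, replacing any buffer already cached.
    // The caller must not use it afterwards.
    void release(ByteBuffer buf) {
        buf.clear();
        cache.lazySet(buf);
    }
}
```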
That's pretty neat. I'm not against implementing this if it turns out that JNR-FFI does not provide native support for byte arrays.
One challenge with putting it in I'm hopeful JNR-FFI offers something that would allow That would still mean direct Let's wait and see what the JNR-FFI ticket yields.
There's still no reply to the JNR-FFI issue, so it seems there's no obvious way to pass a
One idea is to refactor the internals of (Note that "use
Is this only for storing data, or do you also need byte arrays for reading? Instead of having a typed
This entire request is mostly about the convenience of migrating from other DB APIs, such as leveldbjni, lmdbjni, RocksJava, etc., all of which use on-heap It is also a safety concern: the on-heap copies are valid after leaving scope of Of course I was able to make things work with lmdbjava not supporting on-heap arrays (with some extra cycles spent on surprises, such as
Why not copy the
After watching the JCrete conference [1], with comments from Cliff Click saying that JNR-FFI is just a convenience API that will not improve performance by any means, I'm not sure JNR-FFI or Project Panama will do any better than copying memory anyway.
As I said, I was able to use the off-heap-only
I don't think it's worth the effort to allow for
How about extracting the methods of
Sounds doable. I haven't looked at the code for a while, so I need to refresh my memory. Let me get back to you after taking a stab at it.
@benalexau does this approach sound good to you?
Sounds fine to me. I'd prefer a
Agreed. I did not envision the solution that @phraktle mentioned, but it feels cleaner than poking around with
I have been poking around with different approaches that share Is this what we want? Makes me wonder if such an implementation would be more appropriate in a separate project to avoid the extra overhead in lmdbjava?
10 types is a lot, but then again, byte arrays seem core enough to try to find a solution. Is your code at a point where you could commit it to a branch for us to take a look at? (There might be another pattern we could collectively identify, like a template or delegate.)
Maybe we should allow
No prototype ready yet. I was only messing around to see the effect in the code base.
Sorry, but I cannot find a clean solution for putting byte arrays and buffers under the same roof. It seems like every solution causes an explosion of either classes or code paths, including the tests to cover them. But we could provide a Opinions?
Given you've been thinking about this, if hypothetically
I'm not familiar enough with the codebase yet to have deep intuitions here. But I'm guessing it would be easier to only address one concern at a time: 1) add support for on-heap Why are duplicate implementations of all the classes you listed required? I imagine, for example,
The problem is how to get data out from Does this sound doable?
I just quickly hacked something together and it seems that using a A new direct There are probably other ways to do it, but this might be one way forward. Let me know what you think.

```java
public class ByteArrayProxy extends BufferProxy<byte[]> {
  @Override
  protected Holder<byte[]> allocate() {
    // Start with an empty array; the real one is created in out().
    return new Holder<>(new byte[0]);
  }

  @Override
  protected void deallocate(Holder<byte[]> buff) {
    // Nothing to free: arrays live on the heap.
  }

  @Override
  protected void in(Holder<byte[]> buffer, Pointer ptr, long ptrAddr) {
    // Copy the on-heap array into a temporary direct buffer and point the
    // MDB_val at it. Caution: tmp is only referenced locally, so nothing here
    // keeps its native memory alive until LMDB has read it.
    ByteBuffer tmp = ByteBuffer.allocateDirect(buffer.get().length);
    tmp.put(buffer.get());
    tmp.flip();
    UNSAFE.putLong(ptrAddr + STRUCT_FIELD_OFFSET_SIZE, tmp.remaining());
    UNSAFE.putLong(ptrAddr + STRUCT_FIELD_OFFSET_DATA, address(tmp));
  }

  @Override
  protected void in(Holder<byte[]> buffer, int size, Pointer ptr, long ptrAddr) {
    // Not implemented in this prototype.
  }

  @Override
  protected void out(Holder<byte[]> buffer, Pointer ptr, long ptrAddr) {
    // Copy the LMDB-owned bytes into a new on-heap array.
    final long addr = UNSAFE.getLong(ptrAddr + STRUCT_FIELD_OFFSET_DATA);
    final long size = UNSAFE.getLong(ptrAddr + STRUCT_FIELD_OFFSET_SIZE);
    byte[] bytes = new byte[(int) size];
    buffer.wrap(bytes);
    org.agrona.UnsafeAccess.UNSAFE.copyMemory(null, addr, bytes, BufferUtil.ARRAY_BASE_OFFSET, size);
  }

  protected final long address(final ByteBuffer buffer) {
    return ((sun.nio.ch.DirectBuffer) buffer).address() + buffer.position();
  }
}
```
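The copy-in/copy-out idea behind the proxy can be modelled with plain JDK buffers. In this hypothetical sketch (`CopyingArrayProxy` is not an lmdbjava class and does no `MDB_val` handling), the staging buffer is kept in a field so it stays strongly reachable, rather than being a local that the GC could reclaim while native code might still read from its address.

```java
import java.nio.ByteBuffer;

// Hypothetical model of a copying byte[] proxy: one copy in, one copy out.
final class CopyingArrayProxy {
    // Most recent staging buffer, held in a field so its native memory is
    // not released while a native call might still be reading it.
    private ByteBuffer staged;

    // Copies the array off-heap, reusing the staging buffer when it is
    // large enough. Each call overwrites the previous contents.
    ByteBuffer in(byte[] array) {
        if (staged == null || staged.capacity() < array.length) {
            staged = ByteBuffer.allocateDirect(array.length);
        }
        staged.clear();
        staged.put(array);
        staged.flip();
        return staged;
    }

    // Copies the readable bytes of an off-heap buffer into a new array.
    static byte[] out(ByteBuffer offHeap) {
        byte[] copy = new byte[offHeap.remaining()];
        offHeap.duplicate().get(copy);
        return copy;
    }
}
```

Because the staging buffer is reused, the buffer returned by `in()` follows the same transaction-scoped rule as any other lmdbjava-owned buffer.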
This looks encouraging. It requires a copy on the way in and the way out, but on the bright side it would mean an idiomatic API for those using It seems unlikely we'll ever be able to offer a copy-free We can only open an
Our current
Indeed, and if JNR-FFI gets better means of handling byte arrays, we could easily adopt them. I'm guessing it's only right that ByteArrayProxy uses JNR-FFI for safety reasons? Here's the refactored JNR-FFI version of it.

```java
public class ByteArrayProxy extends BufferProxy<byte[]> {
  private static final MemoryManager MEM_MGR = RUNTIME.getMemoryManager();

  @Override
  protected Holder<byte[]> allocate() {
    return new Holder<>(new byte[0]);
  }

  @Override
  protected void deallocate(Holder<byte[]> buff) {
  }

  @Override
  protected void in(Holder<byte[]> buffer, Pointer ptr, long ptrAddr) {
    // Copy the array into JNR-FFI-managed direct memory and store its size
    // and address in the MDB_val struct. Caution: `pointer` is only
    // referenced locally, so nothing here keeps its memory alive until
    // LMDB has read it.
    byte[] bytes = buffer.get();
    Pointer pointer = MEM_MGR.allocateDirect(bytes.length);
    pointer.put(0, bytes, 0, bytes.length);
    ptr.putLong(STRUCT_FIELD_OFFSET_SIZE, bytes.length);
    ptr.putLong(STRUCT_FIELD_OFFSET_DATA, pointer.address());
  }

  @Override
  protected void in(Holder<byte[]> buffer, int size, Pointer ptr, long ptrAddr) {
  }

  @Override
  protected void out(Holder<byte[]> buffer, Pointer ptr, long ptrAddr) {
    // Wrap the LMDB-owned memory in a Pointer and copy it onto the heap.
    final long addr = ptr.getLong(STRUCT_FIELD_OFFSET_DATA);
    final int size = (int) ptr.getLong(STRUCT_FIELD_OFFSET_SIZE);
    Pointer pointer = MEM_MGR.newPointer(addr, size);
    byte[] bytes = new byte[size];
    pointer.get(0, bytes, 0, size);
    buffer.wrap(bytes);
  }
}
```

Multiple buffer types seem a little artificial at the moment. I suggest we wait until we have a well-defined requirement?
Note that I use
Unless there are any objections, I will try to make a PR during the weekend.
A PR would be great. Did you take a look at the various On the multiple buffer types per
Yes, I looked through the different Pointer types but could only find
@phraktle I merged @krisskross's #22 a week or so ago. Do you have any further thoughts on this ticket in light of that merge and the discussion in jnr/jnr-ffi#68?
I'll close this, given I assume folks are happy and there's not much more we can do without upstream JNR-FFI enhancements in any event. Please comment here if you'd like it re-opened.
In some contexts (such as migrating from legacy lmdbjni/leveldbjni APIs) it would be nice to have a Dbi<byte[]> instead of wrapping arrays with ByteBuffers and copying out the data in the caller. At first glance the design seems to indicate this should be accomplished with a BufferProxy implementation, but it's not clear how... e.g. at the point of the allocate() call the size is unknown, etc.