Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New class PagedBytesReference: BytesReference over pages #5427

Closed
wants to merge 1 commit into from

Conversation

@hhoffstaette
Copy link
Contributor

commented Mar 13, 2014

BytesRefererence over a BigArray/ByteArray list of pages. Passes both a boatload of new tests and the full test suite. About as compatible as it can be for now; minor perf improvements require extensions to BigArrays (see #5420)

@s1monw
s1monw reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated

public class PagedBytesReference implements BytesReference {

private ByteArray bytearray;

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

can this be final?

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 13, 2014

Author Contributor

Used to be - done.

@s1monw
s1monw reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
import java.io.OutputStream;
import java.util.Arrays;

public class PagedBytesReference implements BytesReference {

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

IMO the entire class should be final

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 13, 2014

Author Contributor

No problem, done.

@s1monw
s1monw reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated

@Override
public void writeTo(OutputStream os) throws IOException {
if (length == 0) {

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

this check is obsolet - we can just remove it

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 13, 2014

Author Contributor

Which one, in slice() or writeTo()?

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

if (length == 0) in writeTo? the one this comment is on though :)

@s1monw
s1monw reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
}

// for now copy byte-by-byte
for (int i = 0; i < this.length; i++) {

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

I think we should not push this without doing bulk writes... maybe we can add a util method to ByteArray? @jpountz?

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 13, 2014

Author Contributor

Agreed - this was just for review. I wanted to add bulk copy/hash/equals when I've done the corresponding issue.

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 13, 2014

Contributor

totally +1 on having bulk operations, we already started discussing it with @hhoffstaette

@s1monw
s1monw reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated

@Override
public ChannelBuffer toChannelBuffer() {
// TODO: probably should use CompositeChannelBuffer?

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

I think we should implement a a class that wraps a ByteArray? otherwise we are loosing a lot of the opto and need to materialize the array again in toBytes?

This comment has been minimized.

Copy link
@kimchy

kimchy Mar 14, 2014

Member

we should use CompositeChannelBuffer, but then extra care need to be taken when writing to netty, and if it should be a gathering one or not. It might make sense for it to be gathering if under ~512k, and not if above.

@s1monw
s1monw reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
@Override
public int read(byte[] b, int bOffset, int len) throws IOException {
if (b == null) {
throw new NullPointerException();

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

IMO this is useless since it will throw an NPE anyway the check is obsolet

@s1monw
s1monw reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
public int read(byte[] b, int bOffset, int len) throws IOException {
if (b == null) {
throw new NullPointerException();
} else if (bOffset < 0 || len < 0 || len > b.length - bOffset) {

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

this also I guess since it will throw IOOB as well?

@jpountz
jpountz reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated

// undo the single-page optimization by ByteArray.get(), otherwise
// a materialized stream will contain traling garbage/zeros
// TODO: should use boolean value from get() for this?

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 13, 2014

Contributor

I think I prefer the current logic of this method to relying on the return value of get().

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 13, 2014

Author Contributor

Sure, the comment was just based on our previous discussion. Using ref.offset is clean enough.

return 0;
}

if (pos >= offset + length) {

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

wait, when does this happen? this should be an error condition?

// ByteArray.get(BytesRef) does not work since it flips the
// ref's byte[] pointer, so for now we copy byte-by-byte
int written = 0;
while (written < len) {

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

hmm can't you read the BytesRef once and then use array copy until it's exhausted then move to the next page? You should maybe use a BytesRef internally to hold the current page I guess since then you can just copy from that one?

@s1monw
s1monw reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated

@Override
public int hashCode() {
// TODO: this would materialize to a byte[] - better: BigArrays.hashCode(ByteArray)?

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

can you add this please? this would be a very expensive hashcode! I guess you should also cache it no?

@jpountz
jpountz reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated

@Override
public boolean equals(Object obj) {
// TODO: this would materialize to a byte[] - better: BigArrays.equals(ByteArray, ByteArray)?

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 13, 2014

Contributor

+1

@s1monw
s1monw reviewed Mar 13, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
@Override
public boolean equals(Object obj) {
// TODO: this would materialize to a byte[] - better: BigArrays.equals(ByteArray, ByteArray)?
return Helper.bytesEqual(this, (BytesReference) obj);

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

see above - I think you should add it though

import java.io.IOException;
import java.util.Arrays;

public class PagedBytesReferenceTest extends ElasticsearchTestCase {

This comment has been minimized.

Copy link
@s1monw

s1monw Mar 13, 2014

Contributor

none of the tests uses a really big array which it should I guess.... the page size it 16kb but we never create one everything is pretty static in here can you add some nice randomization to this test?

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 17, 2014

Author Contributor

Latest version has lots of randomization and also much larger composite arrays.


// ByteArray.get(BytesRef) does not work since it flips the
// ref's byte[] pointer, so for now we copy byte-by-byte
int written = 0;

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 13, 2014

Contributor

I think having friend Input/Output classes to ByteArray, as you suggested, would help make this easier.

@kimchy
kimchy reviewed Mar 14, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated

@Override
public BytesArray toBytesArray() {
return copyBytesArray();

This comment has been minimized.

Copy link
@kimchy

kimchy Mar 14, 2014

Member

if this a single page, we don't have to copy, just return a pointer to the first page with a length?

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 14, 2014

Author Contributor

Yes, right. I'm adding that and more tests that distinguish between cases of smaller/larger than pagesize.

@kimchy
kimchy reviewed Mar 14, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated

@Override
public boolean hasArray() {
return true;

This comment has been minimized.

Copy link
@kimchy

kimchy Mar 14, 2014

Member

hasArray API concept is that it is backed by a direct reference array, so calling array() is cheap (similar to netty ChannelBuffer. In this case its not, so this should return false, and array() should return null.

This comment has been minimized.

Copy link
@kimchy

kimchy Mar 14, 2014

Member

actually, if its a single page, then we can just reference the first page byte array, if not, then we should return false. same with array and arrayOffset.

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 14, 2014

Author Contributor

hasArray() actually used to be false initially but then caused a lot of code to fail because of asserts that check hasArray() at the end of some action/block. Making it conditional would make that code fail (when assertions are enabled) depending on whatever happened. Also all other non-Netty-based impls (esp. BytesArray) always return true, which is why this always worked so far.

This comment has been minimized.

Copy link
@kimchy

kimchy Mar 17, 2014

Member

which asserts failed? the idea of hasArray is that if its cheap to get the array for it, so any code block that fails is potentially a bug for implementations that don't have support for it.

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 17, 2014

Author Contributor

I only got failures when it always returned false, and the one location I remember was SourceFieldMapper#344, which caused lots of cascading failures everywhere else. Not sure how much we can do here other than return true and convert on-demand. toArray/array is already zero-copy for common cases (non-slice < pagesize).

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 17, 2014

Author Contributor

@kimchy I have now changed hasArray() to return true/false depending on whether the length is <= pagesize, and array() to return the first page or null. With the assert in SourceFieldMapper#344 removed and only the first change to hasArray() it ran through a full test. The second change causes a repeatable failure at the end of SourceFieldMapper#createParseField() for payloads >pagesize, which I don't understand how to fix due to interaction with compression & StoredFields. This can be recreated e.g. by running GetActionTests.realtimeGetWithCompress(). Creating copies in createParseField() does not help (different exception with Translog).
So all things considered I'd rather always delegate array() to toBytes() until we can dig further into this, since it seems to be a completely different problem and what I had before worked fine everywhere.
If you want to take a look I can commit the changes with comments where they fail in this branch.

This comment has been minimized.

Copy link
@kimchy

kimchy Mar 18, 2014

Member

@hhoffstaette I fixed the places where we didn't respect the hasArray contract in #5455, so now we can go ahead and return true when we have a single page, and false otherwise. Also, array and arrayOffset should throw an exception if its not a single page (i.e. hasArray is not true)

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 24, 2014

Author Contributor

Done: now throws IllegalStateException. If you have another preference please let me know.

@hhoffstaette

This comment has been minimized.

Copy link
Contributor Author

commented Mar 17, 2014

Squashed version with non-materializing equals()/hashCode(), support for zero-copy writeTo(OutputStream), composite ChannelBuffers and randomized tests with lots of corner-case fixes esp. for slices. The Netty buffers are non-gathering for now until I've read up on how Netty implements scatter/gather and the implications.

@kimchy

This comment has been minimized.

Copy link
Member

commented Mar 17, 2014

Here is a quick breakdown of gathering in Netty. Gathering tells netty if to send the list of ByteBuffer[] to the nio layer (when true), or write it one by one to the nio layer (when false). This is only enabled on Java 7 because of bugs in sending the array of buffers in 1.6. (for Netty logic, see SocketSendBufferPool#acquire,

The part that is tricky is the fact that in the NIO layer, at least in 1.7, ends up taking those ByteBuffer arrays, copying over the direct buffers, and then trying to write as much as possible. If this message is very large, that it can't be sent at a single go, then extra copying will happen over and over again. Imagine sending a 1MB, 10 of 100k buffers, and only 1 buffer was sent at one go, then it will copy back the rest of the 9 buffers to try and send them again. (see IOUtil.java in the jdk source code).

This is not the end of the world, and this is what happens today without the paged data structure, so its already an improvement (assuming we set gathering to true, which should be the default if we don't add any other logic). But, we can potentially do better, and only try and send in gathering mode if the total length of the bytes is less than, lets say 512k (or something we expect, in the most common case, to succeed). If its more, it might be worth it to send the buffers one by one.

@kimchy
kimchy reviewed Mar 18, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
// we can use gathering writes from the ChannelBuffers, but only if they are
// moderately small to prevent OOMs due to DirectBuffer allocations.
// up to 32 buffers result in gathering writes for payloads <= 512k.
return ChannelBuffers.wrappedBuffer((numBuffers <= 32), buffers);

This comment has been minimized.

Copy link
@kimchy

kimchy Mar 18, 2014

Member

can we have this limit as a constant, and respect the PAGE_SIZE constance as well in case we decided to change it in the future?

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 24, 2014

Author Contributor

Latest commit ignores number of pages (and thus page size) and only uses a constant of 512k.

@jpountz
jpountz reviewed Mar 24, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
}

// a slice > pagesize will always require an extra buffer for the initial fragment
int numBuffers = (length / PAGE_SIZE) + (offset != 0 ? 2 : 1);

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

I think there is an issue here when length is a multiple of PAGE_SIZE: for example I think a length of 2*PAGE_SIZE with offset == 0 should use 2 buffers, but when I do the math here, this gives: (2*PAGE_SIZE / PAGE_SIZE) + 1 = 3.

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 24, 2014

Author Contributor

You are completely correct, that was written too hastily. I'll fix the remainder check.

int pos = 0;

// are we a slice?
if (offset != 0) {

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

maybe the check should rather be offset % PAGE_SIZE != 0?

@jpountz
jpountz reviewed Mar 24, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
}

// a slice > pagesize will always require an extra buffer for the initial fragment
int numBuffers = (length / PAGE_SIZE) + (offset != 0 ? 2 : 1);

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

I think it has the same issue as writeTo with lengths that are multiples of PAGE_SIZE.

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 24, 2014

Author Contributor

Yes, same code. Just to be clear, I'm not too hapy about this duplication either but the creation of the concrete buffers is too tangled to be factored out without too much effort. Ideally this is something I'd like to reduce once we have better ways of iterating over pages.

@jpountz
jpountz reviewed Mar 24, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
hash = slicehash;
}
}
return hash;

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

I'm a bit concerned that we have different code paths depending on the value of offset since this means that changing the code for BigArrays.hashCode might break this hashCode implementation.

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 24, 2014

Author Contributor

Once BigArrays.hashCode(range) is in place this was supposed to be removed. I can reduce it for now to always use the class-local version. Same result, I guess.

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

+1 to do that and then fix BigArrays.hashCode to accept offsets in a separate change.

@jpountz
jpountz reviewed Mar 24, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
return false;
}

if (offset == 0) {

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

I think it should also check that other.offset is 0?

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 24, 2014

Author Contributor

I guess this is the eternal "when are two things equal" problem; it should not matter for the equality of the contents. Are a non-slice and a slice "two different things"?

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

Indeed, I think two BytesReference instances should be considered equal if they have the same content (which is what the way other children of BytesReference are implemented suggests).

My point was this path of the equals method ignores the offset of other (it starts comparing its bytes at 0 instead of other.offset.

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 24, 2014

Author Contributor

Ah, OK. Valid point, removed the delegation for now (as with hashCode).

@jpountz
jpountz reviewed Mar 24, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
return true;
}

PagedBytesReference other = (PagedBytesReference)obj;

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

Should it accept any BytesReference instance?

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 24, 2014

Author Contributor

That's a good question. Lots of equals() impls check for class equality, and primarily I wanted to avoid accidental array materialization. I could special-case that & delegate to the generic helper as fallback, but maybe that's one of those bridges we can cross when we come to it?

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

I agree avoiding materialization is important and I'm surprised that the Helper currently materializes if hasArray returns false. I just opened a change request for it #5517 so that it's safe to fallback to it.

@kimchy
kimchy reviewed Mar 24, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
@Override
public byte[] array() {
if (hasArray()) {
return toBytes();

This comment has been minimized.

Copy link
@kimchy

kimchy Mar 24, 2014

Member

is this correct? this will return a copied array if offset > 0, yet the arrayOffset method will return the offset into an array that has offset 0... .

This comment has been minimized.

Copy link
@kimchy

kimchy Mar 24, 2014

Member

I think we should return the BytesRef array we get from reading from byte array, and arrayOffset will use the same logic, and return the offset form the BytesRef

This comment has been minimized.

Copy link
@hhoffstaette

hhoffstaette Mar 24, 2014

Author Contributor

Both done now.

@jpountz
jpountz reviewed Mar 24, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
throw new ElasticsearchIllegalArgumentException("can't slice a buffer with length [" + length() + "], with slice parameters from [" + from + "], length [" + length + "]");
}

return new PagedBytesReference(bigarrays, bytearray, from, length);

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

s/from/offset + from/ ?

}

// this would indicate that our numBuffer calculation is off by one.
assert (numBuffers == bufferSlot);

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

good that you added this assertion :)

@jpountz
jpountz reviewed Mar 24, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
// a remaining fragment < pagesize needs at least one buffer
numBuffers += (pages == 0) ? 1 : pages;
// a remainder that is not a multiple of pagesize also needs an extra buffer
numBuffers += (pages > 0 && remainder % PAGE_SIZE > 0) ? 1 : 0;

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

maybe we can extract this computation of numBuffers to a helper method so that it can be used by both toChannelBuffer and writeTo?

@jpountz
jpountz reviewed Mar 24, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated

@Override
public boolean hasArray() {
return (length <= PAGE_SIZE);

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 24, 2014

Contributor

BigByteArray materializes as soon as you ask for something that goes across several pages. So I think this should returnoffset + length <= PAGE_SIZE instead.

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Mar 24, 2014

It looks good to me now. Maybe one last thing that should be fixed would be to make the equals method accept any BytesReference instance by falling back to BytesReference.Helper.bytesEquals (that should soon not materialize arrays anymore when #5517 is pushed) when the other instance is not an instance of PagedBytesReference?

@jpountz
jpountz reviewed Mar 25, 2014
View changes
src/main/java/org/elasticsearch/common/bytes/PagedBytesReference.java Outdated
bytes = Arrays.copyOf(ref.bytes, length);
}
else {
// BigArray has materialized for us, no need to do it again

This comment has been minimized.

Copy link
@jpountz

jpountz Mar 25, 2014

Contributor

This assumption is wrong if offset >= N * PAGE_SIZE and offset + length < (N+1) * PAGE_SIZE. I think the only way to fix it would be to make ByteArray.get return a boolean which can be used to know whether a copy has already been performed.

@hhoffstaette

This comment has been minimized.

Copy link
Contributor Author

commented Mar 25, 2014

@jpountz commit 7b1c94a has the last requested change using the return value from ByteArray.get(), as discussed. (not sure why old commits are suddenly showing up here?!)

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2014

not sure why old commits are suddenly showing up here?

This might be because you rebased against master?

commit 7b1c94a has the last requested change using the return value from ByteArray.get()

Lookgs good, you have my +1. Maybe @kimchy and/or @s1monw would like to do another round before it gets merged?

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2014

I only did a quick glance but it looks good - go move forward

existing tests. Non-allocating hashCode/Equals, zero-copy writeTo() and
ChannelBuffer support.

Fix for #5427
hhoffstaette added a commit that referenced this pull request Mar 25, 2014
existing tests. Non-allocating hashCode/Equals, zero-copy writeTo() and
ChannelBuffer support.

Fix for #5427
hhoffstaette added a commit that referenced this pull request Mar 25, 2014
existing tests. Non-allocating hashCode/Equals, zero-copy writeTo() and
ChannelBuffer support.

Fix for #5427
@hhoffstaette

This comment has been minimized.

Copy link
Contributor Author

commented Mar 25, 2014

Finally in master & 1.x .. \o/

@hhoffstaette hhoffstaette self-assigned this Mar 25, 2014
@hhoffstaette hhoffstaette deleted the hhoffstaette:pagedbytesref branch Mar 26, 2014
@hhoffstaette hhoffstaette removed their assignment Mar 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.