Java: Fast writing of vectors #4881

ukclivecox · 2018-08-15T15:15:04Z

What is the fastest way to write a large vector in Java? The generated code does a loop over every element in an array.
For example in my flatbuffers generated code:

createValuesVector(FlatBufferBuilder builder, double[] data) { 
  builder.startVector(8, data.length, 8);
   for (int i = data.length - 1; i >= 0; i--) builder.addDouble(data[i]); 
   return builder.endVector(); 
}

Is there any way to use a System.arraycopy somehow which may be quicker?

The text was updated successfully, but these errors were encountered:

aardappel · 2018-08-16T21:32:46Z

That is likely quicker, yes. Of course, rather than calling the above function, you can call startVector yourself and do an arraycopy, though it would be nice if that was wrapped in a helper function and/or the generated code is updated for cases where this is possible.

ukclivecox · 2018-08-19T08:12:52Z

I think any advantage might only be for cases where the source data is in a byte[] as I haven't found a way to transfer data from a double[] to a byte[] that doesn't involve a loop over each double at some point.

aardappel · 2018-08-20T16:22:11Z

Sure, we can generate special purpose code just for byte.

stale · 2019-08-20T16:28:08Z

This issue has been automatically marked as stale because it has not had activity for 1 year. It will be automatically closed if no further activity occurs. To keep it open, simply post a new comment. Maintainers will re-open on new activity. Thank you for your contributions.

RalphSteinhagen · 2020-07-20T21:49:50Z

Is there any update on this?

I am dealing with double[] vectors in the order of 1k up to 1M elements. In principle, above code-snippet work also for these cases, but -- with the element-by-element access and the generation of vector -- shows a rather poor performance (roughly factor 2-3 penalty) compared to using bulk operation converting double[] to byte[] (e.g. using the Unsafe class and/or NIO buffers)

These bulk operations should be fine for the most common case where the systems where the buffers are (de-) serialized share the same endianness and bit representation and above snippets could be a fall-back if, for example, the endianness does not match.

aardappel · 2020-07-21T01:06:43Z

@RalphSteinhagen not that I know.. care to make a PR?

RalphSteinhagen · 2020-07-22T15:44:29Z

@aardappel basically sort-of-yes (since we have a working/tested prototype in another serializer that could be 'transplanted'), but overall depends on the time-line, support w.r.t. secondary issues/features, and whether other users would be interested in this as well.

The primary proof-of-concept (ie. same from/to architecture) is simple, but there are secondary questions like API changes, design choices (e.g. using JDK's Unsafe classes or NIO only), how to track/handle mixed-architectures for sender/receiver, reference implementations for C++/Java/.. etc. that would need to be followed up as well. The individual items are rather easy but the sum ...

This would be particularly important since there is already a strong/diverse user community around Flatbuffers...

aardappel · 2020-07-23T16:34:53Z

@RalphSteinhagen I thought the above would just either modify that generated loop to use arraycopy, or if not possible, add a new method. I don't understand what your concerns of architecture are about.

RalphSteinhagen · 2020-07-23T19:33:00Z

@aardappel well, unfortunately, it doesn't seem to be that simple (N.B. I dived a bit into the underlying source code): builder.addDouble(data[i]) does way more than just adding data[I] to an internal enlarging double[] array and writing this to a byte-array store, but creates an object (and key which default to 'null'), adds this to the stack, then parses the stack to a vector and writes the vector (again element-by-element) to the underlying byte buffer.

A lot of interfaces, wrappers etc. would need to be bypassed to do this efficiently.... Among other things, ReadWriteBufwould need to be enlarged to accommodate array-type primitives. Maybe this might be a too big undertaking and Flatbuffers seems to be (judging from the sources) targeted more towards small/simple data structures and size...

In some other serializer implementations, the array is taken from user-space, appended behind a small header that describes the array/primitive data format and then directly (low-level array) copied to the resulting byte-buffer... not impossible but a couple of days work and many API changes that would be only warranted if there is also a strong use-case by other users...

aardappel · 2020-07-23T21:18:18Z

@RalphSteinhagen not that I see.. addDouble simply ensures theres enough space in the buffer, then writes the bytes. We could easily add an addDoubles that allocates the space just once, and then copies in the doubles in some bulk fashion. Then call that from the generated function.

creates an object (and key which default to 'null'), adds this to the stack, then parses the stack to a vector

You appear to be reading the code for FlexBuffers. Check FlatBufferBuilder.java

@paulovap may have an opinion.

paulovap · 2020-07-23T21:55:25Z

ReadWriteBuf is an interface that exists only in FlexBuffers.

For FlatBuffers the builder is backed by an NIO's ByteBuffer. So you can only move as fast as a ByteBuffer can be. In comparison to naked byte[] it will always be slower due to the extra bound checking done by ByteBuffer.

Looping over a naked array might be helped by range check elimination done by some VMs and AFAIK you cannot get it from accessing bytes or doubles on ByteBuffer.

As @aardappel suggested addDoubles could be added with the bulk copy. And to potentially speed-up even further there could be a fast path where the data could be copied directly on ByteBuffer's backing array in other to allow range-check optimizations, thus getting very close to naked array performance.

RalphSteinhagen · 2020-07-25T03:49:23Z

@aardappel yes, was looking into the FlexBuffers builder. The FlatBufferBuilder as a sort of wrapper around the NIO ByteBuffer could indeed be more easily adapted to just support doubles but does not seem to allow insertions/additions without breaking the overall wire data format structure.

As a premise, I was looking into loosely coupled protocols where the server- (sending) side can evolve with time and the client (since there are many diverse ones) do not necessarily have to adapt the scheme for each micro-changes (ie. usually additional fields). Maybe a different use-case than FlatBuffers is designed for.

In comparison with other protocols, I found that the vector routines are rather slowish and I thought I'd make a mistake w.r.t. the usage of FlatBuffers (/FlexBuffers), thus my initial question.

aardappel · 2020-07-27T16:37:33Z

@RalphSteinhagen FlexBuffers is its own format that's different from FlatBuffers in many ways. They can be used together but are not interchangeable.

And FlatBuffers was totally designed for adding and removing fields.

evanf-humane · 2022-11-10T23:15:35Z

Just found this via Google search. @aardappel are you still open to accepting the change proposed, ie. addShorts, addDoubles, etc? If so, I'm willing to give it a shot

aardappel · 2022-11-10T23:27:40Z

@evanf-humane yes that would be very welcome I am sure. Please check the above discussion w.r.t. endianess and other issues.

stale bot added the stale label Aug 20, 2019

aardappel closed this as completed Aug 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Java: Fast writing of vectors #4881

Java: Fast writing of vectors #4881

ukclivecox commented Aug 15, 2018

aardappel commented Aug 16, 2018

ukclivecox commented Aug 19, 2018

aardappel commented Aug 20, 2018

stale bot commented Aug 20, 2019

RalphSteinhagen commented Jul 20, 2020

aardappel commented Jul 21, 2020

RalphSteinhagen commented Jul 22, 2020 •

edited

aardappel commented Jul 23, 2020

RalphSteinhagen commented Jul 23, 2020

aardappel commented Jul 23, 2020

paulovap commented Jul 23, 2020 •

edited

RalphSteinhagen commented Jul 25, 2020

aardappel commented Jul 27, 2020

evanf-humane commented Nov 10, 2022 •

edited

aardappel commented Nov 10, 2022

Java: Fast writing of vectors #4881

Java: Fast writing of vectors #4881

Comments

ukclivecox commented Aug 15, 2018

aardappel commented Aug 16, 2018

ukclivecox commented Aug 19, 2018

aardappel commented Aug 20, 2018

stale bot commented Aug 20, 2019

RalphSteinhagen commented Jul 20, 2020

aardappel commented Jul 21, 2020

RalphSteinhagen commented Jul 22, 2020 • edited

aardappel commented Jul 23, 2020

RalphSteinhagen commented Jul 23, 2020

aardappel commented Jul 23, 2020

paulovap commented Jul 23, 2020 • edited

RalphSteinhagen commented Jul 25, 2020

aardappel commented Jul 27, 2020

evanf-humane commented Nov 10, 2022 • edited

aardappel commented Nov 10, 2022

RalphSteinhagen commented Jul 22, 2020 •

edited

paulovap commented Jul 23, 2020 •

edited

evanf-humane commented Nov 10, 2022 •

edited