Java/C#/Python prefixed size support #4445

robert-schmidtke · 2017-10-02T09:38:50Z

Hi,

this is a WIP PR that hopefully eventually adds the possibility to prefix the buffer's size to it, just like in C++ with FinishSizePrefixed and GetSizePrefixedRoot. I have successfully tested the Java part in a separate project of mine, however the C# variant is untested. I would therefore like to ask your advice on how to best add unit tests for the changes, and how to run them on either a Linux or macOS box.

In the getSizePrefixedRoot code I generate, I always assume the length is 4 bytes. Having read the documentation, this is intentional, right? Otherwise there would need to be some check (but since ByteBuffer.capacity() is an int that will not be necessary, I think).

In general I'm of course happy for any feedback, and whether my understanding of the FlatBufferBuilders is correct, and whether my approach of implementation is compatible with the C++ version (a unit test one could add).

The build is succeeding: https://travis-ci.org/robert-schmidtke/flatbuffers/builds/282171784 Having not looked too much into what make test actually does, this may not cover all cases. I'll be working simultaneously on adding tests.

Cheers
Robert

googlebot · 2017-10-02T09:38:55Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

robert-schmidtke · 2017-10-02T09:42:40Z

I signed it!

googlebot · 2017-10-02T09:42:44Z

CLAs look good, thanks!

robert-schmidtke · 2017-10-02T15:44:37Z

Both the Java and C# tests succeed as well now, the Java part of which also tests the C++ written, binary size prefixed monster file.

aardappel · 2017-10-02T22:24:50Z

You can test under Linux with Mono, see tests/FlatBuffers.Test/NetTest.sh.

You may assume size 4, yes. It be nice to use a constant, though, either and existing or new one in Constants.java and FlatBufferConstants.cs

aardappel · 2017-10-02T22:32:04Z

src/idl_gen_general.cpp

+    code += ps_method_signature + "(ByteBuffer _psbb, " + struct_def.name + " obj) { ";
+    code += "ByteBuffer _bb = _psbb." + FunctionStart('S') + "lice(); ";
+    if (lang_.language == IDLOptions::kCSharp) {
+      code += "_bb.Position = 4; ";


I have added a constant for Java and C# each. The Java one had to be explicitly public though, because the other ones are package-private.

aardappel · 2017-10-02T22:34:01Z

src/idl_gen_general.cpp

@@ -828,6 +828,33 @@ void GenStruct(StructDef &struct_def, std::string *code_ptr) {
    code += ") + _bb.";
    code += lang_.get_bb_position;
    code += ", _bb)); }\n";
+
+    // recreate both methods for the prefixed size version
+    std::string ps_method_name = FunctionStart('G') + "etSizePrefixedRootAs" +


This code has a lot of overlap with the code above, any ideas to refactor?

I could see something like I've done in lines 1335+. However the method bodies would depend on i, sort of forcing the for-switch anti-pattern.

I've pushed a version with less overlap in e84452f.

aardappel · 2017-10-02T22:35:07Z

src/idl_parser.cpp

-                file_identifier_.length() ? file_identifier_.c_str() : nullptr);
+      if (opts.prefix_size) {
+        builder_.FinishSizePrefixed(Offset<Table>(toff),
+                  file_identifier_.length() ? file_identifier_.c_str() : nullptr);


maybe at least pull the file identifier arg out of the if?

aardappel · 2017-10-02T22:38:22Z

tests/MyGame/Example/Monster.java

@@ -14,6 +14,9 @@
 public final class Monster extends Table {
  public static Monster getRootAsMonster(ByteBuffer _bb) { return getRootAsMonster(_bb, new Monster()); }
  public static Monster getRootAsMonster(ByteBuffer _bb, Monster obj) { _bb.order(ByteOrder.LITTLE_ENDIAN); return (obj.__assign(_bb.getInt(_bb.position()) + _bb.position(), _bb)); }
+  public static Monster getSizePrefixedRootAsMonster(ByteBuffer _psbb) { return getSizePrefixedRootAsMonster(_psbb, new Monster()); }
+  public static Monster getSizePrefixedRootAsMonster(ByteBuffer _psbb, Monster obj) { ByteBuffer _bb = _psbb.slice(); _bb.position(4); return getRootAsMonster(_bb, obj); }


Not sure why we're creating a new ByteBuffer here, ideally this refers to the existing one?

There is a new buffer created, however slice() uses the data of the other buffer, there is no copying involved, only position, limit etc. are reset. The same happens with the Slice() version I have added to the C# ByteBuffer. My reasoning behind this was that I wanted to avoid advancing the passed-in _psbb to avoid side-effects for the user. One possibility would be to increment the position by 4, do the regular getRootAsMonster, and then decrement it again. However I was worried about concurrent access to the buffer. Any thoughts on this? Would the +4, getRootAsMonster, -4 version be okay with you?

I'm aware we're not copying the buffer, but I generally like to avoid object allocations where possible.
I guess the + then - trick would be ok, but really we shouldn't need to touch the position at all if you see how the existing getRootAsMonster works, you can just generate a similar method that does _bb.position() + 4 on the fly.

aardappel · 2017-10-02T22:39:37Z

tests/monster_test_ps.fbs

@@ -0,0 +1,10 @@
+// similar schema, except to be parsed with --prefix-size
+
+include "monster_test.fbs";


I'd prefer it if we keep the test simple without needing yet another schema. You could simply generate a size-prefixed buffer in code, and then see if you can access it correctly.

I understand that, I have thought about it similarly. However, I'm not quite sure how else I may check the interoperability between C++ and Java (in this case), if not for a generated file. One could add another dedicated test that checks for interoperability in all supported languages? This test would first generate the regular and the size-prefixed version, and then invoke a small test for each language. This way, the new schema would not need to be versioned in git.

I'm not sure that an intra-language test is that important, since all we'd be checking is if a language uses the right size (32 bits)..

This reverts commit 68be442.

This reverts commit 2939516.

robert-schmidtke · 2017-10-03T20:17:18Z

I have removed the explicit tests for the size prefixed version, and added inline tests. This brought to my attention, that it might be useful to introduce SizePrefixedMonsterBufferHasIdentifier methods that do not rely on the fact, that the user has advanced the provided buffer by 4 bytes themselves?

aardappel · 2017-10-05T16:38:02Z

I was thinking, that rather than introducing all these new generated methods (which few people will use, and may confuse), why don't we simply have non-generated functions that can either advance a size prefixed ByteBuffer, or slice into a new ByteBuffer? There is nothing type specific about these functions. You'd then simply call getRootAsMonster(bb.FromSizePrefixed()) or whatever.

robert-schmidtke · 2018-03-06T10:01:58Z

I understand. Let me try and rework the additional code into a subclass of ByteBuffer that capsules the slicing behavior, essentially leaving the original ByteBuffer alone.

robert-schmidtke · 2018-03-06T12:52:06Z

So I managed to encapsulate the slicing behavior in a subclass of ByteBuffer, such that 'pure' ByteBuffers do not suffer from the overhead of carrying the _off. Is that more acceptable for you?

aardappel · 2018-03-08T18:54:32Z

net/FlatBuffers/ByteBuffer.cs


-        public byte[] Data { get { return _buffer; } }


Why is Data removed? was this unused?

Data was used, mainly for length checks and memory copies. I thought that fully hiding the underlying buffer was slightly better design, especially when dealing with slices. Thus, the buffer's content is modified/read using the Put* and Get* methods.

aardappel · 2018-03-08T18:55:43Z

net/FlatBuffers/ByteBuffer.cs

+            _buffer = newBuffer;
+        }
+
+        public virtual byte[] ToArray(int pos, int len)


some methods here are virtual and others not.. any reason? I don't think we intend this class to be inherited from, certainly not if it affects performance for basic functions like GetInt etc, which it may.

aardappel · 2018-03-08T18:56:41Z

net/FlatBuffers/ByteBuffer.cs

+            return ToArray(Position, Length - Position);
+        }
+
+        public byte[] ToFullArray()


what's the use of this one? Why would anyone ever want the unused part of the buffer copied?

It's mainly for testing to check that the padding is correct. This was previously done by checking against Data directly.

aardappel · 2018-03-08T18:59:44Z

net/FlatBuffers/ByteBuffer.cs

+
+        // Increases the size of the ByteBuffer, and copies the old data towards
+        // the end of the new buffer.
+        public virtual void GrowFront(int newSize)


Where did this function come from / where is it called from?

By hiding Data, I moved growing the buffer from FlatBufferBuilder to ByteBuffer. This is a 1:1 copy of that method. It's called from FlatBufferBuilder::GrowBuffer.

aardappel · 2018-03-08T19:01:02Z

net/FlatBuffers/ByteBufferSlice.cs

+            return base.GetUshort(_off + index);
+        }
+
+        public override int GetInt(int index)


This function is called a LOT, so it potentially becoming slower by being virtual I am afraid is not acceptable.

Then I frankly do not see another way to support slices, which imho is the cleanest way of representing size-prefixed buffers. Note that using ByteBuffer in Java has exactly the same design, even for non-slices. The last resort would be do create a new ByteBuffer with a separate underlying byte[] array and copy all the previous contents to it, minus the size prefix. That would be a one-time overhead for creating the slice, and about 100% memory overhead, unless the caller discards all references to the size-prefixed buffer, and lets the GC handle that.

Just FYI, I have added a small benchmark:

[FlatBuffersTestMethod] public void BenchmarkFlatBuffer() { Random rnd = new Random(0); ByteBuffer bb = new ByteBuffer(File.ReadAllBytes(@"Resources/monsterdata_test.mon")); // bb = bb.Slice(); // toggle for ByteBufferSlice performance int maxIndex = bb.Length - 4; int sum = 0; Stopwatch sw = Stopwatch.StartNew(); for (int i = 0; i < 1000000; ++i) { sum += bb.GetInt(rnd.Next(maxIndex)); } sw.Stop(); // write sum to avoid loop optimization Console.WriteLine("Regular: {0}, {1} GetInt ops/ms", sum, 1000000.0 / sw.Elapsed.TotalMilliseconds); }

I ran it on my Mac ten times with the regular ByteBuffer and with the ByteBufferSlice and averaged the results. Numbers are GetInt operations per millisecond.

Version Safe Unsafe

original ByteBuffer (current master) 33,876 44,436

ByteBuffer using + _off 33,790 (-0.3%) 43,837 (-1.3%)

ByteBuffer using virtual 33,017 (-2.5%) 44,017 (-1.0%)

ByteBufferSlice using override 32,400 (-4.4%) 42,877 (-3.5%)

This is an extremely dense testing scenario, and should thus represent the worst case performance degradation.

Given these results I'd vote for the slicing behavior implementation from 5c68259, as I think a 0.3% to 1.0% performance decrease would be acceptable.

You say this is on your Mac.. does that mean these timings are from Mono, or the MS implementation?

If this is using the MS JIT, then indeed you have a good point that performance is acceptable for _off. Though part of this may be that the C# ByteBuffer is already quite slow.

I think we can do better though. If you look at how getting the root works in C# generated code:

public static Monster GetRootAsMonster(ByteBuffer _bb, Monster obj) { return (obj.__assign(_bb.GetInt(_bb.Position) + _bb.Position, _bb)); }

This tells me, that to create a root from a size-prefixed buffer, all we need to do is pass it a ByteBuffer whose _pos has been moved forward by 4 bytes. All buffer accesses from there on are absolute, using the bb_pos stored in these table accessor structs as the starting point, so no need to keep adding _off at each access. And likely more robust since we can't forget to add _off somewhere.

Am I missing something?

Hi, the timings are from Mono:

Mono JIT compiler version 5.4.1.7 (2017-06/e66d9abbb27 Wed Oct 25 12:10:41 EDT 2017) Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com

Compiled with mcs:

Mono C# compiler version 5.4.1.0

I'm not sure there is an alternative on Mac? Should I get my hands on a Windows machine and benchmark it there as well?

I was thinking to separate _pos and _off, because _pos can be set externally, and therefore the size prefix could be restored (accidentally, e.g. by Reset()). For the use case where bb_pos is used as offset (when interacting with the ByteBuffer through Monster) you're correct, but since the ByteBuffer may be used rather freely I figured it'd be better to logically separate them.

I have added your suggestion, and these are the results:

Version Safe Unsafe

original ByteBuffer (current master) 33,876 44,436

ByteBuffer using + _off 33,790 (-0.3%) 43,837 (-1.3%)

ByteBuffer using virtual 33,017 (-2.5%) 44,017 (-1.0%)

ByteBufferSlice using override 32,400 (-4.4%) 42,877 (-3.5%)

ByteBuffer using advanced _pos 32,263 (-4.8%) 45,593 (+2.6%)

I don't really know why this performs so poorly, I guess it's simply not the exact same system load today than it was last time. Your approach would certainly save a lot of code.

Supposedly parts of MS's implementation are portable now and run on Linux, and I presume OS X as well, but haven't looked into it. It is entirely possible that Mono is not quite as optimized.

I'd prefer it if we use just a single variable, for efficency (possibly), especially since size-prefixing is such a minor feature compared to the base functionality. To screw this up, someone would have to load a size-prefixed buffer, slice it to obtain a new one, then call Reset on it, and somehow not understand that will get you your size prefixing back? That sounds unlikely to me.

I'm not sure if we're getting the correct results here. You're only calling bb.GetInt which doesn't even touch _pos! It directly uses its argument into _buffer.

The GetInt in the ByteBufferSlice subclass would add _off in the call to base.GetInt, whereas the GetInt in ByteBuffer does not, correct. Hence the regular vs. Slice comparison.

However the current state implements your suggestion of simply passing an advanced ByteBuffer to GetRootAsMonster. For clarity I changed the terminology from 'slice' to 'duplicate', also for the Java version.

aardappel · 2018-03-12T16:00:25Z

Otherwise looks good to me now!

robert-schmidtke · 2018-03-12T16:05:47Z

Should I squash the commits into one, or will you do that upon merging eventually?

aardappel · 2018-03-12T16:07:46Z

No, I'll squash it on merge.
Just waiting for CI. I presume you are "done" with this PR?

robert-schmidtke · 2018-03-12T16:08:57Z

Unless you have any more comments/requested changes, I don't think I have anything to add.

aardappel · 2018-03-12T18:31:01Z

Thanks for sticking with it :)

* initial changes to support size prefixed buffers in Java * add slice equivalent to CSharp ByteBuffer * resolve TODO for slicing in CSharp code generation * add newly generated Java and CSharp test sources * fix typo in comment * add FinishSizePrefixed methods to CSharp FlatBufferBuilder as well * add option to allow writing the prefix as well * generate size-prefixed monster binary as well * extend JavaTest to test the size prefixed binary as well * use constants for size prefix length * fuse common code for getRootAs and getSizePrefixedRootAs * pulled file identifier out of if * add FinishSizePrefixed, GetSizePrefixedRootAs support for Python * Revert "extend JavaTest to test the size prefixed binary as well" This reverts commit 68be442. * Revert "generate size-prefixed monster binary as well" This reverts commit 2939516. * fix ByteBuffer.cs Slice() method; add proper CSharp and Java tests * fix unused parameter * increment version number * pulled out generated methods into separate utility class * pulled out generated methods into separate utility class for Python * fix indentation * remove unnecessary comment * fix newline and copyright * add ByteBufferUtil to csproj compilation * hide ByteBuffer's internal data; track offset into parent's array * test unsafe versions as well; compile and run in debug mode * clarify help text for size prefix * move ByteBuffer slicing behavior to subclass * fix protection levels * add size prefix support for text generation * add ByteBufferSlice to csproj compilation * revert size prefix handling for nested buffers * use duplicate instead of slice for removing size prefix * remove slice subclass and use duplicate for removing size prefix * remove slice specific tests * remove superfluous command line option

robert-schmidtke added 6 commits October 2, 2017 09:29

initial changes to support size prefixed buffers in Java

9b8bb2e

add slice equivalent to CSharp ByteBuffer

058e5a0

resolve TODO for slicing in CSharp code generation

18353a4

add newly generated Java and CSharp test sources

a5e25dc

fix typo in comment

095b9a1

add FinishSizePrefixed methods to CSharp FlatBufferBuilder as well

f481bf3

robert-schmidtke added 3 commits October 2, 2017 17:31

add option to allow writing the prefix as well

081a8a1

generate size-prefixed monster binary as well

2939516

extend JavaTest to test the size prefixed binary as well

68be442

aardappel requested changes Oct 2, 2017

View reviewed changes

robert-schmidtke added 9 commits October 3, 2017 09:48

use constants for size prefix length

ee0d0ba

fuse common code for getRootAs and getSizePrefixedRootAs

e84452f

pulled file identifier out of if

0f1cdf2

Merge remote-tracking branch 'upstream/master' into java-prefixed-size

e270f0c

add FinishSizePrefixed, GetSizePrefixedRootAs support for Python

eaee8b3

Revert "extend JavaTest to test the size prefixed binary as well"

ee7939c

This reverts commit 68be442.

Revert "generate size-prefixed monster binary as well"

da7f5a9

This reverts commit 2939516.

fix ByteBuffer.cs Slice() method; add proper CSharp and Java tests

51c2ef8

fix unused parameter

658b0de

robert-schmidtke added 2 commits October 4, 2017 09:41

Merge branch 'python-prefixed-size'

4a2560a

increment version number

bd67fdf

robert-schmidtke added 2 commits October 8, 2017 15:10

Merge remote-tracking branch 'upstream/master'

916d97a

pulled out generated methods into separate utility class

d7878dd

robert-schmidtke added 3 commits March 6, 2018 13:46

clarify help text for size prefix

e68dadf

move ByteBuffer slicing behavior to subclass

e6b9811

fix protection levels

89eaf73

robert-schmidtke added 5 commits March 6, 2018 13:59

Merge remote-tracking branch 'upstream/master'

d64fe39

Merge branch 'master' into java-prefixed-size

07f7902

add size prefix support for text generation

2904d75

add ByteBufferSlice to csproj compilation

7fbba01

revert size prefix handling for nested buffers

38df1ff

aardappel requested changes Mar 8, 2018

View reviewed changes

robert-schmidtke added 6 commits March 12, 2018 12:08

Merge remote-tracking branch 'upstream/master'

d8ff9ab

Merge branch 'master' into java-prefixed-size

9113f08

use duplicate instead of slice for removing size prefix

08cf4e7

remove slice subclass and use duplicate for removing size prefix

5966dac

remove slice specific tests

8222878

remove superfluous command line option

3aac368

aardappel approved these changes Mar 12, 2018

View reviewed changes

aardappel merged commit 08cf50c into google:master Mar 12, 2018

robert-schmidtke deleted the java-prefixed-size branch March 13, 2018 08:02

iceboy233 mentioned this pull request May 1, 2018

Delete ByteBuffer.exe #4721

Merged

bjornharrtell mentioned this pull request Sep 29, 2018

Support length prefix reading/writing for TypeScript #4956

Closed

bjornharrtell mentioned this pull request May 12, 2019

JS/TS issue RangeError: start offset of Float64Array should be a multiple of 8 #5325

Closed

vglavnyy mentioned this pull request May 6, 2020

Multithreaded reading of Flatbuffer data causing data corruption [c#, flatc v1.10.0 & 1.11.0, Win10] #5745

Closed

		@@ -0,0 +1,10 @@
		// similar schema, except to be parsed with --prefix-size

		include "monster_test.fbs";

Version	Safe	Unsafe
original `ByteBuffer` (current `master`)	33,876	44,436
`ByteBuffer` using `+ _off`	33,790 (-0.3%)	43,837 (-1.3%)
`ByteBuffer` using `virtual`	33,017 (-2.5%)	44,017 (-1.0%)
`ByteBufferSlice` using `override`	32,400 (-4.4%)	42,877 (-3.5%)

Java/C#/Python prefixed size support #4445

Java/C#/Python prefixed size support #4445

Conversation

robert-schmidtke commented Oct 2, 2017 • edited Loading

googlebot commented Oct 2, 2017

robert-schmidtke commented Oct 2, 2017

googlebot commented Oct 2, 2017

robert-schmidtke commented Oct 2, 2017

aardappel commented Oct 2, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robert-schmidtke commented Oct 3, 2017

aardappel commented Oct 5, 2017

robert-schmidtke commented Mar 6, 2018

robert-schmidtke commented Mar 6, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robert-schmidtke Mar 9, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aardappel commented Mar 12, 2018

robert-schmidtke commented Mar 12, 2018

aardappel commented Mar 12, 2018

robert-schmidtke commented Mar 12, 2018

aardappel commented Mar 12, 2018

robert-schmidtke commented Oct 2, 2017 •

edited

Loading

robert-schmidtke Mar 9, 2018 •

edited

Loading