-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "vector of union" feature to FlatBuffers C++ #4143
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome PR.. some minor changes needed.
@@ -356,10 +356,25 @@ struct IDLOptions { | |||
bool allow_non_utf8; | |||
|
|||
// Possible options for the more general generator below. | |||
enum Language { kJava, kCSharp, kGo, kMAX }; | |||
enum Language { | |||
kJava = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 << 0
:)
@@ -510,6 +511,14 @@ inline bool VerifyEquipment(flatbuffers::Verifier &verifier, const void *obj, Eq | |||
} | |||
} | |||
|
|||
inline bool VerifyEquipmentVector(flatbuffers::Verifier &verifier, const flatbuffers::Vector<flatbuffers::Offset<void>> *values, const flatbuffers::Vector<uint8_t> *types) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cannot be adding vectors of unions to the samples, since this sample is used by all languages. It has to be a separate schema and test until support is ported to all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually the sample schema is not changed. the code here generated is due to the change of the GenUnionPost, which is invoked on the existing union.
code_ += "inline " + UnionVectorVerifySignature(enum_def) + " {"; | ||
code_ += " assert(values->size() == types->size());"; | ||
code_ += " for (flatbuffers::uoffset_t i = 0; i < values->size(); ++i) {"; | ||
code_ += " if (!Verify" + enum_def.name + "(verifier, values->Get(i), types->GetEnum<" + enum_def.name + ">(i))) return false;"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure lines fit in 80 columns (google c++ style guide)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -510,6 +511,14 @@ inline bool VerifyEquipment(flatbuffers::Verifier &verifier, const void *obj, Eq | |||
} | |||
} | |||
|
|||
inline bool VerifyEquipmentVector(flatbuffers::Verifier &verifier, const flatbuffers::Vector<flatbuffers::Offset<void>> *values, const flatbuffers::Vector<uint8_t> *types) { | |||
assert(values->size() == types->size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the sizes of these arrays are created by whoever builds these arrays, returning false is better than an assert. It be nice to add an assert in the builder code, but I am not sure where.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd love to return a error code / message so it's easier to debug. for now i'd just return false as you suggested. any plan/demand to add such code/msg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The task of the verifier is to find out if a buffer is safe or corrupt, and do so quickly, and by, default, without asserts. If a buffer is corrupt, that could have all sorts of reasons, including bad transmissions etc, so errors are not necessarily that useful.
But yes, rather than just return, you should wrap the condition in Verifier::Check
, which allows anyone to debug a verifier issue.
return &language_parameters[idx]; | ||
} | ||
} | ||
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert.. this function should never be called with a language that is not in the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could be: if (lang == java) return .. else { assert(lang == C#); return .. }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -613,6 +605,17 @@ CheckedError Parser::ParseField(StructDef &struct_def) { | |||
type.enum_def->underlying_type, &typefield)); | |||
} | |||
|
|||
if (type.base_type == BASE_TYPE_VECTOR && type.element == BASE_TYPE_UNION) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -613,6 +605,17 @@ CheckedError Parser::ParseField(StructDef &struct_def) { | |||
type.enum_def->underlying_type, &typefield)); | |||
} | |||
|
|||
if (type.base_type == BASE_TYPE_VECTOR && type.element == BASE_TYPE_UNION) { | |||
// Only cpp supports the union vector feature so far. | |||
assert(opts.lang_to_generate == IDLOptions::kCpp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has to be an error instead of assert, since it comes from flags supplied by user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. make sense.
TEST_EQ(flatbuffers::LoadFile( | ||
"tests/union_vector/union_vector.fbs", false, &schemafile), true); | ||
|
||
// parse schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that you're parsing this schema, but it is not used in the code below. for this to be useful, you should also load a json file with such a vector in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. Added a TODO.
const flatbuffers::Offset<Movie> movie_offset = | ||
CreateMovie(fbb, fbb.CreateVector(types), fbb.CreateVector(characters)); | ||
FinishMovieBuffer(fbb, movie_offset); | ||
uint8_t *buf = fbb.GetBufferPointer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use auto whereever possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done for flatbuffers::Offset types. I personally find buf concrete type to be pretty useful.
FinishMovieBuffer(fbb, movie_offset); | ||
uint8_t *buf = fbb.GetBufferPointer(); | ||
|
||
// verify. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave out comments that don't add to what the code already says.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@mikkelfj you may be interested in this PR, since we want to add support for other languages eventually. |
@bei, oh, and besides the comments above, some minimal additions to the docs would be good. This can be very minimal (in Schemas.md) since it is only supported by C++, and should be noted as such. Also, JSON parsing/generating would be awesome in this PR or a followup. |
Haven't read through it all, but to the original point at top
Using two vectors is likely very efficient. First, the type field is a byte so it packs better. In the cpu it will fit comfortably in data cache and you avoid 3 wasted bytes of padding both in cpu cache and in buffer format. Second, you are possibly only interested in specific types in the union. If so, you need to access much less memory and can find the offset to the actual object with a simple array lookup. Furthermore this lookup is more likely to be in lookahead cache because it is more densely packed. The initial lookup could be more costly to feed the cache with both type and vector, but if it is a small vector, both are likely cached. If it is a large vector, then having the separate is all the more important. |
@mikkelfj : glad you agree! but yes, besides efficiency, a big motivation was that a pair of type/union would require an entirely new kind of type in FlatBuffers, whereas this doesn't. |
It is probably too late now. But one thing worth considering it requiring the layout to be a type vector padded and immediately followed by the table vector. This will enable an implementation to reference the union vector with just one pointer which is highly practical while still allowing the union vector to appear as two normal vectors to other code. A verifier would enforce this. Otherwise the C implementation needs to define a struct to carry a dereferenced union vector. |
Sorting, searching, key attribute if all tables have a key field of same type (and possibly same name)? |
type.element == BASE_TYPE_UNION) { | ||
// Only cpp supports the union vector feature so far. | ||
if (opts.lang_to_generate != IDLOptions::kCpp) { | ||
Error("The union vector feature is C++ only."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Vectors of unions are not yet supported in all the specified programming languages"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@mikkelfj : I see your point, but such a special encoding requires significant new support code across languages. The beauty of the current implementation is that it is both efficient and a very minor extension of the code that is already there. |
6c0729d
to
85a8afa
Compare
closed the PR by accident. Trying to squash the commits by using " |
Restored the PR at #4148 |
Ugh. It's too bad we can't reopen this PR. We'll submit the new PR though (linked above), which is the same change. |
Currently to have a vector of union in FlatBuffers, unions have to be encapsulated by tables. However, tables have certain overhead, e.g., it has to keep vtable per table instance, and store offsets instead of actual values for many value types. vtable itself has a few bytes overhead for store the vtable metadata, and the actual indices.
In order to minimize the overhead, improve the coding throughput and reduce the wire format size, it would be nice to support the "vector of union" feature, without paying the one table per union cost.
wvo@ pointed out that being non-invasive to the existing coding format is one of the key issues for implementing this feature.
We agreed on making the vector of union into 2 parallel fields: a vector of types and a vector of values. This will not break any existing encoding and has efficient storage. The downside is, iterating the vector has more work, as there are two underlying vectors to read from instead of just one.
This CL only adds the feature to CPP. There will be a followup CL that adds JSON support. Supports of other programming languages will be added later.