Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go]streaming / verify flatbuffers in network transfer #5793

Closed
tsingson opened this issue Mar 9, 2020 · 4 comments
Closed

[Go]streaming / verify flatbuffers in network transfer #5793

tsingson opened this issue Mar 9, 2020 · 4 comments

Comments

@tsingson
Copy link
Contributor

tsingson commented Mar 9, 2020

0. movation

i trying implementing a instance message service via tcp client / server. The Message serialized by flatbuffers, so, read streaming flatbuffers Message is must.

learning from

#3898

#5775

#4604

#3905

i will implementing verify in Go, here is some knowdge:

1. struct of serialized flatbuffers

assume serialized flatbufers ( Message ) as two part: Header ( metadata ) / Payload ( data ....)

1.1 Message Header

  1. 4 byte of root header ( , get offset of payload start

  2. zero or more then one byte of prefilled ZERO

  3. vtable struct , as below reference from builder.go, the Payload size ( data size ) include in vtable


// A vtable has the following format:

//   <VOffsetT: size of the vtable in bytes, including this value>

//   <VOffsetT: size of the object in bytes, including the vtable offset>

//   <VOffsetT: offset for a field> * N, where N is the number of fields in

//	        the schema for this type. Includes deprecated fields.

// Thus, a vtable is made of 2 + N elements, each SizeVOffsetT bytes wide.

//

// An object has the following format:

//   <SOffsetT: offset to this object's vtable (may be negative)>

//   <byte: data>+

1.2 Message payload part struct

TBD....

2. read streaming serialized flatbuffers

  1. read Message (flatbuf serialized data) from network, 4 byte by 4 byte

  2. read and parse header. check is a valid flatbufers or not

  3. get the data size of message from header, and get the rest data

  4. un-serialize the Message when know the IDL

  5. ( option ) Experimental reflaction to guess ths struct of IDL

3. a bug in builder.go , in func (b *Builder) WriteVtable()

As my assume, serialized flatbufers ( Message ) as two part: Header ( metadata ) / Payload ( data ....) .

<VOffsetT: size of the vtable in bytes, including this value> -------------- call it as vTableSize
<VOffsetT: size of the object in bytes, including the vtable offset> -------- this " size of the object" should be size of Message Payload ( call it as payloadSize )

so,

whole size of serialized flatbuffers = HeaderSize ( include vTableSize ) + payloadSize

in builder.go

		// The two metadata fields are written last.

		// First, store the object bytesize:
		objectSize := objectOffset // - b.objectEnd   ---------------------- bug
		b.PrependVOffsetT(VOffsetT(objectSize))

		// Second, store the vtable bytesize:
		vBytes := (len(b.vtable) + VtableMetadataFields) * SizeVOffsetT
		b.PrependVOffsetT(VOffsetT(vBytes))

		// Next, write the offset to the new vtable in the
		// already-allocated SOffsetT at the beginning of this object:
		objectStart := SOffsetT(len(b.Bytes)) - SOffsetT(objectOffset)
		WriteSOffsetT(b.Bytes[objectStart:],
			SOffsetT(b.Offset())-SOffsetT(objectOffset))

		// Finally, store this vtable in memory for future
		// deduplication:
		b.vtables = append(b.vtables, b.Offset())

in builder.go , line 163:

      		objectSize := objectOffset - b.objectEnd  

testing IDL like this:

table String {
    Field:string;
}

filled the Field with "1234567", serialized data should be

[1100 0 0 0 0 0 110 0 10100 0 100 0 110 0 0 0 100 0 0 0 111 0 0 0 110001 110010 110011 110100 110101 110110 110111 0]

--> Object size in 9th byte shuld be 20 --- [ 101000 ] -----> 4 byte SOffset / 4 byte vector construction / 4 byte is real size of string ( "1234567" is 7 ) / 7 byte "1234567" + 1 byte ended string with "zero" byte , no any pre-filled "zero" byte in end of string

total serialized data is 32 byte:

  • 4 byte is UOffset.
  • 2 byte pre-filled "zero" byte
  • 6 byte vTable, include 2 byte VOffset for vTable size, 2 byte VOffset for data object size, 2 byte VOffset for one slot ( in this example , only one field in table )
  • 20 byte for data , include 4 byte SOffset, 16 byte for string vector construction

but , we got:

[1100 0 0 0 0 0 110 0 1000 0 100 0 110 0 0 0 100 0 0 0 111 0 0 0 110001 110010 110011 110100 110101 110110 110111 0]

--> Object size in 9th byte is 8 --- [ 1000]

in builder.go , line 163, change to :

      		objectSize := objectOffset  

will got correct


correct object size in vTable , will help to read streaming flatbuffers .

PR is #5794

any suggestion r welcome!

@aardappel
Copy link
Collaborator

For FlatBuffers format information, please follow these documents rather than the notes you made above:
https://google.github.io/flatbuffers/flatbuffers_internals.html
https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md#flatbuffers-binary-format

If you're going to write a verifier, please follow the C++ code closely, it is the authorative source on how this should be done:

// Helper class to verify the integrity of a FlatBuffer
class Verifier FLATBUFFERS_FINAL_CLASS {
public:
Verifier(const uint8_t *buf, size_t buf_len, uoffset_t _max_depth = 64,
uoffset_t _max_tables = 1000000, bool _check_alignment = true)
: buf_(buf),
size_(buf_len),
depth_(0),
max_depth_(_max_depth),
num_tables_(0),
max_tables_(_max_tables),
upper_bound_(0),
check_alignment_(_check_alignment) {
FLATBUFFERS_ASSERT(size_ < FLATBUFFERS_MAX_BUFFER_SIZE);
}
// Central location where any verification failures register.
bool Check(bool ok) const {
// clang-format off
#ifdef FLATBUFFERS_DEBUG_VERIFICATION_FAILURE
FLATBUFFERS_ASSERT(ok);
#endif
#ifdef FLATBUFFERS_TRACK_VERIFIER_BUFFER_SIZE
if (!ok)
upper_bound_ = 0;
#endif
// clang-format on
return ok;
}
// Verify any range within the buffer.
bool Verify(size_t elem, size_t elem_len) const {
// clang-format off
#ifdef FLATBUFFERS_TRACK_VERIFIER_BUFFER_SIZE
auto upper_bound = elem + elem_len;
if (upper_bound_ < upper_bound)
upper_bound_ = upper_bound;
#endif
// clang-format on
return Check(elem_len < size_ && elem <= size_ - elem_len);
}
template<typename T> bool VerifyAlignment(size_t elem) const {
return Check((elem & (sizeof(T) - 1)) == 0 || !check_alignment_);
}
// Verify a range indicated by sizeof(T).
template<typename T> bool Verify(size_t elem) const {
return VerifyAlignment<T>(elem) && Verify(elem, sizeof(T));
}
bool VerifyFromPointer(const uint8_t *p, size_t len) {
auto o = static_cast<size_t>(p - buf_);
return Verify(o, len);
}
// Verify relative to a known-good base pointer.
bool Verify(const uint8_t *base, voffset_t elem_off, size_t elem_len) const {
return Verify(static_cast<size_t>(base - buf_) + elem_off, elem_len);
}
template<typename T>
bool Verify(const uint8_t *base, voffset_t elem_off) const {
return Verify(static_cast<size_t>(base - buf_) + elem_off, sizeof(T));
}
// Verify a pointer (may be NULL) of a table type.
template<typename T> bool VerifyTable(const T *table) {
return !table || table->Verify(*this);
}
// Verify a pointer (may be NULL) of any vector type.
template<typename T> bool VerifyVector(const Vector<T> *vec) const {
return !vec || VerifyVectorOrString(reinterpret_cast<const uint8_t *>(vec),
sizeof(T));
}
// Verify a pointer (may be NULL) of a vector to struct.
template<typename T> bool VerifyVector(const Vector<const T *> *vec) const {
return VerifyVector(reinterpret_cast<const Vector<T> *>(vec));
}
// Verify a pointer (may be NULL) to string.
bool VerifyString(const String *str) const {
size_t end;
return !str || (VerifyVectorOrString(reinterpret_cast<const uint8_t *>(str),
1, &end) &&
Verify(end, 1) && // Must have terminator
Check(buf_[end] == '\0')); // Terminating byte must be 0.
}
// Common code between vectors and strings.
bool VerifyVectorOrString(const uint8_t *vec, size_t elem_size,
size_t *end = nullptr) const {
auto veco = static_cast<size_t>(vec - buf_);
// Check we can read the size field.
if (!Verify<uoffset_t>(veco)) return false;
// Check the whole array. If this is a string, the byte past the array
// must be 0.
auto size = ReadScalar<uoffset_t>(vec);
auto max_elems = FLATBUFFERS_MAX_BUFFER_SIZE / elem_size;
if (!Check(size < max_elems))
return false; // Protect against byte_size overflowing.
auto byte_size = sizeof(size) + elem_size * size;
if (end) *end = veco + byte_size;
return Verify(veco, byte_size);
}
// Special case for string contents, after the above has been called.
bool VerifyVectorOfStrings(const Vector<Offset<String>> *vec) const {
if (vec) {
for (uoffset_t i = 0; i < vec->size(); i++) {
if (!VerifyString(vec->Get(i))) return false;
}
}
return true;
}
// Special case for table contents, after the above has been called.
template<typename T> bool VerifyVectorOfTables(const Vector<Offset<T>> *vec) {
if (vec) {
for (uoffset_t i = 0; i < vec->size(); i++) {
if (!vec->Get(i)->Verify(*this)) return false;
}
}
return true;
}
__supress_ubsan__("unsigned-integer-overflow") bool VerifyTableStart(
const uint8_t *table) {
// Check the vtable offset.
auto tableo = static_cast<size_t>(table - buf_);
if (!Verify<soffset_t>(tableo)) return false;
// This offset may be signed, but doing the subtraction unsigned always
// gives the result we want.
auto vtableo = tableo - static_cast<size_t>(ReadScalar<soffset_t>(table));
// Check the vtable size field, then check vtable fits in its entirety.
return VerifyComplexity() && Verify<voffset_t>(vtableo) &&
VerifyAlignment<voffset_t>(ReadScalar<voffset_t>(buf_ + vtableo)) &&
Verify(vtableo, ReadScalar<voffset_t>(buf_ + vtableo));
}
template<typename T>
bool VerifyBufferFromStart(const char *identifier, size_t start) {
if (identifier && (size_ < 2 * sizeof(flatbuffers::uoffset_t) ||
!BufferHasIdentifier(buf_ + start, identifier))) {
return false;
}
// Call T::Verify, which must be in the generated code for this type.
auto o = VerifyOffset(start);
return o && reinterpret_cast<const T *>(buf_ + start + o)->Verify(*this)
// clang-format off
#ifdef FLATBUFFERS_TRACK_VERIFIER_BUFFER_SIZE
&& GetComputedSize()
#endif
;
// clang-format on
}
// Verify this whole buffer, starting with root type T.
template<typename T> bool VerifyBuffer() { return VerifyBuffer<T>(nullptr); }
template<typename T> bool VerifyBuffer(const char *identifier) {
return VerifyBufferFromStart<T>(identifier, 0);
}
template<typename T> bool VerifySizePrefixedBuffer(const char *identifier) {
return Verify<uoffset_t>(0U) &&
ReadScalar<uoffset_t>(buf_) == size_ - sizeof(uoffset_t) &&
VerifyBufferFromStart<T>(identifier, sizeof(uoffset_t));
}
uoffset_t VerifyOffset(size_t start) const {
if (!Verify<uoffset_t>(start)) return 0;
auto o = ReadScalar<uoffset_t>(buf_ + start);
// May not point to itself.
if (!Check(o != 0)) return 0;
// Can't wrap around / buffers are max 2GB.
if (!Check(static_cast<soffset_t>(o) >= 0)) return 0;
// Must be inside the buffer to create a pointer from it (pointer outside
// buffer is UB).
if (!Verify(start + o, 1)) return 0;
return o;
}
uoffset_t VerifyOffset(const uint8_t *base, voffset_t start) const {
return VerifyOffset(static_cast<size_t>(base - buf_) + start);
}
// Called at the start of a table to increase counters measuring data
// structure depth and amount, and possibly bails out with false if
// limits set by the constructor have been hit. Needs to be balanced
// with EndTable().
bool VerifyComplexity() {
depth_++;
num_tables_++;
return Check(depth_ <= max_depth_ && num_tables_ <= max_tables_);
}
// Called at the end of a table to pop the depth count.
bool EndTable() {
depth_--;
return true;
}
// Returns the message size in bytes
size_t GetComputedSize() const {
// clang-format off
#ifdef FLATBUFFERS_TRACK_VERIFIER_BUFFER_SIZE
uintptr_t size = upper_bound_;
// Align the size to uoffset_t
size = (size - 1 + sizeof(uoffset_t)) & ~(sizeof(uoffset_t) - 1);
return (size > size_) ? 0 : size;
#else
// Must turn on FLATBUFFERS_TRACK_VERIFIER_BUFFER_SIZE for this to work.
(void)upper_bound_;
FLATBUFFERS_ASSERT(false);
return 0;
#endif
// clang-format on
}
private:
const uint8_t *buf_;
size_t size_;
uoffset_t depth_;
uoffset_t max_depth_;
uoffset_t num_tables_;
uoffset_t max_tables_;
mutable size_t upper_bound_;
bool check_alignment_;
};
// Convenient way to bundle a buffer and its length, to pass it around
// typed by its root.
// A BufferRef does not own its buffer.
struct BufferRefBase {}; // for std::is_base_of
template<typename T> struct BufferRef : BufferRefBase {
BufferRef() : buf(nullptr), len(0), must_free(false) {}
BufferRef(uint8_t *_buf, uoffset_t _len)
: buf(_buf), len(_len), must_free(false) {}
~BufferRef() {
if (must_free) free(buf);
}
const T *GetRoot() const { return flatbuffers::GetRoot<T>(buf); }
bool Verify() {
Verifier verifier(buf, len);
return verifier.VerifyBuffer<T>(nullptr);
}
uint8_t *buf;
uoffset_t len;
bool must_free;
};
// "structs" are flat structures that do not have an offset table, thus
// always have all members present and do not support forwards/backwards
// compatible extensions.
class Struct FLATBUFFERS_FINAL_CLASS {
public:
template<typename T> T GetField(uoffset_t o) const {
return ReadScalar<T>(&data_[o]);
}
template<typename T> T GetStruct(uoffset_t o) const {
return reinterpret_cast<T>(&data_[o]);
}
const uint8_t *GetAddressOf(uoffset_t o) const { return &data_[o]; }
uint8_t *GetAddressOf(uoffset_t o) { return &data_[o]; }
private:
// private constructor & copy constructor: you obtain instances of this
// class by pointing to existing data only
Struct();
Struct(const Struct &);
Struct &operator=(const Struct &);
uint8_t data_[1];
};
// "tables" use an offset table (possibly shared) that allows fields to be
// omitted and added at will, but uses an extra indirection to read.
class Table {
public:
const uint8_t *GetVTable() const {
return data_ - ReadScalar<soffset_t>(data_);
}
// This gets the field offset for any of the functions below it, or 0
// if the field was not present.
voffset_t GetOptionalFieldOffset(voffset_t field) const {
// The vtable offset is always at the start.
auto vtable = GetVTable();
// The first element is the size of the vtable (fields + type id + itself).
auto vtsize = ReadScalar<voffset_t>(vtable);
// If the field we're accessing is outside the vtable, we're reading older
// data, so it's the same as if the offset was 0 (not present).
return field < vtsize ? ReadScalar<voffset_t>(vtable + field) : 0;
}
template<typename T> T GetField(voffset_t field, T defaultval) const {
auto field_offset = GetOptionalFieldOffset(field);
return field_offset ? ReadScalar<T>(data_ + field_offset) : defaultval;
}
template<typename P> P GetPointer(voffset_t field) {
auto field_offset = GetOptionalFieldOffset(field);
auto p = data_ + field_offset;
return field_offset ? reinterpret_cast<P>(p + ReadScalar<uoffset_t>(p))
: nullptr;
}
template<typename P> P GetPointer(voffset_t field) const {
return const_cast<Table *>(this)->GetPointer<P>(field);
}
template<typename P> P GetStruct(voffset_t field) const {
auto field_offset = GetOptionalFieldOffset(field);
auto p = const_cast<uint8_t *>(data_ + field_offset);
return field_offset ? reinterpret_cast<P>(p) : nullptr;
}
template<typename T> bool SetField(voffset_t field, T val, T def) {
auto field_offset = GetOptionalFieldOffset(field);
if (!field_offset) return IsTheSameAs(val, def);
WriteScalar(data_ + field_offset, val);
return true;
}
bool SetPointer(voffset_t field, const uint8_t *val) {
auto field_offset = GetOptionalFieldOffset(field);
if (!field_offset) return false;
WriteScalar(data_ + field_offset,
static_cast<uoffset_t>(val - (data_ + field_offset)));
return true;
}
uint8_t *GetAddressOf(voffset_t field) {
auto field_offset = GetOptionalFieldOffset(field);
return field_offset ? data_ + field_offset : nullptr;
}
const uint8_t *GetAddressOf(voffset_t field) const {
return const_cast<Table *>(this)->GetAddressOf(field);
}
bool CheckField(voffset_t field) const {
return GetOptionalFieldOffset(field) != 0;
}
// Verify the vtable of this table.
// Call this once per table, followed by VerifyField once per field.
bool VerifyTableStart(Verifier &verifier) const {
return verifier.VerifyTableStart(data_);
}
// Verify a particular field.
template<typename T>
bool VerifyField(const Verifier &verifier, voffset_t field) const {
// Calling GetOptionalFieldOffset should be safe now thanks to
// VerifyTable().
auto field_offset = GetOptionalFieldOffset(field);
// Check the actual field.
return !field_offset || verifier.Verify<T>(data_, field_offset);
}
// VerifyField for required fields.
template<typename T>
bool VerifyFieldRequired(const Verifier &verifier, voffset_t field) const {
auto field_offset = GetOptionalFieldOffset(field);
return verifier.Check(field_offset != 0) &&
verifier.Verify<T>(data_, field_offset);
}
// Versions for offsets.
bool VerifyOffset(const Verifier &verifier, voffset_t field) const {
auto field_offset = GetOptionalFieldOffset(field);
return !field_offset || verifier.VerifyOffset(data_, field_offset);
}
bool VerifyOffsetRequired(const Verifier &verifier, voffset_t field) const {
auto field_offset = GetOptionalFieldOffset(field);
return verifier.Check(field_offset != 0) &&
verifier.VerifyOffset(data_, field_offset);
}
private:
// private constructor & copy constructor: you obtain instances of this
// class by pointing to existing data only
Table();
Table(const Table &other);
Table &operator=(const Table &);
uint8_t data_[1];
};

@tsingson
Copy link
Contributor Author

tsingson commented Mar 9, 2020

thanks for comments. should be learning indeep and try again.

@tsingson

This comment has been minimized.

@brlala
Copy link

brlala commented Oct 15, 2022

Would be nice to start kicking off this feature for golang again :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants