
[Rust] Feature request: do less work when deduplicating vtables #6511

Closed
rw opened this issue Mar 11, 2021 · 10 comments

rw (Collaborator) commented Mar 11, 2021

When writing many tables (say, millions), the search for vtable duplicates could potentially dominate serialization time:

https://github.com/google/flatbuffers/blob/master/rust/flatbuffers/src/builder.rs#L584

After seeing the benchmark results here, and the comments about FlatBuffers, I wonder if our N^2 algorithm is hurting our serialization speed.
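
For illustration, a minimal sketch of what the linear scan amounts to (hypothetical helper and names, not the actual builder.rs code): each freshly written vtable is compared against every previously written one, so with V distinct vtable writes the total work grows roughly quadratically.

// Hypothetical sketch of a linear vtable-deduplication scan; names are
// illustrative and this is not the actual builder.rs implementation.
fn find_duplicate_vtable(written_vtables: &[Vec<u8>], candidate: &[u8]) -> Option<usize> {
  // Byte-for-byte comparison against every vtable written so far.
  written_vtables
    .iter()
    .position(|existing| existing.as_slice() == candidate)
}

fn main() {
  let written = vec![vec![4u8, 0, 8, 0], vec![6u8, 0, 12, 0, 4, 0]];
  let candidate = [6u8, 0, 12, 0, 4, 0];
  assert_eq!(find_duplicate_vtable(&written, &candidate), Some(1));
}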

@CasperN @aardappel

aardappel (Collaborator) commented:

That library is Rust-only (and thus uses Rust as its schema), which is an entirely different class of serializer. It is pretty easy to make hyper-efficient serialization when you directly use the language's classes as your schema, and it will likely have a pretty bad forwards/backwards-compatibility story.

Oh you asked about vtable deduplication.

N^2? It is pretty much impossible to get a large N in the comparison of two vtables, especially since having a lot of vtables also means they must differ by a lot. Most languages compare the lengths first; I'm not sure whether the Rust implementation does.

The only N that can get big is the number of vtables. That is a pretty extreme power-law thing, in that for most users, linear search is indeed the fastest, especially since a non-linear algorithm would require extra storage for a tree.

You could allocate a tree whenever N goes over, say, 10, as a backup for the outliers. Or start sorting your existing vector of vtables if you really don't want to allocate more.

If it could be proven that the comparison of two vtables actually costs significant time (I doubt it), you could add a hash value just for quicker rejection.
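
A hedged sketch of that last idea (illustrative names, not FlatBuffers API): keep a cheap hash alongside each written vtable and only fall back to the byte comparison when both the length and the hash match.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical bookkeeping entry; not part of the FlatBuffers builder.
struct WrittenVtable {
  hash: u64,
  bytes: Vec<u8>,
}

fn cheap_hash(bytes: &[u8]) -> u64 {
  let mut hasher = DefaultHasher::new();
  bytes.hash(&mut hasher);
  hasher.finish()
}

fn find_duplicate(written: &[WrittenVtable], candidate: &[u8]) -> Option<usize> {
  let candidate_hash = cheap_hash(candidate);
  written.iter().position(|w| {
    // Length and hash checks reject most mismatches before the full byte compare.
    w.bytes.len() == candidate.len()
      && w.hash == candidate_hash
      && w.bytes == candidate
  })
}

The extra storage is one u64 per written vtable, which is the kind of trade-off weighed above.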

CasperN (Collaborator) commented Mar 12, 2021

The article says we do worst on the Minecraft saved data, and looking at the code, there are a few allocations in the benchmark to hold WIPOffsets, since they couldn't create vectors of strings or tables at the same time.

https://github.com/djkoloski/rust_serialization_benchmark/blob/master/src/datasets/minecraft_savedata/mod.rs

If that's the problem (I haven't profiled it), we could say it's "user error" that they didn't scrub allocations and carry around their own warmed-up vectors... but, IMO, this is a signal that we should improve ergonomics and performance by carrying around their WIP data, like we do in FlexBuffers.

aardappel (Collaborator) commented:

Yes, taking care of WiP data would make for a much friendlier API, but it does come at a cost, since the entirety of a table or vector then needs to be stored in a WiP vector before it can be serialized, with type tags and all. As such, I don't think it's appropriate for a language that is about performance, like Rust. I also don't see a way to easily make this an optional layer, but you all may have ideas. I guess it could be made optional through codegen.

CasperN (Collaborator) commented Mar 15, 2021

> Yes, taking care of WiP data would make for a much friendlier API, but it does come at a cost, since the entirety of a table or vector then needs to be stored in a WiP vector before it can be serialized, with type tags

We don't have to store type tags or all the contents of a table like Flexbuffers. Maybe that was a bad comparison on my part. We just need to have scratch space where users can put vectors of WIPOffsets/structs/scalars before serializing. Users can use this API to save on allocations without having to carry around extra vectors on their own.

Essentially we're just targeting every instance of

let mut v = Vec::new();  // Allocation! BAD
for x in xs {
  let wip_x = do_stuff(&mut fbb, x);
  v.push(wip_x);
}
let wip_xs = fbb.create_vector(&v);

in the above benchmark and replacing it with

let mut v = fbb.build_vector();  // Reusing scratch space => fewer allocations
for x in xs {
  let wip_x = do_stuff(v.fbb(), x);
  v.push(wip_x);
}
let wip_xs = v.finish();

Something like the following could work.

struct VectorBuilder<'f, T: Push + Copy> {
  // Use scratch space in fbb to save on allocations.
  fbb: &'f mut FlatBufferBuilder,
  start: usize,  // Starting index in the scratch space.
  len: usize,
  phantom_data: PhantomData<T>,
}

impl FlatBufferBuilder {
  fn build_vector<'f, T: Push + Copy>(&'f mut self) -> VectorBuilder<'f, T>;
}

impl<'f, T: Push + Copy> VectorBuilder<'f, T> {
  // Pushes into the scratch space.
  fn push(&mut self, item: T);
  // Borrows the FlatBufferBuilder so you can make tables, strings, and such.
  // Note that you can start a nested vector by calling fbb.build_vector().fbb().build_vector(),
  // but due to borrowing rules you can only use the "deepest" VectorBuilder.
  fn fbb(&mut self) -> &mut FlatBufferBuilder;
  // Serializes the vector to `fbb`, returns the WIPOffset, and clears the scratch space it used.
  fn finish(self) -> WIPOffset<Vector<'f, T>>;
}

Such scratch space would be nice for #5024 too.

aardappel (Collaborator) commented:

Yes, that's essentially what I meant. For vectors this looks pretty easy, but for tables you'd need to store a type tag for each field in that scratch space. That does entail a copy, and a fair bit of code to be run that's not fast (with a switch over the type tag).

The C++ builder uses scratch space at the bottom of the allocated builder buffer.

Note that while we want to avoid Vec::new(), the bulk of tables and vectors being created already don't need it, so they would only be made slower by an extra copy.
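
To make that cost concrete, here is a hedged sketch (hypothetical types, not an existing FlatBuffers API) of per-field type tags in table scratch space, with the switch over the tag happening when the table is finally serialized:

// Hypothetical scratch entry for a work-in-progress table field; the type tag
// is what forces a branch per field at serialization time.
enum WipField {
  Byte(u16, u8),     // (field id, value)
  Int(u16, u32),
  Long(u16, u64),
  Offset(u16, u32),  // (field id, offset to already-serialized data)
}

// The "switch over the type tag": every field goes through a match before
// its bytes can be written out. Alignment and vtable details are omitted.
fn serialize_wip_fields(fields: &[WipField], out: &mut Vec<u8>) {
  for field in fields {
    match field {
      WipField::Byte(_id, v) => out.push(*v),
      WipField::Int(_id, v) => out.extend_from_slice(&v.to_le_bytes()),
      WipField::Long(_id, v) => out.extend_from_slice(&v.to_le_bytes()),
      WipField::Offset(_id, o) => out.extend_from_slice(&o.to_le_bytes()),
    }
  }
}

Filling the WipField entries in the first place is the extra copy mentioned above.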

aardappel (Collaborator) commented:

Essentially you have four possible scenarios:

  1. Data to be serialized already sits in a vector.
  2. Data comes from elsewhere but is serialized all inline (scalars/structs).
  3. Data comes from elsewhere and contains non-inline elements, reusable WiP vector is present.
  4. Data comes from elsewhere and contains non-inline elements, no reusable vector.

You'd want to use scratch space really only for situation 4. For the others, using scratch space is likely slower.

CasperN (Collaborator) commented Mar 16, 2021

Great, it seems we're in general agreement on the scratch-space idea. Using the bottom of the allocated builder buffer seems like extra bookkeeping with not much advantage to me, but sure.

I'm not sure what you mean by 1. Do you mean moving the data in the vector into the buffer, then overwriting where the data used to be with a WIPOffset (assuming the data is larger than a WIPOffset)? Unfortunately, that's not a thing in Rust.

Good point on 2. No need to use scratch for inline types (unless length is unknown).

Situation 3 seems toilsome to users and therefore rarer than 4. Evidently, a benchmarker failed to reuse the WIP vector.

> ...for tables you'd need to store a type tag for each field in that scratch space. That does entail a copy, and a fair bit of code to be run that's not fast (with a switch over the type tag).

Not that we should implement nested table builders, but wouldn't you only need to cache the field id, offset, and size? I don't see why a switch is necessary.
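
A hedged sketch of that alternative (illustrative names, not an existing API): cache only the field id, the offset into the scratch buffer, and the size, and serialize by copying raw bytes, with no per-type switch.

// Hypothetical cached-field record: no type tag, just enough to locate the
// field's raw bytes in the scratch buffer.
struct CachedField {
  field_id: u16,
  scratch_offset: usize, // where the field's bytes start in the scratch buffer
  size: usize,           // how many bytes to copy out
}

// Serialization is an untyped copy per field, with no match over a type tag.
// Vtable construction from field_id is omitted here.
fn serialize_cached_fields(scratch: &[u8], fields: &[CachedField], out: &mut Vec<u8>) {
  for f in fields {
    let bytes = &scratch[f.scratch_offset..f.scratch_offset + f.size];
    out.extend_from_slice(bytes);
  }
}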

aardappel (Collaborator) commented:

With 1 I was mostly referring to inline types, though there may be rare cases where it refers to WiP offsets as well.

Situation 3 is common in C++ at least; C++ programmers are used to doing this even outside of FlatBuffers. It is inconvenient, but again, the entirety of FlatBuffers has been designed so that when convenience and speed clash, speed wins. And repeatedly serializing the same kind of vector of a non-inline type is not so common that your code will be littered with it.

The switch would be necessary to serialize the final version of the table, given that the fields are of different sizes etc.?

TethysSvensson (Contributor) commented:

For historical context, see also #5580, which tried to solve the same problem.

github-actions bot commented Nov 8, 2021

This issue is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.
