
[Rust] Feature request: do less work when deduplicating vtables #6511

Closed
rw opened this issue Mar 11, 2021 · 10 comments

rw (Collaborator) commented Mar 11, 2021

When writing many tables (say, millions), the search for vtable duplicates could potentially dominate serialization time:

https://github.com/google/flatbuffers/blob/master/rust/flatbuffers/src/builder.rs#L584

After seeing the benchmark results here, and the comments about FlatBuffers, I wonder if our N^2 algorithm is hurting our serialization speed.
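
For illustration, a minimal sketch of what the linear scan amounts to (hypothetical helper and names, not the actual builder.rs code): each freshly written vtable is compared against every previously written one, so with V distinct vtable writes the total work grows roughly quadratically.

// Hypothetical sketch of a linear vtable-deduplication scan; names are
// illustrative and this is not the actual builder.rs implementation.
fn find_duplicate_vtable(written_vtables: &[Vec<u8>], candidate: &[u8]) -> Option<usize> {
  // Byte-for-byte comparison against every vtable written so far.
  written_vtables
    .iter()
    .position(|existing| existing.as_slice() == candidate)
}

fn main() {
  let written = vec![vec![4u8, 0, 8, 0], vec![6u8, 0, 12, 0, 4, 0]];
  let candidate = [6u8, 0, 12, 0, 4, 0];
  assert_eq!(find_duplicate_vtable(&written, &candidate), Some(1));
}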

@CasperN @aardappel

aardappel (Collaborator) commented:

That library is Rust-only (and thus uses Rust as its schema), which is an entirely different class of serializer. It is pretty easy to make hyper-efficient serialization when you directly use the language's classes as your schema, and it will likely have a pretty bad forwards/backwards-compatibility story.

Oh you asked about vtable deduplication.

N^2? It is pretty much impossible to get a large N in the comparison of two vtables, especially since having a lot of vtables also means they must differ by a lot. Most languages compare the lengths first; I'm not sure whether the Rust implementation does.

The only N that can get big is the number of vtables. That is a pretty extreme power-law thing, in that for most users, linear search is indeed the fastest, especially since a non-linear algorithm would require extra storage for a tree.

You could allocate a tree whenever N goes over, say, 10, as a backup for the outliers. Or start sorting your existing vector of vtables if you really don't want to allocate more.

If it could be proven that the comparison of two vtables actually costs significant time (I doubt it), you could add a hash value just for quicker rejection.
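
A hedged sketch of that last idea (illustrative names, not FlatBuffers API): keep a cheap hash alongside each written vtable and only fall back to the byte comparison when both the length and the hash match.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical bookkeeping entry; not part of the FlatBuffers builder.
struct WrittenVtable {
  hash: u64,
  bytes: Vec<u8>,
}

fn cheap_hash(bytes: &[u8]) -> u64 {
  let mut hasher = DefaultHasher::new();
  bytes.hash(&mut hasher);
  hasher.finish()
}

fn find_duplicate(written: &[WrittenVtable], candidate: &[u8]) -> Option<usize> {
  let candidate_hash = cheap_hash(candidate);
  written.iter().position(|w| {
    // Length and hash checks reject most mismatches before the full byte compare.
    w.bytes.len() == candidate.len()
      && w.hash == candidate_hash
      && w.bytes == candidate
  })
}

The extra storage is one u64 per written vtable, which is the kind of trade-off weighed above.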

CasperN (Collaborator) commented Mar 12, 2021

The article says we do worst on the Minecraft saved data, and looking at the code, there are a few allocations in the benchmark to hold WIPOffsets, since they couldn't create vectors of strings or tables at the same time.

https://github.com/djkoloski/rust_serialization_benchmark/blob/master/src/datasets/minecraft_savedata/mod.rs

If that's the problem (I haven't profiled it), we could say it's "user error" that they didn't scrub allocations and carry around their own warmed-up vectors... but, IMO, this is a signal that we should improve ergonomics and performance by carrying around their WIP data, like we do in FlexBuffers.

aardappel (Collaborator) commented:

Yes, taking care of WiP data would make for a much friendlier API, but it does come at a cost, since the entirety of a table or vector then needs to be stored in a WiP vector before it can be serialized, with type tags and all. As such, I don't think it's appropriate for a language that is about performance, like Rust. I also don't see a way to easily make this an optional layer, but you all may have ideas. I guess it could be made optional through codegen.

CasperN (Collaborator) commented Mar 15, 2021

> Yes, taking care of WiP data would make for a much friendlier API, but it does come at a cost, since the entirety of a table or vector then needs to be stored in a WiP vector before it can be serialized, with type tags

We don't have to store type tags or all the contents of a table like Flexbuffers. Maybe that was a bad comparison on my part. We just need to have scratch space where users can put vectors of WIPOffsets/structs/scalars before serializing. Users can use this API to save on allocations without having to carry around extra vectors on their own.

Essentially we're just targeting every instance of

let mut v = Vec::new();  // Allocation! BAD
for x in xs {
  let wip_x = do_stuff(&mut fbb, x);
  v.push(wip_x);
}
let wip_xs = fbb.create_vector(&v);

in the above benchmark and replacing it with

let mut v = fbb.build_vector();  // Reusing scratch space => fewer allocations
for x in xs {
  let wip_x = do_stuff(v.fbb(), x);
  v.push(wip_x);
}
let wip_xs = v.finish();

Something like the following could work.

struct VectorBuilder<'f, T: Push + Copy> {
  // Use scratch space in fbb to save on allocations.
  fbb: &'f mut FlatBufferBuilder,
  start: usize,  // Starting index in the scratch space.
  len: usize,
  phantom_data: PhantomData<T>,
}

impl FlatBufferBuilder {
  fn build_vector<'f, T: Push + Copy>(&'f mut self) -> VectorBuilder<'f, T>;
}

impl<'f, T: Push + Copy> VectorBuilder<'f, T> {
  // Pushes into the scratch space.
  fn push(&mut self, item: T);
  // Borrows the FlatBufferBuilder so you can make tables, strings, and such.
  // Note that you can start a nested vector by calling fbb.build_vector().fbb().build_vector(),
  // but due to borrowing rules you can only use the "deepest" VectorBuilder.
  fn fbb(&mut self) -> &mut FlatBufferBuilder;
  // Serializes the vector to `fbb`, returns the WIPOffset, and clears the scratch space it used.
  fn finish(self) -> WIPOffset<Vector<'f, T>>;
}

Such scratch space would be nice for #5024 too.

aardappel (Collaborator) commented:

Yes, that's essentially what I meant. For vectors this looks pretty easy, but for tables you'd need to store a type tag for each field in that scratch space. That does entail a copy, and a fair bit of code to be run that's not fast (with a switch over the type tag).

The C++ builder uses scratch space at the bottom of the allocated builder buffer.

Note that while we want to avoid Vec::new(), the bulk of tables and vectors being created already don't need it, so they would only be made slower by an extra copy.
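
To make that cost concrete, here is a hedged sketch (hypothetical types, not an existing FlatBuffers API) of per-field type tags in table scratch space, with the switch over the tag happening when the table is finally serialized:

// Hypothetical scratch entry for a work-in-progress table field; the type tag
// is what forces a branch per field at serialization time.
enum WipField {
  Byte(u16, u8),     // (field id, value)
  Int(u16, u32),
  Long(u16, u64),
  Offset(u16, u32),  // (field id, offset to already-serialized data)
}

// The "switch over the type tag": every field goes through a match before
// its bytes can be written out. Alignment and vtable details are omitted.
fn serialize_wip_fields(fields: &[WipField], out: &mut Vec<u8>) {
  for field in fields {
    match field {
      WipField::Byte(_id, v) => out.push(*v),
      WipField::Int(_id, v) => out.extend_from_slice(&v.to_le_bytes()),
      WipField::Long(_id, v) => out.extend_from_slice(&v.to_le_bytes()),
      WipField::Offset(_id, o) => out.extend_from_slice(&o.to_le_bytes()),
    }
  }
}

Filling the WipField entries in the first place is the extra copy mentioned above.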

aardappel (Collaborator) commented:

Essentially you have four possible scenarios:

  1. Data to be serialized already sits in a vector.
  2. Data comes from elsewhere but is serialized all inline (scalars/structs).
  3. Data comes from elsewhere and contains non-inline elements, reusable WiP vector is present.
  4. Data comes from elsewhere and contains non-inline elements, no reusable vector.

You'd want to use scratch space really only for situation 4. For the others, using scratch space is likely slower.

CasperN (Collaborator) commented Mar 16, 2021

Great, it seems we're in general agreement on the scratch-space idea. Using the bottom of the allocated builder buffer seems like extra bookkeeping with not much advantage to me, but sure.

I'm not sure what you mean by 1. Do you mean moving the data in the vector into the buffer, then overwriting where the data used to be with a WIPOffset (assuming the data is larger than a WIPOffset)? Unfortunately, that's not a thing in Rust.

Good point on 2. No need to use scratch for inline types (unless length is unknown).

Situation 3 seems toilsome to users and therefore rarer than 4. Evidently, a benchmarker failed to reuse the WIP vector.

> ...for tables you'd need to store a type tag for each field in that scratch space. That does entail a copy, and a fair bit of code to be run that's not fast (with a switch over the type tag).

Not that we should implement nested table builders, but wouldn't you only need to cache the field id, offset, and size? I don't see why a switch is necessary.
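
A hedged sketch of that alternative (illustrative names, not an existing API): cache only the field id, the offset into the scratch buffer, and the size, and serialize by copying raw bytes, with no per-type switch.

// Hypothetical cached-field record: no type tag, just enough to locate the
// field's raw bytes in the scratch buffer.
struct CachedField {
  field_id: u16,
  scratch_offset: usize, // where the field's bytes start in the scratch buffer
  size: usize,           // how many bytes to copy out
}

// Serialization is an untyped copy per field, with no match over a type tag.
// Vtable construction from field_id is omitted here.
fn serialize_cached_fields(scratch: &[u8], fields: &[CachedField], out: &mut Vec<u8>) {
  for f in fields {
    let bytes = &scratch[f.scratch_offset..f.scratch_offset + f.size];
    out.extend_from_slice(bytes);
  }
}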

aardappel (Collaborator) commented:

With 1 I was mostly referring to inline types, though there may be rare cases where it refers to WiP offsets as well.

Situation 3 is common in C++ at least; C++ programmers are used to doing this even outside of FlatBuffers. It is inconvenient, but again, the entirety of FlatBuffers has been designed so that when convenience and speed clash, speed wins. And repeatedly serializing the same kind of vector of a non-inline type is not so common that your code will be littered with it.

The switch would be necessary to serialize the final version of the table, given that the fields are of different sizes etc.?

TethysSvensson (Contributor) commented:

For historical context, see also #5580, which tried to solve the same problem.

github-actions bot commented Nov 8, 2021

This issue is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.
