
Snazzy Rc/Arc handling #139

Closed
dnorman opened this issue Mar 16, 2017 · 9 comments
@dnorman commented Mar 16, 2017

Hey @TyOverby thank you for authoring this module. It's something I'd very much like to use in my project. That said, poking through the code, I wasn't able to spot any handling for Rc/Arc references.

Is there presently a way, or is it conceivable within the scope of your design, to serialize the owned Arc internals once, then include pointers/offsets for subsequent Rc/Arc handles? I'm looking for a way to send what is essentially a vec of structs, each of which references an Rc/Arc, with high repetition of the referenced values. My main concerns are the high deserialization (deduplication) effort with substantially duplicated nested content, and the serialized size.

I'm imagining something to the effect of the following.

impl Serialize for HoldsArc {
    // Pseudocode: `seen_arcs_map`, `present_offset`, and `serialize_whatever`
    // are imagined serializer state/methods, not real serde APIs.
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer
    {
        match serializer.seen_arcs_map.entry(&*self.some_data as *const _) {
            Vacant(e) => {
                e.insert(serializer.present_offset);
                serializer.serialize_whatever(&*self.some_data)
            }
            Occupied(e) => {
                serializer.serialize_u64(*e.get())
            }
        }
    }
}

Sidebar: What are the implications for potential ser/de parallelization in the future?
Cheers :)

@TyOverby (Collaborator) commented Mar 16, 2017

Hi @dnorman! Unfortunately, cyclical graph serialization is out-of-scope for Bincode, though I could certainly see something being built on top of Bincode at some future date!

I have run into this before, and my solution was to build an intermediate structure that contained a flat array of shared nodes, and then a bunch of "handles" that contained indexes into the array. Unfortunately this sounds similar to what you were trying to avoid :( Also unfortunately, I don't think I've ever seen a serialization/deserialization strategy that deals with shared nodes in a way that makes sense for Rust.

What are the implications for potential ser/de parallelization in the future?

Could you go more into detail here?

@dnorman (Author) commented Mar 16, 2017

Hey @TyOverby, thanks for the timely response.

To clarify, what I'm looking for is really just efficient DAG serialization. Cyclical graph serialization is a whole other can of worms IMO. In theory, a DAG could be very efficiently serialized / deserialized on a streaming basis.

I.e.: see a reference; have I seen it before? No? Ok, include it right here and store the offset in case I see it again in later elements of that serialization. See a reference; seen it before? Yes? Ok, serialize only the reference.

With regard to scope, is this something you could conceive of accepting as a pull request, or is it wholly out of scope for the future of bincode?

For context: my use case is a new DAG-based distributed database I'm working on, and I'm trying to make it screaming fast.

In the short run, I'll likely take the same kind of intermediate data-structure approach you suggest, but this potentially entails copying a lot of data needlessly. Even if it can be made relatively memory efficient, the CPU time to do it is duplicative, and something I'll want to zealously optimize; thus my inquiry here.

With regard to parallelization of serialization (sounds cute, right?): it seems to me like this would be desirable when sending large data structures, using something like Rayon to parallelize the effort. Not sure if any serde projects are doing this presently (I imagine not), but it would seem to me to be the logical progression. I mention this because of its potential incompatibility with an offset-based approach, and the complexities it would entail in coordinating the various parallelized branches of the serialization process. A bit through the looking glass here, I confess.

IMO, the slight preposterousness of all of the above really brings the following question to the fore: What should be the domain of the serializer, and what should be that of the application logic?

One could easily argue that the marginal cost of marshaling the data structure, and any parallelization efforts are well within the realm of the application – just the "cost of doing business", but I am interested in trying to think about this firstly from a performance standpoint, and see what factorizations thereof could be broadly acceptable to others. IE: I'd rather not write and maintain my own bespoke serializer, but it may be necessary for my long term mission of brutal efficiency. Am I crazy, or is this an endeavor worth pursuing outside of my weird little project? (honest question)

@TyOverby (Collaborator) commented Mar 16, 2017

With regard to scope, is this something you could conceive of accepting as a pull request, or is it wholly out of scope for the future of bincode?

This is wholly out of scope for bincode because bincode simply provides a binary serialization mechanism for anything that Serde can handle. Since Serde doesn't know about shared references, neither do we.

In the short run, I'll likely take the same kind of intermediate data-structure approach you suggest, but this potentially entails copying a lot of data needlessly

If you are concerned about copying a bunch of references, strings, or slices, check out the RefBox, StringBox, and SliceBox helper types. (they are in the bincode::refbox module)

when sending large datastructures, using something like Rayon to parallelize the effort

I'm not sure how this would work. Serialization and deserialization is fundamentally a linear sequence. For example

enum Foo {
    Bar,
    Baz(u32),
}

serialize(vec![Foo::Bar, Foo::Baz(5), Foo::Bar])

would yield these bytes

0 --
0  |
0  |
0  |
0  | 
0  |
0  |
3 <- Length of vec
0 --
0  |
0  |
0 <- Tag for Foo::Bar
0 --
0  |
0  |
0 <- Tag for Foo::Baz
0 --
0  |
0  |
5 <- Int32 Value for Foo::Baz
0 --
0  |
0  |
0 <- Tag for Foo::Bar

Because the values in bincode are "jagged", you can't skip to an arbitrary location in the array and start deserializing.

Also, because bincode fundamentally writes to Write objects, it would be impossible to write from multiple threads at the same time.

Furthermore, if this application is serializing to the network or the filesystem, then those components are going to be your bottleneck. I've benchmarked bincode encoding at 3GB / s on my 5 year old laptop, and even then I was bottlenecked at the speed of just writing into memory. (decoding is even faster)

Not sure if any serde projects are doing this presently.

I'm not sure that there are any serializers / deserializers that do this. You spend so much time just waiting for bytes to be read/written, that parallelization buys you nothing.

Am I crazy, or is this an endeavor worth pursuing outside of my weird little project?

Por qué no los dos? Building a general-purpose Rc/Arc serializer would have its uses for sure. I'd imagine that the algorithm for implementing this would look a lot like the "mark" phase of a mark-and-sweep garbage collector.

@dtolnay (Collaborator) commented Mar 17, 2017

Here is a proof of concept of an efficient Rc<> DAG serialization/deserialization approach that works with any Serde data format.

https://gist.github.com/dtolnay/dd05bc98c438e160d142fbbc0272d92c

@dnorman if you are interested in fleshing this out further, I would recommend building a crate around this DAG layer. The main missing pieces are some unsafe TypeId shenanigans to make it work for heterogeneous Rc types, and a simple proc macro to make it work for user-defined structs. There are also ways to make the representation more compact if you care about that.

Example DAG

  A
 / \
(   B
 \ / \
  C   )
 / \ /
D   E

JSON representation

{
  "data": "A",
  "left": {
    "Marked": {
      "id": 0,
      "data": "C",
      "left": {
        "Plain": {
          "data": "D",
          "left": null,
          "right": null
        }
      },
      "right": {
        "Marked": {
          "id": 1,
          "data": "E",
          "left": null,
          "right": null
        }
      }
    }
  },
  "right": {
    "Plain": {
      "data": "B",
      "left": {
        "Reference": 0
      },
      "right": {
        "Reference": 1
      }
    }
  }
}

Annotated bincode representation

[65] // data=A
[1, 1, 0, 0, 0] // left=marked
[0, 0, 0, 0] // id=0
[67] // data=C
[1, 0, 0, 0, 0] // left=plain
[68] // data=D
[0, 0] // left=none right=none
[1, 1, 0, 0, 0] // right=marked
[1, 0, 0, 0] // id=1
[69] // data=E
[0, 0] // left=none right=none
[1, 0, 0, 0, 0] // right=plain
[66] // data=B
[1, 2, 0, 0, 0] // left=reference
[0, 0, 0, 0] // id=0
[1, 2, 0, 0, 0] // right=reference
[1, 0, 0, 0] // id=1
@dnorman (Author) commented Mar 17, 2017

@TyOverby Thanks again for the most excellent feedback.
Some thoughts:

I'm not sure how this would work. Serialization and deserialization is fundamentally a linear sequence. For example

With respect, I would argue that a serialization (noun) is a fundamentally linear sequence, but serialization and deserialization (verb) need not be. The way I mean this is that some data structures may be quite computationally intensive to serialize / deserialize. Your statements about bottlenecks are completely apt for cases where only memory copying is entailed, but this is not always so.

This is wholly out of scope for bincode because bincode simply provides a binary serialization mechanism for anything that Serde can handle. Since Serde doesn't know about shared references, neither do we.

I suppose this is really the question I was attempting to ask, but without quite realizing it. The precise boundaries of what Serde is (and what it isn't) remain somewhat fuzzy to me. It is understood that serde does not provide an abstraction / api for shared references, but whether this is an express non-goal for the serde core, or merely a thing that wasn't yet considered valuable, is somewhat less so.

I'm not sure that there are any serializers / deserializers that do this. You spend so much time just waiting for bytes to be read/written, that parallelization buys you nothing.

In a similar vein: it's not totally clear to me precisely where, in the eyes of the serde maintainers, the boundaries of what serde is and isn't lie. One definition I'd be inclined to consider is: a stream-capable interface for the transportation of data structures. Given that additional layers of marshaling and un-marshaling on top of serde would add (computational, code) complexity and limit the capacity for streaming serializations, it would seem, at least in the abstract, desirable to try to move serde and its ecosystem more toward streaming, referential, and parallelization capabilities.

Though we may or may not agree on precisely where these boundaries ought to be set, I greatly appreciate your taking the time to clarify these things to me. It's been an enormous help.

@dtolnay Thank you for the generous provision of this code. The approach generally makes sense to me, and is approximately in line with what I was thinking. I believe this will work for my use case. That said, in my view, it would probably make sense to introduce this kind of functionality into serde itself, so as not to require playing games by instantiating the NodeToId map inside the root node Serialize impl. Of course, most serialization formats will never innately support DAG serialization; does this, in your view, preclude it from inclusion in the serde core library?

Of course, I'd much prefer to collaborate with the serde core / module folks in harmony with their respective visions; but in the event that's not forthcoming, I'd rather implement something bespoke rather than hacking around potentially express non-goals of serde.

Any suggestions on how I could help develop clarity about said boundaries, and roadmap for serde? If it's an "it is what it is" type of situation, that's ok. Either way, clarity is king :)

Cheers

@dtolnay (Collaborator) commented Mar 17, 2017

It is a goal of Serde to enable DAG serialization but there are much better approaches than baking it into the core APIs and requiring buy-in from every data format in order to work. The approach I recommended (once the functionality is built out in a helper crate) would be just as powerful from end users' perspective but would immediately work across all formats. The design of Serde is such that things like this do not need to be built in. They can be layered on top and be more flexible than a built-in approach.

Parallelism is exactly the same. It is a goal of Serde to enable parallel serializers and deserializers but that does not translate to parallelism-specific functionality needing to exist in the core APIs.

@TyOverby (Collaborator) commented Mar 17, 2017

Here's a sketch of what I think of as the "layers" of serde

Layer 0 : Data formats

These technically don't require serde to work (as an example, bincode used to also support rustc-serialize).

  • JSON
  • Bincode
  • YAML

Layer 1 : Serde API

Sitting in-between the data format and the serialization / deserialization implementations, the Serde API represents the method of transformation between rust structures in memory and their representation in a format.

Layer 2 : Structure Serialization / Deserialization implementation

The most common example of this would be #[derive(Serialize, Deserialize)], but it also extends to custom impls.

Layer 3 : Wrappers and optimization structures

These are utility structures like RefBox, SliceBox and StrBox that take advantage of custom impls to get performance wins.
DAG or graph serialization would likely be in this layer (though this layer and layer 2 are pretty blurry).

Layer 4 : Application layer

At this point, you aren't really thinking about serialization at all, you just take your structure and throw it into a data / transport format and everything underneath should "just work".

@dtolnay Does this analysis look right to you?

@TyOverby (Collaborator) commented Mar 17, 2017

@dnorman

some data structures may be quite computationally intensive to serialize / deserialize

This may be true from an application-layer perspective, but at the layer that bincode operates, there is practically no room for parallelization.

Your statements about bottlenecks are completely apt for cases where only memory copying is entailed, but this is not always so.

Again, looking at this from the view that bincode gets, "memory copying" is our best-case scenario. Worst-case would be reading from the network. Any optimizations for more complex serialization / deserialization strategies will need to be in layers above bincode.

Given that additional layers of marshaling and un-marshaling on top of serde would add (computational, code) complexity and limit the capacity for streaming serializations, it would seem, at least in the abstract, desirable to try to move serde and it's ecosystem more toward streaming, referential, and parallelization capabilities.

Maybe I'm misunderstanding, but streaming and parallelization are fundamentally at odds here. Not having an appropriate window into the future means that only one thread of control can be making progress on a serialization stream. What would it mean for a network stream to be written to from multiple threads? 💥Chaos💥. Chaos is what it would mean.

What you could do is break your structures up into multiple chunks, serialize them (in memory) on separate threads, and then send them out on the same thread. This would happen (in my diagram) at Layer 4 (or maybe layer 3 if you were sneaky), but it wouldn't make sense in layers 1 or 2.

This process could theoretically also be done in reverse (read a bunch of bytes in, somehow figure out how to slice them up, send off to other threads for partial deserialization, join them back on some main thread), but it would be much harder, and I don't think there would be any real performance wins.

Though we may or may not agree on precisely where these boundaries ought to be set, I greatly appreciate your taking the time to clarify these things to me. It's been an enormous help.

Not at all! It's been pretty fun thinking about these things at any rate!

It's not really my place to be making critiques on your database architecture, but I don't think that parallel serialization/deserialization will help you that much even if the structural hurdles are solved. Assuming that this database is running on a machine with n physical CPU cores, as soon as you have n or more network connections to this database open and all these connections are fairly active, then spawning new threads from one of these network-connection threads to do parallel serialization / deserialization will only impact performance in a negative way.

@dtolnay (Collaborator) commented Mar 17, 2017

@TyOverby I don't think that accurately characterizes the ecosystem. I would recommend thinking of it more like this:

                       APPLICATION
                      /    / \    \
                     /    /   \    \
       data structures   /     \   data formats
                    |   /       \   |
                    |  /         \  |
                    | /           \ |
data structure adapters           data format adapters
                       \         /
                        \       /
                         \     /
                          SERDE

The application defines and/or imports the data structures and data formats that it wants to use. It also defines and/or imports zero or more adapters which mediate the interaction between data structures and data formats across Serde's APIs.

Application

  • Servo
  • Unbase

Data structures

  • UnbaseConfig
  • ListUsersResponse
  • CssGradient

Data formats

  • JSON
  • YAML
  • Bincode

Data structure adapters

Data format adapters

Serde
