Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add and derive deserialize_from #1094

Merged
merged 18 commits into from
Dec 12, 2017
Merged

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Nov 16, 2017

The actual deserialize work is WIP, because I probably need to refactor it so it can be feature-flagged off. I think the first two commits are "good to go", though.

One thing missing from this design is deriving deserialize_from for exactly C-like enums, but that can obviously be done later.

Pushing this up now because my brain is melting trying to do the non-C-like enum work I need for webrender, and so I'm gonna take a bit of a "break" by dealing with review complaints.

@Gankra
Copy link
Contributor Author

Gankra commented Nov 16, 2017

Oh also obviously BLATANTLY MISSING: TESTS. ANY TESTS.

@Gankra Gankra changed the title Add and derive deserialize_from WIP: Add and derive deserialize_from Nov 16, 2017
@Gankra
Copy link
Contributor Author

Gankra commented Nov 16, 2017

Any tips on testing this kind of thing would be appreciated. My local experiments have mostly consisted of asserting that serializing and deserializing round-trips correctly.

@Gankra
Copy link
Contributor Author

Gankra commented Nov 16, 2017

Currently grinding through the serde test suite, lots of little derive bugs.

Starting to regret changing a bunch of things from -> Body to -> (Option<Common>, Body, Option<FromBody>) in order to prevent duplicating a static.

@Gankra
Copy link
Contributor Author

Gankra commented Nov 17, 2017

Pushed fixes:

  • Fixed bad cfg's in builtin impl commit
  • Got existing test-suite passing with derive changes

One thing I didn't realize until now is that deserialize_from kinda doesn't make any sense when doing "remote" impls?

@@ -66,13 +68,37 @@ impl<'de> Visitor<'de> for BoolVisitor {
}
}

struct BoolFromVisitor<'a>(&'a mut bool);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think primitive types need this specialization. The pointer indirection will probably be less performant than the default implementation. I'm easily convinced otherwise by benchmarks ;)

Copy link
Contributor Author

@Gankra Gankra Nov 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I implemented this, but I seem to recall getting wins from doing this. Gonna start profiling tomorrow.

E: Error,
{
match str::from_utf8(v) {
Ok(s) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forward to visit_str instead of repeating the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deserialize() impl doesn't, and this code seems clearer without it. (happy to change if you disagree)

where
D: Deserializer<'de>,
{
// Some enum is opaque, can't do anything good here :(
Copy link
Member

@oli-obk oli-obk Nov 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't you do

match *self.0 {
    Some(ref mut data) => data.deserialize_from(deserializer),
    None => Some(T::deserialize(deserializer)?),
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but hard to say if this is justified. Very workload-specific.

where
A: SeqAccess<'de>,
{
$clear(&mut self.0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the real optimization here to not clear, but overwrite all already existing elements, if the new sequence has more elements, append, if it has less, drop the leftover ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do this for BTreeSet and BinaryHeap, so I just did the generically correct thing. Seems like it could be left as future work to split this up and further optimize it?

@dtolnay
Copy link
Member

dtolnay commented Nov 28, 2017

I have fallen behind on PRs so I haven't had a chance to review the implementation yet, but -- I would be interested in exploring some ways to modularize the serde and serde_derive crates better so that large but uncommonly required features like this one can be delivered in their own crate. The serde_state / serde_derive_state feature is in a similar situation where it seems somewhat overwhelming to maintain as part of Serde but the community that wants the feature is willing to maintain it on the side.

Maybe this means factoring a lot of the serde_derive code into a separate pre-1.0 helper crate that is depended on by serde_derive, serde_derive_state, and a deserialize_from derive crate.

Would you be interested in figuring out a structure that would work for deserialize_from as part of getting this integrated? Otherwise would it be possible to provide deserialize_from in its own crate somehow, even if it means some code duplication with serde_derive?

@Gankra
Copy link
Contributor Author

Gankra commented Nov 28, 2017

I don't really see how deserialize_from could be seperated out, realisitically? It's part of the trait definition.

@oli-obk
Copy link
Member

oli-obk commented Nov 29, 2017

I don't really see how deserialize_from could be seperated out, realisitically? It's part of the trait definition.

You can make a DeserializeFrom trait which is implemented for all Deserialize types

@Gankra
Copy link
Contributor Author

Gankra commented Nov 29, 2017

How would that work? It can't be blanketed, because actual type-specific code needs to be emitted.

@dtolnay
Copy link
Member

dtolnay commented Dec 3, 2017

Okay I can see how shipping this outside of the Deserialize trait definition would be problematic.

I am on board with accepting this as long as it does not affect compile times for anyone not using deserialize_from (basically the same concern as for clone_from). As is, when I tried this it raises compile time for the serde crate by 13% (5.9 seconds to 6.7 seconds) which is a bit much for something that few people will use. Worse, Deserialize impls already compile very slow (serde-rs/json#313) so something like 2x more generated code in every impl would not be nice.

Could you propose what you think is the best way to feature gate this?

Up to you how you want to get this merged. If it is easier, we can merge the serde pieces up front and follow up in one or more other PRs with the serde_derive pieces as those are written and tested.

One thing I don't like here is all the deserialize_from impls that are conceptually no different from the default implementation. For example the bool one:

fn visit_bool<E>(self, v: bool) -> Result<(), E> {
    *self.0 = v;
    Ok(())
}

Could you share some benchmarks that show these outperform the default deserialize_from? Is there an easy way to tell which types benefit from a custom deserialize_from? Or is a custom deserialize_from always preferable?

@Gankra
Copy link
Contributor Author

Gankra commented Dec 4, 2017

Ok finally got a chance to double check primitive stuff. Looks like, at least on webrender, removing the primitive impls is neutral, or possibly even profitable (the deserializer went from 7.1% to 6.6% of backend time; probably noise).

I'll take them out, then.

@Gankra Gankra force-pushed the deserialize_from branch 2 times, most recently from 1e49cf2 to 108764b Compare December 4, 2017 18:22
@Gankra Gankra changed the title WIP: Add and derive deserialize_from Add and derive deserialize_from Dec 4, 2017
@Gankra
Copy link
Contributor Author

Gankra commented Dec 4, 2017

Ok I refactored out all the deserialize_from derivation code out so it could be feature'd out. Ultimately we weren't sharing a ton of code, so it's not a huge loss. Writing it that way helped me incrementalize the development process for myself better, but that doesn't matter now.

I also did a kinda naive addition of deserialize_from to the test suite.

@Gankra
Copy link
Contributor Author

Gankra commented Dec 6, 2017

Webrender, and therefore gecko, will likely be moving forward with a soft fork of serde in the next few days with these patches, plus the additional patch found on https://github.com/Gankro/serde/tree/deserialize_from_enums3

Even if you aren't ready to make a call on merging this, we'd certainly appreciate more thorough review. :)

@oli-obk
Copy link
Member

oli-obk commented Dec 7, 2017

I'd be on board with merging this PR as it is if we make the deserialize_from feature require nightly. Then we can play around with it a little without having to immediately stabilize it

@Gankra
Copy link
Contributor Author

Gankra commented Dec 7, 2017

That makes sense to me.

@Gankra
Copy link
Contributor Author

Gankra commented Dec 7, 2017

Wait, would that also put the trait method behind the feature?

@Gankra
Copy link
Contributor Author

Gankra commented Dec 7, 2017

Also CC #855

@oli-obk
Copy link
Member

oli-obk commented Dec 7, 2017

Hmm right, that could be messy with derives and such :/

The custom one was functionally identical to the default implementation given by
the Deserialize trait. If someone has benchmarks that the custom one performs
better, we can put it back.
@dtolnay
Copy link
Member

dtolnay commented Dec 11, 2017

I started making progress on the review -- pushed a few fixup commits as I go. Some remaining items:

  • Before merging, I would like to get a sense of what this means for the future of csv::Reader and serde_json::StreamDeserializer. Is there a nice way to adapt these APIs to take advantage of deserialize_from? (I have not started thinking about this; it may turn out to be an easy answer.)

  • Benchmarks! It seems obvious that this would be faster in some usage patterns, but I would want to see at least one basic not-too-contrived benchmark that I can run using a real existing data format to see an improvement from this.

  • Currently for every struct with named fields we emit two totally identical copies of the __Field enum and the __FieldVisitor and its Visitor and Deserialize impls, which is a ton of redundant code. These should be emitted in a higher scope and shared by deserialize and deserialize_from. It is not critical to fix this before merging; will file an issue to follow up.

@Gankra
Copy link
Contributor Author

Gankra commented Dec 11, 2017

  1. I don't know what's interesting about those, I'll take a look in a bit.
  2. I'll try to cut out some basic workloads. Where do you put benches?
  3. Yes, I originally put in a fair bit of work trying to avoid this, but I backtracked on it when I deintegrated stuff to support cfg(). This ends up being pretty nasty to thread through the code for things like deserializing struct-like enum variants. Not clear it's worth the effort.

@dtolnay
Copy link
Member

dtolnay commented Dec 11, 2017

The CSV reader and JSON stream deserializer are interesting because they are APIs designed for deserializing lots of instances of the same type that are processed one at a time -- seems like an obvious candidate for deserialize_from. CSV is why we filed #855 in the first place so we may as well be certain that we are addressing their use case in the best way.

There aren't really benchmarks of Serde by itself since nothing interesting happens without a data format, but JSON benchmarks live here and Bincode benchmarks live here. I wouldn't bother with either of those, not looking for anything fancy just a 30 line thing I can run and see an improvement. I would do this myself but I am about to call it a night. 😴

@dtolnay dtolnay merged commit 0b89bc9 into serde-rs:master Dec 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants