
Deserialize series using field names, not order #62

Merged

Conversation

SafariMonkey
Contributor

Description

Add a custom Deserialize impl for Series which uses the field names in the columns field of the response to allow the inner type to deserialize itself as a map instead of relying on the field order. This means that the field order of your structs do not need to match the query field order, and prevents mistakes due to inconsistencies between those. It also means a SELECT * could go into a hashmap and work just fine if you wanted to.
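The core idea can be sketched with plain std types (this is an illustrative reduction, not the PR's actual serde implementation): pair each row's values with the response's column names, so lookups happen by name and struct field order no longer matters.

```rust
use std::collections::HashMap;

// InfluxDB returns column names and rows of values separately; pairing
// them by name makes the order of fields in the query (and in your
// structs) irrelevant when reading a row.
fn row_to_map<'a>(
    columns: &'a [&'a str],
    row: &'a [&'a str],
) -> HashMap<&'a str, &'a str> {
    columns.iter().copied().zip(row.iter().copied()).collect()
}

fn main() {
    let columns = ["time", "host", "usage"];
    // The row arrives in query order, but lookups below are by name.
    let row = ["2020-08-22T00:00:00Z", "server01", "0.64"];
    let map = row_to_map(&columns, &row);
    assert_eq!(map["host"], "server01");
    assert_eq!(map["usage"], "0.64");
    println!("host={} usage={}", map["host"], map["usage"]);
}
```

In the PR itself this mapping is done inside a custom `Deserialize` impl, so the inner type sees a map instead of a sequence.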

I added a couple of crates while I was prototyping: smol_str and maplit (dev). These can easily be removed, but I found a 10% speedup on deserialising+serving (mocking the database response) when using SmolStr instead of String in both headers and my field values, and hashmap literals are nice in tests. (It's probably worth noting that the SmolStr change improved performance from 1700 to 1900 requests per second against the mock, but when actually hitting the database it was more like a difference of 213 to 218 - certainly nothing to write home about.)

If you were interested in improving the performance, though, I imagine that the Results struct could hold the full response body plus a Vec of serde_json::value::RawValue, and then you could also have a deserialize_next_borrowed<'de, T>(&'de self) -> Result<Return<T>, Error> where T: Deserialize<'de> + Send or something. I experimented with such an implementation on the deserialize-borrowed branch.

I didn't go the whole way with the RawValues borrowing from the body, but using something like owning_ref it should be possible. It requires unsafe with the current API of owning_ref, though, and speaking briefly with the creator of that project today, they made it clear that they are not maintaining it at this time.

I don't know if an API that copies each entire query result is really a win over one that deserialises it into a structured JSON representation, especially as the current (structured JSON) approach can just reuse the string allocations from the JSON in the result. The borrowing-based API is also more awkward to use, of course, as the DatabaseQueryResult can't be a temporary... it's probably a case of overengineering.

In summary, I think the core of this PR is a correctness win without an API change. There are some performance improvements possible, but given the slowness of InfluxDB, they're unlikely to make enough of a difference to be noticeable.

Checklist

  • Formatted code using cargo fmt --all
  • Linted code using cargo clippy --all-targets --all-features -- -D warnings (first commit in PR)
  • Updated README.md using cargo readme > README.md (no, because it mostly just deleted stuff)
  • Reviewed the diff. Did you leave any print statements or unnecessary comments?
  • Any unfinished work that warrants a separate issue captured in an issue with a TODO code comment (none)

@SafariMonkey
Contributor Author

I made a further branch after realising that the current design does not handle GROUP BY correctly. For now I chose to copy the entire Return and Series structs as well as the entrypoint to the Deserialize implementation. This feels like a nicer API to work with than one that's generic over tagged-ness, at the expense of some copying in the code. The API can of course be changed, and struct/method names in particular are subject to bikeshedding.
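For reference, the split into untagged and tagged types could be sketched roughly like this (struct and field names here are my guesses from the description, not the branch's exact code):

```rust
use std::collections::HashMap;

// Rough shape of the split: with GROUP BY, each series carries a `tags`
// map in addition to its rows, so copying the structs into a separate
// tagged variant avoids making the untagged `Series` generic over
// tagged-ness.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct Series<T> {
    name: String,
    values: Vec<T>,
}

#[derive(Debug, PartialEq)]
struct TaggedSeries<TAG, T> {
    name: String,
    tags: TAG,
    values: Vec<T>,
}

fn main() {
    let grouped: TaggedSeries<HashMap<String, String>, f64> = TaggedSeries {
        name: "cpu".to_string(),
        tags: HashMap::from([("host".to_string(), "server01".to_string())]),
        values: vec![0.64, 0.71],
    };
    assert_eq!(grouped.tags["host"], "server01");
    println!("{}: {} rows", grouped.name, grouped.values.len());
}
```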

My API experiment sped up by around 70% when using a GROUP BY on the three tags I have, presumably due to the reduced redundancy in the payload.

Oh, and I just ran into an issue: if you don't get any results, "series" isn't returned. This fixes that.

@Empty2k12
Collaborator

Thanks for this pull request! 👍🚀

Please excuse my slow response time, I am quite busy with university exams right now. I will be reviewing your pull request now.

@Empty2k12
Collaborator

I have taken a look at your branches and want to thank you again for all the thoughts you have poured into improving the handling. Using the field order was one of the easier solutions at the time, but always felt quite hacky. I'm very happy with your implementation. Correctness improvements are always very welcome! 🏅

Personally, I'd have no problem adding smol_str and maplit, but I do wonder if it makes sense to keep them when the real-world speedup is so minimal. Maybe the performance improvements could be split into a separate PR, with this one being only about the improved deserialization?

How do you want to proceed with this PR? :)

Oh, and I just ran into an issue: if you don't get any results, "series" isn't returned. This fixes that.

Nice catch! I think you can open another pull-request for that and I'll merge it right away.

@SafariMonkey
Contributor Author

Thanks for taking a look! And no worries, I've just been pointing at my branch, naturally. Good luck with the exams!

I have taken a look at your branches and want to thank you again for all the thoughts you have poured into improving the handling. Using the field order was one of the easier solutions at the time, but always felt quite hacky. I'm very happy with your implementation. Correctness improvements are always very welcome!

Glad to hear that you like it.

Personally, I'd have no problem adding smol_str and maplit, but I do wonder if it makes sense to keep them when the real-world speedup is so minimal. Maybe the performance improvements could be split into a separate PR, with this one being only about the improved deserialization?

Absolutely. I originally used SmolStr for performance but not taking it out in the PR was honestly half being lazy about adding all the header slice lifetimes back in... 😅 And maplit should be easily worked around in the tests, it's just a bit less readable. I'll look into extracting those dependencies from the PR now.

Nice catch! I think you can open another pull-request for that and I'll merge it right away.

Sure, I'll extract that into a separate PR.

How do you want to proceed with this PR? :)

I've been using the series-default branch, based on this PR, without issue in my prototyping. It does not have any semver-breaking changes, as far as I can tell - only a manual implementation of Deserialize replacing an automatic one. (If someone relied on the previous field order behaviour, it could cause issues, but that seems unlikely.) I am happy for it to be merged (post-dependency-removal) if you're happy with it; if you have other concerns / suggested improvements feel free to let me know.

I also notice that you already did the lint fixes in a previous commit, so I'll remove them from this PR.

@SafariMonkey
Contributor Author

Oh, I forgot to ask: do you want the TaggedSeries stuff added to this PR, broken out into a new PR after this is merged (my current plan), or are you not interested in it? I have a feeling that it may require more discussion, especially being an API change.

@Empty2k12
Collaborator

It does not have any semver-breaking changes, as far as I can tell - only a manual implementation of Deserialize replacing an automatic one. (If someone relied on the previous field order behaviour, it could cause issues, but that seems unlikely.)

I'm not too worried about the API changing, as we're still below 1.0.0 and there's other breaking changes scheduled for the next release. If the changes are too big to be explained in the CHANGELOG.md, I think adding a MIGRATING.md document will make sense.

do you want the TaggedSeries stuff added to this PR, broken out into a new PR after this is merged (my current plan), or are you not interested in it? I have a feeling that it may require more discussion, especially being an API change.

It's an addition that's going to benefit everyone, so I'm interested in it. I will need to play around with the new API a bit to get a feeling for the changes. Maybe there could be a way to zip together tags and series again, maybe with a custom iterator, but I'd consider such usability improvements out of scope for this PR.
However, since you're the more experienced of the two of us, I'd leave the final decision on how you want to proceed up to you.

@SafariMonkey
Contributor Author

SafariMonkey commented Aug 22, 2020

OK, so the new dependencies have been removed. I had forgotten that I could just pass around a &[String], which worked nicely.

I'm not too worried about the API changing, as we're still below 1.0.0 and there's other breaking changes scheduled for the next release. If the changes are too big to be explained in the CHANGELOG.md, I think adding a MIGRATING.md document will make sense.

Yeah, that makes sense. Rust expects patch versions to be compatible even before 1.0.0, but as long as the breaking change comes with a minor version bump, that's fine.

It's an addition that's going to benefit everyone, so I'm interested in it. I will need to play around with the new API a bit to get a feeling for the changes.

Feel free. I tried a second type parameter on Series representing the tags, with a special value for "no tags", but couldn't get it to a place where both the deserialisation and the API ergonomics were comfortable, mostly because the semantics differ: if there is a Tags struct you want to expect the tags key, while if it is set to the NoTags type you should reject a tags key. Without being able to check the type of T (which would require adding a weird T: Any bound), you don't know which of those to do. I spent some time writing a Deserialize impl for a NoTags struct before realising that the copied-structs approach had better ergonomics for not much more code. If you have any other ideas, I'd be interested.

Maybe there could be a way to zip together tags and series again, maybe with a custom iterator, but I'd consider such usability improvements out of scope for this PR.

In my case I don't want them zipped, so I don't have any strong feelings on such a feature. It would also be possible to make a Deserialize implementation that does that, but I don't know how much of the code would be able to be shared - I do have some ideas if you'd like me to attempt it, but just zipping with std::iter::repeat should do the trick otherwise.
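The std-only zipping mentioned above would look something like this (the tag/value shapes are hypothetical, just to show the pattern):

```rust
use std::iter;

fn main() {
    // A GROUP BY series shares one set of tags across all of its rows;
    // re-attaching them per-row is just zipping with an infinite repeat.
    let tags = ("host", "server01");
    let values = vec![0.64, 0.71, 0.80];
    let zipped: Vec<((&str, &str), f64)> =
        iter::repeat(tags).zip(values).collect();
    assert_eq!(zipped.len(), 3);
    assert_eq!(zipped[0], (("host", "server01"), 0.64));
}
```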

However, since you're the more experienced of the two of us, I'd leave the final decision on how you want to proceed up to you.

I wouldn't assume that - I've only done some Rust as a hobby now and then over two years. This has been a learning experience for me too :)
The serde docs and rust-csv's Deserialize implementation have been very helpful.
You are the author of this crate, and whatever approach is taken I want it to be one you are happy with. Take the time you need to feel out the API change, and let me know if something feels lacking or off. I aimed for a non-breaking change, but if you can think of a better API and are planning on breaking compat anyway, again, let me know.

Oh, and I'm off for the day, so if there's anything, I'll pick this back up tomorrow sometime.

@SafariMonkey force-pushed the deserialize-using-column-names branch 2 times, most recently from 7ee7b37 to 8785ebc on August 22, 2020 at 01:18
@Empty2k12
Collaborator

I have played around with the changes and am quite happy with them! I will merge when you give the final go and confirm that you're happy with everything in its current state.

You are the author of this crate, and whatever approach is taken I want it to be one you are happy with.

I find this project to be more of a community effort to create a modern and easy-to-use InfluxDB client than my personal thing. I'm happy to see contributions and progress; everything else is decided together!

@cryptoquick

I just checked this one out, and it seems it double-quotes strings, at least in my case, just FYI. I did run into the problem it solves, and, otherwise, it seems to be a pretty drop-in replacement.

@cryptoquick

Hi, is there any way this branch can be merged into master? It does work, and since there's a fix in the other branch for double quotes, hopefully this solves both issues, yeah?

@SafariMonkey Can you merge master into this branch and give us the final go? If you like, I can give you a thumbs up from my codebase.

@SafariMonkey
Contributor Author

Hi, sorry for the long silence.

Life things have been getting in the way of my being able to go in and figure out how to address the issue raised by @cryptoquick. I keep meaning to pick it up but not getting to it - I was hoping to finally do so now that I'm on holiday.

I appreciate your patience, as well as the encouragement to get it done :) . I did not realize that the quoting issue had already been independently resolved - in that case it's basically good to go. That was my only outstanding concern, so I'll just rebase it, confirm it works with my code, and then give it a once-over before approving.

By the way, the way this library does data insertion doesn't work with my use case at all, as we have custom quoting requirements. (We have our own escaping system to deal with InfluxDB's ridiculous quoting issues.) Fortunately, I'm not looking to use this library to insert data, so as long as it gives me the raw form of the metrics and doesn't try to escape them on the way out itself, it should work fine.

Add a custom Deserialize impl for Series which uses the field names
in the columns field of the response to allow the inner type
to deserialize itself as a map instead of relying on the field order.
@SafariMonkey
Contributor Author

SafariMonkey commented Sep 22, 2020

I've rebased, tried it out, and it seems to work. I've read through the changes again and they look fine, so if you are happy with them too, feel free to merge them.

I'll open a PR for the GROUP BY support too. (I can only do this after this one is merged, but you can preview it here.)

Thank you for your patience.

@Empty2k12
Collaborator

Thank you again for this contribution! Will merge now!

@Empty2k12 merged commit 2d648f3 into influxdb-rs:master on Sep 22, 2020
@SafariMonkey
Contributor Author

As promised, I posted the GROUP BY support PR: #69.

@cryptoquick

cryptoquick commented Sep 23, 2020

BTW, we're using this code in our production branch and it works perfectly. Great work! ❤️
