Pluggable serialization #151

Open
mpenet opened this issue Feb 22, 2019 · 4 comments

@mpenet

mpenet commented Feb 22, 2019

[Copying my question here and expanding it a bit; I won't copy your answer over.]

Do you think it would be worthwhile to allow pluggable serialization? Nippy is sometimes good enough, but users willing to get their hands dirty could get map-key caching, varints, and other such features from other formats. It could also matter for interop with other languages (at a low level).

I am mostly thinking about protobuf: as much as I dislike it, it can be quite efficient and it's widely used. I am not advocating for protobuf specifically, but I think allowing control over the serialization format is very important, even at a low level, since it can have a real impact on storage/bandwidth costs.
It would then be the user's responsibility to know and use the correct, potentially custom, "codec" against a cluster.
It could be a pluggable bit, behind protocols like other parts of Crux.

@hraberg hraberg added this to To do in XTDB Development via automation Feb 22, 2019
@hraberg hraberg added this to the Post Phase 2 Release milestone Feb 22, 2019
@hraberg
Contributor

hraberg commented Feb 23, 2019

In Crux one can already configure Kafka to control what goes on the wire, as one can provide an extra Kafka properties file as a Crux option. This is also meant to enable various authentication and other Kafka options (including serialisation) which we cannot foresee. It's not necessarily elegant, but it is doable. This obviously needs to be configured consistently across all participating systems; there's no central authority.
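
A minimal sketch of what that looks like, assuming a hypothetical option key and startup call (the exact Crux API may differ): every setting in the properties file is passed straight through to the underlying Kafka clients, so serializers, authentication, and the like can be set there.

```clojure
;; Hypothetical option names, for illustration only; not necessarily
;; the exact Crux API. Everything in kafka.properties is handed to the
;; Kafka producer/consumer as-is.
(crux/start-system
  {:bootstrap-servers     "localhost:9092"
   :kafka-properties-file "kafka.properties"})
```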

By default, Crux sends data over Kafka as Nippy-serialized messages and has Kafka apply Snappy compression on top. Kafka supports end-to-end compression; see: https://kafka.apache.org/documentation/#design_compression
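
For reference, the compression knob is standard Kafka producer configuration, shown here as the equivalent Clojure map of producer properties ("compression.type" is real Kafka config; the surrounding usage is illustrative):

```clojure
;; "compression.type" is a standard Kafka producer property; Snappy is
;; what Crux picks by default, per the comment above.
{"compression.type" "snappy"}  ; alternatives include "gzip", "lz4", "none"
```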

The idea (or hope) is that these defaults together should be "good enough" for most use cases. This is partly to avoid confusion and the chance of configuring Crux inconsistently, as mentioned above.

A case can also be made for the opposite: that the Kafka topics should contain raw edn (or, even more generally, JSON), as this makes the data easier to consume and deal with without relying on Crux or specific libraries. Kafka could still compress this, and it can also already be configured. We should add a default edn implementation of the Kafka serdes to support this more easily.
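
For illustration, such an edn serdes could be a thin wrapper over pr-str/read-string. A minimal sketch, assuming UTF-8 and plain edn data; the namespace and type names are hypothetical, not Crux's actual implementation:

```clojure
(ns example.edn-serde
  (:require [clojure.edn :as edn])
  (:import [org.apache.kafka.common.serialization Serializer Deserializer]))

;; Serializes any edn-printable value to UTF-8 bytes.
(deftype EdnSerializer []
  Serializer
  (configure [_ _configs _is-key])
  (serialize [_ _topic data]
    (when data
      (.getBytes (pr-str data) "UTF-8")))
  (close [_]))

;; Reads the bytes back as edn data.
(deftype EdnDeserializer []
  Deserializer
  (configure [_ _configs _is-key])
  (deserialize [_ _topic data]
    (when data
      (edn/read-string (String. ^bytes data "UTF-8"))))
  (close [_]))
```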

It's worth pointing out that Crux also stores the documents locally in the KV store in Nippy format, and this is currently not configurable. The content hashes Crux uses for the documents are also based on this format, so Nippy indirectly touches many things.
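
To make the coupling concrete, here is a sketch of content hashing over the serialized bytes; the digest algorithm and exact scheme are assumptions, not necessarily what Crux does internally:

```clojure
(require '[taoensso.nippy :as nippy])
(import 'java.security.MessageDigest)

;; The hash is computed over the *serialized* representation, so swapping
;; Nippy for another codec would change every stored content hash.
(defn content-hash [doc]
  (.digest (MessageDigest/getInstance "SHA-1")
           ^bytes (nippy/freeze doc)))
```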

Just a few reflections and thoughts; not sure they advance or solve the issue directly, though. Keeping it open for further discussion.

hraberg added a commit that referenced this issue Feb 23, 2019
…t's easy to configure it without needing other dependencies. #151.
@hraberg
Contributor

hraberg commented Feb 23, 2019

Another reflection: something like protocol buffers could be added to Crux's transaction topic, but not easily to the document topic, as the messages there are simply maps without a schema. A user could of course conceivably allow only a very strict, schema-described set of documents to be transacted into Crux, and have protocol buffers support that.
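
Illustrative data shapes only (approximated from this discussion, not the exact Crux wire format): the transaction topic carries a small set of fixed-shape operations that a schema could describe, whereas the document topic carries arbitrary user maps.

```clojure
;; A tx-op has a fixed, schematizable shape (approximate):
(def tx-op [:crux.tx/put :some-eid "content-hash-of-doc"])

;; A document is an arbitrary map; there is no fixed schema to describe:
(def doc {:crux.db/id :some-eid
          :name       "Ada"
          :roles      #{:admin :author}})
```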

But it points to the fact that the transaction topic (which is much smaller) and the document topic would potentially have to be treated differently if one goes down this path.

hraberg added a commit that referenced this issue Feb 23, 2019
@hraberg hraberg modified the milestone: Post Phase 2 Release Apr 23, 2019
@refset refset moved this from To do to Public Backlog in XTDB Development May 14, 2019
@hraberg hraberg removed this from the Beta milestone Jun 12, 2019
@gklijs

gklijs commented Jun 17, 2019

Maybe something on top of Crux would be more sensible/usable. It would work something like: if keys x, y, z are in the map, also store it in Kafka topic q using serializer f. That way you'd have some categories of data in sync, easily usable both from Crux and from anything else using Kafka. A rough sketch of the idea:
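
(A minimal sketch of that layer, with all names hypothetical: the route shape, the helper, and the assumption that the producer is configured for byte-array values.)

```clojure
(ns example.doc-router
  (:import [org.apache.kafka.clients.producer Producer ProducerRecord]))

;; A route: if a document contains all of required-keys, mirror it to
;; topic using serialize (a fn from document to bytes).
(defn mirror-doc!
  [^Producer producer doc routes]
  (doseq [{:keys [required-keys topic serialize]} routes
          :when (every? #(contains? doc %) required-keys)]
    (.send producer (ProducerRecord. topic ^bytes (serialize doc)))))
```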

@caioaao

caioaao commented Jan 13, 2020

Just my 2c on some of this. First, protobufs are known to be a lot slower than alternative binary serialization protocols like flatbuffers. Second, I don't know how useful/feasible pluggable serialization would be; I feel it should be coupled to the tx-log backend choice to be useful as a general pluggable part (serialization for crux-jdbc should be different than for crux-kafka or crux-rocksdb, for instance). Disclaimer: I am not familiar with Crux at all, but these things came to mind when reading this thread.

EDIT: flatbuffers benchmark (on c++): https://google.github.io/flatbuffers/flatbuffers_benchmarks.html
