Pluggable serialization #151
In Crux one can already configure Kafka to control what goes on the wire, since one can provide an extra Kafka properties file as a Crux option. This is also meant to enable various authentication and other Kafka options (including serialisation) which we cannot foresee. It's not necessarily elegant, but doable. This obviously needs to be configured consistently throughout all participating systems; there's no central authority.

By default, Crux sends data over Kafka using Nippy in the messages, and has Kafka apply Snappy compression on top. Kafka supports end-to-end compression, see: https://kafka.apache.org/documentation/#design_compression The idea (or hope) is that these things together should be a "good enough" default for most use cases. This is partly to avoid the confusion and the chance of configuring Crux in an inconsistent way, as mentioned above.

A case can also be made for the opposite: that the Kafka topics should contain raw edn (or, even more generally, JSON), as this makes the data easier to consume and deal with without relying on Crux or specific libraries. Kafka could still compress this, and this can also already be configured. We should add a default edn implementation of the Kafka serdes to support this more easily.

It's worth pointing out that Crux also stores the documents locally in the KV store in Nippy format, and this is currently not configurable. The content hashes Crux uses for the documents are also based on this format, so Nippy indirectly touches many things.

Just a few reflections and thoughts; not sure if this advances or solves the issue directly. Keeping it open for further discussion.
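To illustrate why the serialization format "indirectly touches many things": if content hashes are computed over the serialized bytes, the same document hashes differently under different codecs. A stand-alone sketch in Python, with JSON and pickle standing in for edn and Nippy; the hashing scheme here is illustrative, not Crux's actual one:

```python
import hashlib
import json
import pickle

doc = {"crux.db/id": "person-1", "name": "Ada"}

# Hash over the serialized bytes -- the "content hash" depends on the codec.
json_hash = hashlib.sha1(json.dumps(doc, sort_keys=True).encode()).hexdigest()
pickle_hash = hashlib.sha1(pickle.dumps(doc)).hexdigest()

# The two codecs produce different bytes, hence different hashes,
# so switching codecs would change every document's identity.
print(json_hash == pickle_hash)  # False
```

This is why making the document codec pluggable is more invasive than it first appears: document identities are downstream of the byte format.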
…t's easy to configure it without needing other dependencies. #151.
Another reflection: something like protocol buffers could be added to Crux's transaction topic, but not easily to the document topic, as the messages there are simply maps without a schema. A user could of course conceivably have a very strict set of documents, with schemas, that they allow to be transacted into Crux, and have protocol buffers supporting that. But it points to the fact that the transaction topic (which is much smaller) and the document topic would potentially have to be treated differently if one goes down this path.
Maybe something on top of Crux would be more sensible/usable, working kind of like: if keys x, y, z are in the map, also store it in Kafka topic q using serializer f. This way you have some categories of data in sync, and easily usable both from Crux and from anything else using Kafka?
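The routing idea above could be sketched as a small layer in front of the Kafka producer. This is just a sketch in Python; the rule table, topic names, and serializers are all hypothetical:

```python
import json

# Hypothetical routing rules: if a document contains all of these keys,
# also publish it to the given topic using the given serializer f.
ROUTES = [
    ({"x", "y", "z"}, "topic-q", lambda doc: json.dumps(doc, sort_keys=True).encode()),
]


def topics_for(doc, default_topic="crux-docs",
               default_ser=lambda d: json.dumps(d, sort_keys=True).encode()):
    """Return (topic, payload) pairs: the primary topic plus any matching mirrors."""
    out = [(default_topic, default_ser(doc))]
    for keys, topic, ser in ROUTES:
        if keys <= doc.keys():  # all required keys present in the map
            out.append((topic, ser(doc)))
    return out


# A doc containing x, y, z is mirrored to topic-q alongside the default topic.
doc = {"x": 1, "y": 2, "z": 3}
print([topic for topic, _ in topics_for(doc)])  # ['crux-docs', 'topic-q']
```

Each `(topic, payload)` pair would then be handed to an ordinary Kafka producer, keeping the mirrored categories of data consumable without Crux.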
Just my 2c on some stuff. First, I think protobufs are known to be a lot slower than alternative binary serialization protocols like flatbuffers. Second, I don't know how useful/feasible pluggable serialization would be; I feel it should be coupled with the txlog backend choice to be useful as a general pluggable part (serialization for crux-jdbc should be different than for crux-kafka or crux-rocksdb, for instance). Disclaimer: I am not familiar at all with Crux, but those came to my mind when reading here. EDIT: flatbuffers benchmark (on C++): https://google.github.io/flatbuffers/flatbuffers_benchmarks.html
[copying my question here; I expand a bit, so I will not copy your answer here]
Do you think it would be worthwhile allowing pluggable serialization? Nippy is good enough sometimes, but if users want to get their hands dirty they can get map key caching, varints, and other things like that with other formats. It might also be important for interop with other languages (at a low level).
I am mostly thinking about protobuf usage; as much as I dislike it, it can be quite efficient and it's widely used. I am not advocating using that, but I think allowing control over which serialization format is used is very important, even at a low level, since it can have a real impact on storage/bandwidth costs.
Then it's the responsibility of the user to know/use the correct, potentially custom "codec" against a cluster.
It could be a pluggable bit, behind protocols like other parts of crux.
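A pluggable codec "behind protocols" reduces, in spirit, to a byte round-trip contract. A minimal sketch in Python rather than Clojure, with JSON standing in for an edn codec; the `Codec` interface and class names are hypothetical, not Crux's actual protocols:

```python
import json
from abc import ABC, abstractmethod


class Codec(ABC):
    """Hypothetical pluggable codec: the only contract is a byte round-trip."""

    @abstractmethod
    def encode(self, value) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes): ...


class JsonCodec(Codec):
    # Stands in for an edn codec here; Nippy or protobuf codecs would plug in
    # the same way, as long as decode(encode(v)) == v for transacted values.
    def encode(self, value) -> bytes:
        return json.dumps(value, sort_keys=True).encode("utf-8")

    def decode(self, data: bytes):
        return json.loads(data.decode("utf-8"))


codec: Codec = JsonCodec()
doc = {"crux.db/id": "person-1", "name": "Ada"}
assert codec.decode(codec.encode(doc)) == doc
```

In Clojure this would naturally be a `defprotocol` with `encode`/`decode` implementations, selected by the node's topology options like other pluggable parts of Crux.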