-
I have been aware of NDS for a while, but not of their open source projects, so I am pleasantly surprised. First of all, I am not here to troll or be a zealot; I would rather ask a question and get informed. It seems that zserio serializes similarly to protobuf and works like any traditional database, as a row store, and therefore does not take into account the ability to compress data by looking at it column by column. As an example:
Would become:
Is there a technical reason to distribute data per row, as opposed to per type? Maybe the answer could be added to the FAQ.
Replies: 3 comments
-
Thanks a lot for your interest in zserio and for the question. To answer it, we first need to clarify the meaning of 'row'. Do you mean that if you put
and use zserio to serialize it, you will get binary data distributed per row? Meaning binary data will start with Or do you mean SQL tables?
-
A column-store concept is certainly appealing, since it has a couple of benefits, as you already mentioned: compression, read performance for fixed-width arrays, and others.
One of the advertised benefits of zserio is its "zero serialization overhead", which means that we do not impose a wire format. Such a wire-format description would be needed in order to write the schema in a readable, struct-like form but store it optimally in columns or other structures.
zserio gives you the opportunity to write your own schema however you like. So you can simply convert your struct Employee (which you would later store in an array) into a column-store like
So zserio actually allows both design paradigms: column store and row store. It is basically up to you to implement them. Of course, your application will have to deal with a little overhead in the column-store case, since there will be no generated class Employee in the end and you will have to build it on your own. But I agree that we may want to update the FAQ in that respect in the future.
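To make the two paradigms concrete, here is a minimal Python sketch of the same records written once per row and once per column. It is only an illustration under assumptions: the `Employee` fields (`age`, `salary`), the byte widths, and the helper names are hypothetical, and the `struct` packing stands in for whatever a zserio schema would generate. It also shows the overhead mentioned above: in the column layout the application has to rebuild `Employee` objects itself.

```python
import struct
from dataclasses import dataclass


@dataclass
class Employee:
    # Hypothetical fields chosen for illustration only.
    age: int     # stored as 1 unsigned byte
    salary: int  # stored as 4 unsigned bytes, little-endian


def pack_rows(employees):
    """Row store: each record's fields are written next to each other."""
    return b"".join(struct.pack("<BI", e.age, e.salary) for e in employees)


def pack_columns(employees):
    """Column store: a count, then all ages, then all salaries."""
    n = len(employees)
    ages = struct.pack(f"<{n}B", *(e.age for e in employees))
    salaries = struct.pack(f"<{n}I", *(e.salary for e in employees))
    return struct.pack("<I", n) + ages + salaries


def unpack_columns(data):
    """Rebuild Employee objects by hand from the column layout."""
    (n,) = struct.unpack_from("<I", data, 0)
    ages = struct.unpack_from(f"<{n}B", data, 4)
    salaries = struct.unpack_from(f"<{n}I", data, 4 + n)
    return [Employee(a, s) for a, s in zip(ages, salaries)]
```

Both layouts carry the same field bytes; the column variant only adds the element count in front and changes the order in which the fields appear.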
-
@mikir I hope that @fklebert has made it a bit clearer. Row-oriented file formats take a C struct as a database row and transfer it somewhat compacted, as long as the varchar case is handled well. But some properties, such as efficient random access to individual properties, are lost. If a file format were somehow column aware, for example when blocks of information are transferred, it could be much more efficient to group the data per attribute (column) as opposed to per object (row/struct). Once you want to exchange individual properties in real time, the column format falls back to a single-element column (hence: a row).
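The compression argument can be tried out with a small experiment. The following Python sketch is an assumption-laden illustration, not a benchmark: the record shape (a small department id plus a salary in a narrow band), the synthetic data, and the use of `zlib` are all hypothetical choices. It writes the same records in both layouts and compresses each, so the sizes can be compared for a given data set; which layout wins depends on the data and on the compressor.

```python
import random
import struct
import zlib


def build_layouts(n=1000, seed=0):
    """Return (row_bytes, col_bytes) for the same synthetic records."""
    rng = random.Random(seed)
    # Hypothetical records: (department id 0..3, salary 3000..3499).
    records = [(rng.randrange(4), 3000 + rng.randrange(500)) for _ in range(n)]
    # Row layout: department and salary interleaved per record.
    row_bytes = b"".join(struct.pack("<BI", d, s) for d, s in records)
    # Column layout: all departments first, then all salaries.
    col_bytes = (
        struct.pack(f"<{n}B", *(d for d, _ in records))
        + struct.pack(f"<{n}I", *(s for _, s in records))
    )
    return row_bytes, col_bytes


def compressed_sizes(n=1000, seed=0):
    """Compress both layouts with zlib and return their sizes in bytes."""
    row_bytes, col_bytes = build_layouts(n, seed)
    return len(zlib.compress(row_bytes)), len(zlib.compress(col_bytes))


if __name__ == "__main__":
    row_size, col_size = compressed_sizes()
    print(f"row layout: {row_size} bytes, column layout: {col_size} bytes")
```

Both inputs are the same length and contain the same bytes in a different order, so any difference in the printed sizes comes from the layout alone.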