
The fundamental difference between PB and FB is what? #4372

Closed

rice668 opened this issue Jun 30, 2017 · 10 comments

@rice668

rice668 commented Jun 30, 2017

I recently refactored a project of mine (written in Scala/Java) to use FB. After the refactoring I wanted to test the performance of both, and I got much better performance, though I won't quantify the specific figures here. My question is about the fundamental difference between the two libraries. My understanding is as follows; please tell me whether it is correct or incorrect.

FB's performance is strong because it is a memory-based serialization framework, while PB is not: with PB we must call object.build().writeTo(output) to encode, and parseFrom on a byte array or an InputStream to decode. PB has no API like FB's fbb.dataBuffer(). When FB calls fbb.dataBuffer(), it serializes the object into memory as a ByteBuffer; afterwards, FB calls object.getRootAsXXXX(byteBuffer) to deserialize from that ByteBuffer. PB, on the other hand, must call parseFrom to deserialize. I think that is the key reason the two have such a large performance difference: PB is not memory-based and needs to deserialize objects from an InputStream, while for FB it is enough to deserialize from a ByteBuffer. Do I understand this right?
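Concretely, the two call sequences I am comparing look like this (a sketch with my MarsAds types; building the actual fields is elided):

// Protobuf: explicit encode and decode steps
builder.build().writeTo(outputStream)               // encode
val pbAds = MarsAdsProtos.MarsAds.parseFrom(input)  // decode

// FlatBuffers
fbb.finish(rootOffset)
val buf = fbb.dataBuffer()                 // get the serialized ByteBuffer
val fbAds = MarsAds.getRootAsMarsAds(buf)  // get the root object back from it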

Another question I would like answered: does FB support an API that writes its data to an output stream? I think it does not, currently. Why not? Just because it is memory-based?

Other than that, the time I measured for FB includes writing the ByteBuffer to an output stream; the test code is below. Because my timing includes the write to the output stream, I think it is incorrect in a way, and I am not entirely sure what I actually tested here. Perhaps, since FB is memory-based, it would be enough to measure only how long it takes to get a ByteBuffer from the fbb.dataBuffer() method, without including the time to write that ByteBuffer to an OutputStream. For PB, though, I do need to include writeTo.

Anyway, even using the following code for the serialization test, I still get good performance. The measurement may be flawed in some way, but it still shows the power of FB.

// Encode timing, also called packing.
// [ This way of measuring may be wrong; I am not sure. ]
// Since FB has no writeTo method like PB does,
// I write the buffer to an output stream myself.

val startEncodeTime = System.currentTimeMillis()
fileOutputStream.write(byteBuffer.array())
val executionTime = System.currentTimeMillis() - startEncodeTime
totalEncodeTime = totalEncodeTime + executionTime
println("Total encode time = " + totalEncodeTime)

Below is the FB decode time calculation.

val startDecodeTime = System.currentTimeMillis()
val marsAds = MarsAds.getRootAsMarsAds(byteBuffer)
val executionTime = System.currentTimeMillis() - startDecodeTime
totalDecodeTime = totalDecodeTime + executionTime
println("Totally decode time is = " + totalDecodeTime)

Timing the PB test is easier: for serialization I just measure how long the writeTo method takes, and for deserialization I just measure how long parseFrom takes.

The PB performance testing code looks like the following.

// Encode timing: the serialization happens inside writeTo.
val startEncodeTime = System.currentTimeMillis()
builder.build().writeTo(fileOutputStream)
val encodeTime = System.currentTimeMillis() - startEncodeTime
totalEncodeTime = totalEncodeTime + encodeTime

// Decode timing, also called unpacking.
val startDecodeTime = System.currentTimeMillis()
val marsAds = MarsAdsProtos.MarsAds.parseFrom(fileInputStream)
val executionTime = System.currentTimeMillis() - startDecodeTime
totalDecodeTime = totalDecodeTime + executionTime
println("Total decode time = " + totalDecodeTime)

@aardappel @rw It would be great if you could both take a look at my thinking about these two awesome libraries, FB and PB. I would really appreciate it, thanks! And if I am wrong, please help me out here.

@aardappel
Collaborator

fbb.dataBuffer() does not serialize, it merely gives you access to the already serialized data. FlatBuffers serializes on the fly. Similarly, object.getRootAsXXXX(byteBuffer) does not de-serialize. It merely gets you a friendly pointer into the existing buffer.

The key difference is that FlatBuffers doesn't de-serialize at all, ever. It allows you to access the serialized data in-place.

FlatBuffers only supports writing to a ByteBuffer, yes.
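If you need the bytes on a stream you can just write them out yourself. A minimal sketch (assuming a builder that finish() has already been called on; sizedByteArray() copies out only the occupied region of the internal buffer):

import java.io.OutputStream
import com.google.flatbuffers.FlatBufferBuilder

def writeBuffer(fbb: FlatBufferBuilder, out: OutputStream): Unit = {
  // The builder fills its buffer back to front, so array() alone would also
  // include the unused leading bytes; sizedByteArray() copies just the data.
  out.write(fbb.sizedByteArray())
}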

Your timing doesn't test anything; you're testing the time it takes to write to the stream. You should instead test the time from when you create the FlatBufferBuilder until you call finish on it; that is what does all the serialization work.

You can't time de-serialization because it is always 0. Instead, to compare against Protobuf, you could compare the time it takes to de-serialize protobuf and access all its fields, against FlatBuffers accessing all fields.
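Roughly like this (a sketch against your MarsAds types; building the actual fields is elided, and rootOffset is whatever your builder code returns):

// Serialization: time everything from creating the builder until finish(),
// because that is where the encoding work actually happens.
val startEncode = System.currentTimeMillis()
val fbb = new FlatBufferBuilder()
// ... createString / MarsAds.startMarsAds / add* calls to build the object ...
fbb.finish(rootOffset)
val encodeMillis = System.currentTimeMillis() - startEncode

// "Deserialization": getRootAsMarsAds is effectively free, so compare
// parseFrom + reading every field on the PB side against
// getRootAsMarsAds + reading every field on the FlatBuffers side.
val marsAds = MarsAds.getRootAsMarsAds(fbb.dataBuffer())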

@rice668
Author

rice668 commented Jul 1, 2017

Hmm, that is why timing the deserialization of FlatBuffers in my results basically always came out at almost 0 milliseconds. That makes much more sense to me now. As for the code above, I agree with what you said; it is literally wrong. The reason I did it that way is that I wanted to know how much disk space FlatBuffers occupies compared to Protobuf. The conclusion is that FlatBuffers' disk cost is considerably larger than PB's, so it looks like a trade of space for time.

Here is a set of test results for the disk space used when serializing a text file with FlatBuffers and with Protobuf.

Text file from Kafka: 36,879 KB (input data)

With PB, bytes written to the output stream: 17,878 KB (wire format)

With FlatBuffers, bytes written to the output stream: 38,589 KB (wire format; larger than the 36,879 KB input text file)

I will test accessing 1 field, 2 fields, 14 fields, and all fields (27 fields in total per object) under different cases.

Last but not least, it would be great if FlatBuffers could provide a compression API, so that compressed data could then be sent over the network. What do you think? Flink, for example, is on its way to using compression (e.g. Snappy) for full checkpoints/savepoints: https://issues.apache.org/jira/browse/FLINK-6773

@rice668
Author

rice668 commented Jul 1, 2017

FYI, here is a test report from my recent work, as a reference. The input data comes from Kafka production environments. I have two data sets, of 36,879 KB and 145,157 KB respectively. I parsed the input data into 27 fields in total; below are the results for the different cases of accessing 1, 2, 14, and 27 fields. Times are in milliseconds.

[benchmark result screenshots attached]

@aardappel
Collaborator

You can use any existing compressor on a FlatBuffer, which will give some savings, but it is not optimal because offsets are relatively random and thus not very compressible. A special-purpose compression scheme could be invented, but I don't think anyone has tried that yet. FlatBuffers was deliberately designed for speed of access over size.

@rw
Collaborator

rw commented Aug 7, 2017

One option is to use an encoder like zstd, which supports custom model "dictionaries", on individual values before writing them to a FlatBuffer. That way you minimize overhead.
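For example (a rough sketch using the zstd-jni bindings; adText and the surrounding builder are placeholders, and for many small values you would pass a trained dictionary rather than using the plain one-shot call):

import com.github.luben.zstd.Zstd

// Compress one large value before storing it as a byte vector in the table.
val raw: Array[Byte] = adText.getBytes("UTF-8")
val packed: Array[Byte] = Zstd.compress(raw, 3)  // compression level 3
val vecOffset = fbb.createByteVector(packed)
// The read side needs the original size, so store raw.length alongside it:
val restored: Array[Byte] = Zstd.decompress(packed, raw.length)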

@rw rw closed this as completed Jul 26, 2018
@binary132

@zhangminglei, can you publish your benchmark code and data?

@binary132

@aardappel one challenge I've gotten when suggesting the use of FlatBuffers is that it's not optimized for size, and therefore (since I/O, not CPU, typically dominates resources) "why would we want to do that?" I have to admit I'm not sure how to approach that. Is there a way to tune it for size?

@aardappel
Collaborator

@binary132 if wire-format size is your concern above all else, then indeed I would not choose FlatBuffers. You can run a compressor on top of FlatBuffers, but since offsets don't compress well, this doesn't gain as much as you'd get from using, say, Protobuf.

I guess FlatBuffers was designed for use cases where memory usage and speed matter, such as games loading lots of data, or high-performance RPC between services in a data center (where CPU can often be a bigger bottleneck than the network!).

Other than using a good compressor, I'm not sure what to do if you have already bought into FlatBuffers and want it to be smaller. I could imagine a special-purpose transform that would make FlatBuffers more compressible, but I'm not sure it would be worth it.

@no-more-secrets
Contributor

I did some investigation and found that the JSON representation of a FlatBuffer compresses much better than the binary version. For example, take two serialized FlatBuffers that contain identical contents, one in JSON and the other in the standard FlatBuffers binary format:

$ ls
savegame.json  savegame.bin
$ wc -c *
 881167 savegame.json
 179216 savegame.bin
1060383 total
$ cat savegame.bin | gzip -9 | wc -c
53737
$ cat savegame.json | gzip -9 | wc -c
22415   # <== !!

So if deserialization speed is not important to you, but you want to minimize space, it may be beneficial to serialize as JSON and compress that.
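That last step is plain gzip; a sketch (savegameJson stands in for however the JSON text was produced, e.g. via flatc):

import java.io.FileOutputStream
import java.util.zip.GZIPOutputStream

// Write the JSON text through a gzip stream, like `gzip` in the shell above.
val out = new GZIPOutputStream(new FileOutputStream("savegame.json.gz"))
out.write(savegameJson.getBytes("UTF-8"))
out.close()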

@aardappel
Collaborator

@dpacbach
Yes, FlatBuffers is not very compressible, because the offsets it contains are fairly random. It depends on the contents, though; I presume your data contains lots of strings? For data full of random-looking numbers, FlatBuffers would likely produce the smaller end result. To get a ~40x compression ratio on JSON you likely have a lot of very redundant data that could be better represented in some other way.

Also note how big the uncompressed JSON is: with this path you are going to decompress into 5x more memory, then run a JSON parser over all of it (which is slow, and likely allocates yet more copies of it all). That is a big price to pay in efficiency when FlatBuffers can be accessed as-is.
