
Maximum size check? #249

Open
nh2 opened this issue Oct 23, 2018 · 8 comments

Comments

@nh2
Contributor

nh2 commented Oct 23, 2018

The protobuf C++ library has a limit to 2 GB because sizes inside are int (signed 32-bit integers).

If you serialise into a proto with more than that, you get e.g.:

[libprotobuf ERROR google/protobuf/message_lite.cc:289] Exceeded maximum protobuf size of 2GB: 2229450891

Does proto-lens implement such a check as well?

I have a suspicion that I sent a C++ side proto parser into an infinite memory allocation by giving it a proto larger than that created by proto-lens.

@blackgnezdo
Collaborator

Do you happen to have a self-contained example demonstrating the problem? I'd be surprised if C++ protocol buffers had a check only on the serialization side and not on the parsing side, as you seem to suggest.

As a side note, I'm surprised you managed to create a message over 2 GB in size, considering proto-lens is currently not as fast as we want it to be.

@nh2
Contributor Author

nh2 commented Oct 26, 2018

> I'd be surprised if C++ protocol buffers only had a check on the serialization and not parsing side which you seem to suggest.

But the issue is more intricate:

The protobuf API only accepts `int`s as buffer lengths (this seems like quite a bad decision; `size_t` has been the recommended type for buffer sizes for decades):

`ParseFromArray(const void* data, int size)`

This means that the parsing code cannot check for the buffer being larger than 2 GB, because the maximum representable `int` is exactly that 2 GB limit.

The failure scenario is that you serialise something using proto-lens to a ByteString, and then use `useAsCStringLen` to pass that buffer and length into `ParseFromArray()`.

(Note that I've also filed protocolbuffers/protobuf#5285 but that would only be a heuristic.)

The protobuf library protects against this apparent misdesign of using `int` for lengths by ensuring that a 2 GB proto cannot even be created.
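A minimal sketch of that encoder-side guard (the function name here is hypothetical, not the library's actual API): reject any encoded size above `INT_MAX` before a buffer is ever produced, so that an int-sized parsing API can never receive an unrepresentable length.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical guard mirroring the behaviour described above: refuse to
// serialize any message whose encoded size exceeds INT_MAX bytes.
void checkSerializedSize(size_t byteSize) {
  if (byteSize > static_cast<size_t>(INT_MAX)) {
    throw std::length_error("Exceeded maximum protobuf size of 2GB");
  }
}
```

An equivalent check could in principle be applied to the ByteString that proto-lens produces, before the bytes ever cross the FFI boundary.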

@blackgnezdo
Collaborator

It feels artificial to impose such a limit on the encoder side. Did you happen to check whether proto-lens is interoperable with itself for such big messages?

On a more pragmatic note, while being correct for such messages is a good idea, I suspect proto-lens can't even be made to perform well in such cases.

@nh2
Contributor Author

nh2 commented Nov 5, 2018

> It feels artificial to impose such a limit on the encoder side.

It feels nonsensical to have that kind of limit at all; size_t isn't that new of a concept. But the main C++ protobuf library does it.

> Did you happen to have checked if proto-lens is inter-operable with itself for such big messages?

Not yet -- I didn't think of that, because my main goal was to interop between Haskell and other languages.

I can't tell whether the 2GB limit is a design choice for protobufs as a protocol, or an implementation detail of https://github.com/protocolbuffers/protobuf.

I like the idea of this being just an implementation detail, with the Haskell implementation not suffering from the limit. But I'm not sure how valuable that is if the other 7 languages provided by https://github.com/protocolbuffers/protobuf also suffer from that limitation. (Note I haven't checked those, only the C++ one; but I have checked that the same restriction indeed exists in Go's implementation, which makes me think that this is pervasive.)

> I suspect proto-lens can't even be made to perform well for such cases.

It worked reasonably well for me, at least on the serialisation side. Is there a reason to suspect that it should somehow get non-linearly slower as messages get bigger?

@judah
Collaborator

judah commented Nov 6, 2018

A side comment: Are you sure that the restriction still exists in Go? I couldn't find an actual use of "ErrorTooLarge" in that repo: https://github.com/golang/protobuf/search?utf8=%E2%9C%93&q=ErrTooLarge&type=

@nh2
Contributor Author

nh2 commented Nov 6, 2018

> I couldn't find an actual use of "ErrorTooLarge" in that repo: https://github.com/golang/protobuf/search?utf8=%E2%9C%93&q=ErrTooLarge&type=

When I click that link, it finds me the code I linked above:

[Screenshot of GitHub search results, dated 2018-11-06]

@nh2
Contributor Author

nh2 commented Nov 6, 2018

Ah sorry, I misunderstood. You said use. I thought Github search was flaky again.

@nh2
Contributor Author

nh2 commented Nov 6, 2018

Looks like the usage was removed in this "code drop" style commit.

Due to lack of commit messages, design docs or issues describing the changes, I cannot conclude whether this was an intentional lift of the restriction or an omission.
