
Maximum size check? #249

Open
nh2 opened this issue Oct 23, 2018 · 8 comments

Comments

@nh2
Contributor

nh2 commented Oct 23, 2018

The protobuf C++ library has a limit to 2 GB because sizes inside are int (signed 32-bit integers).

If you serialise into a proto with more than that, you get e.g.:

[libprotobuf ERROR google/protobuf/message_lite.cc:289] Exceeded maximum protobuf size of 2GB: 2229450891

Does proto-lens implement such a check as well?

I have a suspicion that I sent a C++ side proto parser into an infinite memory allocation by giving it a proto larger than that created by proto-lens.

@blackgnezdo
Collaborator

Do you happen to have a self-contained example demonstrating the problem? I'd be surprised if C++ protocol buffers had a check only on the serialization side and not on the parsing side, as you seem to suggest.

As a side note, I'm surprised you managed to create a message over 2 GB in size, considering proto-lens is currently not as fast as we want it to be.

@nh2
Contributor Author

nh2 commented Oct 26, 2018

> I'd be surprised if C++ protocol buffers only had a check on the serialization and not parsing side which you seem to suggest.

But the issue is more intricate:

The protobuf API only accepts `int`s as buffer lengths (this seems like quite a bad decision; `size_t` has been the recommended type for buffer sizes for decades):

`ParseFromArray(const void* data, int size)`

This means that the parsing code cannot check for the buffer being larger than 2 GB, because the maximum representable `int` is exactly that 2 GB limit.

The failure scenario is that you serialise something using proto-lens to a ByteString, and then use `useAsCStringLen` to pass that buffer and length into `ParseFromArray()`.

(Note that I've also filed protocolbuffers/protobuf#5285 but that would only be a heuristic.)

The protobuf library protects against this apparent misdesign of using `int` for lengths by ensuring that a 2 GB proto cannot even be created.
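A minimal sketch of that encoder-side guard (the function name here is hypothetical, not the library's actual API): reject any encoded size above `INT_MAX` before a buffer is ever produced, so that an int-sized parsing API can never receive an unrepresentable length.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical guard mirroring the behaviour described above: refuse to
// serialize any message whose encoded size exceeds INT_MAX bytes.
void checkSerializedSize(size_t byteSize) {
  if (byteSize > static_cast<size_t>(INT_MAX)) {
    throw std::length_error("Exceeded maximum protobuf size of 2GB");
  }
}
```

An equivalent check could in principle be applied to the ByteString that proto-lens produces, before the bytes ever cross the FFI boundary.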

@blackgnezdo
Collaborator

It feels artificial to impose such a limit on the encoder side. Did you happen to check whether proto-lens is interoperable with itself for such big messages?

On a more pragmatic note, while being correct for such messages is a good idea, I suspect proto-lens can't even be made to perform well in such cases.

@nh2
Contributor Author

nh2 commented Nov 5, 2018

> It feels artificial to impose such a limit on the encoder side.

It feels nonsensical to have that kind of limit at all; size_t isn't that new of a concept. But the main C++ protobuf library does it.

> Did you happen to have checked if proto-lens is inter-operable with itself for such big messages?

Not yet -- I didn't think of that, because my main goal was to interop between Haskell and other languages.

I can't tell whether the 2GB limit is a design choice for protobufs as a protocol, or an implementation detail of https://github.com/protocolbuffers/protobuf.

I like the idea of this being just an implementation detail, with the Haskell implementation not suffering from the limit. But I'm not sure how valuable that is if the other 7 languages provided by https://github.com/protocolbuffers/protobuf also suffer from that limitation. (Note I haven't checked those, only the C++ one; but I have checked that the same restriction indeed exists in Go's implementation, which makes me think that this is pervasive.)

> I suspect proto-lens can't even be made to perform well for such cases.

It worked reasonably well for me, at least on the serialisation side. Is there a reason to suspect that it should somehow get non-linearly slower as messages get bigger?

@judah
Collaborator

judah commented Nov 6, 2018

A side comment: Are you sure that the restriction still exists in Go? I couldn't find an actual use of "ErrorTooLarge" in that repo: https://github.com/golang/protobuf/search?utf8=%E2%9C%93&q=ErrTooLarge&type=

@nh2
Contributor Author

nh2 commented Nov 6, 2018

> I couldn't find an actual use of "ErrorTooLarge" in that repo: https://github.com/golang/protobuf/search?utf8=%E2%9C%93&q=ErrTooLarge&type=

When I click that link, it finds me the code I linked above:

[Screenshot of GitHub search results, dated 2018-11-06]

@nh2
Contributor Author

nh2 commented Nov 6, 2018

Ah sorry, I misunderstood. You said use. I thought Github search was flaky again.

@nh2
Contributor Author

nh2 commented Nov 6, 2018

Looks like the usage was removed in this "code drop" style commit.

Due to lack of commit messages, design docs or issues describing the changes, I cannot conclude whether this was an intentional lift of the restriction or an omission.
