Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go-ipfs gossipsub produces invalid protobuf messages #361

Closed
rklaehn opened this issue Jul 24, 2020 · 13 comments
Closed

go-ipfs gossipsub produces invalid protobuf messages #361

rklaehn opened this issue Jul 24, 2020 · 13 comments

Comments

@rklaehn
Copy link
Contributor

rklaehn commented Jul 24, 2020

I am trying a rust-gossipsub based program to work with go-ipfs. I was having some strange intermittent problems.

After 3 days I nailed it down to this: go-ipfs under some circumstances produces protobuf messages that do not conform to the protobuf format used in go-ipfs.

The gossipsub protobuf has message ids as strings since a long time. See e.g.

repeated string messageIDs = 2;

But I am getting messages that are overall valid protobuf messages, but where the messageid is not a valid utf8 byte string.

$ echo 1ac9010ac6010a142f6b6c6b2f70726f642f323032302d30312d3039122a1220d024d0c98917394729c109ddf8d94c98d36ff3899b7f3f6d148b8e6baf084dc316236b1c66ea8a8e122a12202fe480d71af77f741f0a3abf054e30c5d414a8b61890d612c075f0a82a3e682e1610f7ab52a56bb2122a12200b04920eca11fe080538fd2e165d7e9547b90ee42879e36a267f37dbaa0beee01623bbb58618d1f8122a12208a68f5e8b6a57ef6515d1a9deebb060766edf4dd71f925a37eafa2fadc125ac616235bff420e07ec  | xxd -r -p | protoc --decode_raw
3 {
  1 {
    1: "/klk/prod/2020-01-09"
    2: "\022 \320$\320\311\211\0279G)\301\t\335\370\331L\230\323o\363\211\233\177?m\024\213\216k\257\010M\303\026#k\034f\352\212\216"

Here are the messages from an instrumented version of the rust gossipsub:

Jul 24 14:26:05.599  WARN libp2p_gossipsub::protocol: message id not an utf8 string: 12202fe480d71af77f741f0a3abf054e30c5d414a8b61890d612c075f0a82a3e682e1610f7ab52a57069    
Jul 24 14:26:05.599  WARN libp2p_gossipsub::protocol: message id not an utf8 string: 1220d024d0c98917394729c109ddf8d94c98d36ff3899b7f3f6d148b8e6baf084dc316236b1c66ea8e82    
Jul 24 14:26:05.599  WARN libp2p_gossipsub::protocol: message id not an utf8 string: 12208a68f5e8b6a57ef6515d1a9deebb060766edf4dd71f925a37eafa2fadc125ac616235bff420e08f9    
Jul 24 14:26:05.599  WARN libp2p_gossipsub::protocol: message id not an utf8 string: 12200b04920eca11fe080538fd2e165d7e9547b90ee42879e36a267f37dbaa0beee01623bbb58618d5f2

For some reason the frequency of these corrupt messages depends on the exact way the node was connected to. See discussion in libp2p/rust-libp2p#1671 (comment)

The node in question is a go-ipfs 0.4.22 node that is in production. We are currently migrating these nodes to 0.6.0, but I want to make sure the issue is no longer present in 0.6.0.

@rklaehn
Copy link
Contributor Author

rklaehn commented Jul 24, 2020

For reference, here is the gossipsub proto used in rust-gossipsub master. I can get the whole thing to work by changing messageid to bytes, but I would rather not...

https://github.com/libp2p/rust-libp2p/blob/master/protocols/gossipsub/src/rpc.proto

@vyzo
Copy link
Collaborator

vyzo commented Jul 24, 2020

The message ID is not really expected to be a UTF8 string; the type in the protobuf should probably bytes instead.

@vyzo
Copy link
Collaborator

vyzo commented Jul 24, 2020

cc @raulk -- we might have to change the protobuf definitions for this, as some implementations expect strings to be utf8.

@rklaehn
Copy link
Contributor Author

rklaehn commented Jul 24, 2020

The id is just an opaque thing to identify a message. It is randomly chosen (does not seem to be sequential from what I have seen). I see no good reason to constrain it to utf8.

So maybe just change the .proto to bytes? It seems to be the right thing to do disregarding issues of backwards compatibility...

@rklaehn
Copy link
Contributor Author

rklaehn commented Jul 24, 2020

If we change the proto so the message id is bytes, so accept bytes but only emit utf8 strings for some time, this change should be backwards compatible...

@vyzo
Copy link
Collaborator

vyzo commented Jul 24, 2020

bytes and strings have the same wire representation, there is no compatibility breakage by changing the protobuf type. Note that the message IDs were never valid UTF8 strings to begin with.

@rklaehn
Copy link
Contributor Author

rklaehn commented Jul 24, 2020

Well, you basically promised for 2 years that the message ids are strings by publishing the .proto . Or does it say that these are arbitrary blobs in the specs?

In any case, what people typically do when implementing a protocol is going to the repo of the most mature impl (in this case go-ipfs) and grabbing the .proto files from there.

But I agree that changing this is the best way forward, since the old .protos are a lie and there are plenty of non-utf8 message ids in in the wild...

@vyzo
Copy link
Collaborator

vyzo commented Jul 24, 2020

The promise was wrong, I didn't think it through :)

@rklaehn
Copy link
Contributor Author

rklaehn commented Jul 24, 2020

Well, shit happens.

I am still a bit shocked though that the default golang protobuf lib allows you to emit non-utf8 (so not protobuf spec compliant) byte sequences where it says string in the .proto...

@AgeManning
Copy link

This is interesting. The spec had the protobuf as strings. For content addressing we were using base64 encoding to ensure utf-8 compatibility.

I guess bytes makes more sense and we can relax the encoding.

@rklaehn
Copy link
Contributor Author

rklaehn commented Jul 25, 2020

Yeah, the spec was wrong. But on the positive side, I like bytes much better here. Having to base64 encode obvious things like hashes is not nice.

daviddias pushed a commit that referenced this issue Jul 27, 2020
@aschmahmann
Copy link
Contributor

@vyzo is this issue closed?

@vyzo
Copy link
Collaborator

vyzo commented Jul 31, 2020

yes, I think we can close.

@vyzo vyzo closed this as completed Jul 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants