Whats wrong with this buffer? (how to decode a protobuf buffer by hand) #55

Closed
filipednb opened this issue Sep 13, 2013 · 19 comments

@filipednb

Trying to decode:

0a 0d 08 f9 27 12 02 4f 4b 18 8a 8c 06 20 4e

with this message:

message BuyInResponse {
    enum Code {
        OK = 0;
        ERROR = 1;
        AUTH_ERROR = 2;
    }

    repeated PaymentResponseElement response = 1;
}
message PaymentResponseElement {
    optional int64 pnPaymentId = 1;
    optional string messageCode = 2;
    optional int64 balanceAfterTransaction = 3;
    optional int32 version = 4;
}

Getting this error:
Error: Illegal wire type for field Message.Field.core.comm.message.int2s.PaymentResponseElement.messageCode: 2 (0 expected)

@dcodeIO
Member

dcodeIO commented Sep 13, 2013

If you do not have the .proto file, all you can do is to reverse engineer it from the buffer, which is possible using the protobuf documentation: https://developers.google.com/protocol-buffers/docs/encoding

Note that several data types map to the same wire type (wire type 2 covers strings, bytes, embedded messages and packed repeated fields; wire type 0 covers unsigned, signed and zig-zag encoded integers), so you'll have to work out which one is right on your own. Because of this, there is no fully automated approach.

@dcodeIO
Member

dcodeIO commented Sep 13, 2013

In this exact case:

<0a 0d 08 f9 27 12 02 4f 4b 18 8a 8c 06 20 4e>

0A hex = 1 | 010 bin, which is the first tag. It is constructed from two values (separated by "|"): the lower three bits are the wire type, here 2 (=010), and the remaining upper bits are the field id, here 1 (=1).
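
The tag split can also be sketched in a few lines of plain JavaScript. This is only an illustration: the helper name `splitTag` and the `WIRE_TYPES` table are made up here, and the helper assumes the tag fits in a single byte (field id below 16), which holds for every tag in this buffer.

```javascript
// Wire type names as listed in the protobuf encoding documentation.
const WIRE_TYPES = {
  0: "varint",
  1: "64-bit",
  2: "length-delimited",
  3: "start group",
  4: "end group",
  5: "32-bit",
};

// Hypothetical helper: split a single-byte tag into field id and wire type.
// Assumes the whole tag fits in one byte (field id < 16).
function splitTag(tagByte) {
  return {
    id: tagByte >>> 3,        // upper bits: field id
    wireType: tagByte & 0x07, // lower three bits: wire type
  };
}

// The five tag bytes that occur in the buffer from this issue.
for (const b of [0x0a, 0x08, 0x12, 0x18, 0x20]) {
  const { id, wireType } = splitTag(b);
  console.log(
    b.toString(16).padStart(2, "0"),
    "-> id", id,
    "wire type", wireType,
    `(${WIRE_TYPES[wireType]})`
  );
}
```

For larger field ids the tag itself is a multi-byte varint, so it has to be decoded as a varint before the same shift and mask are applied.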

0a<0d 08 f9 27 12 02 4f 4b 18 8a 8c 06 20 4e>

wire type 2 is a length-delimited value (see the encoding documentation linked above), so

0D hex = 13 dec is the length (which is actually a single-byte varint, 0000 1101 bin, and is easy to decode by a simple binary-to-decimal conversion)

which is the rest of the data.

Assuming that this is an inner message with a length of 13 bytes we get for its contents:

0a 0d<08 f9 27 12 02 4f 4b 18 8a 8c 06 20 4e>

08 hex = 1 | 000 bin = wire type 0, id 1

wire type 0 is a varint, which is a bit difficult to calculate by hand if it is built from multiple bytes. However, we are able to determine its length:

0a 0d 08<f9 27 12 02 4f 4b 18 8a 8c 06 20 4e>

F9 hex = 1111 1001 bin with first bit set, continue
27 hex = 0010 0111 bin with first bit not set, end

You have to determine whether it is 32 or 64 bit (assuming 64 bit will always work, as that handles 32 bit values, too) and whether it is unsigned (uint*), signed (int*) or zig-zag encoded (sint*).
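
For reference, a minimal sketch of decoding such a varint in JavaScript and reading it in each of the three ways. The function and helper names are made up for illustration, and `BigInt` is used on the assumption that the value may need the full 64 bits. For `F9 27` the raw value works out to 0x79 + (0x27 << 7) = 121 + 4992 = 5113.

```javascript
// Decode one base-128 varint starting at `offset`.
// Each byte carries 7 payload bits, least significant group first;
// the high bit says whether another byte follows.
function readVarint(bytes, offset = 0) {
  let value = 0n;
  let shift = 0n;
  let pos = offset;
  while (true) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const b = bytes[pos++];
    value |= BigInt(b & 0x7f) << shift;
    if ((b & 0x80) === 0) break; // high bit clear: this was the last byte
    shift += 7n;
    if (shift > 63n) throw new RangeError("varint longer than 10 bytes");
  }
  return { value, length: pos - offset };
}

// Three ways to interpret the same raw varint value.
const asUnsigned = (v) => BigInt.asUintN(64, v); // uint32 / uint64
const asSigned = (v) => BigInt.asIntN(64, v);    // int32 / int64 (two's complement)
const asZigZag = (v) => (v >> 1n) ^ -(v & 1n);   // sint32 / sint64

const first = readVarint([0xf9, 0x27]);
console.log(first.length);            // 2
console.log(asUnsigned(first.value)); // 5113n
console.log(asSigned(first.value));   // 5113n
console.log(asZigZag(first.value));   // -2557n
```

For small positive values the unsigned and signed readings agree, so only the zig-zag interpretation gives a different answer; the .proto (or plausibility of the value) has to settle which one is meant.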

0a 0d 08 f9 27<12 02 4f 4b 18 8a 8c 06 20 4e>

12 hex = 10 | 010 bin = wire type 2, id 2

wire type 2 is, as we already know, a length delimited value, so

0a 0d 08 f9 27 12<02 4f 4b 18 8a 8c 06 20 4e>

02 hex = 2 dec is the length

0a 0d 08 f9 27 12 02<4f 4b 18 8a 8c 06 20 4e>

4F 4B hex = 0100 1111 0100 1011 bin - you have to determine what actual type this is
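
One quick check is to try the payload as UTF-8 text. The sketch below assumes a runtime with the standard `TextDecoder` (Node.js or a browser); the helper name `tryUtf8` is made up for illustration.

```javascript
const payload = Uint8Array.from([0x4f, 0x4b]);

// With `fatal: true`, TextDecoder throws on invalid UTF-8 instead of
// silently substituting U+FFFD, so a failure hints that the field is
// `bytes` or a sub-message rather than a string.
function tryUtf8(bytes) {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return null;
  }
}

console.log(tryUtf8(payload)); // prints OK
```

Here the two bytes read as the text "OK", which fits a string field. Valid UTF-8 is only a hint, though: a `bytes` field or a short sub-message can happen to be valid text as well.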

0a 0d 08 f9 27 12 02 4f 4b<18 8a 8c 06 20 4e>

18 hex = 11 | 000 bin = wire type 0, id 3

again a varint

0a 0d 08 f9 27 12 02 4f 4b 18<8a 8c 06 20 4e>

8A hex = 1000 1010 bin with first bit set, continue
8C hex = 1000 1100 bin with first bit set, continue
06 hex = 0000 0110 bin with first bit not set, end

0a 0d 08 f9 27 12 02 4f 4b 18 8a 8c 06<20 4e>

20 hex = 100 | 000 bin = wire type 0, id = 4

again a varint

0a 0d 08 f9 27 12 02 4f 4b 18 8a 8c 06 20<4e>

4E hex = 0100 1110 bin with first bit not set, end.

0a 0d 08 f9 27 12 02 4f 4b 18 8a 8c 06 20 4e|

The buffer looks ok so far.
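
The same walk can be sketched as a small raw reader in plain JavaScript. It does not use protobuf.js; the function names are made up for illustration, and it only handles wire types 0 and 2 because those are the only ones that occur in this buffer.

```javascript
const hex = "0a 0d 08 f9 27 12 02 4f 4b 18 8a 8c 06 20 4e";
const buffer = Uint8Array.from(hex.split(" ").map((h) => parseInt(h, 16)));

// Read one varint at `pos`; returns the value and the position after it.
function readVarint(bytes, pos) {
  let value = 0n;
  let shift = 0n;
  for (;;) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const b = bytes[pos++];
    value |= BigInt(b & 0x7f) << shift;
    if ((b & 0x80) === 0) return { value, pos };
    shift += 7n;
  }
}

// Walk one level of a message and return its raw fields.
// Length-delimited payloads are returned as byte views, not interpreted.
function walk(bytes) {
  const fields = [];
  let pos = 0;
  while (pos < bytes.length) {
    const tag = readVarint(bytes, pos);
    pos = tag.pos;
    const id = Number(tag.value >> 3n);
    const wireType = Number(tag.value & 7n);
    if (wireType === 0) {
      const v = readVarint(bytes, pos);
      pos = v.pos;
      fields.push({ id, wireType, value: v.value });
    } else if (wireType === 2) {
      const len = readVarint(bytes, pos);
      const end = len.pos + Number(len.value);
      if (end > bytes.length) throw new RangeError("length exceeds buffer");
      fields.push({ id, wireType, value: bytes.subarray(len.pos, end) });
      pos = end;
    } else {
      throw new Error(`unhandled wire type ${wireType}`);
    }
  }
  return fields;
}

const outer = walk(buffer);          // one field: id 1, 13 bytes
const inner = walk(outer[0].value);  // assumes field 1 is an embedded message
console.log(inner);
```

Read as plain (non zig-zag) integers, the inner fields come out as id 1 = 5113, id 2 = the bytes 4f 4b, id 3 = 99850 and id 4 = 78.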

The original .proto could, substituting the data types you already provided, look something like this:

message Outer {
   repeated Inner inner = 1;

   message Inner {
      optional int64 a = 1;
      optional string b = 2;
      optional int64 c = 3;
      optional int32 d = 4;
   }
}

@filipednb
Author

Very nice... Thanks for your patience. It's very clear now...
Let me ask: isn't this:

message BuyInResponse {
    enum Code {
        OK = 0;
        ERROR = 1;
        AUTH_ERROR = 2;
    }

    repeated PaymentResponseElement response = 1;
}
message PaymentResponseElement {
    optional int64 pnPaymentId = 1;
    optional string messageCode = 2;
    optional int64 balanceAfterTransaction = 3;
    optional int32 version = 4;
}

exactly like

message Outer {
   repeated Inner inner = 1;

   message Inner {
      optional int64 a = 1;
      optional string b = 2;
      optional int64 c = 3;
      optional int32 d = 4;
   }
} 

Why does the error persist?

@dcodeIO
Member

dcodeIO commented Sep 13, 2013

I'd say yes, it is, unless I'm missing something. If so, it could be an encoding issue: for example, the buffer might be converted to a string somewhere and get corrupted in the process.

How are you obtaining the data / reading it into a ByteBuffer?

@filipednb
Copy link
Author

Do you know if that can happen when the buffer is passed through several scripts? Currently I pass the buffer (as a parameter) through 3 different files.

@dcodeIO
Member

dcodeIO commented Sep 13, 2013

If you just pass it as a function argument, it depends. Passing it along does not change its type, but any of the functions involved could convert the buffer back and forth to some other data type, such as a string, which might corrupt the data (for example when encoding to or decoding from UTF-8, US-ASCII etc.).
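
A small sketch of how such a round trip through a string can damage this exact buffer. It assumes Node.js and its built-in `Buffer`; the variable names are made up for illustration.

```javascript
// The buffer from this issue, as raw bytes.
const original = Buffer.from("0a0d08f92712024f4b188a8c06204e", "hex");

// Round trip through a UTF-8 string. Bytes that are not valid UTF-8 on
// their own (here f9, 8a and 8c) are replaced by U+FFFD while decoding,
// so re-encoding the string cannot restore the original bytes.
const viaUtf8 = Buffer.from(original.toString("utf8"), "utf8");

console.log(original.equals(viaUtf8)); // false
console.log(original.toString("hex"));
console.log(viaUtf8.toString("hex"));
```

Comparing the two hex dumps shows where the varint bytes were replaced, which is enough to shift every following tag and produce wire type errors in a decoder.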

If you obtain it over HTTP, and binaryType="arraybuffer" (as with WebSockets) or something similar isn't available, I'd suggest encoding it to Base64 before transmitting it over any network connection, and decoding it properly back to bytes before putting it into a byte buffer.
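
A minimal sketch of the Base64 round trip, assuming Node.js and its built-in `Buffer` on both ends (in a browser the encoding and decoding calls would differ). The variable names are made up for illustration.

```javascript
// Sender side: raw protobuf bytes to a text-safe Base64 string.
const bytes = Buffer.from([
  0x0a, 0x0d, 0x08, 0xf9, 0x27, 0x12, 0x02, 0x4f,
  0x4b, 0x18, 0x8a, 0x8c, 0x06, 0x20, 0x4e,
]);
const wire = bytes.toString("base64");

// Receiver side: Base64 string back to the original bytes.
const received = Buffer.from(wire, "base64");

console.log(wire);
console.log(received.equals(bytes)); // true
```

Base64 only uses ASCII characters, so the string survives text-based transports and string conversions unchanged, at the cost of roughly a third more data.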

@filipednb
Author

Thanks for your passion for code :)

@dcodeIO
Member

dcodeIO commented Sep 13, 2013

It's a good example and I've linked it from the FAQ in the wiki :)

@dcodeIO
Member

dcodeIO commented Jun 19, 2014

Another example: #143 (comment)

@saikatmohajan

Since wire type 2 (length-delimited) represents strings, bytes, embedded messages and packed repeated fields, is there a way to differentiate between a string, a message and a repeated field?

@dcodeIO
Member

dcodeIO commented Nov 12, 2014

Not from the raw message alone, but with the correct .proto file loaded, you have everything at hand to evaluate the reflection structure for what to expect.

The other option is guessing.

@saikatmohajan

Thanks for getting back so quick. I am basically using java to decode the raw protobuf and trying to generate the same result as protoc --decode_raw. It looks to me that there must be a way to differentiate since protoc command is doing it just by reading the raw protobuf.

@dcodeIO
Member

dcodeIO commented Nov 12, 2014

Well, protoc then probably does some guessing for you. There is no other information on the type than "length delimited", as that's all a decoder needs (like with --decode_raw). Combined with the .proto definition, it becomes interpreted as the type it is.

@venkatpathapati

I have a problem extracting string values from binary data.
Below is the structure I want to extract, followed by the corresponding binary data.
9 {
1: "106671829932240464836"
4: "Vijayakrishnasai Gudavalli"
}

4a 33 0a 15 31 30 36 36 37 31 38 32 39 39 33 32 32 34 30 34 36 34 38 33 36 22 1a 56 69 6a 61 79 61 6b 72 69 73 68 6e
61 73 61 69 20 47 75 64 61 76 61 6c 6c 69

@venkatpathapati

Can anyone help me decode the binary data given in the comment above?

@venkatpathapati

How do we differentiate between length-delimited strings, bytes and sub-messages in binary data? Can anyone help me parse the binary data?

@strngr

strngr commented Jul 8, 2016

@venkatpathapati There is no way to tell a string from a submessage other than by its content, and even the content can be ambiguous. So if you don't have the schema, all you can do is guess what the message is and try to decode it.
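
One possible guessing strategy, sketched in plain JavaScript against the bytes posted above: try each length-delimited payload as a nested message first, fall back to UTF-8 text, and otherwise keep it as raw bytes. This is an assumption about what a raw decoder can reasonably do, not a description of how protoc implements --decode_raw, and all function names are made up for illustration. Groups (wire types 3 and 4) are not handled.

```javascript
const data = Uint8Array.from(
  `4a 33 0a 15 31 30 36 36 37 31 38 32 39 39 33 32 32 34 30 34 36 34 38 33 36
   22 1a 56 69 6a 61 79 61 6b 72 69 73 68 6e 61 73 61 69 20 47 75 64 61 76 61
   6c 6c 69`.trim().split(/\s+/).map((h) => parseInt(h, 16))
);

function readVarint(bytes, pos) {
  let value = 0n;
  let shift = 0n;
  for (;;) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const b = bytes[pos++];
    value |= BigInt(b & 0x7f) << shift;
    if ((b & 0x80) === 0) return { value, pos };
    shift += 7n;
  }
}

// Returns the list of fields if `bytes` parses cleanly as a message, else null.
function tryParseMessage(bytes) {
  const fields = [];
  let pos = 0;
  try {
    while (pos < bytes.length) {
      const tag = readVarint(bytes, pos);
      pos = tag.pos;
      const id = Number(tag.value >> 3n);
      const wireType = Number(tag.value & 7n);
      if (id === 0) return null; // field id 0 is not valid
      if (wireType === 0) {
        const v = readVarint(bytes, pos);
        pos = v.pos;
        fields.push({ id, value: v.value });
      } else if (wireType === 1 || wireType === 5) {
        const size = wireType === 1 ? 8 : 4; // fixed 64-bit / 32-bit
        if (pos + size > bytes.length) return null;
        fields.push({ id, value: bytes.subarray(pos, pos + size) });
        pos += size;
      } else if (wireType === 2) {
        const len = readVarint(bytes, pos);
        const end = len.pos + Number(len.value);
        if (end > bytes.length) return null;
        fields.push({ id, value: guess(bytes.subarray(len.pos, end)) });
        pos = end;
      } else {
        return null; // groups and unknown wire types: give up
      }
    }
  } catch {
    return null; // truncated varint
  }
  return fields;
}

// Guess what a length-delimited payload is: message, then string, then bytes.
function guess(bytes) {
  if (bytes.length > 0) {
    const message = tryParseMessage(bytes);
    if (message) return { message };
  }
  try {
    return { string: new TextDecoder("utf-8", { fatal: true }).decode(bytes) };
  } catch {
    return { bytes };
  }
}

const decoded = tryParseMessage(data);
console.dir(decoded, { depth: null });
```

For this input the sketch yields field 9 as a message containing field 1 and field 4 as strings, matching the structure quoted above. The guess can still go wrong: a short string or byte sequence may happen to parse as a valid message, which is why the schema is the only reliable source.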

@krforgit

krforgit commented Sep 1, 2017

I am getting an index out of range exception
when I subscribe to messages from a server, which is a console application.
This is the code snippet that sends the message:

image

and below is the JS snippet where the data is received, but when receiving continuously, protobufjs throws an index out of range exception at protobufjs\src\reader.js :13:12

capture

and the exception is
capture1

Can anyone help me decode the messages I keep receiving from the server?

@dcodeIO
Member

dcodeIO commented Sep 1, 2017

When sending and receiving multiple messages, use length-delimited messages, because otherwise the decoder doesn't know where one message ends and the next one starts. protobuf.js provides Message#decodeDelimited for this purpose (your server then has to send delimited messages as well, of course). On the client side, you basically want a buffer that collects all the data received so far. Put this into a protobuf.Reader and pass that, instead of a buffer, to Message#decodeDelimited, so that you can find out how many bytes of the buffer have been read by inspecting Reader#pos afterwards. decodeDelimited may be callable multiple times, depending on how many messages the data received so far contains. Cut that chunk (0 - reader.pos) away, rinse and repeat.
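
The collect, cut and repeat loop can be sketched without the library. The class below is a hand-rolled illustration of length-prefixed framing, not protobuf.js's decodeDelimited; the names `DelimitedCollector` and `readLengthPrefix` are made up, and it assumes each message is preceded by its byte length as a varint.

```javascript
// Collects incoming chunks and returns every complete length-prefixed
// message found so far; incomplete data is kept for the next call.
class DelimitedCollector {
  constructor() {
    this.pending = new Uint8Array(0);
  }

  push(chunk) {
    // Append the new chunk to whatever was left over from earlier calls.
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.length);

    const messages = [];
    let pos = 0;
    for (;;) {
      const prefix = readLengthPrefix(merged, pos);
      if (prefix === null) break;             // length varint not complete yet
      const end = prefix.next + prefix.length;
      if (end > merged.length) break;         // message body not complete yet
      messages.push(merged.slice(prefix.next, end));
      pos = end;                              // cut the consumed part away
    }
    this.pending = merged.slice(pos);
    return messages;
  }
}

// Reads a varint length at `pos`; returns null if the bytes run out first.
function readLengthPrefix(bytes, pos) {
  let length = 0;
  let shift = 0;
  while (pos < bytes.length) {
    const b = bytes[pos++];
    length += (b & 0x7f) * 2 ** shift;
    if ((b & 0x80) === 0) return { length, next: pos };
    shift += 7;
  }
  return null;
}

// Two messages (2 bytes and 1 byte) arriving split across three chunks.
const collector = new DelimitedCollector();
console.log(collector.push(Uint8Array.from([0x02, 0xaa])).length); // 0
console.log(collector.push(Uint8Array.from([0xbb, 0x01])).length); // 1
console.log(collector.push(Uint8Array.from([0xcc])).length);       // 1
```

Each payload returned by `push` is one complete message body without its length prefix, so it would be handed to the message type's regular decode function rather than to decodeDelimited.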
