Raw reader #910
Comments
As a sidenote
I'm hoping I can do better, specifically with field 1's repeated ints.
Also, I'd be happy to put this all in a PR as a built-in method so others can very easily reverse protobufs, once I get it figured out.
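On field 1's repeated ints: proto3 packs repeated scalar numeric fields by default, so they arrive as a single length-delimited payload holding back-to-back varints. Below is a minimal, dependency-free sketch of unpacking such a payload. The helper names readVarint and decodePackedVarints are hypothetical (not protobufjs API), and it assumes the values fit in a JavaScript number.

```javascript
// Hypothetical helpers, not protobufjs API.
// Reads one base-128 varint starting at pos; values above 2^53 - 1 lose precision.
function readVarint(bytes, pos) {
  let value = 0
  let shift = 0
  while (pos < bytes.length) {
    const byte = bytes[pos++]
    value += (byte & 0x7f) * 2 ** shift
    if ((byte & 0x80) === 0) return { value, pos }
    shift += 7
  }
  throw new RangeError('truncated varint')
}

// Treats the whole payload of a length-delimited field as packed varints.
function decodePackedVarints(bytes) {
  const values = []
  let pos = 0
  while (pos < bytes.length) {
    const next = readVarint(bytes, pos)
    values.push(next.value)
    pos = next.pos
  }
  return values
}

console.log(decodePackedVarints(Uint8Array.from([1, 2, 3, 4, 5]))) // [ 1, 2, 3, 4, 5 ]
console.log(decodePackedVarints(Uint8Array.from([0xac, 0x02, 0x01]))) // [ 300, 1 ]
```

Without the proto there is no way to know whether the payload really is packed varints rather than bytes or a string, so this is only one candidate interpretation to show next to the others.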
The current method I've got (which I am positive is wrong) actually gets pretty close to protoc's output:

const { Reader } = require('protobufjs')

const getData = buffer => {
  const reader = Reader.create(buffer)
  const out = []
  while (reader.pos < reader.len) {
    // a field tag is a varint that fits in 32 bits, so read it as uint32
    const tag = reader.uint32()
    const id = tag >>> 3
    const wireType = tag & 7
    switch (wireType) {
      case 0: // int32, int64, uint32, bool, enum, etc
        // values wider than 32 bits would need reader.uint64() or reader.int64()
        out.push({[id]: reader.uint32()})
        break
      case 1: // fixed64, sfixed64, double
        out.push({[id]: reader.fixed64()})
        break
      case 2: { // string, bytes, sub-message
        const bytes = reader.bytes()
        // TODO: this isn't the right way to do this at all, I'm sure
        // (it only recurses when the payload starts with the tag for field 1, varint)
        if (bytes[0] === 8) {
          out.push({[id]: getData(bytes)})
        } else {
          out.push({[id]: bytes.toString()})
        }
        break
      }
      // IGNORE start_group
      // IGNORE end_group
      case 5: // fixed32, sfixed32, float
        out.push({[id]: reader.float()})
        break
      default:
        reader.skipType(wireType)
    }
  }
  return out
}

This is my current test message:
And this is what the above function outputs:

[
  {
    "1": "\u0001\u0002\u0003\u0004\u0005"
  },
  {
    "2": 1
  },
  {
    "3": "hello"
  },
  {
    "4": [
      {
        "1": 1
      },
      {
        "2": "cool"
      },
      {
        "3": [
          {
            "1": 1
          }
        ]
      }
    ]
  },
  {
    "4": [
      {
        "1": 2
      },
      {
        "2": "awesome"
      },
      {
        "3": [
          {
            "1": 2
          }
        ]
      }
    ]
  },
  {
    "4": [
      {
        "1": 3
      },
      {
        "2": "neat"
      },
      {
        "3": [
          {
            "1": 3
          }
        ]
      }
    ]
  }
]
Ok, I think I have the basics worked out at rawproto. Happy to make a PR to this project, if it's desired, and would love any suggestions (I'm not totally confident I'm doing it right.) Here is example output:

[
  {
    "1": {
      "type": "Buffer",
      "data": [
        1,
        2,
        3,
        4,
        5
      ]
    }
  },
  {
    "2": 1
  },
  {
    "3": "hello"
  },
  {
    "4": [
      {
        "1": 1
      },
      {
        "2": "cool"
      },
      {
        "3": [
          {
            "1": 1
          }
        ]
      }
    ]
  },
  {
    "4": [
      {
        "1": 2
      },
      {
        "2": "awesome"
      },
      {
        "3": [
          {
            "1": 2
          }
        ]
      }
    ]
  },
  {
    "4": [
      {
        "1": 3
      },
      {
        "2": "neat"
      },
      {
        "3": [
          {
            "1": 3
          }
        ]
      }
    ]
  }
]
Just wanted to leave this idea here: Looking through the protoc source, it appears there is no "raw-reader". The regular reader is simply ok with extra fields not defined in the proto (and adds them as numeric-named fields.) Raw parsing is basically just "use an empty proto message", so every field is extra and gets added as a numeric field.

If there was an option in protobufjs to do this (not throw on extra fields, just add them as numeric fields with guessed types) we'd have a raw parser, plus the other thing that protoc can't do with this stuff: "I have this proto which defines some of the fields, but there are some extra fields I don't know about, so just add those as numeric fields for further analysis."

Here is an example of this: I made a proto binary message, like above, but added an extra string field that's not in the proto. I used this proto:

syntax = "proto3";

message Test {
  repeated int32 nums = 1;
  int64 num = 2;
  string str = 3;
  repeated Child children = 4;
}

message Child {
  int64 num = 1;
  string str = 2;
  repeated Child children = 3;
  string extra = 4;
}
I removed
and protoc, using
If I had the equivalent combination of the 2, in protobufjs, it would be easier to reverse-engineer it:
So, with this, I could loop through fields and test for numeric fields to find the ones I should take a closer look at. Since I would have all the other awesome context info that protobufjs has, it would be very easy to figure out which definitions have an extra field (in this case
So basically, output of

{
  "nums": [
    1,
    2,
    3,
    4,
    5
  ],
  "num": "1",
  "str": "hello",
  "children": [
    {
      "num": "1",
      "str": "cool",
      "children": [
        {
          "num": "1",
          "4": "this is extra."
        }
      ]
    },
    {
      "num": "2",
      "str": "awesome",
      "children": [
        {
          "num": "2",
          "4": "this is extra."
        }
      ]
    },
    {
      "num": "3",
      "str": "neat",
      "children": [
        {
          "num": "3",
          "4": "this is extra."
        }
      ]
    }
  ]
}
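The "named where known, numeric where unknown" idea can be approximated outside the library as a post-processing step on raw output. The sketch below is only an illustration: applyPartialNames, rawChild and childNames are hypothetical names, the input uses the [{ id: value }] shape from the raw parser earlier in this thread, and nested messages are not handled.

```javascript
// Hypothetical helper, not part of protobufjs: rename raw numeric field ids
// using a partial id-to-name map. Ids missing from the map keep their number.
function applyPartialNames(rawFields, names) {
  const grouped = new Map()
  for (const entry of rawFields) {
    for (const [id, value] of Object.entries(entry)) {
      const known = Object.prototype.hasOwnProperty.call(names, id)
      const key = known ? names[id] : id
      if (!grouped.has(key)) grouped.set(key, [])
      grouped.get(key).push(value)
    }
  }
  const out = {}
  for (const [key, values] of grouped) {
    // a field seen once is kept as a scalar; repeats become an array
    out[key] = values.length === 1 ? values[0] : values
  }
  return out
}

// Raw output for one innermost Child (assumed shape)
const rawChild = [{ 1: 3 }, { 4: 'this is extra.' }]
// Partial schema: field 4 is deliberately left out
const childNames = { 1: 'num', 2: 'str', 3: 'children' }

const named = applyPartialNames(rawChild, childNames)
// Numeric keys are the fields worth a closer look
const unknownIds = Object.keys(named).filter(key => /^\d+$/.test(key))
console.log(named.num, unknownIds) // 3 [ '4' ]
```

Doing this inside the decoder would be better, since the decoder knows the declared types of the named fields; this version only renames keys.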
Coming back to this years later, as I need to decode another protobuf without fully having the proto def. I see no one has commented. Is this a subject anyone else has interest in a PR about?

In my latest adventures in reversing protobufs, I discovered

syntax = "proto3";

message Test {
  repeated int32 nums = 1;
  int64 num = 2;
  string str = 3;
  repeated Child children = 4;
}

message Child {
  int64 num = 1;
  string str = 2;
  repeated Child children = 3;
  string extra = 4;
}

Then I comment out
I'd still like to PR it to this lib, if there is interest. It would mean I can deprecate my old raw parser, and it would make reverse-engineering proto definitions (from partial definitions) even easier. In addition, I made a separate function to infer a basic proto from the raw binary, sort of like this:

syntax = "proto3";

message Message3 {
  int32 field1 = 1; // could be a int32, int64, uint32, bool, enum, etc, or even a float of some kind
}

message Message4 {
  int32 field1 = 1; // could be a int32, int64, uint32, bool, enum, etc, or even a float of some kind
  bytes field2 = 2; // could be a repeated-value, string, bytes, or malformed sub-message
  Message3 subMessage3 = 3;
}

message MessageRoot {
  bytes field1 = 1; // could be a repeated-value, string, bytes, or malformed sub-message
  int32 field2 = 2; // could be a int32, int64, uint32, bool, enum, etc, or even a float of some kind
  bytes field3 = 3; // could be a repeated-value, string, bytes, or malformed sub-message
  repeated Message4 subMessage4 = 4;
}

It's not perfect, but a similar idea could be used to generate a working proto that doesn't error on unknown formats. It would be cool to integrate these ideas and get partial inference: use the proto where it applies, and fill in the other fields with generated names. Then you could look through the data and find better names for the things you can figure out. It would also let you keep editing your proto as you figure fields out, and the next time it parses a message, it would have the new field defs.
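As a rough illustration of the inference step (not the actual implementation mentioned above), here is a sketch that turns observed (id, wireType) pairs into a placeholder message definition. The inferMessage function and the GUESSES table are hypothetical, and it picks one default type per wire type without trying to detect sub-messages.

```javascript
// Default guess per wire type; every one of these is ambiguous without the real proto.
const GUESSES = {
  0: ['int32', 'could be int32, int64, uint32, bool, enum, etc'],
  1: ['fixed64', 'could be fixed64, sfixed64, or double'],
  2: ['bytes', 'could be a packed repeated value, string, bytes, or sub-message'],
  5: ['fixed32', 'could be fixed32, sfixed32, or float']
}

// Hypothetical helper: observed is a list of { id, wireType } in wire order.
function inferMessage(name, observed) {
  const byId = new Map()
  for (const { id, wireType } of observed) {
    if (!(wireType in GUESSES)) continue // skip groups and unknown wire types
    const seen = byId.get(id) || { wireType, count: 0 }
    seen.count++
    byId.set(id, seen)
  }
  const lines = [...byId.entries()]
    .sort(([a], [b]) => a - b)
    .map(([id, { wireType, count }]) => {
      const [type, note] = GUESSES[wireType]
      // an id seen more than once must be repeated; seen once, it still might be
      const label = count > 1 ? 'repeated ' : ''
      return `  ${label}${type} field${id} = ${id}; // ${note}`
    })
  return `message ${name} {\n${lines.join('\n')}\n}`
}

const observed = [
  { id: 1, wireType: 2 },
  { id: 2, wireType: 0 },
  { id: 3, wireType: 2 },
  { id: 4, wireType: 2 },
  { id: 4, wireType: 2 },
  { id: 4, wireType: 2 }
]
console.log(inferMessage('MessageRoot', observed))
```

Replacing a guessed bytes field with a generated sub-message type would need a recursive pass over the payloads, which this sketch leaves out.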
I am attempting to make a binary-to-info raw parser, so I can get output similar to protoc --decode_raw from an existing binary message, and eventually generate a vague-but-won't-error proto file as a basis for parsing. I think I need a little help with sub-messages, and I'm not sure I'm grabbing types correctly.

I've got a message I built from a proto like this:
and I get a binary message that looks like this:
To parse it, I've been looking at Google's docs and your nice article
As well as the linked issues: #55 #736
From that, I've made this function:
Now, I know that there are different types, encoded differently, which can't be guessed without the proto file, but I figure that I can mark the types in the output so people can tweak their own generated proto files. This will at least get them started (much like protoc --decode_raw.)

My question is how do I check a string for sub-message or repeated status, and am I using appropriate decoders for those types?
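One way to approach the sub-message question is to treat it as a set of plausible interpretations rather than a single answer: a length-delimited payload might be a message if it parses as a clean run of fields ending exactly at its last byte, and might be a string if it is valid, printable UTF-8. The sketch below is dependency-free and every name in it (readVarint, looksLikeMessage, looksLikeText, guessPayload) is hypothetical, not protobufjs API. It assumes TextDecoder is available as a global, as in current Node and browsers.

```javascript
// Reads one varint; returns null if it is truncated or longer than 10 bytes.
function readVarint(bytes, pos) {
  let value = 0
  for (let i = 0; i < 10 && pos < bytes.length; i++) {
    const byte = bytes[pos++]
    value += (byte & 0x7f) * 2 ** (7 * i)
    if ((byte & 0x80) === 0) return { value, pos }
  }
  return null
}

// True if the bytes parse as a sequence of fields that ends exactly at the last byte.
function looksLikeMessage(bytes) {
  if (bytes.length === 0) return false
  let pos = 0
  while (pos < bytes.length) {
    const tag = readVarint(bytes, pos)
    if (!tag) return false
    pos = tag.pos
    const id = Math.floor(tag.value / 8)
    const wireType = tag.value % 8
    if (id === 0) return false // field number 0 is not valid
    if (wireType === 0) {
      const v = readVarint(bytes, pos)
      if (!v) return false
      pos = v.pos
    } else if (wireType === 1) {
      pos += 8
    } else if (wireType === 2) {
      const len = readVarint(bytes, pos)
      if (!len) return false
      pos = len.pos + len.value
    } else if (wireType === 5) {
      pos += 4
    } else {
      return false // groups (3, 4) and undefined wire types (6, 7)
    }
  }
  return pos === bytes.length
}

// True if the bytes are valid UTF-8 with no control characters besides tab, LF, CR.
function looksLikeText(bytes) {
  let text
  try {
    text = new TextDecoder('utf-8', { fatal: true }).decode(bytes)
  } catch (err) {
    return false
  }
  return text.length > 0 && !/[\u0000-\u0008\u000b\u000c\u000e-\u001f\u007f]/.test(text)
}

// Returns every interpretation that cannot be ruled out.
function guessPayload(bytes) {
  const guesses = []
  if (looksLikeMessage(bytes)) guesses.push('message')
  if (looksLikeText(bytes)) guesses.push('string')
  guesses.push('bytes') // always possible; also covers packed repeated values
  return guesses
}

const child = Uint8Array.from([0x08, 0x01, 0x12, 0x04, 0x63, 0x6f, 0x6f, 0x6c, 0x1a, 0x02, 0x08, 0x01])
console.log(guessPayload(child)) // [ 'message', 'bytes' ]
console.log(guessPayload(new TextEncoder().encode('hello'))) // [ 'string', 'bytes' ]
console.log(guessPayload(Uint8Array.from([1, 2, 3, 4, 5]))) // [ 'bytes' ]
```

The guesses can overlap: the two-byte string "0A" also parses as a message with field 6 set to 65, so the result is a list of candidates to mark in the output, not a verdict.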