Note: my Go module imports golang.org/x/arch at version v0.0.0-20210308155006-05f8f0431f72.
Does this issue reproduce with the latest release?
What operating system and processor architecture are you using (go env)?
What did you do?
I am writing a "shellcode" parser that takes raw binary or a C-style hex array and converts it into human-readable disassembly, like the Unix objdump utility. While experimenting with some sample shellcode, I found that this x86_64 sample produced an instruction with an argument that differed from both the comments and the output of objdump.
The instruction in question:
"\xbf\xad\xde\xe1\xfe" // mov $0xfee1dead,%edi
$ objdump -D -M att,x86-64 -b binary -m i386 foo
foo: file format binary
Disassembly of section .data:
0: bf ad de e1 fe mov $0xfee1dead,%edi
I expected the immediate argument to the mov to be 0xfee1dead.
What did you see instead?
The immediate argument to the mov is -0x11e2153 (0xFFFFFFFFFEE1DEAD as an unsigned 64-bit value).
Debugging and thoughts
I should start by stating that my knowledge of x86 is... limited. So, I apologize in advance for misusing any terminology. I am going off what I already know, some Googling, and code comments.
I did attempt to understand what exactly is going on in the x86asm library. My first thought is maybe this is not even a bug? The value in question overflows a signed 32-bit integer... But this is 64-bit assembly.
Following the x86asm.Decode function through a debugger, we end up in the decode1 function in decode.go. Its documentation mentions that it "mimic[s] bugs (or at least unique features) of GNU libopcodes as used by objdump". So, is this data even being interpreted correctly in the first place?
That said, @rsc's comment regarding 64-bit support on line 242 states:
// TODO(rsc): 64-bit mode not tested, probably not working.
... so, perhaps I am already setting my expectations a bit high :) The comment dates back to ~2014. Maybe it no longer applies?
In decode1, we end up in a loop on line 468. Its job appears to be decoding opcode bytes. On the sixth iteration, we end up in a switch statement case on line 1046. Apparently, the opcode is interpreted as xArgImm32. The code in this case converts the local imm variable (an int64) to an int32, then converts this new value to an Imm, which is a type defined as int64. It then assigns it to index 1 of the Args field of the Inst being constructed:
inst.Args[narg] = Imm(int32(imm)) // 'imm' = 4276215469 (0xFEE1DEAD)
narg++
The resulting value is -18751827 (0xFFFFFFFFFEE1DEAD).
I wonder if the opcode was interpreted incorrectly? The subsequent xArgImm64 case does the right thing (it does not truncate imm to an int32), and this is 64-bit code.
Thank you for reading!
Thanks for the explanation, @zephyrtronium. I guess I'm stuck on the human-readable form of the instruction being represented that way. I do not think a debugger would represent a register's value as -0x11e2153. It just feels clunky, especially if you have several values in a single binary represented that way. Thinking back to that comment about emulating objdump bugs, is this the way objdump should represent the instruction?