Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/arch/x86/x86asm: x86 64-bit immediate argument integer overflow #45052

Open
stephen-fox opened this issue Mar 16, 2021 · 2 comments
Open

x/arch/x86/x86asm: x86 64-bit immediate argument integer overflow #45052

stephen-fox opened this issue Mar 16, 2021 · 2 comments

Comments

@stephen-fox
Copy link

@stephen-fox stephen-fox commented Mar 16, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16 darwin/amd64

Note: My go module imported arch versioned at v0.0.0-20210308155006-05f8f0431f72.

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOHOSTARCH="amd64"
GOHOSTOS="darwin"

What did you do?

I am writing a "shellcode" parser that takes raw binary, or a C-style hex array and converts it into human-readable disassembly like the Unix objdump utility. While experimenting with some sample shellcode, I found that this x86_64 sample produced an instruction with an argument that differed from the comments and the output of objdump.

The instruction in question:

"\xbf\xad\xde\xe1\xfe"  // mov    $0xfee1dead,%edi

objdump's interpretation:

$ objdump -D -M att,x86-64 -b binary -m i386 foo

foo:     file format binary

Disassembly of section .data:

00000000 <.data>:
   0:	bf ad de e1 fe       	mov    $0xfee1dead,%edi

A sample Go program, and its interpretation:

package main

import (
	"golang.org/x/arch/x86/x86asm"
	"log"
)

func main() {
	log.SetFlags(0)

	inst, err := x86asm.Decode([]byte{0xbf, 0xad, 0xde, 0xe1, 0xfe}, 64)
	if err != nil {
		log.Fatalln(err)
	}

	log.Println(inst.Args)

	log.Println(x86asm.GNUSyntax(inst, 0, nil))

	// Output:
	// [EDI -0x11e2153 <nil> <nil>]
	// mov $-0x11e2153,%edi
}

What did you expect to see?

I expected the immediate argument to the mov to be 0xfee1dead.

What did you see instead?

The immediate argument to the mov is -0x11e2153 (0xFFFFFFFFFEE1DEAD).

Debugging and thoughts

I should start by stating that my knowledge of x86 is... limited. So, I apologize in advance for misusing any terminology. I am going off what I already know, some Googling, and code comments.

I did attempt to understand what exactly is going on in the x86asm library. My first thought is maybe this is not even a bug? The value in question overflows a signed 32-bit integer... But this is 64-bit assembly.

Following the x86asm.Decode function through a debugger, we end up in the decode1 function in decode.go. Its documentation mentions that it mimic[s] bugs (or at least unique features) of GNU libopcodes as used by objdump. So, is this data even being interpreted correctly in the first place?

That said, @rsc's comment regarding 64-bit support on line 242 states:

// TODO(rsc): 64-bit mode not tested, probably not working.

... so, perhaps I am already setting my expectations a bit high :) The comment dates back to ~2014. Maybe it no longer applies?

In decode1, we end up in a loop on line 468. Its job appears to be decoding op code bytes. On the sixth iteration, we end up in a switch statement case on line 1046. Apparently, the op code is interpreted as xArgImm32. The code in this case converts the local imm variable (an int64) to an int32, and then converts this new value to an Imm, which is a type alias for int64. It then assigns it to index 1 of the Args field of the Inst object being constructed:

case xArgImm32:
    inst.Args[narg] = Imm(int32(imm)) // 'imm' = 4276215469 (0xFEE1DEAD).
    narg++

The resulting value is -18751827 (0xFFFFFFFFFEE1DEAD).

I wonder if the op code was interpreted incorrectly? The subsequent xArgImm64 case does the right thing (not converting imm to an int32), and this is 64-bit code.

Thank you for reading!

@gopherbot gopherbot added this to the Unreleased milestone Mar 16, 2021
@zephyrtronium
Copy link
Contributor

@zephyrtronium zephyrtronium commented Mar 16, 2021

Even though the code is for 64-bit mode, edi is a 32-bit register, so the immediate operand is not sign-extended to 0xFFFFFFFFFEE1DEAD. Both 0xfee1dead and -0x11e2153 are the same here.

Loading

@stephen-fox
Copy link
Author

@stephen-fox stephen-fox commented Mar 18, 2021

Thanks for the explanation, @zephyrtronium. I guess I'm stuck on the human readable form of the instruction being represented that way. I do not think a debugger would represent a register's value as -0x11e2153. It just feels maclunky, especially if you have several values in a single binary being represented that way. Thinking back to that comment about emulating objdump bugs, is this the way objdump should represent the instruction?

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants