Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: x/arch/arm64/arm64asm: all fields of potential Args structs should be exported #51517

Open
fabled opened this issue Mar 7, 2022 · 25 comments

Comments

@fabled
Copy link

fabled commented Mar 7, 2022

What version of Go are you using (go version)?

$ go version
go1.17.7

Does this issue reproduce with the latest release?

Yes.

What did you do?

I would like to programmatically access all fields of the arm64 disassembler Instruction Args.

	code := []byte{0x01,0xd4,0x43,0xf9} // LDR X1, [X0,#1960]
	inst, _ := arm64asm.Decode(code)
	fmt.Printf("opcode: %#v\n", inst)
	// arm64asm.Inst{Op:0xb1, Enc:0xf943d401,
	//	Args:arm64asm.Args{0x21, arm64asm.MemImmediate{Base:0x20, Mode:0x3, imm:1960},
	//	arm64asm.Arg(nil), arm64asm.Arg(nil), arm64asm.Arg(nil)}}

	// like to extract the offset
	arg := inst.Args[1].(arm64asm.MemImmediate)
	fmt.Printf("argument: %#v\n", arg)
	// arm64asm.MemImmediate{Base:0x20, Mode:0x3, imm:1960}
	//fmt.Printf("offset: %x\n", arg.imm) // error: "cannot refer to unexported field or method imm"

What did you expect to see?

To be able to access programmatically instruction arguments. This would be consistent with the other disassemblers: x86, arm32 and ppc. All of them have all Args field exported (or be typedef to a basic type). It seems arm64 is the only exception where some fields are not exported.

What did you see instead?

Build error, as several fields are unexported.

Change https://go.dev/cl/388714 contains suggested change.

@ianlancetaylor
Copy link
Contributor

CC @cherrymui

@erifan
Copy link

erifan commented Mar 8, 2022

I'm not against exporting these fields, but I think it's better for you to define a set of data structures yourself to represent Instructions and parameters, because the format of Instructions won't change, but these internal data structures may.

@fabled
Copy link
Author

fabled commented Mar 8, 2022

I believe Instructions and the Args are direct representation of the instruction set encoding. I don't see these data structures changing unless someone changes the ARM opcode encoding.

@rsc
Copy link
Contributor

rsc commented Mar 16, 2022

FWIW it was definitely the intent of the other disassemblers to have these be "pure data" that could be directly manipulated. Is there a reason not to do the same for arm64?

@rsc
Copy link
Contributor

rsc commented Mar 16, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Mar 30, 2022

Does anyone object to accepting this proposal?

@erifan
Copy link

erifan commented Mar 31, 2022

I'm still a little concerned about this. There are two points:

  1. I think some data structures of the current disassembler are not well designed enough. For example,
    Go assembly instruction MRS ELR_EL1, R8, encoded as 0xd5384028.
    Go disassembles 0xd5384028 as Go assembly MRS $16897, R8, GNU assembly mrs x8, s3_0_c4_c0_1
    However GCC disassembles the binary as mrs x8, elr_el1
    From the point of view of the Arm specification, mrs x8, s3_0_c4_c0_1 and mrs x8, elr_el1 are both valid, we can use the system register name (elr_el1 ) or the combination of fields (s3_0_c4_c0_1). Currently we don't have any specification saying that which one should we use. If we go the same way as GCC does with system register names, then the data structures defined for system registers today are obviously not good enough, see https://github.com/golang/arch/blob/b76863e36670e165c85261bc41fabaf345376022/arm64/arm64asm/inst.go#L809
    In addition, if we introduce a new arm instruction, maybe we can reuse the current data structure with slight changes, but with exported data structure, we will have to define a new data structure.

  2. I actually don't understand why the disassembler is a separate project, leading to the assembler and disassembler are often not synchronized enough. When I was refactoring the arm64 assembler, I found that they can actually share a lot of things, such as instruction tables, data structures, etc. This gives me a strong drive to do some refactoring work on disassembler to make the two synchronize better, if this is really necessary, then some data structures and tables will inevitably be modified. I don't want the refactoring to be limited by these exported data structures.

@cherrymui
Copy link
Member

I think it is fine to export the fields. But we probably want to polish the names a little bit. For example, currently we have

type RegisterWithArrangement struct {
	r   Reg
	a   Arrangement
	cnt uint8
}

When exported, maybe they should be Reg, Arrangement, and Count, instead of R, A, and Cnt.

Alternatively we can keep fields unexported but add accessor methods (it doesn't seem necessary to set the fields in user code, just read them). But that doesn't sound very Go-like.

@rsc
Copy link
Contributor

rsc commented Apr 6, 2022

It sounds like we want to work on field names and maybe also introduce some new registers for things like ELR_EL1.
Do I have that right?

@rsc
Copy link
Contributor

rsc commented Apr 6, 2022

If the new API will take a while to design, we can always put this on hold until it is ready.

@rsc
Copy link
Contributor

rsc commented Apr 13, 2022

Should we put this on hold?

@fabled
Copy link
Author

fabled commented May 2, 2022

Is anything needed on my part? Or is this waiting still internal decision?

@cherrymui
Copy link
Member

I think we need to put up the names for exported types/fields/methods/etc. @fabled feel free to propose a list. Thanks.

@rsc
Copy link
Contributor

rsc commented May 4, 2022

Will put this on hold; feel free to take off hold once there is a concrete proposal for the new fields. Thanks!

@rsc
Copy link
Contributor

rsc commented May 4, 2022

Placed on hold.
— rsc for the proposal review group

@davidben
Copy link
Contributor

davidben commented Jul 2, 2022

Exporting these would be very useful for some tool I'm toying with to test BoringSSL's assembly code. (I'm more using it as an instruction decoder than a disassembler, but it seems to do the trick!) For now I'm just parsing the string representation of the unexported structs back out, which is rather tedious. Beyond field names, the ExtShift constants also aren't exported.

From the discussion, it sounds like there's some question as to whether the system registers are done right, and also issues with some names. The system registers are less commonly used (and, not coincidentally, I'm less familiar with them 😄 ), so perhaps we should leave those unexported in the first iteration and leave that for later? I would then propose:

  • Export ImmShift fields as-is: Imm and Shift. Imm is consistent with Imm64.Imm and Shift is a fine name.
  • Export ExtShift constants as all-caps with no decoration. So UXTB, etc. ExtShiftUXTB would also work, but this is consistent with X0, etc. It's inconsistent with Arrangement8B, but those would otherwise begin with digits and needed some prefix.
  • Export RegExtshiftAmount fields as-is: Reg, ExtShift, and Amount. Leave show_zero alone for now, but probably rename it to showZero for style consistency. (If I'm understanding that structure correctly, show_zero is not necessary for interpreting the instruction, just stringifying it canonically.) Or we could call it ShowZero if folks want.
  • Export MemImmediate.imm as Imm
  • Leave Systemreg alone for now, per questions in proposal: x/arch/arm64/arm64asm: all fields of potential Args structs should be exported #51517 (comment)
  • Export Imm_fp fields as Sign, Exp, and Mantissa, respectively.
  • Export RegisterWithArrangement fields as Reg, Arrangement, and Count, respectively
  • Export RegisterWithArrangementAndIndex fields as Reg, Arrangement, Index, and Count, respectively

How does that sound to folks?

@fabled
Copy link
Author

fabled commented Jul 6, 2022

Sounds good to me. I'll update the draft PR accordingly to get things forward.

@fabled
Copy link
Author

fabled commented Jul 6, 2022

  • Export ExtShift constants as all-caps with no decoration. So UXTB, etc. ExtShiftUXTB would also work, but this is consistent with X0, etc. It's inconsistent with Arrangement8B, but those would otherwise begin with digits and needed some prefix.

This does not work:

arm64/arm64asm/tables.go:29:2: ASR redeclared in this block
arm64/arm64asm/inst.go:411:2: other declaration of ASR
arm64/arm64asm/tables.go:210:2: LSL redeclared in this block
arm64/arm64asm/inst.go:409:2: other declaration of LSL
arm64/arm64asm/tables.go:212:2: LSR redeclared in this block
arm64/arm64asm/inst.go:410:2: other declaration of LSR
arm64/arm64asm/tables.go:250:2: ROR redeclared in this block
arm64/arm64asm/inst.go:412:2: other declaration of ROR
arm64/arm64asm/tables.go:387:2: SXTB redeclared in this block
arm64/arm64asm/inst.go:405:2: other declaration of SXTB
arm64/arm64asm/tables.go:388:2: SXTH redeclared in this block
arm64/arm64asm/inst.go:406:2: other declaration of SXTH
arm64/arm64asm/tables.go:391:2: SXTW redeclared in this block
arm64/arm64asm/inst.go:407:2: other declaration of SXTW
arm64/arm64asm/tables.go:465:2: UXTB redeclared in this block
arm64/arm64asm/inst.go:401:2: other declaration of UXTB
arm64/arm64asm/tables.go:466:2: UXTH redeclared in this block
arm64/arm64asm/inst.go:402:2: other declaration of UXTH

@fabled
Copy link
Author

fabled commented Jul 6, 2022

Other parts updated to https://go-review.googlesource.com/c/arch/+/388714/. I also changed Imm_fp.Sign to a bool.

@davidben
Copy link
Contributor

davidben commented Jul 6, 2022

This does not work: [...]

Oh right, those are also instructions! Alright, how about ExtShiftLSL, etc?

@fabled
Copy link
Author

fabled commented Jul 13, 2022

Oh right, those are also instructions! Alright, how about ExtShiftLSL, etc?

Sounds good. Patch updated accordingly.

@rsc could you remove the Hold label, and have this reviewed? Thanks!

@davidben
Copy link
Contributor

Ah I guess one interesting point about MemImmediate.imm is that it's not quite an immediate. It's actually either an immediate or a register. See this code:
https://cs.opensource.google/go/x/arch/+/fc48f9fe:arm64/arm64asm/inst.go;drc=fc48f9fe4c157e3ed95b38adbda9b9fe5a31cf03;l=525

Should there be separate Imm and Reg fields with the AddrPostReg option using Reg instead of Imm? If not, we at least should document that AddrPostReg's second register is stored in Imm, as an offset from X0.

@davidben
Copy link
Contributor

Or perhaps s/Reg/Index/ to match MemExtend, if we go with the separate variable option.

@fabled
Copy link
Author

fabled commented Sep 11, 2023

@rsc Gentle ping. Any suggestions how to get this forward?

@erifan
Copy link

erifan commented Sep 11, 2023

Hi @fabled , I am refactoring the arm64 assembler, see #44734, and I also plan to refactor the disassembler based on the new instruction table, so many data structures may be changed, and more argument structs will be supported. In the new code I will export all fields. But I don’t have a clear timeline yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Hold
Development

No branches or pull requests

7 participants