cmd/compile: encoding/binary.PutUint32 generates unaligned data storage on arm64 arch #59856
Comments
CC @randall77 |
Unaligned accesses are definitely allowed on arm64. Is this a memory-mapped device or something? I don't see how we support enforcing aligned accesses without turning all read and write combining off. For instance,
We want to implement that with a 16-bit write of the constant This definitely requires some low-level support. I don't think there's any way for us to provide that by default. Using |
I can confirm that arm64 allows unaligned access: the ARMv8-A architecture allows many types of load and store accesses to be arbitrarily aligned, and most unaligned accesses carry no performance penalty. I'm also curious what kind of device this is. |
Note that the 1-2-1 store pattern is somewhat inefficient and I think should be fixed by https://go-review.googlesource.com/c/go/+/478475 |
I am wondering if we can intrinsify this function into a 32bit store. However, the write address may still be unaligned, but we can check it before calling this function if alignment is needed. |
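A minimal sketch of the "check it before calling" idea, assuming the caller wants a 4-byte-aligned destination. The helpers `isAligned`, `alignedOffset`, and `putUint32Checked` are hypothetical names for illustration and are not part of encoding/binary. The check only helps once the compiler emits a single 32-bit store; with the 1-2-1 pattern reported in this issue, the middle 16-bit store would still land on an odd address even when the base is aligned.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// isAligned reports whether b starts on an align-byte boundary.
// align must be a power of two. Hypothetical helper, not a library API.
func isAligned(b []byte, align uintptr) bool {
	if len(b) == 0 {
		return false
	}
	return uintptr(unsafe.Pointer(&b[0]))&(align-1) == 0
}

// alignedOffset returns the smallest offset into the non-empty slice b at
// which the address is a multiple of align (a power of two). Hypothetical.
func alignedOffset(b []byte, align uintptr) int {
	addr := uintptr(unsafe.Pointer(&b[0]))
	return int((align - addr&(align-1)) & (align - 1))
}

// putUint32Checked writes v in little-endian order (an assumption; the
// thread does not state the byte order) only if b is 4-byte aligned.
func putUint32Checked(b []byte, v uint32) error {
	if len(b) < 4 {
		return fmt.Errorf("need 4 bytes, have %d", len(b))
	}
	if !isAligned(b, 4) {
		return fmt.Errorf("destination %p is not 4-byte aligned", &b[0])
	}
	binary.LittleEndian.PutUint32(b, v)
	return nil
}

func main() {
	buf := make([]byte, 16) // stand-in for a device mapping
	off := alignedOffset(buf, 4)
	fmt.Println(putUint32Checked(buf[off:], 0xff))   // aligned destination
	fmt.Println(putUint32Checked(buf[off+1:], 0xff)) // deliberately misaligned
}
```

The first call should succeed and the second should return an error, since the destination is shifted by one byte from an aligned address.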
Alignment checking is controlled by a bit in the System Control Register. |
Apologies for the confusion; arm64 absolutely supports unaligned access. The SIGBUS I am seeing comes from a memory-mapped device that does not support unaligned access. I tried to word the issue carefully by saying things like "problematic" for "certain systems". I also understand why coalescing multiple byte writes into a single multi-byte instruction makes sense for performance. It just seemed strange for this particular Go code to generate that 1-2-1 store pattern. I would think that passing |
@randall77 I can confirm that the 1-2-1 pattern is fixed in the master branch. Thanks for the link, good learning opportunity for me. There is now a single 32-bit store: And before I close the topic, I figured I'd try the 64-bit version as well. The code
|
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
The following code generates some funky assembly on ARM64.
What did you expect to see?
I'm compiling the executables with
On x86_64 (amd64) and arm architectures, I get reasonable results.
In amd64, we have a movl $0xff,(%rax) and movups %xmm15,0x38(%rsp) (see line 4828cd below). This correctly sets b, producing output ff000000. The instructions are concise.
On 32-bit arm, it's a similar story. mvn r1, #0 and strb r1, [r0] sets the first byte to 0xff, followed by three remaining strb r1 instructions to offsets from r0. See line a2d18 below. This works fine.
What did you see instead?
On the aarch64/arm64 architecture, something strange happens. It loads x3 with -1, but then the strb/sturh/strb instructions produce a 1-2-1 byte store sequence. See line 8e880 below. This does set the value correctly, but the 2-byte store in the middle is likely unaligned. The system I'm working with generates a SIGBUS for the unaligned multi-byte store. As a result, I cannot use the encoding/binary PutUint32 function to write directly to mmap'd sections, at least not without writing to a temporary first and then using copy to move the bytes from the temporary into mmap'd memory.
Four single-byte strb instructions would have worked fine, as would a single 4-byte str. But the construct below is problematic due to the unaligned access.
This could be an unintended optimization, since it is generally correct code, just a little awkward for something that should "PutUint32" and problematic/illegal for certain systems.
I would research and suggest a fix myself, but I am unfamiliar with how to find the assembly code generator for low level issues like this.