cmd/link: sigbus/segfault on ARM using AzCopy #38331

Open
gcormier opened this issue Apr 9, 2020 · 5 comments
Comments

@gcormier commented Apr 9, 2020

What version of Go are you using (go version)?

go version go1.14.2 linux/arm

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

GO111MODULE=""
GOARCH="arm"
GOBIN=""
GOCACHE="/home/pi/.cache/go-build"
GOENV="/home/pi/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/usr/local/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_arm"
GCCGO="gccgo"
GOARM="7"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/pi/azure-storage-azcopy-10.3.4/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -marm -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build366671593=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I compiled and ran azcopy. It will SOMETIMES segfault. I opened an issue (Azure/azure-storage-azcopy#882), and the developers suggest it is an issue with Go on ARM.

wget https://github.com/Azure/azure-storage-azcopy/archive/10.3.4.zip
unzip 10.3.4.zip
cd azure-storage-azcopy-10.3.4/
go build
go install

What did you expect to see?

Completed upload.

What did you see instead?

pi@pbsorca:~ $ azure-storage-azcopy copy --recursive "$RAWFS" "$RAWFS_SAS" --log-level warning
INFO: Scanning...

Job 768e77f4-9f04-ee48-7fa4-b287ee176e20 has started
Log file is located at: /home/pi/.azcopy/768e77f4-9f04-ee48-7fa4-b287ee176e20.log

0.0 %, 0 Done, 0 Failed, 10000 Pending, 0 Skipped, 10000 Total (scanning...), unexpected fault address 0x0
fatal error: fault
[signal SIGBUS: bus error code=0x1 addr=0x0 pc=0x1296c]

goroutine 164 [running]:
runtime.throw(0x60c151, 0x5)
        /usr/local/go/src/runtime/panic.go:1116 +0x5c fp=0x539b8fc sp=0x539b8e8 pc=0x460f0
runtime.sigpanic()
...

See attached for full trace.

azcopy-golang-seg.txt

@jeremyfaller (Contributor) commented Apr 9, 2020

Looking at the original issue, were you able to confirm or rule out alignment as the problem? It's not clear from the original thread, and it seems easy enough to test by checking alignment somewhere near:

https://github.com/Azure/azure-storage-azcopy/blob/master/common/atomicmorph.go#L16

Additionally, do you have a smaller repro case?

@gcormier (Author) commented Apr 9, 2020

I did try out their branch for alignment, but it did not resolve the issue.

Unfortunately I don't have a good repro case. AzCopy works on some folders and not on others. I will try to come up with some files that fail consistently.

@gcormier (Author) commented Apr 9, 2020

This file (attached) seems to fail (no need to decompress). Note that if you do decompress it, the FLAC inside also fails to upload.

Windows:
image

Pi/ARM:

pi@pbsorca:~ $ azure-storage-azcopy copy fail.zip "$FAILKEY"
INFO: Scanning...

Job 787f867d-b0f3-9546-489a-ace590fa54e7 has started
Log file is located at: /home/pi/.azcopy/787f867d-b0f3-9546-489a-ace590fa54e7.log

0.0 %, 0 Done, 0 Failed, 1 Pending, 0 Skipped, 1 Total, ^Cunexpected fault address 0x0
fatal error: fault
[signal SIGBUS: bus error code=0x1 addr=0x0 pc=0x1296c]

goroutine 130 [running]:
runtime.throw(0x60c151, 0x5)
        /usr/local/go/src/runtime/panic.go:1116 +0x5c fp=0x1cddaf0 sp=0x1cddadc pc=0x460f0
runtime.sigpanic()
...

fail.zip

@randall77 (Contributor) commented Apr 9, 2020

I strongly suspect alignment issues.

The address of the underlying structure is computed as:

	return (*JobPartPlanTransfer)(unsafe.Pointer((uintptr(unsafe.Pointer(jpph)) + unsafe.Sizeof(*jpph) + uintptr(jpph.CommandStringLength)) + (unsafe.Sizeof(JobPartPlanTransfer{}) * uintptr(transferIndex))))

I don't see any reason why CommandStringLength would be a multiple of 4.
It is initialized as:

		CommandStringLength:   uint32(len(order.CommandString)),

So I think a JobPartPlanTransfer just isn't written to the file at a properly aligned location.
You should be able to check this easily enough. Add an assertion that CommandStringLength is a multiple of 4 at every assignment and use.
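
The arithmetic behind this suspicion can be sketched in Go. The names `transferOffset`, `mustBeAligned`, and `roundUp`, and the sizes 64 and 32, are made up for illustration and do not come from the AzCopy source. The sketch only shows that a command-string length that is not a multiple of 4 shifts every following record off a 4-byte boundary. Padding the length with `roundUp` is one possible remedy, shown here as an assumption rather than as the fix AzCopy adopted.

```go
package main

import "fmt"

// transferOffset mirrors the arithmetic of the pointer expression quoted
// above: header size, plus command-string length, plus index * record size.
// All parameter names are hypothetical stand-ins.
func transferOffset(headerSize, cmdLen, recordSize, index uintptr) uintptr {
	return headerSize + cmdLen + recordSize*index
}

// mustBeAligned panics when n is not a multiple of align.
func mustBeAligned(name string, n, align uintptr) {
	if n%align != 0 {
		panic(fmt.Sprintf("%s = %d is not a multiple of %d", name, n, align))
	}
}

// roundUp rounds n up to the next multiple of align (a power of two).
func roundUp(n, align uintptr) uintptr {
	return (n + align - 1) &^ (align - 1)
}

func main() {
	const headerSize, recordSize = 64, 32 // made-up sizes, both multiples of 4

	for _, cmdLen := range []uintptr{40, 41} {
		off := transferOffset(headerSize, cmdLen, recordSize, 3)
		padded := roundUp(cmdLen, 4)
		mustBeAligned("padded cmdLen", padded, 4) // never panics after rounding
		fmt.Printf("cmdLen=%d offset=%d offset%%4=%d padded cmdLen=%d\n",
			cmdLen, off, off%4, padded)
	}
}
```

Calling `mustBeAligned` on the raw length at each assignment and use, as suggested above, would panic with a clear message at the first bad value instead of faulting later inside the runtime.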

@JohnRusk commented Apr 9, 2020

Thanks, @randall77. We (the AzCopy team) will look into that.

@andybons added this to the Unplanned milestone Apr 10, 2020
@andybons changed the title from "sigbus/segfault on ARM using AzCopy" to "cmd/link: sigbus/segfault on ARM using AzCopy" Apr 10, 2020