Unaligned access causing SIGBUS on Raspbian #84

Closed
CubicTec opened this issue May 23, 2023 · 17 comments
@CubicTec

Hardware: Raspberry Pi 4B
OS: Raspbian GNU/Linux 11 (bullseye)
EPICS version: 7.0.5, 7.0.6.1, 7.0.7

I built an IOC application with a simple db file:

#scale.db
record(calc,"Scale$(N)"){
field(INPA,"Scale$(N).VAL")
field(CALC,"A+1")
field(SCAN,".1 second")
}

I added the pvAccess server (QSRV) in the App/src/Makefile:

#Link QSRV (pvAccess Server) if available
ifdef EPICS_QSRV_MAJOR_VERSION
siocScle_LIBS += qsrv
siocScle_LIBS += $(EPICS_BASE_PVA_CORE_LIBS)
siocScle_DBD += PVAServerRegister.dbd
siocScle_DBD += qsrv.dbd
endif

After running make, I start the IOC application and can caget a channel, for example Scale1. But when I use pvget, the IOC exits in the terminal where it is running. The last message is "epics> Bus error".

@mdavidsaver
Member

Please use gdb to capture a stack trace of all threads. eg.

$ gdb --args siocScle st.cmd
(gdb) run
...

Now do what is necessary to trigger a crash.

(gdb) thread apply all backtrace
...

@CubicTec
Author

CubicTec commented May 24, 2023

#looks like gdb doesn't support it
[Screenshot 2023-05-24 104050]

pi@raspberrypi:~/EPICS/ioc/iocBoot/siocScle $ ls
envPaths  Makefile  st.cmd
pi@raspberrypi:~/EPICS/ioc/iocBoot/siocScle $ gdb st.cmd
GNU gdb (Raspbian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
"0xfffc16a4s": not in executable format: file format not recognized
(gdb) run
Starting program:
No executable file specified.
Use the "file" or "exec-file" command.
(gdb) file st.cmd
"0xfffc159cs": not in executable format: file format not recognized
(gdb) exec-file st.cmd
"0xfffc15acs": not in executable format: file format not recognized
(gdb) exec-file ./st.cmd
"0xfffc15acs": not in executable format: file format not recognized
(gdb)

@mdavidsaver
Member

#looks like gdb doesn't support it

gdb works just fine if provided with the executable binary file. As you found, it doesn't understand a script file. The invocation I meant was:

$ gdb --args siocScle st.cmd
(gdb) run
...

If you are working in ~/EPICS/ioc/iocBoot/siocScle/ this will be something like:

$ gdb --args ../../bin/linux-arm/siocScle st.cmd

For details see https://sourceware.org/gdb/current/onlinedocs/gdb.html/Invoking-GDB.html#Invoking-GDB

@CubicTec
Author

CubicTec commented May 24, 2023

Thanks a lot for your instruction!
Looks like we found the problem.
[Screenshot: Capture]

Thread 15 "UDP-rx 172.16.7" received signal SIGBUS, Bus error.
[Switching to Thread 0xf5bfc440 (LWP 2076)]
0xf7a1eac4 in epics::pvData::PVScalarValue<long long>::deserialize(epics::pvData::ByteBuffer*, epics::pvData::DeserializableControl*) () from /home/pi/EPICS/base/lib/linux-arm/libpvData.so.8.0.5

@mdavidsaver
Member

Now do what is necessary to trigger a crash.

(gdb) thread apply all backtrace

Once the crash has dropped you back to the GDB prompt, run the above command. This will produce a very large amount of text. Please copy all of this text into a file and attach that file to this issue. (Please, no screenshots!)

@CubicTec
Author

CubicTec commented May 25, 2023

I hope I'm doing this right.
[Screenshot: Capture]
The full trace is in the attached text file, as requested.
backtrace.txt

@mdavidsaver
Member

The faulting frame.

Thread 15 (Thread 0xf5bfc440 (LWP 2076) "UDP-rx 172.16.7"):
#0  0xf7a1eac4 in epics::pvData::PVScalarValue<long long>::deserialize(epics::pvData::ByteBuffer*, epics::pvData::DeserializableControl*) () from /home/pi/EPICS/base/lib/linux-arm/libpvData.so.8.0.5
#1  0xf79fc530 in epics::pvData::PVStructure::deserialize(epics::pvData::ByteBuffer*, epics::pvData::DeserializableControl*) () from /home/pi/EPICS/base/lib/linux-arm/libpvData.so.8.0.5
#2  0xf7b90434 in (anonymous namespace)::BeaconResponseHandler::handleResponse(osiSockAddr*, std::shared_ptr<epics::pvAccess::Transport> const&, signed char, signed char, unsigned int, epics::pvData::ByteBuffer*) () from /home/pi/EPICS/base/lib/linux-arm/libpvAccess.so.7.1.6
#3  0xf7b51008 in epics::pvAccess::BlockingUDPTransport::processBuffer(std::shared_ptr<epics::pvAccess::Transport> const&, osiSockAddr&, epics::pvData::ByteBuffer*) () from /home/pi/EPICS/base/lib/linux-arm/libpvAccess.so.7.1.6
#4  0xf7b525c8 in epics::pvAccess::BlockingUDPTransport::run() () from /home/pi/EPICS/base/lib/linux-arm/libpvAccess.so.7.1.6
#5  0xf7e03e10 in epicsThreadCallEntryPoint () from /home/pi/EPICS/base/lib/linux-arm/libCom.so.3.22.0
#6  0xf7e08ce4 in start_routine () from /home/pi/EPICS/base/lib/linux-arm/libCom.so.3.22.0
#7  0xf7702310 in start_thread (arg=0xf5bfc440) at pthread_create.c:477
#8  0xf7d19da8 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6

@mdavidsaver
Member

Could you try making this change? Then re-run make for epics-base and re-test.

diff --git a/modules/pvData/src/misc/pv/byteBuffer.h b/modules/pvData/src/misc/pv/byteBuffer.h
index e507f15..4a71c8a 100644
--- a/modules/pvData/src/misc/pv/byteBuffer.h
+++ b/modules/pvData/src/misc/pv/byteBuffer.h
@@ -157,7 +157,7 @@ struct swap<8> {
  * in execution time and/or object code size of byte-wise copy.
  */
 
-#ifdef _ARCH_PPC
+#if defined(_ARCH_PPC) || defined(__arm__) || defined(_M_ARM)
 
 template<typename T>
 union alignu {
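
The header comment in the diff context mentions a byte-wise copy. As a rough illustration of that general technique (not the pvData code itself), the sketch below reads and writes a value at a possibly misaligned address by copying bytes rather than dereferencing a typed pointer. The names `load_unaligned` and `store_unaligned` are hypothetical, and `std::memcpy` is used here as a stand-in for whatever copy the `alignu` union performs.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of pvData): read a T from an address that
// may not be aligned for T. Copying bytes avoids dereferencing a T* that
// points at a misaligned address.
template<typename T>
T load_unaligned(const void* p) {
    T value;
    std::memcpy(&value, p, sizeof(T));
    return value;
}

// Hypothetical helper: write a T to a possibly misaligned address.
template<typename T>
void store_unaligned(void* p, T value) {
    std::memcpy(p, &value, sizeof(T));
}
```

For example, `load_unaligned<std::int64_t>(buf + 1)` reads eight bytes starting one byte into a buffer, which is the kind of offset a network message can easily produce.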

@CubicTec
Author

CubicTec commented May 25, 2023

It works. I can pvget/pvmonitor now. Thank you!!!

@CubicTec reopened this May 25, 2023
@CubicTec
Author

I'm gonna check if it runs well on my colleague's Macbook with M1.

@mdavidsaver
Member

Hardware: Raspberry Pi 4B
OS: Raspbian GNU/Linux 11 (bullseye)

Can you provide any more detail about your system?

Is it running a custom/local Linux kernel build?

I have a pi3b with stock Raspbian, which seems to be able to handle unaligned memory access transparently. So I don't see any SIGBUS.

A couple of things to check:

$ uname -a
Linux raspberrypi 6.1.21-v7+ #1642 SMP Mon Apr  3 17:20:52 BST 2023 armv7l GNU/Linux

Assuming a Raspbian kernel config...

$ sudo modprobe configs
$ zgrep ALIGN /proc/config.gz      
CONFIG_DEBUG_ALIGN_RODATA=y
CONFIG_ALIGNMENT_TRAP=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_CMA_ALIGNMENT=8

@mdavidsaver
Member

Also, check /proc/cpu/alignment.

$ cat /proc/cpu/alignment 
User:           218222
System:         0 (0x0)
Skipped:        0
Half:           0
Word:           0
DWord:          70291
Multi:          143627
User faults:    2 (fixup)
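
The number on the "User faults" line is the current mode. In this thread 2 is reported as "fixup" and 3 as "fixup+warn", which suggests a bit mask with warn in bit 0 and fixup in bit 1. The sketch below pulls that number out of the text, assuming the layout shown above. All function names are hypothetical, and the bit interpretation is inferred from those two values only.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: return the numeric mode from the "User faults:" line
// of /proc/cpu/alignment-style text, or -1 if the line is missing or has no
// leading number.
int parse_user_faults_mode(const std::string& text) {
    const std::string key = "User faults:";
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            std::istringstream rest(line.substr(key.size()));
            int mode = -1;
            if (rest >> mode) {
                return mode;
            }
            return -1;
        }
    }
    return -1;
}

// Assumption inferred from the thread: bit 0 = warn, bit 1 = fixup.
bool mode_has_warn(int mode)  { return mode >= 0 && (mode & 1) != 0; }
bool mode_has_fixup(int mode) { return mode >= 0 && (mode & 2) != 0; }
```

Fed the output above, `parse_user_faults_mode` yields 2, for which `mode_has_fixup` is true and `mode_has_warn` is false.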

@mdavidsaver
Member

If I switch to "fixup+warn" mode

# echo 3 > /proc/cpu/alignment

dmesg then shows what are probably the same accesses that result in a SIGBUS for @CubicTec:

$ dmesg
...
[1770496.191207] Alignment trap: UDP-rx 192.168. (884) PC=0x74f02700 Instr=0xe8930006 Address=0x00f3f6d6 FSR 0x001
[1770496.191207] Alignment trap: UDP-rx 192.168. (907) PC=0x74e5c700 Instr=0xe8930006 Address=0x01d1018e FSR 0x001
...

In "fixup" mode I don't see any issues.

# echo 2 > /proc/cpu/alignment

A very informative post I came across.
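
As a side note on the trap lines above: read with the classic 32-bit ARM (A32) encoding, Instr=0xe8930006 looks like a load-multiple (LDM) from r3 into r1 and r2, and both reported addresses are not multiples of 4. That would be consistent with the large "Multi" counter in /proc/cpu/alignment. This reading is my own interpretation, not something stated in the kernel log. The sketch below decodes only the fields needed for that check, and the struct and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical decoder for the A32 block data transfer (LDM/STM) format:
// cond[31:28] 100[27:25] P U S W L[20] Rn[19:16] register_list[15:0]
struct BlockTransfer {
    bool is_block_transfer;   // bits 27..25 == 0b100
    bool load;                // L bit: true for LDM, false for STM
    unsigned base_reg;        // Rn
    std::uint16_t reg_list;   // bit n set means register rn is transferred
};

BlockTransfer decode_block_transfer(std::uint32_t instr) {
    BlockTransfer d;
    d.is_block_transfer = ((instr >> 25) & 0x7u) == 0x4u;
    d.load = ((instr >> 20) & 0x1u) != 0;
    d.base_reg = (instr >> 16) & 0xFu;
    d.reg_list = static_cast<std::uint16_t>(instr & 0xFFFFu);
    return d;
}

// True when the address is a multiple of 4 bytes.
bool is_word_aligned(std::uint32_t address) {
    return (address & 0x3u) == 0;
}
```

Applied to the log, `decode_block_transfer(0xe8930006)` reports a load with base register 3 and register list 0x0006, and `is_word_aligned` is false for both 0x00f3f6d6 and 0x01d1018e.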

@mdavidsaver transferred this issue from epics-base/epics-base May 25, 2023
@mdavidsaver changed the title from "pvAccess Server have trouble running on Respberry Pi" to "Unaligned access causing SIGBUS on Raspbian" May 25, 2023
@mdavidsaver changed the title from "Unaligned access causing SIGBUS on Raspbian" to "Unaligned access causing SIGBUS on Raspbian" May 25, 2023
@mdavidsaver self-assigned this May 25, 2023
@CubicTec
Author

I am using the official Raspberry Pi OS release.

Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux

Unfortunately I can't check the alignment.
[WeChat screenshot 20230525161904]

@mdavidsaver
Member

Ah, you are running a 64-bit (aarch64) kernel, while my pi3 has 32-bit (armv7l). This certainly explains the absence of /proc/cpu/alignment, and probably the difference in alignment handling as well.

Anyway, I've merged a fix (#85) which should have the same effect as the patch which you tested.

Thank you for reporting this!

@CubicTec
Author

CubicTec commented May 31, 2023

Still works well on M1.

@CubicTec
Author

Glad to help!
