-
-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] static structs not zero after leaving bootloader mode #1408
[Bug] static structs not zero after leaving bootloader mode #1408
Comments
There are two ways to trigger it:
If there is a way to detect whether we just came out of bootloader update mode, maybe we could do a self reset to fix this. |
So I think we can do:
This should leave all existing experiences unchanged, and ensure a clean boot, I think? ** That doesn't work, presumably because the bootloader resets it, so maybe this means we should use a separate magic address that the LEGO bootloader does not look at, or find another way to tell it was a Pybricks reboot or a BLE update mode exit. |
Or is the zero-ing something we can just fix in the linker script and Or is it more likely to be something else entirely? 🤔 |
So likely this is something else, like writing past the end of some static array overwriting this. |
Could also be a GCC LTO bug, so trying a build with that disabled could be worthwhile. |
Thanks. Not-booting tends to vary build by build, perhaps depending on how static memory happens to be aligned, so maybe I'll have to keep an eye on both this and the non-LTO version for a while. It does appear at least deterministic once built, and the current version crashes well after uart is initialized, so fortunately there are some rudimentary debug tools. I was hoping to launch a quick beta to get some things tested before announcing a bigger new feature but maybe it isn't very practical that way. |
If we know one of the affected addresses, can we get any meaningful information by analyzing the binary? Even if just to confirm this is the kind of issue to look for, rather than something else entirely. |
I also wondered what the difference could be in that this only occurs when coming out of bootloader update mode. |
It could be useful to look at the |
If the bootloader doesn't disable interrupts, then I suppose it could be possible that an interrupt occurs that writes over that memory before we update the VTOR table in SystemInit. IIRC, the newer hubs starting with SPIKE Prime do disable interrupts, but the older hubs don't. So maybe we need to disable interrupts early in |
Like this? diff --git a/lib/pbio/platform/technic_hub/pbdrvconfig.h b/lib/pbio/platform/technic_hub/pbdrvconfig.h
index e3fbfb194..6e5321eb6 100644
--- a/lib/pbio/platform/technic_hub/pbdrvconfig.h
+++ b/lib/pbio/platform/technic_hub/pbdrvconfig.h
@@ -103,3 +103,4 @@
#define PBDRV_CONFIG_LAST_MOTOR_PORT PBIO_PORT_ID_D
#define PBDRV_CONFIG_SYS_CLOCK_RATE 80000000
+#define PBDRV_CONFIG_INIT_ENABLE_INTERRUPTS_ARM (1)
diff --git a/lib/pbio/platform/technic_hub/startup.s b/lib/pbio/platform/technic_hub/startup.s
index 61ac94791..faa57f716 100644
--- a/lib/pbio/platform/technic_hub/startup.s
+++ b/lib/pbio/platform/technic_hub/startup.s
@@ -71,20 +71,22 @@ defined in linker script */
* @param None
* @retval : None
*/
.section .text.Reset_Handler
.weak Reset_Handler
.type Reset_Handler, %function
Reset_Handler:
ldr sp, =_estack /* Atollic update: set stack pointer */
+ cpsid i /* Disable interrupts */
+
/* Copy the data segment initializers from flash to SRAM */
movs r1, #0
b LoopCopyDataInit
CopyDataInit:
ldr r3, =_sidata
ldr r3, [r3, r1]
str r3, [r0, r1]
Perhaps a silly question but just trying to wrap my head around this... So once turned back on, it will use our own ISRs as set up in our application's vector table, so nothing previously set up by the bootloader will run anymore? |
This does get the (at least one) affected build to boot after update (thanks!). Since this change probably doesn't affect static memory allocation, that seems more likely a good sign than accidental like we've seen across other build changes. |
Pretty much. I would even make it the first instruction in the file, so even before updating the stack pointer. (and mention in the comment that the bootloader doesn't do this which is why we have to do it here)
Correct.
Awesome! |
We appear to be getting corrupted data or nonzero static global structs when running the firmware after leaving BLE firmware update mode. Depending on the build, this could make the hub not boot after installation. Fixes pybricks/support#1408
I may be seeing some strange side effects of this. Just before releasing v3.4.0b2 last week, I was seeing some strange things on the Technic Hub. Programs would not be saved on the hub, and I was able to trace it back to It was consistent, but it would only show under a very specific set of circumstances, which may just be a coincidental combination of memory usage:
from pybricks.hubs import TechnicHub
from pybricks.iodevices import XboxController
from pybricks.parameters import Color
from pybricks.tools import wait
# Set up all devices.
technic_hub = TechnicHub()
controller = XboxController()
# The main program starts here.
print('Hello, Pybricks!')
technic_hub.light.on(Color.ORANGE)
# Cancel the program and turn the hub off.
# If you turn it back on, the program is not there.
wait(2000000) As I was working on the auto-disconnect step then, I thought that must have somehow caused it. I moved a few things around in pybricks/pybricks-micropython@ea0379d and it worked around it. That change wasn't bad generally, but it doesn't quite explain why it fixed saving programs. Today then, I went back to pybricks/pybricks-micropython@9610d03 which was the last commit that reproduced the above, and instead reverted the commit that fixed the boot issue (pybricks/pybricks-micropython@68828ac). This also fixed saving, and it seems a more likely candidate. tl;dr: Could the boot fix have side effects that we missed? Could it have caused the programs-not-saving issue? (And maybe other odd effects, in different builds. Maybe Gatt issues, etc.) Update: Ok, maybe reverting the interrupt boot fix didn't resolve that either. |
The not-saving issue seems unrelated, so closing this again. |
Describe the bug
This is a bit of a weird one. In the past, I think we've occasionally seen that a hub wouldn't immediately start after installing the firmware, and this may be related.
Sometimes, it's a bit worse. It appears that for some builds it does start, and then keeps restarting. In my current build it crashes before the lights come on, so it appears not to start at all. It can only start if you pull the batteries first.
So we can't release such a build. Even if it did boot, all bets are off with other undefined behavior.
Digging deeper, it appears to crash here which calls this.
Apparently,
self->parent_object != NULL
returns true, which causes the protection against calling the stop function to fail.But
dcmotor[i].parent.parent_object
is statically allocated and should beNULL
.So just a hunch, but maybe the LEGO bootloader happens to use the memory region that happens to overlap with this struct in this particular build?
Reproduce
This build is not released yet, but will be shortly.
The text was updated successfully, but these errors were encountered: