remote nodes with admin channels reboot sometimes #811

Closed
geeksville opened this issue May 25, 2021 · 9 comments
Labels
bug Something isn't working

Comments

@geeksville
Member

Coming back to this again after quite a bit of testing, I have now managed to identify a somewhat more reproducible way of causing resets on T-Beam v1.0 boards over the air.

A single node running 1.2.30 with one encrypted channel can run indefinitely without resets (I tested 23+ h without a single reset).

With four nodes on the mesh, all flashed with 1.2.30, I see a reset every ~24 min or so (based on about 25 resets in 10 h). These occur even if no messages are actively sent. On the two nodes that have a screen, I can see that they often reset almost simultaneously.
I've tested this with the nodes connected via Bluetooth and not, and with and without admin channels enabled.
The most reproducible way has been to have a remote node with an admin channel enabled. Running meshtastic --debug against a USB-connected node triggers a reset in the remote node (if the remote device is sleeping, one might have to wait a couple of minutes).
I am unable to reproducibly induce this behaviour if the remote node does not have an admin channel enabled.
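The ~24 min figure is simply the observation window divided by the reset count (10 h × 60 / 25). A minimal Python helper for repeating that calculation on other test runs; the function name is hypothetical and nothing here comes from the Meshtastic code base:

```python
def mean_reset_interval_min(observed_hours: float, reset_count: int) -> float:
    """Average minutes between resets over an observation window."""
    if reset_count <= 0:
        raise ValueError("need at least one reset to compute an interval")
    return observed_hours * 60.0 / reset_count


# Figures from the report above: about 25 resets in 10 hours.
print(f"~{mean_reset_interval_min(10, 25):.0f} min between resets")  # ~24 min between resets
```

This is only an average; it says nothing about whether the resets are evenly spaced or clustered (for example, several nodes resetting at once).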

I therefore thought the resets occur only if the admin channel is enabled on the remote node. However, after reflashing and activating only the primary channel on all nodes, two nodes still reset after a few minutes (even without meshtastic --debug) just as I was writing this. Both were running without a USB connection; externally powered nodes don't seem to reset.

It seems that whatever is sent periodically on the mesh without user interaction might also be sent when running meshtastic --debug, and that this can cause the resets.

Can anybody else reproduce this? As mentioned previously, I cannot seem to reproduce this if the nodes are externally powered.

...

I can see that, too. Whether it’s a LoRa32 or a T-Beam, powered by battery or via USB, they would eventually reboot. They all have the admin channel configured.

Plus, some nodes (of course it's always the remote ones!) would eventually freeze, requiring a manual reboot. They would run for one, two or three days, then simply freeze.

from:
https://meshtastic.discourse.group/t/new-device-release-1-2-30-ready-for-alpha-testing/3272/20
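To compare reset rates between configurations (admin channel on or off, battery vs. external power), one option is to capture each node's serial output to a file and count crash markers. A minimal sketch, assuming the markers are the "Guru Meditation Error" and "Rebooting..." strings that appear in the panic output quoted further down this thread. A node that freezes without printing anything, as described above, will not be counted:

```python
from typing import Iterable

# Marker strings copied from the ESP32 panic output quoted in this issue.
PANIC_MARKER = "Guru Meditation Error"
REBOOT_MARKER = "Rebooting..."


def count_crashes(lines: Iterable[str]) -> dict:
    """Count panic and reboot markers in a captured serial log."""
    counts = {"panics": 0, "reboots": 0}
    for line in lines:
        if PANIC_MARKER in line:
            counts["panics"] += 1
        if line.strip() == REBOOT_MARKER:
            counts["reboots"] += 1
    return counts


# Works on a list of lines or directly on an open log file.
sample = [
    "09:47:34 2076 [Router] Installing AES128 key!",
    "09:47:34 2076 [Router] Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.",
    "Rebooting...",
]
print(count_crashes(sample))  # {'panics': 1, 'reboots': 1}
```

Passing an open file object (e.g. one opened with errors="replace" to tolerate serial noise) works the same way, since iterating a file yields its lines.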

@geeksville geeksville self-assigned this May 25, 2021
@geeksville geeksville changed the title "nodes reboot sometimes" to "remote nodes with admin channels reboot sometimes" May 25, 2021
@geeksville
Member Author

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/new-release-of-python-api-1-2-35-and-geeksvilles-current-queue/3398/4

@geeksville geeksville added the bug Something isn't working label May 25, 2021
@IZ1IVA
Contributor

IZ1IVA commented May 25, 2021

@geeksville I'm running a T-Beam without a battery, powered from USB, with the admin channel configured. Here's what happens before a spontaneous reboot:

09:46:34 2015 [PowerFSM] GPS prepare sleep!
09:47:04 2045 [PowerFSM] GPS prepare sleep!
09:47:04 2045 [Power] Battery: usbPower=1, isCharging=0, batMv=0, batPct=0
09:47:04 2045 [PowerFSM] GPS prepare sleep!
09:47:34 2075 [PowerFSM] GPS prepare sleep!
09:47:34 2075 [Power] Battery: usbPower=1, isCharging=0, batMv=0, batPct=0
09:47:34 2075 [RadioIf] (bw=125, sf=12, cr=4/8) packet symLen=32 ms, payloadSize=42, time 3645 ms
09:47:34 2075 [RadioIf] Lora RX (id=0x6bbfdeed Fr0x34 To0xd8, WantAck0, HopLim5 Ch0xb1 encrypted rxSNR=10.5)
09:47:34 2075 [RadioIf] AirTime - Packet received : 3645ms
09:47:34 2076 [Router] Adding packet record (id=0x6bbfdeed Fr0x34 To0xd8, WantAck0, HopLim5 Ch0xb1 encrypted rxSNR=10.5)
09:47:34 2076 [Router] Using channel 0 (hash 0xb1)
09:47:34 2076 [Router] Expanding short PSK #1
09:47:34 2076 [Router] Installing AES128 key!
09:47:34 2076 [Router] Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x400014e8 PS : 0x00060b30 A0 : 0x8011f720 A1 : 0x3ffd1560
A2 : 0x0000001a A3 : 0x00000018 A4 : 0x000000ff A5 : 0x0000ff00
A6 : 0x00ff0000 A7 : 0xff000000 A8 : 0x00000000 A9 : 0x00000008
A10 : 0x3ffd51b8 A11 : 0x3ffd175c A12 : 0x3ffd51c0 A13 : 0x3ffb28c4
A14 : 0x0000002a A15 : 0x3ffd19f0 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000018 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffffd

ELF file SHA256: 0000000000000000

Backtrace: 0x400014e8:0x3ffd1560 0x4011f71d:0x3ffd1570 0x401284ee:0x3ffd1880 0x4012852a:0x3ffd1910 0x400d45cd:0x3ffd1950 0x400d4757:0x3ffd1990 0x400e547e:0x3ffd19e0 0x400def71:0x3ffd1a10 0x400df11a:0x3ffd1a30 0x400df1fa:0x3ffd1a50 0x400df219:0x3ffd1a70 0x400db1b9:0x3ffd1aa0 0x400d4e62:0x3ffd1ac0 0x400f1e21:0x3ffd1ae0 0x400da5d4:0x3ffd1b10 0x401022bd:0x3ffd1b30

Rebooting...
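EXCCAUSE 0x1c is 28, the Xtensa LoadProhibited cause named in the panic line, and EXCVADDR holds the address the failed load tried to read: 0x18. An address that small typically means a NULL (or near-NULL) pointer plus a small offset was dereferenced. A short sketch for pulling both values out of a register dump; the cause table is an illustrative subset of the Xtensa codes, not the full list:

```python
import re

# A few Xtensa EXCCAUSE values (illustrative subset, not the full table).
EXCCAUSE_NAMES = {
    0: "IllegalInstruction",
    6: "IntegerDivideByZero",
    9: "LoadStoreAlignment",
    20: "InstFetchProhibited",
    28: "LoadProhibited",
    29: "StoreProhibited",
}


def _hex_field(dump: str, name: str) -> int:
    """Return the hex value printed after 'NAME:' (or 'NAME :') in a register dump."""
    match = re.search(rf"\b{name}\s*:\s*0x([0-9a-fA-F]+)", dump)
    if match is None:
        raise ValueError(f"{name} not found in dump")
    return int(match.group(1), 16)


def decode_fault(dump: str) -> tuple:
    """Return (exception cause name, faulting virtual address)."""
    cause = _hex_field(dump, "EXCCAUSE")
    name = EXCCAUSE_NAMES.get(cause, f"unknown cause {cause}")
    return name, _hex_field(dump, "EXCVADDR")


dump = """A14 : 0x0000002a A15 : 0x3ffd19f0 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000018 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffffd"""
name, vaddr = decode_fault(dump)
print(name, hex(vaddr))  # LoadProhibited 0x18
```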

@geeksville
Member Author

geeksville commented May 25, 2021 via email

@IZ1IVA
Contributor

IZ1IVA commented May 25, 2021

firmwareVersion is 1.2.30.80e4bc6

Cheers!

@geeksville
Member Author

@michelepagot it is bin/exception_decoder.py in this git repo (someone donated it some time ago, and I bet it came from one of those places). Usage: bin/exception_decoder.py -e <path to ELF file> <file containing the exception message>
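For reference, a rough sketch of the first step such a decoder performs: take the PC half of each PC:SP pair on the Backtrace line and hand the addresses, together with the ELF, to the toolchain's addr2line. This is an assumption about the general technique, not a description of how bin/exception_decoder.py is written. It also assumes the ESP32 internal ROM occupies 0x40000000 to 0x4005FFFF; ROM code is not in firmware.elf, which would explain why the first address (0x400014e8) does not appear in the decoded stack posted in this thread.

```python
import re

# Assumed ESP32 internal ROM range (0x40000000 to 0x4005FFFF). ROM code is not
# part of firmware.elf, so PCs in this range cannot be resolved from it.
ROM_START, ROM_END = 0x40000000, 0x40060000

# Each backtrace entry is "0x<PC>:0x<SP>"; only the PC is useful for addr2line.
PAIR_RE = re.compile(r"0x([0-9a-fA-F]+):0x([0-9a-fA-F]+)")


def backtrace_pcs(line: str) -> list:
    """Extract the program counter from each PC:SP pair on a 'Backtrace:' line."""
    return [int(pc, 16) for pc, _sp in PAIR_RE.findall(line)]


def addr2line_argv(elf_path: str, pcs: list) -> list:
    """Build an addr2line command line for the PCs that live in the ELF, not ROM."""
    resolvable = [f"0x{pc:08x}" for pc in pcs if not ROM_START <= pc < ROM_END]
    # -p pretty-print, -f function names, -i inlined frames, -a addresses, -C demangle
    return ["xtensa-esp32-elf-addr2line", "-pfiaC", "-e", elf_path, *resolvable]


bt = "Backtrace: 0x400014e8:0x3ffd1560 0x4011f71d:0x3ffd1570 0x401284ee:0x3ffd1880"
print(" ".join(addr2line_argv(".pio/build/tbeam/firmware.elf", backtrace_pcs(bt))))
# xtensa-esp32-elf-addr2line -pfiaC -e .pio/build/tbeam/firmware.elf 0x4011f71d 0x401284ee
```

The returned list could be passed to subprocess.run once the Xtensa toolchain is on PATH; the sketch only builds the command so it runs without the toolchain installed.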

It warms my heart that you asked and that you might be doing more to extend/fix the device code in the future. ;-)

@geeksville
Member Author

(Also, I just noticed we aren't keeping ELF files in the GitHub artifacts. No problem for now because I can rebuild locally, but I'll update the GitHub Actions to keep ELFs in a separate artifact.)

@geeksville
Member Author

Investigating, but here's the decoded stack trace:

~/development/meshtastic/meshtastic-esp32$ bin/exception_decoder.py -e .pio/build/tbeam/firmware.elf ex
stack:
0x4011f71d: _svfprintf_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vfprintf.c:1529
0x401284ee: _vsnprintf_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vsnprintf.c:72
0x4012852a: vsnprintf at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vsnprintf.c:41
0x400d45cd: RedirectablePrint::vprintf(char const*, __va_list_tag) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/RedirectablePrint.cpp:37
0x400d4757: RedirectablePrint::logDebug(char const*, ...) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/RedirectablePrint.cpp:96
0x400e547e: pb_decode_from_bytes(unsigned char const*, unsigned int, pb_msgdesc_s const*, void*) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/mesh-pb-constants.cpp:33
0x400def71: perhapsDecode(_MeshPacket*) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/Router.cpp:176
0x400df11a: Router::handleReceived(_MeshPacket*) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/Router.cpp:176
0x400df1fa: Router::perhapsHandleReceived(_MeshPacket*) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/Router.cpp:176
0x400df219: Router::runOnce() at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/Router.cpp:176
0x400db1b9: ReliableRouter::runOnce() at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/DSRRouter.cpp:240
0x400d4e62: concurrency::OSThread::run() at /home/kevinh/development/meshtastic/meshtastic-esp32/src/concurrency/OSThread.cpp:45
0x400f1e21: ThreadController::runOrDelay() at /home/kevinh/development/meshtastic/meshtastic-esp32/.pio/libdeps/tbeam/Thread/ThreadController.cpp:153
0x400da5d4: loop() at /home/kevinh/development/meshtastic/meshtastic-esp32/src/main.cpp:653
0x401022bd: loopTask(void*) at /home/kevinh/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp:19
~/development/meshtastic/meshtastic-esp32$ 
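Read innermost first, the trace shows the fault inside newlib's printf machinery (vsnprintf, _vsnprintf_r, _svfprintf_r), called from RedirectablePrint::logDebug, which was in turn called from pb_decode_from_bytes at mesh-pb-constants.cpp:33 while Router's perhapsDecode handled the received packet. So the panic happened while writing a debug log line, not inside the protobuf decoder itself. Together with the LoadProhibited at 0x18, that is consistent with a bad pointer reaching the formatter (either the format string or a %s argument), although the trace alone does not show which. A small helper that condenses decoder output into "function (file:line)" entries, assuming the "0xADDR: function at path:line" layout shown above:

```python
import re
from pathlib import PurePosixPath

# Matches decoder lines such as "0x400d4757: Foo::bar(int) at /path/to/file.cpp:96".
FRAME_RE = re.compile(r"^(0x[0-9a-f]+): (.+) at (\S+):(\d+)$")


def summarize(decoded: str) -> list:
    """Turn decoder output into 'function (file:line)' entries, innermost first."""
    frames = []
    for line in decoded.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:  # skips "stack:", shell prompts and anything else
            _addr, func, path, lineno = m.groups()
            frames.append(f"{func} ({PurePosixPath(path).name}:{lineno})")
    return frames


decoded = """stack:
0x4011f71d: _svfprintf_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vfprintf.c:1529
0x400d4757: RedirectablePrint::logDebug(char const*, ...) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/RedirectablePrint.cpp:96
0x400e547e: pb_decode_from_bytes(unsigned char const*, unsigned int, pb_msgdesc_s const*, void*) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/mesh-pb-constants.cpp:33"""
for frame in summarize(decoded):
    print(frame)
```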

@michelepagot
Contributor

It is also already documented at https://meshtastic.org/docs/software/other/build-instructions

3 participants