NimBLE scan stops generating results, then crashes when resumed. (IDFGH-2001) #4196
Comments
I'm not 100% sure whether this is related, but it occurs in Bluedroid too: if you have many results, the scan gets stuck after a while. A workaround is to scan for a short period (1 minute), pause for a while (10 seconds), and then scan again, instead of trying to scan forever. |
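The scan/pause cycle suggested above could be driven by a small timing helper along these lines. This is only a sketch: the 60 s and 10 s values come from the comment, while `scan_cycle_t`, `scan_should_be_active` and `scan_cycle_tick` are hypothetical names invented for illustration. The caller would presumably start the scan (e.g. `ble_gap_disc`) on a return value of +1 and cancel it (e.g. `ble_gap_disc_cancel`) on -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Duty cycle from the comment above: scan for 60 s, pause for 10 s. */
#define SCAN_ON_MS  60000u
#define SCAN_OFF_MS 10000u

/* Hypothetical helper: true when the scan should be running at now_ms,
 * measured from the moment the first scan was started. */
static bool scan_should_be_active(uint32_t now_ms)
{
    uint32_t phase = now_ms % (SCAN_ON_MS + SCAN_OFF_MS);
    return phase < SCAN_ON_MS;
}

/* Hypothetical state holder: remembers whether we believe the scan is on. */
typedef struct {
    bool scanning;
} scan_cycle_t;

/* Call periodically. Returns +1 if the caller should start scanning,
 * -1 if it should stop, and 0 if no change is needed. */
static int scan_cycle_tick(scan_cycle_t *c, uint32_t now_ms)
{
    assert(c != NULL);
    bool want = scan_should_be_active(now_ms);
    if (want == c->scanning) {
        return 0;
    }
    c->scanning = want;
    return want ? 1 : -1;
}
```

Keeping the timing decision separate from the BLE calls makes it easy to check on a host machine. It only reduces how long each scan runs; it does not address the underlying fault.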
Hi @jimmo I am trying to reproduce this issue with the above mentioned settings using
Can you please confirm if you have not set
Can you please tell me where exactly you are calling
I guess this might be the differentiating factor between our setups. I will try to replicate this at my end and update you. |
@prasad-alatkar Thanks for looking into this!
We set
We call cancel from the main MicroPython task. To elaborate -- the MicroPython firmware gives me a REPL prompt on the device, from which I call a Python function which internally calls

When the disc callback is invoked, it copies the data into a ringbuffer and notifies the MicroPython task, which then schedules a Python callback that prints out the scan data. (When I referred to making the callback a no-op, this is the code I'm talking about.)

After some time interval (usually 10-15 mins in my testing at ~50 results/s, but I've had reports from users that this can take many hours at a slower rate), I stop seeing the print output from my Python callback. So that's where I interactively stop the scan from the REPL, then re-start it, at which point I see the crash described in rwble.c.

The main MicroPython task is created from app_main().
|
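For readers unfamiliar with the pattern described above (the disc callback copies each result into a ring buffer, and another task drains it later), here is a minimal sketch. It is not MicroPython's actual implementation: `ringbuf_t`, `rb_put`, `rb_get` and the 256-byte capacity are all hypothetical, and a real firmware version would also need locking or a critical section between the two tasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RB_CAPACITY 256u  /* illustrative size, not taken from the thread */

typedef struct {
    uint8_t  data[RB_CAPACITY];
    size_t   head;     /* next write index */
    size_t   tail;     /* next read index */
    size_t   used;     /* bytes currently stored */
    uint32_t dropped;  /* records rejected because the buffer was full */
} ringbuf_t;

static void rb_init(ringbuf_t *rb)
{
    memset(rb, 0, sizeof(*rb));
}

/* Producer side (the scan callback in this analogy): store one record as
 * [len][payload]. The whole record is dropped if it does not fit. */
static bool rb_put(ringbuf_t *rb, const uint8_t *payload, uint8_t len)
{
    assert(rb != NULL && (payload != NULL || len == 0));
    if ((size_t)len + 1u > RB_CAPACITY - rb->used) {
        rb->dropped++;
        return false;
    }
    rb->data[rb->head] = len;
    rb->head = (rb->head + 1u) % RB_CAPACITY;
    for (uint8_t i = 0; i < len; i++) {
        rb->data[rb->head] = payload[i];
        rb->head = (rb->head + 1u) % RB_CAPACITY;
    }
    rb->used += (size_t)len + 1u;
    return true;
}

/* Consumer side (the task that prints results): copy the next record into
 * out. Returns the payload length, or -1 if the buffer is empty or out is
 * too small for the next record. */
static int rb_get(ringbuf_t *rb, uint8_t *out, size_t out_size)
{
    assert(rb != NULL && out != NULL);
    if (rb->used == 0) {
        return -1;
    }
    uint8_t len = rb->data[rb->tail];
    if (len > out_size) {
        return -1;
    }
    rb->tail = (rb->tail + 1u) % RB_CAPACITY;
    for (uint8_t i = 0; i < len; i++) {
        out[i] = rb->data[rb->tail];
        rb->tail = (rb->tail + 1u) % RB_CAPACITY;
    }
    rb->used -= (size_t)len + 1u;
    return (int)len;
}
```

Counting dropped records makes it possible to tell "the consumer fell behind" apart from "the callback stopped being called", which is the distinction that matters in this issue.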
Hi @jimmo, apologies for late reply. FYI, I was able to reproduce the issue with high advertisement rate. Though I am still looking into the root-cause, can you try disabling Please share your valuable observations. |
@prasad-alatkar Great that you were able to repro the issue, and thanks again for looking into it! As you can probably see in our sdkconfig.h, we're using the default value of the sleep params (FYI MicroPython still uses tools/kconfig_new/confgen.py, so I just added Unfortunately I was still able to repro the issue. I actually got it to happen after a very short amount of time (less than a minute), and then again after about 15 minutes. |
hi @prasad-alatkar, as @jimmo mentioned, this seems very similar to my issue #4001 which includes a coredump of the esp_32 when the error occurs which may give you greater insight into the operation of the BT system when it fails. |
Hi @prasad-alatkar I am tracking release/v3.3 for my IDF toolchain. |
Hi @0neblock the issue is not yet fixed, I will update you as soon as it is fixed. |
@prasad-alatkar Apologies for pushing again, but is there any news here? Any workaround is highly appreciated. |
Hi @pschlang , @csushantk is working on fixing the issue in controller. I am also working on workarounds in host, however it may take some time to fix this issue. I will update you as soon as it is fixed. |
Hi @prasad-alatkar, we also observe this issue in 4.0-rc1. We urgently need a workaround for this. Can you give some more details on how to avoid this issue or how we could detect and recover from it? |
+1 if a workaround exists, that would be great until we have a solid fix. |
@prasad-alatkar Any news regarding the fix? |
Hi @dmartauz Can you please try and test on latest master ? commits |
@prasad-alatkar Great, thanks. I will test in a week or two. |
@prasad-alatkar is this expected to fix Bluedroid as well? |
@prasad-alatkar thanks for this. When you say "are expected to resolve the crash issue." does that mean that your previously-failing test case now works? Any chance you'd be able to back-port to v4.0? I don't have the time right now to move MicroPython to v4.1-beta1 (and we'd rather stick to the release version anyway). A few people have contacted us asking about this issue. |
Hi @jimmo , the changes are backported to v4.0 as well.
These fixes address the bugs that were found while debugging this issue. Please let me know if it resolves your issue. @0neblock sorry for delayed response. As the fix is in controller, I believe it will be applicable to bluedroid as well (CC: @csushantk ) |
Thanks @prasad-alatkar, |
@prasad-alatkar ah I was looking out for a new tag, but it seems that you have just moved the existing v4.0 tag? That's going to create some confusion. Can you do point releases in the future? Thanks for the confirmation, I will try and get some people to test it ASAP. I can confirm that MicroPython builds cleanly against the new v4.0 revision tag, I will update here when I hear back from our users. |
This is a "re-release" of v4.0. The "v4.0" tag was updated to include some backported fixes. The main one is espressif/esp-idf#4196
Hi @prasad-alatkar, the esp32-bt-lib on release/v3.3 hasn't been updated for a month. Is the fix definitely in there? |
Hi @0neblock I think commit |
The main fix relevant to MicroPython is espressif/esp-idf#4196 Release notes here https://github.com/espressif/esp-idf/releases/tag/v4.0.1
Sorry @prasad-alatkar for my delay. I can confirm that disabling 'CONFIG_BLE_ADV_REPORT_FLOW_CONTROL_SUPPORTED' does not fix the issue in #4001. Is there anything else I can try? |
Thanks for reporting and sorry for slow turnaround, the fix is available at 2f6f842. Feel free to reopen if the issue still happens. Thanks. |
Hello @prasad-alatkar , I'm having the same problem of @0neblock (issue #4001 ) using ESP-IDF v4.3-dev-907-g6c17e3a64. Regards, Gianluca. |
Environment
IDF version (run git describe --tags to find it): v4.0-beta1, v4.1-dev
Compiler version (run xtensa-esp32-elf-gcc --version to find it): (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 5.2.0
Problem Description
Using NimBLE, running a ble_gap_disc, the stack eventually stops generating new scan results (i.e. the ble_gap_event_fn passed to ble_gap_disc no longer gets called).
If I stop scanning (i.e. ble_gap_disc_cancel), then start again, it fails with:
I have been investigating this issue that was reported by MicroPython users. The typical time to failure was several hours, but I was able to repro it much faster with some devices sending advertisements at a high rate (50 per second). Usually takes about 10-15 mins for the issue to occur.
The most important detail about my configuration is that it's running inside MicroPython, but I can repro this if I turn gap_scan_cb into a no-op, so that should hopefully exclude any issues with what we're doing in the callback. No other BLE operations were active. It's a passive scan, with interval=30ms, window=30ms, duration=forever. Another detail is that NimBLE is pinned to core 1 (same core as the MicroPython task), but I saw this problem when it was on the default core (0). The main MicroPython task is blocked on the UART (for the REPL) via ulTaskNotifyTake.
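As a side note on the scan parameters quoted above: BLE scan interval and window are expressed in units of 0.625 ms, so 30 ms corresponds to 48 units. The sketch below shows that conversion. The helper names are hypothetical. As far as I understand NimBLE's API, the resulting values would go into the `itvl` and `window` fields of `struct ble_gap_disc_params`, with `passive` set and `BLE_HS_FOREVER` as the duration, but that mapping is an assumption and not shown in the report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert milliseconds to BLE scan units (0.625 ms).
 * The legacy scan range is 0x0004..0x4000 units, i.e. 2.5 ms to 10.24 s. */
static uint16_t ms_to_scan_units(uint32_t ms)
{
    assert(ms <= 10240u);
    uint32_t units = (ms * 1000u) / 625u;
    assert(units >= 0x0004u && units <= 0x4000u);
    return (uint16_t)units;
}

/* Hypothetical helper: share of each interval spent listening, in percent.
 * The window must not exceed the interval. */
static uint32_t scan_duty_percent(uint16_t itvl, uint16_t window)
{
    assert(itvl > 0 && window <= itvl);
    return (100u * (uint32_t)window) / (uint32_t)itvl;
}
```

With interval and window both at 30 ms the radio listens for 100% of each interval, which fits with the high result rate (about 50 per second) mentioned in the report.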
MicroPython doesn't use IDF's makefiles or cmake, but it does use sdkconfig. I've attached the input and generated sdkconfig.h below.
I wonder if this might be related to #4001 (but that was on Bluedroid, not NimBLE). I can repro with both v4.0-beta1 and v4.1-dev.
Here's a disassembly of r_rwble_isr, which seems to be raising the assert above: https://gist.github.com/jimmo/43ba1da440fcbebfab89c70e345f368b (I've marked the relevant line with ********). I've attached the elf below. Built from https://github.com/jimmo/micropython/tree/ble-fixes
Other items if possible
application.elf.zip
sdkconfig.zip