BT Controller - Stops Scanning or responding after random amount of time (IDFGH-1781) #4001
Comments
One thing to note which I just thought of, I am compiling with Arduino-ESP32 as a component. |
Hey @igrr (Don't know who else to @), would someone be able to have a look at this? This issue is still ongoing for me. |
Hello, sorry for the late reply. Did you add a lot of printing in esp_gap_cb? I recommend turning off unnecessary printing, such as the device name and address printed for each scan result. This is not the final solution, but please check whether it resolves the issue. Thank you. |
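One way to cut per-result printing without losing visibility is to count scan results and print a single summary line per period. The following is a minimal sketch in plain C, not code from this thread: the type `scan_log_limiter_t` and both functions are hypothetical names, and the caller is assumed to pass a millisecond timestamp from whatever clock the application already uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: counts scan results and allows one summary per period. */
typedef struct {
    uint32_t period_ms;     /* minimum time between summary prints */
    uint32_t last_print_ms; /* timestamp of the previous summary */
    uint32_t pending;       /* results counted since the previous summary */
} scan_log_limiter_t;

void scan_log_limiter_init(scan_log_limiter_t *l, uint32_t period_ms, uint32_t now_ms)
{
    l->period_ms = period_ms;
    l->last_print_ms = now_ms;
    l->pending = 0;
}

/* Call once per scan result. Returns true when a summary is due; in that case
 * the number of results since the last summary is stored in *count_out. */
bool scan_log_limiter_on_result(scan_log_limiter_t *l, uint32_t now_ms, uint32_t *count_out)
{
    l->pending++;
    /* Unsigned subtraction stays correct across timer wrap-around. */
    if ((uint32_t)(now_ms - l->last_print_ms) < l->period_ms) {
        return false;
    }
    *count_out = l->pending;
    l->pending = 0;
    l->last_print_ms = now_ms;
    return true;
}
```

In the GAP callback, the application would call `scan_log_limiter_on_result()` for each scan result and print only when it returns true, so the callback does at most one print per period regardless of how many advertisements arrive.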
Hi @gengyuchao, thanks for your reply. |
According to your description, I have not been able to reproduce the problem. Can you share sample code that reproduces it, so I can try to track this problem down? Thank you. |
Hi @gengyuchao, I will try to build a smaller program that can reproduce the issue. It seems that the pre-compiled bt lib is the component that is crashing. |
Seems to be related to #4196 |
Hi @prasad-alatkar is there any update on this or Issue #4196 ? |
We are working on a BT controller firmware fix for the issue. In the BLE scan scenario, a couple of issues are observed where the BT controller either reboots with a controller-level malfunction error code or just stops responding without any known error. The issue is related to the handling of scan reports in the BT controller when a large number of scan reports arrive in a short time frame. We will release further details and an updated bt lib as soon as possible. |
Thanks for your prompt response, good to know you have identified the issue source. Look forward to applying a fix! |
Hi @csushantk I am tracking release/v3.3 for my IDF toolchain. |
Observing the same issue with v4.0-beta2. Any news here @csushantk ? This is really urgent for us. |
The same issue in v3.3.1. It is strange that scan_result->scan_rst.num_resps is not reset between scans. |
Any updates from anyone on this bug? Still an issue with the latest release/v3.3 branch |
I updated our firmware to v4.1 (this error did not reproduce in v4.1). |
Hello @Sushant-Espressif , I'm having the same issue of @0neblock with the same "fw environment". I'm using ESP-IDF v4.3-dev-907-g6c17e3a64. Regards, Gianluca. |
@GianlucaLoi @0neblock In our local setup, with Bluedroid Host, we are not able to reproduce the issue of "BLE stops scanning randomly" (tested for one week continuously).
|
Hello @Sushant-Espressif, thanks for the response.
> 1. Are there excessive prints in your application?
> 2. Is application task set to higher priority and hogging the CPU?
> 3. Is it possible to share any other details about the application so that we can quickly reproduce the issue?
If you need more information, I will be glad to give them. Regards, Gianluca. |
Hello @Sushant-Espressif , Do you have any update about this problem? EDIT: Regards, Gianluca. |
Hi @GianlucaLoi |
Hello @chhajedji |
Hi @GianlucaLoi
|
Hello @chhajedji I performed your steps but, in the Linker section, I obtain these errors from the libbtdm_app.a:
How can I solve them? Gianluca. |
Can you try doing a Also note that you will have to first update submodules ( |
Hello @chhajedji I'm still doing your test because I have to adapt some function to your repository to work well. One more information to understand the problem (maybe):
Regards, Gianluca. |
Hi @GianlucaLoi I am also testing with these parameters to see if I can reproduce it. In case you get the crash, please share the logs. |
Hi @vbvchauthmal, I will try to recreate the issue. I tried the same earlier for @GianlucaLoi, but before I could recreate it, changing some parameters helped them. Please share more details about your failing scenario so that I can reproduce it.
Also please provide any other information that you feel could help to recreate or solve this issue. |
Hi @chhajedji
I am using ESP-IDF version v3.3.5 (commit id : 03810c4) after your suggestion, earlier I was using v3.3.1 (commit-id : 143d26a )
I am doing BLE scanning and below are the scan parameters set in my source code :
The IDF example that most closely resembles my application is gattc_multi_connect. My application extends it by setting BLE GAP security parameters and supporting interfacing with five BLE peripheral devices. Only one BLE peripheral is allowed to connect at a time, when its broadcast is captured, to get sensor readings through BLE notifications/indications.
So far we have deployed 6000 of our ESP32-based platforms with this firmware, and each will have a different number of BLE devices in its vicinity, which can be advertisers or scanners. Most of the deployed units show this issue.
It occurs at random: sometimes it arises after a week, and sometimes it takes a few minutes or hours.
Query :
|
We are facing the same issue |
@Rokachy Can you please try with the latest v4.3 release? We did a test and did not reproduce the issue. We are still testing with mass devices on the same. |
Yes, I will and let you know for results. Yehuda |
@Rokachy Yes, it is a random issue. I'm afraid there is no need to enable anything at the moment; the best would be a packet capture. Please try with the latest v4.3 first. Thanks. |
It has been running with the v4.3 release for a few days, no issue so far :-) |
@TianaESP Any chances of this (possible) fix being backported to v3.3? |
@0neblock The fixes were backported to v3.3, v4.0, v4.1, v4.2. Please try the latest v3.3. |
@Rokachy We fixed bugs in modem sleep that we suspected were contributing to the problem. Please let us know if the issue happens again. Thanks. |
Thank you for the support, we are using release 4.3 and we didn't see the
issue for the last 2 weeks. I hope it will be kept like this 😀.
Thank you for the good work.
Yehuda
|
is the fix backported to 4.0.3 version too? |
Thanks for sharing updates. The fix has already been backported to release/4.0 b89b1ec, thanks. |
I have observed an issue which might relate to this. At least the result is the same: the BLE stack stops working properly. In my situation, I am scanning for BLE advertisements, and at some point in time the scan stops. It typically happens when the ESP32 is busy (e.g. writing a lot of information to the Debug Console). I enabled "CONFIG_BLE_HOST_QUEUE_CONGESTION_CHECK", which helps a lot, and it actually shows that the BTU queue "often" has congestion. But I also observed (easy to reproduce by changing "BT_QUEUE_CONGEST_SIZE" to 20 in file "bt_common.h") that when congestion occurs, it actually locks the BLE stack completely. If I allow the "hciH4T", "btuT" and "BTC_TASK" tasks to use the same priority, I do not see this lockup. |
Hi! Any update about this? I noticed this same issue when using BLE + Classic (v4.4-beta1). Thanks! edit: v4.3.1 is also affected |
We are using 4.3.1 and we are not seeing the issue
|
Are you using bluedroid (with BTDM mode)? With only a BLE scan I can't reproduce it. I'm doing name requests (read_remote_name) and a BLE scan at the same time. |
Using the latest Arduino ESP32, v2.0.2, which uses idf 4.4-alpha1, I also get the error "BT_HCI: command_timed_out hci layer timeout waiting for response to a command. opcode: 0x200c". When this happens I also see that the application reconnects to WiFi, as I have WiFi / BLE co-existence. I scan BLE for 1 second at a time, then take a short break before I start scanning again. |
+1, although the changelog includes this bug fix, it hasn't been resolved in the latest version yet. Also, since the bluetooth controller code is not open source, it is not possible to do anything on our side... |
Unfortunately, I seem to have the same problem. |
v4.4.1, has the same problem. BLE/Classic BT + Wifi still unusable in a prod environment. |
I had the same problem for months and finally solved it by adjusting the scan interval and scan window. I think the default might be 100/99 which gives very little resources for the ESP to "take out the trash" and "wash the dishes" when you are running a continuous scan like I am. I changed it to 800/750 and now suddenly the scanner runs and runs without crapping out. Amazing! You can use different numbers I assume, but just be sure to leave some gap between the two numbers so the scanner will take a break and allow the ESP to do some other housekeeping. (I am just speculating that the housekeeping issue is the root of the problem; it might also be that the scanner is overheating or overeating or over somethinging.) FYI, I had the same problem with the ESP32-WROOM-32U and also with the ESP32-C3. Changing to 800/750 solved it for both of them. |
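To make the interval/window trade-off above concrete, here is a small sketch that converts the two numbers to time and computes how long the radio is idle in each interval. It assumes the values quoted in the comment are raw scan_interval and scan_window values in units of 0.625 ms, as used by the fields of esp_ble_scan_params_t; the comment itself does not state its units. The function names are hypothetical.

```c
#include <stdint.h>

#define BLE_SCAN_UNIT_US 625u /* one scan interval/window unit is 0.625 ms */

/* Convert a raw scan interval or window value to microseconds. */
uint32_t scan_units_to_us(uint16_t units)
{
    return (uint32_t)units * BLE_SCAN_UNIT_US;
}

/* Fraction of each interval spent scanning (0.0 to 1.0), or -1.0 if the
 * pair is invalid (zero interval, or window longer than the interval). */
double scan_duty_cycle(uint16_t interval_units, uint16_t window_units)
{
    if (interval_units == 0 || window_units > interval_units) {
        return -1.0;
    }
    return (double)window_units / (double)interval_units;
}

/* Idle time per interval in microseconds, or -1 if the pair is invalid. */
int32_t scan_idle_us(uint16_t interval_units, uint16_t window_units)
{
    if (interval_units == 0 || window_units > interval_units) {
        return -1;
    }
    return (int32_t)scan_units_to_us((uint16_t)(interval_units - window_units));
}
```

Under that unit assumption, 100/99 leaves 0.625 ms idle per 62.5 ms interval, while 800/750 leaves 31.25 ms idle per 500 ms interval. This only quantifies the gap; it does not show that the gap is the root cause.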
Are you sure it's really solving it and not just making it less likely? Until the root cause for this is analyzed and fixed, I can't really trust the ESP in a production device when it comes to BT. |
V4.4.3 has the same problem. |
v4.4.4 (via the latest Arduino framework) has the same problem. My gap between the interval and window is quite wide as per the suggestion by @Zimbu98 - unfortunately this had no effect. |
Brief
I have been having a problem with the Bluedroid BT Controller Scanning function for a few weeks now, and after trying many different things, I am stuck and am not sure what else I can try.
The crux of the issue is that the BLE Scanning feature will work for a large amount of time - up to 3 days, then just fail silently, with the whole BT controller seemingly shutting down.
Problem Description
I have a BLE Scanning app that is working well for the most part. It spends most of its time performing an active scan for other BLE devices that are advertising a service UUID and some custom manufacturer data. It receives an advertisement from a sensor around every 1 second, but I can have anywhere from 1-10 sensors within range at any one time.
After a completely random period of time, sometimes 20 minutes, sometimes 3 days, the app will stop receiving ESP_GAP_SEARCH_INQ_RES_EVT events from the BT layer, even though it should still be receiving advertisements from multiple devices, with no indication from any underlying BT controller debugging that anything has happened. This happens no matter how many sensors I have advertising within range of the ESP. It even happens when I have no sensors advertising and the general BLE background advertisements are relatively low.
The free heap memory of the app stays the same (~140kB free memory at any one time), so I can rule out a memory leak on the app side, and the rest of the application keeps running normally, albeit with more computation time from the RTOS (indicated by a loop counter that increases when this error happens), so clearly some of the BT tasks have stopped running.
When the error happens, I can also see that the ESP itself DOES STOP performing active scanning, as the sensors I use flash an LED whenever they receive a SCAN_REQUEST from the ESP32 hardware MAC address, and this stops happening as soon as the error starts.
If I try to recover from the error by issuing a command such as esp_ble_gap_start_scanning(), which responds ESP_OK, I get an HCI timeout error printed:
BT_HCI: command_timed_out hci layer timeout waiting for response to a command. opcode: 0x200c
At the moment, trying to perform a BT command after the error and getting this response is the only indication from the application that something has gone wrong.
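Since the failure is silent, an application-side check on the time since the last scan result could surface it earlier than waiting for an HCI timeout. The sketch below is an illustration in plain C with hypothetical names (`scan_stall_watchdog_t` and its functions); it assumes the application feeds it from its ESP_GAP_SEARCH_INQ_RES_EVT handling and polls it from another task with a millisecond timestamp. The timeout has to be longer than the quietest period expected in the deployment, otherwise a quiet environment would look like a stall.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog: flags a stall when no scan result has been seen
 * for timeout_ms. */
typedef struct {
    uint32_t timeout_ms;     /* longest acceptable silence */
    uint32_t last_result_ms; /* timestamp of the most recent scan result */
} scan_stall_watchdog_t;

void scan_watchdog_init(scan_stall_watchdog_t *w, uint32_t timeout_ms, uint32_t now_ms)
{
    w->timeout_ms = timeout_ms;
    w->last_result_ms = now_ms;
}

/* Call whenever a scan result event is received. */
void scan_watchdog_feed(scan_stall_watchdog_t *w, uint32_t now_ms)
{
    w->last_result_ms = now_ms;
}

/* Returns true when the silence has lasted at least timeout_ms.
 * Unsigned subtraction keeps the comparison valid across timer wrap-around. */
bool scan_watchdog_stalled(const scan_stall_watchdog_t *w, uint32_t now_ms)
{
    return (uint32_t)(now_ms - w->last_result_ms) >= w->timeout_ms;
}
```

What to do once a stall is flagged (logging, restarting the Bluetooth stack, or rebooting) is left to the application; nothing in this thread establishes which recovery works.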
I am not using any WiFi functions, so to reduce memory footprint and file size, I have changed the linker script to only include the following libraries in the component.mk of esp32:
core rtc phy
instead of the usual:
core rtc net80211 pp wpa smartconfig coexist wps wpa2 espnow phy mesh
Coredump
coredump
This is a coredump taken about 20 minutes after the error occurred. I forced this core dump to log by deliberately throwing an IntegerDivideByZero exception in another task. My hope here is that it saved the task state of the BT tasks, which your team can use internally to see the task state. If you require my app ELF, I can provide it by email.
Debug Log
This is a log showing the lack of errors I receive when the error happens. As you can see, the application was running for 2.5 days before the error occurred. The 'BMS' TAG is my application, and the 'Scanning started' and 'scanning stopped' logs are when my app receives the ESP_GAP_BLE_SCAN_START_COMPLETE_EVT and ESP_GAP_SEARCH_INQ_CMPL_EVT events respectively. In this application, I start an esp_ble_gap_start_scanning operation of 30 seconds, and when I receive an ESP_GAP_SEARCH_INQ_CMPL_EVT event, I set a flag to restart esp_ble_gap_start_scanning for another 30 seconds, in a cycle. As discussed later, I have tried changing this interval to anywhere from 30 seconds to 5 minutes, and I have also tried setting the interval to 0 for unlimited, so that I only call the start_scan once.
In this instance, my application received the ESP_GAP_SEARCH_INQ_CMPL_EVT event, so it set a flag internally to call esp_ble_gap_start_scanning(30) again, which responded with ESP_OK, but I never received the ESP_GAP_BLE_SCAN_START_COMPLETE_EVT, and about 8 seconds later I see an error log of a command timeout.
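The restart cycle described above, and the point where it breaks, can be modelled as a small state machine that expects a start confirmation within a deadline. This is a sketch in plain C with hypothetical names, not the reporter's code: it assumes the application calls `scan_cycle_on_start_requested()` after esp_ble_gap_start_scanning returns ESP_OK, `scan_cycle_on_start_complete()` on ESP_GAP_BLE_SCAN_START_COMPLETE_EVT, and `scan_cycle_on_scan_complete()` on ESP_GAP_SEARCH_INQ_CMPL_EVT. The deadline value is the caller's choice.

```c
#include <stdint.h>

typedef enum {
    SCAN_IDLE,          /* no scan requested */
    SCAN_START_PENDING, /* start requested, waiting for the start-complete event */
    SCAN_RUNNING,       /* start was confirmed */
    SCAN_FAULT          /* start was never confirmed within the deadline */
} scan_state_t;

typedef struct {
    scan_state_t state;
    uint32_t start_requested_ms; /* when the start request was issued */
    uint32_t start_timeout_ms;   /* deadline for the start confirmation */
} scan_cycle_t;

void scan_cycle_init(scan_cycle_t *c, uint32_t start_timeout_ms)
{
    c->state = SCAN_IDLE;
    c->start_requested_ms = 0;
    c->start_timeout_ms = start_timeout_ms;
}

void scan_cycle_on_start_requested(scan_cycle_t *c, uint32_t now_ms)
{
    c->state = SCAN_START_PENDING;
    c->start_requested_ms = now_ms;
}

/* Only a pending start can be confirmed; a late event after a fault is ignored. */
void scan_cycle_on_start_complete(scan_cycle_t *c)
{
    if (c->state == SCAN_START_PENDING) {
        c->state = SCAN_RUNNING;
    }
}

void scan_cycle_on_scan_complete(scan_cycle_t *c)
{
    if (c->state == SCAN_RUNNING) {
        c->state = SCAN_IDLE;
    }
}

/* Poll periodically; moves to SCAN_FAULT when the confirmation is overdue. */
scan_state_t scan_cycle_poll(scan_cycle_t *c, uint32_t now_ms)
{
    if (c->state == SCAN_START_PENDING &&
        (uint32_t)(now_ms - c->start_requested_ms) >= c->start_timeout_ms) {
        c->state = SCAN_FAULT;
    }
    return c->state;
}
```

In the failure described here, the machine would stay in SCAN_START_PENDING until the deadline and then report SCAN_FAULT, giving the application an explicit signal instead of relying on the command timeout log.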
sdkconfig
sdkconfig
Scanning Configuration Used
These are the configurations currently in use, but as you'll see below I have tried many different
Changes Attempted
Below is a list of sdkconfig changes or application setup/operation changes that I have tried with no success; the same thing occurs.
If there is anything else I should try, please let me know.
Apologies for the large Github issue, this error has been troubling me for some time and I would like to know what I can try next. Thank you.
Environment