BT Controller - Stops Scanning or responding after random amount of time (IDFGH-1781) #4001

Open
0neblock opened this issue Sep 3, 2019 · 61 comments


@0neblock
Contributor

0neblock commented Sep 3, 2019

Brief

I have been having a problem with the Bluedroid BT Controller Scanning function for a few weeks now, and after trying many different things, I am stuck and am not sure what else I can try.
The crux of the issue is that BLE scanning works for a long time, up to 3 days, and then fails silently, with the whole BT controller seemingly shutting down.

Problem Description

I have a BLE Scanning app that is working well for the most part. It spends most of its time performing an active scan for other BLE devices that are advertising a service UUID and some custom manufacturer data. It receives an advertisement from a sensor around every 1 second, but I can have anywhere from 1-10 sensors within range at any one time.

After a completely random period of time, anywhere from 20 minutes to 3 days, the app stops receiving ESP_GAP_SEARCH_INQ_RES_EVT events from the BT layer, even though it should still be receiving advertisements from multiple devices, and no debugging output from the underlying BT controller indicates that anything has happened. This happens no matter how many sensors are advertising within range of the ESP; it even happens when no sensors are advertising and the general BLE background advertisement traffic is relatively low.

The free heap memory of the app stays the same (~140kB free at any one time), so I can rule out a memory leak on the app side. The rest of the application keeps running normally, albeit with more computation time available from the RTOS (indicated by a loop counter that increases when this error happens), so clearly some of the BT tasks have stopped running.

When the error happens, I can also see that the ESP itself DOES STOP performing active scanning: the sensors I use flash an LED whenever they receive a SCAN_REQUEST from the ESP32 hardware MAC address, and this stops happening as soon as the error starts.

If I try to recover from the error by issuing a command such as esp_ble_gap_start_scanning(), which returns ESP_OK, I get an HCI timeout error printed: BT_HCI: command_timed_out hci layer timeout waiting for response to a command. opcode: 0x200c.
At the moment, trying to perform a bt command after the error, and getting this response, is the only indication from the application that something has gone wrong.
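
For reference, the opcode in the timeout message can be decoded by hand. An HCI opcode packs a 6-bit OGF (opcode group field) in its upper bits and a 10-bit OCF (opcode command field) in its lower bits. 0x200c splits into OGF 0x08 (LE controller commands) and OCF 0x000c, which the Bluetooth Core Specification assigns to LE Set Scan Enable, consistent with the timeout following a scan start request. A minimal standalone sketch in C; the helper names are illustrative and not part of ESP-IDF:

```c
#include <stdint.h>

/* Illustrative helpers (not ESP-IDF APIs): split a 16-bit HCI opcode
 * into its opcode group field (upper 6 bits) and its opcode command
 * field (lower 10 bits). */
static inline uint8_t hci_opcode_ogf(uint16_t opcode)
{
    return (uint8_t)(opcode >> 10);
}

static inline uint16_t hci_opcode_ocf(uint16_t opcode)
{
    return (uint16_t)(opcode & 0x03FF);
}
```

Calling `hci_opcode_ogf(0x200c)` yields 0x08 and `hci_opcode_ocf(0x200c)` yields 0x000c.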

I am not using any WiFi functions, so to reduce memory footprint and file size, I have changed the linker script to only include the following libraries in the component.mk of esp32:
core rtc phy instead of the usual: core rtc net80211 pp wpa smartconfig coexist wps wpa2 espnow phy mesh

Coredump

coredump
This is a coredump taken about 20 minutes after the error occurred. I forced this core dump to log by deliberately throwing an IntegerDivideByZero exception in another task. My hope is that it saved the state of the BT tasks, which your team can use internally. If you require my app ELF, I can provide it by email.

Debug Log

This is a log showing the lack of errors I receive when the error happens. As you can see, the application was running for about 2.5 days before the error occurred. The 'BMS' TAG is my application, and the 'Scanning Stopped' and 'Scanning started' logs are printed when my app receives the ESP_GAP_SEARCH_INQ_CMPL_EVT and ESP_GAP_BLE_SCAN_START_COMPLETE_EVT events respectively. In this application, I start an esp_ble_gap_start_scanning operation of 30 seconds, and when I receive an ESP_GAP_SEARCH_INQ_CMPL_EVT event, I set a flag to restart esp_ble_gap_start_scanning for another 30 seconds, in a cycle. As discussed later, I have tried changing this duration to anywhere from 30 seconds to 5 minutes, and I have also tried setting it to 0 for unlimited, so that I only call the start_scan once.
In this instance, my application received the ESP_GAP_SEARCH_INQ_CMPL_EVT event and set a flag internally to call esp_ble_gap_start_scanning(30) again, which returned ESP_OK, but I never received the ESP_GAP_BLE_SCAN_START_COMPLETE_EVT, and about 8 seconds later I see an error log of a command timeout.

I (210356607) BMS[scanning]: Scanning Stopped
I (210356607) BMS[scanning]: Scanning started
I (210363327) main: HEAP - free: 140560, largest_block: 98108 | PSRAM - free: 3451316, used: 202360, attempted: 981230 | pps: 21, lps: 248
I (210373357) main: HEAP - free: 140560, largest_block: 98108 | PSRAM - free: 3451316, used: 202360, attempted: 981230 | pps: 20, lps: 249
I (210383367) main: HEAP - free: 140560, largest_block: 98108 | PSRAM - free: 3451316, used: 202360, attempted: 981230 | pps: 19, lps: 247
I (210386607) BMS[scanning]: Scanning Stopped
I (210386617) BMS[scanning]: Scanning started
I (210393387) main: HEAP - free: 140560, largest_block: 98108 | PSRAM - free: 3451316, used: 202360, attempted: 981230 | pps: 21, lps: 248
I (210403407) main: HEAP - free: 140560, largest_block: 98108 | PSRAM - free: 3451316, used: 202360, attempted: 981230 | pps: 19, lps: 248
I (210413457) main: HEAP - free: 140560, largest_block: 98108 | PSRAM - free: 3451316, used: 202360, attempted: 981230 | pps: 19, lps: 249
I (210416617) BMS[scanning]: Scanning Stopped
I (210423487) main: HEAP - free: 140560, largest_block: 98108 | PSRAM - free: 3451220, used: 202452, attempted: 981230 | pps: 17, lps: 250
E (210424617) BT_HCI: command_timed_out hci layer timeout waiting for response to a command. opcode: 0x200c
I (210433497) main: HEAP - free: 140560, largest_block: 98108 | PSRAM - free: 3451220, used: 202452, attempted: 981230 | pps: 25, lps: 250

sdkconfig

sdkconfig

Scanning Configuration Used

These are the configurations currently in use, but as you'll see below I have tried many different ones.

static esp_ble_scan_params_t logging_ble_scan_params = {
    .scan_type              = BLE_SCAN_TYPE_ACTIVE,
    .own_addr_type          = BLE_ADDR_TYPE_PUBLIC,
    .scan_filter_policy     = BLE_SCAN_FILTER_ALLOW_ALL, 
    .scan_interval          = 0x100,
    .scan_window            = 0x100,
    .scan_duplicate         = BLE_SCAN_DUPLICATE_DISABLE
};
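
For reference, scan_interval and scan_window are expressed in units of 0.625 ms, so 0x100 corresponds to 160 ms for both fields here, which asks the controller to scan without any idle gap. A small sketch of the conversion; the helper name is illustrative and not an ESP-IDF function:

```c
#include <stdint.h>

/* Illustrative helper (not an ESP-IDF API): convert a BLE scan
 * interval or window, given in 0.625 ms units, to microseconds.
 * Integer math avoids floating point on the target. */
static inline uint32_t ble_scan_units_to_us(uint16_t units)
{
    return (uint32_t)units * 625u;
}
```

With this helper, `ble_scan_units_to_us(0x100)` gives 160000 microseconds.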

Changes Attempted

Below is a list of sdkconfig changes and application setup/operation changes that I have tried with no success; the same thing occurs.

  • Unlimited scanning timeout - when done this way, I never even get an indication of hci_layer_timeout, because I never try to run another esp_ble_gap_start_scanning call, so the bt controller will fail silently
  • Turning CONFIG_BLUEDROID_MEM_DEBUG ON - There is no debugging information around the time the error happens
  • CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY ON/OFF
  • CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST ON/OFF
  • CONFIG_BLE_HOST_QUEUE_CONGESTION_CHECK ON/OFF
  • Scanning interval/window change to many different values: 0x100/0x50, 0x200/0x50, 0x1000/0x100, and a few more.
  • 180 second scanning timeout - This seems to make the error happen quicker, although that may be placebo as I only tested a few times.
  • Increase CONFIG_BTC_TASK_STACK_SIZE and CONFIG_BTU_TASK_STACK_SIZE
  • Not calling esp_bt_controller_mem_release(ESP_BT_MODE_CLASSIC_BT) (which I usually call before enabling anything)
  • Enabling/disabling WiFi coexist (I am not using any WiFi function at all)

If there is anything else I should try, please let me know.

Apologies for the large Github issue, this error has been troubling me for some time and I would like to know what I can try next. Thank you.

Environment

Key Value
Development Kit Custom Board
Module or chip used ESP32-WROVER-32
IDF version 91f29be - tracking release/v3.3
Build System Make
Compiler version crosstool-ng-1.22.0-80-g6c4433a
Operating System macOS
Power Supply external 3.3V
@github-actions github-actions bot changed the title BT Controller - Stops Scanning or responding after random amount of time BT Controller - Stops Scanning or responding after random amount of time (IDFGH-1781) Sep 3, 2019
@0neblock
Contributor Author

0neblock commented Sep 8, 2019

One thing to note which I just thought of: I am compiling with Arduino-ESP32 as a component.
I am not sure if that would affect anything, as I am using esp-idf for all the Bluetooth-related functionality; I found the BLE library in Arduino a bit too resource-intensive.

@0neblock
Contributor Author

Hey @igrr (don't know who else to @), would someone be able to have a look at this? This issue is still ongoing for me.

@gengyuchao
Collaborator

Hello, sorry for the late reply. Did you add a lot of printing in esp_gap_cb? I recommend turning off unnecessary printing, for example removing the printing of device names and addresses while scanning. This is not the final solution, but please try it to see whether the issue is resolved. Thank you.

@0neblock
Contributor Author

Hi @gengyuchao, thanks for your reply.
I do no printing in esp_gap_cb, and I also keep my processing time to a minimum by adding most events to a queue via xQueueSendFromISR and processing them in another thread.

@gengyuchao
Collaborator

According to your description, I have not been able to reproduce the problem. Can you give me a sample code of your problem? So I can try to track this problem, thank you.

@0neblock
Contributor Author

Hi @gengyuchao,
The problem can take anywhere from 20 minutes to 3 days to happen, and my app cannot be shared in its current state; however, I can share the ELF to analyse the coredump if you like.

I will try to build a smaller program that can reproduce the issue.

It seems that the pre-compiled bt lib is the component that is crashing.
Is there any way I can diagnose the current state of the esp32 bt lib when I detect an error?

@0neblock
Contributor Author

Seems to be related to #4196

@0neblock
Contributor Author

Hi @prasad-alatkar is there any update on this or Issue #4196 ?
We are about to enter production and this is still an ongoing issue for us, we are having to call esp_restart every time the BT Controller fails, which is not ideal.

@Sushant-Espressif
Contributor

@one

> Hi @prasad-alatkar is there any update on this or Issue #4196 ?
> We are about to enter production and this is still an ongoing issue for us, we are having to call esp_restart every time the BT Controller fails, which is not ideal.

We are working on a BT controller firmware fix for the issue. In the BLE scan scenario, a couple of issues have been observed where the BT controller either reboots with a controller-level malfunction error code or just stops responding without any known error. The issue is related to the handling of scan reports in the BT controller when a large number of scan reports arrive in a short time frame. We will release further details and an updated bt lib as soon as possible.
Thanks.

@0neblock
Contributor Author

Thanks for your prompt response, good to know you have identified the issue source. Look forward to applying a fix!

@0neblock
Contributor Author

Hi @csushantk
I noticed that a few commits were pushed to the Github repo recently, can you please confirm if your fix for this issue was included?

I am tracking release/v3.3 for my IDF toolchain.
I am testing this latest release now.
Thanks.

@pschlang

pschlang commented Jan 13, 2020

Observing the same issue with v4.0-beta2. Any news here @csushantk ? This is really urgent for us.

@plebed

plebed commented Feb 14, 2020

The same issue in v3.3.1.
As an ugly workaround, we restart the chip if scan_result->scan_rst.num_resps (in ESP_GAP_SEARCH_INQ_CMPL_EVT) has not changed over five scans.

It is strange that scan_result->scan_rst.num_resps is not reset between scans.
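
The workaround described above can be sketched as a small counter. This is a standalone illustration, not the commenter's actual code: the type and function names are hypothetical, and the tracker would be fed the value of scan_rst.num_resps each time ESP_GAP_SEARCH_INQ_CMPL_EVT arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STALL_SCAN_LIMIT 5  /* consecutive scans without a change */

/* Hypothetical tracker, not part of ESP-IDF. */
typedef struct {
    int last_num_resps;   /* value seen at the previous scan completion */
    int unchanged_scans;  /* consecutive completions with no change */
} scan_stall_tracker_t;

/* Call once per scan completion with the reported response count.
 * Returns true once the count has stayed the same for
 * STALL_SCAN_LIMIT consecutive scans. */
static inline bool scan_stall_update(scan_stall_tracker_t *t, int num_resps)
{
    assert(t != NULL);
    if (num_resps != t->last_num_resps) {
        t->last_num_resps = num_resps;
        t->unchanged_scans = 0;
        return false;
    }
    t->unchanged_scans++;
    return t->unchanged_scans >= STALL_SCAN_LIMIT;
}
```

A tracker initialised as `{ -1, 0 }` treats the first completion as a change. A true result is where the restart would be triggered. The check cannot tell a stalled controller apart from an environment in which no advertiser happens to be in range.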

@0neblock 0neblock reopened this Apr 15, 2020
@0neblock
Contributor Author

Any updates from anyone on this bug? Still an issue with the latest release/v3.3 branch

@plebed

plebed commented Jun 22, 2020

I updated our firmware to v4.1 (this error did not reproduce in v4.1).

@GianlucaLoi

Hello @Sushant-Espressif ,

I'm having the same issue as @0neblock with the same "fw environment". I'm using ESP-IDF v4.3-dev-907-g6c17e3a64.
Any update about this problem?

Regards,

Gianluca.

@Sushant-Espressif
Contributor

@GianlucaLoi @0neblock In our local setup, with Bluedroid Host, we are not able to reproduce the issue of "BLE stops scanning randomly" (tested for one week continuously).
Can you please provide more details to reproduce this issue?

  1. Are there excessive prints in your application?
  2. Is application task set to higher priority and hogging the CPU?
  3. Is it possible to share any other details about the application so that we can quickly reproduce the issue?

@GianlucaLoi

> @GianlucaLoi @0neblock In our local setup, with Bluedroid Host, we are not able to reproduce the issue of "BLE stops scanning randomly" (tested for one week continuously).
> Can you please provide more details to reproduce this issue?
>
>   1. Are there excessive prints in your application?
>   2. Is application task set to higher priority and hogging the CPU?
>   3. Is it possible to share any other details about the application so that we can quickly reproduce the issue?

Hello @Sushant-Espressif ,

Thanks for the response.

> 1. Are there excessive prints in your application?
I have very few prints when my fw is ongoing. At point 3 you can see an example.

> 2. Is application task set to higher priority and hogging the CPU?
Could you be more specific?

> 3. Is it possible to share any other details about the application so that we can quickly reproduce the issue?
In my application I have WiFi (STA mode), MQTT (no SSL) and BLE.
What my FW does is:

  1. At the startup it waits for a WiFi Smart Configuration,
  2. Once it is connected to the WiFi, the MQTT and BLE tasks are initialized, a periodic active scan starts (period of about 5 seconds), and a task (TASK1) is created to manage the BLE scan data and send it over MQTT
  3. If there is any data that needs to be published, the MQTT publish function is called by TASK1
  4. The FW then keeps scanning and sending with a 5 second period. An example of the prints is shown below
I (1231732) TASK1: [APP] Free memory: 4220620 bytes
I (1231746) BLE: Scan started
I (1231833) MQTT: MQTT_EVENT_DATA
I (1236748) BLE: Scan restarting...
I (1236748) TASK1: [APP] Free memory: 4222208 bytes
I (1236751) BLE: SCAN PARAM SET COMPLETE
I (1236763) BLE: Scan started
I (1241765) BLE: Scan restarting...
I (1241768) BLE: SCAN PARAM SET COMPLETE
I (1241773) MQTT: sent publish successful, msg_id=0
I (1241773) TASK1: [APP] Free memory: 4220496 bytes
I (1241815) MQTT: MQTT_EVENT_DATA
E (1249770) BT_HCI: command_timed_out hci layer timeout waiting for response to a command. opcode: 0x200c

If you need more information, I will be glad to give them.

Regards,

Gianluca.

@GianlucaLoi

GianlucaLoi commented Sep 17, 2020

Hello @Sushant-Espressif ,

Do you have any update about this problem?

EDIT:
@igrr do you have any solution for this problem? I pulled ESP-IDF v4.3-dev-1197-g8bc19ba89, which contains a lot of Bluetooth bugfixes, but I still have this problem.

Regards,

Gianluca.

@chhajedji
Contributor

Hi @GianlucaLoi
Can you tell us whether you are using the advertising report flow control and scan duplicate filtering options?
These options can be found in sdkconfig under the names CONFIG_BTDM_BLE_ADV_REPORT_FLOW_CTRL_SUPP and CONFIG_BTDM_BLE_SCAN_DUPL.

@GianlucaLoi

Hello @chhajedji
They are both set as YES.
Regards,
Gianluca.

@chhajedji
Contributor

Hi @GianlucaLoi
We tried to reproduce your issue, but we didn't get any success. Can you follow below steps and share the logs with us.

  1. Apply this patch present in the attached tarball and also replace the bt lib at $IDF_PATH/components/bt/controller/lib/libbtdm_app.a by the lib in tarball.
  2. Run your program and when crash occurs, store all the logs in a file.
    This tarball contains a patch for esp-idf and a bt lib. These are not official versions, but just to get the logs of exact scenario.
    gh_timeout.tar.gz

@GianlucaLoi

GianlucaLoi commented Oct 20, 2020

Hello @chhajedji

I performed your steps but, in the Linker section, I obtain these errors from the libbtdm_app.a:

  • undefined reference to `ke_task_env'
  • undefined reference to `ke_handler_search'
  • undefined reference to `ld_pscan_frm_cbk'

How can I solve them?
Regards,

Gianluca.

@chhajedji
Contributor

Can you try doing a git fetch and git submodule update --init --recursive.
Since you are using the current master branch and it is updated frequently, you will have to do a git checkout 0289d1cc81c210b719f28c65f113c45f9afd2c7b, as I created the given patch on this commit.

Also note that you will have to first update submodules (git submodule update --init --recursive) then replace given library. And similarly git fetch and git checkout 0289d1cc81c210b719f28c65f113c45f9afd2c7b and then apply given patch.

@GianlucaLoi

GianlucaLoi commented Oct 22, 2020

Hello @chhajedji

I'm still running your test because I have to adapt some functions to work with your repository.
In the meantime I tested ESP-IDF v4.1 and I see this problem with that version as well.

One more piece of information that may help to understand the problem:
Because I need an active scan, every 5 seconds I re-set the scan params. When this phase is complete I restart the scanning.

```
static esp_ble_scan_params_t ble_scan_params = {
    .scan_type              = BLE_SCAN_TYPE_ACTIVE,
    .own_addr_type          = BLE_ADDR_TYPE_PUBLIC,
    .scan_filter_policy     = BLE_SCAN_FILTER_ALLOW_ALL,
    .scan_interval          = 0x50,
    .scan_window            = 0x30
};

...
case ESP_GAP_BLE_SCAN_PARAM_SET_COMPLETE_EVT:
    if (param->scan_param_cmpl.status == ESP_BT_STATUS_SUCCESS)
    {
        ESP_LOGI(BLE_LOG_TAG, "SCAN PARAM SET COMPLETE");
        esp_ble_gap_start_scanning(5);
    }
    else
    {
        ESP_LOGE(BLE_LOG_TAG, "SCAN PARAM SET NOT COMPLETE");
    }
    break;

...
case ESP_GAP_SEARCH_INQ_CMPL_EVT:
    ...
    esp_ble_gap_set_scan_params(&ble_scan_params);
    break;
...
```

Regards,

Gianluca.

@chhajedji
Contributor

Hi @GianlucaLoi

I am also testing with these parameters to see if I can reproduce it. In case you get the crash, please share the logs.

@chhajedji
Contributor

Hi @vbvchauthmal,

I will try to recreate the issue. I tried the same earlier for @GianlucaLoi, but before I could recreate it, changing some parameters helped them. Please share some more details about your failing scenario so that I can reproduce it.

  • Which commit id are you using for your application?
  • What exactly are you doing in your application (scanning/advertising/both, anything else also) and what are the parameters for the same (scan params, adv params)?
  • Which idf example will most closely resemble your application and what are the changes in it to emulate your scenario. Or if you can share your application, that would be better.
  • How many devices are there in the vicinity and what are they doing (how many advertisers and scanners nearby)?
  • How long does this issue take to occur for your case and does this time vary?

Also please provide any other information you feel which could be helpful to recreate or solve this issue.

@vbvchauthmal

Hi @chhajedji

> Hi @vbvchauthmal,
>
> I will try to recreate the issue. I tried the same earlier for @GianlucaLoi, but before I could recreate it, changing some parameters helped them. Please share some more details about your failing scenario so that I can reproduce it.
>
> * Which commit id are you using for your application?

I am using ESP-IDF version v3.3.5 (commit id: 03810c4) after your suggestion; earlier I was using v3.3.1 (commit id: 143d26a).

> * What exactly are you doing in your application (scanning/advertising/both, anything else also) and what are the parameters for the same (scan params, adv params)?

I am doing BLE scanning and below are the scan parameters set in my source code :

ble_scan_params = {
    .scan_type              = BLE_SCAN_TYPE_ACTIVE,
    .own_addr_type          = BLE_ADDR_TYPE_PUBLIC,
    .scan_filter_policy     = BLE_SCAN_FILTER_ALLOW_ALL,
    .scan_interval          = 0xF0,                         // Interval between the start of two consecutive scan windows. Dec(0xF0) = 240 x 0.625 = 150ms
    .scan_window            = 0xF0                          // The duration in which the Link Layer scans on one channel. Dec(0xF0) = 240 x 0.625 = 150ms
};
> * Which idf example will most closely resemble your application and what are the changes in it to emulate your scenario. Or if you can share your application, that would be better.

The idf example that most closely resembles my application is gattc_multi_connect. My application extends it by setting BLE GAP security parameters and supporting interfaces to five BLE peripheral devices. Only one BLE peripheral is allowed to connect at a time, when its advertisement is captured, to get sensor readings through BLE notifications/indications.

> * How many devices are there in the vicinity and what are they doing (how many advertisers and scanners nearby)?

So far we have deployed 6000 units of our ESP32-based platform with this firmware, and each will have a different number of BLE devices in the vicinity, which can be advertisers or scanners. Most of the deployed units show this issue.

> * How long does this issue take to occur for your case and does this time vary?

It occurs at random: sometimes it arises after a week, and sometimes it takes a few minutes or hours.

> Also please provide any other information you feel which could be helpful to recreate or solve this issue.

Query :

  • Is there any API (for checking the Bluetooth radio / BLE scanning status) which I can call to find out whether the BLE scan has stopped or hung? I observed that this issue is sometimes reproduced without any BT_HCI errors in the log, so I want to get the status of the underlying BLE scan so that I can reset Bluetooth instead of executing esp_restart().
  • What is the proper sequence to reset bluetooth ?
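
The thread does not identify a status API for this. One application-level approach, sketched here under the assumption that at least one advertiser is normally within range, is to record the time of the last ESP_GAP_SEARCH_INQ_RES_EVT and treat a long silence as a stall. All names below are hypothetical; on the target the timestamps could come from esp_timer_get_time(), which returns microseconds since boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-level watchdog, not an ESP-IDF API. */
typedef struct {
    int64_t last_adv_us;  /* time of the most recent scan result */
    int64_t timeout_us;   /* silence longer than this counts as a stall */
} scan_watchdog_t;

/* Call whenever a scan result event is handled. */
static inline void scan_watchdog_feed(scan_watchdog_t *wd, int64_t now_us)
{
    assert(wd != NULL);
    wd->last_adv_us = now_us;
}

/* Call periodically from an application task. */
static inline bool scan_watchdog_expired(const scan_watchdog_t *wd, int64_t now_us)
{
    assert(wd != NULL);
    return (now_us - wd->last_adv_us) > wd->timeout_us;
}
```

The timeout has to be chosen longer than the longest quiet period expected in normal operation, otherwise a quiet environment is reported as a stall.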

@Rokachy

Rokachy commented Jun 10, 2021

We are facing the same issue.
We are working with the 4.2.1 release; we also tried the v4.3-beta3 tag and the v4.4-dev tag, and the issue is there as well.
When this issue occurs, the BT radio status looks good and no error is reported, so software cannot detect it...
We tried many options to reset BT only, but scanning is not operational after that.
Only esp_restart() recovers, but we cannot use it since our app cannot lose the BT radio for more than 1 minute (it is not feasible for us to run esp_restart() every minute!!).
This issue will kill our project....

@TianaESP

@Rokachy Can you please try with the latest v4.3 release? We did a test and did not reproduce the issue. We are still testing with mass devices on the same.

@Rokachy

Rokachy commented Jun 15, 2021

Yes, I will, and I'll let you know the results.
It is a random issue; it can appear after anywhere from a few hours to a few days....
Is there anything that I can read, or any status I can get from the device, to help debug it?
Do you want me to enable anything other than logs?

Yehuda

@TianaESP

@Rokachy Yes, it is a random issue. I'm afraid there is no need to enable anything at the moment; the best would be a packet capture. Please try with the latest v4.3 first. Thanks.

@Rokachy

Rokachy commented Jun 22, 2021

It has been running with the v4.3 release for a few days, no issue so far -:)
We will continue to run it for a few more days.
What has been fixed in SDK v4.3?

@0neblock
Contributor Author

@TianaESP Any chances of this (possible) fix being backported to v3.3?

@TianaESP

@0neblock The fixes were backported to v3.3, v4.0, v4.1, v4.2. Please try the latest v3.3.
Thanks.

@TianaESP

@Rokachy We fixed bugs in modem sleep that we suspected were contributing to the problem. Please let us know if the issue happens again. Thanks.

@Rokachy

Rokachy commented Jun 28, 2021 via email

@Rokachy

Rokachy commented Jul 4, 2021

> The fixes were backported to v3.3, v4.0, v4.1, v4.2. Please try the latest v3.3.

is the fix backported to 4.0.3 version too?

@Alvin1Zhang
Collaborator

Thanks for sharing updates, the fix has already been backported to release/4.0 b89b1ec, thanks.

@MartinTJDK

MartinTJDK commented Jul 6, 2021

I have observed an issue which might relate to this. At least the result is the same: the BLE stack stops working properly...

In my situation, I am scanning for BLE advertisements, and at some point in time, the scan stops. It typically happens when ESP32 is busy (e.g. writing a lot of information to Debug Console).

I enabled "CONFIG_BLE_HOST_QUEUE_CONGESTION_CHECK", which helps a lot, and actually shows, that BTU queue "often" has congestion.

But I also observed (easy to reproduce by changing "BT_QUEUE_CONGEST_SIZE" to 20 in file "bt_common.h"), that when congestion occurs, it actually locks the ble stack completely.
Sometimes it does not recover, and I believe it is caused by the "hciH4T" task having higher priority and a lot of events to process.

If I allow "hciH4T", "btuT" and "BTC_TASK" tasks to use same priority, I do not see this lockup.
Perhaps someone from Espressif (@TianaESP) could confirm the issue?

@juanaviladev

juanaviladev commented Dec 30, 2021

Hi! Any update about this? I noticed this same issue when using BLE + Classic (v4.4-beta1). Thanks!

edit: v4.3.1 is also affected

@Rokachy

Rokachy commented Jan 2, 2022 via email

@juanaviladev

juanaviladev commented Jan 7, 2022

> We are using 4.3.1 and we are not seeing the issue

Are you using bluedroid (with BTDM mode)? With only ble scan I can't reproduce it

I'm doing name requests (read_remote_name) and at the same time a ble scan

@Rokachy

Rokachy commented Jan 7, 2022 via email

@ktonder

ktonder commented Jan 19, 2022

Using the latest Arduino ESP32, v 2.0.2 that uses idf 4.4-alpha1 I also get the error "BT_HCI: command_timed_out hci layer timeout waiting for response to a command. opcode: 0x200c"

When this happens I also see that the application reconnects to WiFi, as I have WiFi / BLE co-existence enabled. I run BLE scanning for 1 second at a time, then take a short break before I start scanning again.

@juanaviladev

> Using the latest Arduino ESP32, v 2.0.2 that uses idf 4.4-alpha1 I also get the error "BT_HCI: command_timed_out hci layer timeout waiting for response to a command. opcode: 0x200c"
>
> When this happens I also see that the application reconnects to WiFi as I have WiFi / BLE co-existence. BLE scanning for 1 second at the time, then a short break before I start scanning again.

+1, although the changelog includes this bug fix, it hasn't been resolved in the latest version yet. Also, since the bluetooth controller code is not open source, it is not possible to do anything on our side...

@HeFeng1947

HeFeng1947 commented Apr 6, 2022

Unfortunately, I seem to have the same problem.
My idf version is v4.2 c327a00
I have enabled BLE/WiFi coexistence. Bluetooth scans all the time and sends the results to the cloud server over WiFi.

@juanaviladev

juanaviladev commented May 6, 2022

v4.4.1, has the same problem. BLE/Classic BT + Wifi still unusable in a prod environment.

@Zimbu98

Zimbu98 commented Sep 21, 2022

I had the same problem for months and finally solved it by adjusting the scan interval and scan window. I think the default might be 100/99 which gives very little resources for the ESP to "take out the trash" and "wash the dishes" when you are running a continuous scan like I am.

I changed it to 800/750 and now suddenly the scanner runs and runs without crapping out. Amazing! You can use different numbers I assume, but just be sure to leave some gap between the two numbers so the scanner will take a break and allow the ESP to do some other housekeeping. (I am just speculating that the housekeeping issue is the root of the problem; it might also be that the scanner is overheating or overeating or over somethinging.)

FYI, I had the same problem with the ESP32-WROOM-32U and also with the ESP32-C3. Changing to 800/750 solved it for both of them.
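
The gap between the two numbers can be expressed as a duty cycle, that is, the share of each scan interval spent listening. A minimal sketch with an illustrative helper name (not an ESP-IDF function); both arguments only need to share the same unit:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: scan duty cycle in whole percent (rounded
 * down), i.e. the share of each scan interval spent listening.
 * Window and interval must be given in the same unit. */
static inline uint32_t scan_duty_cycle_percent(uint32_t window, uint32_t interval)
{
    assert(interval != 0 && window <= interval);
    return (window * 100u) / interval;
}
```

For 800/750 this gives 93, leaving roughly 6% of each interval idle, compared with 99 for 100/99 and 100 for the 0x100/0x100 configuration in the original report. This is only arithmetic; the thread does not establish that the idle share is the deciding factor.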

@pschlan

pschlan commented Sep 23, 2022

Are you sure it's really solving it and not just making it less likely? Until the root cause for this is analyzed and fixed, I can't really trust the ESP in a production device when it comes to BT.

@freemansu

V4.4.3 has the same problem.

@thorrak

thorrak commented Mar 6, 2023

v4.4.4 (via the latest Arduino framework) has the same problem.

My gap between the interval and window is quite wide as per the suggestion by @Zimbu98 - unfortunately this had no effect.
