[FEATURE] Enable BLE presence detection #18

Closed
mmakaay opened this issue Apr 21, 2021 · 9 comments

Comments

@mmakaay
Owner

mmakaay commented Apr 21, 2021

The ESP32-WROOM-32D that is in the lamp does support BLE.
It would be a cool feature to make that work for presence detection.

I already tried to implement this, but have failed so far.
One issue was that the platform package repo that we use did not yet contain the correct library files. That was fixed.
A bigger problem, still unsolved, is that the esp32_ble_tracker module makes the device very unstable. The device reboots spontaneously and often. The console logs show that the watchdog timer (WDT) triggers these reboots because the device loop takes too much time.

Possibly, this issue is specific to this chip, given that it is a single core chip and not a dual core one.
Changes will likely be needed in the loop() function of the esp32_ble_tracker module.

@mmakaay
Owner Author

mmakaay commented Apr 22, 2021

Compiling with esp32_ble_tracker adds a noticeable chunk to the firmware output.

Flash: [========  ]  83.9% (used 1540470 bytes from 1835008 bytes)

versus:

Flash: [=====     ]  50.3% (used 922770 bytes from 1835008 bytes)

So, 33.6% of the flash memory is taken by the BLE code.
Maybe this has to do with the autoloading of xiaomi_ble and ruuvi_ble.
Without having looked into this further, I suspect the dependencies are in the wrong order here.
Those modules depend on the BLE tracker, not the other way around.
This is confirmed by the fact that these two autoloaded modules autoload esp32_ble_tracker themselves.
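
The percentages above can be reproduced from the byte counts in the two build outputs. A minimal sketch of that arithmetic follows; the constant and function names are made up for illustration and are not part of any build tool:

```cpp
#include <cassert>
#include <cstdint>

// Byte counts copied from the two build outputs above.
const std::uint32_t kFlashTotal = 1835008;
const std::uint32_t kUsedWithBle = 1540470;
const std::uint32_t kUsedWithoutBle = 922770;

// Extra bytes that the build with the BLE tracker needs.
const std::uint32_t kBleBytes = kUsedWithBle - kUsedWithoutBle;

// Hypothetical helper: share of the flash partition, in percent.
inline double flash_percent(std::uint32_t used, std::uint32_t total) {
    assert(total > 0);
    return 100.0 * used / total;
}
```

With these numbers, the BLE code accounts for 617700 bytes, which is the 33.6% difference between the two builds.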

@mmakaay
Owner Author

mmakaay commented Apr 23, 2021

I started investigating the esp32_ble_tracker code, to see what I could
do to fix its functionality on my device. Below are my scribbles for this.

Underlying API

esp-idf API used: GAP BLE

Semaphore use

The code uses 2 semaphores for code syncing.

  • SCAN RESULT
  • SCAN END
    FIX? The SCAN END lock actually indicates that a scan is busy. Naming it
    accordingly would improve readability.

Function: register_listener(ESPBTDeviceListener)

  • sets ESP32BLETracker as parent for the device listener
  • and adds the listener to a list of listeners

Function: setup()

  • setup the BLE tracker
  • start the first scan with start_scan(true)

Function: start_scan(bool first), first = true when called from setup

  • take SCAN END lock, non-blocking
    • lock busy? return!
  • not the first call?
    • call on_scan_end() on every listener
      The actual functionality could be called: "right before the next scan".
      It gives the listeners a chance to publish state.
  • clear the list of already discovered devices. A bit of a "seen" list per scan round.
  • setup scan options, e.g. scan window = 30ms, scan interval = 320ms
  • run esp_ble_gap_start_scanning(time) = 5 min
    FIX? no check for ESP_OK return value
  • setup a timeout in the component scheduler for 2 x scanning time (= 10 min)
    When this timer is hit, then the device will be rebooted to restore the BLE stack.
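
The start_scan() steps above, including the return value check that the FIX note asks for, could be sketched as follows. This is a host-side model under stated assumptions, not the ESPHome code: `ScanBusyFlag`, `StartResult` and `StartScanFn` are hypothetical names, the atomic flag stands in for the SCAN END semaphore, and the function pointer stands in for esp_ble_gap_start_scanning(), where a return value of 0 corresponds to ESP_OK. Releasing the flag on failure is a design choice of this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// BLE scan window and scan interval are expressed in units of 0.625 ms.
inline std::uint16_t ms_to_ble_units(std::uint32_t ms) {
    assert(ms <= 40000);  // keeps the result inside 16 bits
    return static_cast<std::uint16_t>(ms * 1000 / 625);
}

// Stand-in for the SCAN END semaphore: true while a scan is busy.
struct ScanBusyFlag {
    std::atomic<bool> busy{false};
    bool try_take() { return !busy.exchange(true); }  // non-blocking take
    void give() { busy.store(false); }
};

enum class StartResult { kStarted, kBusy, kFailed };

// Hypothetical stand-in for the scan start call; 0 means success.
using StartScanFn = int (*)(std::uint32_t duration_s);

StartResult start_scan(ScanBusyFlag &flag, StartScanFn start_fn,
                       std::uint32_t duration_s) {
    if (!flag.try_take()) {
        return StartResult::kBusy;  // lock busy? return!
    }
    if (start_fn(duration_s) != 0) {
        flag.give();  // no scan is running, so release the lock again
        return StartResult::kFailed;
    }
    return StartResult::kStarted;
}
```

With the values from the notes, a 30 ms window becomes 48 units and a 320 ms interval becomes 512 units.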

Function: loop()

  • take SCAN END lock, non-blocking
    • if the lock can be taken, a scan was apparently completed
    • give back the SCAN END lock
    • and start a new scan using start_scan(false)
      FIX? this call uses the global_esp32_ble_tracker variable, not this. Why is that?
  • take SCAN RESULT lock, wait max (5L / portTICK_PERIOD_MS) (= 5 ms)
  • lock taken?
    • get the current SCAN RESULT INDEX (so this index is shared and protected by mutex)
    • give back SCAN RESULT lock
    • Is the index > 16?
      • warn about too many events to process
    • loop i = 0 .. current index
      • parse scan result for result[i]
      • found = listener exists for which parse_device(result[i]) returns true
      • not found?
        • print_bt_device_info(result[i])
          This function uses the list of already discovered devices.
          When the device is not in the list, its data is printed and
          the device is added to the list. This prevents printing its
          data multiple times when multiple scans have been collected.
    • take SCAN RESULT lock, wait max 10 ms
    • lock taken?
      • set current SCAN RESULT INDEX = 0
        So if taking the lock fails, new scan results were being added at
        that moment, and the list grows. In the next loop, some more devices
        will be processed in the above loop. Nothing too bad I think.
      • give back SCAN RESULT lock
  • Received callback from API layer about wrong scan parameter?
    • Log the failure.
  • Received callback from API layer about a completed scan, but with error result?
    • Log the failure.

Function: gap_scan_result() - called by GAP API on new result

  • event == ESP_GAP_SEARCH_INQ_RES_EVT? (new peer device)
    • take SCAN RESULT Lock, non-blocking
    • lock taken?
      • less than 16 results so far? <-- so this is where the limit of 16 is implemented
        • store new result
        • increment scan result index
      • give back SCAN RESULT lock
  • event == ESP_GAP_SEARCH_INQ_CMPL_EVT? (scan complete)
    • give back SCAN END lock
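
The interplay between gap_scan_result() and loop() around the 16 result limit can be modeled on the host like this. It is a simplified sketch, not the tracker code: `ScanResultBuffer` is a hypothetical name, `int` stands in for a scan result, and a std::mutex stands in for the SCAN RESULT lock. Unlike the code described above, the consumer reads and resets the index under a single blocking lock instead of taking the lock twice with a timeout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class ScanResultBuffer {
 public:
    static constexpr std::size_t kCapacity = 16;

    // Producer side (gap_scan_result): never blocks. The result is dropped
    // when the lock is busy or the buffer is full. Returns true when stored.
    bool push(int result) {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (!lock.owns_lock() || index_ >= kCapacity) {
            return false;
        }
        results_[index_++] = result;
        return true;
    }

    // Consumer side (loop): copy out the collected results and reset the index.
    std::vector<int> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<int> out(results_.begin(), results_.begin() + index_);
        index_ = 0;
        return out;
    }

 private:
    std::mutex mutex_;
    std::array<int, kCapacity> results_{};
    std::size_t index_ = 0;
};
```

The model makes the consequence of the limit visible: any result that arrives while the buffer already holds 16 entries is silently dropped until the consumer has drained the buffer.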

That's all folks

The rest of the code is all about parsing scan results.
I have no need for going over that code right now.

@mmakaay
Owner Author

mmakaay commented Apr 23, 2021

I've restructured the code for loop() and with that change I do get scanned devices. So far so good.
However, I'm not able to use OTA flashing at this point.
The upload of the new firmware stalls and fails very early in the process.

On the console, I find the following:

[14:09:54][D][ota:072]: Starting OTA Update from 192.168.x.x...
[14:09:59]E (154748) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
[14:09:59]E (154748) task_wdt:  - IDLE0 (CPU 0)
[14:09:59]E (154748) task_wdt: Tasks currently running:
[14:09:59]E (154748) task_wdt: CPU 0: loopTask
[14:09:59]E (154748) task_wdt: Aborting.
[14:09:59]abort() was called at PC 0x401bfb84 on core 0
[14:09:59]
[14:09:59]ELF file SHA256: 0000000000000000
[14:09:59]
[14:09:59]Backtrace: 0x4008f890:0x3ffbff30 0x4008faf5:0x3ffbff50 0x401bfb84:0x3ffbff70 0x4008e0fa:0x3ffbff90 0x4000bfed:0x3ffcc270 0x40091753:0x3ffcc280 0x400908ab:0x3ffcc2a0 0x401b24d2:0x3ffcc2e0 0x401a1ac5:0x3ffcc300 0x401a1b32:0x3ffcc330 0x400e90b2:0x3ffcc350 0x400dba56:0x3ffcc380 0x400dbc72:0x3ffcc3b0 0x400dc272:0x3ffcc890 0x401e5c4d:0x3ffcc8b0 0x401e5ce1:0x3ffcc8d0 0x400e0401:0x3ffcc8f0 0x400e3dfa:0x3ffcc920 0x400ee310:0x3ffcc940 0x40090b2a:0x3ffcc960
[14:09:59]
[14:09:59]Rebooting...

@mmakaay
Owner Author

mmakaay commented Apr 23, 2021

The OTA updates simply conflict with the tracker, because both need the single physical radio, and the lower level abstractions have nothing in place to arbitrate this shared access.
I found an issue about this exact problem on GitHub, by the way: esphome/issues#1098

I'm now trying to come up with a way to fix this. My envisioned way of handling this is to add some triggers and actions, so we can add something like this to the YAML configuration:

ota:
  on_begin:
    then:
      - esp32_ble_tracker.suspend: my_ble_tracker
  on_error:
    then:
      - esp32_ble_tracker.resume: my_ble_tracker

esp32_ble_tracker:
  id: my_ble_tracker

Having this taken care of automatically would be very nice too, but it doesn't really match the idea of loosely coupled components that are glued together in the YAML config.

Another approach would be to introduce something like an esp32_radio component, which all components that require the radio can use to get an exclusive lock on the radio hardware. This would implement an advisory locking scheme.
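
To make the advisory locking idea concrete, here is a minimal host-side sketch. Everything in it is hypothetical: no esp32_radio component exists, and the `RadioLock` class and `RadioUser` values are invented for this illustration. The lock is advisory in the sense that it only helps when every component asks for it before touching the radio.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical users of the radio; the names are made up for this sketch.
enum class RadioUser : int { kNone = 0, kOta, kBleTracker };

class RadioLock {
 public:
    // Claim the radio. Succeeds when it is free or already held by `user`.
    bool try_acquire(RadioUser user) {
        assert(user != RadioUser::kNone);
        RadioUser expected = RadioUser::kNone;
        // On failure, `expected` is updated to hold the current owner.
        return owner_.compare_exchange_strong(expected, user) || expected == user;
    }

    // Release the radio. Ignored when `user` is not the current owner.
    void release(RadioUser user) {
        RadioUser expected = user;
        owner_.compare_exchange_strong(expected, RadioUser::kNone);
    }

    RadioUser owner() const { return owner_.load(); }

 private:
    std::atomic<RadioUser> owner_{RadioUser::kNone};
};
```

In such a scheme, the OTA component would claim the lock when an update begins, and the tracker would skip starting a new scan for as long as its own try_acquire() call fails.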

But before going there, I will first investigate whether the idea works with the simpler setup mentioned above.

@mmakaay
Owner Author

mmakaay commented Apr 23, 2021

I filed a pull request for extending the OTA component with some automation triggers:
esphome/esphome#1714
When these get accepted, I can use them to disable the bluetooth connection at appropriate times.

@mmakaay
Owner Author

mmakaay commented Apr 25, 2021

I'm working on disabling the bluetooth during OTA upgrades, but so far I've had no luck with this.
Here's a log from a recent attempt:

[01:43:43][D][esp32_ble_tracker:643]: Found device 7C:2A:C1:91:FF:B1 RSSI=-94
[01:43:43][D][esp32_ble_tracker:664]:   Address Type: RANDOM
[01:43:56][D][ota:072]: Starting OTA Update from 192.168.100.11...
[01:43:56][D][main:218]: Disable BLE tracker to prevent conflicts with the OTA upgrade
[01:43:56][D][esp32_ble_tracker:187]: ble teardown: esp_bluedroid_disable()
[01:43:57][D][esp32_ble_tracker:195]: ble teardown: esp_bluedroid_deinit()
[01:43:57][D][esp32_ble_tracker:203]: ble teardown: esp_bt_controller_disable()
[01:43:57][D][esp32_ble_tracker:211]: ble teardown: esp_bt_controller_deinit()
[01:43:57][D][esp32_ble_tracker:219]: ble teardown: esp_bt_controller_mem_release()
[01:43:57][D][esp32_ble_tracker:227]: ble teardown: btStop()
[01:43:57][D][ota:077]: TODO read magic bytes
[01:43:57][D][ota:091]: TODO send OK and version
[01:43:57][D][ota:096]: TODO read features
[01:44:02]E (45309) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
[01:44:02]E (45309) task_wdt:  - IDLE0 (CPU 0)
[01:44:02]E (45309) task_wdt: Tasks currently running:
[01:44:02]E (45309) task_wdt: CPU 0: loopTask
[01:44:02]E (45309) task_wdt: Aborting.
[01:44:02]abort() was called at PC 0x401c0b18 on core 0
[01:44:02]
[01:44:02]ELF file SHA256: 0000000000000000
[01:44:02]
[01:44:02]Backtrace: 0x4008f890:0x3ffbff30 0x4008faf5:0x3ffbff50 0x401c0b18:0x3ffbff70 0x4008e0fa:0x3ffbff90 0x401a01ad:0x3ffcc2b0 0x401a2a51:0x3ffcc330 0x401a2ac6:0x3ffcc360 0x400e9dae:0x3ffcc380 0x400dc09e:0x3ffcc3b0 0x400dc2fa:0x3ffcc3e0 0x400dc9ee:0x3ffcc8d0 0x401e6be1:0x3ffcc8f0 0x401e6c75:0x3ffcc910 0x400e0a39:0x3ffcc930 0x400e44ea:0x3ffcc960 0x400ef098:0x3ffcc980 0x40090b2a:0x3ffcc9a0
[01:44:02]
[01:44:02]Rebooting...

I'm trying really hard to shut down the bluetooth, but I still get WDT reboots. I added some more debugging output to the ota component, and the above log shows that the 'read features' step fails. This is part of the OTA processing code that uses the wifi.

Maybe the bluetooth handling is still competing for the radio? Or maybe disabling the bluetooth has masked some incoming wifi packets? In the latter case, things might get fishy, because we can't really predict when an OTA upgrade is going to happen. We can only act upon one as soon as one has just started.

One thing I found in a discussion about combining wifi and ble:

In last release v4.2 Ble/WiFi coexistence is working, but it depends on order of starting ... If I first start Ble and then WiFi then only WiFi works, but if first I start WiFi and then Ble - both Ble and WiFI works, it is strange issue

Maybe this is something I can pursue.

@mmakaay
Owner Author

mmakaay commented Apr 25, 2021

Found another issue
In this one, Otto says that he didn't find a solution for these issues. That's not very hopeful :-(

@mmakaay
Owner Author

mmakaay commented Apr 25, 2021

Looked at the esp-idf 4.2 release. That has some changelog items on wifi coexistence.
You can find this here: https://github.com/espressif/esp-idf/releases/tag/v4.2

Also found an FAQ entry:

Does ESP32 support coexistence between Bluetooth® and Wi-Fi?

Yes, but time-sharing control is required for ESP32’s coexistence between Wi-Fi and Bluetooth. Please go to menuconfig to enable the Wi-Fi/Bluetooth coexistence, shown as follows:

 menuconfig -> Component config -> Wi-Fi -> Software controls WiFi/Bluetooth coexistence (Enable)

@mmakaay
Owner Author

mmakaay commented Apr 28, 2021

Another interesting thing: switching to NimBLE
esphome/feature-requests#810

All in all, my conclusion is that BLE tracking is currently not an option. When using BLE, the WiFi stack is broken for OTA, and disabling the Bluetooth doesn't help. Future releases of esp-idf (v4 branch) might bring some improvement in this respect, along with making use of the NimBLE component. This is likely not a short term solution, because it requires changes in multiple stacks, and we'll have to wait for a stable v4 release of esp-idf.

Closing this ticket for now.

@mmakaay mmakaay closed this as completed Apr 28, 2021