Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluez GATT attribute stale caching issues #882

Closed
projectgus opened this issue Jul 13, 2022 · 7 comments · Fixed by #902
Closed

Bluez GATT attribute stale caching issues #882

projectgus opened this issue Jul 13, 2022 · 7 comments · Fixed by #902
Labels
Backend: BlueZ Issues and PRs relating to the BlueZ backend bug Something isn't working

Comments

@projectgus
Copy link
Contributor

projectgus commented Jul 13, 2022

  • bleak version: 0.14.3
  • Python version: 3.9.13
  • Operating System: Arch Linux amd64 (with a downgraded Python version)
  • BlueZ version (bluetoothctl -v) in case of Linux: 5.65

Description

I ran into some unexpected behaviour with BLE attribute caching and Bluez. As mentioned in other issues, there are definitely some Bluez bugs here (that I intend to report to them as well). However, I think that in some of the cases Bleak can be made to correctly handle the cache invalidation.

The goal is to be able to update the firmware for a paired/trusted device in a way that changes GATT attributes and have this reflected by Bleak. It should always work when using the Service Changed attribute correctly. As a bonus it may sometimes work when not using the Service Changed attribute correctly(!)

This is very similar to discussion in #625, #628 and probably some other issues. I couldn't find an open issue where it fit so I've started a new one - apologies if there was a better place I should have used.

What I Did

Bleak code is this:

import logging
from bleak import BleakClient

logging.basicConfig(encoding='utf-8', level=logging.DEBUG)
logger = logging.getLogger(__file__)

ADDRESS = 'F4:12:FA:41:8D:B2'

async def connect_and_pair():
    c = BleakClient(ADDRESS)
    await c.connect()
    await c.pair()
    services = await c.get_services()
    for char in services.characteristics.values():
        logger.info(f'SVC {char.service_uuid} CHAR {char}')

asyncio.run(connect_and_pair())

The device is a modified ESP-IDF NimBLE bleprph example running on an ESP32-S3, which exposes the standard Service Changed characteristic and a custom GATT service with a single custom GATT attribute. I don't think the details of the device are relevant, though.

01 - Initial run

Works as expected, Bluez does service discovery and caches the services.

INFO:SVC 00001801-0000-1000-8000-00805f9b34fb CHAR 00002a05-0000-1000-8000-00805f9b34fb (Handle: 7): Service Changed
INFO:SVC 22222222-2222-2222-2222-222222222222 CHAR bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb (Handle: 11): Unknown

See 01-initial.log for output from Bleak, contents of cache after running, and debug output from bluetoothd.

02 - Change characteristic UUID

Change the custom GATT characteristic UUID from all-0xBB to all-0xCC, reflash the device and re-run. Note that the device sends an indication for Service Changed as soon as Bluez subscribes to it, so Bluez is notified that GATT services may have changed.

Nothing else changes, the same UUIDs from 01 are shown (as loaded from cache by Bluez). bluetoothd log shows that it only re-discovers the GATT service IDs, not the characteristic IDs, even though the "Service Changed" indication handle range covers both.

See 02-changed-char-uuid.log for full logs, unchanged contents of on-disk cache, etc.

I believe this is a bug in Bluez not a bug in Bleak. At least according to my reading of the standard it should be re-discovering all GATT IDs not just services. So unless someone knows better, I'll report this one to Bluez.

03 - Change service UUID

Change the custom GATT service UUID from all 0x22 to all 0x33. The characteristic UUID is still 0xCC.

This time, Bluez receives the indication for Service Changed, and it correctly removes the old service and characteristic, re-adds the new service and characteristic, and updates the cache on disk.

Bleak starts up loading the cached GATT entries from Bluez over DBus. I think is expected because Bluez has to finish connecting before it can receive the Service Changed indication and update, this indication can be received at any time after connection is established.

Then Service Changed happens in Bluez, and Bleak receives the InterfacesRemoved dbus notifications to remove the old GATT entries from Bluez. These aren't handled by Bleak.

Then Bleak receives InterfacesAdded to add the new GATT entries, but logs errors about "ERROR:bleak.backends.service:The characteristic '11' is already present in this BleakGATTServiceCollection!" because it hasn't removed the old ones.

All of these steps happen before ServicesResolved goes to True.

As a result, Bleak still sees the old GATT services even though Bluez has told it about the new ones.

03-service-uuid-changed.log has full logs, all the details.

I think this one is a bug in Bleak, and if/when issue 02 is resolved in Bluez then the same handling will be needed for changed characteristics as well (there's no way for Bluez to know if the characteristics have changed at the point Bleak reads the cached objects, it will always need to dynamically update the list after connection establishes).

I'm very happy to work on a patch to handle InterfacesRemoved if you agree it's desirable to try for that.

Bonus bluez bug

As a bonus minor bluez bug, it updates the on-disk cache with the new UUIDs but it continues to initially send the old cached UUIDs and go through the dance of InterfacesRemoved/InterfacesAdded for each new connection!! So the end state ends up correct (according to Bluez), but it goes "the long way around" each time. This keeps up until bluetoothd is restarted. Will report that to bluez as well...

04 - Change service UUID, no Service Changed

This one is an unexpected bonus behaviour from Bluez: even if you don't send a Service Changed notification it still does basically the same behaviour as in step 03 on connect - i.e. it reads the service UUIDs, and if one of them has changed it does the InterfacesRemoved/InterfacesAdded thing to update them.

Same as 03, this is not handled by Bleak though so it still sees the old UUIDs.

04-service-uuid-changed-no-indication.log

@dlech dlech added the Backend: BlueZ Issues and PRs relating to the BlueZ backend label Jul 13, 2022
@dlech
Copy link
Collaborator

dlech commented Jul 13, 2022

I'm very happy to work on a patch to handle InterfacesRemoved if you agree it's desirable to try for that.

FYI, I'm currently working on some scanner changes that add a global BlueZ state manager that will be useful for this (look for a PR later today).

@dlech
Copy link
Collaborator

dlech commented Jul 13, 2022

Have you also tried using the GATT Database Hash characteristic (0x2B2A) instead of the Services Changed characteristic? I think this could avoid many of the bugs mentioned here. (And thanks for the very detailed report!)

@dlech dlech added the bug Something isn't working label Jul 13, 2022
@kzyapkov
Copy link

kzyapkov commented Jul 25, 2022

I observed caching issues where an entire service was erroneously cached after a reconnect to the BLE Peripheral. This happens in a very simple script which implements the SiLabs OTA Update protocol, GATT service described in "3.5 Silicon Labs OTA GATT service" in the linked document.

The gist of it:

  • both application and bootloader expose the OTA GATT service
  • the application does not add the "data" characteristic to the service, which is used as indication on whether a reboot into bootloader is necessary
  • writing 0x00 to the control char reboots the device into bootloader where the OTA service has the full set of characteristics

Although after the reboot the device only advertises a single OTA service with complete set of characteristics, BlueZ was reporting two insances thereof -- one was the cached instance with control char only, while the other was the freshly advertised, complete OTA service. The thing which solved it for me is setting Cache = yes instead of always in bluetoothd config. @dlech may remember bluez/bluez#191 so I won't bother logging another issue for bluez.

What I meant to say is, @projectgus , maybe try setting Cache = yes in /etc/bluetooth/main.conf

@projectgus
Copy link
Contributor Author

projectgus commented Jul 25, 2022

Thanks @kzyapkov for sharing this behaviour. In my case I have Cache = yes already, but my device is paired & trusted so I would expect caching in this case (with Cache = always it would be cached even if not paired.)

BlueZ was reporting two instances thereof -- one was the cached instance with data char only, while the other was the freshly advertised

Was BlueZ definitely reporting this (i.e. do you see it in bluetoothctl), or is it only appearing in Bleak (with Bluez as the backend)?

If the problem only shows in Bleak, then it's possible something like case 03 above is happening in Bleak - i.e.

  1. Bluez sends Bleak the cached service. (dbus InterfacesAdded)
  2. During service discovery Bluez notices the changed service and asks Bleak to remove the cached service (dbus InterfaceRemoved).
  3. Bleak doesn't handle this request, so the cached service stays in Bleak.
  4. Bluez then asks Bleak to add the new correct service (dbus InterfacesAdded again). Bleak may do this successfully, if the handle values are different. But now Bleak has both services.

This is just a guess, to confirm you could run your script with logging set to Debug level and look at the DBus requests as they come through.

Have you also tried using the GATT Database Hash characteristic (0x2B2A) instead of the Services Changed characteristic? I think this could avoid many of the bugs mentioned here. (And thanks for the very detailed report!)

Thanks @dlech for this suggestion and sorry I haven't gotten back around to look at this yet. Still plan to do so. :). Using Database Hash is a very good idea (I confess I didn't actually know this feature existed until you suggested it, despite thinking to myself that something like this really should exist to save on fiddling with Services Changed!)

I fear the same thing might happen here - if the hash changes, Bluez will tell Bleak to remove the cached interfaces and then add the new ones. But I will try it to confirm, and also take a look at a fix. Probably won't have time for a few weeks, though..

@dlech
Copy link
Collaborator

dlech commented Jul 26, 2022

I'm very happy to work on a patch to handle InterfacesRemoved if you agree it's desirable to try for that.

We got this fix for free in #902, can you please test?

@kzyapkov
Copy link

kzyapkov commented Jul 26, 2022

Was BlueZ definitely reporting this (i.e. do you see it in bluetoothctl), or is it only appearing in Bleak (with Bluez as the backend)?

Definitely BlueZ reporting duplicate instances of the service under two separate handles when Cache = always, works well when Cache = yes. Confirmed with BLEAK_LOGGING=1 and by looking at bluetoothctl output. I'm not bonded to the device. This is reproducible using Silabs' stock BLE OTA example: bootloader-apploader and a blinky app. #902 doesn't remedy the situation, but I didn't expect it to -- its BlueZ which has the stale cache. Anyway, sorry for the noise.

@projectgus
Copy link
Contributor Author

We got this fix for free in #902, can you please test?

@dlech Great work, this does indeed seem to fix the issues! 💯

Using either the Services Changed or Database Hash characteristics correctly, I no longer see any stale values when changing cached GATT service IDs or characteristics (none of the bugs described above still exist).

I had thought the bug described under "02" was a bug in Bluez (changing characteristic but not service UUID doesn't lead to re-discovery), but it seems like it was a bug in Bleak as it's now fixed as well. Yay!

The "Bonus bluez bug" I described still seems like it might be there - somehow I still see more InterfacesRemoved/InterfacesAdded events after a cached change, that go away after bluetoothd is restarted. But I don't see any incorrect values at any stage of this process now, so I think maybe this is also fixed - or at least a very low severity behavior difference in Bluez rather than a bug.

@dlech dlech closed this as completed in f55b01d Jul 29, 2022
dlech added a commit that referenced this issue Jul 29, 2022
Added
-----

* Added new ``assigned_numbers`` module and ``AdvertisementDataType`` enum.
* Added new ``bluez`` kwarg to ``BleakScanner`` in BlueZ backend.
* Added support for passive scanning in the BlueZ backend. Fixes #606.
* Added option to use cached services, characteristics and descriptors in WinRT backend. Fixes #686.
* Added ``PendingDeprecationWarning`` to use of ``address_type`` as keyword argument. It will be moved into the
  ``winrt`` keyword instead according to #623.
* Added better error message when adapter is not present in BlueZ backend. Fixes #889.

Changed
-------

* Add ``py.typed`` file so mypy discovers Bleak's type annotations.
* UUID descriptions updated to 2022-03-16 assigned numbers document.
* Replace use of deprecated ``asyncio.get_event_loop()`` in Android backend.
* Adjust default timeout for ``read_gatt_char()`` with CoreBluetooth to 10s. Merged #891.
* ``BleakScanner()`` args ``detection_callback`` and ``service_uuids`` are no longer keyword-only.
* ``BleakScanner()`` arg ``scanning_mode`` is no longer Windows-only and is no longer keyword-only.
* All ``BleakScanner()`` instances in BlueZ backend now use common D-Bus object manager.
* Deprecated ``filters`` kwarg in ``BleakScanner`` in BlueZ backend.
* BlueZ version is now checked on first connection instead of import to avoid import side effects. Merged #907.

Fixed
-----

* Documentation fixes.
* On empty characteristic description from WinRT, use the lookup table instead of returning empty string.
* Fixed detection of first advertisement in BlueZ backend. Merged #903.
* Fixed performance issues in BlueZ backend caused by calling "GetManagedObjects" each time a
  ``BleakScanner`` scans or ``BleakClient`` is connected. Fixes #500.
* Fixed not handling "InterfacesRemoved" in ``BleakClient`` in BlueZ backend. Fixes #882.
* Fixed leaking D-Bus socket file descriptors in BlueZ backend. Fixes #805.
@dlech dlech mentioned this issue Jul 29, 2022
dlech added a commit that referenced this issue Jul 29, 2022
Added
-----

* Added new ``assigned_numbers`` module and ``AdvertisementDataType`` enum.
* Added new ``bluez`` kwarg to ``BleakScanner`` in BlueZ backend.
* Added support for passive scanning in the BlueZ backend. Fixes #606.
* Added option to use cached services, characteristics and descriptors in WinRT backend. Fixes #686.
* Added ``PendingDeprecationWarning`` to use of ``address_type`` as keyword argument. It will be moved into the
  ``winrt`` keyword instead according to #623.
* Added better error message when adapter is not present in BlueZ backend. Fixes #889.

Changed
-------

* Add ``py.typed`` file so mypy discovers Bleak's type annotations.
* UUID descriptions updated to 2022-03-16 assigned numbers document.
* Replace use of deprecated ``asyncio.get_event_loop()`` in Android backend.
* Adjust default timeout for ``read_gatt_char()`` with CoreBluetooth to 10s. Merged #891.
* ``BleakScanner()`` args ``detection_callback`` and ``service_uuids`` are no longer keyword-only.
* ``BleakScanner()`` arg ``scanning_mode`` is no longer Windows-only and is no longer keyword-only.
* All ``BleakScanner()`` instances in BlueZ backend now use common D-Bus object manager.
* Deprecated ``filters`` kwarg in ``BleakScanner`` in BlueZ backend.
* BlueZ version is now checked on first connection instead of import to avoid import side effects. Merged #907.

Fixed
-----

* Documentation fixes.
* On empty characteristic description from WinRT, use the lookup table instead of returning empty string.
* Fixed detection of first advertisement in BlueZ backend. Merged #903.
* Fixed performance issues in BlueZ backend caused by calling "GetManagedObjects" each time a
  ``BleakScanner`` scans or ``BleakClient`` is connected. Fixes #500.
* Fixed not handling "InterfacesRemoved" in ``BleakClient`` in BlueZ backend. Fixes #882.
* Fixed leaking D-Bus socket file descriptors in BlueZ backend. Fixes #805.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backend: BlueZ Issues and PRs relating to the BlueZ backend bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants