Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate switching from txdbus to other Python DBus package #236

Closed
hbldh opened this issue Jul 1, 2020 · 34 comments · Fixed by #389
Closed

Evaluate switching from txdbus to other Python DBus package #236

hbldh opened this issue Jul 1, 2020 · 34 comments · Fixed by #389
Assignees
Labels
asyncio Problems related to asyncio and multiple clients Backend: BlueZ Issues and PRs relating to the BlueZ backend enhancement New feature or request help wanted Extra attention is needed Opinions Appreciated Please add an opinion on your desired resolution on this issue!

Comments

@hbldh
Copy link
Owner

hbldh commented Jul 1, 2020

Description

A problem with using txdbus as the solution to talk to the BlueZ DBus service is that it seems to leak file descriptors somewhere. Right now this is solved by having a store of twisted reactors in the __init__.py of bleak.backends.bluez.

This might be improved by using another package. Candidates to evaluate:

Pros

  • Both are pure python and would reduce the number of dependencies for bleak
  • It might resolve the file descriptor problem
  • An asyncio-only solution in Linux

Cons

  • Twisted (which powers txdbus) is a very stable and efficient package. Replacing it with newer, less tested packages might lead to unexpected problems.
@hbldh hbldh added enhancement New feature or request help wanted Extra attention is needed Backend: BlueZ Issues and PRs relating to the BlueZ backend Opinions Appreciated Please add an opinion on your desired resolution on this issue! asyncio Problems related to asyncio and multiple clients labels Jul 1, 2020
@hbldh hbldh self-assigned this Jul 1, 2020
@dlech
Copy link
Collaborator

dlech commented Jul 1, 2020

I think I would be in favor of exploring one of the alternatives. I did some work last year to fix some bugs in txdbus. One bug (that fortunately doesn't affect Bleak) I was not able to fix because it would require changing twisted, but I found it to just be too complicated.

Both of the packages you have listed look good.

@cspensky
Copy link
Contributor

cspensky commented Sep 18, 2020

I am running into some weird issues with Twisted right now that seem to be linked to this issue. It seems to be very related to the asyncio issue that was raised. We will likely be looking into this in the coming weeks. If anyone else has input before we dive in, it would be much appreciated.

Here is the problem that I am seeing when trying to read a characteristic, for reference:

Traceback (most recent call last):
  File ".../bleak/backends/bluezdbus/client.py", line 454, in read_gatt_char
    ).asFuture(asyncio.get_event_loop())
twisted.internet.error.ConnectionDone: Connection was closed cleanly.

@hbldh
Copy link
Owner Author

hbldh commented Sep 21, 2020

@cspensky Please try! I think Jeepney is more promising than DBussy, even though I like the name better of the latter one.

@mitea1
Copy link

mitea1 commented Nov 5, 2020

I am running into some weird issues with Twisted right now that seem to be linked to this issue. It seems to be very related to the asyncio issue that was raised. We will likely be looking into this in the coming weeks. If anyone else has input before we dive in, it would be much appreciated.

Here is the problem that I am seeing when trying to read a characteristic, for reference:

Traceback (most recent call last):
  File ".../bleak/backends/bluezdbus/client.py", line 454, in read_gatt_char
    ).asFuture(asyncio.get_event_loop())
twisted.internet.error.ConnectionDone: Connection was closed cleanly.

is there any progress on that? I'm having the same issue but if notify callback is called of the chracteristics. would very much appreciate if this was solved soon.

@hbldh
Copy link
Owner Author

hbldh commented Nov 5, 2020

@mitea1 as far as I know, no effort has been spent trying to solve this yet. Please give it a go if you feel up for a challenge!

@mitea1
Copy link

mitea1 commented Nov 5, 2020

@mitea1 as far as I know, no effort has been spent trying to solve this yet. Please give it a go if you feel up for a challenge!

@hbldh I'm afraid I will not find the time to dive deeply into this and solve it.

@cspensky Do you know if this only happens on specific bluez versions. I'm using 5.50 and bleak 0.9.1. If there was a version without this issue I could downgrade to it

@cspensky
Copy link
Contributor

cspensky commented Nov 5, 2020

@mitea1 I have been seeing this same issue, and have not found any solution (despite a reasonable amount of effort). My next step was going to be to switch the DBus package to fix it. Unfortunately, other things at work have taken my attention recently. I hope to have some time this month to address this issue. Please do let me know if you make any headway.

For reference, I am running Bluez 5.48 and am on the develop branch of bleak.

@mitea1
Copy link

mitea1 commented Nov 5, 2020

@hbldh do I get that right if I say at the moment there is no valid version set bleak/bluez that works 100% without this issues regarding dbus? so only the .net backend version works?

Just asking because I could probably switch to other bleak/bluez version if there is any that works without this issues

@hbldh
Copy link
Owner Author

hbldh commented Nov 5, 2020

do I get that right if I say at the moment there is no valid version set bleak/bluez that works 100% without this issues regarding dbus?

No. I would not say that you are right. I have not personally experienced this in any of my usage of Bleak on Ubuntu, or at least not to any extent that has been annoying enough to warrant me remembering it. I have used a variety of BlueZ versions over the course of several years, but I have not registered which. Some Linux OS versions and BlueZ have more problems than others, see e.g. #332. But I have no record of which have worked well, since such errors have been isolated occurrences which have been solved by either identifying problematic hardware or downgrading BlueZ or Linux kernel.

What do you mean 100%? That this will never happen? I cannot and will not guarantee that on any system. It is Bleak, plus several OS subsystems working together, combined with hardware, wireless communication and some connected peripheral with hardware and software of its own. Of course some strange disconnections will happen. Do not expect flawless execution with BLE.

so only the .net backend version works?

No. I would say that all three backends "work".

@mitea1
Copy link

mitea1 commented Nov 6, 2020

@cspensky I made an interesting observation:

I use bleak partly to send (write) data to a "rx" characteristic and get responses from another "tx" characteristic which I subscribed to. This process is abstracted in a send_request() method. inside this method I do the following

  1. subscribe to TX char (using start_notify())
  2. write_char RX char (using write_gatt_char())
  3. unsubscribe TX char (using stop_notify() )

I noticed that I get this error after every second call of send_request().

Now I changed things a little bit

After I connect to the device I call start_notify() only once and unsubscribe just before disconnecting.

So send_request() basically only consists of:

  • write_char RX char (using write_gatt_char())

Now it works without any Issues.

So to make things short:

My assumption is that there is a problem for bluez if we use start_notify() several times in an ongoing session (no disconnection). even if you unsubscribe the characteristics in between

So the solution for me was just call start_notify() only once after bleak connected to a device

@hbldh maybe this sounds legit to you or you might have an explanation for it?

@cspensky
Copy link
Contributor

cspensky commented Nov 6, 2020

@mitea1 Thank you for the insights! I am also using a TX and RX characteristic with notifications. Based on your findings, I will see if I can try to get to the bottom of the issue. May I ask what type of device you are communicating with? I saw this when interacting with an iPhone XR.

@mitea1
Copy link

mitea1 commented Nov 6, 2020

@mitea1 Thank you for the insights! I am also using a TX and RX characteristic with notifications. Based on your findings, I will see if I can try to get to the bottom of the issue. May I ask what type of device you are communicating with? I saw this when interacting with an iPhone XR.

@cspensky Bleak is running inside a docker image balenalib/raspberrypi3-debian-python:3.7 on a RPi4 to communicate with proprietary embedded device using nordic sdk.

Traceback (most recent call last):
File ".../bleak/backends/bluezdbus/client.py", line 454, in read_gatt_char
).asFuture(asyncio.get_event_loop())
twisted.internet.error.ConnectionDone: Connection was closed cleanly.

just out of curiosity... why are you using read_gatt_char() functionality at all if you are using RX and TX like characteristics with notifications? shouldn't you use the notify callback for the TX characteristic of the device you are communication with instead like I do? otherwise you would have to poll the characteristic the whole time.

@5frank
Copy link

5frank commented Nov 23, 2020

Another alternative could be "Bluetooth Management sockets" http://www.bluez.org/the-management-interface/. I do not think it is a good idea, lots of work etc, but it deserves to be mentioned.

@hidaris
Copy link

hidaris commented Nov 23, 2020

python-dbus-next is another good option.

@dlech dlech mentioned this issue Dec 9, 2020
@cspensky
Copy link
Contributor

cspensky commented Jan 9, 2021

After playing with dbus-next for a bit, it seems to be a pretty heavyweight solution. 😞 Is anyone else seeing pretty high CPU usage, especially when scanning? I may poke around a bit and see if one of the other solutions to see if they are a bit more lightweight... Would love to get some input before I jump into this.

@dlech
Copy link
Collaborator

dlech commented Jan 9, 2021

Can you do some profiling to see what is using the CPU? I'm surprised that it would be much different from txdbus.

@cspensky
Copy link
Contributor

cspensky commented Jan 9, 2021

I just ran a quick instance with cProfile where I ran a scanner until 3 devices were found and connected to.
Sorted by Total Time, it seems like the unmarshalling of BlueZ data is the main culprit. There is a clear spike in the CPU usage when scanning, which makes sense as a pure Python3 solution.

ncalls	tottime	percall	cumtime	percall	filename:lineno(function)
1287	14.84	0.01153	14.84	0.01153	~:0(<method 'poll' of 'select.epoll' objects>)
2550	0.2162	8.48e-05	0.6628	0.0002599	traceback.py:321(extract)
25169	0.1803	7.163e-06	0.1803	7.163e-06	~:0(<built-in method posix.stat>)
75383	0.08302	1.101e-06	0.1328	1.762e-06	unmarshaller.py:49(read)
1287	0.0561	4.359e-05	17.48	0.01358	base_events.py:1784(_run_once)
19584	0.05558	2.838e-06	0.2065	1.054e-05	linecache.py:53(checkcache)
1239	0.05236	4.226e-05	0.665	0.0005367	unmarshaller.py:241(_unmarshall)
215290/214199	0.0449	2.096e-07	0.04518	2.109e-07	~:0(<built-in method builtins.len>)
1655	0.04296	2.596e-05	0.09222	5.572e-05	__init__.py:284(__init__)
21673	0.03959	1.827e-06	0.1033	4.766e-06	unmarshaller.py:158(read_ctype)
1655	0.03751	2.267e-05	0.03751	2.267e-05	~:0(<method 'write' of '_io.TextIOWrapper' objects>)
246	0.03678	0.0001495	0.03678	0.0001495	~:0(<method 'read' of '_io.BufferedReader' objects>)
36102/2445	0.03607	1.475e-05	0.4629	0.0001893	unmarshaller.py:233(read_argument)
19806	0.03494	1.764e-06	0.1212	6.122e-06	traceback.py:285(line)
36598	0.03487	9.528e-07	0.03487	9.528e-07	traceback.py:292(walk_stack)
40556	0.03451	8.509e-07	0.06644	1.638e-06	unmarshaller.py:122(align)
34088	0.03195	9.373e-07	0.03981	1.168e-06	linecache.py:147(lazycache)
14	0.03127	0.002234	0.03252	0.002323	~:0(<built-in method _imp.create_dynamic>)
18763	0.03078	1.64e-06	0.07633	4.068e-06	linecache.py:15(getline)
34088	0.03051	8.95e-07	0.03051	8.95e-07	traceback.py:243(__init__)
15086/11300	0.02908	2.573e-06	0.07188	6.361e-06	signature.py:251(verify)
102468	0.02838	2.77e-07	0.0358	3.494e-07	~:0(<built-in method builtins.isinstance>)
8730/8517	0.0269	3.158e-06	0.2647	3.108e-05	unmarshaller.py:181(read_variant)
4505/1807	0.02534	1.403e-05	0.4487	0.0002483	unmarshaller.py:204(read_array)

@dlech
Copy link
Collaborator

dlech commented Jan 9, 2021

Thanks for that. txdbus is pure Python too, so if it has better performance, maybe the unmarshaller can be copied from there.

The Jeepney library specifically has performance as a non-goal, so I think that rules out that library.

@cspensky
Copy link
Contributor

Unfortunately txdbus had some other issues for me, so I was never able to really stress test it. The dbus-next migration seemed to solve those.

I used to contribute to PyBluez that used pygattlib, which was just a gatttool wrapper for Python.
Is there a particular reason that we want Bleak to use D-Bus or our Python instead of a more low-level, faster approach?
I will likely need to implement something more perfomant for my needs, and if there is a desire to have this incorporated in Bleak, I'm happy to drive that effort. I plan to keep the Bleak API intact either way, so it will be compatible (the idea of a universal API in Python is fantastic!).

This doesn't appear to be has big of a problem on OSX, because it seems like CoreBluetooth does a lot of the heavy lifting with data processing. I have not had the opportunity test on Windows yet.

@dlech
Copy link
Collaborator

dlech commented Jan 11, 2021

Is performance important for advertising data, characteristic notifications or both?

Is there a particular reason that we want Bleak to use D-Bus or our Python instead of a more low-level, faster approach?

Using such low-level calls requires elevated privileges and requires in-depth knowledge of the Bluetooth protocol and could potentially interfere with BlueZ that will still be running in the background anyway.

For me personally, I think I would rather spend the time to fix BlueZ rather than trying to work around it.

@cspensky
Copy link
Contributor

I'd love to have it be more performant all around. I'm not proposing going away from BlueZ but using a more direct interface. Something like gattlib's Python interface (pygattlib) should be significantly less overhead than using a pure-Python D-Bus solution.

The library still seems to be maintained as well.

@dlech
Copy link
Collaborator

dlech commented Jan 12, 2021

Ah, I see I misunderstood how that library worked (based on the readme, it sounds like I'm not the only one).

The only downside I see is that it can be a pain for users on non-x86 architectures to compile the library. But maybe it makes sense to add it as an optional backend for those who need the performance benift?

@cspensky
Copy link
Contributor

This was the same conclusion that @Bernstern and I came to yesterday. I plan to add gattlib as an optional backend on Linux. Some preliminary testing has led me to believe that it will be pretty straightforward and significantly faster. There will hopefully be a pull request in the near future.

@hbldh
Copy link
Owner Author

hbldh commented Jan 12, 2021

@cspensky I might have missed something in the discussion above, but I just want to weigh in with my opinion regarding the inclusion of gattlib in Bleak.

I used to use pygattlib a lot about four - five years ago. It was for high-speed transfer of data from MbientLab sensors. It handled very high rates of transfer and I was very content with it. It was at the time (2016-2019) not maintained and there were some bugs that were problematic, but managable. Then came the announcement that BlueZ would deprecate gatttool and only support the new DBus API that they were working on. I realized that in order to keep a working BLE testing environment in Python I would need to do something else. It does not matter how much I prefer not using such a verbose and "chatty" solution as the DBus system, it was the direction of BlueZ, which is the de facto standard for Bluetooth in most Linux distributions.

Consequently, I created Bleak as an exercise in cross-platform API building and asyncio coding. pygattlib had all the threading in the C++ parts and I disliked not being able to debug easily. Thus I stipulated some ground rules for myself and went to work. This package is, with contributions from several others, the result.

If you by some impressive feat of software engineering manage to coax gattlib into a asyncio coat, then I will be very impressed. It might be done, I don't know, because I have never explored that path. I would still prefer it not to be in the pip-installable bleak package though, even as an optional backend available as an extras. I think you cannot escape e.g. compilation, extra apt-installable packages and permission/capability handling with libcap or similar.

My suggestion is that you make a new package and host it yourselves on PyPI. I would gladly include references to it in the Bleak docs and README, but since it relies on a "possibly-soon-deprecated" part of BlueZ and probably will violate some of the initial ideas of Bleak, I argue that it should be an external add-on in that case.

Another option is to actually use pygattlib instead of Bleak. It is very good and it is what I used as a model for Bleak, and if it has gotten an active maintenence once again, then I highly recommend it for high throughput demands. DBus is an unavoidable bottleneck I think, regardless how optimized marshalling/unmarshalling you write.

You should, in my opinion, use Bleak for what it is written for (cross-platform without much hassle) and pygattlib when you need high speed in a Linux distribution system.

@cspensky
Copy link
Contributor

@hbldh Thank you for the detailed response. Indeed, I agree that Bleak is the of the various attempts that I have worked with, hence my current involvement in this repository. Thank you for all of your work! 🙏

I did start down the gattlib as a Bleak backend path out of curiosity but have since pulled the plug. For interested parties, it does seem to be possible to make it play nicely with async, you can see my branches here: bleak and https://github.com/Allthenticate/gattlib/tree/development. It appears that D-Bus in general is the bottleneck and not the Python code in particular. I ran some tests in pure C vs. the Python version and did not see much difference, unfortunately. The library that I was comparing the speed against was bluepy. However, it appears that are circumventing D-Bus and doing some hacky stuff where they are wrapping a compiled binary with Python to get that speed gain.

The only way I've really found to get our CPU usage down is by adding scanning filters or, in our case, acting as a peripheral instead of a central. Thus, I plan to jump on board with bless and help push that forward.

@smurfix
Copy link
Contributor

smurfix commented Feb 22, 2021

I am working on replacing txdbus with dbus-next. Or rather asyncdbus, which is a fork of mine that supports Trio as well as asyncio, using the anyio adapter.

Scanning works already, getting it to work was easy enough that I expect the rest to fall into place without too much hassle. Progress is somewhat limited by the fact that asyncdbus needs anyio v3 which is not even in alpha yet.

asyncdbus: https://github.com/M-o-a-T/asyncdbus, also on PyPI
anyio: https://github.com/agronholm/anyio (at the moment you need the start_task branch)

@hbldh
Copy link
Owner Author

hbldh commented Feb 25, 2021

@smurfix that sounds great, but be advised that there already is work done on porting to dbus-next (in the dbus-next-2 branch) and we are planning on merging that as soon as it is stable. You might want to make your efforts based on that if you want to stay compatible with bleak.

@smurfix
Copy link
Contributor

smurfix commented Feb 25, 2021

Yeah, I saw. My problem here is that there's no good way to do this the anyio/Trio way without deprecating .connect/.disconnect and replacing them with an async context manager. That applies to dbus-next as well as bleak, so a direct merge is a bit difficult.

That being said, I learned (and copied) a lot from the code in the dbus-next-2 branch, to the point that I can now use my code to reliably interrogate my [censored] blood pressure cuff. I have pushed the latest versions.

The cuff reader is in https://github.com/smurfix/ble-stuff.

@smurfix
Copy link
Contributor

smurfix commented Feb 25, 2021

I plan to check with the dbus-next maintainers about merging my fork back, but (a) that'll have to wait for anyio v3 to be resonably stable/ready, and (b) as I wrote it does require some incompatible changes.

@devbis
Copy link
Contributor

devbis commented Mar 2, 2021

I'm not sure this issue is the best place to mention it, but the current performance of the bleak with bluez backend is really poor.
I mean that the very basic code could consume a lot of processor time on small processors.

def device_detection_callback(device: BLEDevice, advertisement_data):
    print(device, advertisement_data)


async def run():
    async with BleakScanner(scanning_mode='passive') as scanner:
        scanner.register_detection_callback(device_detection_callback)
        await asyncio.sleep(3)
        devices = await scanner.get_discovered_devices()
        print('New devices: ', devices)
    await asyncio.sleep(1)

This code consumes up to 60-70% running on imx6ull 680 MHz processor (arm32, single core). Similar results are both for txdbus and dbus-next based backends.

I compared the results with the full-features software https://github.com/espruino/EspruinoHub bases on JS library "noble" for accessing BLE. CPU time is less than 3% for that software including scanning BLE and sending data to MQTT.

I suppose that it would be hard to reach the same performance results using python-only dependencies.
Maybe some approaches from noble could be used for bleak backend?

@hbldh
Copy link
Owner Author

hbldh commented Mar 2, 2021

@devbis That is problematic indeed... Is it Bleak or the DBus system that consumes the CPU? If it is Bleak and not the DBus system, then I have too look at what actually takes time.

Noble does not - as far as I know - use the DBus API of BlueZ, but implements its own solution, so it is not burdened by the overhead of DBus. It is an impressive feat and I have the outmost respect for the effort made by the developers to achieve such performance in Noble.

However, I am not capable of nor interested in copying that for Bleak. Bleak was meant to be the opposite of Noble in a way; only use existing, built-in, OS native BLE solutions. Bleak will not implement communication of its own or regress to using old solutions. In case you want that, there is the pygattlib project that avoids DBus by doing something similar as Noble.

@cspensky
Copy link
Contributor

cspensky commented Mar 2, 2021

@devbis Unfortunately, we had the same issues as well on both OSX and Linux. It appears to be a fundamental problem with parsing the high-throughput scanning data in Python, which is inherently slow. You can cut the CPU time a bit by filtering based on UUID or RSSI, for example. I submitted a fix a while back that fixed these on the dbus-next branch. However, it was still too CPU intensive for our needs. We now use Bleak for testing, but had to abandon it for anything more than poking around with devices. A fast, albeit EOL'd alternative, that has a C plugin and does not use D-Bus is https://github.com/IanHarvey/bluepy

@dlech dlech linked a pull request Mar 2, 2021 that will close this issue
@dlech dlech closed this as completed in #389 Mar 2, 2021
Bleak Roadmap automation moved this from Ideas to Done Mar 2, 2021
bdraco added a commit to bdraco/python-dbus-next that referenced this issue Aug 3, 2022
This change reduces the overhead of unmarshalling with the
goal of having bleak scanners not overwhelming the system.

See related issue:
hbldh/bleak#236 (comment)

```
(speed_up_unmarsh) % python3 bench/unmarshall.py
Unmarshalling 1000000 bluetooth rssi messages took 14.397348250000505 seconds
(master) % python3 bench/unmarshall.py
Unmarshalling 1000000 bluetooth rssi messages took 47.7756206660124 seconds
```
bdraco added a commit to bdraco/python-dbus-next that referenced this issue Aug 4, 2022
This change reduces the overhead of unmarshalling with the
goal of having bleak scanners not overwhelming the system.

See related issue:
hbldh/bleak#236 (comment)

```
(speed_up_unmarsh) % python3 bench/unmarshall.py
Unmarshalling 1000000 bluetooth rssi messages took 14.397348250000505 seconds
(master) % python3 bench/unmarshall.py
Unmarshalling 1000000 bluetooth rssi messages took 47.7756206660124 seconds
```
bdraco added a commit to bdraco/python-dbus-next that referenced this issue Sep 11, 2022
This change reduces the overhead of unmarshalling with the
goal of having bleak scanners not overwhelming the system.

See related issue:
hbldh/bleak#236 (comment)

```
(speed_up_unmarsh) % python3 bench/unmarshall.py
Unmarshalling 1000000 bluetooth rssi messages took 14.397348250000505 seconds
(master) % python3 bench/unmarshall.py
Unmarshalling 1000000 bluetooth rssi messages took 47.7756206660124 seconds
```
@bdraco
Copy link
Contributor

bdraco commented Oct 26, 2022

@cspensky If you get a chance it would be great to know if 0.19.0 resolves the performance concerns

@cspensky
Copy link
Contributor

cspensky commented Oct 27, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
asyncio Problems related to asyncio and multiple clients Backend: BlueZ Issues and PRs relating to the BlueZ backend enhancement New feature or request help wanted Extra attention is needed Opinions Appreciated Please add an opinion on your desired resolution on this issue!
Projects
Bleak Roadmap
  
Done
Development

Successfully merging a pull request may close this issue.

9 participants