This repository has been archived by the owner on Jan 27, 2023. It is now read-only.

org.freedesktop.DBus.Error.NoReply after ~24 hours #87

Closed
skrawn opened this issue Aug 13, 2021 · 7 comments

Comments

@skrawn

skrawn commented Aug 13, 2021

I am using python-networkmanager to detect network devices that get plugged into or removed from my system. The devices have known IP addresses, but I don't know which port a device will be plugged into, so I have to scan through my wired Ethernet interfaces, create a gateway connection, ping for the device, and, if it is not found, bring the connection down and delete it. This process runs every minute on about 5 different Ethernet devices. After doing this for almost exactly 24 hours, I eventually hit an org.freedesktop.DBus.Error.NoReply exception:

2021-08-13 00:24:42,542 - root - ERROR - return NetworkManager.NetworkManager.GetDevices()
2021-08-13 00:24:42,542 - root - ERROR -   File "<string>", line 3, in GetDevices
2021-08-13 00:24:42,542 - root - ERROR -   File "/usr/lib/python3.8/site-packages/dbus/proxies.py", line 141, in __call__
2021-08-13 00:24:42,543 - root - ERROR - return self._connection.call_blocking(self._named_service,
2021-08-13 00:24:42,543 - root - ERROR -   File "/usr/lib/python3.8/site-packages/dbus/connection.py", line 652, in call_blocking
2021-08-13 00:24:42,543 - root - ERROR - reply_message = self.send_message_with_reply_and_block(
2021-08-13 00:24:42,543 - root - ERROR - dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

I can't find much information about what this exception means, except that the DBus service may have quit or crashed. As far as I can tell, NetworkManager hasn't crashed, restarted or even logged any errors that would indicate a problem. My best guess is that the connection to the DBus main loop (?) is severed. My application calls DBusGMainLoop(set_as_default=True) in main(). Is there something simple I missed here?

@atroche

atroche commented Aug 18, 2021

We're having the exact same problem. The only workaround we have for now is to detect when this exception is thrown and restart the process.

We're on 2.2 and also doing:

    DBusGMainLoop(set_as_default=True)
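
The restart workaround described in this comment could be sketched roughly as follows. This is only an illustration, not the code anyone in this thread actually runs: `is_no_reply` and `call_or_exit` are hypothetical helper names, and the sketch assumes the process runs under a supervisor (systemd, Docker restart policy, or similar) that restarts it on a non-zero exit. It relies on `dbus.exceptions.DBusException.get_dbus_name()` to identify the error and checks for that method by duck typing, so the helper itself does not import dbus.

```python
import sys

# D-Bus error name reported in the traceback above.
NO_REPLY = "org.freedesktop.DBus.Error.NoReply"


def is_no_reply(exc):
    """Return True if exc looks like a DBusException carrying the NoReply error name."""
    get_name = getattr(exc, "get_dbus_name", None)
    return callable(get_name) and get_name() == NO_REPLY


def call_or_exit(func, exit_code=1):
    """Call func(); on a NoReply error, exit so an external supervisor can restart us.

    Assumption: something outside this process restarts it after a non-zero exit.
    Any other exception is re-raised unchanged.
    """
    try:
        return func()
    except Exception as exc:
        if is_no_reply(exc):
            sys.exit(exit_code)
        raise
```

Usage would look like `devices = call_or_exit(NetworkManager.NetworkManager.GetDevices)`. Exiting the whole process, rather than retrying in place, matches the observation in this thread that a freshly started process does not show the error.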

@maggie44

maggie44 commented Oct 20, 2021

Ran across the same issue, in the same context.

I am importing at the top of the file same as you:

    from dbus.mainloop.glib import DBusGMainLoop
    DBusGMainLoop(set_as_default=True)

The code is running in a Docker container. When the error occurs, if I go into the Docker container and execute the same script from a new file (test.py), the error doesn't occur, which suggests the issue is specific to long-running scripts.
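
Since a freshly started script does not show the error, one possible stopgap is to run each NetworkManager query in a short-lived child interpreter. The sketch below is an assumption about how that could look, not something tested against this bug; `run_in_fresh_process` is a hypothetical helper name, and it uses only the standard library.

```python
import subprocess
import sys


def run_in_fresh_process(code, timeout=30):
    """Run a Python snippet in a new interpreter and return whatever it printed.

    Each call starts a new process, so the child opens its own D-Bus
    connection instead of reusing one held by the long-running parent.
    Raises subprocess.CalledProcessError if the child exits non-zero and
    subprocess.TimeoutExpired if it runs longer than `timeout` seconds.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

For example, `run_in_fresh_process("import NetworkManager; print(len(NetworkManager.NetworkManager.GetDevices()))")` would print the device count from a child process. This costs an interpreter start-up per call, which is probably acceptable for a check that runs once a minute.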

Does anyone know how these loops work? Has anyone tried re-calling the DBusGMainLoop before each request to see if the connection is restored? Something like:

def dbus_import():
    from dbus.mainloop.glib import DBusGMainLoop
    DBusGMainLoop(set_as_default=True)

def my_script():
    dbus_import()

    #My NetworkManager command here

Or:

from dbus.mainloop.glib import DBusGMainLoop
import importlib
import sys

def dbus_import():
    for k,v in sys.modules.items():
        if k.startswith('DBusGMainLoop'):
            importlib.reload(v)
    DBusGMainLoop(set_as_default=True)

def my_script():
    dbus_import()

    #My NetworkManager command here

My issue is intermittent, so it is hard to test reliably, but it sounds like @skrawn hits it more regularly?

I wonder whether this may also be fixed by #85, but that could restore an existing issue.

follow_name_owner_changes (bool)
If the object path is a well-known name and this parameter is false (default), resolve the well-known name to the unique name of its current owner and bind to that instead; if the ownership of the well-known name changes in future, keep communicating with the original owner. This is necessary if the D-Bus API used is stateful.

If the object path is a well-known name and this parameter is true, whenever the well-known name changes ownership in future, bind to the new owner, if any.

If the given object path is a unique name, this parameter has no effect.

https://dbus.freedesktop.org/doc/dbus-python/dbus.bus.html
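
To make the quoted parameter concrete, here is a minimal sketch of how it is passed to dbus-python's `get_object`. The bus name and object path are the standard NetworkManager ones; `get_nm_proxy` is a hypothetical helper for illustration and is not part of python-networkmanager. As far as I understand dbus-python, passing `follow_name_owner_changes=True` requires a main loop to be set up first, which is why the sketch installs `DBusGMainLoop` only in that case.

```python
NM_BUS_NAME = "org.freedesktop.NetworkManager"
NM_OBJECT_PATH = "/org/freedesktop/NetworkManager"


def get_nm_proxy(follow=False):
    """Return a proxy for the NetworkManager root object on the system bus.

    follow=False binds to the current unique owner of the well-known name.
    follow=True rebinds whenever the name changes owner; to my understanding
    dbus-python needs a main loop for that, so one is installed here.
    """
    # Imported lazily so the module can be loaded where dbus-python is absent.
    import dbus

    if follow:
        from dbus.mainloop.glib import DBusGMainLoop
        DBusGMainLoop(set_as_default=True)

    bus = dbus.SystemBus()
    return bus.get_object(
        NM_BUS_NAME,
        NM_OBJECT_PATH,
        follow_name_owner_changes=follow,
    )
```

Calling `get_nm_proxy(follow=False)` gives the default behaviour described in the first quoted paragraph; `get_nm_proxy(follow=True)` gives the behaviour described in the second.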

@maggie44

def dbus_import():
    from dbus.mainloop.glib import DBusGMainLoop
    DBusGMainLoop(set_as_default=True)

def my_script():
    dbus_import()

    #My NetworkManager command here

Or:

from dbus.mainloop.glib import DBusGMainLoop
import importlib
import sys

def dbus_import():
    for k,v in sys.modules.items():
        if k.startswith('DBusGMainLoop'):
            importlib.reload(v)
    DBusGMainLoop(set_as_default=True)

def my_script():
    dbus_import()

    #My NetworkManager command here

So none of these options worked. It has been a bit of a pain to test, as it takes 24 hours each time to reproduce the error, but I finally have something worth reporting.

I ran python-networkmanager 2.1 and it too produced the same error after 24 hours, but only when I included a manual import of DBus:

    from dbus.mainloop.glib import DBusGMainLoop
    DBusGMainLoop(set_as_default=True)

So I then ran version 2.2 again without the manual import of the DBus mainloop (which required setting the follow_name_owner_changes back to False #85) and it then worked ok again, no timeouts after 24 hours.

So it seems it is not the follow_name_owner_changes boolean that is causing the issue, but manually importing the DBus mainloop that trips something up in the python-networkmanager script.

@seveas could we merge #85 or #90 (maybe as a 2.2.1?) so we can get python-networkmanager back up and running?

@onsolutionjames

@Maggie0002 @atroche Did you ever find a solution for this? We had about 35 Raspberry Pis running python-networkmanager all start returning the same error after about 23 days of uptime, so I'm guessing it's probably the same issue you were having.

The error we're receiving is:

org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

@maggie44

Sounds like it might be a different issue. I have rolled back to 2.1 while waiting for one of the pull requests with the fix to be merged.

@Sherry112

Also facing the same issue as @onsolutionjames after an uptime of 2 weeks.

@seveas
Owner

seveas commented Jan 27, 2023

Closing all PRs and issues prior to archiving this repository.

@seveas seveas closed this as completed Jan 27, 2023