Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTherm gateway: unhandled error #22074

Closed
andriej opened this issue Mar 15, 2019 · 8 comments
Closed

OpenTherm gateway: unhandled error #22074

andriej opened this issue Mar 15, 2019 · 8 comments

Comments

@andriej
Copy link
Contributor

andriej commented Mar 15, 2019

Home Assistant release with the issue: 0.89.2 with pyotgw 0.4b2
Last working Home Assistant release (if known): n/a

Operating environment (Hass.io/Docker/Windows/etc.): Linux / venv / python 3.7.2
Component/platform: opentherm gateway

Description of problem:
Error shown in log, component failed to work further - reported values stayed as for the moment of error (so I can see boiler working non-stop even tho it's off since at least 12h)

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):
not relevant, socket to nodemcu with espeasy as serial server

Traceback (if applicable):

Unhandled error in exception handler
context: {'message': 'Task exception was never retrieved', 'exception': RecursionError('maximum recursion depth exceeded while calling a Python object'), 'future': <Task finished coro=<protocol._watchdog() done, defined at /srv/homeassistant/lib/python3.7/site-packages/pyotgw/protocol.py:120> exception=RecursionError('maximum recursion depth exceeded while calling a Python object')>}
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 1643, in call_exception_handler
    self._exception_handler(self, context)
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/core.py", line 110, in async_loop_exception_handler
    "Error doing job: %s", context['message'], **kwargs)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1412, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1519, in _log
    self.handle(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1529, in handle
    self.callHandlers(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1591, in callHandlers
    hdlr.handle(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 905, in handle
    self.emit(record)
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/system_log/__init__.py", line 168, in emit
    _figure_out_source(record, stack, self.hass))
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/system_log/__init__.py", line 98, in __init__
    traceback.format_exception(*record.exc_info))
  File "/usr/local/lib/python3.7/traceback.py", line 121, in format_exception
    type(value), value, tb, limit=limit).format(chain=chain))
  File "/usr/local/lib/python3.7/traceback.py", line 497, in __init__
    _seen=_seen)
  File "/usr/local/lib/python3.7/traceback.py", line 497, in __init__
    _seen=_seen)
  File "/usr/local/lib/python3.7/traceback.py", line 497, in __init__
    _seen=_seen)
  [Previous line repeated 486 more times]
  File "/usr/local/lib/python3.7/traceback.py", line 473, in __init__
    _seen.add(id(exc_value))
RecursionError: maximum recursion depth exceeded while calling a Python object

Additional information:
pyotgw is from latest version which should have 'reconnection issue' fixed - 0.4b2
@mvn23

@andriej
Copy link
Contributor Author

andriej commented Mar 15, 2019

It could be fault of some connection issue with the gateway being unavailable, but right now even restarting whole HA cause it's being unavailable:
2019-03-15 14:50:37 ERROR (MainThread) [pyotgw.pyotgw] Could not connect to serial device on socket://192.168.x.x:xxxx. Will keep trying. Reported error was: Could not open port socket://192.168.x.x:xxxx: [Errno 113] No route to host

@mvn23
Copy link
Contributor

mvn23 commented Mar 15, 2019

Did you have a lot of disconnect errors between HA start and this problem occuring?

@andriej
Copy link
Contributor Author

andriej commented Mar 15, 2019

Yes - sort of. Before the fix some ofthe disconnection happened to freeze HA totally. After the fix they seem to get receonnect and work properly, but still it's nodemcu serial server, so WiFi and possible connection dropouts.

@mvn23
Copy link
Contributor

mvn23 commented Mar 15, 2019

Trying to reproduce it here by stressing the watchdog/reconnect logic, but so far without success. Any further details about the connection drops (frequency, duration) and/or a debug log could be helpful.

@andriej
Copy link
Contributor Author

andriej commented Mar 15, 2019

The thing is that when that error occured - probably the nodemcu otgw itself somehow hanged/lost connection. But what's worse, HA didn't success to start until that nodemcu got hard-restarted (power off/power on).
So basically when it occured, HA waited when startup until the restart - without that I've tried to restart HA itself and no successs.

(totally in meanwhile I wait for ethernet module for OTGW to arrive then I'll do tests over LAN, but it's still few weeks before delivery tho...)

In logfile, just after the restart of otgw gateway - after this message HA finished to startup (after loooong wait from my side):

Mar 15 16:00:49 hass[14329]: 2019-03-15 16:00:49 WARNING (MainThread) [pyotgw.protocol] Command PS failed with Low power, retrying...
Mar 15 16:00:49 hass[14329]: 2019-03-15 16:00:49 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: Low power

@mvn23
Copy link
Contributor

mvn23 commented Mar 15, 2019

The thing is that when that error occured - probably the nodemcu otgw itself somehow hanged/lost connection. But what's worse, HA didn't success to start until that nodemcu got hard-restarted (power off/power on).

I see the same behavior indeed. If the gateway is unreachable when HA starts, HA's interface will not be loaded until the connection to the gateway is made. During this time, I do see other activity in the debug log so it's not completely blocked. It seems to be related to how the connecting routine is scheduled, looking into that now.

@andriej
Copy link
Contributor Author

andriej commented Mar 15, 2019

Great! I hope to help finding the bug and making component even better :-)

@andriej
Copy link
Contributor Author

andriej commented Mar 17, 2019

Seems fixed with #22121 and #22106.
Will test next release and reopen if still an issue.

@andriej andriej closed this as completed Mar 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants