KNX integration stops working #47735

Closed
0lek opened this issue Mar 10, 2021 · 13 comments · Fixed by XKNX/xknx#636 or #48350

Comments

0lek commented Mar 10, 2021

The problem

The KNX integration randomly stops working. Pressing e.g. a switch first changes the state of the switch, but it then returns to its initial position. The load (lamp) doesn’t change, so the bus does not receive the information. I cannot find anything of interest in the logs. It’s one of those “it just doesn’t work” errors. A restart of HA (but not a reload of KNX) fixes the issue for another period (periods are random, e.g. a couple of hours or days).
Maybe unrelated: HA does not pick up new KNX entities when reloading the KNX stack, only after a restart of HA. This used to work, but doesn’t now.

What version of Home Assistant Core has the issue?

2021.3

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Core

Integration causing the issue

KNX

Link to integration documentation on our website

https://www.home-assistant.io/integrations/knx/

Example YAML snippet

# Put your YAML below this line

Anything in the logs that might be useful for us?

# Put your logs below this line

@probot-home-assistant

knx documentation
knx source
(message by IssueLinks)

@probot-home-assistant

Hey there @Julius2342, @farmio, @marvin-w, mind taking a look at this issue as it's been labeled with an integration (knx) you are listed as a codeowner for? Thanks!
(message by CodeOwnersMention)

Contributor

farmio commented Mar 10, 2021

Hi!
You may set the log level of the xknx loggers to debug to find out what is happening. I'd suggest starting with .log and .telegram. If you don't find anything, remove .telegram and enable the last two.

logger:
  default: warning
  logs:
    # From low to high verbosity.
    xknx.log: debug
    # xknx.telegram: debug
    # xknx.knx: debug
    # xknx.raw_socket: debug

Also, please add your knx configuration and some information about your installation (interface type etc.).

Author

0lek commented Mar 16, 2021

@farmio Thanks, I enabled telegram and knx. It just stopped working. PFA the last 1000 lines of logs. I honestly cannot find anything interesting. Also including raw_socket did not really help. I see that group addresses can't be synced, but this is rather a symptom...
logs.txt

As for the installation, see below for the config (indent might be off, as I copied the different yamls to notepad; but it all works)
knx.txt

I have a lot of devices from different manufacturers: Lingg&Janke, ABB, Merten, MDT. Not sure how much info you need there?

The interface is based on the Timberwolf Server 950Q (https://shop.elabnet.de/timberwolfserver/hutschiene-reg/timberwolf-server-950q-16gb-microsd_827_2265) - it has a certified KNX IP interface. I connect via LAN / Ethernet.
Pretty sure that the server is not the problem, because:

  1. It worked rock-solid for a couple of weeks before breaking down
  2. I actually have a neighbor with the exact same symptoms; I set up a hass instance for him. However, he has a pretty standard Gira interface

Let me know what kind of additional info you need?

Thanks!

Contributor

farmio commented Mar 16, 2021

Hm... this is odd. I don't see a knx tunnel reconnect or anything else knx related other than a slow GroupValueRead (which shouldn't cause any trouble). Up to line 1000, every incoming telegram gets its Ack and this seems fine.
When taking a closer look I realized that there is not a single outgoing telegram in the xknx.knx logger except for the Acks. There have to be at least the GroupValueReads...
You have a lot of sync_state: "every 5" sensors (94). These should all generate telegrams in these 7 minutes of logs.
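
One way to check this expectation against a log is to count telegrams per direction. This is only a minimal sketch: it assumes the xknx.telegram logger writes entries of the form `[xknx.telegram] <Telegram direction="..."` (as the incoming entries quoted in this thread do), that outgoing telegrams are logged with `direction="Outgoing"`, and that "every 5" means an interval of 5 minutes. The function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed log format: ... [xknx.telegram] <Telegram direction="Incoming" ...
DIRECTION_RE = re.compile(r'\[xknx\.telegram\] <Telegram direction="(\w+)"')


def count_directions(lines):
    """Count xknx.telegram log entries per direction."""
    counts = Counter()
    for line in lines:
        match = DIRECTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def min_expected_reads(entities, interval_min, window_min):
    """Lower bound of state reads if every entity syncs once per interval."""
    return entities * (window_min // interval_min)


if __name__ == "__main__":
    # 94 entities with sync_state "every 5", 7 minutes of logs
    print(min_expected_reads(94, 5, 7))
```

If `count_directions` reports no outgoing entries at all for a window in which the lower bound is well above zero, that supports the suspicion that nothing is leaving the queue.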

Did you leave xknx.log on debug? (just out of interest)

Something seems to block the telegram_queue here... 🧐 When this error occurs, does HA still update its states when triggered from the bus (e.g. turning on a light from a light switch)? So is it only outbound communication that is not working?
Do you have the part of the log where the last outbound telegram is seen? You can search for the term
Sending: <KNXIPFrame <KNXIPHeader HeaderLength="6" ProtocolVersion="16" KNXIPServiceType="TUNNELLING_REQUEST"
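
For a large log file, the search can also be scripted. A minimal sketch, assuming each log entry sits on one line and contains the search term above; the function name and the file name in the usage note are hypothetical.

```python
def last_outbound_request(lines):
    """Return (line_number, line) of the last outgoing tunnelling request, or None."""
    marker = "Sending: <KNXIPFrame"
    service = 'KNXIPServiceType="TUNNELLING_REQUEST"'
    last = None
    for number, line in enumerate(lines, start=1):
        # Both parts of the search term have to appear in the same entry.
        if marker in line and service in line:
            last = (number, line.rstrip("\n"))
    return last
```

Usage could look like `with open("home-assistant.log") as f: print(last_outbound_request(f))`, where the path is whatever log file you attached. Everything after the returned line number is the interesting part.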

Author

0lek commented Mar 17, 2021

I agree it's odd :)

The logs are currently:

xknx.telegram: debug
xknx.knx: debug
xknx.raw_socket: debug

Good idea with checking if state works when triggering from the bus. Will do that. The only issue is that I need to wait until HA "breaks" :) I have no way of triggering this issue currently. I will make sure to include the above search term in the log once it breaks again.

BTW, any pointers on the other issue? HA not picking up new entities with "reload KNX"? It's a super pain to restart HA every time I add a switch :)

Contributor

farmio commented Mar 17, 2021

No, the reload issue is a known one, but no developer has been able to reproduce it so far (same for your main issue, unfortunately). See #45129

You could add xknx.state_updater: debug, but this is just a wild guess...

How long does it take until it breaks after a restart?

Author

0lek commented Mar 17, 2021

Hey. OK, pity about the reload. It's one of those stupid errors where it doesn't work, but doesn't throw any error message or anything... the worst to debug :)

It broke just now. The last time was 2 days ago, so I guess that's more or less the interval. That seems consistent with my gut feeling ("a couple of times a week").

Attached are the last 10k lines (tail -n). There's a sending command in there as well. I did not add state_updater to the logs yet, as I didn't see the comment before it broke.
20210317_ha.log

Oh, and BTW: maybe 8-10 minutes before writing this post, I tried a couple of times to toggle a light via ETS. Very concretely, this one:

- name: "knx_light_P5_sufit"
  address: "1/1/25"
  state_address: "1/2/25"

HA did not update the state. However, the timeframe of me switching it manually via ETS is covered in the logs attached.

Thanks!
Aleksander

Contributor

farmio commented Mar 17, 2021

2021-03-17 13:22:44 DEBUG (MainThread) [xknx.telegram] <Telegram direction="Incoming" source_address="1.2.203" destination_address="1.2.61" payload="<DeviceDescriptorRead descriptor="0" />" />

So this is the last xknx.telegram log entry. It seems like something is sending a DeviceDescriptorRead telegram and this is causing the telegram queue to stall.
I think ETS line scans use these, but as far as I remember I tested them... I will have a look when there is some time.
Maybe you can try to provoke the bug from ETS diagnostics. I'm not sure whether different tunneling servers handle these the same way, so it may cause a bug in your installation while mine works fine...
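
As a generic illustration of how a single telegram can stall a queue: if the task that consumes the queue dies on an item it cannot handle, everything queued after it is never processed. The sketch below uses plain asyncio and made-up names; it is not the xknx code, and whether this is the mechanism at work here is an assumption.

```python
import asyncio


def process(item):
    """Stand-in for telegram handling; one item type is not supported."""
    if item == "unsupported":
        raise ValueError(f"cannot handle {item!r}")
    return item


async def fragile_consumer(queue, handled):
    # An exception escapes the loop and ends the task: the queue stalls.
    while True:
        item = await queue.get()
        handled.append(process(item))
        queue.task_done()


async def robust_consumer(queue, handled):
    # The bad item is skipped and the loop keeps running.
    while True:
        item = await queue.get()
        try:
            handled.append(process(item))
        except ValueError:
            pass
        finally:
            queue.task_done()


async def run(consumer):
    queue = asyncio.Queue()
    handled = []
    for item in ("a", "unsupported", "b"):
        queue.put_nowait(item)
    task = asyncio.create_task(consumer(queue, handled))
    await asyncio.sleep(0.01)  # let the consumer work through the queue
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return handled


if __name__ == "__main__":
    print(asyncio.run(run(fragile_consumer)))
    print(asyncio.run(run(robust_consumer)))
```

With the fragile consumer, "b" is never handled although it is still sitting in the queue, which would match a log in which telegrams simply stop appearing without any reconnect.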

What devices are 1.2.203 (the source address) and 1.2.61 (destination address)? I think your tunnel endpoint is 1.2.202 so this is not even addressed to xknx.
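
To pull the addresses out of such entries in bulk, a small parser is enough. A minimal sketch, assuming the log format shown in the entry above; the regular expression and function names are made up for illustration, and 1.2.202 is only my guess at the tunnel endpoint.

```python
import re

# Matches the attributes as they appear in the xknx.telegram entry above.
TELEGRAM_RE = re.compile(
    r'source_address="(?P<src>[\d.]+)" '
    r'destination_address="(?P<dst>[\d./]+)" '
    r'payload="<(?P<payload>\w+)'
)


def parse_telegram(line):
    """Return source, destination and payload type of a log entry, or None."""
    match = TELEGRAM_RE.search(line)
    return match.groupdict() if match else None


def is_addressed_to(telegram, own_address):
    """True if the telegram's destination equals our own individual address."""
    return telegram["dst"] == own_address
```

For the entry above, `parse_telegram` yields source 1.2.203, destination 1.2.61 and payload type DeviceDescriptorRead, and `is_addressed_to(telegram, "1.2.202")` is false.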

Author

0lek commented Mar 18, 2021

Hey @farmio. The above-mentioned log entry is from 13:22:44. At exactly that time I was programming the device in question (1.2.61, a presence detector) from ETS. 1.2.203 is one of the tunnels of the Timberwolf Server. The TWS has 25 tunnels, so it has 1.2.200-1.2.225
ets

I just did the test: HA works. I program another device using ETS. HA stops working. Attached are another 10k lines of log. The programming happened around 1:00 am. Not sure if you need it (as I see that you have a pull request), but just in case.

Hope that fixes it. Thanks and good night :)
Uploading 20210318_ha.log…

Contributor

farmio commented Mar 18, 2021

I think it's fixed once the PR gets merged and HA gets a dependency update. So I guess with the next release. Thank you for pointing out this bug and providing all these logs!

@cyberjunky
Contributor

I love it when people don't give up and push to get something fixed. It seems the number of HA KNX users is increasing too. Thanks all!

Author

0lek commented Mar 18, 2021

Well you know the ultimate motivator was WAF... :D Telling her "Oh if it doesn't turn off the light you just need to press the restart button, wait 2 minutes until it comes up again and then turn it off" wasn't really the best option :D

Thanks all!

@farmio farmio mentioned this issue Mar 26, 2021
21 tasks
@github-actions github-actions bot locked and limited conversation to collaborators Apr 25, 2021