Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTBR with SkyConnect non-functional with error:InvalidArgs, quits unexpectedly #3447

Closed
therealmrfox opened this issue Feb 4, 2024 · 12 comments

Comments

@therealmrfox
Copy link

Describe the issue you are experiencing

OTBR configured with SkyConnect in Thread-only mode never gets into a functional state.

Installed OTBR add-on, Thread and OTBR integrations were added to HA, and homeassistant.local shows up as the preferred (and only) network in the Thread integration. When attempting to commission a device (Android companion app) the process fails with “Your device requires a Thread border router”.

I suspect this is because the OTBR add-on fails to properly start. The log is full of errors such as

otbr-agent[187]: 00:00:33.158 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[187]: 00:00:33.163 [W] Platform------: [netif] ADD [U] fd57:9fdc:d898:2a9:0:ff:fe00:fc00 failed 

The add-on (perhaps not directly related to the first issue?) also after a few hours quits with the following error:

otbr-agent[187]: 06:47:36.251 [W] P-RadioSpinel-: radio tx timeout
otbr-agent[187]: 06:47:36.294 [C] Platform------: HandleRcpTimeout() at radio_spinel.cpp:2076: RadioSpinelNoResponse
[03:52:00] WARNING: otbr-agent exited with code 6 (by signal 0).

Both issues are consistently reproducible. I started with a clean slate (SkyConnect fresh out of the box, never had zigbee or multiprotocol installed). Rebooted a gogo, Deleted and reinstalled OTBR add-on, thread integration, and OTBR integration several times with the same outcome.

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

OpenThread Border Router

What is the version of the add-on?

2.4.5

Steps to reproduce the issue

  1. Connect a SkyConnect to HA
  2. Ignore the autodiscovered device and don't install ZHA
  3. Install OTBR add-on, and select SkyConnect from 1. as its device
  4. Add-on starts with errors above.
  5. Install Matter add-on.
  6. Attempt to commission a Thread device with the Android app (stock Android on Pixel 7, latest companion app). Commissioning Matter devices fails with "Your device requires a Thread border router".
  7. Add-on dies with error message about radio timeout above after O(hours)

System Health information

System Information

version core-2024.1.6
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.6
os_name Linux
os_version 6.1.63-haos-raspi
arch aarch64
timezone America/Los_Angeles
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 5000
Installed Version 1.34.0
Stage running
Available Repositories 1392
Downloaded Repositories 5
HACS Data ok
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 11.4
update_channel stable
supervisor_version supervisor-2023.12.1
agent_version 1.6.0
docker_version 24.0.7
disk_total 56.3 GB
disk_used 11.8 GB
healthy true
supported true
board rpi3-64
supervisor_api ok
version_api ok
installed_addons ESPHome (2023.12.9), Z-Wave JS (0.4.3), Advanced SSH & Web Terminal (17.0.5), Duck DNS (1.15.0), Samba share (12.2.0), File editor (5.7.0), OpenThread Border Router (2.4.5), Matter Server (5.0.4)
Dashboards
dashboards 2
resources 5
views 7
mode storage
Recorder
oldest_recorder_run January 27, 2024 at 01:42
current_recorder_run February 2, 2024 at 16:53
estimated_db_size 113.04 MiB
database_engine sqlite
database_version 3.41.2

Anything in the Supervisor logs that might be useful for us?

Nothing useful I can find.

Anything in the add-on logs that might be useful for us?

[21:04:22] INFO: Setup OTBR firewall...
[21:04:23] INFO: Starting otbr-agent...
otbr-agent[187]: [NOTE]-AGENT---: Running 0.3.0-13d583e-dirty
otbr-agent[187]: [NOTE]-AGENT---: Thread version: 1.3.0
otbr-agent[187]: [NOTE]-AGENT---: Thread interface: wpan0
otbr-agent[187]: [NOTE]-AGENT---: Radio URL: spinel+hdlc+uart:///dev/ttyUSB0?uart-baudrate=460800&uart-flow-control
otbr-agent[187]: [NOTE]-AGENT---: Radio URL: trel://eth0
otbr-agent[187]: [NOTE]-ILS-----: Infra link selected: eth0
otbr-agent[187]: 00:00:00.217 [N] RoutingManager: BR ULA prefix: fdae:7e36:9ab9::/48 (loaded)
otbr-agent[187]: 00:00:00.218 [N] RoutingManager: Local on-link prefix: fd1a:25ec:6888:2945::/64
otbr-agent[187]: 00:00:00.435 [N] Mle-----------: Role disabled -> detached
otbr-agent[187]: 00:00:00.578 [N] Platform------: [netif] Changing interface state to up.
s6-rc: info: service otbr-agent successfully started
s6-rc: info: service otbr-agent-rest-discovery: starting
s6-rc: info: service otbr-agent-configure: starting
otbr-agent[187]: 00:00:00.653 [W] Platform------: [netif] Failed to process request#2: No such process
otbr-agent[187]: 00:00:00.655 [W] Platform------: [netif] ADD [U] fe80:0:0:0:c85d:60d3:78db:8513 failed (InvalidArgs)
otbr-agent[187]: 00:00:00.655 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[187]: 00:00:00.657 [W] Platform------: [netif] ADD [U] fd57:9fdc:d898:2a9:206b:39bb:cca:f9 failed (InvalidArgs)
otbr-agent[187]: 00:00:00.657 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[187]: 00:00:00.658 [W] Platform------: [netif] ADD [U] fd57:9fdc:d898:2a9:0:ff:fe00:800 failed (InvalidArgs)
otbr-agent[187]: 00:00:00.658 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[187]: 00:00:00.659 [W] Platform------: [netif] Failed to process request#6: No such process
s6-rc: info: service otbr-web: starting
s6-rc: info: service otbr-web successfully started
[21:04:24] INFO: Starting otbr-web...
otbr-web[275]: [INFO]-WEB-----: Running 0.3.0-13d583e-dirty
otbr-web[275]: [INFO]-WEB-----: Border router web started on wpan0
Done
s6-rc: info: service otbr-agent-configure successfully started
[21:04:29] INFO: Successfully sent discovery information to Home Assistant.
s6-rc: info: service otbr-agent-rest-discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
otbr-agent[187]: 00:00:26.155 [N] Mle-----------: RLOC16 0800 -> fffe
otbr-agent[187]: 00:00:26.368 [N] Mle-----------: Attach attempt 1, AnyPartition reattaching with Active Dataset
otbr-agent[187]: 00:00:32.881 [N] RouterTable---: Allocate router id 2
otbr-agent[187]: 00:00:32.882 [N] Mle-----------: RLOC16 fffe -> 0800
otbr-agent[187]: 00:00:32.887 [N] Mle-----------: Role detached -> leader
otbr-agent[187]: 00:00:32.889 [N] Mle-----------: Partition ID 0xc6ae259
otbr-agent[187]: 00:00:33.619 [W] Platform------: [netif] ADD [U] fd57:9fdc:d898:2a9:0:ff:fe00:800 failed (InvalidArgs)
otbr-agent[187]: 00:00:33.620 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[187]: 00:00:33.624 [W] Platform------: [netif] ADD [U] fd57:9fdc:d898:2a9:0:ff:fe00:fc00 failed (InvalidArgs)
otbr-agent[187]: 00:00:33.625 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[187]: [NOTE]-BBA-----: BackboneAgent: Backbone Router becomes Primary!
otbr-agent[187]: 00:00:33.996 [W] Platform------: [netif] ADD [U] fd57:9fdc:d898:2a9:0:ff:fe00:fc38 failed (InvalidArgs)
otbr-agent[187]: 00:00:33.997 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[187]: 00:00:34.013 [W] Platform------: [netif] ADD [U] fd57:9fdc:d898:2a9:0:ff:fe00:fc10 failed (InvalidArgs)
otbr-agent[187]: 00:00:34.013 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[187]: 00:00:35.217 [W] Platform------: [netif] ADD [U] fdae:7e36:9ab9:1:f208:64f9:ff80:8b5e failed (InvalidArgs)
otbr-agent[187]: 00:00:35.217 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[187]: 00:00:44.975 [W] Platform------: [netif] ADD [U] fd57:9fdc:d898:2a9:0:ff:fe00:fc11 failed (InvalidArgs)
otbr-agent[187]: 00:00:44.975 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[187]: 06:47:36.251 [W] P-RadioSpinel-: radio tx timeout
otbr-agent[187]: 06:47:36.294 [C] Platform------: HandleRcpTimeout() at radio_spinel.cpp:2076: RadioSpinelNoResponse
[03:52:00] WARNING: otbr-agent exited with code 6 (by signal 0).
OTBR_FORWARD_INGRESS  all opt    in * out wpan0  ::/0  -> ::/0  
Chain OTBR_FORWARD_INGRESS (0 references)
target     prot opt source               destination         
DROP       all      anywhere             anywhere             PKTTYPE = unicast
DROP       all      anywhere             anywhere             match-set otbr-ingress-deny-src src
ACCEPT     all      anywhere             anywhere             match-set otbr-ingress-allow-dst dst
DROP       all      anywhere             anywhere             PKTTYPE = unicast
ACCEPT     all      anywhere             anywhere            
otbr-ingress-deny-src
otbr-ingress-deny-src-swap
otbr-ingress-allow-dst
otbr-ingress-allow-dst-swap
OTBR_FORWARD_EGRESS  all opt    in wpan0 out *  ::/0  -> ::/0  
Chain OTBR_FORWARD_EGRESS (0 references)
target     prot opt source               destination         
ACCEPT     all      anywhere             anywhere            
[03:52:01] INFO: OTBR firewall teardown completed.
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service otbr-agent-rest-discovery: stopping
s6-rc: info: service otbr-agent-configure: stopping
s6-rc: info: service otbr-web: stopping
s6-rc: info: service mdns: stopping
s6-rc: info: service otbr-agent-rest-discovery successfully stopped
s6-rc: info: service otbr-agent-configure successfully stopped
Default: mDNSResponder (Engineering Build) (Jan 29 2024 09:49:17) stopping
otbr-web[275]: [CRIT]-WEB-----: Stopping web server
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
listenAddr not specified, using default ::
[03:52:01] INFO: mDNS ended with exit code 4 (signal 0)...
[03:52:01] INFO: otbr-web ended with exit code 0 (signal 0)...
s6-rc: info: service mdns successfully stopped
s6-rc: info: service otbr-web successfully stopped
s6-rc: info: service otbr-agent: stopping
s6-rc: info: service otbr-agent successfully stopped
s6-rc: info: service universal-silabs-flasher: stopping
s6-rc: info: service universal-silabs-flasher successfully stopped
s6-rc: info: service banner: stopping
s6-rc: info: service banner successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped

Additional information

HAOS is running on an RPi 3. IPv6 is enabled (SLAAC) and working fine.

Similar issues:

Related forum discussions:

@marbon87
Copy link

marbon87 commented Feb 5, 2024

Same here.

@Oeroeboeroe15
Copy link

Same issue here, same errors, and not able to commission a Eve Smart Plug with Thread support.

@Jordenjj
Copy link

Jordenjj commented Feb 6, 2024

I've found anything above 2.4.3 drops the whole system. I'd downgrade for now.

@marbon87
Copy link

marbon87 commented Feb 6, 2024

How can I downgrade the addon to a specific version?

@Callum6677
Copy link

Can reproduce still:
Add-on version: 2.4.5
You are running the latest version of this add-on.
System: Home Assistant OS 11.5 (aarch64 / yellow)
Home Assistant Core: 2024.2.1
Home Assistant Supervisor: 2024.01.1

otbr-agent[181]: 00:00:07.822 [N] RouterTable---: Allocate router id 19
otbr-agent[181]: 00:00:07.823 [N] Mle-----------: RLOC16 fffe -> 4c00
otbr-agent[181]: 00:00:07.825 [N] Mle-----------: Role detached -> leader
otbr-agent[181]: 00:00:07.825 [N] Mle-----------: Partition ID 0x1fe38cda
otbr-agent[181]: 00:00:07.911 [W] Platform------: [netif] ADD [U] fdd7:62c5:da3c:e937:0:ff:fe00:4c00 failed (InvalidArgs)
otbr-agent[181]: 00:00:07.912 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[181]: 00:00:07.914 [W] Platform------: [netif] ADD [U] fdd7:62c5:da3c:e937:0:ff:fe00:fc00 failed (InvalidArgs)
otbr-agent[181]: 00:00:07.915 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[181]: [NOTE]-BBA-----: BackboneAgent: Backbone Router becomes Primary!
otbr-agent[181]: 00:00:08.183 [W] Platform------: [netif] ADD [U] fdd7:62c5:da3c:e937:0:ff:fe00:fc38 failed (InvalidArgs)
otbr-agent[181]: 00:00:08.184 [W] Platform------: [netif] Failed to process event, error:InvalidArgs
otbr-agent[181]: 00:00:08.189 [W] Platform------: [netif] ADD [U] fdd7:62c5:da3c:e937:0:ff:fe00:fc10 failed (InvalidArgs)
otbr-agent[181]: 00:00:08.189 [W] Platform------: [netif] Failed to process event, error:InvalidArgs

@agners
Copy link
Member

agners commented Feb 16, 2024

The InvalidArgs errors are not per se fatal. I have those too in working environments:

otbr-agent[188]: 00:00:00.162 [W] Platform------: [netif] Failed to process event, error:InvalidArgs

However, the radio tx timeout is though:

otbr-agent[187]: 06:47:36.251 [W] P-RadioSpinel-: radio tx timeout
otbr-agent[187]: 06:47:36.294 [C] Platform------: HandleRcpTimeout() at radio_spinel.cpp:2076: RadioSpinelNoResponse
[03:52:00] WARNING: otbr-agent exited with code 6 (by signal 0).

@therealmrfox thee latest OTBR add-on 2.4.6 should properly shutdown. If you enable the watchdog functionality, this will restart the add-on automatically. Now of course this is more of a work around, but might help in your case.

Do the radio tx timeout messages align with commissioning attempts?

When attempting to commission a device (Android companion app) the process fails with “Your device requires a Thread border router”.

You probably have to sync the Thread credentials first. See the our integration documentation of Turning Home Assistant into a Thread border router (specifically Case 1: Making Home Assistant your first Thread network).

@agners
Copy link
Member

agners commented Feb 16, 2024

@marbon87 only through backups. Normally when upgrading a partial backup is being made. You can restore that particular backup.

@Jordenjj do you have the same radio tx timeout errors?

@Callum6677 the InvalidArgs are not necessarily fatal. What problems do you have in the end?

@rolf-tx
Copy link

rolf-tx commented Feb 16, 2024

@agners
Hi Stefan,
The instructions for syncing credentials in "case 1" above do not seem to work/apply to iOS.
"In the Companion app, go to Settings > Companion app > Troubleshooting, then select Sync Thread credentials.'
I can't find a Troubleshooting entry under Companion app.

I tried iOS app versions 2024.1 (2024.535) and 2024.2 (2024.545).

@agners
Copy link
Member

agners commented Feb 16, 2024

@rolf-tx transfering the Thread credentials into iOS is not yet supported, but should soon follow. Currently you can only join an existing Thread network, e.g. when you have an Apple Thread Border router already.

@therealmrfox
Copy link
Author

You probably have to sync the Thread credentials first.

Well do I feel like an idiot. Sure enough - this was my problem. I completely missed this step. Happy to report that I can now commission Matter devices!

It does seem odd that these InvalidArgs errors are expected, though 🤷

Do the radio tx timeout messages align with commissioning attempts?

No, it happens randomly after a couple of hours. Haven't observed it with 2.4.6 yet, will report here if I do.

enable the watchdog functionality, this will restart the add-on automatically.

Restarting OTBR seems to make some - though strangely, not all - Matter devices unresponsive without also restarting the Matter server. In particular an Eve motion sensor becomes inoperable after a OTBR restart; Aqara door sensors don't mind. Could probably set up some automation to restart Matter as a band aid.

Thank you @agners for looking into this - very much appreciated!

@Callum6677
Copy link

@Callum6677 the InvalidArgs are not necessarily fatal. What problems do you have in the end?
A wall of InvalidArgs being spammed for approximately 8 hours, once every 10-30 seconds. At the 8 hour mark, the integration crashing. Any attempt to pair a matter device (having synced credentials already) returned that there wasnt a thread border router.
Multiple uninstall and reinstalls later and somehow its working, but I have had the border router crash twice now, first time I blindly clicked to start the service again without checking logs, second time, the log just ended, no error, nothing, just end of the log and no service running.

Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 19, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants