Wake word & Satellite audio stream using seemingly random UDP high (> 1024) port #105106

Closed
morremeyer opened this issue Dec 5, 2023 · 20 comments

Comments

@morremeyer

morremeyer commented Dec 5, 2023

The problem

The wake word pipeline as described in Year of the Voice - Chapter 4 connects from the satellite to Home Assistant using a seemingly random "high" (= number > 1024) port.

This leads to pipeline timeouts as described in e.g.:

The current workaround for all systems is "open the firewall on all UDP ports for Home Assistant".
This does not apply to Home Assistant OS, since it apparently does not firewall these ports (or opens them for this purpose).

However, this is not possible without massively compromising on security on some setups, e.g. the one I run with the docker container on Kubernetes. In that setup, I would need to run Home Assistant in "host mode", meaning that it effectively runs in the network space of the host.

Proposal

To enable the use of wake words in all supported Home Assistant installation methods, the UDP port used to transmit the voice stream should be fixed.
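
As a minimal sketch of what this proposal amounts to at the socket level (plain Python sockets, not Home Assistant's actual code; `FIXED_AUDIO_PORT` and `open_audio_socket` are hypothetical names): binding to port 0 lets the operating system pick an ephemeral port, which would match the changing ports seen in the capture, while binding to a known number gives a port that can be forwarded or allowed explicitly.

```python
import socket

# Hypothetical value for illustration; not a port Home Assistant defines.
FIXED_AUDIO_PORT = 16055


def open_audio_socket(port: int = 0, host: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket for incoming audio.

    port=0 asks the OS for an ephemeral port (different on every call);
    any other value pins the listener to that exact port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


# OS-assigned port: the number is not known in advance.
ephemeral = open_audio_socket()
print("ephemeral port:", ephemeral.getsockname()[1])
ephemeral.close()
```

Calling `open_audio_socket(FIXED_AUDIO_PORT)` instead would always listen on the same port, so a single firewall rule or container port mapping would be enough.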

What version of Home Assistant Core has the issue?

core-2023.11.3

What was the last working version of Home Assistant Core?

never worked

What type of installation are you running?

Home Assistant Container

Integration causing the issue

wyoming

Link to integration documentation on our website

https://www.home-assistant.io/integrations/wyoming/

Diagnostics information

Check these tcpdump lines to see that an M5 Atom Echo configured according to the $13 Voice Assistant tutorial connects on seemingly random high UDP ports.

The M5 Atom Echo is currently configured to not use the Wake Word, but the button.

Pressing the button for the first time leads to the light turning blue (= listening) and the connections in the log on port 60252:

22:03:54.982184 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024

Pressing it again deactivates the microphone. Then pressing a third time leads to it listening again and connections on port 55427:

22:04:04.558757 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024

Full log in the logs section below.

Example YAML snippet

No response

Anything in the logs that might be useful for us?

22:03:54.725859 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:54.758182 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:54.789796 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:54.793407 IP m5stack-atom-echo-0f9dd0.fritz.box.6053 > redacted-host-name.mor.re.16736: Flags [.], seq 108:118, ack 18, win 5344, length 10
22:03:54.793632 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [.], ack 118, win 65535, length 0
22:03:54.822011 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:54.854055 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:54.885818 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:54.919433 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:54.949970 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:54.950083 IP redacted-host-name.mor.re > m5stack-atom-echo-0f9dd0.fritz.box: ICMP redacted-host-name.mor.re udp port 60252 unreachable, length 556
22:03:54.982184 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:55.013683 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:55.045711 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:55.079668 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:55.109903 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:55.141867 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:55.160575 IP m5stack-atom-echo-0f9dd0.fritz.box.6053 > redacted-host-name.mor.re.16736: Flags [.], seq 118:126, ack 18, win 5344, length 8
22:03:55.160788 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [.], ack 126, win 65535, length 0
22:03:55.174065 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:55.205864 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024
22:03:55.224046 IP m5stack-atom-echo-0f9dd0.fritz.box.6053 > redacted-host-name.mor.re.16736: Flags [.], seq 126:131, ack 18, win 5344, length 5
22:03:55.224227 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [.], ack 131, win 65535, length 0
22:03:55.235352 IP m5stack-atom-echo-0f9dd0.fritz.box.6053 > redacted-host-name.mor.re.16736: Flags [.], seq 131:177, ack 18, win 5344, length 46
22:03:55.235770 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [.], ack 177, win 65535, length 0
22:04:03.797933 IP m5stack-atom-echo-0f9dd0.fritz.box.6053 > redacted-host-name.mor.re.16736: Flags [.], seq 177:187, ack 18, win 5344, length 10
22:04:03.798144 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [.], ack 187, win 65535, length 0
22:04:04.145943 IP m5stack-atom-echo-0f9dd0.fritz.box.6053 > redacted-host-name.mor.re.16736: Flags [.], seq 187:195, ack 18, win 5344, length 8
22:04:04.146150 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [.], ack 195, win 65535, length 0
22:04:04.207595 IP m5stack-atom-echo-0f9dd0.fritz.box.6053 > redacted-host-name.mor.re.16736: Flags [.], seq 195:213, ack 18, win 5344, length 18
22:04:04.207780 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [.], ack 213, win 65535, length 0
22:04:04.213074 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [P.], seq 18:25, ack 213, win 65535, length 7
22:04:04.213907 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [P.], seq 25:30, ack 213, win 65535, length 5
22:04:04.214685 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [P.], seq 30:35, ack 213, win 65535, length 5
22:04:04.215479 IP m5stack-atom-echo-0f9dd0.fritz.box.6053 > redacted-host-name.mor.re.16736: Flags [.], ack 30, win 5332, length 0
22:04:04.279902 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.279972 IP redacted-host-name.mor.re > m5stack-atom-echo-0f9dd0.fritz.box: ICMP redacted-host-name.mor.re udp port 55427 unreachable, length 556
22:04:04.298488 IP m5stack-atom-echo-0f9dd0.fritz.box.6053 > redacted-host-name.mor.re.16736: Flags [.], seq 213:267, ack 35, win 5327, length 54
22:04:04.298641 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [.], ack 267, win 65535, length 0
22:04:04.307496 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.307553 IP redacted-host-name.mor.re > m5stack-atom-echo-0f9dd0.fritz.box: ICMP redacted-host-name.mor.re udp port 55427 unreachable, length 556
22:04:04.334908 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.335048 IP redacted-host-name.mor.re > m5stack-atom-echo-0f9dd0.fritz.box: ICMP redacted-host-name.mor.re udp port 55427 unreachable, length 556
22:04:04.366975 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.367143 IP redacted-host-name.mor.re > m5stack-atom-echo-0f9dd0.fritz.box: ICMP redacted-host-name.mor.re udp port 55427 unreachable, length 556
22:04:04.401671 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.401794 IP redacted-host-name.mor.re > m5stack-atom-echo-0f9dd0.fritz.box: ICMP redacted-host-name.mor.re udp port 55427 unreachable, length 556
22:04:04.430792 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.430930 IP redacted-host-name.mor.re > m5stack-atom-echo-0f9dd0.fritz.box: ICMP redacted-host-name.mor.re udp port 55427 unreachable, length 556
22:04:04.462812 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.495515 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.528875 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.558757 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.590977 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.622744 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
22:04:04.654825 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024
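
To check a capture like the one above without reading it line by line, the UDP datagrams can be tallied by destination port. This is a small sketch that assumes the default tcpdump text format shown in this issue; `udp_destination_ports` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches tcpdump lines such as:
# "22:03:54.982184 IP src-host.6055 > dst-host.60252: UDP, length 1024"
UDP_LINE = re.compile(r"IP \S+\.(\d+) > \S+\.(\d+): UDP, length \d+")


def udp_destination_ports(lines):
    """Count UDP datagrams per destination port; TCP and ICMP lines are skipped."""
    counts = Counter()
    for line in lines:
        match = UDP_LINE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


sample = [
    "22:03:54.982184 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.60252: UDP, length 1024",
    "22:04:04.558757 IP m5stack-atom-echo-0f9dd0.fritz.box.6055 > redacted-host-name.mor.re.55427: UDP, length 1024",
    "22:03:54.793632 IP redacted-host-name.mor.re.16736 > m5stack-atom-echo-0f9dd0.fritz.box.6053: Flags [.], ack 118, win 65535, length 0",
]
print(sorted(udp_destination_ports(sample)))  # [55427, 60252]
```

More than one destination port across button presses indicates that the listener port changes per session.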

Additional information

I am seriously impressed with all the progress voice integration in Home Assistant is making. It's amazing to see it coming along and I'm looking forward to the future with this.

I decided to open this issue to get a discussion started sooner rather than later.
If I missed another open issue or documentation for this - I couldn't find any - please let me know.

@home-assistant

home-assistant bot commented Dec 5, 2023

Hey there @balloob, @synesthesiam, mind taking a look at this issue as it has been labeled with an integration (wyoming) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of wyoming can trigger bot actions by commenting:

  • @home-assistant close: Closes the issue.
  • @home-assistant rename Awesome new title: Renames the issue.
  • @home-assistant reopen: Reopens the issue.
  • @home-assistant unassign wyoming: Removes the current integration label and assignees on the issue; add the integration domain after the command.
  • @home-assistant add-label needs-more-information: Adds a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information: Removes a label (needs-more-information, problem in dependency, problem in custom component) from the issue.

(message by CodeOwnersMention)


wyoming documentation
wyoming source
(message by IssueLinks)

@danieljkemp
Contributor

danieljkemp commented Dec 19, 2023

Seems like I ran face first into this as well

This will still likely break for some use cases without specifying the listen IP address as well, since the UDP connection to HA won't be an established connection so docker networking/nat/etc can all break it

@koriwi

koriwi commented Dec 21, 2023

me too. i refuse to run containers as root and/or in host mode so it's impossible for me right now to establish the needed udp connection

Edit:
Some info about failed workarounds:
i also tried just forwarding a huge range of ports, but that is also not possible in practice as the host can run out of memory if using userland proxy. If using iptables, it will create an iptable entry for every port in the range instead of providing iptables with a range itself. This is a known and old docker problem. Containers will take minutes to hours to start with these changes.

So i think a fixed port/small port range would be awesome

Probably relevant: #95654
Was a bit disappointed by the response

@dmakovec

Configurable port ranges would be fine. But it's poor practice to open the network (or Docker config) for "all ports > 1024".

@morremeyer
Author

Folks, I know this is annoying for some or most of us, but please don't comment on this just to add that you're affected, too.

That does not provide any additional input towards a solution.

Give the issue a thumbs up instead. That helps visibility more and keeps it easy to read for everyone.

Thanks!

@koriwi

koriwi commented Dec 23, 2023

I tried to monkey fix it myself. Maybe you guys can give it a try: https://gist.github.com/koriwi/617cf90107c7d413b53ca2c2c6fdf1e6

@ikelner

ikelner commented Dec 29, 2023

@koriwi great idea. I did a very similar local patch and it works great for me now.
It's unfortunate that every satellite needs its own port in HA for a unique connection, meaning that this little kludge only works for a fixed number of satellites (32 in your case). This feels like a consequence of the implementation of input audio streams in home assistant, so until that's revamped (and associated ESPHome libraries) this limitation may be with us for a while.
Still, would be nice to set the port range in a config file or UI somewhere - seems fairly easy given the patch.
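
A configurable range could work roughly as follows. This is a sketch under assumptions, not the code from the gist or from Home Assistant: `bind_first_free` and `SATELLITE_PORTS` are hypothetical names, and the range shown is an arbitrary example. Each satellite session takes the first port in the range that is not already bound, which is also why the number of simultaneous satellites is capped by the size of the range.

```python
import socket
from typing import Iterable, Optional

# Hypothetical example range (32 ports); the real values would come from configuration.
SATELLITE_PORTS = range(16000, 16032)


def bind_first_free(ports: Iterable[int], host: str = "0.0.0.0") -> Optional[socket.socket]:
    """Return a UDP socket bound to the first free port in `ports`, or None if all are taken."""
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port already in use (for example by another satellite session); try the next one.
            sock.close()
            continue
        return sock
    return None
```

With such a helper, only the configured range needs to be forwarded or allowed in the firewall, for example `bind_first_free(SATELLITE_PORTS)`.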

@seang96

seang96 commented Jan 2, 2024

> I tried to monkey fix it myself. Maybe you guys can give it a try: https://gist.github.com/koriwi/617cf90107c7d413b53ca2c2c6fdf1e6

I tried this in Kubernetes and was unable to get it working. I am running a k3s cluster at home with a NodePort service and tried using ports 3100-3107, with those 8 ports configured in your code. I checked whether the port opens when a request comes in, and it does, but the request ends up timing out on Whisper.

I set up the ports to open on the Home Assistant container. I assume that is correct since the code being modified is in Home Assistant, or should it be on Whisper?

@koriwi

koriwi commented Jan 2, 2024

Yeah. That is correct.

But do you get any other log messages? Like audio empty or something? I needed to switch the microphone channel (in esphome) because it was outputting on the wrong channel.

@seang96

seang96 commented Jan 2, 2024

I am not using the wake word. For testing, I have set up the esphome device so that the button must be held down to activate it.
I tell it a command and it flashes blue indefinitely. When I hold it long enough it will cancel and I get the following:

Home Assistant:
Voice assistant UDP server was not stopped

Whisper:

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-8' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:28> exception=ValueError("can't extend empty axis 0 using modes other than 'constant' or 'empty'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 35, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 69, in handle_event
    segments, _info = self.model.transcribe(
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/transcribe.py", line 124, in transcribe
    features = self.feature_extractor(audio)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/feature_extractor.py", line 152, in __call__
    frames = self.fram_wave(waveform)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/feature_extractor.py", line 98, in fram_wave
    frame = np.pad(frame, pad_width=padd_width, mode="reflect")
  File "/usr/local/lib/python3.9/dist-packages/numpy/lib/arraypad.py", line 819, in pad
    raise ValueError(
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'

under debug assistant I get

stage: stt
run:
  pipeline: 01h2ma66dys5b586jj7vfh10sx
  language: en
events:
  - type: run-start
    data:
      pipeline: 01h2ma66dys5b586jj7vfh10sx
      language: en
    timestamp: "2024-01-02T18:38:28.970504+00:00"
  - type: stt-start
    data:
      engine: stt.faster_whisper
      metadata:
        language: en
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-01-02T18:38:28.970646+00:00"
stt:
  engine: stt.faster_whisper
  metadata:
    language: en
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: false

@seang96

seang96 commented Jan 14, 2024

@koriwi, how did you change the channel and what channel did you set it to? Looks like the board version for my device is v1.1 and G23 shown here http://docs.m5stack.com/en/atom/atomecho appears to be what the official firmware uses.

@roobre

roobre commented Jan 20, 2024

I wonder if anyone has tried or considered making hass send a first UDP packet to punch a hole in the NAT gateway/firewall, as an alternative to restricting the port range. The way this could work would be as follows:

  • Hass opens a UDP socket on a "random" (OS-allocated) port exactly as it is doing now
  • Before telling the esphome device to send anything there, hass sends a "ping" or "noop" UDP datagram to the esphome device. The port in the device is, as far as I can tell, fixed so this is not a problem.
  • The rest of the flow keeps working exactly as it is working right now

The advantage of doing this is that firewalls and NAT gateways will see Home Assistant as the client, or the peer who initiated the connection, and thus create a path for the packets the esphome device sends to flow back to hass. This would not require any configuration or privileged access, and it should work behind firewalls and NAT gateways, including Docker and Kubernetes.

I'm not very strong at Python but I think it is simple enough for someone to give it a shot. WDYT?
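
The flow described above can be sketched with plain sockets. This is only an illustration of the hole-punching idea, not Home Assistant code: `open_punched_socket` is a hypothetical name, the one-byte payload is arbitrary, and whether the ESPHome firmware tolerates an unexpected datagram on its UDP port (6055 in the capture earlier in this issue) is an untested assumption.

```python
import socket
from typing import Tuple


def open_punched_socket(device_addr: Tuple[str, int], payload: bytes = b"\x00") -> socket.socket:
    """Open a UDP socket on an OS-chosen port and send one datagram to the device first.

    The outgoing datagram lets stateful firewalls and NAT gateways record the
    flow (local port <-> device_addr), so later datagrams from the device to
    this port are treated as replies instead of unsolicited traffic.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 0))          # same OS-allocated port behaviour as today
    sock.sendto(payload, device_addr)  # the "ping"/"noop" datagram
    return sock
```

After this call, the local port from `sock.getsockname()[1]` would be passed to the device as before. The punch must target the same device port that the audio is later sent from; otherwise the firewall state would not match the returning packets.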

@koriwi

koriwi commented Jan 22, 2024

> @koriwi, how did you change the channel and what channel did you set it to? Looks like the board version for my device is v1.1 and G23 shown here http://docs.m5stack.com/en/atom/atomecho appears to be what the official firmware uses.

I'm on my phone right now, but you will find it in the I2S microphone docs of esphome

Also I'm not using an m5 echo

@seang96

seang96 commented Feb 11, 2024

I managed to get this working, but on the latest docker stable tag, it looks like the file had some heavy modifications and tts stopped working. I tried to reapply the port range modification, and unfortunately it still errors, though it does play tts for a split second.

Stable version the image is currently pulling from

Updated code gist here

2024-02-11 16:44:40.663 WARNING (MainThread) [homeassistant.components.esphome.manager] Voice assistant UDP server was not stopped
2024-02-11 16:44:40.690 WARNING (MainThread) [homeassistant.components.esphome.voice_assistant] Sending 1024 of chunk to ('10.42.2.1', 27703)
2024-02-11 16:44:40.690 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 1190, in sendto
    self._sock.sendto(data, addr)
    ^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'sendto'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/esphome/voice_assistant.py", line 333, in _send_tts
    self.transport.sendto(chunk, self.remote_addr)
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 1200, in sendto
    self._fatal_error(
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 867, in _fatal_error
    self._loop.call_exception_handler({
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'call_exception_handler'

@lerra

lerra commented Apr 14, 2024

I am facing the same problem. It would be very good to be able to configure a range of ports in order to support segmented and firewalled networks.

@danieljkemp
Contributor

Looks like this is in the works: esphome/esphome#6471

@lerra

lerra commented Apr 14, 2024

Looks very promising, thanks mate

@morremeyer
Author

ESPHome 2024.4.0 has been released.

The changelog notes:

Beginning with Home Assistant 2024.5, both sides will automatically recognise that they both support API Audio and will use that route instead.

I will verify this when Home Assistant 2024.5.0 is released and will update everyone here.

@Nardol
Contributor

Nardol commented May 4, 2024

In case it helps: for me it looks like the UDP port is still used, even though I uploaded firmware built with ESPHome 2024.4.2 and am running Home Assistant 2024.5.1.

2024-05-03 18:27:01.590 WARNING (MainThread) [homeassistant.components.esphome.manager] Voice assistant UDP server was not stopped

I don't really understand why I still get a message about the UDP server if both sides recognize that the native API can be used instead of UDP, so I have probably missed something.

@balloob
Member

balloob commented May 4, 2024

Closing this issue as it works now with latest HA and ESPHome version. If you still encounter issues, please open a new issue and attach logs.

balloob closed this as completed May 4, 2024