[x] I have read the contributing guide lines at https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md
[x] I have searched the existing issues and I'm convinced that mine is new.
[x] The title contains the plugin to which this issue belongs
Description
I'm running half a dozen virtual OPNsense gateways (21.7.4) under oVirt (4.3.10) with the QEMU guest agent (qemu-ga).
After roughly 60 days without a reboot these gateways stall: both the web GUI and the console show "cannot fork" errors, and it is impossible to log in via console or SSH.
To find the culprit, I triggered a reboot of an affected OPNsense gateway from oVirt via qemu-ga and, while it was shutting down, ran
ssh root@mygateway ps uxawj
There were several thousand zombie processes, started at various dates, all with PPID 8995:
...
root 182 0.0 0.0 0 0 - Z Fri05 0:00.01 <defunct> 8995 8995 8995 0
root 192 0.0 0.0 0 0 - Z 4Nov21 0:00.00 <defunct> 8995 8995 8995 0
root 215 0.0 0.0 0 0 - Z Fri04 0:00.00 <defunct> 8995 8995 8995 0
root 249 0.0 0.0 0 0 - Z 13Nov21 0:00.00 <defunct> 8995 8995 8995 0
root 253 0.0 0.0 0 0 - Z 3Nov21 0:00.00 <defunct> 8995 8995 8995 0
...
Luckily, the process with PID 8995 was still alive:
root 8995 0.0 0.4 17236 4052 - Ss 29Oct21 2:48.09 /usr/local/bin/q 1 8995 8995 0
The command name is truncated, but on this OPNsense install the only binary in /usr/local/bin beginning with "q" is qemu-ga.
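With several thousand zombies, it helps to aggregate the process list by parent PID instead of reading it line by line. The following is a minimal sketch, assuming `ps -axo pid,ppid,stat` output (supported by FreeBSD's ps); the function name `count_zombies_by_ppid` is hypothetical and not part of OPNsense or qemu-ga:

```shell
#!/bin/sh
# Hypothetical helper: count zombie processes per parent PID.
# Expects "PID PPID STAT" columns on stdin, with one header line.
# Prints "<count> <ppid>" lines, highest count first.
count_zombies_by_ppid() {
    # Skip the header (NR > 1); a STAT starting with "Z" marks a zombie.
    awk 'NR > 1 && $3 ~ /^Z/ { n[$2]++ } END { for (p in n) print n[p], p }' | sort -rn
}
```

For example, `ssh root@mygateway 'ps -axo pid,ppid,stat' | count_zombies_by_ppid | head -n 5` runs ps on the gateway and does the counting on the client. A single PPID with a large, growing count (8995 here) points at the process that never reaps its children.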
I think the problem has already been described for FreeBSD in
aborche/qemu-guest-agent#17
and there already seems to be a fix for it:
aborche/qemu-guest-agent@71edc56
As far as I can see, this fix has not yet made it into FreeBSD, and therefore not into OPNsense either.
So from the OPNsense perspective this seems to be an upstream issue.
I was able to work around the problem by disabling the 'guest-get-fsinfo' RPC call in the OPNsense configuration.
Maybe it should be disabled by default.
To Reproduce
See description
Additional context
oVirt 4.3.10
Environment
OPNsense 21.7.4 (amd64, OpenSSL).
os-qemu-guest-agent 1.1