zedkube/edgeview: fix stale vmiVNC.run blocking new VNC sessions#5875
zedkube/edgeview: fix stale vmiVNC.run blocking new VNC sessions#5875milan-zededa merged 1 commit intolf-edge:masterfrom
Conversation
b49eec0 to
b32e0c1
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5875 +/- ##
==========================================
+ Coverage 21.62% 22.05% +0.43%
==========================================
Files 464 475 +11
Lines 83994 85714 +1720
==========================================
+ Hits 18165 18907 +742
- Misses 64303 65099 +796
- Partials 1526 1708 +182 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
b32e0c1 to
a2833f4
Compare
| logmsg "monitor_caller_pid: Caller PID $caller_pid is gone (crashed?), cleaning up" | ||
|
|
||
| # Re-check CallerPID in the file before acting, in case a new edgeview | ||
| # process with a recycled PID already wrote a new session file. |
There was a problem hiding this comment.
Is this really for "recycled PID" if we are checking for changed CallerPID?
There was a problem hiding this comment.
updated comments.
| │ ┌─────────┐ AppInstanceConfig ┌──────────┐ │ | ||
| │ │zedagent │─────────────────────►│ zedkube │ │ | ||
| │ └─────────┘ (RemoteConsole) └────┬─────┘ │ | ||
| │ │ runAppVNC() │ |
There was a problem hiding this comment.
Add space to fix the arrow
| │ │edgeview │◄────────────────────│ edgeview │ │ | ||
| │ │ client │ │ (server) │ │ | ||
| │ └──────────┘ └────┬─────┘ │ | ||
| │ │ setupEveKVNC() │ |
There was a problem hiding this comment.
Is edgeview used also for Controller-triggered VNC access? Based on this diagram it seems that zedkube talks to edgeview to setup VNC.
There was a problem hiding this comment.
redo the drawing to reflect the schemes properly.
|
|
||
| Rules: | ||
|
|
||
| - `CallerPID > 0` → edgeview owns this session; liveness via `/proc/<pid>/comm` |
There was a problem hiding this comment.
I believe that in the edgeview-case, the liveness is both PID-check and port-listen check (?)
But the port-listen check is not mentioned for CallerPID > 0
There was a problem hiding this comment.
yes, we actually missing the code in checking the vnc port. updated the code and the doc.
|
|
||
| Both `removeStaleVNCFile` (edgeview) and `canClaimVNCFile` (zedkube) probe | ||
| whether a virtctl proxy is actually running before declaring a remote-console | ||
| file stale. They do this differently: |
There was a problem hiding this comment.
Do they really do it differently? They use the same helper function after all
a2833f4 to
b257790
Compare
79515e0 to
6f0c5c2
Compare
Problem: vmiVNC.run was not being cleaned up when an edgeview instance
crashed or was killed.
Fixes:
- clustertypes.go: Add AppUUID field to VmiVNCConfig to identify file
ownership per-app. Add OwnerAlive() which reads /proc/<pid>/comm and
checks the name is edge-view, guarding against PID reuse by
unrelated processes.
- zedkube/vnc.go: Replace the blind os.Stat block with
canClaimVNCFile(). It evicts stale files (dead edgeview PID, reused
PID, or idle virtctl port) and blocks only on a genuinely live
session — a running edgeview process or a virtctl proxy still
listening on the port.
- edgeview/network.go: Replace the blind os.Stat block with
removeStaleVNCFile(), the edgeview-side equivalent. Also call it at
edgeview startup (runOnServer) with evictIdle=false to clear files
left from a previous crash without disturbing a live remote-console
session. Rewrite isPortListening to read /proc/net/tcp directly
instead of spawning ss (avoids disrupting the virtctl WebSocket
connection).
- vnc-proxy.sh: Move monitor_caller_pid start to the top of handle_vnc
before the virtctl retry loop
Signed-off-by: naiming-zededa <naiming@zededa.com>
6f0c5c2 to
0750c39
Compare
Description
Problem: vmiVNC.run was not being cleaned up when an edgeview instance crashed or was killed.
Fixes:
PR dependencies
How to test and validate this PR
run the edgeview VNC in multiple times, and verify they are not blocked by the stale config file on the device.
Changelog notes
zedkube/edgeview: fix stale vmiVNC.run blocking new VNC sessions
PR Backports
Checklist
For backport PRs (remove it if it's not a backport):
And the last but not least:
check them.