Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shutdown fsserver on fs terminate #176

Merged
merged 5 commits into from
Oct 20, 2022

Conversation

dehorsley
Copy link
Member

This commit (re)adds the feature that fsserver gracefully shuts down when the child fs process terminates. During the shutdown, all child FS windows will be killed.

I've done some basic testing, and it should do what it says on the tin, but I haven't fully thought through the consequences.

@wehimwich
Copy link
Member

A surgical touch. Thanks. Preliminary testing looks good. I am inclined to merge it in for 10.1.0-beta1.

For those not familiar, the server continuing to run after FS termination would have useful application for remote clients and a webpage interface. For now, it creates three situations where it is necessary to stop the server manually when restarting the FS:

  • If antcn, or another local program opens an X11 application, say for example, for a dialog box to let the operator select the antenna, and later an operator on a different display wants to restart the FS.
  • To update the environment variables used by the FS.
  • To change the user that is running the FS.

These are pretty minor issues. The last two also increase the verbiage needed to explain updating the FS. They will be well worth it when we have a remote or webpage interface. This change should be easy to revert at that time.

A minor benefit of this change is that the server error log doesn't stay open beyond FS terminations.

I added the (trivial) commit to this branch partly to get practice with that. The commits should be squashed when merged.

@wehimwich
Copy link
Member

Some testing shows that this PR seems to work as expected for both attached and detached client windows.

As @dehorsley suggested the possibility of elsewhere, there is an issue with server run windows. In particular, when the server is shutdown, as in this PR, it will cause any still running autoftp windows to terminate as well. When the server is not shutdown, those windows will continue running, hidden in the background. That at least gives the autoftp the chance to finish successfully. That is perhaps also a weakness. For example, seeing no activity, the operator might reboot before they finish.

If we implement the current PR, the terminate command could be modified to require an override, maybe terminate=autoftp_ok if an autoftp is active. This would help avoid inadvertently losing autoftp processes and keep the autoftp windows visible.

Any other server run windows, particularly fs.prompt, will also be terminated when the server terminates. We could also have an override for fs.prompt. Since it is only used to interactively sequence operator actions, and rarely these days, it may not be so important to protect it if the FS is terminated. An override for it could always be added later if it seems warranted.

@wehimwich
Copy link
Member

wehimwich commented Sep 28, 2022

I finally got back to this. I didn't want to use ps; the error handling seemed like it might be cumbersome. I found some example code that returns a process id for a given program. I fixed that up a little bit, mostly to modify the error handling so it would integrate into the FS. This was a learning experience for reading a directory, the contents of /proc, and the %[ conversion in scanf.

@wehimwich wehimwich mentioned this pull request Sep 29, 2022
@wehimwich wehimwich force-pushed the deh/fsserver/shutdown-on-terminate branch from f4ea599 to 8650552 Compare October 20, 2022 17:54
dehorsley and others added 5 commits October 20, 2022 18:03
@wehimwich wehimwich force-pushed the deh/fsserver/shutdown-on-terminate branch from 8650552 to c7735f5 Compare October 20, 2022 18:04
@wehimwich wehimwich merged commit 02b3fb8 into nvi-inc:main Oct 20, 2022
@wehimwich
Copy link
Member

A small downside of this change is that if the FS is terminated and restarted in quick succession, there may be a socket conflict (while the old server instance cleans up) that prevents the restart. This can be handled by waiting a moment and trying to restart again, but maybe we can find a programatic solution.

@wehimwich
Copy link
Member

The error that occurs when restarting fails is:

fsclient.c:436 (fetch_state) error unable to connect to server: Connection refused

This may be occurring in an interval with an ambiguous state: when the server is shutting down, but hasn't fully exited.

A brute force solution to this is to remember when the FS exited. Then on restart, sleep until two seconds (that seems sufficient) have passed since exiting. Of course, no sleep is needed if two seconds have already passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants