Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add streamlog #64

Merged
merged 9 commits into from
Oct 30, 2022
Merged

add streamlog #64

merged 9 commits into from
Oct 30, 2022

Conversation

dehorsley
Copy link
Member

@dehorsley dehorsley commented Aug 26, 2020

This adds a utility to stream the log (and follow it as the output changes). It uses the FS server log stream if the server is enabled, otherwise falls back to tail -fing the active log while polling lognm.

It has the advantage over a simple tail -f /usr2/log/$(lognm).log that it will continue to work after the FS switches to a different log. While with the FS server based method streamlog shouldn't drop messages, this isn't guaranteed for the fallback method when switching to a log that has already been written to (eg "station.log").

The server method could work remotely as ssub should be compilable on any modern UNIX based OS, even without the FS, and should be fairly reliable on poor networks. Otherwise, ssh pcfs streamlog should be fine on a reliable network.

streamlog/streamlog Outdated Show resolved Hide resolved
streamlog/streamlog Outdated Show resolved Hide resolved
@wehimwich
Copy link
Member

wehimwich commented Aug 28, 2020

A tour de force. The handling of non-display servers (for those who haven't switched) is very considerate. I learned a lot just reading the code for that. Thanks and thanks.

I made a couple comments on the code, but also:

  1. Without the display server running sometimes the log header is lost on a log change. I guess those are the breaks. It seems fine with the display server.

  2. If the display server isn't in use, streamlog waits forever for the FS to start and waits forever after it ends. With the display server it ends right away. I guess I prefer the display server behavior. Is there any reason to not make it consistent? I think the display server behavior would be safe even if run from stpgm.ctl. I can noodle around to find the right if test for that, but maybe you know. Maybe just break the first half of the || to a separate if clause that calls exit. It is a nice catch that you wait if the file doesn't exist (presumably hasn't been created yet). (see later update, I think this my comment was wrong)

  3. Any reason to not add it to the main Makefile?

if ! fsserver status > /dev/null; then echo fsserver not running; exit 1 ; fi
if ! fsserver fs status > /dev/null; then echo fs not running; exit 1; fi
FS_SERVER_URL_BASE=$(awk '/FS_SERVER_URL_BASE/ {gsub(/"/, "", $3); print $3}' /usr2/fs/include/params.h)
exec ssub "$FS_SERVER_URL_BASE/log"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For pre-beta1, replace this line with:

exec ssub -s -w ipc:///var/run/fsserver/windows/fs/{pub,rep}

Copy link
Member Author

@dehorsley dehorsley Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above. Any compelling reason to backport it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason to backport it, but there is a station with a need. They can use a local copy with the change until they can update. I thought it might be good to record the change needed. Maybe the update notice would be a better place.

streamlog/streamlog Outdated Show resolved Hide resolved
@wehimwich
Copy link
Member

I withdraw my point 2. It is better to wait indefinitely in case it is being run from stpgm.ctl Supporting non display server use is probably just not going to be as smooth as with the server.

@dehorsley
Copy link
Member Author

  1. Without the display server running sometimes the log header is lost on a log change. I guess those are the breaks. It seems fine with the display server.

Yeah not much you can do about that. It should print the header of a new file. If not, that age test might need some tweaking.

I actually wrote the original version of this script at AuScope, and that drawback was one of the motivators for the server.

  1. If the display server isn't in use, streamlog waits forever for the FS to start and waits forever after it ends. With the display server it ends right away. I guess I prefer the display server behavior. Is there any reason to not make it consistent?

Mmm good point. I think the behaviour of the non-server branch is more useful for things like log monitors. The server branch can use ssub's -w flag to reproduce this.

I think the display server behavior would be safe even if run from stpgm.ctl. I can noodle around to find the right if test for that, but maybe you know. Maybe just break the first half of the || to a separate if clause that calls exit.

Yup, if you wanted to do this, you could split out that first half and exit.

It is a nice catch that you wait if the file doesn't exist (presumably hasn't been created yet).

Right, exactly.

  1. Any reason to not add it to the main Makefile?

Simply forgot 😄

@dehorsley
Copy link
Member Author

Re point 2: both branches now have the same behaviour when the FS is restarted (ie waiting). Still though, for the server branch, if the FS or fsserver aren't running when streamlog is called, it will exit. This is racy. Fixing this might be just removing these lines:

if ! fsserver status > /dev/null; then echo fsserver not running; exit 1 ; fi
if ! fsserver fs status > /dev/null; then echo fs not running; exit 1; fi

but I need to check how ssub responds.

@wehimwich
Copy link
Member

wehimwich commented Sep 1, 2020

I don't know. It seems like the first is need to decide which branch to use. Maybe the second isn't if it is suppose to wait for the FS.

Using the test (to exit) for the FS not running seemed racy with use in stpgm.ctl, which is one reason I gave up on it. I tried an additional separate test before the trap (otherwise it generated an error message, that could be suppressed with &> in the trap). Also a test inside the FS/log-open branch if that exits was no better). Maybe a sleep before the if of a second or two would be better. Even better to be consistent and reliable though.

I guess waiting for the FS to re-/start is more consistent with the client's behavior. I guess that is another reason to not do it.

@wehimwich
Copy link
Member

Curious thing, with something like this in stpgm.ctl:

strml n streamlog | stream_check &

the two processes in the pipeline are left running after a terminate if the display server is not being used (bad outcome of course, you get another instance when the FS is restarted, and on and on). However, they are killed off on terminate when the display server is being used (good). Maybe this difference is related to the exec used in the display server and/or something about process group handling. It may require a black belt in the latter to solve. Maybe there is a simple solution. If not, running without the display server is deprecated anyway; maybe just say streamlog can't be used in stpgm.ctl without the display server.

@dehorsley
Copy link
Member Author

dehorsley commented Sep 3, 2020 via email

@wehimwich
Copy link
Member

This is complicated to explain. Run from stpgm.ctl we don't want it to hang around. The problem is that when the server is not being used, it does stick around.

For tests with both the "pre-beta1" and "up to date" versions with -s and -w they do not hang around when started from stpgm.ctl (server enabled of course or ssub never gets invoked).

@wehimwich
Copy link
Member

It seems like if there were a "wrapper" program run from stpgm.ctl that, like xterm creates a session (or something), out of the commands and all the subprocesses died when it was killed, it would work for no display server (i.e., no left over processes). I tried a few permutations of command line options for screen and that didn't see like it was it, but maybe I just didn't hit the right combination in my scattering experiment. I also did not get something satisfactory with tmux, but that seemed closer to working.

@wehimwich
Copy link
Member

Any feelings about how this should proceed?

I think the two if tests inside the FS_DISPLAY_SERVER clause have to be deleted to get the same non-server waiting for start-up behavior. Adding the -w flag to ssub does make the start/stop behavior between server and non-server symmetrical, but you apparently have to add -s or the startup log header is lost (even with -s the output delayed, until there is more log output; without -s, no header, but no delay). It is not lost for non-server (ironic).

Is having a wrapper a solution to having it in stpgm.ctl viable? Or maybe there is another solution. Otherwise, it might be prevented from being run from stpgm.ctl with no server by requiring a command line option (-t?) for the non-server branch to work, otherwise it would die. It could also die if the command option is there and the server is in use. This would be messy, causing the FS to fail on start-up if the flag was left out of stpgm.ctl and the user disabled the display server. That would at least keep it from creating more processes on restarts (largely invisibly). The rule would be to not use the option in stpgm.ctl. If the user puts the option in and server is not configured, that would be their own fault. Another downside would be that the command line option would be required for any non-server use, but maybe non-server FS use is just deprecated anyway.

@wehimwich
Copy link
Member

streamlog is very useful and would be good to get on the main branch. As a result, I am trying to come up to speed on this again. After two years it is not easy. It does give some perspective though. My current thinking is that we could make streamlog only for the display server. The non-display server part can be broken out in a separate utility, maybe follow_log, in misc/ that people can use if they don't want to use the display server. It should have a warning that it should not be used in stpgm.ctl. This would also get us out of the rabbit hole of trying to make the behaviours the same.

Other points:

For the display server use, we could make -w an optional argument so the user can decide what they want to do. This will make its behaviour parallel to fsclient. I suppose -s should be included as an option too. There is one more, -p, that might useful too, especially with -w.

Maybe all command line options should just be copied to ssub. OTOH, they could be parsed explicitly in case there are more arguments for streamlog someday. Not to be lazy, but maybe wait to cross that bridge when we come to it. There should be a '-h' option though.

Re point 2: both branches now have the same behaviour when the FS is restarted (ie waiting). Still though, for the server branch, if the FS or fsserver aren't running when streamlog is called, it will exit. This is racy. Fixing this might be just removing these lines:

if ! fsserver status > /dev/null; then echo fsserver not running; exit 1 ; fi
if ! fsserver fs status > /dev/null; then echo fs not running; exit 1; fi

but I need to check how ssub responds.

I am not sure now exactly what raciness you were concerned with here. It can't be that starting streamlog from stpgm.ctl might fail. You can get into trouble if the server terminates after these checks and before ssub is run. You would be stuck with ssub running anyway. I suppose those might just be the breaks. Using -p would be useful in this situation. Perhaps that should be hardcoded and not an option. It is benign in other situations, I think. Then maybe the above tests could be deleted. Or was the raciness something else?

If this makes sense, it could start to be tested with the changes of #176.

@wehimwich
Copy link
Member

Curious thing, with something like this in stpgm.ctl:

strml n streamlog | stream_check &

the two processes in the pipeline are left running after a terminate if the display server is not being used (bad outcome of course, you get another instance when the FS is restarted, and on and on).

It turns out that this is related to streamlog looping forever waiting for the FS to restart. Why the SIGKILL used to kill FS processes isn't enough to remove it is not clear. It may be related to the use of the pipeline and the sh -c exec command used to run it. I may try to run that down yet, but maybe someone else sees it.

Anyway, I changed streamlog to accept two arguments:

  • -w to wait for FS startup and for a restart after termination. This works for both when the display server is in use and when it isn't.
  • -s for scroll-back when the display server is in use

This way the user can decide what functionality they want. The -w option must not be used in stpgm.ctl. This sort of makes the issue moot so that streamlog can be used in stpgm.ctl both with and without the display server.

I moved the trap on EXIT to avoid getting a non-useful error message from kill if the script gets a SIGINT while it is waiting for the FS start.

Lastly, I reduced the size for showing the whole log with tail to be just a bit larger than a new log's opening size. As a result, there may be less irrelevant "scroll-back" for small logs. It may make sense to have this depend on -s.

Some log lines may not get streamed when the FS is started (-s improves this when the display server is used), but that is probably reasonable enough. Some lines may also be missed when the log changes and the display server is not is use, but there probably isn't anything to do about that; it is after all one of the points of using the display server.

I will let this soak a bit and come back to it in a few days for more testing and cleanup. Any comments would be appreciated.

@wehimwich
Copy link
Member

This is a proposed final version. Some minor issues were fixed. The most significant change is that -s (scroll-back) when the display server is not enabled is limited to the last 20 lines of the newly selected log. This is enough to cover the header lines. Picking those up may be useful in some situations. It doesn't seem like there is a general solution for picking up an arbitrary number of new lines if they are being generated rapidly. The support when the server is not enabled is never going to be as good as when it is. Hopefully, most people will use the server.

It looks like the reason a pipeline in stpgm.ctl isn't killed off when the FS is terminated is that the individual steps in the pipeline are run spawned as separate processes and the spawning process is not their session leader. This is easily finessed by just not using -w in stpgm.ctl.

@wehimwich wehimwich force-pushed the streamlog branch 2 times, most recently from c4a3b25 to f13fa08 Compare October 30, 2022 20:16
dehorsley and others added 9 commits October 30, 2022 20:45
Added options including help, wait, and scroll-back.

Don't set TRAP for EXIT until expecting a 'tail' process to exist;
this avoids spurious errors if control-C is used during a wait.

The use of tail was changed to provide a limited scroll-back function
that may help pick up some missing lines when there is a log change if
the FS isn't generating a lot of output at that point.
This is only available if the display server is enabled.
for consistency with other scripts.
@wehimwich
Copy link
Member

Added -d option to use display stream instead of log stream. Changed streamlog/Makefile to use symbolic links, as is done for other scripts. Rebased onto main.

@wehimwich wehimwich merged commit 5bc44f2 into nvi-inc:main Oct 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants