log: Overhaul Windows Logger with Pipes and FHs #51791
johnwparent wants to merge 12 commits into spack:develop
Conversation
Redo how we handle IO redirection for logging on Windows.

Previously we closed sys.stdout, replaced it with a new buffer pointing at the logging file, and then dup2'd to redirect stdout. The logfile was then read from a thread controlled by the main Spack thread, and that reader thread forwarded the logging content on to Spack's dup'd stdout.

This was deeply insufficient for a number of use cases:

* It did not support filtering, unicode, control characters, or force_echo.
* All echo, including force_echo had it been supported, was racy, because the mechanism controlling reader-thread behavior was separate from the context the thread obtained its information from. I.e. the reader thread could be killed, or echo turned off from Python, before any stdout buffers could be flushed, so the reader thread wouldn't see the log output it was supposed to echo before echo was turned off or the thread terminated.
* Nested logging and cached references to stdout and stderr were invalidated. Previously, on IO redirection, we were closing stdout and replacing it with our new buffer. On IO restoration, we restored a buffer pointing at console stdout to sys.stdout, but any cached references to the old stdout (i.e. nested logging contexts or pytest) were invalidated.

Now we:

* do not close whatever sys.stdout refers to; we merely store a reference to it
* do not construct a new buffer from Python's API; instead we perform redirection from the ground up, starting with file handles (fhs) mapped to file descriptors (fds), and compose a buffer from the existing fh's fd
* use pipes for communication with our reader thread, which enables filtering, control characters, etc., and allows for non-racy echo behavior
At a glance:

* Redirects IO starting at the Windows file handle (fh) level
* Properly maps fhs to fds, and obtains the appropriate buffer wrapping each
* Enables proper subprocess IO redirection inheritance
* Manages shared file handles between stderr and stdout on redirect
* Properly redirects IO at each level of handling: kernel, C, and Python
* Ensures cached references to console streams are not invalidated
* Ensures correct order of release for buffers -> fhs

Moves toward a unified logging class by introducing fundamental support for the concepts leveraged by the non-Windows logger classes, like pipes.

Signed-off-by: John Parent <john.parent@kitware.com>
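The "redirect from the ground up" idea above can be sketched at the fd/buffer layers. This is an illustrative sketch, not Spack's actual API: the function name and return shape are invented, and the real Windows implementation additionally redirects at the file handle (fh) level, which this cross-platform sketch omits.

```python
import os
import sys


def redirect_stdout_via_pipe():
    """Sketch: redirect fd 1 into a pipe without closing sys.stdout.

    Returns (read_fd, console, restore): the pipe end a reader thread
    would consume, a text stream that still reaches the original fd
    (for echo), and a callable undoing the redirection.
    """
    sys.stdout.flush()                      # flush before touching fd 1
    saved_fd = os.dup(sys.stdout.fileno())  # keep the console reachable
    read_fd, write_fd = os.pipe()           # reader thread consumes read_fd
    os.dup2(write_fd, sys.stdout.fileno())  # subprocesses now inherit the pipe
    os.close(write_fd)

    # Compose a text stream over the *saved* fd so echo can still reach
    # the console; sys.stdout itself is neither closed nor replaced, so
    # cached references to it stay valid.
    console = open(saved_fd, "w", closefd=False, newline="\n")

    def restore():
        sys.stdout.flush()
        os.dup2(saved_fd, sys.stdout.fileno())  # put the console back on fd 1
        console.close()                         # closefd=False: fd survives
        os.close(saved_fd)

    return read_fd, console, restore
```

Note the ordering on restore: flush the redirected stream first, dup2 the saved fd back, and only then release the buffer and the saved fd, mirroring the "correct order of release for buffers -> fhs" point above.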
I haven't looked at this in detail, but just wondering if you've seen
Ahh, I have not. Perhaps this PR should be refactored in favor of supporting that instead. Thanks for pointing that out.
Let's chat about it on Slack another time :) have a good holiday!
scheibelp
left a comment
Not complete, but I have a few preliminary requests and questions
# Flush the system stream before redirection
sys_stream.flush()
except BaseException:
    # swallow flush errors
This would indicate mismanagement, wouldn't it? We might want to log this at the debug level.
Yes, this would indicate an incorrectly timed flush, added debug logging.
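The pattern agreed on above might look like the following. This is a hedged sketch: the function name and the use of the stdlib `logging` module are illustrative stand-ins for whatever debug facility the codebase actually uses.

```python
import logging

log = logging.getLogger(__name__)


def flush_before_redirect(sys_stream):
    """Flush a stream before redirecting it.

    A failure here would indicate an incorrectly timed flush (e.g. the
    stream was already closed out from under us), so record it at debug
    level rather than aborting the redirect.
    """
    try:
        sys_stream.flush()
    except BaseException as e:
        log.debug("flush before redirection failed: %s", e)
```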
# Flush and close sys_stream - also closes the file descriptor (fd)
sys_stream = getattr(sys, self.sys_attr)
sys_stream.flush()
sys_stream.close()
Does this address the case where a context-manager-like functionality above this has saved a reference to sys.stdout?
Yes, we're only closing IO components owned by this logging redirect instance.
# Check captured output (echoed content)
# Note: 'logged' is not echoed because force_echo() scope ended
assert capfd.readouterr()[0] == "echo\n"
# "print(echo)" above automatically uses an "\r\n" on Windows
(minor) Add "Note:" or something to indicate this is not a continuation of the prior note.
Oh yeah good point. Thanks.
lib/spack/spack/llnl/util/tty/log.py
Outdated
dup_redirect_h = dup_fh(redirect_h)
# ensure inheritable handles
os.set_handle_inheritable(redirect_h, True)
# TODO: investigate if this is necessary if we do the above before the handle dup
Question(s): Can this be determined before a merge? I think redirect_h is what's being used by subprocs, and I would have expected that's the only one that needs to be inheritable.
Investigating now.
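For context, the inheritance behavior in question can be sketched at the fd level with `os.set_inheritable`, the POSIX-style analogue of the Windows-only `os.set_handle_inheritable` used in the diff. The function name here is illustrative; the point is that a `dup()` made after the inheritability call starts out non-inheritable again, so the ordering the TODO asks about does matter.

```python
import os


def pipe_for_subprocess():
    """Sketch: only the end handed to the subprocess needs to be
    inheritable. A dup() of an inheritable descriptor is created
    non-inheritable, so the internal bookkeeping copy does not leak
    into children regardless of when it is made.
    """
    read_fd, write_fd = os.pipe()       # both non-inheritable by default
    os.set_inheritable(write_fd, True)  # the child writes into the pipe
    dup_fd = os.dup(write_fd)           # internal copy: non-inheritable
    return read_fd, write_fd, dup_fd
```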
raise KeyError(self.sys_attr)

self.saved_stream = getattr(sys, self.sys_attr)
self.saved_std_fd = self.saved_stream.fileno()
Do we need to track this explicitly? It seems to always be available via self.saved_stream.fileno()
If I'm right but you still think it should be tracked, I can think on that. I worry about situations where duplicate data is maintained with separate references though.
I think it's safer to track it explicitly yes. Basically it allows us to have a consistent idea of what fd we're using to replace whatever the std stream happens to be at redirection time without needing to consider the state of saved_stream.
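The tradeoff being discussed can be sketched as follows. This is an illustrative class, not Spack's actual implementation: snapshotting the fd at redirection time gives a stable restore target even if the saved stream is later closed or swapped out, in which case `saved_stream.fileno()` would raise or return something else.

```python
import sys


class StreamState:
    """Sketch: capture both the stream and its fd at redirection time.

    The explicit fd copy avoids re-deriving the descriptor from
    saved_stream later, when its state is no longer guaranteed.
    """

    def __init__(self, sys_attr: str):
        self.sys_attr = sys_attr
        self.saved_stream = getattr(sys, sys_attr)
        # Snapshot the fd now rather than calling fileno() at restore
        # time, when the stream may have been closed or replaced.
        self.saved_std_fd = self.saved_stream.fileno()
```

The duplicate-data worry raised above is real, but the duplication is a deliberate snapshot: it is only read at restore time and never mutated in between.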
# "print(echo)" above automatically uses an "\r\n" on Windows
# and will replace any \n with \r\n (so end=\n does not work)
# \r\n is expected and correct here
newline = "\r\n" if sys.platform == "win32" else "\n"
It looks like redirect_stream is setting newline="\n", so does that not have any effect?
It's been a little while but IIRC this is an artifact of pytest and capfd. If you look at the check above where we are able to use \n consistently in the file output, that is the effect of setting newline in redirect_stream.
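The distinction being made, that `newline="\n"` keeps the file output consistent while the console path still sees "\r\n" on Windows, comes from Python's newline translation rules, which can be demonstrated in isolation. The helper name below is illustrative.

```python
import os
import tempfile


def raw_bytes_written(text: str) -> bytes:
    """Write text through a stream opened with newline="\\n" (newline
    translation disabled) and return the raw bytes on disk. The result
    is identical on Windows and POSIX; with the default newline=None,
    Windows would translate "\\n" to b"\\r\\n" instead.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline="\n") as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

The console stream that `capfd` observes is not opened this way, which is why the test above still needs the platform-dependent `newline` expectation.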
| read.close() | ||
| log_writer.close() | ||
| stdout.flush() | ||
| stdout.close() |
I don't think you should have to flush before you close.
You are correct, removed.
lib/spack/spack/llnl/util/tty/log.py
Outdated
# the pipe is closed or otherwise inaccessible
return
norm_data = data.decode(encoding="utf-8", errors="replace")
# stdout.write(data.decode())
output = clean_line
if filter_fn:
    output = filter_fn(output)
enc = stdout.encoding
(minor) Can this check be done outside the loop (and e.g. define a lambda here)?
Might be worth printing a warning if the encoding is not utf-8
Also question: is there a reason you can't consume the bytes from the pipe and pass them directly through without detecting/matching encoding?
Windows loves UTF-16 (wide), so depending on what stdout is, it may be UTF-16, which is normal and expected, particularly if we're in nested logging contexts and stdout is a pipe.
Also question: is there a reason you can't consume the bytes from the pipe and pass them directly through without detecting/matching encoding?
The encoding we're reading may not match the encoding of the output; without this, output is often garbled/unreadable (or Mandarin characters; I hit those a lot without this).
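The decode/re-encode step described above can be sketched in isolation. The function name is illustrative; the key points are `errors="replace"` so malformed input cannot kill the reader thread, and letting the target stream re-encode the text in its own encoding, which may legitimately differ from UTF-8 on Windows.

```python
def forward_chunk(data: bytes, stdout) -> None:
    """Sketch: decode pipe bytes as UTF-8, then write the text through
    the target stream, which re-encodes it in *that* stream's encoding
    (possibly UTF-16 on Windows when stdout is itself a pipe, e.g. in
    nested logging contexts).
    """
    stdout.write(data.decode("utf-8", errors="replace"))
```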
scheibelp
left a comment
Not complete, but I have some preliminary requests/questions
Signed-off-by: John Parent <john.parent@kitware.com>
macOS GitHub CI tests have an issue with coverage not finding the "bin" module that does not occur with local testing on macOS, Linux, or Windows. Loosen the check so the extra error cruft from coverage does not cause the test to fail, while still checking what we need to check. Signed-off-by: John Parent <john.parent@kitware.com>
Redo how we handle IO redirection for logging on Windows
Previously we closed sys.stdout, replaced it with a new buffer pointing at the logging file, and then dup2'd to redirect stdout. The logfile was then read from a thread controlled by the main Spack thread, and that reader thread forwarded the logging content on to Spack's dup'd stdout.
This is deeply insufficient for a number of use cases:
It did not support filtering, unicode, control characters, or force_echo
All echo, including force_echo when supported, was racy, due to the disparate mechanisms by which reader-thread behavior was controlled vs. the context from which it obtained information. I.e. the reader thread could be killed or echo turned off, before any stdout buffers could be flushed (or some other out-of-sync ordering), and the reader thread wouldn't see the log output it was supposed to echo prior to turning echo off or terminating.
Nested logging and cached references to stdout and stderr were invalidated. Previously, on IO redirection, we were closing stdout and replacing it with our new buffer. On IO restoration, we restored a buffer pointing at console stdout to sys.stdout, but any cached references to the old stdout (i.e. nested logging contexts or pytest) were invalidated.
Now we:

* do not close whatever sys.stdout refers to; we merely store a reference to it
* do not construct a new buffer from Python's API; instead we perform redirection from the ground up, starting with fhs mapped to fds, and compose a buffer from the existing fh's fd
* use pipes for communication with our reader thread, which enables filtering, control characters, etc., and allows for non-racy echo behavior

At a glance:

* Redirects IO starting at the Windows file handle (fh) level
* Properly maps fhs to fds, and obtains the appropriate buffer wrapping each
* Enables proper subprocess IO redirection inheritance
* Manages shared file handles between stderr and stdout on redirect
* Properly redirects IO at each level of handling: kernel, C, and Python
* Ensures cached references to console streams are not invalidated
* Ensures correct order of release for buffers -> fhs
Moves toward a unified logging class by introducing fundamental support for the concepts leveraged by the non-Windows logger classes, like pipes.
Also enables logging tests on Windows.
Adds a test for nested logging context managers.