New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[automation/python - Expose structured logging #6527
Conversation
4e61b41
to
c8b413a
Compare
8a32049
to
dfad44f
Compare
dfad44f
to
dc39a48
Compare
bc44e40
to
e583c94
Compare
filepath = os.path.join(log_dir.name, "eventlog.txt") | ||
|
||
# Open and close the file to ensure it exists before we start polling for logs | ||
f = open(filepath, "w+") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, I think, another thing to think about what does it do when the file exists already? Guess that's not possible given TemporaryDirectory randomization.
There is another API achieves a similar effect potentially using different sys calls:
from pathlib import Path
Path('path/to/file.txt').touch()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess that's not possible given TemporaryDirectory randomization.
Yup, it's not possible. Thanks for the reference, good to know but I think I'll leave this as-is.
And even if it were possible, there's nothing wrong with opening and closing an existing file.
return filepath, log_dir | ||
|
||
|
||
def _watch_logs(filename: str, callback: OnEvent): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this a bit it seems to work pretty well. sleep() does not hold GIL so other threads can make progress. The f.readline()
keeps returning empty str even when it reached eof.
Something you might want to think about is what bad things happen when we run user callbacks on the logging thread. It does not seem to be the user's expectation. These callbacks cannot run simultaneously with user Py code because Py code always takes the GIL, but if they utilize sys calls concurrency is possible. I can't construct a confusing example but I sense there might be one here. Let me look up python threading modules to see what else we can do to push the events back to the parent thread and execute the callbacks there.
time.sleep(0.1) | ||
continue | ||
|
||
event = EngineEvent.from_json(json.loads(line)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing to think about is what happens when an exception gets thrown here, either we are encountering corrupt non-JSON garbage on the input, or the user callback throws an exception. Does this surface to the user in an intelligible way?
def _cleanup(temp_dir: Optional[tempfile.TemporaryDirectory], | ||
thread: Optional[threading.Thread], | ||
on_exit_fn: Optional[Callable[[], None]] = None) -> None: | ||
# If there's an on_exit function, execute it (used in preview/up to shut down server) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sequencing seems to matter here if any of these throw exception. Let me try thread.join and see if that throws.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thread.join()
can only throw if you try to join with the current thread or join before starting: source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah seems so. If there's an exception in the background thread, it gets nicely printed out "Excpetion in thread Thread-1" with a stack trace. So far so good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sounds like none of these finalizers will throw so we might be good here! The way to avoid having to reason/worry through this is nesting with
statements and from contextlib import contextmanager
- the with statements basically a condensed form of try..finally, that kind of code is easy to see at a glance that finalizers don't interfere and all try to run even if some fail.
event = EngineEvent.from_json(json.loads(line)) | ||
callback(event) | ||
|
||
# if this is the cancel event, stop watching logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really like to know more about this one, what kind of cancel event? I do not think I have matching logic in dotnet SDK, perhaps a miss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CancelEvent
is the last event transmitted by the engine as a signal that logging is over. I've used it here to signal that the thread can terminate, so when thread.join()
waits it doesn't always time out because otherwise there would be no end to this _watch_logs
function. We can also use an Event
object to signal completion though, if there is hesitation to go this route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// the update successfully completes.
export interface CancelEvent {
}
So it is not the above Cancel event? And nothing about stack.Cancel() right?
Yeah this is awesome that the engine signals completion of logging. I wanted something like his for .NET to avoid race conditions, but I did not realize there is an event we can count on to be the last event. Calling it "cancel event" seems a little misleading but it does seem like we need something like this.
If we use built-in concurrency primitives to try to cancel the bg thread when we think the work is done, we're always open to a subtle race condition where we guess wrong (we think the work is done but IO has not propagated all the logs yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed the same CancelEvent
that you referenced. It is emitted on stack.cancel()
but also whenever an update completes, in both the error and success case. It's not so much a "completion of logging" but a "completion of update" which I am equating to the same thing here. I admit I'm not 100% confident that this will always be the case, but we do have the timeout on thread.join()
to fall back on in any case (maybe that should be shorter?).
Calling it "cancel event" seems a little misleading but it does seem like we need something like this.
Heh, agreed 😄
I'm tempted to go ahead with this and we can make adjustments if we run into issues. All of our error tests complete successfully so I'm not too worried.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please, don't block on this. I will follow up to understand this better. Perhaps we can rename things later to make more sense.
OK update on threading issue. We can't really run callbacks from the parent thread in this design because of To get out of this situation would be to async-io-ify the SDK from _run_pulumi_cmd, in which case we could be polling the log file in AIO loop as well without threading. There seems to be support in 3.6 but it's gnarly: https://docs.python.org/3.6/library/asyncio-subprocess.html Def out of scope fo this PR. |
67da9f4
to
0c5f415
Compare
See also: #6436 (go), #6454 (nodejs), #6572 (dotnet).
Changes
on_event
option toStack.up()
,Stack.preview()
,Stack.refresh()
andStack.destroy()
for acting on structured json events.PreviewResult
interface to include achange_summary
property that summarizes changes to resources.Fixes: #6443