Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[automation/python - Expose structured logging #6527

Merged
merged 14 commits into from Apr 13, 2021

Conversation

komalali
Copy link
Member

@komalali komalali commented Mar 14, 2021

See also: #6436 (go), #6454 (nodejs), #6572 (dotnet).

Changes

  • Adds an on_event option to Stack.up(), Stack.preview(), Stack.refresh() and Stack.destroy() for acting on structured json events.
  • Updates the PreviewResult interface to include a change_summary property that summarizes changes to resources.

Fixes: #6443

@komalali komalali force-pushed the komalali/python-auto-events branch 2 times, most recently from 4e61b41 to c8b413a Compare March 16, 2021 22:43
@komalali komalali force-pushed the komalali/python-auto-events branch 2 times, most recently from 8a32049 to dfad44f Compare March 29, 2021 16:45
@komalali komalali force-pushed the komalali/python-auto-events branch from dfad44f to dc39a48 Compare March 29, 2021 17:17
@komalali komalali force-pushed the komalali/python-auto-events branch 2 times, most recently from bc44e40 to e583c94 Compare April 13, 2021 05:55
@komalali komalali marked this pull request as ready for review April 13, 2021 06:28
filepath = os.path.join(log_dir.name, "eventlog.txt")

# Open and close the file to ensure it exists before we start polling for logs
f = open(filepath, "w+")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, I think, another thing to think about what does it do when the file exists already? Guess that's not possible given TemporaryDirectory randomization.

There is another API achieves a similar effect potentially using different sys calls:

from pathlib import Path
Path('path/to/file.txt').touch()

Copy link
Member Author

@komalali komalali Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess that's not possible given TemporaryDirectory randomization.

Yup, it's not possible. Thanks for the reference, good to know but I think I'll leave this as-is.
And even if it were possible, there's nothing wrong with opening and closing an existing file.

return filepath, log_dir


def _watch_logs(filename: str, callback: OnEvent):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this a bit it seems to work pretty well. sleep() does not hold GIL so other threads can make progress. The f.readline() keeps returning empty str even when it reached eof.

Something you might want to think about is what bad things happen when we run user callbacks on the logging thread. It does not seem to be the user's expectation. These callbacks cannot run simultaneously with user Py code because Py code always takes the GIL, but if they utilize sys calls concurrency is possible. I can't construct a confusing example but I sense there might be one here. Let me look up python threading modules to see what else we can do to push the events back to the parent thread and execute the callbacks there.

time.sleep(0.1)
continue

event = EngineEvent.from_json(json.loads(line))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing to think about is what happens when an exception gets thrown here, either we are encountering corrupt non-JSON garbage on the input, or the user callback throws an exception. Does this surface to the user in an intelligible way?

def _cleanup(temp_dir: Optional[tempfile.TemporaryDirectory],
thread: Optional[threading.Thread],
on_exit_fn: Optional[Callable[[], None]] = None) -> None:
# If there's an on_exit function, execute it (used in preview/up to shut down server)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sequencing seems to matter here if any of these throw exception. Let me try thread.join and see if that throws.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thread.join() can only throw if you try to join with the current thread or join before starting: source

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah seems so. If there's an exception in the background thread, it gets nicely printed out "Excpetion in thread Thread-1" with a stack trace. So far so good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sounds like none of these finalizers will throw so we might be good here! The way to avoid having to reason/worry through this is nesting with statements and from contextlib import contextmanager - the with statements basically a condensed form of try..finally, that kind of code is easy to see at a glance that finalizers don't interfere and all try to run even if some fail.

event = EngineEvent.from_json(json.loads(line))
callback(event)

# if this is the cancel event, stop watching logs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to know more about this one, what kind of cancel event? I do not think I have matching logic in dotnet SDK, perhaps a miss.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CancelEvent is the last event transmitted by the engine as a signal that logging is over. I've used it here to signal that the thread can terminate, so when thread.join() waits it doesn't always time out because otherwise there would be no end to this _watch_logs function. We can also use an Event object to signal completion though, if there is hesitation to go this route.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// the update successfully completes.
export interface CancelEvent {
}

So it is not the above Cancel event? And nothing about stack.Cancel() right?

Yeah this is awesome that the engine signals completion of logging. I wanted something like his for .NET to avoid race conditions, but I did not realize there is an event we can count on to be the last event. Calling it "cancel event" seems a little misleading but it does seem like we need something like this.

If we use built-in concurrency primitives to try to cancel the bg thread when we think the work is done, we're always open to a subtle race condition where we guess wrong (we think the work is done but IO has not propagated all the logs yet).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed the same CancelEvent that you referenced. It is emitted on stack.cancel() but also whenever an update completes, in both the error and success case. It's not so much a "completion of logging" but a "completion of update" which I am equating to the same thing here. I admit I'm not 100% confident that this will always be the case, but we do have the timeout on thread.join() to fall back on in any case (maybe that should be shorter?).

Calling it "cancel event" seems a little misleading but it does seem like we need something like this.

Heh, agreed 😄

I'm tempted to go ahead with this and we can make adjustments if we run into issues. All of our error tests complete successfully so I'm not too worried.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please, don't block on this. I will follow up to understand this better. Perhaps we can rename things later to make more sense.

@t0yv0
Copy link
Member

t0yv0 commented Apr 13, 2021

OK update on threading issue. We can't really run callbacks from the parent thread in this design because of _run_pulumi_cmd. Our main thread is busy being hogged by that function while True: loop. So if we are to run user's callbacks concurrently we need another thread.

To get out of this situation would be to async-io-ify the SDK from _run_pulumi_cmd, in which case we could be polling the log file in AIO loop as well without threading. There seems to be support in 3.6 but it's gnarly:

https://docs.python.org/3.6/library/asyncio-subprocess.html

Def out of scope fo this PR.

@t0yv0 t0yv0 self-requested a review April 13, 2021 16:21
@komalali komalali force-pushed the komalali/python-auto-events branch from 67da9f4 to 0c5f415 Compare April 13, 2021 18:47
@komalali komalali merged commit 0c4e1a3 into master Apr 13, 2021
@komalali komalali deleted the komalali/python-auto-events branch April 13, 2021 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[automation/python] - Add structured output to preview/up/refresh/destroy
2 participants