Decouple persist and display events #15529

Merged
merged 6 commits into from
Mar 4, 2024

Conversation

PollRobots
Contributor

@PollRobots PollRobots commented Feb 28, 2024

Description

This removes a scenario where events could not be persisted to the cloud because they were waiting on the same event being displayed

This uses the same buffer size for both the display and persist channels [Removed to make PR a single change]

The primary change, however, is to stop rendering the tree every time a row is updated. Instead, the tree is rendered when the display actually happens, in the frame call. The renderer simply marks itself as dirty in the rowUpdated, tick, systemMessage and done methods and relies on the frame being redrawn on a 60Hz timer (the done method calls frame explicitly). This makes the rowUpdated call exceedingly cheap (it simply marks the treeRenderer as dirty), which allows the ProgressDisplay instance to service the display events faster and prevents it from blocking the persist events.
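For readers unfamiliar with the pattern, here is a minimal sketch of the dirty-flag approach described above. It is not the code from this PR: the `renderer` type, its fields and the `frames` counter are hypothetical names chosen for illustration, and the 60Hz ticker is replaced by explicit `frame` calls so the example stays deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// renderer is a hypothetical stand-in for the treeRenderer in this PR.
type renderer struct {
	mu     sync.Mutex
	dirty  bool
	frames int // number of frames actually drawn
}

// markDirty records that something changed. It does no rendering work.
func (r *renderer) markDirty() {
	r.mu.Lock()
	r.dirty = true
	r.mu.Unlock()
}

// rowUpdated is called once per display event, so it must stay cheap.
func (r *renderer) rowUpdated() { r.markDirty() }

// frame redraws only when the renderer is dirty and reports whether it drew.
// In production this would be driven by a ticker; here it is called directly.
func (r *renderer) frame() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.dirty {
		return false
	}
	r.dirty = false
	r.frames++ // the expensive tree render would happen here
	return true
}

func main() {
	r := &renderer{}
	for i := 0; i < 1000; i++ {
		r.rowUpdated() // 1000 updates, no rendering yet
	}
	r.frame() // draws once
	r.frame() // nothing changed, so this is skipped
	fmt.Println("frames drawn:", r.frames)
}
```

The point of the sketch is that any number of row updates between two frames collapses into a single render, so the cost of an update no longer depends on the size of the tree.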

This requires a minor refactor to ensure that the display object is available in the frame method

Because the treeRenderer is calling back into the ProgressDisplay object in a goroutine, the ProgressDisplay object needs to be thread safe, so a read-write mutex is added to protect the eventUrnToResourceRow map. The unused urnToID map was removed in passing.
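As an illustration of the locking described above, a read-write mutex around a map might look roughly like the following. This is a sketch under assumptions: `rowStore`, its methods and the string-valued map are hypothetical simplifications, whereas in the PR the map is the eventUrnToResourceRow field on ProgressDisplay.

```go
package main

import (
	"fmt"
	"sync"
)

// rowStore is a hypothetical, simplified holder for a URN-to-row map.
type rowStore struct {
	mu   sync.RWMutex
	rows map[string]string // URN -> row text (simplified for the sketch)
}

func newRowStore() *rowStore {
	return &rowStore{rows: map[string]string{}}
}

// set takes the write lock because it mutates the map.
func (s *rowStore) set(urn, row string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows[urn] = row
}

// get takes only the read lock, so concurrent readers do not block each other.
func (s *rowStore) get(urn string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	row, ok := s.rows[urn]
	return row, ok
}

// size reports the number of rows under the read lock.
func (s *rowStore) size() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.rows)
}

func main() {
	s := newRowStore()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			urn := fmt.Sprintf("urn:%d", i)
			s.set(urn, "row") // event-handling side writes
			s.get(urn)        // rendering side reads
		}(i)
	}
	wg.Wait()
	fmt.Println("rows:", s.size())
}
```

A read-write mutex suits this shape of access because the render goroutine only reads the map, while writes happen when events add rows.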

Impact

There are scenarios where the total time taken for an operation was dominated by servicing the events.

This reduces the time for a complex (~2000 resources) pulumi preview from 1m45s to 45s

For a pulumi up with -v=11 on the same stack, where all the register resource spans were completing in 1h6m and the postEngineEventBatch events were taking 3h45m, this PR removes the time impact of reporting the events (greatly inflated by the high verbosity setting) and the operation takes the anticipated 1h6m.

Fixes # (issue)

Checklist

  • I have run make tidy to update any new dependencies
  • I have run make lint to verify my code passes the lint check
    • I have formatted my code using gofumpt
  • I have added tests that prove my fix is effective or that my feature works
  • I have run make changelog and committed the changelog/pending/<file> documenting my change
  • Yes, there are changes in this PR that warrants bumping the Pulumi Cloud API version

@PollRobots PollRobots added the impact/no-changelog-required This issue doesn't require a CHANGELOG update label Feb 28, 2024
@PollRobots PollRobots requested a review from a team as a code owner February 28, 2024 07:50
@PollRobots PollRobots requested review from pgavlin and removed request for a team February 28, 2024 07:50
@pulumi-bot
Contributor

pulumi-bot commented Feb 28, 2024

Changelog

[uncommitted] (2024-03-04)

@PollRobots PollRobots requested review from a team and Frassle February 28, 2024 07:51
Collaborator

@tgummerer tgummerer left a comment

Is there any test coverage we can add here? E.g. from a quick look I couldn't find a test that makes sure that the display is updated correctly following a ticker event.

For another time it would also be nicer for the reviewer if this was split up in a couple of PRs. E.g. changing the buffer size seems to be completely independent from the other display changes, and having only one of these changes in my head at a time would make it a little bit easier at least for me to review (and maybe even the refactoring in a separate PR or commit from the other change) :)

pkg/backend/httpstate/state.go (outdated, resolved)
@PollRobots
Contributor Author

Is there any test coverage we can add here? E.g. from a quick look I couldn't find a test that makes sure that the display is updated correctly following a ticker event.

There are significant tests for the treeRenderer, but unfortunately they all use a subtly different code path to ensure that the output is deterministic. I'll take another look, testing would make me more comfortable too

For another time it would also be nicer for the reviewer if this was split up in a couple of PRs. E.g. changing the buffer size seems to be completely independent from the other display changes, and having only one of these changes in my head at a time would make it a little bit easier at least for me to review (and maybe even the refactoring in a separate PR or commit from the other change) :)

I'll pull that into a separate PR, you are quite correct

Collaborator

@tgummerer tgummerer left a comment

Thanks for adding the test and splitting the PR!

I had a couple of thoughts on the test and how we could potentially improve it for your consideration, but overall this looks good!

}

// Restart the ticker with the default delay
treeRenderer.ticker.Reset(time.Millisecond * 16)
Collaborator

I'm always a little worried when I see timing-dependent behaviour in tests (it's easy for it to get flaky). This looks like it's going to be fine, but instead of the actual waiting, I feel like just doing a manual Tick() for the ticker would give us a similarly good test here, especially since we're already decoupling the time we're seeing for the ticker in production and the test here. (It's the same interval, but nothing guarantees that the ticker interval in the treeRenderer always stays at 16ms.)

Contributor Author

Yeah, I hate using timers in tests also. Unfortunately, I don't believe that there is an easy way to mock the ticker or manually provoke a tick event.

Contributor Author

The alternative is to not use the timer at all, and just verify that nothing is rendered until frame is called (which is what handleEvent does when a tick happens)

Contributor Author

Ok, I changed it to not use the ticker. I'd rather not inflict a possible flake on someone down the line if I can avoid it

break
}
}

Collaborator

Can we assert the contents of the mock terminal here?

Contributor Author

Yeah, that's reasonable

Member

@pgavlin pgavlin left a comment

LGTM aside from Thomas's feedback 👍

Paul Roberts and others added 5 commits March 4, 2024 10:52
This removes a scenario where events could not be persisted to the cloud
because they were waiting on the same event being displayed

This uses the same buffer size for both the display and persist channels

The primary change, however, is to stop rendering the tree every time
a row is updated. Instead, the tree is rendered when the display
actually happens, in the `frame` call. The renderer simply marks
itself as dirty in the `rowUpdated`, `tick`, `systemMessage` and `done`
methods and relies on the frame being redrawn on a 60Hz timer (the
`done` method calls `frame` explicitly).

This requires a minor refactor to ensure that the display object is
available in the frame method

Because the treeRenderer is calling back into the ProgressDisplay object
in a goroutine, the ProgressDisplay object needs to be thread safe, so
a read-write mutex is added to protect the `eventUrnToResourceRow` map.
The unused `urnToID` map was removed in passing.
Co-authored-by: Thomas Gummerer <t.gummerer@gmail.com>
This exposed some odd edgecases in the treeRenderer
@PollRobots PollRobots force-pushed the proberts/decouple-persist-and-display-events branch from 6fe7fac to ebf4abc Compare March 4, 2024 18:53
@PollRobots PollRobots added this pull request to the merge queue Mar 4, 2024
Merged via the queue into master with commit ff7a7b4 Mar 4, 2024
48 checks passed
@PollRobots PollRobots deleted the proberts/decouple-persist-and-display-events branch March 4, 2024 21:24
Frassle added a commit that referenced this pull request Mar 15, 2024
Frassle added a commit that referenced this pull request Mar 15, 2024
github-merge-queue bot pushed a commit that referenced this pull request Mar 15, 2024

# Description


This reverts commit ff7a7b4.

Fixes #15668.
github-merge-queue bot pushed a commit that referenced this pull request Mar 18, 2024

# Description

Retry #15529 with fix for the issue that required the revert in #15705 

This removes a scenario where events could not be persisted to the cloud
because they were waiting on the same event being displayed

Instead of rendering the tree every time a row is updated, this
renders when the display actually happens, in the `frame` call. The
renderer simply marks itself as dirty in the `rowUpdated`,
`tick`, `systemMessage` and `done` methods and relies on the frame being
redrawn on a 60Hz timer (the `done` method calls `frame` explicitly).
This makes the `rowUpdated` call exceedingly cheap (it simply marks the
treeRenderer as dirty), which allows the ProgressDisplay instance to
service the display events faster and prevents it from blocking the
persist events.

This requires a minor refactor to ensure that the display object is
available in the frame method

Because the treeRenderer is calling back into the ProgressDisplay object
in a goroutine, the ProgressDisplay object needs to be thread safe, so a
read-write mutex is added to protect the `eventUrnToResourceRow` map.
The unused `urnToID` map was removed in passing.

## Impact

There are scenarios where the total time taken for an operation was
dominated by servicing the events.

This reduces the time for a complex (~2000 resources) `pulumi preview`
from 1m45s to 45s

For a `pulumi up` with `-v=11` on the same stack, where all the
register resource spans were completing in 1h6m and the
postEngineEventBatch events were taking 3h45m, this PR removes the time
impact of reporting the events (greatly inflated by the high verbosity
setting) and the operation takes the anticipated 1h6m.



Fixes #15668 

This was happening because the renderer was being marked dirty once per
second in a tick event, which caused frame to redraw. There is a check
in the render method that `display.headerRow` is not nil that was
previously used to prevent rendering when no events had been added. This
check is now part of the `markDirty` logic
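A rough sketch of what moving that check into the dirty-marking step could look like is shown below. Only `markDirty` and `headerRow` are names taken from the description above; the `row`, `progressDisplay` and `guardedRenderer` types are hypothetical, and the real implementation may differ.

```go
package main

import "fmt"

// row and progressDisplay are hypothetical, simplified types for this sketch.
type row struct{ text string }

type progressDisplay struct {
	headerRow *row // nil until the first event has been added
}

type guardedRenderer struct {
	display *progressDisplay
	dirty   bool
}

// markDirty refuses to mark the renderer dirty while there is nothing to
// draw, so a periodic tick cannot trigger a redraw of an empty display.
func (r *guardedRenderer) markDirty() {
	if r.display == nil || r.display.headerRow == nil {
		return
	}
	r.dirty = true
}

// tick models the once-per-second tick event mentioned above.
func (r *guardedRenderer) tick() { r.markDirty() }

func main() {
	r := &guardedRenderer{display: &progressDisplay{}}

	r.tick()
	fmt.Println("dirty before any event:", r.dirty)

	r.display.headerRow = &row{text: "Type  Name  Status"}
	r.tick()
	fmt.Println("dirty after header exists:", r.dirty)
}
```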

Some of the tests needed to be updated to make this work and have also
been refactored

## Checklist

- [X] I have run `make tidy` to update any new dependencies
- [X] I have run `make lint` to verify my code passes the lint check
  - [ ] I have formatted my code using `gofumpt`

- [X] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have run `make changelog` and committed the
`changelog/pending/<file>` documenting my change
- [ ] Yes, there are changes in this PR that warrants bumping the Pulumi
Cloud API version

---------

Co-authored-by: Paul Roberts <proberts@pulumi.com>
Labels
impact/no-changelog-required This issue doesn't require a CHANGELOG update
4 participants