PULUMI_OPTIMIZED_CHECKPOINT_PATCH temporarily allocates 100x the checkpoint memory #11653
Labels
area/backends: State storage (filestate/httpstate/etc.)
kind/bug: Some behavior is incorrect or out of spec
resolution/fixed: This issue was fixed
What happened?
An attempt to roll out PULUMI_OPTIMIZED_CHECKPOINT_PATCH as the default behavior resulted in a P0 (#11650): users hit out-of-memory (OOM) failures in their Pulumi programs.
Careful analysis of a repro case, which was submitted in private and cannot be fully shared here, has pinpointed the issue to the diff algorithm. Here are some excerpts from the pprof memory profile of a program that works on a 6MB uncompressed JSON checkpoint:
Essentially the root cause here is feeding too many newlines into the Myers algorithm. In the example the JSON document has 62641 newlines.
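To illustrate why the line count matters, here is a minimal sketch of a textbook greedy Myers forward pass that keeps a snapshot of its V array for every edit step, as is commonly done for backtracking. This is an assumption about the general shape of such implementations, not the diff library Pulumi uses, and `myersTrace` and `traceInts` are hypothetical names. Under that assumption each snapshot holds roughly 2*(N+M) integers, so the retained memory grows with the number of lines multiplied by the number of edits.

```go
package main

import "fmt"

// myersTrace runs the greedy Myers forward pass over two line slices and
// returns one snapshot of the V array per edit step d (hypothetical helper).
// The edit distance is len(trace)-1.
func myersTrace(a, b []string) [][]int {
	n, m := len(a), len(b)
	max := n + m
	offset := max + 1
	v := make([]int, 2*max+3) // indexed by diagonal k in [-max-1, max+1]
	var trace [][]int
	for d := 0; d <= max; d++ {
		// Snapshot V before processing step d; this is the costly part.
		trace = append(trace, append([]int(nil), v...))
		for k := -d; k <= d; k += 2 {
			var x int
			if k == -d || (k != d && v[offset+k-1] < v[offset+k+1]) {
				x = v[offset+k+1] // step down (insertion)
			} else {
				x = v[offset+k-1] + 1 // step right (deletion)
			}
			y := x - k
			for x < n && y < m && a[x] == b[y] { // follow the diagonal
				x++
				y++
			}
			v[offset+k] = x
			if x >= n && y >= m {
				return trace
			}
		}
	}
	return trace
}

// traceInts counts the integers retained across all snapshots.
func traceInts(trace [][]int) int {
	total := 0
	for _, snap := range trace {
		total += len(snap)
	}
	return total
}

func main() {
	const lines, changed = 2000, 40
	a := make([]string, lines)
	b := make([]string, lines)
	for i := range a {
		a[i] = fmt.Sprintf("line %d", i)
		b[i] = a[i]
	}
	for i := 0; i < changed; i++ {
		b[i*(lines/changed)] = fmt.Sprintf("changed %d", i)
	}
	trace := myersTrace(a, b)
	ints := traceInts(trace)
	fmt.Printf("edit distance %d, %d ints retained (~%d KiB at 8 bytes each)\n",
		len(trace)-1, ints, ints*8/1024)
}
```

With 62641 lines on each side, a single snapshot in this sketch would already be about 2MB at 8 bytes per integer, so even a few hundred edit steps would dwarf a 6MB checkpoint.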
Note that Pulumi has code that injects these newlines for PULUMI_OPTIMIZED_CHECKPOINT_PATCH specifically to enable this Myers algorithm to perform well in detecting diffs; therefore we have control over how many newlines to generate.
#11568 incidentally seems to fix this issue by reducing the newlines to one per resource.
A potential fix here is to extract only the parts of #11568 relevant to reducing the newline count.
Steps to reproduce
This is difficult to reproduce because Pulumi's built-in profiling performs a manual GC before dumping the memory profile, which reclaims the space. I reproduced it by writing a custom test case that stress-tests the lower-level functions and collects a heap profile, without a manual GC, when a memory threshold is crossed.
Expected Behavior
Pulumi uses at most 2-3x the size of the checkpoint in memory.
Actual Behavior
Pulumi temporarily allocates up to 100x the size of the checkpoint in memory.
Output of pulumi about
No response
Additional context
No response
Contributing
Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).