Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PULUMI_OPTIMIZED_CHECKPOINT_PATCH temporarily allocates 100x the checkpoint memory #11653

Closed
t0yv0 opened this issue Dec 14, 2022 · 0 comments · Fixed by #11666
Closed

PULUMI_OPTIMIZED_CHECKPOINT_PATCH temporarily allocates 100x the checkpoint memory #11653

t0yv0 opened this issue Dec 14, 2022 · 0 comments · Fixed by #11666
Assignees
Labels
area/backends State storage (filestate/httpstate/etc.) kind/bug Some behavior is incorrect or out of spec resolution/fixed This issue was fixed
Milestone

Comments

@t0yv0
Copy link
Member

t0yv0 commented Dec 14, 2022

What happened?

An attempt to roll out PULUMI_OPTIMIZED_CHECKPOINT_PATCH as default behavior resulted in a P0 indicating that users are experiencing OOM issues with their Pulumi programs #11650

A careful analysis of a repro case that was submitted in private and cannot be shared here fully has pinpointed the issue to the diff algorithm. Here are some excerpts of the pprof memory profiler on a program that works on a 6MB uncompressed JSON checkpoint:

github.com/hexops/gotextdiff/myers.shortestEditSequence
/Users/t0yv0/go/pkg/mod/github.com/hexops/gotextdiff@v1.0.3/myers/diff.go

  Total:      3.43GB     3.43GB (flat, cum) 98.88%
    148            .          .           } 
    149            .          .            
    150            .          .           // shortestEditSequence returns the shortest edit sequence that converts a into b. 
    151            .          .           func shortestEditSequence(a, b []string) ([][]int, int) { 
    152            .          .           	M, N := len(a), len(b) 
    153       1.96MB     1.96MB           	V := make([]int, 2*(N+M)+1) 
    154            .          .           	offset := N + M 
    155       2.88MB     2.88MB           	trace := make([][]int, N+M+1) 
    156            .          .            
    157            .          .           	// Iterate through the maximum possible length of the SES (N+M). 
    158            .          .           	for d := 0; d <= N+M; d++ { 
    159       3.42GB     3.42GB           		copyV := make([]int, len(V)) 
    160            .          .           		// k lines are represented by the equation y = x - k. We move in 
    161            .          .           		// increments of 2 because end points for even d are on even k lines. 
    162            .          .           		for k := -d; k <= d; k += 2 { 
    163            .          .           			// At each point, we either go down or to the right. We go down if 
    164            .          .           			// k == -d, and we go to the right if k == d. We also prioritize 

github.com/hexops/gotextdiff/myers.operations
/Users/t0yv0/go/pkg/mod/github.com/hexops/gotextdiff@v1.0.3/myers/diff.go

  Total:           0     3.43GB (flat, cum) 98.88%
     47            .          .           func operations(a, b []string) []*operation { 
     48            .          .           	if len(a) == 0 && len(b) == 0 { 
     49            .          .           		return nil 
     50            .          .           	} 
     51            .          .            
     52            .     3.43GB           	trace, offset := shortestEditSequence(a, b) 
     53            .          .           	snakes := backtrack(trace, len(a), len(b), offset) 
     54            .          .            
     55            .          .           	M, N := len(a), len(b) 
     56            .          .            
     57            .          .           	var i int 
     58            .          .           	solution := make([]*operation, len(a)+len(b)) 
     59            .          .            
     60            .          .           	add := func(op *operation, i2, j2 int) { 
     61            .          .           		if op == nil { 
     62            .          .           			return 
     63            .          .           		} 

github.com/hexops/gotextdiff/myers.ComputeEdits
/Users/t0yv0/go/pkg/mod/github.com/hexops/gotextdiff@v1.0.3/myers/diff.go

  Total:           0     3.43GB (flat, cum) 98.95%
     15            .          .           // Sources: 
     16            .          .           // https://blog.jcoglan.com/2017/02/17/the-myers-diff-algorithm-part-3/ 
     17            .          .           // https://www.codeproject.com/Articles/42279/%2FArticles%2F42279%2FInvestigating-Myers-diff-algorithm-Part-1-of-2 
     18            .          .            
     19            .          .           func ComputeEdits(uri span.URI, before, after string) []diff.TextEdit { 
     20            .     3.43GB           	ops := operations(splitLines(before), splitLines(after)) 
     21            .          .           	edits := make([]diff.TextEdit, 0, len(ops)) 
     22            .          .           	for _, op := range ops { 
     23            .          .           		s := span.New(uri, span.NewPoint(op.I1+1, 1, 0), span.NewPoint(op.I2+1, 1, 0)) 
     24            .          .           		switch op.Kind { 
     25            .          .           		case diff.Delete: 

Essentially the root cause here is feeding too many newlines into the Myers algorithm. In the example the JSON document has 62641 newlines.

Note that Pulumi has code that injects these newlines for PULUMI_OPTIMIZED_CHECKPOINT_PATCH specifically to enable this Myers algorithm to perform well in detecting diffs; therefore we have control over how many newlines to generate.

#11568 incidentally seems to fix this issue by reducing the newlines to one-per-resource.

A potential fix here is to extract only the parts of 11568 relevant to reducing newline counts.

Steps to reproduce

This is difficult to reproduce since built-in Pulumi profiling capability performs a manual GC before dumping the memory profile, which reclaims the space. I've reproduced by writing a custom test case to stress-test the lower level functions, and collect a heap profile without manual GC when a threshold is crossed.

Expected Behavior

Pulumi uses memory 2-3x the size of the checkpoint at most.

Actual Behavior

Pulumi uses memory 100x the size of the checkpoint at most.

Output of pulumi about

No response

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@t0yv0 t0yv0 added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Dec 14, 2022
@t0yv0 t0yv0 self-assigned this Dec 14, 2022
@t0yv0 t0yv0 added area/backends State storage (filestate/httpstate/etc.) and removed needs-triage Needs attention from the triage team labels Dec 14, 2022
@t0yv0 t0yv0 added this to the 0.82 milestone Dec 14, 2022
@bors bors bot closed this as completed in ad74c80 Dec 15, 2022
@pulumi-bot pulumi-bot added the resolution/fixed This issue was fixed label Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backends State storage (filestate/httpstate/etc.) kind/bug Some behavior is incorrect or out of spec resolution/fixed This issue was fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants