perf: Use a faster check for empty schemas #10444

AaronFriel · 2022-08-18T23:49:04Z

Experimentally, this change can reduce the time spent loading a schema by 1-3 seconds. For a Pulumi YAML program execution or a call to pulumi convert, which takes on the order of 5-10s in total, that is a significant performance improvement.

Before this change, the benchmark took 1.1 to 2.5 seconds on a relatively recent Intel laptop processor:

Before:

goos: linux
goarch: amd64
pkg: github.com/pulumi/pulumi/pkg/v3/codegen/schema
cpu: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
BenchmarkSchemaEmptyCheck
BenchmarkSchemaEmptyCheck/large-schema-empty-check-time
BenchmarkSchemaEmptyCheck/large-schema-empty-check-time-8
       1	1734625270 ns/op	    6768 B/op	      22 allocs/op
PASS

(That's 1.7 seconds/operation, sample size of 1.)

After:

BenchmarkSchemaEmptyCheck/large-schema-empty-check-time-8
        1000000000	         0.0000005 ns/op	       0 B/op	       0 allocs/op

It's now on the order of a few instructions, apparently too fast for Go's benchmarking to measure (the real cost is certainly not that far below a nanosecond), because for most non-empty schemas we return "false" after reading only the first few bytes of the file.
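The early-exit behavior described above can be sketched roughly as follows. This is a minimal sketch, not the code from this PR: isEmptySchema and the set of "empty" documents ({}, null, and whitespace-only input) are hypothetical, since the PR text doesn't list the defaults providers emit. bytes.TrimSpace only walks the leading and trailing whitespace, and the equality check compares lengths before contents, so a large non-empty schema is rejected after inspecting only a handful of bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

// emptySchemaDocs is a hypothetical list of "empty" schema documents. The
// defaults that providers actually emit are not listed in the PR text.
var emptySchemaDocs = [][]byte{
	[]byte("{}"),
	[]byte("null"),
	[]byte(""), // whitespace-only input trims down to this
}

// isEmptySchema reports whether data, ignoring surrounding whitespace, is
// exactly one of emptySchemaDocs. bytes.TrimSpace stops at the first
// non-whitespace byte on each end, and the equality check compares lengths
// first, so a large non-empty schema is rejected almost immediately.
func isEmptySchema(data []byte) bool {
	trimmed := bytes.TrimSpace(data)
	for _, doc := range emptySchemaDocs {
		if bytes.Equal(trimmed, doc) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isEmptySchema([]byte("  {}\n")))                           // true
	fmt.Println(isEmptySchema([]byte(`{"name": "example", "types": {}}`))) // false
}
```

Matching a short, fixed list of known defaults, rather than pattern-matching the whole file, is what keeps the cost independent of schema size in this sketch.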

Addendum: out of curiosity, I changed the regexp to begin with a ^ anchor, and that alone vastly improved performance. Still, the new approach makes it clear that we're willing to try to parse almost any document, just not the defaults that some providers have used.

BenchmarkSchemaEmptyCheck/large-schema-empty-check-time-8
        1000000000	         0.0000130 ns/op	       0 B/op	       0 allocs/op
PASS

Again, the time per operation is implausibly small: a clock cycle on this computer is on the order of 0.25ns.
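One possible explanation for sub-cycle figures, not confirmed by the PR text (which doesn't show the benchmark body): Go's testing package reports ns/op as total elapsed time divided by b.N, and keeps raising b.N (up to 1e9) while a run finishes in less than the benchmark time, 1s by default. If the body performs the check once instead of b.N times, ns/op shrinks as b.N grows. If so, 0.0000005 ns/op over 1000000000 iterations would mean the whole timed run took about 500 ns. The sketch below contrasts the two shapes using testing.Benchmark, which runs a benchmark outside go test; isEmptySchema and the synthetic 1 MiB document are hypothetical stand-ins.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"testing"
)

// isEmptySchema is a hypothetical stand-in for the check being measured.
func isEmptySchema(data []byte) bool {
	return bytes.Equal(bytes.TrimSpace(data), []byte("{}"))
}

// largeSchema is a synthetic ~1 MiB non-empty document.
var largeSchema = []byte(`{"description": "` + strings.Repeat("x", 1<<20) + `"}`)

// sink stores results so the compiler cannot discard the calls.
var sink bool

// benchOnce ignores b.N and calls the check a single time. The framework
// still divides elapsed time by b.N, which it raises toward 1e9 because each
// run finishes so quickly, so ns/op ends up far below one clock cycle.
func benchOnce(b *testing.B) {
	sink = isEmptySchema(largeSchema)
}

// benchLoop calls the check b.N times, so ns/op is the cost of one call.
func benchLoop(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = isEmptySchema(largeSchema)
	}
}

func main() {
	// testing.Init registers the testing flags; it is needed when calling
	// testing.Benchmark outside of "go test".
	testing.Init()
	fmt.Println("ignores b.N:", testing.Benchmark(benchOnce))
	fmt.Println("loops b.N:  ", testing.Benchmark(benchLoop))
}
```

In a regular _test.go file the same rule applies: the body of a BenchmarkXxx function should run the measured operation b.N times.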

pulumi-bot · 2022-08-18T23:49:24Z