schemas: Break down schema generation #1071

radeksimko · 2022-09-13T12:13:24Z

Closes #990

This addresses a few different problems, as originally outlined in the linked issue.

Runtime Memory Usage

As tested on macOS (M1 Pro)

Previously (v0.29.2) empty config: ~572 MB
Previously (v0.29.2) config with hashicorp/random: ~572 MB
Previously (v0.29.2) config with hashicorp/aws: ~572 MB

After (this PR) empty config: ~9.8 MB
After (this PR) config with hashicorp/random: ~12 MB
After (this PR) config with hashicorp/aws: ~70 MB

Launch Time

Previously (v0.29.2) initialize request/response time: ~2 s
After (this PR) initialize request/response time: 1-3 ms

This is a result of lazily loading each provider only when it's necessary (when we encounter a relevant provider requirement). The cost of "lazy-loading" a provider at runtime can be anywhere between <1ms (e.g. hashicorp/random) and ~130ms (e.g. hashicorp/aws), which should still be fast enough for the user not to notice a difference.
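To illustrate the lazy-loading idea, here is a minimal sketch, not the code from this PR. The `lazySchema` and `schemaStore` types, the `SchemaFor` method, and the in-memory "sources" map are all hypothetical names invented for the example; a real implementation would presumably read an embedded file instead of a string.

```go
package main

import (
	"fmt"
	"sync"
)

// lazySchema loads one provider schema at most once, on first use.
// (Hypothetical type, for illustration only.)
type lazySchema struct {
	once sync.Once
	load func() ([]byte, error) // stand-in for reading an embedded schema file
	data []byte
	err  error
}

// Get triggers the load on the first call and returns the cached result afterwards.
func (l *lazySchema) Get() ([]byte, error) {
	l.once.Do(func() {
		l.data, l.err = l.load()
	})
	return l.data, l.err
}

// schemaStore maps provider addresses to lazily loaded schemas.
type schemaStore struct {
	mu      sync.Mutex
	schemas map[string]*lazySchema
	loads   int // number of loader invocations, tracked only for the demo
}

func newSchemaStore(sources map[string]string) *schemaStore {
	s := &schemaStore{schemas: make(map[string]*lazySchema)}
	for addr, raw := range sources {
		raw := raw // capture per-iteration value
		s.schemas[addr] = &lazySchema{load: func() ([]byte, error) {
			s.mu.Lock()
			s.loads++
			s.mu.Unlock()
			return []byte(raw), nil
		}}
	}
	return s
}

// SchemaFor would be called when a provider requirement is encountered.
func (s *schemaStore) SchemaFor(addr string) ([]byte, error) {
	s.mu.Lock()
	ls, ok := s.schemas[addr]
	s.mu.Unlock()
	if !ok {
		return nil, fmt.Errorf("no embedded schema for %q", addr)
	}
	return ls.Get()
}

func (s *schemaStore) loadCount() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.loads
}

func main() {
	store := newSchemaStore(map[string]string{
		"hashicorp/random": `{"provider":"random"}`,
		"hashicorp/aws":    `{"provider":"aws"}`,
	})
	// Nothing is loaded at startup; repeated requests load the schema once.
	for i := 0; i < 3; i++ {
		if _, err := store.SchemaFor("hashicorp/random"); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("loader calls:", store.loadCount())
}
```

Under this design, startup cost is independent of how many schemas are available, and providers that are never required are never loaded.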

Schema Generation

Time to generate schemas

As measured in the GitHub Actions CI (Ubuntu latest)

Previously (recent CI run) 11-12 minutes
After (this PR) 3 minutes

This is mostly a result of running init for each provider individually and doing so in parallel. Terraform itself does not parallelise provider installation as part of terraform init.
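The per-provider, parallel approach can be sketched roughly as follows. This is an illustration under assumptions rather than the generator's actual code: `initAll`, `result`, and the `initFn` callback are hypothetical, and the callback stands in for running init in a directory whose config requires a single provider.

```go
package main

import (
	"fmt"
	"sync"
)

// result records the outcome of initialising one provider.
type result struct {
	Provider string
	Err      error
}

// initAll runs initFn once per provider, with at most `workers`
// invocations in flight at any time.
func initAll(providers []string, workers int, initFn func(provider string) error) []result {
	if workers < 1 {
		workers = 1
	}
	results := make([]result, len(providers))
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup
	for i, p := range providers {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			results[i] = result{Provider: p, Err: initFn(p)}
		}(i, p)
	}
	wg.Wait()
	return results
}

func main() {
	providers := []string{"hashicorp/random", "hashicorp/aws", "hashicorp/null"}
	results := initAll(providers, 2, func(p string) error {
		// A real generator would run init for this one provider here.
		fmt.Println("initialising", p)
		return nil
	})
	for _, r := range results {
		fmt.Printf("%s: err=%v\n", r.Provider, r.Err)
	}
}
```

Bounding the number of workers matters in practice: too much parallelism can starve network, CPU, or memory and cause avoidable failures.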

Platform-Specific Provider Ignorelist

The schema generation, which runs as part of CI for every release and PR and can also run locally, is subject to OS/arch requirements. Previously we maintained a long list of providers which were known to be unavailable for certain platforms, just to make generation work on these platforms. Such a list can become outdated the moment it is committed, as the Registry is the only source of truth (i.e. a provider may release compatible artifacts the day after we put it on the ignorelist).

This PR addresses the problem by breaking down the generation, such that we don't run init for a single giant config with all providers, but we first poke the Registry API and filter out any providers which we know are not available for the platform where we're running.
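A rough sketch of the platform filter is shown below. It assumes a JSON payload shaped like a provider versions listing, with a `versions` array whose entries carry a `platforms` list of `os`/`arch` pairs; that shape, the sample payloads, and the `supports` helper are assumptions made for this example, and the sketch accepts a provider if any listed version supports the platform, which may differ from what the PR does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
	"sort"
)

// Assumed response shape, for illustration only.
type platform struct {
	OS   string `json:"os"`
	Arch string `json:"arch"`
}

type providerVersion struct {
	Version   string     `json:"version"`
	Platforms []platform `json:"platforms"`
}

type versionsResponse struct {
	Versions []providerVersion `json:"versions"`
}

// supports reports whether any version in the response lists the given os/arch.
func supports(body []byte, os, arch string) (bool, error) {
	var resp versionsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, err
	}
	for _, v := range resp.Versions {
		for _, p := range v.Platforms {
			if p.OS == os && p.Arch == arch {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	// Made-up payloads standing in for API responses.
	responses := map[string][]byte{
		"example/everywhere": []byte(`{"versions":[{"version":"1.0.0","platforms":[
			{"os":"linux","arch":"amd64"},{"os":"darwin","arch":"arm64"},{"os":"windows","arch":"amd64"}]}]}`),
		"example/linux-only": []byte(`{"versions":[{"version":"0.1.0","platforms":[
			{"os":"linux","arch":"amd64"}]}]}`),
	}
	var available []string
	for name, body := range responses {
		ok, err := supports(body, runtime.GOOS, runtime.GOARCH)
		if err != nil {
			fmt.Println("skipping", name, "due to error:", err)
			continue
		}
		if ok {
			available = append(available, name)
		}
	}
	sort.Strings(available)
	fmt.Printf("providers available for %s/%s: %v\n", runtime.GOOS, runtime.GOARCH, available)
}
```

Because the filter consults live data, no hand-maintained ignorelist is needed.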

Broken Provider Installations

In order to obtain the schema for each provider, we first have to install it. The process of installation can break at any time for a few reasons, of which the most common ones are:

Provider maintainers manually yank or change artifacts, or break the release process in some other way
GitHub outage
Issues on the network path between GitHub Actions environment and GitHub Releases hosting the artifacts
Terraform Registry outage

This PR addresses most of these problems by retrying init for each provider individually (5 times, with a 2-second backoff in between).
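The retry behaviour described above could look roughly like this. It is a sketch with a fixed backoff, not the PR's implementation; `retry` and `errFlaky` are hypothetical names, and the demo uses a short backoff rather than the 2 seconds mentioned above so that it finishes quickly.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errFlaky simulates a transient installation failure.
var errFlaky = errors.New("simulated install failure")

// retry calls fn up to maxAttempts times, sleeping for backoff between
// failed attempts. It returns nil on the first success, or the last
// error wrapped with the attempt count.
func retry(maxAttempts int, backoff time.Duration, fn func(attempt int) error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(attempt); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(backoff) // no sleep after the final attempt
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	err := retry(5, 10*time.Millisecond, func(attempt int) error {
		fmt.Println("attempt", attempt)
		if attempt < 3 {
			return errFlaky // fail the first two attempts
		}
		return nil
	})
	fmt.Println("final error:", err)
}
```

Note the loop bounds: the attempt counter starts at 1 and is compared with `<=`, which is an easy place to introduce an off-by-one error.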

Additionally, we now treat even the final failures (after 5 unsuccessful attempts) as soft failures: the schema for that provider is simply not included, while the CI still succeeds and schema embedding still works. There is a downside to this: a widespread temporary outage could cause us to skip all providers at release time and release an LS which has no embedded schemas. This could be mitigated by careful observation and a potential follow-up release once the outage is resolved.

We could also add some checks to verify that e.g. 80-90% of providers were installed correctly, but I'd prefer to leave this for another PR.
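For reference, such a check could be as small as the sketch below. It is only an illustration of the idea, which this PR deliberately leaves out; `checkCoverage` and the 0.8 threshold in the demo are hypothetical.

```go
package main

import "fmt"

// checkCoverage returns an error when the share of successfully embedded
// provider schemas falls below minRatio (e.g. 0.8 for 80%).
func checkCoverage(total, embedded int, minRatio float64) error {
	if total == 0 {
		return fmt.Errorf("no providers were attempted")
	}
	ratio := float64(embedded) / float64(total)
	if ratio < minRatio {
		return fmt.Errorf("only %d of %d providers embedded (%.0f%%), want at least %.0f%%",
			embedded, total, ratio*100, minRatio*100)
	}
	return nil
}

func main() {
	// Hypothetical counts: most providers installed fine.
	fmt.Println("healthy run:", checkCoverage(230, 200, 0.8))
	// Hypothetical counts: a widespread outage skipped nearly everything.
	fmt.Println("outage run: ", checkCoverage(230, 3, 0.8))
}
```

A check like this would turn the "release with no embedded schemas" scenario into a hard failure while still tolerating a handful of broken providers.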

Reducing how many providers are downloaded in parallel avoids most retries of init, which would otherwise be caused by capacity (network, CPU, memory) starvation.

vfsgen used GZIP compression by default, but we abandoned it in favour of the stdlib embed package in #1070. This alone resulted in an 80M binary (compared to 18M before). Breaking down schemas into individual files would further increase the binary size to 160M. In either case this is well over the limit of what we can pack into a VSIX (VS Code extension), which is currently 30MB. Using gzip compression can bring the size down to 19M. The sizes mentioned reflect darwin/arm64, but differences on other platforms should be similar. Crucially, the compression doesn't seem to affect the time to load the (now compressed) file into memory, which remains between <1ms and 130ms.
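As a rough illustration of the compress-at-generation, decompress-at-load approach, the sketch below round-trips a made-up schema payload through the standard library's `compress/gzip`. The `compress` and `decompress` helpers and the sample payload are hypothetical; in the real setup the compressed bytes would presumably come from an embedded file.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compress gzips data, as a generator might do before writing a schema file.
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes remaining data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress reverses compress, as a loader might do at runtime.
func decompress(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// Made-up, highly repetitive JSON standing in for a provider schema.
	raw := bytes.Repeat([]byte(`{"type":"string","optional":true,"description":"example"},`), 1000)
	packed, err := compress(raw)
	if err != nil {
		fmt.Println("compress error:", err)
		return
	}
	unpacked, err := decompress(packed)
	if err != nil {
		fmt.Println("decompress error:", err)
		return
	}
	fmt.Printf("raw=%d bytes, gzipped=%d bytes, round trip ok=%v\n",
		len(raw), len(packed), bytes.Equal(raw, unpacked))
}
```

Schema JSON tends to be repetitive, which is why gzip can recover most of the size increase.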

dbanck

Nice work! Just a minor thing in the retry logic.

Displaying some kind of summary after running go generate ... would be nice, but we can add something later.

Tier     | Count |  Embedded | Errors |
---------|-------|-----------|--------|
official |    34 | 34 (100%) |      0 |
partner  |   230 | 200 (87%) |     30 |
---------|-------|-----------|--------|

internal/schemas/gen/gen.go

github-actions · 2022-11-04T03:31:28Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

radeksimko added the enhancement label Sep 13, 2022

radeksimko self-assigned this Sep 13, 2022

radeksimko force-pushed the f-schemagen-breakdown branch 11 times, most recently from 92436e0 to 1809cbf, September 22, 2022 13:13

radeksimko force-pushed the f-schemagen-breakdown branch 7 times, most recently from 845f298 to c72cc2e, September 26, 2022 12:52

radeksimko added 2 commits September 26, 2022 15:19

registry: extract provider related logic & add tests

82d7bec

registry: move module logic into its own files

cf72b2b

radeksimko force-pushed the f-schemagen-breakdown branch 8 times, most recently from a987815 to 22f1564, September 26, 2022 17:08

radeksimko force-pushed the f-schemagen-breakdown branch 11 times, most recently from ba9e389 to c0a1d67, October 3, 2022 13:33

radeksimko added 4 commits October 3, 2022 16:56

schemas: Use absolute paths for safety/clarity

014bfa8

schemas: Account for tfexec.SetEnv() behaviour

afe08e4

schemas: Reduce how many providers to download in parallel

800a582

This avoids the retries of init in most cases which would be caused by capacity (network, CPU, memory) starvation

schemas: Account for misbehaving providers (retry)

80a4916

radeksimko force-pushed the f-schemagen-breakdown branch from c0a1d67 to 80a4916, October 3, 2022 15:56

radeksimko added 2 commits October 3, 2022 21:39

schemas: work around some odd Windows ENV variables

a2b6811

radeksimko force-pushed the f-schemagen-breakdown branch from 2037051 to 48f92ca, October 4, 2022 13:10

ci: add 'du' to report size of all schemas

3c75a79

dbanck approved these changes Oct 4, 2022

radeksimko added 2 commits October 4, 2022 14:32

terraform/module: fix tests

3596c1c

schemas: fix off-by-one mistake

9c9940a

radeksimko merged commit ac22be0 into main Oct 4, 2022

radeksimko deleted the f-schemagen-breakdown branch October 4, 2022 15:00

This was referenced Oct 10, 2022

Failing walker tests when generated schemas are in place #1085

Closed

terraform-ls.exe uses ~500 MB of RAM for each vscode window #986

Closed

radeksimko mentioned this pull request Oct 20, 2022

Provide opt out from preloaded schemas #506

Closed

github-actions bot locked as resolved and limited conversation to collaborators Nov 4, 2022
