Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow configuration of gRPC maximum message sizes #106

Closed
alexsomesan opened this issue Sep 14, 2021 · 6 comments · Fixed by #139
Closed

Allow configuration of gRPC maximum message sizes #106

alexsomesan opened this issue Sep 14, 2021 · 6 comments · Fixed by #139
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@alexsomesan
Copy link
Member

terraform-plugin-go version

v0.3.0

Use cases

The Kubernetes manifest provider has to deal with some potentially huge resource configurations. Particularly, the CRD objects of the Strimzi project (Apache Kafka on K8s) are as big as 16K (10^3) lines of HCL.

Because of this, it can run into the gRPC message limits quite easily. This is most likely to happen in the UpgradeResourceState call first, because the previous state payload is transported in raw JSON for this call which is more verbose than the msgpack payload of other protocol calls.

An instance of this happening would look like this to the user:

│ Error: Plugin error
│
│   with kubernetes_manifest.customresourcedefinition_kafkas_kafka_strimzi_io,
│   on crds.tf line 1, in resource "kubernetes_manifest" "customresourcedefinition_kafkas_kafka_strimzi_io":
│    1: resource "kubernetes_manifest" "customresourcedefinition_kafkas_kafka_strimzi_io" {
│
│ The plugin returned an unexpected error from plugin.(*GRPCProvider).UpgradeResourceState: rpc error: code = ResourceExhausted desc = grpc: received message larger than max
│ (4528743 vs. 4194304)

Attempted solutions

I have successfully worked around this issue by maxxing out the message size limits on the gRPC server that terraform-plugin-go instantiates. However, this would require us to consume terraform-plugin-go from a fork and constantly rebase on new releases. This would not be ideal.

The changes I had to make are available here: alexsomesan@6f9312a

Proposal

We would prefer for this mechanism be exposed to consumers of terraform-plugin-go so that we can configure practical limits as they are needed.

References

@alexsomesan alexsomesan added the enhancement New feature or request label Sep 14, 2021
@paddycarver
Copy link
Contributor

I think if we're going to bump gRPC request size limits, @hashicorp/terraform-core should at the very least be involved in the discussion, because we may have some unintended consequences for that.

My other concern is if we do this, it will impact ~every provider that upgrades the SDK version, because this will change SDKv2, too.

References:

@jbardin
Copy link
Member

jbardin commented Sep 15, 2021

This might be an OK solution for now, though if providers start regularly passing very large values around, we are definitely going to run into other issues. A new remote storage interface will definitely need to be streaming, but there's no chance of changing the provider interface in the near future. Since our grpc servers are only exposed locally, they are not protecting themselves from anything other than the terraform client, so there is no real attack vector to exploit this change.

My initial concern is that removes some of the protection we have from users putting unreasonably large values into their config. Granted the UX there is not very good, because they get a panic rather than an error indicating that a 16MB user_data file is really not a good idea.

@apparentlymart
Copy link
Contributor

One specific thing I think we'd need to contend with is how this would interact with older Terraform versions that don't offer this capability. We might perhaps need some kinda of reverse-capbility-detection handshake where Terraform Core announces to the provider that it supports this feature and the provider can then refuse to configure with an error message if it wants to require a Terraform Core version which supports this feature.

While we can fiddle with the specific limits here, this also seems like a prompt to reconsider some of our architecture assumptions. In particular, we generally assume that individual resource instance objects are relatively small and that assumption plays out in various different ways, including but not limited to:

  • The provider protocol, as we're discussing here.
  • The algorithms we use for processing data in memory for the Terraform language runtime, where we've previously seen noticeably poor performance for unusually large objects.
  • The way we serialize everything all together in one giant JSON blob for state storage and then expect to be able to stash that resulting blob in typical blob stores, which each have their own practical size limits.

This is not to say that we can't do something more tactical in the meantime to address an immediate problem, but I share @jbardin's concerns that raising this fixed limit is likely to just expose other limits, whether they also be fixed limits (like the state backend blob size limits) or effective limits (noticeably slow performance processing the resulting objects). It seems like we owe ourselves a discussion about whether our current design assumptions are valid and, if not, what new design assumptions are valid. (I don't think there's any world in which we can have no limits at all, because we are constrained by various practical concerns, but we could choose limits that better match the problems we're intending to solve.)

@jbardin
Copy link
Member

jbardin commented Sep 15, 2021

Unlike the (experimental) call size option we use for the GetSchema call, this is only a server-side option, so older versions of Terraform would be effected just the same as current versions. As Alex has shown, this change can technically be done successfully without Terraform's involvement at all, though all the other points still stand.

@Eric-Jckson
Copy link

I'm getting a similar error when trying to use the Terraform Provider Panos.

│ The plugin returned an unexpected error from │ plugin.(*GRPCProvider).PlanResourceChange: rpc error: code = │ ResourceExhausted desc = grpc: received message larger than max (4460288 │ vs. 4194304)

The provider allows us to automate the creation of firewall rules. We have a large amount of rules that we are hitting the GRPC error. Since the change for increasing the limit is still being discussed does a workaround currently exist?

@apparentlymart @jbardin thanks!

@bflad bflad added this to the v0.6.0 milestone Jan 14, 2022
bflad added a commit that referenced this issue Jan 14, 2022
Reference: #106

The rationale for choosing this particular value and making it a constant, rather than configurable, is included in the code.
@bflad bflad self-assigned this Jan 14, 2022
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants