internal/cloud/backend_state.go: upload json state #31241
I'd love to get another opinion on the placement of these operations. It seems like some of this should have been initialized as part of the apply command, and I'm wondering how we can leverage the config we already loaded, because I assume all of this is fairly expensive.
All of this data has been initialized earlier as part of the apply command. For example, we assign a value to `schemas` at the `(b *Local) opApply()` level (`terraform/internal/backend/local/backend_apply.go`, line 83 at commit 79f12b8).
It's within this method that we create a state manager, initialize the `remoteClient`, and write/persist the state (which is the key operation for uploading state).
You'd think that maybe throughout this process `*terraform.Context` could have collected the schema data, but that doesn't seem to be the case. We reload the schemas at different times and in different places. Though it's an expensive operation, maybe it's safe to repeat here as well?
I'll bring this concern to the core team to get another opinion.
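To make the flow being described here concrete, below is a heavily simplified sketch of the write/persist pattern. All of the types are hypothetical stand-ins, not the real `states`/`remote` packages: the state manager caches the snapshot on `WriteState`, and only serializes it and hands opaque bytes to the client on `PersistState`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// State is a stand-in for Terraform's *states.State (hypothetical simplification).
type State struct {
	Serial  uint64
	Lineage string
}

// Client is a stand-in for remote.Client: it deals only in opaque bytes.
type Client interface {
	Put([]byte) error
}

// memClient just records the last uploaded payload.
type memClient struct{ last []byte }

func (c *memClient) Put(p []byte) error { c.last = p; return nil }

// StateMgr mirrors the shape described above: WriteState caches the
// snapshot, and PersistState serializes it and hands it to the client.
type StateMgr struct {
	client Client
	cached *State
}

func (m *StateMgr) WriteState(s *State) error {
	m.cached = s // no serialization yet; just cache the rich snapshot
	return nil
}

func (m *StateMgr) PersistState() error {
	buf, err := json.Marshal(m.cached)
	if err != nil {
		return err
	}
	return m.client.Put(buf) // the client sees only []byte
}

func main() {
	c := &memClient{}
	mgr := &StateMgr{client: c}
	mgr.WriteState(&State{Serial: 1, Lineage: "example"})
	mgr.PersistState()
	fmt.Println(string(c.last)) // prints {"Serial":1,"Lineage":"example"}
}
```

The point of the split is that persisting (the expensive upload) can happen less often than writing, which is why the persist step is the interesting hook for the cloud upload.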
Hi there! 😀
I must confess that I haven't yet been able to load all of the context about the overall goals here, so for this comment I'm focusing on the specific architecture question this thread seems to be tackling. I might therefore be missing some bigger details, but hopefully this comment is still useful for exploring some options.
The main thing I'm noticing while I revisit this code is that we seem to have a bit of an abstraction inversion here: the `remote.State` state manager implementation aims to make it easier to implement remote state storage against generic blob stores, and so it tries to encapsulate the problem of JSON-serializing and JSON-deserializing the state so that the `remote.Client` implementation (which is what this `remoteClient` struct is) needs to deal only with opaque `[]byte` to read and write.

If we look out at the bigger picture, then, this current design has the following redundancy:

1. `Cloud.StateMgr` instantiates `remoteClient` and returns it wrapped in a `remote.State`.
2. `remote.State` serializes the state into a JSON snapshot and passes it to the client it's wrapping.
3. `remoteClient` then parses that just-created JSON, extracts data from it, and re-serializes it using the other serialization format.

My sense, then, is that the `remote.Client` abstraction is hindering rather than helping now that we need access to the real `*states.State` object in order to make the Terraform Cloud API request.

If you look in package `statemgr` you can see the interface that the CLI layer expects backends to implement for state storage. It's unfortunately defined in terms of a bunch of other interfaces, and so kinda hard to quickly review, but the main thing to notice is that it transitively includes the `statemgr.Writer` interface, which has the method `WriteState(*states.State) error` that is what the CLI layer is actually going to call when it has a new state snapshot to share. That `*states.State` object is not yet JSON-encoded, so you can work with it directly without having to parse it first.

Based on this, I think my first recommendation would be to consider moving away from using `remote.State` for this backend and to write a custom `statemgr.Full` implementation in this package instead. I think you can probably copy `remote.State` into here as a starting point, since I expect it will still benefit from some of the same logic around tracking state serials and lineage, but crucially it can have a different implementation of `PersistState` that will avoid the need to re-parse the state, because it will have access to the `*states.State` snapshot already cached inside the state manager itself.

This admittedly still doesn't answer how this new state manager can get access to the schemas without completely reloading them from scratch, but it at least removes one extra redundant layer from the design, so that we don't need to fight an abstraction that has different goals.
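As a rough illustration of that recommendation (using hypothetical stand-in types, not the real `statemgr`/`states` packages), the custom state manager's `PersistState` could build its upload payload directly from the cached snapshot, skipping the serialize-then-reparse round trip entirely:

```go
package main

import "fmt"

// State stands in for *states.State; Resources for its resource entries.
// (Hypothetical simplification of the real types.)
type State struct {
	Serial    uint64
	Lineage   string
	Resources []string
}

// cloudStateMgr sketches a custom statemgr.Full-style implementation:
// it keeps the latest rich snapshot instead of delegating to a
// bytes-only remote.Client.
type cloudStateMgr struct {
	cached *State
}

func (m *cloudStateMgr) WriteState(s *State) error {
	m.cached = s // cache the rich snapshot; no serialization yet
	return nil
}

// PersistState builds a (hypothetical) cloud API payload straight from
// the cached snapshot — no parsing of freshly-produced JSON required.
func (m *cloudStateMgr) PersistState() (map[string]interface{}, error) {
	if m.cached == nil {
		return nil, fmt.Errorf("no state written yet")
	}
	return map[string]interface{}{
		"serial":    m.cached.Serial,
		"lineage":   m.cached.Lineage,
		"resources": len(m.cached.Resources),
	}, nil
}

func main() {
	mgr := &cloudStateMgr{}
	mgr.WriteState(&State{Serial: 7, Lineage: "abc", Resources: []string{"a", "b"}})
	payload, _ := mgr.PersistState()
	fmt.Println(payload["serial"], payload["resources"]) // prints 7 2
}
```

The serial/lineage bookkeeping that `remote.State` already does would carry over; only the persist step changes shape.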
I think one thing that makes this extra tricky is that when running in local operations mode with the cloud integration, the "backend" that the CLI layer is talking to is a bit of a "turducken" of odd layers: a `cloud.Cloud` with `local` set to a `local.Backend` with its nested `Backend` pointing back to the original `cloud.Cloud`. 😬

This makes life considerably harder, because I think it's the `local.Backend` sandwiched between the two pointers to `cloud.Cloud` that is the one actually creating the Terraform Core context and thus loading the schemas, but it's the `cloud.Cloud` that actually needs access to the schemas in order to implement its `StateMgr` method and pass the schemas into the state manager.

My best idea right now is to consider changing the `statemgr.Writer` interface so that `WriteState` takes both the `*states.State` as it does today and a reference to the schemas, with the meaning "write this state snapshot using these schemas". This is awkward because every other `statemgr.Writer` implementation will just ignore that extra argument altogether, but perhaps that's okay because we have relatively few real implementations of `statemgr.Full` anyway: the `statemgr.Filesystem` one used for local state, and `remote.State` used for all of the state-storage-only backends. (There are a few others lurking for testing purposes only too, I think.)

I think the trick here will be tracking down all of the places where we call `WriteState` on a state manager and making sure that they always have access to the current schemas too. If that's true without any deeper refactoring, then this might be the least-messy path forward, although I'd welcome any other ideas or counterproposals from others on the team.