internal/cloud/backend_state.go: upload json state #31241
Conversation
As part of the Terraform Outputs project, we want the platform to store a different format of the state. The one currently being stored is a simplified version of the state that users see when they run `terraform show -json`. Specifically, the outputs data does not show the proper types for each output value. To align how users see the state in the CLI and on the platform, this PR sends the JSON format of the state whenever a user creates a state version. The go-tfe client makes this API request to create a state version in `func (r *remoteClient) Put(state []byte)` in `backend_state.go`.
Some initial feedback
```go
	return nil
}

func getSchemas(ctxOpts *terraform.ContextOpts, services *disco.Disco, state *states.State) (*terraform.Schemas, error) {
```
I'd love to get another opinion on the placement of these operations. It seems like some of this should have been initialized as part of the apply command and I'm wondering how we can leverage the config we already loaded because I assume all this is fairly expensive.
All of this data has been initialized earlier as part of the apply command. For example, we assign a value to the schemas at the `(b *Local) opApply()` level:

```go
schemas, moreDiags := lr.Core.Schemas(lr.Config, lr.InputState)
```
It's within this method that we create a state manager, initialize the remoteClient, and write/persist the state (which is the key operation for uploading state).
You'd think that throughout this process `*terraform.Context` could have collected the schemas data, but that doesn't seem to be the case: we reload the schemas at different times, in different places. Though it's an expensive operation, maybe reloading is a safe thing to do here?
I'll bring this concern to the core team to get another opinion.
Hi there! 😀
I must confess that I've not yet been able to load all of the context about the overall goals here, so for this comment I'm focusing on the specific architecture question this thread seems to be tackling. I might be missing some bigger details, but hopefully this comment is still useful to help explore some options.
The main thing I'm noticing while I revisit this code is that we seem to have a bit of an abstraction inversion here: the `remote.State` state manager implementation aims to make it easier to implement remote state storage against generic blob stores, and so it tries to encapsulate the problem of JSON-serializing and JSON-deserializing the state so that the `remote.Client` implementation (which is what this `remoteClient` struct is) needs to deal only with opaque `[]byte` values to read and write.
If we look out at the bigger picture then, this current design has the following redundancy:

1. `Cloud.StateMgr` instantiates `remoteClient` and returns it wrapped in a `remote.State`.
2. When asked to persist state, `remote.State` serializes the state into a JSON snapshot and passes it to the client it's wrapping.
3. This particular `remote.Client` implementation (`remoteClient`) then parses that just-created JSON, extracts data from it, and re-serializes it using the other serialization format.
My sense then is that the `remote.Client` abstraction is hindering rather than helping now that we need access to the real `*states.State` object in order to make the Terraform Cloud API request.
If you look in package `statemgr` you can see the interface that the CLI layer expects backends to implement for state storage. It's unfortunately defined in terms of a bunch of other interfaces and so is kind of hard to review quickly, but the main thing to notice is that it transitively includes the `statemgr.Writer` interface, whose method `WriteState(*states.State) error` is what the CLI layer will actually call when it has a new state snapshot to share. That `*states.State` object is not yet JSON-encoded, so you can work with it directly without having to parse it first.
Based on this, I think my first recommendation would be to consider moving away from using `remote.State` for this backend and to write a custom `statemgr.Full` implementation in this package instead. I think you can probably copy `remote.State` into here as a starting point, since I expect it will still benefit from some of the same logic around tracking state serials and lineage, but crucially it can have a different implementation of `PersistState` that avoids the need to re-parse the state, because it will have access to the `*states.State` snapshot already cached inside the state manager itself.
This admittedly still doesn't answer how this new state manager can get access to the schemas without completely reloading them from scratch, but it at least removes one extra redundant layer from the design so that we don't need to fight an abstraction that has different goals.
I think one thing that makes this extra tricky is that when running in local operations mode with the cloud integration, the "backend" that the CLI layer is talking to is a bit of a "turducken" of odd layers: a `cloud.Cloud` with `local` set to a `local.Backend` with its nested `Backend` pointing back to the original `cloud.Cloud`. 😬
This makes life considerably harder, because I think it's the `local.Backend` sandwiched between the two pointers to `cloud.Cloud` that is the one actually creating the Terraform Core context and thus loading the schemas, but it's the `cloud.Cloud` that actually needs access to the schemas in order to implement its `StateMgr` method and pass the schemas into the state manager.
My best idea right now is to consider changing the `statemgr.Writer` interface so that `WriteState` takes both the `*states.State` as it has today and a reference to the schemas, with the meaning "write this state snapshot using these schemas". This is awkward because every other `statemgr.Writer` implementation will just ignore that extra argument altogether, but perhaps that's okay because we have relatively few real implementations of `statemgr.Full` anyway: the `statemgr.Filesystem` one used for local state, and `remote.State` used for all of the state-storage-only backends. (There are a few others lurking for testing purposes only too, I think.)
I think the trick here will be tracking down all of the places where we call `WriteState` on a state manager and making sure that they always have access to the current schemas too. If that's true without any deeper refactoring, then this might be the least-messy path forward, although I'd welcome any other ideas or counterproposals from others on the team.
I will use a different branch for these changes: #31284
As part of the Terraform Outputs project, we want the platform to store a different format of the state. The one currently being stored is a simplified version of the state that users see when they run `terraform show -json`. Specifically, the outputs data does not show much granularity in the types of each output value. To align how users see the state in the CLI and on the platform, this PR sends the JSON format of the state whenever a user creates a state version with the Cloud integration. The go-tfe client makes this API request to create a state version in `func (r *remoteClient) Put(state []byte)` in `backend_state.go`.
Implementation details:
To get the JSON format of the state we can use the `Marshal()` method from the `jsonstate` package:

`func jsonstate.Marshal(sf *statefile.File, schemas *terraform.Schemas) ([]byte, error)`

This method returns a slice of bytes that already contains the state formatted as JSON. Besides requiring a `statefile.File` as an argument, `Marshal()` requires `*terraform.Schemas`, which provides info about registry providers and provisioners.
In the context of `func (r *remoteClient) Put(state []byte)`, we do not have information about schemas. One way to get schema data would be through `func (*terraform.Context).Schemas(config *configs.Config, state *states.State) (*terraform.Schemas, tfdiags.Diagnostics)`, but that method requires us to have a `*terraform.Context` and a `*configs.Config`.

Getting either the schemas or a `terraform.Context` plus a config into that `Put()` method is not that simple: we cannot pass new parameters into this method, because its signature is shared by other backend types besides Cloud. We can, however, pass this data down through the `remoteClient` struct, which is the pointer receiver for `Put()`. Each backend type has its own `remoteClient` receiver, so any changes we make to it for the cloud backend stay isolated.
The `remoteClient` for Cloud keeps its existing definition, but I am changing its `stateUploadErr` field to be `stateUpload`.
NEXT STEPS:
- Fix the test for the cloud integration: the mock `*Cloud` backend needs a `contextOps` field with fake provider data
- Check with the team on their thoughts about the approach/solution I am going for