internal/cloud/backend_state.go: upload json state #31241

Closed

Conversation

@uturunku1 (Contributor) commented Jun 13, 2022

As part of the Terraform Outputs project, we want the platform to store a different format of the state. The one currently being stored is a simplified version of the state that users see when they run terraform show -json. In particular, the outputs data does not carry much type granularity for each output value. To align how users see the state in the CLI and on the platform, this PR sends the JSON format of the state whenever a user creates a state version with the Cloud integration. The go-tfe client makes the API request that creates a state version in func (r *remoteClient) Put(state []byte) in backend_state.go.

Implementation details:

To get the JSON format of the state, we can use the Marshal() method from the jsonstate package:

func jsonstate.Marshal(sf *statefile.File, schemas *terraform.Schemas) ([]byte, error)

This method returns a slice of bytes which already contains the state formatted into json.
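For illustration, here is a minimal sketch (not code from this PR) of how Put could produce that JSON form, assuming a *terraform.Schemas value were already in hand. The helper name jsonStateForUpload is hypothetical:

import (
	"bytes"

	"github.com/hashicorp/terraform/internal/command/jsonstate"
	"github.com/hashicorp/terraform/internal/states/statefile"
	"github.com/hashicorp/terraform/internal/terraform"
)

// jsonStateForUpload parses the raw snapshot bytes that Put receives back
// into a *statefile.File, then marshals it into the richer JSON format.
func jsonStateForUpload(state []byte, schemas *terraform.Schemas) ([]byte, error) {
	sf, err := statefile.Read(bytes.NewReader(state))
	if err != nil {
		return nil, err
	}
	return jsonstate.Marshal(sf, schemas)
}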

Besides requiring a statefile.File as an argument, this Marshal() method requires *terraform.Schemas, which provides info about registry providers and provisioners.

In the context of func (r *remoteClient) Put(state []byte), we do not have information about schemas.

One way to get data about schemas would be through func (*terraform.Context).Schemas(config *configs.Config, state *states.State) (*terraform.Schemas, tfdiags.Diagnostics), but that method in turn requires us to have a *terraform.Context and a *configs.Config.
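Sketched out (illustrative only, assuming ctxOpts, config, and state are in scope, and that terraform.NewContext builds a context from a *terraform.ContextOpts), that chain would look roughly like:

	tfCtx, diags := terraform.NewContext(ctxOpts)
	if diags.HasErrors() {
		return nil, diags.Err()
	}
	schemas, moreDiags := tfCtx.Schemas(config, state)
	if moreDiags.HasErrors() {
		return nil, moreDiags.Err()
	}
	return schemas, nil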

Getting either schemas, or a terraform.Context plus a config, into that func (r *remoteClient) Put(state []byte) method is not that simple: we cannot add new parameters, because the method's signature is shared by other backend types besides Cloud.

We can, however, pass this data down through the remoteClient struct, which is the pointer receiver for this Put() method. Each backend type has its own remoteClient receiver, so any changes we make to it for the cloud backend stay in isolation.

The remoteClient for Cloud was defined as:

type remoteClient struct {
	client       *tfe.Client
	lockInfo     *statemgr.LockInfo
	organization string
	runID        string
	stateUploadErr  bool
	workspace    *tfe.Workspace
	forcePush    bool
}

But I am changing its stateUploadErr field to a new stateUpload struct:

type stateUpload struct {
	ctx        *terraform.ContextOpts
	services   *disco.Disco
	hasErrored bool
}
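With those fields in place, Put can derive schemas from its receiver without changing the shared signature. A rough sketch (illustrative only; getSchemas is the helper that appears later in the review diff, and the exact field wiring here is an assumption):

func (r *remoteClient) Put(state []byte) error {
	sf, err := statefile.Read(bytes.NewReader(state))
	if err != nil {
		return err
	}

	// Derive schemas from data carried on the receiver rather than from
	// new parameters, keeping the shared Put signature intact.
	schemas, err := getSchemas(r.stateUpload.ctx, r.stateUpload.services, sf.State)
	if err != nil {
		r.stateUpload.hasErrored = true
		return err
	}

	jsonState, err := jsonstate.Marshal(sf, schemas)
	if err != nil {
		return err
	}

	// ... send jsonState along with the state version create request ...
	return nil
}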

NEXT STEPS:
- Fix the tests for the cloud integration: the mock *Cloud backend needs a ContextOpts field with fake provider data
- Check with the team on their thoughts about the approach/solution I am going for


@brandonc (Contributor) left a comment:

Some initial feedback

In internal/cloud/backend_state.go:

func getSchemas(ctxOpts *terraform.ContextOpts, services *disco.Disco, state *states.State) (*terraform.Schemas, error) {
A Contributor commented:
I'd love to get another opinion on the placement of these operations. It seems like some of this should have been initialized as part of the apply command, and I'm wondering how we can leverage the config we already loaded, because I assume all of this is fairly expensive.

@uturunku1 (Contributor Author) replied:
All of this data has been initialized earlier as part of the apply command. For example, we assign a value to schemas at the (b *Local) opApply() level:

schemas, moreDiags := lr.Core.Schemas(lr.Config, lr.InputState)

which calls (*Local) localRun(). It's within this method that we create a state manager, initialize the remoteClient, and write/persist the state (the key operation for uploading state).
You'd think that maybe *terraform.Context could have collected schemas data somewhere throughout this process, but that doesn't seem to be the case: we reload the schemas at different times, in different places. Though it's an expensive operation, maybe repeating it is the safe thing to do?
I'll bring this concern to the core team to get another opinion.

A Member commented:
Hi there! 😀

I must confess that I've not yet been able to load all of the context about the overall goals here, so for this comment I'm focusing on the specific architecture question this thread seems to be tackling. I might be missing some bigger details, but hopefully this comment is still useful for exploring some options.

The main thing I'm noticing while I revisit this code is that we seem to have a bit of an abstraction inversion here: the remote.State state manager implementation aims to make it easier to implement remote state storage against generic blob stores, and so it tries to encapsulate the problem of JSON-serializing and JSON-deserializing the state so that the remote.Client implementation (which is what this remoteClient struct is) needs to deal only with opaque []byte to read and write.
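(For reference, the remote.Client interface is roughly the following, paraphrased from package remote; the point is that both reads and writes deal only in opaque bytes:)

type Client interface {
	Get() (*Payload, error)
	Put([]byte) error
	Delete() error
}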

If we look out at the bigger picture, then, this current design has the following redundancy: the state manager serializes the snapshot down to opaque JSON bytes before handing it to the client, and the client would then need to re-parse those bytes back into a *states.State just so it can serialize the same data again in a different format.

My sense then is that the remote.Client abstraction is hindering rather than helping now that we need access to the real *states.State object in order to make the Terraform Cloud API request.

If you look in package statemgr you can see the interface that the CLI layer expects backends to implement for state storage. It's unfortunately defined in terms of a bunch of other interfaces and so kinda hard to review quickly, but the main thing to notice is that it transitively includes the statemgr.Writer interface, whose method WriteState(*states.State) error is what the CLI layer actually calls when it has a new state snapshot to share. That *states.State object is not yet JSON-encoded, so you can work with it directly without having to parse it first.
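(Paraphrasing the relevant slice of package statemgr, as described above:)

type Writer interface {
	WriteState(*states.State) error
}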

Based on this, I think my first recommendation would be to consider moving away from using remote.State for this backend and to write a custom statemgr.Full implementation in this package instead. I think you can probably copy remote.State into here as a starting point since I expect it will still benefit from some of the same logic around tracking state serials and lineage, but crucially it can have a different implementation of PersistState that will avoid the need to re-parse the state because it will have access to the *states.State snapshot already cached inside the state manager itself.
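For illustration, a rough shape such a manager might take (the name cloudStateMgr and the details are hypothetical):

// A cloud-specific state manager that keeps the decoded snapshot cached,
// so persisting never needs to re-parse JSON bytes.
type cloudStateMgr struct {
	client *remoteClient
	state  *states.State // last snapshot written via WriteState
}

func (s *cloudStateMgr) WriteState(state *states.State) error {
	s.state = state.DeepCopy() // cache the live object, not bytes
	return nil
}

func (s *cloudStateMgr) PersistState() error {
	// s.state is already a *states.State, so it could be marshaled with
	// jsonstate.Marshal here (given schemas) and uploaded directly.
	return nil
}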


This admittedly still doesn't answer how this new state manager can get access to the schemas without completely reloading them from scratch, but it at least removes one extra redundant layer from the design so that we don't need to fight an abstraction that has different goals.

I think one thing that makes this extra tricky is that when running in local operations mode with the cloud integration the "backend" that the CLI layer is talking to is a bit of a "turducken" of odd layers: a cloud.Cloud with local set to a local.Backend with its nested Backend pointing back to the original cloud.Cloud. 😬

This makes life considerably harder because I think it's the local.Backend sandwiched between the two pointers to cloud.Cloud that is the one actually creating the Terraform Core context and thus loading the schemas, but it's the cloud.Cloud that actually needs access to the schemas in order to implement its StateMgr method and pass the schemas into the state manager.

My best idea right now is to consider changing the statemgr.Writer interface so that WriteState takes both the *states.State as it does today and a reference to the schemas, with the meaning "write this state snapshot using these schemas". This is awkward because every other statemgr.Writer implementation will just ignore that extra argument altogether, but perhaps that's okay because we have relatively few real implementations of statemgr.Full anyway: the statemgr.Filesystem one used for local state, and remote.State used for all of the state-storage-only backends. (There are a few others lurking for testing purposes only too, I think.)

I think the trick here will be tracking down all of the places where we call WriteState on a state manager and making sure that they always have access to the current schemas too. If that's true without any deeper refactoring then this might be the least-messy path forward, although I'd welcome any other ideas or counterproposals from others on the team.
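A sketch of what that signature change might look like (hypothetical, not a settled API):

// statemgr.Writer, revised per the suggestion above.
type Writer interface {
	// WriteState writes the given snapshot, using the given schemas for
	// any schema-aware serialization. Implementations that don't need
	// schemas may simply ignore the second argument.
	WriteState(*states.State, *terraform.Schemas) error
}

// e.g. the filesystem implementation would just discard the schemas:
func (s *Filesystem) WriteState(state *states.State, _ *terraform.Schemas) error {
	// ... unchanged behavior ...
	return nil
}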

@uturunku1 (Contributor Author) commented:
I will use a different branch for these changes: #31284

@github-actions bot commented:
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators on Jul 22, 2022.