internal/cloud/backend_state.go: upload json state #31241

Closed

Conversation

@uturunku1 (Contributor) commented Jun 13, 2022

As part of the Terraform Outputs project, we want the platform to store a different format of the state. The one currently being stored is a simplified version of the state that users see when they run terraform show -json. In particular, the outputs data does not carry much type granularity for each output value. To align how users see the state in the CLI and on the platform, this PR sends the JSON format of the state whenever a user creates a state version with the Cloud integration. The go-tfe client makes the API request that creates a state version in func (r *remoteClient) Put(state []byte) in backend_state.go.

Implementation details:

To get the JSON format of the state, we can use the Marshal() method from the jsonstate package:

func jsonstate.Marshal(sf *statefile.File, schemas *terraform.Schemas) ([]byte, error)

This method returns a slice of bytes which already contains the state formatted into json.
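For illustration, here is a minimal sketch (not code from this PR) of how Put could produce that JSON form, assuming a *terraform.Schemas value were already in hand. The helper name jsonStateForUpload is hypothetical:

import (
	"bytes"

	"github.com/hashicorp/terraform/internal/command/jsonstate"
	"github.com/hashicorp/terraform/internal/states/statefile"
	"github.com/hashicorp/terraform/internal/terraform"
)

// jsonStateForUpload parses the raw snapshot bytes that Put receives back
// into a *statefile.File, then marshals it into the richer JSON format.
func jsonStateForUpload(state []byte, schemas *terraform.Schemas) ([]byte, error) {
	sf, err := statefile.Read(bytes.NewReader(state))
	if err != nil {
		return nil, err
	}
	return jsonstate.Marshal(sf, schemas)
}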

Besides requiring a statefile.File as an argument, this Marshal() method requires *terraform.Schemas, which provides info about registry providers and provisioners.

In the context of func (r *remoteClient) Put(state []byte), we do not have information about schemas.

One way to get data about schemas would be through func (*terraform.Context).Schemas(config *configs.Config, state *states.State) (*terraform.Schemas, tfdiags.Diagnostics), but that method in turn requires us to have a *terraform.Context and a *configs.Config.
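Sketched out (illustrative only, assuming ctxOpts, config, and state are in scope, and that terraform.NewContext builds a context from a *terraform.ContextOpts), that chain would look roughly like:

	tfCtx, diags := terraform.NewContext(ctxOpts)
	if diags.HasErrors() {
		return nil, diags.Err()
	}
	schemas, moreDiags := tfCtx.Schemas(config, state)
	if moreDiags.HasErrors() {
		return nil, moreDiags.Err()
	}
	return schemas, nil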

Getting either schemas, or a terraform.Context plus a config, into that func (r *remoteClient) Put(state []byte) method is not that simple: we cannot add new parameters, because the method's signature is shared by other backend types besides Cloud.

We can, however, pass this data down through the remoteClient struct, which is the pointer receiver for this Put() method. Each backend type has its own remoteClient receiver, so any changes we make to it for the cloud backend stay in isolation.

The remoteClient for Cloud was defined as:

type remoteClient struct {
	client       *tfe.Client
	lockInfo     *statemgr.LockInfo
	organization string
	runID        string
	stateUploadErr  bool
	workspace    *tfe.Workspace
	forcePush    bool
}

But I am changing its stateUploadErr field to a new stateUpload struct:

type stateUpload struct {
	ctx        *terraform.ContextOpts
	services   *disco.Disco
	hasErrored bool
}
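With those fields in place, Put can derive schemas from its receiver without changing the shared signature. A rough sketch (illustrative only; getSchemas is the helper that appears later in the review diff, and the exact field wiring here is an assumption):

func (r *remoteClient) Put(state []byte) error {
	sf, err := statefile.Read(bytes.NewReader(state))
	if err != nil {
		return err
	}

	// Derive schemas from data carried on the receiver rather than from
	// new parameters, keeping the shared Put signature intact.
	schemas, err := getSchemas(r.stateUpload.ctx, r.stateUpload.services, sf.State)
	if err != nil {
		r.stateUpload.hasErrored = true
		return err
	}

	jsonState, err := jsonstate.Marshal(sf, schemas)
	if err != nil {
		return err
	}

	// ... send jsonState along with the state version create request ...
	return nil
}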

NEXT STEPS:
- Fix the tests for the cloud integration: the mock *Cloud backend needs a ContextOpts field with fake provider data
- Check with the team on their thoughts about the approach/solution I am going for


@brandonc (Contributor) left a comment:

Some initial feedback

In internal/cloud/backend_state.go:

func getSchemas(ctxOpts *terraform.ContextOpts, services *disco.Disco, state *states.State) (*terraform.Schemas, error) {
A Contributor commented:
I'd love to get another opinion on the placement of these operations. It seems like some of this should have been initialized as part of the apply command, and I'm wondering how we can leverage the config we already loaded, because I assume all of this is fairly expensive.

@uturunku1 (Contributor Author) replied:
All of this data has been initialized earlier as part of the apply command. For example, we assign a value to schemas at the (b *Local) opApply() level:

schemas, moreDiags := lr.Core.Schemas(lr.Config, lr.InputState)

which calls (*Local) localRun(). It's within this method that we create a state manager, initialize the remoteClient, and write/persist the state (the key operation for uploading state).
You'd think that maybe *terraform.Context could have collected schemas data somewhere throughout this process, but that doesn't seem to be the case: we reload the schemas at different times, in different places. Though it's an expensive operation, maybe repeating it is the safe thing to do?
I'll bring this concern to the core team to get another opinion.

A Member commented:
Hi there! 😀

I must confess that I've not yet been able to load all of the context about the overall goals here, so for this comment I'm focusing on the specific architecture question this thread seems to be tackling. I might be missing some bigger details, but hopefully this comment is still useful for exploring some options.

The main thing I'm noticing while I revisit this code is that we seem to have a bit of an abstraction inversion here: the remote.State state manager implementation aims to make it easier to implement remote state storage against generic blob stores, and so it tries to encapsulate the problem of JSON-serializing and JSON-deserializing the state so that the remote.Client implementation (which is what this remoteClient struct is) needs to deal only with opaque []byte to read and write.
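(For reference, the remote.Client interface is roughly the following, paraphrased from package remote; the point is that both reads and writes deal only in opaque bytes:)

type Client interface {
	Get() (*Payload, error)
	Put([]byte) error
	Delete() error
}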

If we look out at the bigger picture, then, this current design has the following redundancy: the state manager serializes the snapshot down to opaque JSON bytes before handing it to the client, and the client would then need to re-parse those bytes back into a *states.State just so it can serialize the same data again in a different format.

My sense then is that the remote.Client abstraction is hindering rather than helping now that we need access to the real *states.State object in order to make the Terraform Cloud API request.

If you look in package statemgr you can see the interface that the CLI layer expects backends to implement for state storage. It's unfortunately defined in terms of a bunch of other interfaces and so kinda hard to review quickly, but the main thing to notice is that it transitively includes the statemgr.Writer interface, whose method WriteState(*states.State) error is what the CLI layer actually calls when it has a new state snapshot to share. That *states.State object is not yet JSON-encoded, so you can work with it directly without having to parse it first.
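(Paraphrasing the relevant slice of package statemgr, as described above:)

type Writer interface {
	WriteState(*states.State) error
}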

Based on this, I think my first recommendation would be to consider moving away from using remote.State for this backend and to write a custom statemgr.Full implementation in this package instead. I think you can probably copy remote.State into here as a starting point since I expect it will still benefit from some of the same logic around tracking state serials and lineage, but crucially it can have a different implementation of PersistState that will avoid the need to re-parse the state because it will have access to the *states.State snapshot already cached inside the state manager itself.
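For illustration, a rough shape such a manager might take (the name cloudStateMgr and the details are hypothetical):

// A cloud-specific state manager that keeps the decoded snapshot cached,
// so persisting never needs to re-parse JSON bytes.
type cloudStateMgr struct {
	client *remoteClient
	state  *states.State // last snapshot written via WriteState
}

func (s *cloudStateMgr) WriteState(state *states.State) error {
	s.state = state.DeepCopy() // cache the live object, not bytes
	return nil
}

func (s *cloudStateMgr) PersistState() error {
	// s.state is already a *states.State, so it could be marshaled with
	// jsonstate.Marshal here (given schemas) and uploaded directly.
	return nil
}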


This admittedly still doesn't answer how this new state manager can get access to the schemas without completely reloading them from scratch, but it at least removes one extra redundant layer from the design so that we don't need to fight an abstraction that has different goals.

I think one thing that makes this extra tricky is that when running in local operations mode with the cloud integration the "backend" that the CLI layer is talking to is a bit of a "turducken" of odd layers: a cloud.Cloud with local set to a local.Backend with its nested Backend pointing back to the original cloud.Cloud. 😬

This makes life considerably harder because I think it's the local.Backend sandwiched between the two pointers to cloud.Cloud that is the one actually creating the Terraform Core context and thus loading the schemas, but it's the cloud.Cloud that actually needs access to the schemas in order to implement its StateMgr method and pass the schemas into the state manager.

My best idea right now is to consider changing the statemgr.Writer interface so that WriteState takes both the *states.State as it does today and a reference to the schemas, with the meaning "write this state snapshot using these schemas". This is awkward because every other statemgr.Writer implementation will just ignore that extra argument altogether, but perhaps that's okay because we have relatively few real implementations of statemgr.Full anyway: the statemgr.Filesystem one used for local state, and remote.State used for all of the state-storage-only backends. (There are a few others lurking for testing purposes only too, I think.)

I think the trick here will be tracking down all of the places where we call WriteState on a state manager and making sure that they always have access to the current schemas too. If that's true without any deeper refactoring then this might be the least-messy path forward, although I'd welcome any other ideas or counterproposals from others on the team.
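A sketch of what that signature change might look like (hypothetical, not a settled API):

// statemgr.Writer, revised per the suggestion above.
type Writer interface {
	// WriteState writes the given snapshot, using the given schemas for
	// any schema-aware serialization. Implementations that don't need
	// schemas may simply ignore the second argument.
	WriteState(*states.State, *terraform.Schemas) error
}

// e.g. the filesystem implementation would just discard the schemas:
func (s *Filesystem) WriteState(state *states.State, _ *terraform.Schemas) error {
	// ... unchanged behavior ...
	return nil
}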

@uturunku1 (Contributor Author) commented:
I will use a different branch for these changes: #31284

@github-actions bot commented:
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators on Jul 22, 2022.