Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/cloud/state.go: upload jsonState to TFC #31284

Closed
wants to merge 21 commits into from

Conversation

uturunku1
Copy link
Contributor

@uturunku1 uturunku1 commented Jun 21, 2022

Description

The main goal of these changes is to send the state in json format (as shown by the cmd terraform show -json) up to the platform whenever a state version is created. In other words, when we do the api request client.StateVersions.Create(ctx, r.workspace.ID, options), we want to send out that json state as part of the tfe.StateVersionCreateOptions.

For achieving this goal I am calling func jsonstate.Marshal(sf *statefile.File, schemas *terraform.Schemas) ([]byte, error) to access that json state.

Implementation Details

To be able to call jsonState.Marshal we need two very important arguments: *statefile.File and *terraform.Schemas

Changes required for Schemas:

Schemas was not available within the cloud integration.

To make schemas accessible, I am adding Schemas as a second argument to the method stateMgr.WriteState(). This method takes as an argument a *states.State object and records the latest snapshot of a state. Now, additionally to caching the latest state, it will take a *terraform.Schemas and associate it with the State. These changes apply specifically to the cloud.State:

s.schemas = schemas

Unfortunately other state managers (stateMgr), that are not the cloud integration state manager, are also relying on WriteState() method and they will have to inherent the extra argument for schemas. In those cases, we are ignoring the argument and doing nothing with it.

func (s *Filesystem) WriteState(state *states.State, schemas *terraform.Schemas) error {
// TODO: this should use a more robust method of writing state, by first
// writing to a temp file on the same filesystem, and renaming the file over
// the original.
defer s.mutex()()
if s.readFile == nil {
err := s.refreshState()
if err != nil {
return err
}
}
return s.writeState(state, nil)

An extra step I needed to take to make schemas available to all the cloud state manager calling WriteState(), was to load schemas in a serious of steps:

1.load a *configs.Config
2.load a *terraform.Context
3. collect *terraform.Schemas with (*terraform.Context).Schemas() which takes as argument a *configs.Config and *states.State

Most places have a *states.State already available but not necessarily a config. As we would be repeating the same steps outlined above for other commands, like "import", "move" and "show", I extracted the same logic into this helper function: https://github.com/hashicorp/terraform/blob/cc06e9b0d48a7139a7c10d07190d4cf3330affa8/internal/command/helper.go

Notice that I did not load schemas for certain terraform commands that invoke writeState() like "taint", "untaint" and other deprecated and/or minor commands.

I chatted with Martin Atkins to find out if its okay to ignore loading schemas on those cmds. He responded that those cmds "only need to tweak some metadata on a resource instance and so have no need to modify the actual data that would be subject to the provider schema. I think my question here then would be what you'd want to send in the Terraform Cloud API for that case. Is it acceptable for those commands to generate state snapshots that wouldn't have associated JSON state export format attached to them?"

I think it is acceptable for those cases to not have an external JSON state for those cases. But I'd love to hear Brandon's (the Lead project) thoughts on this, as well as opinion of any of my reviewers.

Changes required for State:

To upload a new state version to the platform, we were using this method

func (r *remoteClient) Put(state []byte) error {

Which belongs to the pointer receiver *remoteClient. The remote.Client implementation, as well as the remote.State state manager implementation is common to all remote backend options (not just TFC). So, for example, S3 would also take a slice of bytes to upload to a remote state storage:

func (c *RemoteClient) Put(data []byte) error {

This implementations do not serve the needs of the Cloud integration. Dealing with that slice of bytes that is the state is very limiting for what we want to do with the state. We want that state, we want to parse it to extract data from it and then re-serialize it again using that json serializer to get the jsonState. It'd be much convenient to deal directly with the *states.State in its raw form, rather than parsing and serializing back and forth.

For those purposes, I needed to ignore the common implementations of the remote.Client and the remote.State.

I created a new type State within the Cloud package. This State is still casted as an implementation of an statemgr.Full except that it has customized fields and methods that target our needs. This whole new file is taking care of that work:

https://github.com/hashicorp/terraform/blob/cc06e9b0d48a7139a7c10d07190d4cf3330affa8/internal/cloud/state.go

Some of those methods have being copied from the previous remote.State but the other change are the combination of pouring over the logic from cloud/backend_state into this file and making adjustments to writeState() and persistState().

How to smoke test

  1. Use a config file that uses the "cloud" configuration block and that has changes that can be apply with terraform
  2. before running terraform apply, go to the workspace settings in the platform and select the local execution mode instead of the remote one.
    3)pull these PR changes locally and run "go install".
    4)then using this build, run ~/go/bin/terraform apply
    5)The new state version should upload successfully. Of course the state uploaded to the platform will not reflect yet the new external json state because we are missing that part of the implementation.
    6)But if you wanted to manually verify that we are successfully sending the jsonState in the createOptions to go-tfe, you could add a printing line within this method to expose jsonState (fmt.Println(string(jsonState)))

Other paths that I'd love for my PR reviewers to test out is that other commands have not broken due to my changes. If you can run operations like "import", "replace-provider" or any other cmds affected by my work, that'd be super helpful!

Additional notes

This was the my first attempt to solve this same PR's concern but in a easier way: #31241
The reasons why I went with these larger changes that you see in this PR can be found in this comment: #31241 (comment)

These commits will get squash after PR approval.

…replace usage of remoteClient which does not serve specific needs for the integration
@uturunku1 uturunku1 changed the title Add jsonstate to cloudbackendstate3 upload jsonState to TFC Jun 21, 2022
@uturunku1 uturunku1 changed the title upload jsonState to TFC internal/cloud/state.go: upload jsonState to TFC Jun 21, 2022
…g called as WriteState and PersistState separately
@brandonc
Copy link
Contributor

brandonc commented Jun 24, 2022

I smoke tested this and it seems to work for apply operations!

I noticed that ExtState was not being sent when migrating state from local to cloud. Is this because it's using the 'wrong' state manager? I'm not exactly sure how the new cloud state manager is selected. Are there any other places that could be using the old shared backend state manager?

I put together a larger integration test with the rest of the json state stuff and found that our API is transforming ExtState keys from dashes to underscores because it assumes the raw message is jsonapi. We may have to modify the interface and perhaps base64 encode the contents.

@uturunku1
Copy link
Contributor Author

I smoke tested this and it seems to work for apply operations!

I noticed that ExtState was not being sent when migrating state from local to cloud. Is this because it's using the 'wrong' state manager? I'm not exactly sure how the new cloud state manager is selected. Are there any other places that could be using the old shared backend state manager?

I put together a larger integration test with the rest of the json state stuff and found that our API is transforming ExtState keys from dashes to underscores because it assumes the raw message is jsonapi. We may have to modify the interface and perhaps base64 encode the contents.

This is great, this is exactly the feedback I was hoping to get from additional smoke testing. As you can see, there are many paths for updating/uploading state. I tried accounting for many of them and smoke testing them, but I was sure that I may be missing paths. So I am happy you help me found one of them. I'll take a closer look to other places that could be sharing the old backend state manager.
But if you get the chance, please continue smoke testing to help me find other potential cases not taken into account!

@@ -45,7 +45,7 @@ func RefreshAndRead(mgr Storage) (*states.State, error) {
// is required, call WriteState and PersistState on the state manager directly
// and handle their errors.
func WriteAndPersist(mgr Storage, state *states.State) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uturunku1 I wonder if we should also accept schema here and pass through to WriteState. That would feel more consistent with the interface as well as make a couple of caller migrations unnecessary.

@@ -62,5 +65,5 @@ type Writer interface {
// The caller must ensure that the given state object is not concurrently
// modified while a WriteState call is in progress. WriteState itself
// will never modify the given state.
WriteState(*states.State) error
WriteState(*states.State, *terraform.Schemas) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗒️ I will document why schemas are present and when they are not needed

@brandonc
Copy link
Contributor

I'm going to revise this to serialize just outputs schema when creating new state versions.

@brandonc brandonc closed this Jun 30, 2022
@brandonc brandonc deleted the add-jsonstate-to-cloudbackendstate3 branch June 30, 2022 16:56
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants