
Add support for state in cloud object storage (S3, GCS, Azure) #2455

Open · wants to merge 11 commits into base: master

Conversation


Place1 commented Feb 16, 2019

This PR introduces support for storing Pulumi state in cloud object storage. The implementation makes use of go-cloud's blob API.

The following backends are now supported:

  • S3: pulumi login s3://my-pulumi-state-bucket
  • Azure Blob Storage: pulumi login azblob://my-pulumi-state-bucket
  • Google Cloud Storage: pulumi login gs://my-pulumi-state-bucket

Of course, the original file storage is unchanged:

  • File: pulumi login file://my-file-state

The local file storage layout is unchanged, so this should not be a breaking change for existing users. Go Cloud's file:// driver does create <file>.attr files, though: when .pulumi/stacks/my-stack.json is created, go-cloud will also create a .pulumi/stacks/my-stack.json.attr. I'm not sure whether this has the potential to be an issue; from my testing it's fine 😄

Authentication for the cloud backends is handled via go-cloud and is standard for the respective ecosystem. go-cloud has docs on this that Pulumi will probably need to reference or directly quote in its docs.

The cloud storage is not safe for concurrent use. I plan to add support for a distributed lock (a lock file) in the bucket so that only one deployment can be in action at a time. This might need to happen as future work, though, because it's a little tricky (how do you handle timeouts and failures, for example?).

The file layout in the cloud storage bucket is identical to the existing layout from the file:// backend, since this change is just an extension of that filestate backend.

This PR is, in essence, the smallest possible change needed to support these backends. No work has been done to refactor the codebase to make sure things are in the correct place, named more appropriately, or designed to be more performant.


Place1 commented Feb 16, 2019

This PR should address #1966

Place1 added some commits Feb 16, 2019

Place1 referenced this pull request Feb 17, 2019

Open

S3 state backend #1966


jjcollinge commented Feb 17, 2019

Now that go-cloud supports Azure, GCP, and AWS, this is probably the preferred route over my initial PR (#2026). I can't check whether go-cloud also supports blob locking/leasing right now, but it's important for any multi-user use case. Granted, this might be out of the scope of this PR.


Place1 commented Feb 18, 2019

Oh, I didn't even know about your PR @jjcollinge; sorry to just barge in here. I searched and must have overlooked it.

I decided to use go-cloud simply because it was referenced in #1966.


joeduffy commented Feb 18, 2019

This is great 🎉 Thank you for doing this. It's a long weekend here at PulumiHQ, but we'll make sure someone takes a look shortly. I am a big fan of the approach you've taken -- go-cloud is awesome!


jjcollinge commented Feb 18, 2019

@Place1 no worries! I think go-cloud is the right way to go. I was planning on using it in my PR, but it didn't have support for Azure yet, and that's what I needed. So it's great this work has been revisited!


piclemx commented Feb 18, 2019

@jjcollinge I think you're right about the locking mechanism. I think this could be done using go-cloud as well. But it's a great start. Thanks @Place1


Place1 commented Feb 19, 2019

@piclemx thanks. I think the locking should be done as a follow-up PR because it's a little tricky due to the different consistency models of Google, Azure, and AWS:

  • GCS: strong consistency (docs)
  • AzBlob: strong consistency (docs)
  • S3: eventual consistency (docs)

With GCS and AzBlob we could probably just implement a basic lock-file approach (glossing over the zombie-lock-file problem here), because strong read-after-write consistency allows this to work.

S3 would require a different solution, though. Terraform handles this by using a DynamoDB table, which I assume is a workaround for S3's eventually consistent storage model. Personally, I think this solution kinda sucks for a CLI tool: it's yet another thing to deploy before I can use the tool to start deploying... I unfortunately don't have any better ideas. S3 does have read-after-write consistency for PUTs of new objects; perhaps this is enough to work with?
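To illustrate the lock-file idea, a minimal sketch on top of the go-cloud blob API might look like the following. The lock key, the function names, and the check-then-write flow are all assumptions; the check-then-write sequence is not atomic, so two racing clients could both "win", and it glosses over the zombie-lock problem above.

import (
	"context"
	"fmt"

	"gocloud.dev/blob"
)

// lockKey is a hypothetical, fixed object name for the lock file.
const lockKey = ".pulumi/locks/stack.lock"

// acquireLock writes a lock file if one does not already exist.
func acquireLock(ctx context.Context, bucket *blob.Bucket) error {
	exists, err := bucket.Exists(ctx, lockKey)
	if err != nil {
		return err
	}
	if exists {
		return fmt.Errorf("the stack is locked by another update")
	}
	// Not atomic with the check above; a real implementation needs an
	// atomic create (or, on S3, an external service such as DynamoDB).
	return bucket.WriteAll(ctx, lockKey, []byte("locked"), nil)
}

// releaseLock deletes the lock file. With eventually consistent deletes
// (S3), other clients may briefly still see the stack as locked.
func releaseLock(ctx context.Context, bucket *blob.Bucket) error {
	return bucket.Delete(ctx, lockKey)
}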


piclemx commented Feb 19, 2019

@Place1 Yeah, I think we should follow the Terraform way of doing it; I have read the same thing about S3. But like I said, maybe as a first pass that should be enough. We'll see whether everything works or we have some problems with S3.


Place1 commented Feb 20, 2019

@piclemx after thinking about it a little, the worst that can happen is that the state appears locked for longer than it really is. Because S3 has strongly consistent read-after-write for new objects, a newly created lock file will correctly lock out other concurrent users. Eventually consistent deletes of the lock file just mean that other users who receive a stale read think the state is still locked when in reality it isn't; that's not really a disaster, just slow.

Place1 added some commits Mar 3, 2019

renamed IsLocalBackendURL function to IsFileStateBackendURL, replaced localBackendURL with supportedSchemes slice

srs commented Mar 11, 2019

Hi. Just polling the state of this. Any timeline on when this could be merged into master?

Thanks.


ellismg commented Mar 27, 2019

"Hi. Just polling the state of this. Any timeline on when this could be merged into master?"

Sorry for letting this drag on so long; this is 100% on me. I wanted to explicitly say that we're excited about landing these changes, and there isn't a conflict of interest between the open source effort and the company behind Pulumi. The delay is just because we've been focused on other things. So, again, sorry for letting this drag, and we'll try to do better. It is one of my commitments to merge this during this sprint.

One thing we would like to do as part of this larger effort is add some instrumentation to collect data on the number of folks using state storage outside of the Pulumi service, so we understand both how popular the tool is and which individual storage providers get used. So, as part of this sprint we'll also be adding a small event to pulumi login which will record the following information:

  1. The protocol of the backend being used (local filesystem, s3, azure, etc.).
  2. A uniquely generated machine ID stored in ~/.pulumi. We envision using a V4 UUID (so randomly generated, not derived from any machine-specific information). We want this so we can try to get a sense of unique users. We recognize that this really only works on developer machines (for example, when using Pulumi in CI, ~/.pulumi will not persist across runs, so each login will look like a new unique user), but being able to get at least some level of bucketing from long-lived machines will be helpful for us.

When we do this, we'll provide a way to opt out via an environment variable, and won't collect any additional data on any other code-path.

I'm doing a pass right now (in general everything looks great, and we love how it's able to leverage the Google library). Since things have dragged on so long, I'll also help deal with any conflicts to get this up to speed with master.

Thanks again for the submission and your continued patience here.


ellismg left a comment

A small request: please replace the newly added context.Background()s with context.TODO()s so we're honest about the debt here.

I don't really have concerns about the additional metadata files that this storage library is going to add for the local case, but I do want to confirm that for folks using the local backend today, upgrading doesn't hork them (since these files will not be present).

Overall, this looks really great, so thank you for both the addition of this feature and your patience on landing this.

I'm comfortable landing this without any form of admission control, but that can come in a follow-on PR if you or someone else are interested in adding it. For now, we'll just note in the changelog that when using these storage systems, pulumi makes no attempt to ensure that multiple updates for a single stack are not run concurrently; you need some larger orchestrator to handle that, if needed.

}
}

bucket, err := blob.OpenBucket(context.Background(), url)

ellismg Mar 27, 2019

Could we use context.TODO() in all of these places, so we can go back later and actually contextify all of these codepaths?
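For example, the OpenBucket call shown above would become:

bucket, err := blob.OpenBucket(context.TODO(), url) // TODO: plumb a real context through this codepath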

return errors.Wrapf(err, "writing destination object during rename of %s", source)
}

err = bucket.Delete(context.Background(), source)

ellismg Mar 27, 2019

Just wondering if swallowing the error here is the right thing. Logically, the rename operation failed, so I think this should fail as well. My concern is along the following lines: a user does an operation that requires us to rename a file. We call this function, and the file is copied but the old version is not deleted. The overall operation completes correctly, but the next time things run, we see state where both the old and new files exist. This would be surprising, since the previous operation seemed to pass.

For example, in some places we'll call backupTarget, which bottoms out here. If the delete fails, we now have both the new and old versions present. IIRC, in at least one case we call backupTarget in order to "remove" a stack; if the delete here fails, the overall pulumi stack rm invocation would pass, but the next invocation of pulumi stack ls would list the stack, which would be confusing.

ellismg Mar 27, 2019

On a related note, I wonder if we should try to delete the destination file in the case where removing the older file failed. I can see pros and cons of either approach, curious on your thoughts here.
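For illustration, a hypothetical version of the rename helper that propagates a failed delete might look like this; the signature and the read/write/delete shape are assumptions based on the surrounding diff (go-cloud's blob API has no native rename):

import (
	"context"

	"github.com/pkg/errors"
	"gocloud.dev/blob"
)

// renameObject copies source to dest and then deletes source.
func renameObject(ctx context.Context, bucket *blob.Bucket, source, dest string) error {
	data, err := bucket.ReadAll(ctx, source)
	if err != nil {
		return errors.Wrapf(err, "reading source object during rename of %s", source)
	}
	if err := bucket.WriteAll(ctx, dest, data, nil); err != nil {
		return errors.Wrapf(err, "writing destination object during rename of %s", source)
	}
	// Surface a failed delete so the caller knows the rename did not fully
	// complete and both the old and new objects may now exist.
	if err := bucket.Delete(ctx, source); err != nil {
		return errors.Wrapf(err, "deleting source object during rename of %s", source)
	}
	return nil
}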

ellismg self-assigned this Mar 27, 2019

ellismg added this to the 0.22 milestone Mar 27, 2019

ellismg added the priority/P1 label Mar 27, 2019

-func IsLocalBackendURL(url string) bool {
-	return strings.HasPrefix(url, localBackendURLPrefix)
+func IsFileStateBackendURL(url string) bool {
+	for _, scheme := range supportedSchemes {

vangent Mar 27, 2019

FYI, I filed google/go-cloud#1695 to add a function to gocloud.dev/blob that determines if a scheme is supported; then you could replace this with something like:

u, err := url.Parse(urlstr)
if err != nil {
  return false // ? or maybe true to force the error?
}
return blob.DefaultMux().IsValidScheme(u.Scheme)
if strings.HasPrefix(url, "file://") {
// For file:// backend, ensure a relative path is resolved. fileblob only supports absolute paths.
localPath, _ := filepath.Abs(strings.TrimPrefix(url, "file://"))
url = path.Join("file://", localPath)

vangent Mar 27, 2019

path.Join is not what you want here; it drops duplicate slashes:

https://play.golang.org/p/lgnymhswC54

The OpenBucket example here may be helpful.

// On Unix, append the dir to "file://".
// On Windows, convert "\" to "/" and add a leading "/":
dirpath := filepath.ToSlash(dir)
if os.PathSeparator != '/' && !strings.HasPrefix(dirpath, "/") {
    dirpath = "/" + dirpath
}

b, err := blob.OpenBucket(ctx, "file://"+dirpath)
...

retr0h Mar 28, 2019

path.Join could be used, but this is probably a cheesy way to accomplish this.

https://play.golang.org/p/Ih_7Fqm_wzz

vangent Mar 28, 2019

https://play.golang.org/p/5wbCMuJpF7W is closer, but it doesn't do the special Windows handling.

retr0h Mar 28, 2019

oh, that's neat!
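Putting the PR's relative-path handling together with vangent's Windows-aware example, the fix might look something like this sketch (the surrounding control flow and error handling are assumptions, and it assumes the strings and path/filepath imports are already present; plain string concatenation avoids path.Join collapsing "file://" into "file:/"):

if strings.HasPrefix(url, "file://") {
	// For the file:// backend, resolve relative paths; fileblob only
	// supports absolute paths.
	localPath, err := filepath.Abs(strings.TrimPrefix(url, "file://"))
	if err != nil {
		return err
	}
	// Convert "\" to "/" on Windows and ensure a leading "/".
	dirpath := filepath.ToSlash(localPath)
	if !strings.HasPrefix(dirpath, "/") {
		dirpath = "/" + dirpath
	}
	url = "file://" + dirpath
}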

@@ -401,7 +422,7 @@ func (b *localBackend) apply(ctx context.Context, kind apitype.UpdateKind, stack
 fmt.Printf(
 op.Opts.Display.Color.Colorize(
 colors.SpecHeadline+"Permalink: "+
-colors.Underline+colors.BrightBlue+"file://%s"+colors.Reset+"\n"), stack.(*localStack).Path())
+colors.Underline+colors.BrightBlue+"%s%s"+colors.Reset+"\n"), b.urlScheme(), stack.(*localStack).Path())

vangent Mar 27, 2019

What kind of a link is this? Is the user expected to be able to click on it? The gocloud.dev URLs work for gocloud.dev APIs, but not in your browser. If you need the latter, you can use Bucket.SignedURL.

vangent Mar 27, 2019

And what does Path() return?

if err != nil {
return nil, errors.Wrap(err, "could not list bucket")
}
files = append(files, file)

vangent Mar 27, 2019

Nit: would it be clearer to drop directories here rather than in the caller?

}

for _, file := range files {
err = bucket.Delete(context.Background(), file.Key)

vangent Mar 27, 2019

See my comment above; you are not filtering out directories here.

return nil
}

// renameObject renames an object in a bucket. The rename requires a download/upload of the object due to a go-cloud API limitation.

// newer ones. Loop backwards so we add the newest updates to the array we return first.
for i := len(allFiles) - 1; i >= 0; i-- {
	file := allFiles[i]
-	filepath := path.Join(dir, file.Name())
+	filepath := path.Join(dir, objectName(file))

vangent Mar 27, 2019

Is this needed? IIUC, path.Join(dir, objectName(file)) == file.Key...?

-if err = os.MkdirAll(backupDir, 0700); err != nil {
-	return err
+// Ensure the backup directory exists (only required for local filesystem).
+if b.urlScheme() == "file://" {

vangent Mar 27, 2019

Is this needed? I believe fileblob will do it for you as part of the write:
https://github.com/google/go-cloud/blob/master/blob/fileblob/fileblob.go#L435


joeduffy commented Apr 14, 2019

@ellismg Could you provide an ETA overall for when you expect this to be merged?

Place1 added some commits Apr 17, 2019
