Add support for state in cloud object storage (S3, GCS, Azure) #2455
This PR introduces support for storing Pulumi state in cloud object storage. The implementation uses go-cloud's blob API.

The following backends are now supported (URL schemes follow go-cloud's drivers):

- `s3://` (AWS S3)
- `gs://` (Google Cloud Storage)
- `azblob://` (Azure Blob Storage)
The original file storage is unchanged: the local `file://` layout stays the same, so this should not be a breaking change for existing users. Go Cloud's `file://` driver does create additional metadata files alongside the state files, however.
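To make the scheme support concrete, here is a minimal sketch (not code from this PR; the function name is hypothetical) of how a backend URL could be recognized as one of the supported schemes:

```go
package main

import (
	"fmt"
	"net/url"
)

// isSupportedStateURL reports whether a backend URL uses one of the
// blob schemes this PR wires up, plus the existing file:// backend.
// The helper is illustrative only; the PR itself delegates URL
// handling to go-cloud.
func isSupportedStateURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	switch u.Scheme {
	case "s3", "gs", "azblob", "file":
		return true
	}
	return false
}

func main() {
	fmt.Println(isSupportedStateURL("s3://my-pulumi-state")) // true
	fmt.Println(isSupportedStateURL("https://example.com"))  // false
}
```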
Authentication for the cloud backends is handled by go-cloud and follows the standard mechanisms of each ecosystem. go-cloud has documentation on this that Pulumi will probably need to reference or quote directly in its own docs.
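As a rough illustration of what "standard for the respective ecosystem" means in practice, these are the usual environment variables each provider's SDK picks up (treat the Azure pair as an assumption about the azblob driver; consult the go-cloud docs for the authoritative list):

```shell
# AWS (s3://): standard SDK credentials and region
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1

# Google Cloud (gs://): path to a service account key file
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

# Azure (azblob://): storage account name and key
export AZURE_STORAGE_ACCOUNT=...
export AZURE_STORAGE_KEY=...
```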
The cloud storage is not safe for concurrent use. I plan to add support for a distributed lock (a lock file) in the bucket, so that only one deployment can run at a time. This may need to happen as future work, though, because it is a little tricky (how do you handle timeouts and failures, for example?).
The file layout in the cloud storage bucket is identical to the existing layout of the `file://` backend, since this change is just an extension of that filestate backend.
This PR is, in essence, the smallest possible change that supports these backends. No work has been done to refactor the codebase to put things in the right place, name them more appropriately, or make them more performant.
Now that go-cloud supports Azure, GCP, and AWS, this is probably the preferred route over my initial PR (#2026). I can't check right now whether go-cloud also supports blob locking/leasing, but that's important for any multi-user use case. Granted, this might be out of scope for this PR.
@piclemx Thanks. I think the locking should be done in a follow-up PR, because it is a little tricky due to the different consistency models of Google, Azure, and AWS.
With GCS and Azure Blob Storage we could probably implement a basic lock-file approach (glossing over the zombie lock file problem here), because their strong read-after-write consistency allows this to work.

S3 would require a different solution, though. Terraform handles this with a DynamoDB table, which I assume is a workaround for S3's eventually consistent storage model. Personally, I find that solution unappealing for a CLI tool: it's yet another thing to deploy before I can use the tool to start deploying. Unfortunately, I don't have any better ideas. S3 does offer read-after-write consistency for PUTs of new objects; perhaps that is enough to work with?

@piclemx After thinking about it a little, the worst that can happen is that the state appears locked for longer than it really is. Because S3 has strongly consistent read-after-write for new objects, a newly created lock file will correctly lock out other concurrent users. Eventually consistent deletes of the lock file just mean that other users who receive a stale read on the lock think the state is still locked when it actually isn't. That's not a disaster, just slow.
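To make the argument above concrete, here is a minimal sketch of the lock-file idea over an in-memory stand-in for the bucket (a real implementation would use go-cloud's `blob.Bucket`; all names here are hypothetical, and the zombie-lock and timeout questions are deliberately glossed over):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// lockBucket is a tiny in-memory stand-in for an object store,
// used only to illustrate the create-if-absent lock-file idea.
type lockBucket struct {
	mu   sync.Mutex
	objs map[string][]byte
}

func newLockBucket() *lockBucket {
	return &lockBucket{objs: map[string][]byte{}}
}

var errLocked = errors.New("state is locked by another deployment")

// tryLock writes a lock object only if none exists. With S3's
// consistent read-after-write for new PUTs, a successful create
// reliably blocks other writers; a stale read of an already
// deleted lock merely delays the next writer, it never corrupts
// state -- which is the "slow, not a disaster" point above.
func (b *lockBucket) tryLock(key, owner string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, exists := b.objs[key]; exists {
		return errLocked
	}
	b.objs[key] = []byte(owner + " " + time.Now().UTC().Format(time.RFC3339))
	return nil
}

func (b *lockBucket) unlock(key string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.objs, key)
}

func main() {
	bkt := newLockBucket()
	fmt.Println(bkt.tryLock(".pulumi/locks/dev", "alice") == nil) // true: lock acquired
	fmt.Println(bkt.tryLock(".pulumi/locks/dev", "bob") == nil)   // false: already held
	bkt.unlock(".pulumi/locks/dev")
	fmt.Println(bkt.tryLock(".pulumi/locks/dev", "bob") == nil)   // true: reacquired
}
```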
Sorry for letting this drag on so long. This is 100% on me. I wanted to explicitly say that we're excited about landing these changes, and there isn't a conflict of interest between the open source effort and the company behind Pulumi. The delay is just because we've been focused on other stuff. So, again, sorry for letting this drag and we'll try to do better. It is one of my commitments to merge this during this sprint.
One thing we would like to do as part of this larger effort is add some instrumentation to collect data on the number of folks using state storage outside of the Pulumi service, so we understand both how popular the tool is and which individual storage providers get used. So, as part of this sprint, we'll also be adding a small event that records the storage backend in use.
When we do this, we'll provide a way to opt out via an environment variable, and we won't collect any additional data on any other code path.
I'm doing a pass right now (in general everything looks great, and we love how it's able to leverage the Google library). Since things have dragged on so long, I'll also help deal with any conflicts to bring this up to speed with master.
Thanks again for the submission and your continued patience here.
ellismg left a comment
A small request to replace the newly added
I don't really have concerns about the additional metadata files that this storage library is going to add for the local case, but I do want to confirm that for folks using the local backend today, upgrading doesn't hork them (since these files will not be present).
Overall, this looks really great, so thank you for both the addition of this feature and your patience on landing this.
I'm comfortable landing this without any form of admission control; that can come in a follow-on PR if you or someone else are interested in adding it. For now, we'll just note in the changelog that when using these storage systems, Pulumi makes no attempt to ensure that multiple updates to a single stack are not run concurrently, and you need some larger orchestrator to handle that, if needed.