Support locking on filestate logins #2697
Conversation
I wonder if we should try to support …
@ellismg I wasn't aware of that command but this definitely seems like something important to support. Does a …
FWIW I read the code; it's just not clear what happens once the cancel command has been sent to the API. Perhaps in the interest of getting this in sooner rather than later, I could implement cancel as a lock-removal step for filestate backends, and a more intelligent notification/cancelling system could be implemented in a separate PR?
I might be forgetting something super-important here, but these are the two main side effects of running
Also note that a …

I would strongly suggest that we ensure …

For example, a naive solution would be to just record the PID for the process that takes the lock. And then have …

I don't have a super-elegant approach off the top of my head. But adding a hook to check the current status of that lock file before/after updating the checkpoint file in the …
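The naive PID idea above could be sketched roughly as follows. This is not code from the PR: `lockContents`, `writeLock`, `checkLock`, and `pidAlive` are hypothetical names, and a PID check like this only works when the competing processes share a machine; it says nothing about a lock taken from another host against a cloud bucket.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"syscall"
)

// lockContents is a hypothetical lock-file payload recording who holds the lock.
type lockContents struct {
	PID      int    `json:"pid"`
	Username string `json:"username"`
}

// pidAlive reports whether a process with the given PID appears to exist.
// On Unix, sending signal 0 probes for existence without actually signaling.
func pidAlive(pid int) bool {
	proc, err := os.FindProcess(pid)
	if err != nil {
		return false
	}
	return proc.Signal(syscall.Signal(0)) == nil
}

// writeLock records the current process in the lock file.
func writeLock(path string) error {
	data, err := json.Marshal(lockContents{PID: os.Getpid(), Username: os.Getenv("USER")})
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// checkLock reads a lock file and reports whether its holder still seems to be running.
func checkLock(path string) (held bool, err error) {
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return false, nil // no lock at all
	}
	if err != nil {
		return false, err
	}
	var lc lockContents
	if err := json.Unmarshal(data, &lc); err != nil {
		return false, err
	}
	return pidAlive(lc.PID), nil
}

func main() {
	path := "/tmp/example.lock"
	if err := writeLock(path); err != nil {
		panic(err)
	}
	held, err := checkLock(path)
	if err != nil {
		panic(err)
	}
	fmt.Println("lock held by live process:", held)
}
```

The obvious gap, as the comment notes, is a machine crash or a multi-host bucket backend, where the PID is meaningless; hence the suggestion of a hook that re-checks the lock around checkpoint writes instead.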
I agree @chrsmith; what probably makes the most sense is to update the lock with some information saying to abort the process, and then delete the lock. Meanwhile, the snapshot process should check the lock for existence and an abort message. I'm suggesting the "abort message" to get around CAP issues with cloud providers, mainly in S3. Some quick looks into the snapshot process make me think that it's going to take me a bit of time to get familiar with that code and implement something like that in a concise way. In the interest of getting this merged sooner rather than later, can we separate some of the cancel support from this MR? What could a minimum …
@bigkraig I think to start we just need to document what happens when you end up in this state. Can the error message that you get when you try to start an update (and another is already in flight) at least tell you which file you need to delete if you are sure the other update has completed?
@ellismg Looks like this now
Force-pushed from 229b997 to 795b001
@ellismg friendly ping
pkg/backend/filestate/lock.go
Outdated

```go
err := b.bucket.Delete(context.TODO(), b.lockPath(stackRef.Name()))
if err != nil {
	logging.Errorf("there was a problem deleting the lock at %v: %v",
		filepath.Join(b.url, b.lockPath(stackRef.Name())), err)
}
```
Should this not return an error as well? Basically, at this point things are in a very bad state, and it looks like manual cleanup is necessary. We may even have to explicitly state that that's the case.
pkg/backend/filestate/backend.go
Outdated

```go
if err != nil {
	return nil, err
}
defer b.Unlock(stackRef)
```
I'm personally not a fan of this pattern where each method needs to be prefixed with this code. It feels brittle to me since it would be easy to forget. I would prefer we do something similar to how we created an intercepting Bucket wrapper around go-cloud's abstraction. In other words, I'd like to see something akin to:

```go
type lockeableBackend struct {
	localBackend *localBackend
}

// all the methods will then look like this:
func (b *lockeableBackend) CreateStack(ctx context.Context, stackRef backend.StackReference, opts interface{}) (backend.Stack, error) {
	err := b.Lock(stackRef)
	if err != nil {
		return nil, err
	}
	defer b.Unlock(stackRef)
	return b.localBackend.CreateStack(ctx, stackRef, opts)
}
```

This means that we can easily see the pattern here and ensure that every operation that needs to be is appropriately locked. It also keeps the actual localBackend code blissfully unaware of locking, and it means we don't have to constantly be auditing it to make sure some new member does the right thing.

With this new pattern, if we needed to add a new localBackend member and we needed to expose it to someone else, we would have to go through this interface and we would have to ask ourselves "what's the locking rule here?" That's a big plus in my mind.
I'm good with this. The `defer b.Unlock` is incompatible with returning an error from `Unlock`, as mentioned earlier. Can we forego the error return there in favor of this pattern?
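For reference, if `Unlock` did keep its error return, Go's named-return idiom would let the deferred unlock surface its failure instead of silently dropping it. This is a generic sketch with a stub type (`backendStub` is invented for illustration), not the PR's code:

```go
package main

import (
	"errors"
	"fmt"
)

// backendStub stands in for a backend whose Unlock can fail.
type backendStub struct{ failUnlock bool }

func (b *backendStub) Unlock() error {
	if b.failUnlock {
		return errors.New("could not remove lock file")
	}
	return nil
}

// CreateStack uses a named return so the deferred Unlock can report its
// error, without masking an error from the body of the method.
func (b *backendStub) CreateStack() (err error) {
	defer func() {
		if uerr := b.Unlock(); uerr != nil && err == nil {
			err = uerr
		}
	}()
	return nil // the actual stack-creation work would go here
}

func main() {
	fmt.Println((&backendStub{failUnlock: true}).CreateStack())
}
```

Dropping the error return entirely, as the comment suggests, trades this plumbing away for simplicity in the wrapper pattern.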
You accidentally added a large binary file to the review. Can you rewrite so it doesn't become part of history? Thanks!
Force-pushed from 47a9985 to 44351f1
Force-pushed from 4ee609b to 1e48dd0
Any idea when this would be released?
@bigkraig I've opened #6437, which builds on the work in this PR but updates it to work with the latest Pulumi, fixes some implementation issues, adds initial basic test coverage, and moves the locking support behind an environment variable initially so that this can be rolled out safely. I've created this in a branch in the …
PR is now waiting for a maintainer to run the acceptance tests. Note for the maintainer: To run the acceptance tests, please comment /run-acceptance-tests on the PR
Will this be released soon?
/run-acceptance-tests
Please view the results of the PR Build + Acceptance Tests Run Here
Seems reasonable, but I wonder if we could actually make the way the lock prefixes work more granular?
/run-acceptance-tests
Please view the results of the PR Build + Acceptance Tests Run Here
Thanks @bigkraig for your work (and patience!) on this PR! 🎉 🚀
@lukehoban Should this flag be documented here? https://www.pulumi.com/docs/reference/cli/environment-variables/
Ping on this for anyone still following this. This wasn't the most pleasant way of finding out that we can have state lock …
@ringods My bad, and thank you for clarifying.
When using the filestate backend (local files and cloud buckets) there is no protection to prevent two processes from managing the same stack simultaneously.
This PR creates a `locks` directory in the management directory that stores lock files for a stack. Each backend implementation gets its own UUID that is joined with the stack name. The feature is currently available behind the `PULUMI_SELF_MANAGED_STATE_LOCKING=1` environment variable flag.

@Place1 This follows the distributed locking idea you had in #2455.