Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Initial import of working mutex. #1
Conversation
davecheney
reviewed
Jun 16, 2016
| +) | ||
| + | ||
| +var ( | ||
| + ErrTimeout = errors.New("timeout acquiring mutex") |
fwereade
Jun 16, 2016
It implies that we expect people to take specific action in response to this error, and I don't actually know what that would be. Tim?
howbazaar
Jun 16, 2016
Owner
Acquiring a mutex can fail for other reasons. Yes we don't expect them to fail for those reasons, but they are valid none the less. The caller can at least check that the cause of the error was the Timeout they expected, or something more fatal. That was my reasoning
davecheney
reviewed
Jun 16, 2016
| + "github.com/juju/errors" | ||
| +) | ||
| + | ||
| +type impl struct { |
davecheney
reviewed
Jun 16, 2016
| @@ -0,0 +1,49 @@ | ||
| +// Copyright 2016 Canonical Ltd. |
davecheney
reviewed
Jun 16, 2016
| +} | ||
| + | ||
| +func acquire(name string) (Releaser, error) { | ||
| + fd, err := syscall.Open(flockName(name), syscall.O_CREAT|syscall.O_RDONLY, 0600) |
mjs
Jun 16, 2016
•
Won't a mode of 0600 mean that only processes belonging to the user which first created the lock will be able to interact with it? That's probably OK for the Juju use case but is a bit surprising for a machine-wide lock.
I guess we need to decide whether cross-account interactions are a thing we want this to support.
davecheney
Jun 16, 2016
0600 is probably ok. Not being able to read, or even better, write to the file someone else has an fslock on is good insurance.
howbazaar
Jun 16, 2016
Owner
In reality, all the system flocks are interacted with by root as the user running jujud, or as a user for the config store. As @axw mentioned, the creator of the config store mutex should give it a user based name.
davecheney
reviewed
Jun 16, 2016
| + if err != nil { | ||
| + syscall.Close(fd) | ||
| + | ||
| + if err == syscall.EWOULDBLOCK { |
davecheney
Jun 16, 2016
Why do you call our EWOULDBLOCK here ? Why not just return errors.Annotatef(err, "unable to aquire file lock on %q", name)
howbazaar
Jun 16, 2016
Owner
Because we need to know the difference between the lock currently being locked, and other errors. If it is currently locked, the outer acquire function retries.
davecheney
reviewed
Jun 16, 2016
| +} | ||
| + | ||
| +// Release implements Releaser. | ||
| +func (i *impl) Release() error { |
fwereade
Jun 16, 2016
I think it should just error, not panic; but I think we should consciously error if invoked twice, instead of depending on the syscall.
Dave, what's the benefit of a panic here? All I see are costs, basically...
davecheney
Jun 16, 2016
I disagree. If you cannot release a lock you held, then by definition things are pretty fucked up. You cannot release the lock, and worse, no other process on the machine can aquire the lock.
This is such an extreme condition I think
a. we cannot allow the caller to ignore the error, and in fact I don't think there is a way they can handle it safely. Anything that quietly tucks it into a log file and goes through the internal worker restart process will not fix the sitaution.
b. All our locks are tied to the process, the invariant is, no matter what, killing the process will release the lock. So, if the lock cannot be released (I don't even know how this would happen), then by definition the way to release the lock is to kill the process via panic, or even os.Exit(1) (to make sure the panic is not caught)
fwereade
Jun 16, 2016
I anticipate that the common scenario is a double-release (and I don't know how a real release would fail either); that doesn't seem like a good reason to take down the whole process.
I could probably be swayed into agreeing that an error actually releasing the lock might be nuke-worthy; but the common simple case is murphy's law (of interfaces): if it can be used wrong, it will be used wrong. I'm much more concerned that people will write code paths that rarely double-release than I am that release will actually fail...
davecheney
Jun 16, 2016
How about this then
type Releaser interface {
// Release releases the mutex.
// Release may be called multiple times, but only the first
// call will release this instance of the mutex.
// Release is unable to release the mutex successfully it
// will call panic to forcibly release the mutex.
Release()
}Some justification
- If Release is idempotent, ie it releases the lock then has no effect, it should be clear that calling it mulitple times won't release another instance of the same lock. Ie.
var spec Spec = ...
r := mutex.Acquire(spec)
r.Release()
r2 := mutex.Acquire(spec)
r.Release() // does not thing, and does not affect r2
- My comments about panic'ing if the mutex cannot be released are the same as elsewhere.
howbazaar
Jun 16, 2016
Owner
Actually I quite like @davecheney's interface definition above. The whole point of the mutex is to have a system lock. If it can't be removed for some reason, then taking down the process to release the lock has a certain form of sanity.
|
Need to add a test for different named locks not interfering. |
davecheney
reviewed
Jun 16, 2016
| +} | ||
| + | ||
| +func flockName(name string) string { | ||
| + currentUser := os.Getenv("USER") |
davecheney
Jun 16, 2016
USER is not going to be set very often. Why not drop this and just make it /tmp/juju-mutex-+name
mjs
Jun 16, 2016
•
Again, whether you want the username in there depends on whether this should support cross-account interactions.
If you do want the username in there then os/user.Current() is a more reliable approach. It even works on Windows.
howbazaar
Jun 16, 2016
Owner
I think that the caller should care about splitting out the name, not the mutex.
davecheney
reviewed
Jun 16, 2016
| + | ||
| +const prefix = "@/var/lib/juju/mutex-" | ||
| + | ||
| +type impl struct { |
davecheney
reviewed
Jun 16, 2016
| + } | ||
| + l, err := net.ListenUnix("unix", addr) | ||
| + if err != nil { | ||
| + return nil, errors.Wrap(err, ErrLocked) |
axw
reviewed
Jun 16, 2016
| + if currentUser == "" { | ||
| + currentUser = "client" | ||
| + } | ||
| + return "/tmp/juju-" + currentUser + "-" + name |
axw
Jun 16, 2016
Member
This means different behaviour to the other implementations: name becomes scoped to $USER. It's meant to be machine scoped, right? If the caller wants it scoped by user, then they should include $USER in name.
Also I think the name could do with a bit more context: maybe "/tmp/juju-mutex-" as a prefix. Finally, you should probably use os.TempDir() instead of hard-coding to /tmp.
davecheney
Jun 16, 2016
Excellent point. If this lock is supposed to be user scoped, then the caller should include this in the name of the mutex.
davecheney
reviewed
Jun 16, 2016
| + | ||
| +// Release implements Releaser. | ||
| +func (i *impl) Release() error { | ||
| + return i.socket.Close() |
fwereade
Jun 16, 2016
I agree we shouldn't double-close the socket, but why do we need to panic instead of just informing the client that they're an idiot?
davecheney
reviewed
Jun 16, 2016
| +type Releaser interface { | ||
| + // Release makes the mutex available for others and invalidates the instance. | ||
| + // Subsequent calls to Release will error. | ||
| + Release() error |
davecheney
Jun 16, 2016
•
Why shold release return an error ? is anyone going to check it ? And what would you do if releasing a lock failed ?
Ie, if you could not release a mutex that is held across the whole machine, the process should dump, 'cos that's the only way to recover the lock.
I suggest making release not return an error and panic if it fails.
mjs
Jun 16, 2016
•
Dave: That makes me pretty uncomfortable. Sure, ignore the error if you want but not giving the caller a choice on what should happen (even if only to log the problem and then exit) seems dicey to me.
davecheney
Jun 16, 2016
I disagree. If you cannot release a lock you held, then by definition things are pretty fucked up. You cannot release the lock, and worse, no other process on the machine can aquire the lock.
This is such an extreme condition I think
a. we cannot allow the caller to ignore the error, and in fact I don't think there is a way they can handle it safely. Anything that quietly tucks it into a log file and goes through the internal worker restart process will not fix the sitaution.
b. All our locks are tied to the process, the invariant is, no matter what, killing the process will release the lock. So, if the lock cannot be released (I don't even know how this would happen), then by definition the way to release the lock is to kill the process via panic, or even os.Exit(1) (to make sure the panic is not caught)
fwereade
Jun 16, 2016
Yeah, I don't think that one little utility package really gets to decide that the whole process is unviable just because (most likely scenario) someone deferred a second Release or something.
I know that many people are pretty goddamn lazy about defer ThingThatErrors(), and that does piss me off, but I don't think it's a very good reason to panic... I do think we should expect and anticipate pathologically stupid usage, and surely we're in a perfect position to DTRT by intercepting extra releases?
davecheney
Jun 16, 2016
Calling Release twice should panic for the same reason that closing a channel twice panics. This is a programming error, and laziness begats laziness.
fwereade
Jun 16, 2016
I'm trying to remember a time where I've thought "golly, that panic was a good idea, I'm glad it took down the controller" rather than "GAAAAH FFS". Inability to release a global lock it holds probably does qualify; and if the client was doing something super-evil that we somehow couldn't prevent, I might appreciate that; but a double-release (that we can trivially detect and deflect without panicking) is entirely predictable.
(counterargument: if we don't impose extreme consequences for double-releasing, people will just do it anyway and ignore all errors... and, well, yeah, they could, but that would just be shitty code, which I think is (sadly) unexceptional and (happily) at least addressable.)
davecheney
Jun 16, 2016
I'll concede the double-release argument if we can agree that a failure to release a lock is panic worthy.
davecheney
reviewed
Jun 16, 2016
| + Release() error | ||
| +} | ||
| + | ||
| +// Acquirer defines the standard method to acquire a mutex. |
davecheney
Jun 16, 2016
Please delete this. If some other code needs this it should define it in its package.
davecheney
reviewed
Jun 16, 2016
| + Cancel <-chan struct{} | ||
| +} | ||
| + | ||
| +// Acquire implements Acquirer. |
mjs
Jun 16, 2016
I agree that there should only be one way, either Spec.Acquire or Acquire. Sticking with Spec.Acquire has the benefit of easier mocking via an interface when testing something that takes one of these.
davecheney
Jun 16, 2016
+1, there should be one way. And if it's via Spec.Acquire, then please rename Spec to Mutex so you have
var m mutex.Mutex {
/// spec stuff
}
r, err := mutex.Acquire()
if err { ... }
r.Relaese()
fwereade
Jun 16, 2016
I'd slightly prefer Acquire(spec), FWIW; but I don't really mind.
Strongly favour having just one way to do it.
davecheney
Jun 16, 2016
•
@fwereade same here. One of the design decisions that thumper and I talked about was never having a Mutex which is not locked -- that is, if you have a value which implements Releaser your process holds the lock.
Currently a lot of the fslock code set's up the lock (to coin a phrase) then passes around that *fslock inside a config or a context or some other structure to the point that the calling code wants to grab the lock.
This has the downside that if your function receives an *fslock, you don't know if it's locked or not. And this is a big problem, because this lock is machine wide. Any operation like IsLocked() bool is inherently racy because between the time of asking "is it locked" and deciding "right, it's not locked, I'll go lock it" or worse, assume that it is not locked while the code does something, the lock could be take by any other process on the machine.
The mutex.Spec in this type avoids this ambiguity as it is the "setup" for a lock, and only when you want to grab the lock to you pass the spec to mutex.Acquire() and either grab the lock or don't. That is to say, if you have a Spec, you don't hold the lock, all you have is a description of the lock you'd like to hold.
So, with that said, I'd prefer mutex.Acquire(Spec), rather than func (m *Mutex) Acquire(). Because the latter is oh so tempting to add an IsLocked method.
davecheney
reviewed
Jun 16, 2016
| + impl, err := acquire(spec.Name) | ||
| + if err == nil { | ||
| + return impl, nil | ||
| + } else if errors.Cause(err) != ErrLocked { |
davecheney
Jun 16, 2016
I think the logic should be to try to aquire the lock until the timeout occurs. Any error triggers a retry.
mjs
Jun 16, 2016
•
Why? Then the code is almost certainly going to be spinning around, butting against problem isn't going to go away. Better report any unexpected error immediately.
davecheney
reviewed
Jun 16, 2016
| + case <-timeout: | ||
| + return nil, errors.Trace(ErrTimeout) | ||
| + case <-spec.Cancel: | ||
| + return nil, errors.Trace(ErrCancelled) |
davecheney
commented
Jun 16, 2016
|
Looks pretty good to me, but I want to see the API surface reduced, specifically I want to see the error values unexported. |
|
Key rationale to have an Acquirer interface is to be able to mock out the Mutex. To have something that wants a mutex.Acquirer, but doesn't actually want to interact with the system. I have removed the two ways to do things, and stopped exporting errLocked. I do still think it is valuable though to have ErrTimeout and ErrCancelled.exported. |
davecheney
commented
Jun 16, 2016
•
What is the requirement to mock a mutex? If you want to have a dummy mutex then make up a random spec, heck write a testing helper, |
davecheney
reviewed
Jun 17, 2016
| + | ||
| +// Release implements Releaser. | ||
| +func (m *mutex) Release() { | ||
| + if m.socket == nil { |
davecheney
Jun 17, 2016
Do you support this ?
r := mutex.Acquire()
go r.Release()
r.Release()You may need to add a lock to prevent two Release's happening concurrently as m.socket is not nil'd out until after the call to close.
davecheney
reviewed
Jun 17, 2016
| + | ||
| +// Release implements Releaser. | ||
| +func (m *mutex) Release() { | ||
| + if m.fd == 0 { |
howbazaar commentedJun 16, 2016
Tested on linux, my mac book air, and on windows 10.