Initial import of working mutex. #1

Merged
merged 1 commit into from Jun 17, 2016

Conversation

Projects
None yet
5 participants
Owner

howbazaar commented Jun 16, 2016

Tested on linux, my mac book air, and on windows 10.

+)
+
+var (
+ ErrTimeout = errors.New("timeout acquiring mutex")
@davecheney

davecheney Jun 16, 2016

Why do these need to be exported ?

@fwereade

fwereade Jun 16, 2016

It implies that we expect people to take specific action in response to this error, and I don't actually know what that would be. Tim?

@howbazaar

howbazaar Jun 16, 2016

Owner

Acquiring a mutex can fail for other reasons. Yes we don't expect them to fail for those reasons, but they are valid none the less. The caller can at least check that the cause of the error was the Timeout they expected, or something more fatal. That was my reasoning

@fwereade

fwereade Jun 17, 2016

makes sense to me

impl_flock.go
+ "github.com/juju/errors"
+)
+
+type impl struct {
impl_flock.go
@@ -0,0 +1,49 @@
+// Copyright 2016 Canonical Ltd.
@davecheney

davecheney Jun 16, 2016

rename to mutex_flock.go please

impl_flock.go
+}
+
+func acquire(name string) (Releaser, error) {
+ fd, err := syscall.Open(flockName(name), syscall.O_CREAT|syscall.O_RDONLY, 0600)
@davecheney

davecheney Jun 16, 2016

Why not use os.Create(); then os.Fd() ?

@howbazaar

howbazaar Jun 16, 2016

Owner

Because I copied it from elsewhere.

@mjs

mjs Jun 16, 2016

Won't a mode of 0600 mean that only processes belonging to the user which first created the lock will be able to interact with it? That's probably OK for the Juju use case but is a bit surprising for a machine-wide lock.

I guess we need to decide whether cross-account interactions are a thing we want this to support.

@davecheney

davecheney Jun 16, 2016

0600 is probably ok. Not being able to read, or even better, write to the file someone else has an fslock on is good insurance.

@howbazaar

howbazaar Jun 16, 2016

Owner

In reality, all the system flocks are interacted with by root as the user running jujud, or as a user for the config store. As @axw mentioned, the creator of the config store mutex should give it a user based name.

impl_flock.go
+ if err != nil {
+ syscall.Close(fd)
+
+ if err == syscall.EWOULDBLOCK {
@davecheney

davecheney Jun 16, 2016

Why do you call our EWOULDBLOCK here ? Why not just return errors.Annotatef(err, "unable to aquire file lock on %q", name)

@howbazaar

howbazaar Jun 16, 2016

Owner

Because we need to know the difference between the lock currently being locked, and other errors. If it is currently locked, the outer acquire function retries.

impl_flock.go
+}
+
+// Release implements Releaser.
+func (i *impl) Release() error {
@davecheney

davecheney Jun 16, 2016

This should panic if release is called twice.

@howbazaar

howbazaar Jun 16, 2016

Owner

Errors are fine, panics are bad.

@fwereade

fwereade Jun 16, 2016

I think it should just error, not panic; but I think we should consciously error if invoked twice, instead of depending on the syscall.

Dave, what's the benefit of a panic here? All I see are costs, basically...

@davecheney

davecheney Jun 16, 2016

I disagree. If you cannot release a lock you held, then by definition things are pretty fucked up. You cannot release the lock, and worse, no other process on the machine can aquire the lock.

This is such an extreme condition I think

a. we cannot allow the caller to ignore the error, and in fact I don't think there is a way they can handle it safely. Anything that quietly tucks it into a log file and goes through the internal worker restart process will not fix the sitaution.

b. All our locks are tied to the process, the invariant is, no matter what, killing the process will release the lock. So, if the lock cannot be released (I don't even know how this would happen), then by definition the way to release the lock is to kill the process via panic, or even os.Exit(1) (to make sure the panic is not caught)

@fwereade

fwereade Jun 16, 2016

I anticipate that the common scenario is a double-release (and I don't know how a real release would fail either); that doesn't seem like a good reason to take down the whole process.

I could probably be swayed into agreeing that an error actually releasing the lock might be nuke-worthy; but the common simple case is murphy's law (of interfaces): if it can be used wrong, it will be used wrong. I'm much more concerned that people will write code paths that rarely double-release than I am that release will actually fail...

@davecheney

davecheney Jun 16, 2016

How about this then

type Releaser interface {
        // Release releases the mutex.
        // Release may be called multiple times, but only the first
        // call will release this instance of the mutex.
        // Release is unable to release the mutex successfully it
        // will call panic to forcibly release the mutex.
        Release()
}

Some justification

  1. If Release is idempotent, ie it releases the lock then has no effect, it should be clear that calling it mulitple times won't release another instance of the same lock. Ie.
var spec Spec = ...
r := mutex.Acquire(spec)
r.Release()
r2 := mutex.Acquire(spec)
r.Release() // does not thing, and does not affect r2
  1. My comments about panic'ing if the mutex cannot be released are the same as elsewhere.
@howbazaar

howbazaar Jun 16, 2016

Owner

Actually I quite like @davecheney's interface definition above. The whole point of the mutex is to have a system lock. If it can't be removed for some reason, then taking down the process to release the lock has a certain form of sanity.

Owner

howbazaar commented Jun 16, 2016

Need to add a test for different named locks not interfering.

impl_flock.go
+}
+
+func flockName(name string) string {
+ currentUser := os.Getenv("USER")
@davecheney

davecheney Jun 16, 2016

USER is not going to be set very often. Why not drop this and just make it /tmp/juju-mutex-+name

@mjs

mjs Jun 16, 2016

Again, whether you want the username in there depends on whether this should support cross-account interactions.

If you do want the username in there then os/user.Current() is a more reliable approach. It even works on Windows.

@howbazaar

howbazaar Jun 16, 2016

Owner

I think that the caller should care about splitting out the name, not the mutex.

impl_linux.go
+
+const prefix = "@/var/lib/juju/mutex-"
+
+type impl struct {
impl_linux.go
+ }
+ l, err := net.ListenUnix("unix", addr)
+ if err != nil {
+ return nil, errors.Wrap(err, ErrLocked)
@davecheney

davecheney Jun 16, 2016

This is different to the flock implementation, please make it the same.

impl_flock.go
+ if currentUser == "" {
+ currentUser = "client"
+ }
+ return "/tmp/juju-" + currentUser + "-" + name
@axw

axw Jun 16, 2016

Member

This means different behaviour to the other implementations: name becomes scoped to $USER. It's meant to be machine scoped, right? If the caller wants it scoped by user, then they should include $USER in name.

Also I think the name could do with a bit more context: maybe "/tmp/juju-mutex-" as a prefix. Finally, you should probably use os.TempDir() instead of hard-coding to /tmp.

@davecheney

davecheney Jun 16, 2016

Excellent point. If this lock is supposed to be user scoped, then the caller should include this in the name of the mutex.

@mjs

mjs Jun 16, 2016

This ties in to what I've been saying above.

impl_linux.go
+
+// Release implements Releaser.
+func (i *impl) Release() error {
+ return i.socket.Close()
@davecheney

davecheney Jun 16, 2016

this must panic if release is called twice.

@fwereade

fwereade Jun 16, 2016

I agree we shouldn't double-close the socket, but why do we need to panic instead of just informing the client that they're an idiot?

@davecheney

davecheney Jun 16, 2016

At least nil out the socke.

@fwereade

fwereade Jun 16, 2016

yeah, strong +1 to writing these to expect and handle misuse

mutex.go
+type Releaser interface {
+ // Release makes the mutex available for others and invalidates the instance.
+ // Subsequent calls to Release will error.
+ Release() error
@davecheney

davecheney Jun 16, 2016

Why shold release return an error ? is anyone going to check it ? And what would you do if releasing a lock failed ?

Ie, if you could not release a mutex that is held across the whole machine, the process should dump, 'cos that's the only way to recover the lock.

I suggest making release not return an error and panic if it fails.

@mjs

mjs Jun 16, 2016

Dave: That makes me pretty uncomfortable. Sure, ignore the error if you want but not giving the caller a choice on what should happen (even if only to log the problem and then exit) seems dicey to me.

@davecheney

davecheney Jun 16, 2016

I disagree. If you cannot release a lock you held, then by definition things are pretty fucked up. You cannot release the lock, and worse, no other process on the machine can aquire the lock.

This is such an extreme condition I think

a. we cannot allow the caller to ignore the error, and in fact I don't think there is a way they can handle it safely. Anything that quietly tucks it into a log file and goes through the internal worker restart process will not fix the sitaution.

b. All our locks are tied to the process, the invariant is, no matter what, killing the process will release the lock. So, if the lock cannot be released (I don't even know how this would happen), then by definition the way to release the lock is to kill the process via panic, or even os.Exit(1) (to make sure the panic is not caught)

@fwereade

fwereade Jun 16, 2016

Yeah, I don't think that one little utility package really gets to decide that the whole process is unviable just because (most likely scenario) someone deferred a second Release or something.

I know that many people are pretty goddamn lazy about defer ThingThatErrors(), and that does piss me off, but I don't think it's a very good reason to panic... I do think we should expect and anticipate pathologically stupid usage, and surely we're in a perfect position to DTRT by intercepting extra releases?

@davecheney

davecheney Jun 16, 2016

Calling Release twice should panic for the same reason that closing a channel twice panics. This is a programming error, and laziness begats laziness.

@fwereade

fwereade Jun 16, 2016

I'm trying to remember a time where I've thought "golly, that panic was a good idea, I'm glad it took down the controller" rather than "GAAAAH FFS". Inability to release a global lock it holds probably does qualify; and if the client was doing something super-evil that we somehow couldn't prevent, I might appreciate that; but a double-release (that we can trivially detect and deflect without panicking) is entirely predictable.

(counterargument: if we don't impose extreme consequences for double-releasing, people will just do it anyway and ignore all errors... and, well, yeah, they could, but that would just be shitty code, which I think is (sadly) unexceptional and (happily) at least addressable.)

@davecheney

davecheney Jun 16, 2016

I'll concede the double-release argument if we can agree that a failure to release a lock is panic worthy.

mutex.go
+ Release() error
+}
+
+// Acquirer defines the standard method to acquire a mutex.
@davecheney

davecheney Jun 16, 2016

Please delete this. If some other code needs this it should define it in its package.

mutex.go
+ Cancel <-chan struct{}
+}
+
+// Acquire implements Acquirer.
@davecheney

davecheney Jun 16, 2016

Why does this exist ? Now there are two ways of doing the same thing.

@mjs

mjs Jun 16, 2016

I agree that there should only be one way, either Spec.Acquire or Acquire. Sticking with Spec.Acquire has the benefit of easier mocking via an interface when testing something that takes one of these.

@davecheney

davecheney Jun 16, 2016

+1, there should be one way. And if it's via Spec.Acquire, then please rename Spec to Mutex so you have

var m mutex.Mutex {
   /// spec stuff
}

r, err := mutex.Acquire()
if err { ... }

r.Relaese()
@fwereade

fwereade Jun 16, 2016

I'd slightly prefer Acquire(spec), FWIW; but I don't really mind.

Strongly favour having just one way to do it.

@davecheney

davecheney Jun 16, 2016

@fwereade same here. One of the design decisions that thumper and I talked about was never having a Mutex which is not locked -- that is, if you have a value which implements Releaser your process holds the lock.

Currently a lot of the fslock code set's up the lock (to coin a phrase) then passes around that *fslock inside a config or a context or some other structure to the point that the calling code wants to grab the lock.

This has the downside that if your function receives an *fslock, you don't know if it's locked or not. And this is a big problem, because this lock is machine wide. Any operation like IsLocked() bool is inherently racy because between the time of asking "is it locked" and deciding "right, it's not locked, I'll go lock it" or worse, assume that it is not locked while the code does something, the lock could be take by any other process on the machine.

The mutex.Spec in this type avoids this ambiguity as it is the "setup" for a lock, and only when you want to grab the lock to you pass the spec to mutex.Acquire() and either grab the lock or don't. That is to say, if you have a Spec, you don't hold the lock, all you have is a description of the lock you'd like to hold.

So, with that said, I'd prefer mutex.Acquire(Spec), rather than func (m *Mutex) Acquire(). Because the latter is oh so tempting to add an IsLocked method.

@fwereade

fwereade Jun 17, 2016

yep, heartily concur.

mutex.go
+ impl, err := acquire(spec.Name)
+ if err == nil {
+ return impl, nil
+ } else if errors.Cause(err) != ErrLocked {
@davecheney

davecheney Jun 16, 2016

I think the logic should be to try to aquire the lock until the timeout occurs. Any error triggers a retry.

@mjs

mjs Jun 16, 2016

Why? Then the code is almost certainly going to be spinning around, butting against problem isn't going to go away. Better report any unexpected error immediately.

@davecheney

davecheney Jun 16, 2016

Ok, but please make it errLocked so that doesn't bubble up to the caller.

mutex.go
+ case <-timeout:
+ return nil, errors.Trace(ErrTimeout)
+ case <-spec.Cancel:
+ return nil, errors.Trace(ErrCancelled)
@davecheney

davecheney Jun 16, 2016

Why does the calling code care about this ?

Looks pretty good to me, but I want to see the API surface reduced, specifically I want to see the error values unexported.

Owner

howbazaar commented Jun 16, 2016

Key rationale to have an Acquirer interface is to be able to mock out the Mutex. To have something that wants a mutex.Acquirer, but doesn't actually want to interact with the system.

I have removed the two ways to do things, and stopped exporting errLocked.

I do still think it is valuable though to have ErrTimeout and ErrCancelled.exported.

davecheney commented Jun 16, 2016

Key rationale to have an Acquirer interface is to be able to mock out the Mutex. To have something that wants a mutex.Acquirer, but doesn't actually want to interact with the system.

What is the requirement to mock a mutex? If you want to have a dummy mutex then make up a random spec, heck write a testing helper, func UniqueSpec() Spec, that does it for you.

mutex_linux.go
+
+// Release implements Releaser.
+func (m *mutex) Release() {
+ if m.socket == nil {
@davecheney

davecheney Jun 17, 2016

Do you support this ?

r := mutex.Acquire()
go r.Release()
r.Release()

You may need to add a lock to prevent two Release's happening concurrently as m.socket is not nil'd out until after the call to close.

mutex_flock.go
+
+// Release implements Releaser.
+func (m *mutex) Release() {
+ if m.fd == 0 {
@davecheney

davecheney Jun 17, 2016

See comment about other release implementation

@howbazaar howbazaar merged commit 59c26ee into juju:master Jun 17, 2016

@howbazaar howbazaar deleted the howbazaar:initial-work branch Jun 17, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment