pkg/lock: Verify file exists after lock is taken #3615

shobhit85 · 2017-03-14T17:58:29Z

The flock syscall does not check whether the file/dir referred to by the
given fd still exists when it grants a shared or exclusive lock. This
matters when the call has blocked for a while, e.g. a shared lock request
waiting on an exclusive lock or vice versa: the lock can be granted on a
file that was deleted in the meantime.

This change verifies the file by comparing the inode number behind the fd
used for locking with that of a new fd created by reopening the path.

ghost · 2017-03-14T17:58:33Z

Can one of the admins verify this patch?

jonboulle · 2017-03-14T18:06:31Z

@rktbot ok to test

euank · 2017-03-14T18:32:22Z

Is there a specific issue this fixes? I can definitely believe there's somewhere this logic is wrong, but I'm curious if there's a motivating breakage.

I know it used to be wrong in the gc code, but that was fixed differently (via better error handling in its code).

shobhit85 · 2017-03-14T18:45:39Z

I'm not sure what could be broken in rkt due to this bug, but we use the lock package at Apcera and found that blocking calls to Flock return with a successful lock even though the file has been deleted, i.e. the fd used for locking points to a nonexistent file.

lucab · 2017-03-14T20:26:19Z

@shobhit85 I think you should rebase against current master.

shobhit85 · 2017-03-14T21:17:37Z

Rebased against latest master.

shobhit85 · 2017-03-15T01:41:15Z

Not sure why it's broken on semaphoreci.

lucab · 2017-03-15T08:20:59Z

@shobhit85 just a flake (no space left on device), I've re-triggered it.

euank

I'm on the fence about whether this change is the right one.

This doesn't actually add any new guarantee.

Before or after this change it's perfectly possible that the file does not exist or has been overwritten when the function returns.

All this does is make the race window much smaller for the blocking lock calls (moving it to between verifySameFile and the caller's code).

Arguably the check being done here should be the caller's responsibility since the only way for it to be completely correct is for the caller to be aware of this possibility and handle it appropriately.

On the other hand, because of the blocking nature of the calls, this does make the race much harder to hit.

As a data-point for how this is handled in rkt btw:

- Pod paths include uuids, so overwriting/inode changes just won't happen.
- During the cleanup phase (ExclusiveLock -> delete), the pod in question not existing isn't considered an error, so an exclusive lock taken after the pod's deleted isn't treated as an issue.

euank · 2017-03-16T05:43:38Z

pkg/lock/file.go

@@ -64,6 +64,10 @@ func TryExclusiveLock(path string, lockType LockType) (*FileLock, error) {
 	if err != nil {
 		return nil, err
 	}
+	if err = verifySameFile(l, path); err != nil {


I don't think this check really adds anything in the Try variants of the functions.

lucab · 2017-03-16T09:59:48Z

> Arguably the check being done here should be the caller's responsibility since the only way for it to be completely correct is for the caller to be aware of this possibility and handle it appropriately.

I think I agree with @euank here. In particular, if flock succeeded the caller gets back an fd, which in @shobhit85's case may point to an open-but-unlinked file/dir and can still be used to:

- perform file/dir I/O on it
- call fstat on it to check whether device/inode/type/mode match some external reference

I don't see many other ways to eliminate such a race from within library code. The only improvement I can see right now to this module is to add additional functions which directly take an fd instead of internally performing a path-based open.

shobhit85 · 2017-03-16T19:21:37Z

@euank @lucab I agree the race is still there. But the change is not meant to eliminate the race, as it is the caller's responsibility to check what actions were taken on a file before it acquired the lock on it. The locks are advisory only.

The basic use case this change is trying to fix is the following. Say there are two threads, T1 and T2. T1 holds the lock on a file and is doing something with it (maybe deleting it). Meanwhile T2 is blocked on the lock (using an fd referring to the same file). As soon as T1 is done, T2 gets the lock (flock returns). T2 assumes the file is there because the lock was taken. Why? Because the caller asked for the lock by path.

As this library is path-based, the user is not aware (and should not need to be aware) of the internal detail that the lock is taken on an fd and that the file the fd refers to might be gone. So the caller assumes locking would fail if the path doesn't exist.

The race where a third thread deletes the file without taking a lock, while T1 and T2 are synchronizing on the locks, is still there because the locks are advisory only. There is no race if all the threads in question use locks to operate on the file.

jonboulle mentioned this pull request Mar 14, 2017

pkg/lock: Close the file descriptor on unsuccessful locking #3616

Open

shobhit85 force-pushed the pkg-lock-invalid-fd branch from 225e629 to 0d49706 Compare March 14, 2017 21:15

euank reviewed Mar 16, 2017

lucab added component/stage0 needs/more-information labels Mar 16, 2017
