Skip to content
This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

pkg/lock: Verify file exists after lock is taken #3615

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

shobhit85
Copy link

Flock syscall does not check if file/dir pointed to by given fd exists
before taking shared/exclusive locks after blocking for a while on
exclusive in case of taking a shared lock and vice-versa.

Here verification is done by matching inode number pointed to
by fd used for locking and new fd created by reopening the file.

@ghost
Copy link

ghost commented Mar 14, 2017

Can one of the admins verify this patch?

@jonboulle
Copy link
Contributor

@rktbot ok to test

@euank
Copy link
Member

euank commented Mar 14, 2017

Is there a specific issue this fixes? I can definitely believe there's somewhere this logic is wrong, but I'm curious if there's a motivating breakage.

I know it used to be wrong in the gc code, but that was fixed differently (via better error handling in its code).

@shobhit85
Copy link
Author

Not sure what could be broken in rkt due to this bug but we are using lock package at Apcera and have found the bug where blocking calls to Flock return with successful lock even though file has been deleted i.e. fd used for locking points to a nonexistent file.

@lucab
Copy link
Member

lucab commented Mar 14, 2017

@shobhit85 I think you should rebase against current master.

Flock syscall does not check if file/dir pointed to by given fd exists
before taking shared/exclusive locks after blocking for a while on
exclusive in case of taking a shared lock and vice-versa.

Here verification is done by matching inode number pointed to
by fd used for locking and new fd created by reopening the file.
@shobhit85
Copy link
Author

Rebased against latest master.

@shobhit85
Copy link
Author

Not sure why it's broken on semaphoreci.

@lucab
Copy link
Member

lucab commented Mar 15, 2017

@shobhit85 just a flake (no space left on device), I've re-triggered it.

Copy link
Member

@euank euank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence about whether this change is the right one.

This doesn't actually add any new guarantee.

Before or after this change it's perfectly possible that the file does not exist or has been overwritten when the function returns.

All this does is make the race window much smaller for the blocking lock calls (moving it to between verifySameFile and the caller's code).

Arguably the check being done here should be the caller's responsibility since the only way for it to be completely correct is for the caller to be aware of this possibility and handle it appropriately.

On the other hand, because of the blocking nature of the calls, this does make the race much harder to hit.

As a data-point for how this is handled in rkt btw:

  1. Pod paths include uuids, so overwriting/Inode changes just won't happen.
  2. During the cleanup phase (ExclusiveLock -> delete), the pod in question not existing isn't considered an error, so an exclusive lock taken after the pod's deleted isn't treated as an issue.

@@ -64,6 +64,10 @@ func TryExclusiveLock(path string, lockType LockType) (*FileLock, error) {
if err != nil {
return nil, err
}
if err = verifySameFile(l, path); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this check really adds anything in the Try variants of the functions.

@lucab
Copy link
Member

lucab commented Mar 16, 2017

Arguably the check being done here should be the caller's responsibility since the only way for it to be completely correct is for the caller to be aware of this possibility and handle it appropriately.

I think I agree with @euank here. In particular, if flock succedeed the caller is returned a fd, which in @shobhit85 case may point to an open-but-unlinked file/dir and can still be used to:

  • perform file/dir I/O on it
  • call fstat on it to check if device/inode/type/mode matches with some external reference

I don't see many other ways to eliminate such a race from within library code. The only improvement I can see right now to this module is to add additional functions which directly take an fd instead of internally performing a path-based open.

@shobhit85
Copy link
Author

@euank @lucab I agree there is still the race. But change is not to eliminate the race as this the caller's responsibility to check what actions have been taken on a file before it acquired the lock on it. Locks are advisory locks only.

The basic use case that this change is trying to fix is, let's say there are two threads, T1 and T2 and T1 has the lock on a file and doing something with it (may be deleting it). In meanwhile T2 is blocked on the lock (using fd referring the same file). As soon as T1 is done, T2 gets the lock (flock returns). T2 assumes file is there as lock is taken. Why? because caller asked for the lock using path.

As this library is path based, user is not aware (or should not require the awareness) of the internals that the lock is taken using fd and file being referred by the fd might be gone. So caller assumes locking would fail if path doesn't exist.

The race where third thread could still delete a file without taking a lock while T1, T2 are syncing on the locks, is still there because locks are advisory only. There is no race if threads in question are using locks to operate on a file.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants