lock failed when program restart after crash #25

Closed
HeChuanXUPT opened this issue Jan 30, 2018 · 6 comments

Comments

@HeChuanXUPT
Contributor

The lockfile is not deleted when my program crashes.
The PID stored in the lockfile may then be reused by another process (not my program).
After that, my program fails to acquire the lock unless the lockfile is deleted manually.
System: Windows

Can lockfile check the process name or executable path for the PID?

@tim-seoss

tim-seoss commented Feb 15, 2018

The same issue is present on Linux. This race also gets lost frequently for system services (which start up during system boot, and so often have long-lived processes with similar PIDs present on subsequent boots), which then fail on the next system boot.

You could fix this by:

Storing as much as you can (on the given OS) of:

  • hostname
  • process start time
  • process executable path
  • pid

You then:

Always consider the lock valid if the hostname is different from the one stored in the lock file (since you can't tell whether the process is still running).

If the hostname matches, then only consider the lock valid if a running process matches the stored PID, process start time, and executable path (or as many of those as you have on the given OS).

Maybe store the lock file contents as JSON?
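As a rough illustration of the record and the validity rules above, here is a minimal Go sketch. The `LockInfo` and `ProcInfo` types, the JSON field names, and the `lockValid` function are all hypothetical and are not part of the lockfile package. How the running process's start time and executable path are looked up is OS-specific and is left to the caller via the `lookup` callback.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LockInfo is a hypothetical lock-file record (not the library's format).
// A zero StartUnix or empty ExePath means "not available on this OS".
type LockInfo struct {
	Hostname  string `json:"hostname"`
	PID       int    `json:"pid"`
	StartUnix int64  `json:"start_unix"` // process start time, Unix seconds
	ExePath   string `json:"exe_path"`
}

// ProcInfo is what the caller found out about a running local process.
type ProcInfo struct {
	StartUnix int64
	ExePath   string
}

// lockValid applies the rules from the comment above. lookup must return
// (info, true) when a process with the given PID is running locally.
func lockValid(lock LockInfo, localHost string, lookup func(pid int) (ProcInfo, bool)) bool {
	if lock.Hostname != localHost {
		return true // cannot tell whether the owner is still running
	}
	p, ok := lookup(lock.PID)
	if !ok {
		return false // no such process: the lock is stale
	}
	// Compare only the fields that were actually recorded in the lock file.
	if lock.StartUnix != 0 && lock.StartUnix != p.StartUnix {
		return false
	}
	if lock.ExePath != "" && lock.ExePath != p.ExePath {
		return false
	}
	return true
}

func main() {
	lock := LockInfo{Hostname: "hostA", PID: 4242, StartUnix: 1518681600, ExePath: "/usr/local/bin/svc"}
	data, err := json.Marshal(lock)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))

	// Pretend PID 4242 now belongs to an unrelated program after a reboot.
	reused := func(pid int) (ProcInfo, bool) {
		return ProcInfo{StartUnix: 1518700000, ExePath: "/usr/sbin/other"}, true
	}
	fmt.Println("lock valid:", lockValid(lock, "hostA", reused))
}
```

Skipping fields that were never recorded keeps the check usable on platforms where only some of the information can be obtained.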

I can create some proof-of-concept code for Linux if you're interested?

@nightlyone
Owner

Thanks @tim-seoss for the offer! I would like to see such code. I'm especially looking for ideas on how to parse existing lock files that lack that information and how to do the migration transparently.
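One possible approach to the migration question, sketched in Go under the assumption that existing lock files contain only a decimal PID (optionally followed by whitespace) and that the new format would be a JSON object. `LockInfo` and `parseLock` are hypothetical names, not part of the library. A legacy file simply yields a record with only the PID set, so the remaining fields are treated as unknown.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strconv"
)

// LockInfo is a hypothetical lock-file record; empty fields mean "unknown".
type LockInfo struct {
	Hostname  string `json:"hostname"`
	PID       int    `json:"pid"`
	StartUnix int64  `json:"start_unix"`
	ExePath   string `json:"exe_path"`
}

// parseLock accepts either the assumed new JSON format or the assumed
// legacy format (a bare decimal PID) and returns a LockInfo for both.
func parseLock(data []byte) (LockInfo, error) {
	data = bytes.TrimSpace(data)
	if len(data) > 0 && data[0] == '{' {
		var info LockInfo
		if err := json.Unmarshal(data, &info); err != nil {
			return LockInfo{}, fmt.Errorf("parse JSON lock file: %w", err)
		}
		return info, nil
	}
	pid, err := strconv.Atoi(string(data))
	if err != nil {
		return LockInfo{}, fmt.Errorf("parse legacy lock file: %w", err)
	}
	return LockInfo{PID: pid}, nil // legacy file: only the PID is known
}

func main() {
	for _, raw := range []string{"1234\n", `{"hostname":"hostA","pid":42}`} {
		info, err := parseLock([]byte(raw))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", info)
	}
}
```

Because a JSON object always starts with `{` and a PID never does, the two formats cannot be confused, and a lock file could be rewritten in the new format the next time the lock is acquired.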

@Aulilino

Aulilino commented Jan 9, 2019

First, thanks a lot for your lib.

I am confused about this issue. The core idea of a "lock file" is that the current process LOCKS the file, not that it merely writes its PID into the file. As long as the file is locked, the instance of the program that holds the lock must be running. When that process dies, the lock disappears, and another instance can run and relock the file.

From reading the code, it seems lockfile does not use the mechanism above. Maybe the lib should check whether the process whose PID is written in the PID file really is the expected process.
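For reference, the OS-level mechanism described above could look roughly like this on Unix-like systems, using an advisory `flock(2)` lock via Go's `syscall` package. This is only a sketch: `tryLock` is a hypothetical helper, it does not compile on Windows (where a different API would be needed), and `flock` behaviour on network filesystems varies.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// tryLock opens path (creating it if needed) and takes an exclusive,
// non-blocking advisory lock on it. The kernel releases the lock when the
// file is closed or the process exits, including after a crash.
func tryLock(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		return nil, err // typically EWOULDBLOCK: someone else holds the lock
	}
	return f, nil
}

func main() {
	path := filepath.Join(os.TempDir(), "example.lock")
	f, err := tryLock(path)
	if err != nil {
		fmt.Println("could not take lock:", err)
		return
	}
	defer f.Close() // closing the file releases the lock
	fmt.Println("lock acquired on", path)
}
```

With this approach the lock file itself may stay on disk after a crash, but it no longer counts as locked, so no PID comparison is needed.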

@tim-seoss

@Aulilino To clarify, the lib currently (or at least when I last checked a year ago) makes the following assumptions:

  1. If there is a process in existence with the same PID as in the lock file, then the lock is still valid (because the process which created the file must still be running, and therefore the lock should be honoured).
  2. If there is not a process in existence with the same PID as in the lock file, then the lock is no longer valid, and can be reissued to the calling process.

These assumptions break if:

  • The process ID has been reallocated to another process entirely.
  • The PID gets reallocated to another invocation of the same binary (e.g. typically because of a reboot, but this can also happen, less frequently, by chance in normal operation).
  • The filesystem containing the lock is visible across PID namespaces (e.g. via a network filesystem, or container boundary), so that a still-running process which holds the lock is potentially invisible to the calling process.
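To make the limitation concrete, here is a minimal Unix-only Go sketch of a PID-only liveness check of the kind these assumptions rely on. It is not the library's actual code, and `pidAlive` is a hypothetical name. Sending signal 0 with `kill(2)` only reports whether some process with that PID exists in the caller's PID namespace, so it cannot distinguish any of the three cases listed above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// pidAlive reports whether some process with the given PID exists in the
// caller's PID namespace. It says nothing about which program that is.
func pidAlive(pid int) bool {
	if pid <= 0 {
		return false // 0 and negative values address process groups, not a PID
	}
	err := syscall.Kill(pid, 0) // signal 0: existence and permission check only
	// EPERM means the process exists but belongs to another user.
	return err == nil || errors.Is(err, syscall.EPERM)
}

func main() {
	fmt.Println("own PID alive:", pidAlive(os.Getpid()))
}
```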

@nightlyone
Owner

@tim-seoss thanks for the analysis. The last point about pid namespaces might be a deal breaker for many people.

So I believe this should be documented in a limitations section of the readme as well as in the package doc.

Care to send a pull request for either of these?

@nightlyone
Owner

frozen due to age.

4 participants