New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HttpKernel] Use flock() for HttpCache's lock files #19300

Merged
merged 1 commit into from Jul 28, 2016

Conversation

Projects
None yet
4 participants
@mpdude
Contributor

mpdude commented Jul 6, 2016

Q A
Branch? 2.7
Bug fix? yes
New feature? no
BC breaks? no
Deprecations? no
Tests pass? no
Fixed tickets #16777, #15813 and #16312 are also related
License MIT
Doc PR

When a PHP process crashes or terminates (maybe the OOM killer kicks in or other bad things ™️ happen) while the HttpCache holds a .lck file, that lock file may not get unlink()ed.

The result is that other requests trying to access this cache entry will see a few seconds delay while waiting for the lock; they will eventually continue but send 503 status codes along with the response. The sudden buildup of PHP processes caused by the additional delay may cause further problems (sudden load increase).

As LockHandler is using flock()-based locking, locks should be released by the OS when the PHP process terminates.

I wrote this as bugfix against 2.7 because every once in a while I encounter situations (not always reproducible) where .lock files are left over and keep the cache locked.

@mpdude

This comment has been minimized.

Show comment
Hide comment
@mpdude

mpdude Jul 6, 2016

Contributor

Appveyor tests fail because HttpCacheTestCase tries to clear directories and lacks permissions to unlink LockHandler's lockfiles.

Contributor

mpdude commented Jul 6, 2016

Appveyor tests fail because HttpCacheTestCase tries to clear directories and lacks permissions to unlink LockHandler's lockfiles.

@mpdude

This comment has been minimized.

Show comment
Hide comment
@mpdude

mpdude Jul 6, 2016

Contributor

@nicolas-grekas you seem to be the master of tests here. How would you approach this?

Contributor

mpdude commented Jul 6, 2016

@nicolas-grekas you seem to be the master of tests here. How would you approach this?

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas

nicolas-grekas Jul 6, 2016

Member

On Windows, a file can't be unlinked until all handlers to it have been closed.

Member

nicolas-grekas commented Jul 6, 2016

On Windows, a file can't be unlinked until all handlers to it have been closed.

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas

nicolas-grekas Jul 17, 2016

Member

Not sure this is the right approach: a real lock file will create another race condition unless the lock file is not unlinked (the race being when a lock is acquired but another process unlinks the just locked file).
Let's assume we don't want to leave lock files behind, we can't use flock() to fix the race...
Would a shutdown registered function help instead? Or do you really encounter a situation where your process is abruptly killed? Then what about adding a timeout on lock files, based on filemtime()?

Member

nicolas-grekas commented Jul 17, 2016

Not sure this is the right approach: a real lock file will create another race condition unless the lock file is not unlinked (the race being when a lock is acquired but another process unlinks the just locked file).
Let's assume we don't want to leave lock files behind, we can't use flock() to fix the race...
Would a shutdown registered function help instead? Or do you really encounter a situation where your process is abruptly killed? Then what about adding a timeout on lock files, based on filemtime()?

@mpdude

This comment has been minimized.

Show comment
Hide comment
@mpdude

mpdude Jul 17, 2016

Contributor

@nicolas-grekas Not sure I got your above comment right 😦

My intention was to use LockHandler because it applies flock() based locking, i. e. the lock will be released (by the OS) when the process holding it crashes/terminates/exits.

In fact LockHandler leaves the file used for flock() behind, but that should not matter.

Regarding the Windows tests, for some reason LockHandler seems to still hold references to a file when some tearDown() method tries to clean up a directory.

Do you have any idea how I could debug this? I don't have a Windows machine with PHP set up anywhere near... :-(

Contributor

mpdude commented Jul 17, 2016

@nicolas-grekas Not sure I got your above comment right 😦

My intention was to use LockHandler because it applies flock() based locking, i. e. the lock will be released (by the OS) when the process holding it crashes/terminates/exits.

In fact LockHandler leaves the file used for flock() behind, but that should not matter.

Regarding the Windows tests, for some reason LockHandler seems to still hold references to a file when some tearDown() method tries to clean up a directory.

Do you have any idea how I could debug this? I don't have a Windows machine with PHP set up anywhere near... :-(

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas

nicolas-grekas Jul 18, 2016

Member

In fact LockHandler leaves the file used for flock() behind, but that should not matter.

We need to be sure of that. It will leave standalone files forever, and their number can grow large. Does it really not matter?

If we assume yes (which need confirmation), then maybe use flock directly here? The LockHandler only adds boilerplate to me in this case.

Member

nicolas-grekas commented Jul 18, 2016

In fact LockHandler leaves the file used for flock() behind, but that should not matter.

We need to be sure of that. It will leave standalone files forever, and their number can grow large. Does it really not matter?

If we assume yes (which need confirmation), then maybe use flock directly here? The LockHandler only adds boilerplate to me in this case.

@mpdude

This comment has been minimized.

Show comment
Hide comment
@mpdude

mpdude Jul 18, 2016

Contributor

I am using LockHandler in various other places and yes, it leaves sf.{someId}.lock files in the filesystem also after you release a lock and/or the process terminates. Also, LockHandler does not call unlink() anywhere.

This PR places the lock files in the same directory as the cache entry in question. So I'd say yes, this can double the amount of files present in the http_cache directory.

a real lock file will create another race condition unless the lock file is not unlinked (the race being when a lock is acquired but another process unlinks the just locked file).

Now I get that: You cannot safely unlink() the lock file, because you'd need to release the lock first; and after that, you cannot tell whether another process obtained another lock that prevents us from deleting the file. Probably that's why LockHandler just leaves the files alone.

The LockHandler only adds boilerplate to me in this case.

My assumption was that all it does was to handle the edge-cases and platform inconsistencies of using file locking, tested and packed up behind a simple API. Don't you think I'd have to reinvent all this if I refrain from using LockHandler?

Contributor

mpdude commented Jul 18, 2016

I am using LockHandler in various other places and yes, it leaves sf.{someId}.lock files in the filesystem also after you release a lock and/or the process terminates. Also, LockHandler does not call unlink() anywhere.

This PR places the lock files in the same directory as the cache entry in question. So I'd say yes, this can double the amount of files present in the http_cache directory.

a real lock file will create another race condition unless the lock file is not unlinked (the race being when a lock is acquired but another process unlinks the just locked file).

Now I get that: You cannot safely unlink() the lock file, because you'd need to release the lock first; and after that, you cannot tell whether another process obtained another lock that prevents us from deleting the file. Probably that's why LockHandler just leaves the files alone.

The LockHandler only adds boilerplate to me in this case.

My assumption was that all it does was to handle the edge-cases and platform inconsistencies of using file locking, tested and packed up behind a simple API. Don't you think I'd have to reinvent all this if I refrain from using LockHandler?

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas

nicolas-grekas Jul 18, 2016

Member

Then what about locking the cache file itself instead of creating a dedicated lock file? Using flock directly also should be better, there are no special inconsistencies to deal with in LockHandler

Member

nicolas-grekas commented Jul 18, 2016

Then what about locking the cache file itself instead of creating a dedicated lock file? Using flock directly also should be better, there are no special inconsistencies to deal with in LockHandler

@mpdude

This comment has been minimized.

Show comment
Hide comment
@mpdude

mpdude Jul 18, 2016

Contributor

Note to self: flock() also supports shared "reader" locks, might be an additional advantage here.

Contributor

mpdude commented Jul 18, 2016

Note to self: flock() also supports shared "reader" locks, might be an additional advantage here.

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas

nicolas-grekas Jul 19, 2016

Member

Looks like a matter of correctly implementing it to me and not fall into the dangerous traps, isn't it?

Member

nicolas-grekas commented Jul 19, 2016

Looks like a matter of correctly implementing it to me and not fall into the dangerous traps, isn't it?

@mpdude

This comment has been minimized.

Show comment
Hide comment
@mpdude

mpdude Jul 19, 2016

Contributor

Locking was only applied when an existing cache entry was updated, so no need for shared (read) locks.

Contributor

mpdude commented Jul 19, 2016

Locking was only applied when an existing cache entry was updated, so no need for shared (read) locks.

@mpdude mpdude changed the title from Use LockHandler to manage HttpCache's lock files to Use flock() for HttpCache's lock files Jul 20, 2016

@nicolas-grekas nicolas-grekas changed the title from Use flock() for HttpCache's lock files to [HttpKernel] Use flock() for HttpCache's lock files Jul 26, 2016

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas
Member

nicolas-grekas commented Jul 26, 2016

See proposal at nicolas-grekas#5

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas

nicolas-grekas Jul 26, 2016

Member

👍 :)

Member

nicolas-grekas commented Jul 26, 2016

👍 :)

@stof

This comment has been minimized.

Show comment
Hide comment
@stof

stof Jul 26, 2016

Member

👍

Member

stof commented Jul 26, 2016

👍

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas

nicolas-grekas Jul 28, 2016

Member

Thank you @mpdude.

Member

nicolas-grekas commented Jul 28, 2016

Thank you @mpdude.

@nicolas-grekas nicolas-grekas merged commit 2668edd into symfony:2.7 Jul 28, 2016

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
fabbot.io Your code looks good.
Details

nicolas-grekas added a commit that referenced this pull request Jul 28, 2016

bug #19300 [HttpKernel] Use flock() for HttpCache's lock files (mpdude)
This PR was merged into the 2.7 branch.

Discussion
----------

[HttpKernel] Use flock() for HttpCache's lock files

| Q             | A
| ------------- | ---
| Branch?       | 2.7
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | no
| Fixed tickets | #16777, #15813 and #16312 are also related
| License       | MIT
| Doc PR        |

When a PHP process crashes or terminates (maybe the OOM killer kicks in or other bad things ™️ happen) while the `HttpCache` holds a `.lck` file, that lock file may not get `unlink()`ed.

The result is that other requests trying to access this cache entry will see a few seconds delay while waiting for the lock; they will eventually continue but send 503 status codes along with the response. The sudden buildup of PHP processes caused by the additional delay may cause further problems (sudden load increase).

As `LockHandler` is using `flock()`-based locking, locks should be released by the OS when the PHP process terminates.

I wrote this as bugfix against 2.7 because every once in a while I encounter situations (not always reproducible) where `.lock` files are left over and keep the cache locked.

Commits
-------

2668edd [HttpKernel] Use flock() for HttpCache's lock files

@mpdude mpdude deleted the webfactory:use-lock-file-handler branch Jul 28, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment