flock #19

Closed · earthgecko opened this issue Jun 10, 2014 · 11 comments · Labels: bug

@earthgecko (Author) commented Jun 10, 2014

Great effort, whisper can be a pain.

I have always been led to believe that "Graphite will lock the Whisper file using flock() during updates so that no other Graphite process reads it mid-update and finds inconsistent data.

Unfortunately, flock() is only an advisory lock and rsync ignores it. This means that if Graphite updates a particular Whisper file while rsync is in the middle of backing it up, the backup copy might be inconsistent." — from the folks at SwiftStack (@smerritt).

Having looked through the code I do not see any reference to flock in any way. Is this just a non-issue because the healing is done against the local whisper file? If so, I think a pull request adding an ssh key option is in order from me.

@jssjr (Member) commented Jun 10, 2014

Unfortunately, flock() is only an advisory lock and rsync ignores it. This means that if Graphite updates a particular Whisper file while rsync is in the middle of backing it up, the backup copy might be inconsistent.

Really interesting observation, and one that I hadn't considered.

If rsync is ignoring flock() on specific files, then it seems reasonable that a file can be transmitted in an inconsistent state. However, whisper wraps the update transaction in a lock, and the sync job rsyncs the source files to /tmp on the target host in a batch before merging the values into the target database. So this risk only affects us if whisper is inside a create or update transaction at the moment rsync reads and sends the file.
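
For reference, a rough sketch of that flow; the hostname, paths, and the use of whisper.merge() below are placeholders rather than carbonate's actual sync/fill implementation:

```python
# Illustrative sketch only: rsync a batch of whisper files from the source
# node into a staging directory on the target, then merge them into the live
# databases. Hostnames and paths are placeholders; carbonate's real sync and
# fill logic differs in detail.
import subprocess
import whisper

SRC = "graphite-old:/opt/graphite/storage/whisper/"  # assumed source node/path
DST_ROOT = "/opt/graphite/storage/whisper/"
STAGING = "/tmp/carbonate-staging/"
batch = ["carbon/agents/foo/metricsReceived.wsp"]    # relative metric paths

# 1. Pull the batch into the staging area on the target host.
subprocess.run(
    ["rsync", "-a", "--files-from=-", SRC, STAGING],
    input="\n".join(batch).encode(),
    check=True,
)

# 2. Merge each staged copy into the live whisper file. If whisper was
#    mid-update when rsync read a file, the staged copy can be inconsistent.
for rel in batch:
    whisper.merge(STAGING + rel, DST_ROOT + rel)
```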

I'm honestly not sure how to solve this and continue to use rsync, if rsync isn't willing to respect a shared lock. Do you have any ideas here?

Removing the rsync dependency and having carbon-sync use the carbonlink protocol directly to pull data is an option, but one that will require significant work and tuning.

@earthgecko (Author) commented

Well, if there were a pip-installable libflockit (https://github.com/smerritt/flockit) it could be added to the requirements :) Or perhaps something similar could be done with Python's fcntl.flock().

At the moment, on either end we do:
flockit.so rsync the .wsp file to a tmp location -> rsync the .wsp to a remote tmp location -> flockit.so rsync the remote tmp .wsp onto the physical .wsp file

env LD_PRELOAD=/usr/lib/libflockit.so FLOCKIT_FILE_PREFIX=$WHISPER_PATH rsync $OPTION_THINGS <src> <dest> would do it all.

But seeing as the file on the dest is being replaced and healed in situ locally, there should be no need to use libflockit.so on the dest unless it is a file that carbon is also updating via a relay, etc. (i.e. it already exists).
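
For illustration, the same LD_PRELOAD trick driven from Python; the library path and whisper prefix are taken from the example command above, and the destination host is a placeholder:

```python
# Sketch of the LD_PRELOAD/flockit approach above, driven from Python.
# The library path, whisper prefix, and destination are assumptions taken
# from the example command; adjust for your installation.
import os
import subprocess

WHISPER_PATH = "/opt/graphite/storage/whisper/"

env = dict(os.environ)
env["LD_PRELOAD"] = "/usr/lib/libflockit.so"   # make rsync flock() the files it reads
env["FLOCKIT_FILE_PREFIX"] = WHISPER_PATH      # only lock files under the whisper tree

subprocess.run(
    ["rsync", "-a", WHISPER_PATH, "dest-host:/tmp/whisper-staging/"],
    env=env,
    check=True,
)
```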

@jssjr (Member) commented Jun 10, 2014

Are you suggesting we apply a shared lock around the batch of files to sync in https://github.com/graphite-project/whisper/blob/master/whisper.py#L540-L542? I worry about that causing problems by blocking the local carbons, but with a sufficiently small batch size it might be fine.

@earthgecko (Author) commented

Yeah, we flock rsync per file. Figured it was the only way to ensure that we are not creating a Schrödinger's cat type scenario, short of somehow validating every timeseries data set.
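
For illustration, a pure-Python per-file variant (not what carbonate does today): hold a shared flock() on each whisper file while copying it to a staging path, so a writer holding an exclusive lock finishes before the copy reads it. Paths are placeholders:

```python
# Illustrative only: copy a whisper file to a staging location while holding a
# shared flock(), so a writer holding LOCK_EX finishes before we read.
import fcntl
import shutil

def copy_with_shared_lock(src, dst):
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        fcntl.flock(fsrc.fileno(), fcntl.LOCK_SH)   # blocks while an exclusive lock is held
        try:
            shutil.copyfileobj(fsrc, fdst)
        finally:
            fcntl.flock(fsrc.fileno(), fcntl.LOCK_UN)

# Hypothetical paths for illustration.
copy_with_shared_lock(
    "/opt/graphite/storage/whisper/carbon/agents/foo/metricsReceived.wsp",
    "/tmp/whisper-staging/metricsReceived.wsp",
)
```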

@jssjr (Member) commented Jun 10, 2014

I'd be OK with that, but I wonder if it should be the default. I'm still worried that, since we transfer files in batches of 1000 at a time to avoid the overhead of setting up the connection for every file, we'll cause the carbon processes on the source node to block for an unreasonable amount of time. I'm unsure how to handle this.

@jssjr (Member) commented Jun 10, 2014

Want to send over a PR with your ideas? We can riff on that and get this solved.

@jssjr added the bug label Jun 10, 2014
@earthgecko (Author) commented

Forked. No quick promises though; I have libcloud, fog, and skyline pull requests that need to be done and are lagging. This social coding malarkey, who thought that was a good idea ;)

@filippog (Contributor) commented Feb 2, 2015

Hi,
thanks for carbonate! I was thinking about locking too. Related to this, whisper.LOCK seems to be set by carbon when WHISPER_LOCK_WRITES is set in graphite's config (see graphite-project/whisper#97); however, carbonate doesn't explicitly lock, if I'm reading correctly. So I think the following is also possible:

  • expand a cluster ring from N to M
  • metrics start flowing to their new position in the ring
  • carbon-sync is run to sync and merge metrics
    • meanwhile, the local carbon-cache also writes to the same files while carbon-sync is backfilling
  • whisper files are corrupted due to the lack of locking (?)

@jssjr (Member) commented Feb 7, 2015

@filippog - That seems accurate. I believe we want to duplicate the WHISPER_LOCK_WRITES directive in the carbonate config file, modify the import so we have access to LOCK, and set it so that update_many() calls flock before making changes. Does this sound right? Want to take a stab at a PR, or should I?
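
A rough sketch of that idea; the carbonate config flag name below is an assumption, while whisper.LOCK and whisper.update_many() are the existing whisper hooks:

```python
# Sketch only: mirror carbon's WHISPER_LOCK_WRITES in carbonate so whisper
# takes an exclusive flock() inside its update/create calls.
import whisper

def configure_whisper_locking(lock_writes):
    # 'lock_writes' would come from a (hypothetical) WHISPER_LOCK_WRITES
    # setting in carbonate's config file.
    if lock_writes:
        whisper.LOCK = True   # whisper then flocks files in update_many()/create()

configure_whisper_locking(True)
whisper.update_many("/opt/graphite/storage/whisper/foo.wsp",
                    [(1434000000, 42.0)])   # placeholder path and datapoint
```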

@filippog (Contributor) commented Feb 7, 2015

Sounds good @jssjr! I've split this into #47 so we can follow up there; I'll take a stab at a PR shortly!

wmfgerrit pushed a commit to wikimedia/operations-puppet that referenced this issue Apr 20, 2015
Our graphite clustering plan involves using carbonate while carbon-cache is running; this can potentially lead to corrupted whisper files if locking isn't used. See also graphite-project/carbonate#19.

Bug: T86316
Change-Id: I76b064acf3b7ccad17313a4f05c3b72b3b01b798
@deniszh (Member) commented Mar 19, 2017

Fixed in #71

@deniszh closed this as completed Mar 19, 2017