This repository has been archived by the owner on Feb 26, 2020. It is now read-only.

Fix race condition in the linux:shrinker splat test. #196

Closed
wants to merge 2 commits into from

Conversation

stevenj
Contributor

@stevenj stevenj commented Dec 3, 2012

The Linux shrinker splat test has a race condition: there is no guarantee that, by the time the write of "2" to /proc/sys/vm/drop_caches returns, the kernel has actually invoked the registered shrinker.
The patch adds a wait event so the test blocks until either the shrinker has run or a 1-second timeout expires.
If the timeout fires, something is genuinely wrong, because the kernel should not need a full second to run the shrinker callback.
Once the shrinker has run and shrunk the count down to 0, the main test resumes and the race is closed.
The only weakness I can see in this fix is if it could legitimately take longer than 1 second to execute, but I don't think that is likely.

It also fixes a small problem in the failsafe code which would trigger if the linux:shrinker test was run multiple times. This change takes my error rate from roughly 1 failure in 6 runs to 0 failures in 10,000 runs.

This fixes issues #182 and #96.
It also fixes one of the failures reported in #125.

@behlendorf
Contributor

@stevenj Can you please clean up and re-push these commits based on the review comments?

There is no guarantee that when the call to write "2" to /proc/sys/vm/drop_caches returns that the kernel has called the shrinker which has been defined.

Also, I'm not convinced of this. My reading of the upstream kernel source suggests the caller will block in the handler until most of the cache has been reclaimed, which is why the test was originally written this way. See https://github.com/torvalds/linux/blob/master/fs/drop_caches.c#L52

@stevenj
Contributor Author

stevenj commented Dec 5, 2012

But you are not synchronously calling anything; you are echoing 2 into a virtual file system. Are you sure the echo will not return before the cache flush happens? I figured it was reasonable not to assume synchronicity here, because you have to consider what bash does, what echo does, what all the I/O redirection does, and what the VFS handling does: is caching or asynchronicity introduced by any of that? I don't know the answers, but I definitely saw cases where the echo returned before any reclaiming occurred, so I highly doubt it is synchronous as implemented.

Initially all I did was put an msleep after the echo and the problem went away. I use a wait queue because it is more reliable than an arbitrary sleep, and it is also faster. I also know this method is used elsewhere in ZFS; it is on my list to check those callers and make sure they are not expecting synchronicity when it doesn't seem guaranteed in this test case.

I will fix the patch and resubmit tonight. Thanks for the comments.

@behlendorf
Contributor

@stevenj How about #200? It's your original patch with some minor tweaking/cleanup. It certainly solves the issue and obviously won't cause harm. I haven't dug too deeply into whether it really is synchronous or not.

@behlendorf behlendorf closed this Dec 11, 2012
@behlendorf
Contributor

@stevenj See commit 8842263, @nedbass identified the reason why the user mode helpers are racier on newer kernels. However, since we still have the issue with stdio buffering your improvements are probably needed as well.
