
BUG: Fix race condition with new FFT cache #7712

Merged: 2 commits merged into numpy:master from the fft-cache-race-condition branch on Jun 9, 2016

Conversation

krischer (Contributor) commented Jun 7, 2016

This is a follow-up to #7686 and fixes a race condition introduced with that PR.

Setting and getting items is now protected by a lock, ensuring thread safety. The size of a cache entry is now also calculated by summing over all arrays in the list.

A comment: The cache behaves like a dictionary whose values are lists. Each list contains the twiddle factors for an FFT of a certain length, potentially several copies of them. Duplicating the twiddle factors is necessary in the threaded case, as a given copy can only be used by a single thread at a time. Multiple parallel threads calculating FFTs of equal lengths (an IMHO quite realistic use case) would otherwise either have to wait for the other threads to finish or constantly recalculate the twiddle factors, both greatly impacting performance. The cache always behaved like this, even when it was just a simple dictionary.
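
For illustration, a minimal schematic of that scheme; the checkout/checkin helpers and the compute_wsave callback are hypothetical names, not the actual API:

    # Schematic model: the cache maps FFT length -> list of twiddle-factor
    # ("wsave") arrays. Each array may be used by only one thread at a time,
    # so concurrent FFTs of the same length need their own copies.
    cache = {}

    def checkout(n, compute_wsave):
        entries = cache.get(n)
        if entries:
            return entries.pop()   # reuse an idle copy
        return compute_wsave(n)    # none idle: compute a (duplicate) copy

    def checkin(n, wsave):
        cache.setdefault(n, []).append(wsave)  # make it available for reuse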

Review comment on the diff, in FFTCache.__getitem__():

    with self._lock:
        # pop + add to move it to the end.
        value = self._dict.pop(key)
        ...
        return value

Member: How do you know that key is in _dict? Could it not have been pruned?

krischer (Contributor Author) replied:

It will just raise a KeyError, which is the same behavior as calling __getitem__() on a dictionary with a non-existing key.

Lots of methods are missing, but at least __getitem__() and __setitem__() mimic the behavior of a normal dict, making the new cache a drop-in replacement for the old simple dictionary cache.

The case of a missing key is handled in the _raw_fft() method in fftpack.py.
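
For illustration, that fallback presumably follows the familiar pattern below; a sketch, not the actual fftpack.py code, with lookup_twiddle_factors and init_function as placeholder names:

    def lookup_twiddle_factors(fft_cache, n, init_function):
        # The cache mimics dict.__getitem__(), so a missing (pruned or
        # never-computed) key raises KeyError, just like a plain dict.
        try:
            return fft_cache[n]
        except KeyError:
            return init_function(n)  # cache miss: compute fresh twiddle factors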

charris (Member) commented Jun 7, 2016

Looking at the code, I think it would be better and simpler to grab the lock around the two accesses in fftpack._raw_fft.
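
For concreteness, a minimal sketch of that approach, assuming a module-level threading.Lock and a plain dict-of-lists cache; _raw_fft_sketch and its parameter names are illustrative, not the actual fftpack.py code:

    import threading

    _cache_lock = threading.Lock()

    def _raw_fft_sketch(a, n, init_function, work_function, fft_cache):
        # First access: try to check out cached twiddle factors for length n.
        with _cache_lock:
            try:
                wsave = fft_cache[n].pop()
            except (KeyError, IndexError):
                wsave = None
        if wsave is None:
            wsave = init_function(n)  # nothing idle in the cache: compute fresh

        result = work_function(a, wsave)

        # Second access: put the twiddle factors back for reuse.
        with _cache_lock:
            fft_cache.setdefault(n, []).append(wsave)
        return result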

charris (Member) commented Jun 7, 2016

Hmm, it also looks like the list associated with the key can contain multiple wsave items. I wonder if this affects the pruning? In any case, I think the line putting the wsave back in _raw_fft should probably be

    fft_cache.setdefault(n, []).append(wsave)

Just in case n was evicted while we were doing the fft.

krischer (Contributor Author) commented Jun 7, 2016

> Hmm, it also looks like the list associated with the key can contain multiple wsave items. I wonder if this affects the pruning? In any case, I think the line putting the wsave back in _raw_fft should probably be
>
>     fft_cache.setdefault(n, []).append(wsave)
>
> Just in case n was evicted while we were doing the fft.

That does not work; see my comment in the first post. A wsave array must only be used by one thread at a time. I did not check the C code to see why that is, but I tried to change it and it causes the code to crash.
The cache, including pruning, can deal with this.

EDIT: I thought your snippet referred to the retrieval of items from the cache...sorry about that...all these threads are really messing with my brain...

I still don't think it would work, for the following reason: there might be a race after the setdefault() call but before the append() call, as the GIL can be released in between. It would work if wrapped in a lock, but so does the current solution.
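
Schematically, the window looks like this; put_back_unsafely is a hypothetical name, and the premise is that on the custom cache class setdefault() and append() are separate Python-level calls:

    def put_back_unsafely(fft_cache, n, wsave):
        lst = fft_cache.setdefault(n, [])  # step 1: fetch or create the list
        # <-- if the GIL is released here and another thread prunes n from
        #     the cache, lst is now orphaned...
        lst.append(wsave)                  # step 2: ...and this write is lost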

> Looking at the code, I think it would be better and simpler to grab the lock around the two accesses in fftpack._raw_fft.

Sure. I don't have a strong opinion either way.

One thing that just came to my mind: what happens with the locks from threading when the process is forked, i.e. when multiprocessing is employed? It seems to work fine in some tests on my machine, but is this guaranteed on all platforms?

I know you did not like it the first time I proposed it, but we could revert this entire PR and change

fft_cache[n].append(wsave)

in fft/fftpack.py to

try:
    fft_cache[n].append(wsave)
except KeyError:
    pass

Then we don't need the locks, and it is the only place where a race condition can occur: one thread calls __getitem__(), which internally temporarily removes the item and puts it back in, so a well-timed other thread can try to access it in between. Everything else is protected by the GIL. We'd thus get rid of all the locks at the cost of the occasionally missed cache write.
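
To spell out that window (a schematic of the pop-and-re-add __getitem__() quoted in the review thread above):

    d = {64: ["wsave"]}      # stand-in for the cache's internal dict

    # Thread A, inside __getitem__(64):
    value = d.pop(64)        # key 64 is temporarily absent here...
    # ...if thread B runs d[64].append(new_wsave) at this exact point, it
    # raises KeyError, which the try/except above turns into a skipped write.
    d[64] = value            # ...until thread A puts the entry back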

charris (Member) commented Jun 7, 2016

I think grabbing the lock in _raw_fft gets rid of the problems with using setdefault in both accesses and also has the virtue that one needs to expend fewer little gray cells worrying about what is safe or what happens inside the cache.

As to forks or equivalent, I think this may vary between operating systems and the details of the fork. It's been a while since I did that (10 years?), so I don't remember much about it.

charris (Member) commented Jun 7, 2016

try:
    fft_cache[n].append(wsave)
except KeyError:
    pass

Looks OK if we accept not caching wsave on occasion, assuming the key lookup is atomic, which it is in CPython, I believe. I'd still feel better putting all accesses under a lock.

Commit message:

    There is now a lock around cache accesses ensuring thread safety. The
    size of a cache entry is now also calculated by summing over all arrays
    in the list.
krischer (Contributor Author) commented Jun 7, 2016

Simplicity is always a good argument :-) The locks are now placed around the cache accesses in fftpack.py. Commits are also rebased and squashed.

Review comment on the diff, in fftpack.py:

    # again after it has been used. Multiple threads might create multiple
    # copies of the wsave array. This is intentional and a limitation of
    # the current C code.
    with _cache_lock:

Member: I think _cache_lock should be a public member of FFTCache so that every cache has its own lock. This would then be with fft_cache.lock.

krischer (Contributor Author) replied:

I thought about this but decided against it in the end, for a couple of reasons:

(1) It makes the code slightly awkward: the locks are members of the cache objects but are acquired and released outside of them. In that case I'd prefer the previous solution, where the cache managed its own lock.

(2) Performance-wise it has no measurable effect, as all the operations under the lock are super quick. Also, use cases where both real and complex FFTs are called a lot in a threaded context are probably rare.

But it's your call. Let me know.

Member replied:

OK, how about just having the two public functions __getitem__ and __setitem__, and have both hold the internal lock. That would do the same thing.

njsmith (Member) commented Jun 7, 2016

Honestly I'd probably drop the whole dict emulation thing, and just rewrite it as

class FFTCache(object):
    def get_twiddle_factors(self, n):
        ...
    def return_twiddle_factors(self, n):
        ...

with all logic and locking centralized into a single screen of code.

The goal is to make the code not just correct, but obviously correct. Anything involving clever micro-optimizations around the GIL and logic spread out over multiple files is not worth it.

charris (Member) commented Jun 8, 2016

Maybe put_twiddle_factors.

krischer (Contributor Author) commented Jun 8, 2016

I ended up using put_twiddle_factors and pop_twiddle_factors.
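
For orientation, the pair plausibly looks something like this; a sketch of the intent rather than the exact merged code, with the size-based pruning elided:

    import threading

    class FFTCache(object):
        def __init__(self):
            self._dict = {}
            self._lock = threading.Lock()

        def put_twiddle_factors(self, n, factors):
            # Return a wsave array to the pool for FFT length n.
            with self._lock:
                self._dict.setdefault(n, []).append(factors)
                # (size-based pruning of old entries would happen here)

        def pop_twiddle_factors(self, n):
            # Check out a wsave array for exclusive use by the calling thread.
            # Returns the twiddle factors if available, else None.
            with self._lock:
                if n not in self._dict or not self._dict[n]:
                    return None
                return self._dict[n].pop()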

Review comment on the diff, in FFTCache.pop_twiddle_factors():

    The retrieved twiddle factors if available, else None.
    """
    with self._lock:
        if n not in self._dict or not self._dict[n]:

Member: Maybe

try:
    all_values = self._dict.pop(n)
    value = all_values.pop()
except (KeyError, IndexError):
    return None
...

krischer (Contributor Author) replied:

I prefer the current solution, as it's IMHO a bit clearer. Performance is also not a concern at all, as the maximum size of the dictionary is bounded.

charris (Member) commented Jun 8, 2016

LGTM, thanks. Just one suggestion which you can do with as you please.

charris merged commit d4a39b0 into numpy:master on Jun 9, 2016
charris (Member) commented Jun 9, 2016

Merged, thanks @krischer. Nice documentation too.

krischer deleted the fft-cache-race-condition branch on June 9, 2016 at 16:30