
BUG: Fix race condition with new FFT cache #7712

Merged: 2 commits merged into numpy:master from the fft-cache-race-condition branch on Jun 9, 2016

Conversation

krischer (Contributor) commented Jun 7, 2016

This is a follow-up to #7686 and fixes a race condition introduced with that PR.

Setting and getting items is now protected by a lock, ensuring thread safety. The size of a cache entry is now also calculated by summing over all arrays in the list.

A comment: The cache behaves like a dictionary whose values are lists. Each list contains the twiddle factors for an FFT of a certain length, potentially several copies of them. Duplicating the twiddle factors is necessary in the threaded case, as a given copy can only be used by a single thread at a time. Multiple parallel threads calculating FFTs of equal lengths (an IMHO quite realistic use case) would otherwise either have to wait for the other threads to finish or constantly recalculate the twiddle factors, both greatly impacting performance. The cache always behaved like this, even when it was just a simple dictionary.
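
For illustration, a minimal schematic of that scheme; the checkout/checkin helpers and the compute_wsave callback are hypothetical names, not the actual API:

    # Schematic model: the cache maps FFT length -> list of twiddle-factor
    # ("wsave") arrays. Each array may be used by only one thread at a time,
    # so concurrent FFTs of the same length need their own copies.
    cache = {}

    def checkout(n, compute_wsave):
        entries = cache.get(n)
        if entries:
            return entries.pop()   # reuse an idle copy
        return compute_wsave(n)    # none idle: compute a (duplicate) copy

    def checkin(n, wsave):
        cache.setdefault(n, []).append(wsave)  # make it available for reuse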

Review comment on the diff, in FFTCache.__getitem__():

    with self._lock:
        # pop + add to move it to the end.
        value = self._dict.pop(key)
        ...
        return value

Member: How do you know that key is in _dict? Could it not have been pruned?

krischer (Contributor Author) replied:

It will just raise a KeyError, which is the same behavior as calling __getitem__() on a dictionary with a non-existing key.

Lots of methods are missing, but at least __getitem__() and __setitem__() mimic the behavior of a normal dict, making the new cache a drop-in replacement for the old simple dictionary cache.

The case of a missing key is handled in the _raw_fft() method in fftpack.py.
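
For illustration, that fallback presumably follows the familiar pattern below; a sketch, not the actual fftpack.py code, with lookup_twiddle_factors and init_function as placeholder names:

    def lookup_twiddle_factors(fft_cache, n, init_function):
        # The cache mimics dict.__getitem__(), so a missing (pruned or
        # never-computed) key raises KeyError, just like a plain dict.
        try:
            return fft_cache[n]
        except KeyError:
            return init_function(n)  # cache miss: compute fresh twiddle factors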

charris (Member) commented Jun 7, 2016

Looking at the code, I think it would be better and simpler to grab the lock around the two accesses in fftpack._raw_fft.
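
For concreteness, a minimal sketch of that approach, assuming a module-level threading.Lock and a plain dict-of-lists cache; _raw_fft_sketch and its parameter names are illustrative, not the actual fftpack.py code:

    import threading

    _cache_lock = threading.Lock()

    def _raw_fft_sketch(a, n, init_function, work_function, fft_cache):
        # First access: try to check out cached twiddle factors for length n.
        with _cache_lock:
            try:
                wsave = fft_cache[n].pop()
            except (KeyError, IndexError):
                wsave = None
        if wsave is None:
            wsave = init_function(n)  # nothing idle in the cache: compute fresh

        result = work_function(a, wsave)

        # Second access: put the twiddle factors back for reuse.
        with _cache_lock:
            fft_cache.setdefault(n, []).append(wsave)
        return result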

charris (Member) commented Jun 7, 2016

Hmm, it also looks like the list associated with the key can contain multiple wsave items. I wonder if this affects the pruning? In any case, I think the line putting the wsave back in _raw_fft should probably be

    fft_cache.setdefault(n, []).append(wsave)

Just in case n was evicted while we were doing the fft.

krischer (Contributor Author) commented Jun 7, 2016

> Hmm, it also looks like the list associated with the key can contain multiple wsave items. I wonder if this affects the pruning? In any case, I think the line putting the wsave back in _raw_fft should probably be
>
>     fft_cache.setdefault(n, []).append(wsave)
>
> Just in case n was evicted while we were doing the fft.

That does not work; see my comment in the first post. A wsave array must only be used by one thread at a time. I did not check the C code to see why that is, but I tried to change it and it causes the code to crash.
The cache, including pruning, can deal with this.

EDIT: I thought your snippet referred to the retrieval of items from the cache...sorry about that...all these threads are really messing with my brain...

I still don't think it would work, for the following reason: there might be a race after the setdefault() call but before the append() call, as the GIL can be released in between. It would work if wrapped in a lock, but so does the current solution.
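
Schematically, the window looks like this; put_back_unsafely is a hypothetical name, and the premise is that on the custom cache class setdefault() and append() are separate Python-level calls:

    def put_back_unsafely(fft_cache, n, wsave):
        lst = fft_cache.setdefault(n, [])  # step 1: fetch or create the list
        # <-- if the GIL is released here and another thread prunes n from
        #     the cache, lst is now orphaned...
        lst.append(wsave)                  # step 2: ...and this write is lost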

> Looking at the code, I think it would be better and simpler to grab the lock around the two accesses in fftpack._raw_fft.

Sure. I don't have a strong opinion either way.

One thing that just came to my mind: what happens with the locks from threading when the process is forked, i.e. when multiprocessing is employed? It seems to work fine in some tests on my machine, but is this guaranteed on all platforms?

I know you did not like it the first time I proposed it, but we could revert this entire PR and change

fft_cache[n].append(wsave)

in fft/fftpack.py to

try:
    fft_cache[n].append(wsave)
except KeyError:
    pass

Then we don't need the locks, and it is the only place where a race condition can occur: one thread calls __getitem__(), which internally temporarily removes the item and puts it back in, so a well-timed other thread can try to access it in between. Everything else is protected by the GIL. We'd thus get rid of all the locks at the cost of the occasionally missed cache write.
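
To spell out that window (a schematic of the pop-and-re-add __getitem__() quoted in the review thread above):

    d = {64: ["wsave"]}      # stand-in for the cache's internal dict

    # Thread A, inside __getitem__(64):
    value = d.pop(64)        # key 64 is temporarily absent here...
    # ...if thread B runs d[64].append(new_wsave) at this exact point, it
    # raises KeyError, which the try/except above turns into a skipped write.
    d[64] = value            # ...until thread A puts the entry back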

charris (Member) commented Jun 7, 2016

I think grabbing the lock in _raw_fft gets rid of the problems with using setdefault in both accesses and also has the virtue that one needs to expend fewer little gray cells worrying about what is safe or what happens inside the cache.

As to forks or equivalent, I think this may vary between operating systems and the details of the fork. It's been a while since I did that (10 years?), so I don't remember much about it.

charris (Member) commented Jun 7, 2016

try:
    fft_cache[n].append(wsave)
except KeyError:
    pass

Looks OK if we accept not caching wsave on occasion, assuming the key lookup is atomic, which it is in CPython, I believe. I'd still feel better putting all accesses under a lock.

Commit message:

    There is now a lock around cache accesses ensuring thread safety. The
    size of a cache entry is now also calculated by summing over all arrays
    in the list.
krischer (Contributor Author) commented Jun 7, 2016

Simplicity is always a good argument :-) The locks are now placed around the cache accesses in fftpack.py. Commits are also rebased and squashed.

Review comment on the diff, in fftpack.py:

    # again after it has been used. Multiple threads might create multiple
    # copies of the wsave array. This is intentional and a limitation of
    # the current C code.
    with _cache_lock:

Member: I think _cache_lock should be a public member of FFTCache so that every cache has its own lock. This would then be with fft_cache.lock.

krischer (Contributor Author) replied:

I thought about this but decided against it in the end, for a couple of reasons:

(1) It makes the code slightly awkward: the locks are members of the cache objects but are acquired and released outside of them. In that case I'd prefer the previous solution, where the cache managed its own lock.

(2) Performance-wise it has no measurable effect, as all the operations under the lock are super quick. Also, use cases where both real and complex FFTs are called a lot in a threaded context are probably rare.

But it's your call. Let me know.

Member replied:

OK, how about just having the two public functions __getitem__ and __setitem__, and have both hold the internal lock. That would do the same thing.

njsmith (Member) commented Jun 7, 2016

Honestly I'd probably drop the whole dict emulation thing, and just rewrite it as

class FFTCache(object):
    def get_twiddle_factors(self, n):
        ...
    def return_twiddle_factors(self, n):
        ...

with all logic and locking centralized into a single screen of code.

The goal is to make the code not just correct, but obviously correct. Anything involving clever micro-optimizations around the GIL and logic spread out over multiple files is not worth it.

charris (Member) commented Jun 8, 2016

Maybe put_twiddle_factors.

krischer (Contributor Author) commented Jun 8, 2016

I ended up using put_twiddle_factors and pop_twiddle_factors.
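
For orientation, the pair plausibly looks something like this; a sketch of the intent rather than the exact merged code, with the size-based pruning elided:

    import threading

    class FFTCache(object):
        def __init__(self):
            self._dict = {}
            self._lock = threading.Lock()

        def put_twiddle_factors(self, n, factors):
            # Return a wsave array to the pool for FFT length n.
            with self._lock:
                self._dict.setdefault(n, []).append(factors)
                # (size-based pruning of old entries would happen here)

        def pop_twiddle_factors(self, n):
            # Check out a wsave array for exclusive use by the calling thread.
            # Returns the twiddle factors if available, else None.
            with self._lock:
                if n not in self._dict or not self._dict[n]:
                    return None
                return self._dict[n].pop()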

Review comment on the diff, in FFTCache.pop_twiddle_factors():

    The retrieved twiddle factors if available, else None.
    """
    with self._lock:
        if n not in self._dict or not self._dict[n]:

Member: Maybe

try:
    all_values = self._dict.pop(n)
    value = all_values.pop()
except (KeyError, IndexError):
    return None
...

krischer (Contributor Author) replied:

I prefer the current solution, as it's IMHO a bit clearer. Performance is also not a concern at all, as the maximum size of the dictionary is bounded.

charris (Member) commented Jun 8, 2016

LGTM, thanks. Just one suggestion which you can do with as you please.

charris merged commit d4a39b0 into numpy:master on Jun 9, 2016
charris (Member) commented Jun 9, 2016

Merged, thanks @krischer. Nice documentation too.

krischer deleted the fft-cache-race-condition branch on June 9, 2016 at 16:30