BUG: Fix race condition with new FFT cache #7712
Conversation
```python
        return value

    with self._lock:
        # pop + add to move it to the end.
        value = self._dict.pop(key)
```
How do you know that `key` is in `_dict`? Could it not have been pruned?
It will just raise a `KeyError`, which is the same behavior as calling `__getitem__()` of a dictionary with a non-existing key. Lots of methods are missing, but at least `__getitem__()` and `__setitem__()` mimic the behavior of a normal `dict`, making the new cache a drop-in replacement for the old simple dictionary cache. The case of a missing key is handled in the `_raw_fft()` method in `fftpack.py`.
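To illustrate the drop-in idea, here is a minimal, hypothetical sketch of such a cache (the class name, the `maxsize` bound, and the pruning policy are assumptions, not numpy's actual implementation): `__getitem__` raises `KeyError` for a missing key just like a plain `dict`, and the pop-and-re-insert trick from the diff above marks entries as recently used.

```python
import threading
from collections import OrderedDict


class LRUDictCache:
    """Hypothetical sketch of a dict-like, size-bounded, locked cache.

    __getitem__/__setitem__ mimic a plain dict (including raising
    KeyError for missing keys), so it can act as a drop-in replacement.
    """

    def __init__(self, maxsize=8):
        self._dict = OrderedDict()
        self._lock = threading.Lock()
        self._maxsize = maxsize

    def __getitem__(self, key):
        with self._lock:
            # pop + re-insert to move the entry to the end (most
            # recently used). Raises KeyError like a normal dict.
            value = self._dict.pop(key)
            self._dict[key] = value
            return value

    def __setitem__(self, key, value):
        with self._lock:
            self._dict[key] = value
            while len(self._dict) > self._maxsize:
                # Prune the least recently used entry.
                self._dict.popitem(last=False)
```

A caller that previously used a plain dictionary would not need to change, apart from catching the same `KeyError` it already had to handle.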
Looking at the code, I think it would be better and simpler to grab the lock around the two accesses in …
Hmm, also looks like the list associated with the key can contain multiple …
Just in case …
That does not work - see my comment in the first post. EDIT: I thought your snippet referred to the retrieval of items from the cache... sorry about that... all these threads are really messing with my brain... I still don't think it would work, for the following reason: there might be a race happening after the …
Sure. I don't have a strong opinion either way. One thing that just came to my mind: what happens with the locks from …

I know you did not like it the first time I proposed it, but we could revert this entire PR and wrap `fft_cache[n].append(wsave)` in:

```python
try:
    fft_cache[n].append(wsave)
except KeyError:
    pass
```

Then we don't need the locks and it is the only place where a race condition can occur - one thread calls …
I think grabbing the lock in … As to forks or equivalent, I think this may vary between operating systems and the details of the fork. It's been a while since I did that (10 years?), so I don't remember much about it.
Looks OK if we accept not caching …
There is now a lock around cache accesses to ensure thread safety. The size of a cache entry is now also calculated by summing over all arrays in the list.
9b8a6f5 to b0efcba
Simplicity is always a good argument :-) The locks are now placed around the cache accesses in …
```python
        # again after it has been used. Multiple threads might create multiple
        # copies of the wsave array. This is intentional and a limitation of
        # the current C code.
        with _cache_lock:
```
I think `_cache_lock` should be a public member of `FFTCache` so that every cache has its own lock. This would then be `with fft_cache.lock:`.
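The suggestion could look like this hypothetical sketch (subclassing `dict` here is purely an illustration choice, not the actual implementation): each cache carries its own public lock, and callers wrap their accesses in `with fft_cache.lock:`.

```python
import threading


class FFTCache(dict):
    """Hypothetical sketch: a dict-like cache whose lock is a public
    attribute, so every cache instance gets its own lock."""

    def __init__(self):
        super().__init__()
        self.lock = threading.Lock()


fft_cache = FFTCache()

# Caller-side locking, roughly as it might look at the call site:
with fft_cache.lock:
    fft_cache.setdefault(64, []).append("wsave64")
with fft_cache.lock:
    wsave = fft_cache[64][-1]
```

The trade-off discussed below is that the lock lives on the cache object but is acquired and released by code outside of it.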
I thought about this but decided against it in the end for a couple of reasons:
(1) It makes the code slightly awkward - the locks are members of the cache objects but are acquired and released outside of them. In that case I'd prefer the previous solution, where the cache managed its own lock.
(2) Performance-wise it has no measurable effect, as all the operations inside the lock are super quick. Also, use cases where both real and complex FFTs are called a lot in a threaded context are probably rare.
But it's your call. Let me know.
OK, how about just having the two public functions `getitem` and `setitem`, and having both hold the internal lock? That would do the same thing.

Honestly, I'd probably drop the whole dict emulation thing and just rewrite it as … with all logic and locking centralized into a single screen of code. The goal is to make the code not just correct, but obviously correct. Anything involving clever micro-optimizations around the GIL and logic spread out over multiple files is not worth it.
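The centralized variant suggested above might look like the following sketch (the class and method names are assumptions): two public methods, each holding the internal lock, with all cache logic in one place.

```python
import threading


class LockedFFTCache:
    """Hypothetical sketch: no dict emulation, just two public methods
    that each hold the internal lock, with all logic in one place."""

    def __init__(self):
        self._lock = threading.Lock()
        self._dict = {}

    def setitem(self, n, wsave):
        """Store one copy of the twiddle factors for length n."""
        with self._lock:
            # Pruning of old entries would also go here, under the lock.
            self._dict.setdefault(n, []).append(wsave)

    def getitem(self, n):
        """Return one cached copy for length n, or None on a miss."""
        with self._lock:
            values = self._dict.get(n)
            if not values:
                return None
            return values.pop()
```

Since no caller touches `_dict` or `_lock` directly, the thread-safety argument is confined to this one class.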
Maybe …
I ended up using …
```python
            The retrieved twiddle factors if available, else None.
        """
        with self._lock:
            if n not in self._dict or not self._dict[n]:
```
Maybe

```python
try:
    all_values = self._dict.pop(n)
    value = all_values.pop()
except (KeyError, IndexError):
    return None
...
```
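Filled out as a standalone, hypothetical helper (the function name and the put-back of remaining copies are assumptions made to keep the example self-contained), the suggestion above treats both a missing key (`KeyError`) and an empty list (`IndexError`) as a cache miss:

```python
def pop_twiddle_factors(cache_dict, n):
    """Remove the entry for n and take one wsave from it.

    Both a missing key (KeyError) and an empty list (IndexError)
    mean a cache miss and return None.
    """
    try:
        all_values = cache_dict.pop(n)
        value = all_values.pop()
    except (KeyError, IndexError):
        return None
    # Put any remaining copies back under the same key.
    if all_values:
        cache_dict[n] = all_values
    return value
```

A nice side effect of `pop`-then-reinsert is that empty lists never linger in the dictionary.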
I prefer the current solution, as it's IMHO a bit clearer. Performance is also not a concern at all, as the maximum size of the dictionary is bounded.
LGTM, thanks. Just one suggestion, which you can do with as you please.
Merged, thanks @krischer. Nice documentation too.
This is a follow-up to #7686 and fixes a race condition introduced with that PR.
Setting and getting items is now protected by a lock ensuring thread safety. The size of a cache entry is now also calculated by summing over all arrays in the list.
A comment: The cache behaves like a dictionary whose values are lists. Each list contains the twiddle factors for an FFT of a certain length, potentially multiple times. Duplicating the twiddle factors is necessary in a threaded case as they can only be used by a single thread concurrently. Multiple parallel threads calculating FFTs of equal lengths (an IMHO quite realistic use case) would otherwise either have to wait for the other threads to finish or constantly recalculate the twiddle factors, both greatly impacting performance. The cache always behaved like this even when it was just a simple dictionary.
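The duplication described above can be walked through with a hypothetical, single-threaded simulation (all names are illustrative stand-ins, not numpy code): two "threads" both miss the cache for the same FFT length `n`, each computes its own twiddle factors, and both copies end up in the cached list.

```python
cache = {}
n = 64


def get_wsave(thread_id):
    # Take a cached copy if one is available, else "recompute" it.
    values = cache.get(n)
    if values:
        return values.pop()
    return ("wsave", thread_id)  # stand-in for the recomputation


def put_wsave(wsave):
    cache.setdefault(n, []).append(wsave)


# Both threads look up before either one puts its copy back:
a = get_wsave("A")  # miss -> compute own copy
b = get_wsave("B")  # miss -> compute own copy
put_wsave(a)
put_wsave(b)
# The list for n now holds two independent copies, so a later pair of
# parallel FFTs of length n can each grab one without waiting.
```

This is exactly the behavior the old plain-dictionary cache had; the new cache merely makes it thread-safe and size-bounded.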