Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken thread safety #88

Closed
eindenbom opened this issue Jan 23, 2024 · 20 comments · Fixed by #90
Closed

Broken thread safety #88

eindenbom opened this issue Jan 23, 2024 · 20 comments · Fixed by #90

Comments

@eindenbom
Copy link

lazy_loader loaded modules are not thread-safe: when concurrently accessed from several threads all but the first thread get partially loaded module missing most of functionality.

For example librosa uses lazy_loader to load resampy and gets the following error:

AttributeError: module 'resampy' has no attribute 'resample'

To Reproduce
Run the following snippet:

import librosa
import threading
import numpy as np


def resample():
    librosa.resample(np.zeros(1000), orig_sr=24_000, target_sr=16_000, res_type="kaiser_best")


for _ in range(5):
    threading.Thread(target=resample).start()

Expected behavior
No errors reported.

Actual output

Traceback (most recent call last):
Traceback (most recent call last):
  File "C:\Build\envs\dec5\lib\threading.py", line 870, in run
Traceback (most recent call last):
  File "C:\Build\envs\dec5\lib\threading.py", line 870, in run
  File "C:\Build\envs\dec5\lib\threading.py", line 870, in run
Traceback (most recent call last):
  File "C:\Build\envs\dec5\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
          File "D:\source\internal_ai_deepaudio\training\break_lazy_load.py", line 7, in resample
self._target(*self._args, **self._kwargs)self._target(*self._args, **self._kwargs)

  File "D:\source\internal_ai_deepaudio\training\break_lazy_load.py", line 7, in resample
  File "D:\source\internal_ai_deepaudio\training\break_lazy_load.py", line 7, in resample
    self._target(*self._args, **self._kwargs)
  File "D:\source\internal_ai_deepaudio\training\break_lazy_load.py", line 7, in resample
    librosa.resample(np.zeros(1000), orig_sr=24_000, target_sr=16_000, res_type="kaiser_best")
  File "C:\Build\envs\dec5\lib\site-packages\librosa\core\audio.py", line 677, in resample
    librosa.resample(np.zeros(1000), orig_sr=24_000, target_sr=16_000, res_type="kaiser_best")
  File "C:\Build\envs\dec5\lib\site-packages\librosa\core\audio.py", line 677, in resample
    librosa.resample(np.zeros(1000), orig_sr=24_000, target_sr=16_000, res_type="kaiser_best")
  File "C:\Build\envs\dec5\lib\site-packages\librosa\core\audio.py", line 677, in resample
    librosa.resample(np.zeros(1000), orig_sr=24_000, target_sr=16_000, res_type="kaiser_best")
  File "C:\Build\envs\dec5\lib\site-packages\librosa\core\audio.py", line 677, in resample
    y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=axis)
AttributeError: module 'resampy' has no attribute 'resample'
    y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=axis)
    y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=axis)
AttributeError: module 'resampy' has no attribute 'resample'AttributeError
: module 'resampy' has no attribute 'resample'
    y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=axis)
AttributeError: module 'resampy' has no attribute 'resample'
python-BaseException
python-BaseException
python-BaseException
python-BaseException
Exception in thread Exception in thread Thread-8:
Exception in thread Traceback (most recent call last):
Thread-5:
Traceback (most recent call last):
  File "C:\Build\envs\dec5\lib\threading.py", line 932, in _bootstrap_inner
  File "C:\Build\envs\dec5\lib\threading.py", line 932, in _bootstrap_inner
Thread-9:
Traceback (most recent call last):
  File "C:\Build\envs\dec5\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Build\envs\dec5\lib\threading.py", line 870, in run
    self.run()
  File "C:\Build\envs\dec5\lib\threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
self.run()
  File "C:\Build\envs\dec5\lib\threading.py", line 870, in run
  File "D:\source\internal_ai_deepaudio\training\break_lazy_load.py", line 7, in resample
    self._target(*self._args, **self._kwargs)
  File "D:\source\internal_ai_deepaudio\training\break_lazy_load.py", line 7, in resample
    librosa.resample(np.zeros(1000), orig_sr=24_000, target_sr=16_000, res_type="kaiser_best")
  File "C:\Build\envs\dec5\lib\site-packages\librosa\core\audio.py", line 677, in resample
    self._target(*self._args, **self._kwargs)
  File "D:\source\internal_ai_deepaudio\training\break_lazy_load.py", line 7, in resample
    librosa.resample(np.zeros(1000), orig_sr=24_000, target_sr=16_000, res_type="kaiser_best")
  File "C:\Build\envs\dec5\lib\site-packages\librosa\core\audio.py", line 677, in resample
    y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=axis)
    y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=axis)
AttributeError: AttributeError: module 'resampy' has no attribute 'resample'
module 'resampy' has no attribute 'resample'    librosa.resample(np.zeros(1000), orig_sr=24_000, target_sr=16_000, res_type="kaiser_best")

  File "C:\Build\envs\dec5\lib\site-packages\librosa\core\audio.py", line 677, in resample
python-BaseException
    y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=axis)
AttributeError: module 'resampy' has no attribute 'resample'
Exception in thread Thread-6:
Traceback (most recent call last):
  File "C:\Build\envs\dec5\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Build\envs\dec5\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "D:\source\internal_ai_deepaudio\training\break_lazy_load.py", line 7, in resample
    librosa.resample(np.zeros(1000), orig_sr=24_000, target_sr=16_000, res_type="kaiser_best")
  File "C:\Build\envs\dec5\lib\site-packages\librosa\core\audio.py", line 677, in resample
    y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=axis)
AttributeError: module 'resampy' has no attribute 'resample'

Software versions

Windows-10-10.0.22621-SP0
Python 3.8.0 (default, Nov  6 2019, 16:00:02) [MSC v.1916 64 bit (AMD64)]
NumPy 1.23.5
SciPy 1.10.1
librosa 0.10.1
INSTALLED VERSIONS
------------------
python: 3.8.0 (default, Nov  6 2019, 16:00:02) [MSC v.1916 64 bit (AMD64)]

librosa: 0.10.1

audioread: 3.0.1
numpy: 1.23.5
scipy: 1.10.1
sklearn: 1.3.2
joblib: 1.3.2
decorator: 5.1.1
numba: 0.58.1
soundfile: 0.12.1
pooch: v1.8.0
soxr: 0.3.7
typing_extensions: installed, no version number available
lazy_loader: installed, no version number available
msgpack: 1.0.7

numpydoc: None
sphinx: None
sphinx_rtd_theme: None
matplotlib: None
sphinx_multiversion: None
sphinx_gallery: None
mir_eval: None
ipython: None
sphinxcontrib.rsvgconverter: None
pytest: None
pytest_mpl: None
pytest_cov: None
samplerate: None
resampy: 0.4.2
presets: None
packaging: 23.2
@stefanv
Copy link
Member

stefanv commented Jan 25, 2024

I had to repeat it a few times to trigger this behavior, but I see it.

@stefanv
Copy link
Member

stefanv commented Jan 25, 2024

Standalone reproducer for a different race condition:

import threading
import time
import lazy_loader as lazy


def repeat_lazy():
    time.sleep(0.5)
    np = lazy.load('numpy')


for _ in range(50):
    threading.Thread(target=repeat_lazy).start()

stefanv added a commit to stefanv/lazy_loader that referenced this issue Jan 25, 2024
@stefanv
Copy link
Member

stefanv commented Jan 25, 2024

@eindenbom Can you give #90 a try?

@eindenbom
Copy link
Author

I do not think #90 fixes the same race condition as reported in this issue. You are fixing a possible race (I am not sure that there is any) in lazy_loader.load(), while in librosa it is executed at librosa load time and therefore protected by module load lock.

The race I have reported is in importlib.util._LazyModule.__getattr__(), line 228: self.__class__ = types.ModuleType.
The __class__ meta attribute is replaced BEFORE actual module is loaded. This creates a race condition if module attribute is requested concurrently from another thread.

@stefanv
Copy link
Member

stefanv commented Jan 25, 2024

The tests confirm that the other race condition exists, but you are right that #90 does not address this issue.

@effigies
Copy link
Collaborator

I do not think #90 fixes the same race condition as reported in this issue. You are fixing a possible race (I am not sure that there is any) in lazy_loader.load(), while in librosa it is executed at librosa load time and therefore protected by module load lock.

The race I have reported is in importlib.util._LazyModule.__getattr__(), line 228: self.__class__ = types.ModuleType. The __class__ meta attribute is replaced BEFORE actual module is loaded. This creates a race condition if module attribute is requested concurrently from another thread.

Agreed, this is a CPython bug:

import importlib
import threading
import sys

spec = importlib.util.find_spec('http')
module = importlib.util.module_from_spec(spec)
http = sys.modules['http'] = module

loader = importlib.util.LazyLoader(spec.loader)                                                     
loader.exec_module(module)

def check(): 
    return http.HTTPStatus.ACCEPTED == 202                                                          
        
for _ in range(5):  
    threading.Thread(target=check).start()

This looks painful to fix, since attempting to attach a lock to the _LazyModule class or instance would result in a recursive call to __getattribute__. Avoiding that is why self.__class__ is replaced before doing anything else. A global lock or a dict[ModuleType, Lock] table are the only things I can think of.

@stefanv
Copy link
Member

stefanv commented Jan 30, 2024

Thanks for investigating, @effigies! Adding time.sleep(0.2) into check makes failure more reliable for me.

@stefanv
Copy link
Member

stefanv commented Jan 30, 2024

Can we simply lock up exec_module?

import importlib
import threading
import sys
import time

spec = importlib.util.find_spec('http')
module = importlib.util.module_from_spec(spec)
http = sys.modules['http'] = module

lock = threading.Lock()

def lock_func(f):
    def locked(*args, **kwargs):
        with lock:
            return f(*args, **kwargs)
    return locked

loader = importlib.util.LazyLoader(spec.loader)
loader.exec_module = lock_func(loader.exec_module)
loader.exec_module(module)


def check():
    time.sleep(0.2)
    return http.HTTPStatus.ACCEPTED == 202

for _ in range(5):
    threading.Thread(target=check).start()

@effigies
Copy link
Collaborator

I don't think so, because exec_module happens at lazy load time. The race condition happens at the actual attribute lookup.

@stefanv
Copy link
Member

stefanv commented Jan 30, 2024

Yes, but look at the above example, which wraps exec_module in a lock.

@effigies
Copy link
Collaborator

In [1]: import importlib
   ...: import threading
   ...: import sys
   ...: import time
   ...: 
   ...: spec = importlib.util.find_spec('http')
   ...: module = importlib.util.module_from_spec(spec)
   ...: http = sys.modules['http'] = module
   ...: 
   ...: lock = threading.Lock()
   ...: 
   ...: def lock_func(f):
   ...:     def locked(*args, **kwargs):
   ...:         with lock:
   ...:             return f(*args, **kwargs)
   ...:     return locked
   ...: 
   ...: loader = importlib.util.LazyLoader(spec.loader)
   ...: loader.exec_module = lock_func(loader.exec_module)
   ...: loader.exec_module(module)
   ...: 
   ...: 
   ...: def check():
   ...:     time.sleep(0.2)
   ...:     return http.HTTPStatus.ACCEPTED == 202
   ...: 
   ...: for _ in range(5):
   ...:     threading.Thread(target=check).start()
   ...: 

Exception in thread Thread-2 (check):
Traceback (most recent call last):
  File "/home/chris/mambaforge/envs/default/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/chris/mambaforge/envs/default/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-1-1b0c7fd1f576>", line 25, in check
Exception in thread Thread-5 (check):
Traceback (most recent call last):
  File "/home/chris/mambaforge/envs/default/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/chris/mambaforge/envs/default/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-1-1b0c7fd1f576>", line 25, in check
AttributeError: module 'http' has no attribute 'HTTPStatus'
AttributeError: module 'http' has no attribute 'HTTPStatus'

@stefanv
Copy link
Member

stefanv commented Jan 30, 2024

Hrm, yes, looks like I have to use lock_func(spec.loader.exec_module), but not sure how that alters behavior.

@effigies
Copy link
Collaborator

The loader isn't where the race is. I might be wrong, but I don't think that will protect the actual critical operations.

@stefanv
Copy link
Member

stefanv commented Jan 30, 2024

Right, so we'll have to ensure "attribute access + lazy loading" becomes an atomic operation.

@effigies
Copy link
Collaborator

I went ahead and opened python/cpython#114763. I think I have a fix.

@effigies
Copy link
Collaborator

effigies commented Jan 31, 2024

@eindenbom If you're prepared to build your whole dependency tree for 3.13-dev, you could test out python/cpython#114781. I suspect it can be rebased on the 3.12 branch as well, if you'd prefer.

@stefanv stefanv reopened this Feb 6, 2024
@stefanv
Copy link
Member

stefanv commented Feb 24, 2024

Thanks to @effigies, this issue has been addressed by python/cpython#114781.

@eindenbom This should soon be backported to the 3.11 and 3.12 source trees.

@effigies
Copy link
Collaborator

Should be out in 3.11.9 (python/cpython#115871 - ETA April 1) and 3.12.3 (python/cpython#115870 - ETA April 9).

@stefanv
Copy link
Member

stefanv commented Feb 27, 2024

@effigies Does this affect / make unnecessary #90?

@effigies
Copy link
Collaborator

No, my read is that these are two independent races.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants