Temporarily Disable OpenMP support for libsox #1026

Merged: 7 commits into pytorch:master from mthrok:disable-openmp-sox on Nov 20, 2020

Conversation

@mthrok (Collaborator) commented Nov 13, 2020

Sox effect functions, when run in subprocesses, cause issues like #1021.
In this PR, I added tests that mimic this issue. The tests fail properly in my development environment (see the log below), but somehow do not fail in our CI. Furthermore, all the tests in test/torchaudio_unittest/sox_effect/dataset_test.py can fail with a segmentation fault in my local environment, yet this does not happen in our CI. My environment is very similar to the CI setup (Docker/Ubuntu + Anaconda), and I can reproduce the issue in my working environment, but I have not figured out the key difference between my local environment and CI.

What is common to these failure cases:

  • sox_effects is run in a subprocess.
  • The subprocess is launched with the `fork` start method.

The above suggests that there is some sort of interference between the OpenMP runtime used by PyTorch (or MKL) and the one used by libsox.
This PR disables OpenMP support in libsox so that, at the very least, users won't run into this issue when using these functionalities while we investigate further.
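
For reference, a minimal sketch of the failing pattern (this mirrors the `test_executor` case in the log below; the file path and effect parameters are illustrative, not the actual test code):

```python
import torch  # ensure PyTorch's OpenMP runtime is loaded before forking
import torchaudio
from concurrent.futures import ProcessPoolExecutor


def speed(path):
    # "speed" + "rate" exercise libsox's OpenMP-enabled resampler,
    # which is where the crash was observed (see #1021).
    waveform, sample_rate = torchaudio.load(path)
    effects = [["speed", "1.1"], ["rate", str(sample_rate)]]
    return torchaudio.sox_effects.apply_effects_tensor(
        waveform, sample_rate, effects)


if __name__ == "__main__":
    # On Linux, worker processes are started with the "fork" method by
    # default, which is the condition under which the segfault appears.
    with ProcessPoolExecutor(1) as executor:
        future = executor.submit(speed, "test.wav")
        future.result()  # crashes with SIGSEGV in the affected environments
```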

Error report
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /scratch/moto/torchaudio
plugins: hypothesis-5.18.0
collected 3 items

test/torchaudio_unittest/sox_effect/dataset_test.py FFFatal Python error: Segmentation fault

Current thread 0x00007f55ba56a740 (most recent call first):
  File "/scratch/moto/torchaudio/torchaudio/sox_effects/sox_effects.py", line 150 in apply_effects_tensor
  File "/scratch/moto/torchaudio/test/torchaudio_unittest/sox_effect/dataset_test.py", line 134 in speed
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/concurrent/futures/process.py", line 239 in _process_worker
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/process.py", line 108 in run
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/process.py", line 315 in _bootstrap
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/popen_fork.py", line 75 in _launch
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/popen_fork.py", line 19 in __init__
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/context.py", line 276 in _Popen
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/process.py", line 121 in start
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/concurrent/futures/process.py", line 608 in _adjust_process_count
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/concurrent/futures/process.py", line 584 in _start_queue_management_thread
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/concurrent/futures/process.py", line 645 in submit
  File "/scratch/moto/torchaudio/test/torchaudio_unittest/sox_effect/dataset_test.py", line 156 in <listcomp>
  File "/scratch/moto/torchaudio/test/torchaudio_unittest/sox_effect/dataset_test.py", line 156 in test_executor
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/unittest/case.py", line 633 in _callTestMethod
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/unittest/case.py", line 676 in run
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/unittest/case.py", line 736 in __call__
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/unittest.py", line 231 in runtest
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/runner.py", line 135 in pytest_runtest_call
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/runner.py", line 217 in <lambda>
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/runner.py", line 244 in from_call
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/runner.py", line 216 in call_runtest_hook
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/runner.py", line 186 in call_and_report
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/runner.py", line 100 in runtestprotocol
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/runner.py", line 85 in pytest_runtest_protocol
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/main.py", line 272 in pytest_runtestloop
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/main.py", line 247 in _main
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/main.py", line 191 in wrap_session
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/main.py", line 240 in pytest_cmdline_main
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/_pytest/config/__init__.py", line 124 in main
  File "/home/moto/conda/envs/PY3.8-cuda101/bin/pytest", line 11 in <module>
F                                                                                                                                                                                [100%]

================================================================================================================== FAILURES ==================================================================================================================
_______________________________________________________________________________________________ TestSoxEffectsDataset.test_apply_effects_file ________________________________________________________________________________________________

self = <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f559e9a4430>, timeout = 5.0

    def _try_get_data(self, timeout=_utils.MP_STATUS_CHECK_INTERVAL):
        # Tries to fetch data from `self._data_queue` once for a given timeout.
        # This can also be used as inner loop of fetching without timeout, with
        # the sender status as the loop condition.
        #
        # This raises a `RuntimeError` if any worker died expectedly. This error
        # can come from either the SIGCHLD handler in `_utils/signal_handling.py`
        # (only for non-Windows platforms), or the manual check below on errors
        # and timeouts.
        #
        # Returns a 2-tuple:
        #   (bool: whether successfully get data, any: data if successful else None)
        try:
>           data = self._data_queue.get(timeout=timeout)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/dataloader.py:956:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <multiprocessing.queues.Queue object at 0x7f559e9a4550>, block = True, timeout = 4.999965248629451

    def get(self, block=True, timeout=None):
        if self._closed:
            raise ValueError(f"Queue {self!r} is closed")
        if block and timeout is None:
            with self._rlock:
                res = self._recv_bytes()
            self._sem.release()
        else:
            if block:
                deadline = time.monotonic() + timeout
            if not self._rlock.acquire(block, timeout):
                raise Empty
            try:
                if block:
                    timeout = deadline - time.monotonic()
>                   if not self._poll(timeout):

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/queues.py:107:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <multiprocessing.connection.Connection object at 0x7f559e9a45e0>, timeout = 4.999965248629451

    def poll(self, timeout=0.0):
        """Whether there is any input available to be read"""
        self._check_closed()
        self._check_readable()
>       return self._poll(timeout)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/connection.py:257:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <multiprocessing.connection.Connection object at 0x7f559e9a45e0>, timeout = 4.999965248629451

    def _poll(self, timeout):
>       r = wait([self], timeout)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/connection.py:424:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

object_list = [<multiprocessing.connection.Connection object at 0x7f559e9a45e0>], timeout = 4.999965248629451

    def wait(object_list, timeout=None):
        '''
        Wait till an object in object_list is ready/readable.

        Returns list of those objects in object_list which are ready/readable.
        '''
        with _WaitSelector() as selector:
            for obj in object_list:
                selector.register(obj, selectors.EVENT_READ)

            if timeout is not None:
                deadline = time.monotonic() + timeout

            while True:
>               ready = selector.select(timeout)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/connection.py:931:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <selectors.PollSelector object at 0x7f559e9617c0>, timeout = 5000

    def select(self, timeout=None):
        # This is shared between poll() and epoll().
        # epoll() has a different signature and handling of timeout parameter.
        if timeout is None:
            timeout = None
        elif timeout <= 0:
            timeout = 0
        else:
            # poll() has a resolution of 1 millisecond, round away from
            # zero to wait *at least* timeout seconds.
            timeout = math.ceil(timeout * 1e3)
        ready = []
        try:
>           fd_event_list = self._selector.poll(timeout)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/selectors.py:415:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

signum = 17, frame = <frame at 0x7f559e96a640, file '/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/selectors.py', line 417, code select>

    def handler(signum, frame):
        # This following call uses `waitid` with WNOHANG from C side. Therefore,
        # Python can still get and update the process status successfully.
>       _error_if_any_worker_fails()
E       RuntimeError: DataLoader worker (pid 17123) is killed by signal: Segmentation fault.

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py:66: RuntimeError

The above exception was the direct cause of the following exception:

self = <torchaudio_unittest.sox_effect.dataset_test.TestSoxEffectsDataset testMethod=test_apply_effects_file>

    def test_apply_effects_file(self):
        sample_rate = 12000
        flist = self._generate_dataset()
        dataset = RandomPerturbationFile(flist, sample_rate)
        loader = torch.utils.data.DataLoader(
            dataset, batch_size=32, num_workers=16,
            worker_init_fn=init_random_seed,
        )
>       for batch in loader:

test/torchaudio_unittest/sox_effect/dataset_test.py:104:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/dataloader.py:519: in __next__
    data = self._next_data()
/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/dataloader.py:1152: in _next_data
    idx, data = self._get_data()
/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/dataloader.py:1118: in _get_data
    success, data = self._try_get_data()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f559e9a4430>, timeout = 5.0

    def _try_get_data(self, timeout=_utils.MP_STATUS_CHECK_INTERVAL):
        # Tries to fetch data from `self._data_queue` once for a given timeout.
        # This can also be used as inner loop of fetching without timeout, with
        # the sender status as the loop condition.
        #
        # This raises a `RuntimeError` if any worker died expectedly. This error
        # can come from either the SIGCHLD handler in `_utils/signal_handling.py`
        # (only for non-Windows platforms), or the manual check below on errors
        # and timeouts.
        #
        # Returns a 2-tuple:
        #   (bool: whether successfully get data, any: data if successful else None)
        try:
            data = self._data_queue.get(timeout=timeout)
            return (True, data)
        except Exception as e:
            # At timeout and error, we manually check whether any worker has
            # failed. Note that this is the only mechanism for Windows to detect
            # worker failures.
            failed_workers = []
            for worker_id, w in enumerate(self._workers):
                if self._workers_status[worker_id] and not w.is_alive():
                    failed_workers.append(w)
                    self._mark_worker_as_unavailable(worker_id)
            if len(failed_workers) > 0:
                pids_str = ', '.join(str(w.pid) for w in failed_workers)
>               raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
E               RuntimeError: DataLoader worker (pid(s) 17123, 17124) exited unexpectedly

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/dataloader.py:969: RuntimeError
------------------------------------------------------------------------------------------------------------ Captured stderr call ------------------------------------------------------------------------------------------------------------
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.

______________________________________________________________________________________________ TestSoxEffectsDataset.test_apply_effects_tensor _______________________________________________________________________________________________

self = <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f559c8abb80>, timeout = 5.0

    def _try_get_data(self, timeout=_utils.MP_STATUS_CHECK_INTERVAL):
        # Tries to fetch data from `self._data_queue` once for a given timeout.
        # This can also be used as inner loop of fetching without timeout, with
        # the sender status as the loop condition.
        #
        # This raises a `RuntimeError` if any worker died expectedly. This error
        # can come from either the SIGCHLD handler in `_utils/signal_handling.py`
        # (only for non-Windows platforms), or the manual check below on errors
        # and timeouts.
        #
        # Returns a 2-tuple:
        #   (bool: whether successfully get data, any: data if successful else None)
        try:
>           data = self._data_queue.get(timeout=timeout)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/dataloader.py:956:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <multiprocessing.queues.Queue object at 0x7f559c8abd00>, block = True, timeout = 4.999982295557857

    def get(self, block=True, timeout=None):
        if self._closed:
            raise ValueError(f"Queue {self!r} is closed")
        if block and timeout is None:
            with self._rlock:
                res = self._recv_bytes()
            self._sem.release()
        else:
            if block:
                deadline = time.monotonic() + timeout
            if not self._rlock.acquire(block, timeout):
                raise Empty
            try:
                if block:
                    timeout = deadline - time.monotonic()
>                   if not self._poll(timeout):

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/queues.py:107:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <multiprocessing.connection.Connection object at 0x7f559c8abd60>, timeout = 4.999982295557857

    def poll(self, timeout=0.0):
        """Whether there is any input available to be read"""
        self._check_closed()
        self._check_readable()
>       return self._poll(timeout)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/connection.py:257:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <multiprocessing.connection.Connection object at 0x7f559c8abd60>, timeout = 4.999982295557857

    def _poll(self, timeout):
>       r = wait([self], timeout)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/connection.py:424:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

object_list = [<multiprocessing.connection.Connection object at 0x7f559c8abd60>], timeout = 4.999982295557857

    def wait(object_list, timeout=None):
        '''
        Wait till an object in object_list is ready/readable.

        Returns list of those objects in object_list which are ready/readable.
        '''
        with _WaitSelector() as selector:
            for obj in object_list:
                selector.register(obj, selectors.EVENT_READ)

            if timeout is not None:
                deadline = time.monotonic() + timeout

            while True:
>               ready = selector.select(timeout)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/multiprocessing/connection.py:931:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <selectors.PollSelector object at 0x7f559c624ee0>, timeout = 5000

    def select(self, timeout=None):
        # This is shared between poll() and epoll().
        # epoll() has a different signature and handling of timeout parameter.
        if timeout is None:
            timeout = None
        elif timeout <= 0:
            timeout = 0
        else:
            # poll() has a resolution of 1 millisecond, round away from
            # zero to wait *at least* timeout seconds.
            timeout = math.ceil(timeout * 1e3)
        ready = []
        try:
>           fd_event_list = self._selector.poll(timeout)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/selectors.py:415:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

signum = 17, frame = <frame at 0x7f559c82e440, file '/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/selectors.py', line 417, code select>

    def handler(signum, frame):
        # This following call uses `waitid` with WNOHANG from C side. Therefore,
        # Python can still get and update the process status successfully.
>       _error_if_any_worker_fails()
E       RuntimeError: DataLoader worker (pid 17143) is killed by signal: Segmentation fault.

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py:66: RuntimeError

The above exception was the direct cause of the following exception:

self = <torchaudio_unittest.sox_effect.dataset_test.TestSoxEffectsDataset testMethod=test_apply_effects_tensor>

    def test_apply_effects_tensor(self):
        sample_rate = 12000
        signals = self._generate_signals()
        dataset = RandomPerturbationTensor(signals, sample_rate)
        loader = torch.utils.data.DataLoader(
            dataset, batch_size=32, num_workers=16,
            worker_init_fn=init_random_seed,
        )
>       for batch in loader:

test/torchaudio_unittest/sox_effect/dataset_test.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/dataloader.py:519: in __next__
    data = self._next_data()
/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/dataloader.py:1152: in _next_data
    idx, data = self._get_data()
/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/dataloader.py:1118: in _get_data
    success, data = self._try_get_data()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f559c8abb80>, timeout = 5.0

    def _try_get_data(self, timeout=_utils.MP_STATUS_CHECK_INTERVAL):
        # Tries to fetch data from `self._data_queue` once for a given timeout.
        # This can also be used as inner loop of fetching without timeout, with
        # the sender status as the loop condition.
        #
        # This raises a `RuntimeError` if any worker died expectedly. This error
        # can come from either the SIGCHLD handler in `_utils/signal_handling.py`
        # (only for non-Windows platforms), or the manual check below on errors
        # and timeouts.
        #
        # Returns a 2-tuple:
        #   (bool: whether successfully get data, any: data if successful else None)
        try:
            data = self._data_queue.get(timeout=timeout)
            return (True, data)
        except Exception as e:
            # At timeout and error, we manually check whether any worker has
            # failed. Note that this is the only mechanism for Windows to detect
            # worker failures.
            failed_workers = []
            for worker_id, w in enumerate(self._workers):
                if self._workers_status[worker_id] and not w.is_alive():
                    failed_workers.append(w)
                    self._mark_worker_as_unavailable(worker_id)
            if len(failed_workers) > 0:
                pids_str = ', '.join(str(w.pid) for w in failed_workers)
>               raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
E               RuntimeError: DataLoader worker (pid(s) 17143, 17144, 17145, 17146) exited unexpectedly

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/utils/data/dataloader.py:969: RuntimeError
------------------------------------------------------------------------------------------------------------ Captured stderr call ------------------------------------------------------------------------------------------------------------
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.

___________________________________________________________________________________________________ TestProcessPoolExecutor.test_executor ____________________________________________________________________________________________________

self = <torchaudio_unittest.sox_effect.dataset_test.TestProcessPoolExecutor testMethod=test_executor>

    def test_executor(self):
        """Test that apply_effects_tensor with speed + rate does not crush

        https://github.com/pytorch/audio/issues/1021
        """
        executor = ProcessPoolExecutor(1)
        futures = [executor.submit(speed, path) for path in self.flist]
        for future in futures:
>           future.result()

test/torchaudio_unittest/sox_effect/dataset_test.py:158:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/concurrent/futures/_base.py:439: in result
    return self.__get_result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Future at 0x7f559c6ef6d0 state=finished raised BrokenProcessPool>

    def __get_result(self):
        if self._exception:
>           raise self._exception
E           concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/concurrent/futures/_base.py:388: BrokenProcessPool
========================================================================================================== short test summary info ===========================================================================================================
FAILED test/torchaudio_unittest/sox_effect/dataset_test.py::TestSoxEffectsDataset::test_apply_effects_file - RuntimeError: DataLoader worker (pid(s) 17123, 17124) exited unexpectedly
FAILED test/torchaudio_unittest/sox_effect/dataset_test.py::TestSoxEffectsDataset::test_apply_effects_tensor - RuntimeError: DataLoader worker (pid(s) 17143, 17144, 17145, 17146) exited unexpectedly
FAILED test/torchaudio_unittest/sox_effect/dataset_test.py::TestProcessPoolExecutor::test_executor - concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pe...
============================================================================================================= 3 failed in 1.42s =============================================================================================================

This should go into the 0.7.1 release.

@mthrok mthrok marked this pull request as ready for review November 16, 2020 14:28
@mthrok mthrok marked this pull request as draft November 16, 2020 14:28
@mthrok mthrok changed the title from "Disable OpenMP support for libsox" to "Temporarily Disable OpenMP support for libsox" Nov 17, 2020
@mthrok mthrok marked this pull request as ready for review November 17, 2020 00:22
@vincentqb (Contributor) left a comment


Thanks for adding a test to catch this locally. As we discussed, disabling OpenMP in sox for now seems like the right temporary workaround.

```diff
@@ -76,5 +76,5 @@ ExternalProject_Add(libsox
   DOWNLOAD_DIR ${ARCHIVE_DIR}
   URL https://downloads.sourceforge.net/project/sox/sox/14.4.2/sox-14.4.2.tar.bz2
   URL_HASH SHA256=81a6956d4330e75b5827316e44ae381e6f1e8928003c6aa45896da9041ea149c
-  CONFIGURE_COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/build_codec_helper.sh ${CMAKE_CURRENT_SOURCE_DIR}/src/libsox/configure ${COMMON_ARGS} --with-lame --with-flac --with-mad --with-oggvorbis --without-alsa --without-coreaudio --without-png --without-oss --without-sndfile --with-opus
+  CONFIGURE_COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/build_codec_helper.sh ${CMAKE_CURRENT_SOURCE_DIR}/src/libsox/configure ${COMMON_ARGS} --with-lame --with-flac --with-mad --with-oggvorbis --without-alsa --without-coreaudio --without-png --without-oss --without-sndfile --with-opus --disable-openmp
```
Contributor

Can you add a comment linking to this issue here?

Collaborator Author (@mthrok)

Good point. Added a comment linking to this issue.
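
As an aside, `--disable-openmp` is a flag of sox's own configure script, not something specific to our CMake wrapper; a standalone build without OpenMP would look roughly like this (commands are illustrative):

```sh
# Build a standalone sox 14.4.2 without OpenMP, mirroring the flag
# added in the diff above (other configure options omitted).
curl -LO https://downloads.sourceforge.net/project/sox/sox/14.4.2/sox-14.4.2.tar.bz2
tar xjf sox-14.4.2.tar.bz2
cd sox-14.4.2
./configure --disable-openmp
make -j"$(nproc)"
```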

@cpuhrsch (Contributor)

When we bring this back, we need to figure out how to link against the version of OpenMP that PyTorch links against.


mthrok commented Nov 20, 2020

When we bring this back, we need to figure out how to link against the version of OpenMP that PyTorch links against.

I am not fully aware of how to do this, but there is a similar conversation going on in vision; this might be the way:
pytorch/vision#2783 (comment)
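
For illustration, one possible shape of such a fix would be to locate the OpenMP runtime shipped with the installed PyTorch package and link libsox against it instead of the toolchain default. Everything below (library names, the `sox_effects` target) is a hypothetical sketch, not what vision actually landed:

```cmake
# Hypothetical sketch: prefer the OpenMP runtime that ships with the
# installed PyTorch package (libiomp5 for the binary distribution) over
# the toolchain default (libgomp), so that libsox and PyTorch load the
# same runtime in one process.
find_package(Torch REQUIRED)  # sets TORCH_INSTALL_PREFIX
find_library(TORCH_OPENMP_LIB
  NAMES iomp5 omp gomp
  HINTS "${TORCH_INSTALL_PREFIX}/lib")
if(TORCH_OPENMP_LIB)
  # "sox_effects" is a placeholder for torchaudio's actual extension target.
  target_link_libraries(sox_effects PRIVATE "${TORCH_OPENMP_LIB}")
endif()
```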

@mthrok mthrok merged commit 7580485 into pytorch:master Nov 20, 2020
@mthrok mthrok deleted the disable-openmp-sox branch November 20, 2020 16:52
mthrok added a commit to mthrok/audio that referenced this pull request Nov 20, 2020
Currently `libsox` on Linux is compiled with GNU OpenMP, which interferes with the OpenMP runtime PyTorch uses (Intel's, in the case of the binary distribution). This PR disables OpenMP support for `libsox` while we investigate how to use the same OpenMP runtime as PyTorch.
mthrok added a commit that referenced this pull request Dec 3, 2020
Currently `libsox` on Linux is compiled with GNU OpenMP, which interferes with the OpenMP runtime PyTorch uses (Intel's, in the case of the binary distribution). This PR disables OpenMP support for `libsox` while we investigate how to use the same OpenMP runtime as PyTorch.