Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continuation of PR #7280, fixing lifetime of TBB task_scheduler_handle #7294

Merged
merged 2 commits into from Aug 17, 2021

Conversation

stuartarchibald
Copy link
Contributor

As title.

As title. Minor refactor of pycc support testing into
numba.tests.support to save code duplication.
@stuartarchibald
Copy link
Contributor Author

96edef4 fails for the TBB build with https://dev.azure.com/numba/numba/_build/results?buildId=9489&view=logs&j=064bd7ca-23e4-5078-9cb6-8968eb13bf19&t=30a4dffd-4ede-53d2-ca39-79d23392ba79&l=409 (segfault)

======================================================================
FAIL: test_lifetime_of_task_scheduler_handle (numba.tests.test_parallel_backend.TestTBBSpecificIssues)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vsts/work/1/s/numba/tests/test_parallel_backend.py", line 1032, in test_lifetime_of_task_scheduler_handle
    out, err = self.run_cmd(cmdline, env=env)
  File "/home/vsts/work/1/s/numba/tests/test_parallel_backend.py", line 478, in run_cmd
    (popen.returncode, err.decode()))
AssertionError: process failed with code -11: stderr follows
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
Fatal Python error: Segmentation fault

Current thread 0x00007f8487c04740 (most recent call first):
Fatal Python error: Segmentation fault

Current thread 0x00007f8487c04740 (most recent call first):

@stuartarchibald
Copy link
Contributor Author

792e2d4 contains @PokhodenkoSA / @Hardcode84 patch for the issue.

@stuartarchibald
Copy link
Contributor Author

792e2d4 failed once with :

Traceback (most recent call last):
  File "/home/vsts/miniconda3/envs/azure_ci/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/vsts/miniconda3/envs/azure_ci/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/vsts/work/1/s/numba/runtests.py", line 9, in <module>
    sys.exit(0 if _main(sys.argv) else 1)
  File "/home/vsts/work/1/s/numba/testing/_runtests.py", line 26, in _main
    **kwds).wasSuccessful()
  File "/home/vsts/work/1/s/numba/testing/__init__.py", line 60, in run_tests
    nomultiproc=nomultiproc)
  File "/home/vsts/work/1/s/numba/testing/main.py", line 168, in __init__
    super(NumbaTestProgram, self).__init__(*args, **kwargs)
  File "/home/vsts/miniconda3/envs/azure_ci/lib/python3.7/unittest/main.py", line 101, in __init__
    self.runTests()
  File "/home/vsts/work/1/s/numba/testing/main.py", line 340, in runTests
    run_tests_real()
  File "/home/vsts/work/1/s/numba/testing/main.py", line 325, in run_tests_real
    super(NumbaTestProgram, self).runTests()
  File "/home/vsts/miniconda3/envs/azure_ci/lib/python3.7/unittest/main.py", line 271, in runTests
    self.result = testRunner.run(self.test)
  File "/home/vsts/work/1/s/numba/testing/main.py", line 795, in run
    return super(ParallelTestRunner, self).run(self._run_inner)
  File "/home/vsts/miniconda3/envs/azure_ci/lib/python3.7/unittest/runner.py", line 176, in run
    test(result)
  File "/home/vsts/work/1/s/numba/testing/main.py", line 743, in _run_inner
    self._run_parallel_tests(result, pool, child_runner, tests)
  File "/home/vsts/work/1/s/numba/testing/main.py", line 779, in _run_parallel_tests
    raise e
  File "/home/vsts/work/1/s/numba/testing/main.py", line 770, in _run_parallel_tests
    child_result = it.__next__(self.timeout)
  File "/home/vsts/miniconda3/envs/azure_ci/lib/python3.7/multiprocessing/pool.py", line 743, in next
    raise TimeoutError from None
multiprocessing.context.TimeoutError: Tests didn't finish before timeout (or crashed):
- 'numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrent_mix_use_masks_tbb'

@stuartarchibald stuartarchibald added this to the Numba 0.54 RC4 milestone Aug 9, 2021
Copy link
Member

@sklam sklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. I have a few comments.

Comment on lines +227 to +234
if (!tbb::finalize(tsh, std::nothrow))
{
tbb::task_scheduler_handle::release(tsh);
puts("Unable to join threads to shut down before fork(). "
"This can break multithreading in child processes.\n");
}
tsh_was_initialized = false;
need_reinit_after_fork = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that these changes are needed to make the new test pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.

# incorrect permissions, compilers that don't work for the above
print(e)
print('BROKEN_COMPILERS')
sys.exit(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this return a different exit code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The self.run_cmd checks for exit code 0 and raises assertion error if it is not. What I was attempting to do here is differentiate between "This test cannot be run because of some compiler setup problem" and "The compiler setup was fine and the test can be run, but is has failed". The test uses a non-zero exit code which results in the assertion error to signal the latter and a zero exit code with a special token assigned to the BROKEN_COMPILERS variable for the former. Should the BROKEN_COMPILERS condition occur, the test sets a self.skipTest().

Copy link
Member

@sklam sklam Aug 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As agreed in OOB conversion, there are too many ways for the compilation to fail. We will allow the test to be a little fragile (may accidentally skip). Later we will look into ways to make sure the test is not skipped on systems that we have tight control and can guarantee the test must work.

numba/tests/test_parallel_backend.py Show resolved Hide resolved
numba/tests/test_parallel_backend.py Show resolved Hide resolved
@stuartarchibald stuartarchibald added 4 - Waiting on reviewer Waiting for reviewer to respond to author Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm and removed 3 - Ready for Review labels Aug 10, 2021
@stuartarchibald
Copy link
Contributor Author

This needs to go through the build farm a couple of times to see if the test time out happens on machines with more resources.

@stuartarchibald
Copy link
Contributor Author

Buildfarm ID: numba_smoketest_cpu_yaml_34.

@sklam sklam mentioned this pull request Aug 17, 2021
Copy link
Member

@sklam sklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch!

The test maybe fragile but it is not worth stressing over accidentally skipping it for this rare problem.

@sklam sklam added BuildFarm Passed For PRs that have been through the buildfarm and passed 5 - Ready to merge Review and testing done, is ready to merge and removed Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Aug 17, 2021
@sklam sklam merged commit 28272e4 into numba:master Aug 17, 2021
sklam added a commit that referenced this pull request Aug 18, 2021
Continuation of PR #7280, fixing lifetime of TBB task_scheduler_handle
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to merge Review and testing done, is ready to merge BuildFarm Passed For PRs that have been through the buildfarm and passed Effort - medium Medium size effort needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants