Skip to content

CUDA: Remove items deprecated in 0.53 + simulator test fixes #6840

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 24, 2021

Conversation

gmarkall
Copy link
Member

This PR removes items deprecated in 0.53 and scheduled for removal in 0.54. These include:

  • The compute_capabilty kwarg to inspect methods.
  • The ability to index dicts returns by inspect method using a tuple of compute capability and argument types.
  • The return of a single item (instead of a dict) from the inspection methods and ptx property of specialized dispatchers.
  • The definition and definitions property of dispatchers.

Whilst working on this I discovered a couple of other issues in the simulator:

  • The testing of cooperative groups was accidentally skipped because skip_if_cudadevrt_missing always skipped tests on the simulator - this is now resolved.
  • Some tests inspected some fake PTX from the simulator - this makes no sense, so the ptx property of fake kernels has been removed, and the tests which inspected it are now skipped on cudasim.

Some other remarks:

  • Various tests made use of the deprecated functionality - these have now been updated to use the current interfaces.
  • Relevant documentation (the Cooperative groups example and the deprecation notices) is updated.
  • Tests of the deprecation notices have been removed accordingly.

Inspect methods now always return a dict if a signature is not
specified.
This means that the useless ``.ptx`` property of fake kernels can be
removed.
This should never skip on the simulator, it should always assume a
condition as if cudadevrt were present.
Inspecting PTX makes no sense as a test on the simulator.
@gmarkall gmarkall added CUDA CUDA related issue/PR 2 - In Progress labels Mar 18, 2021
@gmarkall gmarkall added this to the Numba 0.54 RC milestone Mar 18, 2021
@gmarkall gmarkall added 3 - Ready for Review Effort - medium Medium size effort needed and removed 2 - In Progress labels Mar 18, 2021
Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch, great to see this cleaned up. There's a few minor things to look at and once resolved this will be good to merge. Thanks again!

@stuartarchibald stuartarchibald added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review labels Apr 9, 2021
@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Apr 13, 2021
@gmarkall
Copy link
Member Author

@stuartarchibald Many thanks for the review - comments should be addressed now.

@gmarkall
Copy link
Member Author

Is the fail a Frosty Thompson?

INFO: Running slice of discovered tests: (7,None,22)
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
skipped HSA tests
skipped HSA tests
Parallel: 402. Serial: 62
s....................................................................................................................................x...............cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
..................................s................ssssssssssssssss..........F...............................................................................................................................................................................sss..ss..........sssssss.s..s....sss..........s............sss
======================================================================
FAIL: test_fork_from_non_main_thread (numba.tests.test_parallel_backend.TestTBBSpecificIssues)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vsts/work/1/s/numba/tests/test_parallel_backend.py", line 921, in test_fork_from_non_main_thread
    self.assertIn(msg_head, err)
AssertionError: 'Attempted to fork from a non-main thread, the TBB library' not found in ''

----------------------------------------------------------------------
Ran 464 tests in 159.435s

FAILED (failures=1, skipped=39, expected failures=1)

@stuartarchibald
Copy link
Contributor

Is the fail a Frosty Thompson?

No, that's the fork from non-main thread problem reappearing, would guess TBB 2021 related. Frosty Thompson appears as a live lock.

@stuartarchibald
Copy link
Contributor

Is the fail a Frosty Thompson?

No, that's the fork from non-main thread problem reappearing, would guess TBB 2021 related. Frosty Thompson appears as a live lock.

Fixed in #6966

@stuartarchibald
Copy link
Contributor

Thanks for the fixes, please could you resolve the conflict against mainline, this can then have a build farm run?

@stuartarchibald stuartarchibald removed the 4 - Waiting on reviewer Waiting for reviewer to respond to author label Apr 30, 2021
@gmarkall
Copy link
Member Author

/azp run

@azure-pipelines
Copy link
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

@gmarkall
Copy link
Member Author

(Re-running due to conda 500 / 530 errors in the prior test run)

Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch and fixes.

@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cuda_yaml_61.

@stuartarchibald stuartarchibald added Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm 4 - Waiting on CI Review etc done, waiting for CI to finish and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels May 19, 2021
@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cuda_yaml_61.

This failed. All fails are on windows.

  • py3.8, np1.17, cuda 9.2
  • py3.8, np1.17, cuda 10.0
  • py3.9, np1.19, cuda 10.0
  • py3.9, np1.19, cuda 9.2

Failure is the same on each:

<snip>
FAIL: test_const_record_optimization (numba.cuda.tests.cudapy.test_constmem.TestCudaConstantMemory)
-----------------------------------------------------------------------------
Trace back:
"numba\cuda\tests\test_constmem.py", line 200, in test_const_record_optimization
    self.assertGreaterEqual(u8_load_count, expected_load_count,
AssertionError: 11 not greater than or equal to 12: load record values as individual bytes
<snip>

@stuartarchibald stuartarchibald added the 4 - Waiting on author Waiting for author to respond to review label May 19, 2021
These tests check for very specific patterns in the generated PTX that
are unstable across toolkit versions, platforms, and other minor
changes. Since these frequently need adjustment to match the generated
PTX, and have never revealed an underlying issue, is seems more sensible
to remove them instead of continually fixing them up in the presence of
variations.

Note that the latest issue was detected in the PR numba#6840 buildfarm run,
with failures in test_const_record_optimization on Windows for CUDA 9.2
and 10.0. These failures arose because a change to compile the kernel
with an explicit signature instead of specializing the kernel for given
arguments changed the types the kernel was being compiled for on Windows
- the default int on Windows is different to the default int on Linux.
So this test wasn't even testing the same thing on Windows as on Linux,
and this went unnoticed for a very long time!
@gmarkall
Copy link
Member Author

@stuartarchibald Thanks for the report. I've removed the failing test and a similar one, for the rationale outlined in the commit message - in summary I think these tests are a bit unhelpful / nonsensical and have adjusted them several times over the course of various PRs. Explanation from the commit:

These tests check for very specific patterns in the generated PTX that
are unstable across toolkit versions, platforms, and other minor
changes. Since these frequently need adjustment to match the generated
PTX, and have never revealed an underlying issue, is seems more sensible
to remove them instead of continually fixing them up in the presence of
variations.

Note that the latest issue was detected in the PR numba#6840 buildfarm run,
with failures in test_const_record_optimization on Windows for CUDA 9.2
and 10.0. These failures arose because a change to compile the kernel
with an explicit signature instead of specializing the kernel for given
arguments changed the types the kernel was being compiled for on Windows
- the default int on Windows is different to the default int on Linux.
So this test wasn't even testing the same thing on Windows as on Linux,
and this went unnoticed for a very long time!

@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels May 20, 2021
@stuartarchibald
Copy link
Contributor

@stuartarchibald Thanks for the report. I've removed the failing test and a similar one, for the rationale outlined in the commit message - in summary I think these tests are a bit unhelpful / nonsensical and have adjusted them several times over the course of various PRs. Explanation from the commit:

These tests check for very specific patterns in the generated PTX that
are unstable across toolkit versions, platforms, and other minor
changes. Since these frequently need adjustment to match the generated
PTX, and have never revealed an underlying issue, is seems more sensible
to remove them instead of continually fixing them up in the presence of
variations.

Note that the latest issue was detected in the PR numba#6840 buildfarm run,
with failures in test_const_record_optimization on Windows for CUDA 9.2
and 10.0. These failures arose because a change to compile the kernel
with an explicit signature instead of specializing the kernel for given
arguments changed the types the kernel was being compiled for on Windows
- the default int on Windows is different to the default int on Linux.
So this test wasn't even testing the same thing on Windows as on Linux,
and this went unnoticed for a very long time!

Thanks for the explanation @gmarkall. I agree, if these tests are not really adding anything but very susceptible to external/environmental changes then removing them seems to be the best solution.

Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cuda_yaml_63.

@stuartarchibald stuartarchibald removed the 4 - Waiting on reviewer Waiting for reviewer to respond to author label May 20, 2021
@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cuda_yaml_63.

This failed due to the issue in #7052. Once that's merged will re-run the build.

@gmarkall
Copy link
Member Author

@stuartarchibald / @esc Could this have another run now please?

@stuartarchibald
Copy link
Contributor

@stuartarchibald / @esc Could this have another run now please?

Buildfarm ID: numba_smoketest_cuda_yaml_67.

@stuartarchibald
Copy link
Contributor

@stuartarchibald / @esc Could this have another run now please?

Buildfarm ID: numba_smoketest_cuda_yaml_67.

Passed.

@stuartarchibald stuartarchibald added 5 - Ready to merge Review and testing done, is ready to merge BuildFarm Passed For PRs that have been through the buildfarm and passed and removed 4 - Waiting on CI Review etc done, waiting for CI to finish Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm labels May 24, 2021
@sklam sklam merged commit 24cb69d into numba:master May 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to merge Review and testing done, is ready to merge BuildFarm Passed For PRs that have been through the buildfarm and passed CUDA CUDA related issue/PR Effort - medium Medium size effort needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants