
LLVM 10 #599

Merged
merged 29 commits into numba:master from llvm_10 on Aug 5, 2020

Conversation

@esc esc (Member) commented Jun 17, 2020

As title, upgrade to LLVM 10.

LLVM 10 removes llvm::make_unique in favor of std::make_unique.
However, std::make_unique requires C++14 and is therefore unsuitable for
LLVM 9, which forces -std=c++11.  Update the code to select between the
two conditionally.
This fixes all issues with LLVM 10.
@esc esc (Member, Author) commented Jun 18, 2020

Build-farm id: llvmdev_7.

@esc esc (Member, Author) commented Jun 18, 2020

Note to self:

  • remember to remove the VERSION_SUFFIX again before merging.

@gmarkall gmarkall (Member) commented:

This works with the CUDA target, with the exception of numba.cuda.tests.nocuda.test_nvvm.TestNvvmWithoutCuda.test_nvvm_memset_fixup, which fails with:

======================================================================
FAIL: test_nvvm_memset_fixup (numba.cuda.tests.nocuda.test_nvvm.TestNvvmWithoutCuda)
Test llvm.memset changes in llvm7.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/nocuda/test_nvvm.py", line 39, in test_nvvm_memset_fixup
    self.assertIn("call void @llvm.memset", original)
AssertionError: 'call void @llvm.memset' not found in 'source_filename = "<string>"\ntarget datalayout = ...' (full escaped NVVM IR module elided here; the relevant function body appears decoded below)

It seems that for code like:

def foo(x):
    # Triggers a generation of llvm.memset
    for i in range(x.size):
        x[i] = 0

a memset is no longer generated. The IR used to look like:

{
B0.endif:
  %.74 = icmp sgt i64 %arg.x.2, 0
  br i1 %.74, label %B26.loopexit, label %B26

B26.loopexit:                                     ; preds = %B0.endif
  %arg.x.41 = bitcast i32* %arg.x.4 to i8*
  %0 = shl nuw i64 %arg.x.2, 2 
  call void @llvm.memset.p0i8.i64(i8* align 4 %arg.x.41, i8 0, i64 %0, i1 false)
  br label %B26 

B26:                                              ; preds = %B26.loopexit, %B0.endif
  store i8* null, i8** %.ret, align 8
  ret i32 0 
}

but now it looks like:

entry:
  %.74.inv = icmp sgt i64 %arg.x.2, 0
  br i1 %.74.inv, label %B14, label %B26

B14:                                              ; preds = %entry, %B14
  %.110.sroa.0.0114 = phi i64 [ %.110.sroa.0.0, %B14 ], [ 0, %entry ]
  %.120113 = phi i1 [ %.120, %B14 ], [ %.74.inv, %entry ]
  %.129108112 = phi i64 [ %.129107, %B14 ], [ %arg.x.2, %entry ]
  %.133110111 = phi i64 [ %.133109, %B14 ], [ 0, %entry ]
  %.129 = sext i1 %.120113 to i64
  %.129107 = add nsw i64 %.129108112, %.129
  %.133 = zext i1 %.120113 to i64
  %.133109 = add i64 %.133110111, %.133
  %.198 = icmp slt i64 %.110.sroa.0.0114, 0
  %.199 = select i1 %.198, i64 %arg.x.5.0, i64 0
  %.200 = add i64 %.199, %.110.sroa.0.0114
  %.213 = getelementptr i32, i32* %arg.x.4, i64 %.200
  store i32 0, i32* %.213, align 4
  %.120 = icmp sgt i64 %.129107, 0
  %.110.sroa.0.0 = select i1 %.120, i64 %.133109, i64 0
  br i1 %.120, label %B14, label %B26

B26:                                              ; preds = %B14, %entry
  store i8* null, i8** %.ret, align 8
  ret i32 0
}

i.e. a loop is generated instead of the memset. I'm presently looking into this.
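The failing assertion boils down to a substring check on the textual IR. A minimal stdlib-only sketch of that check (the helper name and the trimmed sample IR snippets are illustrative, not taken from the Numba test suite):

```python
import re

def has_memset_call(llvm_ir: str) -> bool:
    """Return True if the textual IR calls any llvm.memset overload."""
    return re.search(r"call void @llvm\.memset\.[\w.]+\(", llvm_ir) is not None

# Shape of the IR when loop-idiom recognition fires (memset emitted):
optimized_ir = """
B26.loopexit:
  %0 = shl nuw i64 %arg.x.2, 2
  call void @llvm.memset.p0i8.i64(i8* align 4 %arg.x.41, i8 0, i64 %0, i1 false)
"""

# Shape of the IR seen with LLVM 10 at opt level 1 (plain store loop):
loop_ir = """
B14:
  store i32 0, i32* %.213, align 4
  br i1 %.120, label %B14, label %B26
"""

print(has_memset_call(optimized_ir))  # → True
print(has_memset_call(loop_ir))       # → False
```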

@esc esc (Member, Author) commented Jun 18, 2020

@gmarkall awesome, thanks for looking into this!

@gmarkall gmarkall (Member) commented:

It seems that LLVM 10 doesn't do loop-idiom recognition at opt level 1. If I apply the following to Numba:

diff --git a/numba/cuda/codegen.py b/numba/cuda/codegen.py
index e201a2101..0d74b1edc 100644
--- a/numba/cuda/codegen.py
+++ b/numba/cuda/codegen.py
@@ -19,7 +19,7 @@ class CUDACodeLibrary(CodeLibrary):
         # Run some lightweight optimization to simplify the module.
         # This seems to workaround a libnvvm compilation bug (see #1341)
         pmb = ll.PassManagerBuilder()
-        pmb.opt_level = 1
+        pmb.opt_level = 3
         pmb.disable_unit_at_a_time = False
         pmb.disable_unroll_loops = True
         pmb.loop_vectorize = False

then the memset appears in the optimized code again. I'll look into what the right thing to do is.
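The pass setup that this patch tunes can be reproduced standalone with llvmlite's legacy pass-manager binding. This is a sketch under the assumption that that API is available (it was at the time of this PR; it is deprecated in recent llvmlite), run on a trivial hand-written module rather than Numba-generated IR:

```python
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

ir = """
define i32 @add(i32 %a, i32 %b) {
entry:
  %r = add i32 %a, %b
  ret i32 %r
}
"""

mod = llvm.parse_assembly(ir)
mod.verify()

# Mirror the CUDACodeLibrary settings, but with the raised opt level.
pmb = llvm.create_pass_manager_builder()
pmb.opt_level = 3          # was 1; level 3 re-enables loop-idiom recognition
pmb.disable_unroll_loops = True
pmb.loop_vectorize = False

pm = llvm.ModulePassManager()
pmb.populate(pm)
pm.run(mod)

# The externally visible function survives the module-level passes.
print("@add" in str(mod))
```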

@gmarkall gmarkall (Member) commented:

I can't see any negative knock-on effects of using the patch above to move to optimization level 3: the Numba and cuDF test suites pass with no issues, and it will probably produce easier-to-read optimized IR for future debugging. @esc @sklam @stuartarchibald do you have any objections or concerns about applying the above patch when llvmlite moves to LLVM 10?

@esc esc (Member, Author) commented Jul 9, 2020

@gmarkall I have an LLVM 10 enabled version of llvmlite ready and being tested on: numba/numba#5969 -- I think it should be feasible to carry the above patch you suggested.

esc added a commit to esc/numba that referenced this pull request Jul 9, 2020
With LLVM 10 the following CMake line no longer appears to work.

```
llvm_map_components_to_libnames(llvm_libs all)
```

... and this commit is a suitable workaround.

Current working hypothesis is that this might have been caused by the
change in:

llvm/llvm-project@ab41180#diff-cebfde545aa260d20ca6b0cdc3da08daR270

However, the docs at https://llvm.org/docs/CMake.html are not very clear
about this, and in addition the output of `llvm-config --components` does
list `all` as an option.

Many thanks to @angloyna for debugging this!!
@@ -1,4 +1,4 @@
-{% set VERSION_SUFFIX = "" %} # debug version suffix, appended to the version
+{% set VERSION_SUFFIX = "_llvm10" %} # debug version suffix, appended to the version
@esc (Member, Author) commented:

Note to self: this change must be reverted before merging to master.

@esc esc (Member, Author) commented Aug 3, 2020

@sklam @stuartarchibald I have implemented the LLVM version split for aarch64 - I think this will need another review now.

@stuartarchibald stuartarchibald added this to the Version 0.34.0 milestone Aug 4, 2020
gmarkall and others added 3 commits August 4, 2020 10:41
This reverts commit 3f66129.

Once numba/numba#6030 is merged, it will no
longer be necessary to disable the autoupgrade of atomic intrinsics for
NVPTX, because LLVM from llvmlite will not be used to optimize the IR
before sending it to NVVM.
The page at:

https://releases.llvm.org/10.0.1/docs/

currently (Tue Aug  4 10:48:42 2020 GMT+2) returns a 404 (page not found),
which breaks our build because Sphinx can't set up the intersphinx links.
As we are doing a split release for 0.34.0 and `aarch64` will need LLVM
9.0.*, we allow building with this version again.
@stuartarchibald stuartarchibald (Contributor) left a comment:

Thanks for the patch, have done a first pass review.

Review comments (now outdated/resolved) on: README.rst, conda-recipes/llvmlite/meta.yaml, ffi/CMakeLists.txt, llvmlite/tests/test_binding.py, docs/source/admin-guide/install.rst
Fix missing word and missing punctuation.

Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
@esc esc added the "Pending BuildFarm" (for PRs that have been reviewed but are pending a push through our buildfarm) and "3 - Ready for Review" labels Aug 5, 2020
@sklam sklam (Member) left a comment:

LGTM

This was referenced Aug 5, 2020
Final builds should not have a version suffix.
@sklam sklam (Member) commented Aug 5, 2020

The version revert looks good.

@KOLANICH KOLANICH commented Aug 5, 2020

Damn, you have taken so long to review my PRs that I am using LLVM 12 currently.

@esc esc (Member, Author) commented Aug 5, 2020

BFID: numba_smoketest_cpu_64, which passed (except for unrelated win-32 failures fixed by numba/numba#6081, and the aarch64 Python 3.7 build, which is very likely to pass).

@esc esc added the "BuildFarm Passed" label and removed the "Pending BuildFarm" label Aug 5, 2020
@sklam sklam merged commit 061ab39 into numba:master Aug 5, 2020
@esc esc (Member, Author) commented Aug 5, 2020

🎉

@gmarkall gmarkall (Member) commented Aug 5, 2020

💯

sklam added a commit to sklam/llvmlite that referenced this pull request Aug 5, 2020
@ghost ghost mentioned this pull request Jan 9, 2021
@esc esc deleted the llvm_10 branch February 26, 2021 12:52