
LLVM 10 #599

Merged
merged 29 commits into numba:master from llvm_10 on Aug 5, 2020

Conversation

@esc esc (Member) commented Jun 17, 2020

As title, upgrade to LLVM 10.

LLVM 10 removes llvm::make_unique in favor of std::make_unique.
However, std::make_unique requires C++14 and is therefore unsuitable for
LLVM 9, which forces -std=c++11.  Update the code to select between the
two conditionally.
This fixes all issues with LLVM 10.
@esc esc (Member, Author) commented Jun 18, 2020

Build-farm id: llvmdev_7.

@esc esc (Member, Author) commented Jun 18, 2020

Note to self:

  • remember to remove the VERSION_SUFFIX again before merging.

@gmarkall gmarkall (Member) commented:

This works with the CUDA target, with the exception of numba.cuda.tests.nocuda.test_nvvm.TestNvvmWithoutCuda.test_nvvm_memset_fixup, which fails with:

======================================================================
FAIL: test_nvvm_memset_fixup (numba.cuda.tests.nocuda.test_nvvm.TestNvvmWithoutCuda)
Test llvm.memset changes in llvm7.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/nocuda/test_nvvm.py", line 39, in test_nvvm_memset_fixup
    self.assertIn("call void @llvm.memset", original)
AssertionError: 'call void @llvm.memset' not found in 'source_filename = "<string>"\ntarget datalayout = ...' (full escaped NVVM IR module elided here; the relevant function body appears decoded below)

It seems that for code like:

def foo(x):
    # Triggers a generation of llvm.memset
    for i in range(x.size):
        x[i] = 0

a memset is no longer generated. The IR used to look like:

{
B0.endif:
  %.74 = icmp sgt i64 %arg.x.2, 0
  br i1 %.74, label %B26.loopexit, label %B26

B26.loopexit:                                     ; preds = %B0.endif
  %arg.x.41 = bitcast i32* %arg.x.4 to i8*
  %0 = shl nuw i64 %arg.x.2, 2 
  call void @llvm.memset.p0i8.i64(i8* align 4 %arg.x.41, i8 0, i64 %0, i1 false)
  br label %B26 

B26:                                              ; preds = %B26.loopexit, %B0.endif
  store i8* null, i8** %.ret, align 8
  ret i32 0 
}

but now it looks like:

entry:
  %.74.inv = icmp sgt i64 %arg.x.2, 0
  br i1 %.74.inv, label %B14, label %B26

B14:                                              ; preds = %entry, %B14
  %.110.sroa.0.0114 = phi i64 [ %.110.sroa.0.0, %B14 ], [ 0, %entry ]
  %.120113 = phi i1 [ %.120, %B14 ], [ %.74.inv, %entry ]
  %.129108112 = phi i64 [ %.129107, %B14 ], [ %arg.x.2, %entry ]
  %.133110111 = phi i64 [ %.133109, %B14 ], [ 0, %entry ]
  %.129 = sext i1 %.120113 to i64
  %.129107 = add nsw i64 %.129108112, %.129
  %.133 = zext i1 %.120113 to i64
  %.133109 = add i64 %.133110111, %.133
  %.198 = icmp slt i64 %.110.sroa.0.0114, 0
  %.199 = select i1 %.198, i64 %arg.x.5.0, i64 0
  %.200 = add i64 %.199, %.110.sroa.0.0114
  %.213 = getelementptr i32, i32* %arg.x.4, i64 %.200
  store i32 0, i32* %.213, align 4
  %.120 = icmp sgt i64 %.129107, 0
  %.110.sroa.0.0 = select i1 %.120, i64 %.133109, i64 0
  br i1 %.120, label %B14, label %B26

B26:                                              ; preds = %B14, %entry
  store i8* null, i8** %.ret, align 8
  ret i32 0
}

i.e. a loop is generated instead of the memset. I'm presently looking into this.
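The failing assertion boils down to a substring check on the textual IR. A minimal stdlib-only sketch of that check (the helper name and the trimmed sample IR snippets are illustrative, not taken from the Numba test suite):

```python
import re

def has_memset_call(llvm_ir: str) -> bool:
    """Return True if the textual IR calls any llvm.memset overload."""
    return re.search(r"call void @llvm\.memset\.[\w.]+\(", llvm_ir) is not None

# Shape of the IR when loop-idiom recognition fires (memset emitted):
optimized_ir = """
B26.loopexit:
  %0 = shl nuw i64 %arg.x.2, 2
  call void @llvm.memset.p0i8.i64(i8* align 4 %arg.x.41, i8 0, i64 %0, i1 false)
"""

# Shape of the IR seen with LLVM 10 at opt level 1 (plain store loop):
loop_ir = """
B14:
  store i32 0, i32* %.213, align 4
  br i1 %.120, label %B14, label %B26
"""

print(has_memset_call(optimized_ir))  # → True
print(has_memset_call(loop_ir))       # → False
```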

@esc esc (Member, Author) commented Jun 18, 2020

@gmarkall awesome, thanks for looking into this!

@gmarkall gmarkall (Member) commented:

It seems that LLVM 10 doesn't do loop-idiom recognition at opt level 1. If I apply the following to Numba:

diff --git a/numba/cuda/codegen.py b/numba/cuda/codegen.py
index e201a2101..0d74b1edc 100644
--- a/numba/cuda/codegen.py
+++ b/numba/cuda/codegen.py
@@ -19,7 +19,7 @@ class CUDACodeLibrary(CodeLibrary):
         # Run some lightweight optimization to simplify the module.
         # This seems to workaround a libnvvm compilation bug (see #1341)
         pmb = ll.PassManagerBuilder()
-        pmb.opt_level = 1
+        pmb.opt_level = 3
         pmb.disable_unit_at_a_time = False
         pmb.disable_unroll_loops = True
         pmb.loop_vectorize = False

then the memset appears in the optimized code again. I'll look into what the right thing to do is.
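The pass setup that this patch tunes can be reproduced standalone with llvmlite's legacy pass-manager binding. This is a sketch under the assumption that that API is available (it was at the time of this PR; it is deprecated in recent llvmlite), run on a trivial hand-written module rather than Numba-generated IR:

```python
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

ir = """
define i32 @add(i32 %a, i32 %b) {
entry:
  %r = add i32 %a, %b
  ret i32 %r
}
"""

mod = llvm.parse_assembly(ir)
mod.verify()

# Mirror the CUDACodeLibrary settings, but with the raised opt level.
pmb = llvm.create_pass_manager_builder()
pmb.opt_level = 3          # was 1; level 3 re-enables loop-idiom recognition
pmb.disable_unroll_loops = True
pmb.loop_vectorize = False

pm = llvm.ModulePassManager()
pmb.populate(pm)
pm.run(mod)

# The externally visible function survives the module-level passes.
print("@add" in str(mod))
```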

@gmarkall gmarkall (Member) commented:

I can't see any negative knock-on effects of using the patch above to move to optimization level 3: the Numba and cuDF test suites pass with no issues, and it will probably produce easier-to-read optimized IR for future debugging. @esc @sklam @stuartarchibald do you have any objections or concerns about applying the above patch when llvmlite moves to LLVM 10?

@esc esc (Member, Author) commented Jul 9, 2020

@gmarkall I have an LLVM 10 enabled version of llvmlite ready and being tested on: numba/numba#5969 -- I think it should be feasible to carry the above patch you suggested.

esc added a commit to esc/numba that referenced this pull request Jul 9, 2020
With LLVM 10 the following CMake line no longer appears to work.

```
llvm_map_components_to_libnames(llvm_libs all)
```

... and this commit is a suitable workaround.

Current working hypothesis is that this might have been caused by the
change in:

llvm/llvm-project@ab41180#diff-cebfde545aa260d20ca6b0cdc3da08daR270

However, the docs at https://llvm.org/docs/CMake.html are not very clear
about this, and in addition the output of `llvm-config --components` does
list `all` as an option.

Many thanks to @angloyna for debugging this!!
@@ -1,4 +1,4 @@
-{% set VERSION_SUFFIX = "" %} # debug version suffix, appended to the version
+{% set VERSION_SUFFIX = "_llvm10" %} # debug version suffix, appended to the version
@esc (Member, Author) commented:

Note to self: this change must be reverted before merging to master.

@esc esc (Member, Author) commented Aug 3, 2020

@sklam @stuartarchibald I have implemented the LLVM version split for aarch64 - I think this will need another review now.

@stuartarchibald stuartarchibald added this to the Version 0.34.0 milestone Aug 4, 2020
gmarkall and others added 3 commits August 4, 2020 10:41
This reverts commit 3f66129.

Once numba/numba#6030 is merged, it will no
longer be necessary to disable the autoupgrade of atomic intrinsics for
NVPTX, because LLVM from llvmlite will not be used to optimize the IR
before sending it to NVVM.
The page at:

https://releases.llvm.org/10.0.1/docs/

currently (Tue Aug  4 10:48:42 2020 GMT+2) returns a 404 (page not found),
which breaks our build because Sphinx can't set up the intersphinx links.
As we are doing a split release for 0.34.0 and `aarch64` will need LLVM
9.0.*, we allow building with this version again.
@stuartarchibald stuartarchibald (Contributor) left a comment:

Thanks for the patch, have done a first pass review.

Review comments (now outdated/resolved) on: README.rst, conda-recipes/llvmlite/meta.yaml, ffi/CMakeLists.txt, llvmlite/tests/test_binding.py, docs/source/admin-guide/install.rst
Fix missing word and missing punctuation.

Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
@esc esc added the "Pending BuildFarm" (for PRs that have been reviewed but are pending a push through our buildfarm) and "3 - Ready for Review" labels Aug 5, 2020
@sklam sklam (Member) left a comment:

LGTM

This was referenced Aug 5, 2020
Final builds should not have a version suffix.
@sklam sklam (Member) commented Aug 5, 2020

The version revert looks good.

@KOLANICH KOLANICH commented Aug 5, 2020

Damn, you have taken so long to review my PRs that I am using LLVM 12 currently.

@esc esc (Member, Author) commented Aug 5, 2020

BFID: numba_smoketest_cpu_64, which passed (except for unrelated win-32 failures fixed by numba/numba#6081, and the aarch64 Python 3.7 build, which is very likely to pass).

@esc esc added the "BuildFarm Passed" label and removed the "Pending BuildFarm" label Aug 5, 2020
@sklam sklam merged commit 061ab39 into numba:master Aug 5, 2020
@esc esc (Member, Author) commented Aug 5, 2020

🎉

@gmarkall gmarkall (Member) commented Aug 5, 2020

💯

sklam added a commit to sklam/llvmlite that referenced this pull request Aug 5, 2020
@ghost ghost mentioned this pull request Jan 9, 2021
@esc esc deleted the llvm_10 branch February 26, 2021 12:52