[SYCLomatic][Pulldown] Fix pulldown conflict from sycl to SYCLomatic#2127

Merged
zhimingwang36 merged 628 commits into oneapi-src:SYCLomatic from tomflinda:SYCLomatic_pulldown_new
Jul 4, 2024

Conversation

@tomflinda
Contributor

Signed-off-by: chenwei.sun <chenwei.sun@intel.com>

josel-amd and others added 30 commits May 28, 2024 14:36
…inearize (#92370)

Building on top of
[#88204](llvm/llvm-project#88204), this PR adds
support for converting `vector.insert` into an equivalent
`vector.shuffle` operation that operates on linearized (1-D) vectors.
The pass constructor can be generated automatically.

This pass runs at module level on all relevant intrinsic operations inside the module, regardless of which top-level operation contains them.
…nd in dropUnitDims pass. (#93317)

`mlir-opt --linalg-fold-unit-extent-dims` pass on the following IR

```
#map = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 + d4, d2 + d5, d6)>
#map1 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>
#map2 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>
module {
  func.func @main(%arg0: tensor<1x?x?x1xf32>, %arg1: index) -> tensor<?x1x61x1xf32> {
    %cst = arith.constant dense<1.000000e+00> : tensor<1x1x1x1xf32>
    %0 = tensor.empty(%arg1) : tensor<?x1x61x1xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %cst : tensor<1x?x?x1xf32>, tensor<1x1x1x1xf32>) outs(%0 : tensor<?x1x61x1xf32>) {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %2 = arith.mulf %in, %in_0 : f32
      %3 = arith.addf %out, %2 : f32
      linalg.yield %3 : f32
    } -> tensor<?x1x61x1xf32>
    return %1 : tensor<?x1x61x1xf32>
  }
}
```

produces an incorrect tensor.expand_shape operation:

```
error: 'tensor.expand_shape' op expected dimension 0 of collapsed type to be dynamic since one or more of the corresponding dimensions in the expanded type is dynamic
    %1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %cst : tensor<1x?x?x1xf32>, tensor<1x1x1x1xf32>) outs(%0 : tensor<?x1x61x1xf32>) {
         ^
/mathworks/devel/sandbox/sayans/geckWorks/g3294570/repro.mlir:8:10: note: see current operation: %5 = "tensor.expand_shape"(%4) <{reassociation = [[0, 1, 2, 3]]}> : (tensor<61xf32>) -> tensor<?x1x61x1xf32>
// -----// IR Dump After LinalgFoldUnitExtentDimsPass Failed (linalg-fold-unit-extent-dims) //----- //
#map = affine_map<(d0) -> (0, d0)>
#map1 = affine_map<(d0) -> ()>
#map2 = affine_map<(d0) -> (d0)>
"builtin.module"() ({
  "func.func"() <{function_type = (tensor<1x?x?x1xf32>, index) -> tensor<?x1x61x1xf32>, sym_name = "main"}> ({
  ^bb0(%arg0: tensor<1x?x?x1xf32>, %arg1: index):
    %0 = "arith.constant"() <{value = dense<1.000000e+00> : tensor<f32>}> : () -> tensor<f32>
    %1 = "tensor.collapse_shape"(%arg0) <{reassociation = [[0, 1], [2, 3]]}> : (tensor<1x?x?x1xf32>) -> tensor<?x?xf32>
    %2 = "tensor.empty"() : () -> tensor<61xf32>
    %3 = "tensor.empty"() : () -> tensor<61xf32>
    %4 = "linalg.generic"(%1, %0, %2, %3) <{indexing_maps = [#map, #map1, #map2, #map2], iterator_types = [#linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 3, 1>}> ({
    ^bb0(%arg2: f32, %arg3: f32, %arg4: f32, %arg5: f32):
      %6 = "arith.mulf"(%arg2, %arg3) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
      %7 = "arith.addf"(%arg4, %6) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
      "linalg.yield"(%7) : (f32) -> ()
    }) : (tensor<?x?xf32>, tensor<f32>, tensor<61xf32>, tensor<61xf32>) -> tensor<61xf32>
    %5 = "tensor.expand_shape"(%4) <{reassociation = [[0, 1, 2, 3]]}> : (tensor<61xf32>) -> tensor<?x1x61x1xf32>
    "func.return"(%5) : (tensor<?x1x61x1xf32>) -> ()
  }) : () -> ()
}) : () -> ()
```

The reason is that the dimension `d0` is determined to be a unit dimension that can be dropped, based on the dimensions of operand `arg0` to `linalg.generic`. Later, when iterating over the `outs` operand, the dimension `d0` is again treated as a unit dimension even though its shape is `Shape::kDynamic`. For the `linalg.generic` to be valid, `d0` of `outs` does need to be `1`, but the current implementation does not verify this; the dimension is dropped, leaving the `outs` operand as `tensor<61xf32>` in the example.

The fix is to also check that the dimension shape is actually `1` before
dropping the dimension. The IR after the fix is:

```
#map = affine_map<()[s0, s1] -> (s0 * s1)>
#map1 = affine_map<(d0) -> (0, d0)>
#map2 = affine_map<(d0) -> ()>
module {
  func.func @main(%arg0: tensor<1x?x?x1xf32>, %arg1: index) -> tensor<?x1x61x1xf32> {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %cst = arith.constant dense<1.000000e+00> : tensor<f32>
    %collapsed = tensor.collapse_shape %arg0 [[0, 1], [2, 3]] : tensor<1x?x?x1xf32> into tensor<?x?xf32>
    %0 = tensor.empty(%arg1) : tensor<?x61xf32>
    %1 = affine.apply #map()[%arg1, %c1]
    %2 = tensor.empty(%1) : tensor<?x61xf32>
    %3 = linalg.generic {indexing_maps = [#map1, #map2, #map1, #map1], iterator_types = ["parallel"]} ins(%collapsed, %cst, %0 : tensor<?x?xf32>, tensor<f32>, tensor<?x61xf32>) outs(%2 : tensor<?x61xf32>) {
    ^bb0(%in: f32, %in_0: f32, %in_1: f32, %out: f32):
      %4 = arith.mulf %in, %in_0 : f32
      %5 = arith.addf %in_1, %4 : f32
      linalg.yield %5 : f32
    } -> tensor<?x61xf32>
    %expanded = tensor.expand_shape %3 [[0, 1], [2, 3]] output_shape [%c0, 1, 61, 1] : tensor<?x61xf32> into tensor<?x1x61x1xf32>
    return %expanded : tensor<?x1x61x1xf32>
  }
}
```
Clang has some unwritten rules about diagnostic wording regarding things
like punctuation and capitalization. This patch documents those rules
and adds some tablegen support for checking diagnostics follow the
rules.

Specifically: tablegen now checks that a diagnostic does not start with a capital letter or end with punctuation, except for the usual exceptions such as starting with a proper noun or ending with a question mark.

Now that the code base is clean of such issues, violations are reported as errors rather than warnings, ensuring that failure to follow these rules is either addressed by the author or added as a new exception to the checking logic.
Fixes #90941.
Add support for ``[[msvc::noinline]]`` attribute, which is actually an
alias of ``[[clang::noinline]]``.
…le::makeUniqueName()`. (#89057)

E.g., during inlining, a new symbol name can be duplicated; `ValueSymbolTable::makeUniqueName()` will then add a unique suffix, exceeding the `non-global-value-max-name-size` restriction.

Also changed the option's type from `unsigned` to `int`, since `ValueSymbolTable`'s constructor can use the value `-1`, which means an unrestricted name size.
"const" being removed in this patch prevents the move semantics from
being used in:

  AI.CallStack = Callback(IndexedAI.CSId);

With this patch on an indexed MemProf Version 2 profile, the cycle
count and instruction count go down by 13.3% and 26.3%, respectively,
with "llvm-profdata show" modified to deserialize all MemProfRecords.
There was existing support for constant folding a `linalg.generic` that
was actually a transpose. This commit adds support for the named op,
`linalg.transpose`, as well by making use of the `LinalgOp` interface.
This change updates the dataLayout string to ensure alignment with the
latest LLVM TargetMachine configuration. The aim is to
maintain consistency and prevent potential compilation issues related to
memory address space handling.
fir.box_rank codegen was invalid: it assumed the rank field in the descriptor was an i32, which is not correct. Do not hard-code the type; use the named position to find the type and convert as needed in the patterns.
  CONFLICT (content): Merge conflict in clang/test/Driver/mlong-double-128.c
  CONFLICT (content): Merge conflict in clang/test/Driver/mlong-double-64.c
Rename things in a couple of places to make the code a bit clearer.
…ing when parsing declaration DIEs. (#92328)

This reapplies
llvm/llvm-project@9a7262c
(#90663) and added llvm/llvm-project#91808 as a
fix.

It was causing tests on macOS to fail because
`SymbolFileDWARF::GetForwardDeclCompilerTypeToDIE` returned the map
owned by this symbol file. When there were two symbol files, two
different maps were created for caching from compiler type to DIE, even
if they were for the same module. The solution is to do the same as
`SymbolFileDWARF::GetUniqueDWARFASTTypeMap`: query
SymbolFileDWARFDebugMap first to get the shared underlying SymbolFile, so
the map is shared among multiple SymbolFileDWARF instances.
…ounding ops. (#93356)

The elements that aren't sNans need to get passed through this fadd
instruction unchanged. With the agnostic mask policy they might be
forced to all ones.
Summary:
There was a bug here where we would initialize the plugin multiple times
when there were multiple images. Fix it by putting the `is_initialized`
check later.
This adds 
- `mlir::tosa::populateTosaToLinalgTypeConversion` which converts
tensors of unsigned integers into tensors of signless integers
- modifies the `tosa.reshape` lowering in TosaToTensor to use the type
converter correctly

I chose to implement the type converter in
`mlir/Conversion/TosaToLinalg/TosaToLinalg.h` instead of
`mlir/Conversion/TosaToTensor/TosaToTensor.h` because I need the same
type converter in the TosaToLinalg lowerings (future PR).
Alternatively, I could duplicate the type converter so it exists in both
TosaToLinalg and TosaToTensor. Let me know if you prefer that.
This patch fixes:

  clang/unittests/Interpreter/IncrementalProcessingTest.cpp:39:13:
  error: unused function 'HostSupportsJit' [-Werror,-Wunused-function]
These interfaces are LLVM interfaces, not Clang ones; but this worked
because of LLVM.h adding the interfaces to the clang namespace.
Skip explicit this check in non-valid scopes due to `null` type in
lambdas with invalid captures or incomplete parameter lists during
parsing


Fixes #91536
Steffen Larsen and others added 20 commits July 2, 2024 10:53
This should address the post-commit failure introduced in
intel/llvm#14353

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
ESIMD kernel doesn't support noinline functions, so we can't support
sanitizer for it now.

We can't check if the kernel is ESIMD kernel at Unified Runtime, so we
have to disable ASan completely when it found ESIMD kernel in device
image.
Changes the semantics of
[sycl_ext_oneapi_enqueue_barrier](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_enqueue_barrier.asciidoc)
commands recorded from a queue into a SYCL-Graph.

The current semantics are that a barrier without a wait-list will depend
on all leaf nodes in the graph, and be a dependency of any subsequent
node added to the graph.

After discussion with users, this change updates the behavior of
barriers to only depend on leaf nodes from the same recording queue, and
only commands recorded from the same queue will have a dependency on the
barrier.
…(#14370)

Host device support was deprecated a long time ago, although our internal
host task and host accessor implementation was still using it.
This change eliminates that usage and removes the possibility of creating
a host queue, context, or device. This brings the following changes:

- Commands and events can no longer guarantee that the queried Context != nullptr and
Queue != nullptr, since no device queue/context is involved for host
tasks. For host tasks, we store the submitted queue instance in the event
so that we can report exceptions to the user and properly handle
dependencies. The submitted queue for a host task is guaranteed to be
non-null.
- The connection command for cross-context dependencies is now attached to
the queue of the new command (the one whose dependency is being analyzed).
Previously it was related to the host queue only. No perf impact is
expected.
- The stream flush command is now submitted to the same queue as the
corresponding kernel (previously it was submitted to the host queue).
This could bring a negative perf impact for stream usage with an in-order
queue, but stream is not a performance-oriented feature.
ABI breaking changes to remove is_host methods and some SYCL_EXTERN
stuff will be submitted separately.

---------

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
This PR adds a default constructor for `syclcompat::dim3`, and changes
the type of members `x`,`y`,`z` from `const size_t` to `unsigned int`.
The default constructor sets all 3 dimensions to `1`. This means
patterns like this are now possible:

```cpp
syclcompat::dim3 myDim3;
myDim3.x = 32;
```

---------

Signed-off-by: Joe Todd <joe.todd@codeplay.com>
These tests pass now.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Fix unused parameter/function warnings.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
…DIA and AMD backend in the clang-linker-wrapper for SYCL offloading (#14343)

This PR adds third-party (nvptx, amd) AOT support for SYCL offloading in
the clang-linker-wrapper tool.
The following changes are included:

1. Code inside clang-linker-wrapper has been refactored to get it ready
for non-SPIRV backends for SYCL offloading. Main changes are inside
linkAndWrapDeviceFiles. Explanations are added as comments.
2. A new clang-linker-wrapper option to pass additional device library
files for the NVPTX backend has been added.
3. An additional check to emit an error when device library files are
unexpectedly empty has been added.
4. New tests have been added to cover the compilation flow for the new
changes, and existing tests have been modified to react correctly to the
additional check.

E2E tests and more extensive driver tests will be added in the next
stage.

Thanks

---------

Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
SYCL language support on the part of the kernel_compiler is specified
here: intel/llvm#11985

However, that specification is not presently realizable: we need more
support from the frontend and the post-link tool to get the demangled
names propagated through. It is usable before that, with constraints
around using extern "C" or knowing the mangled kernel name. We have
folks who want to test in the interim, so I've refrained from updating
the spec; instead, this interim support lands in our experimental
extension until it can be completed in full, at which point we'll update
the spec to release it properly.
To be re-enabled shortly.

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
Post-commit has unused variable warnings that need tidying.
intel/llvm#14397

The hang causes all subsequent jobs to fail because the install dir can't
be deleted, so we need to disable these ASAP.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Today we don't pass the GPU lit params, so tests that are already
disabled in pre-/post-commit run and fail in the nightly.

Example:
https://github.com/intel/llvm/actions/runs/9754298775/job/26921343546

```
FAIL: SYCL :: Regression/in_order_barrier_profiling.cpp (1746 of 2090)
******************** TEST 'SYCL :: Regression/in_order_barrier_profiling.cpp' FAILED ********************
Exit Code: -6

Command Output (stdout):
--
# RUN: at line 1
/__w/llvm/llvm/toolchain/bin//clang++   -fsycl -fsycl-targets=spir64  /__w/llvm/llvm/llvm/sycl/test-e2e/Regression/in_order_barrier_profiling.cpp -o /__w/llvm/llvm/build-e2e/Regression/Output/in_order_barrier_profiling.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/Regression/in_order_barrier_profiling.cpp -o /__w/llvm/llvm/build-e2e/Regression/Output/in_order_barrier_profiling.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 2
env ONEAPI_DEVICE_SELECTOR=opencl:gpu  /__w/llvm/llvm/build-e2e/Regression/Output/in_order_barrier_profiling.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=opencl:gpu /__w/llvm/llvm/build-e2e/Regression/Output/in_order_barrier_profiling.cpp.tmp.out
# .---command stderr------------
# | in_order_barrier_profiling.cpp.tmp.out: /__w/llvm/llvm/llvm/sycl/test-e2e/Regression/in_order_barrier_profiling.cpp:42: int main(): Assertion `KernelEnd <= BarrierStart' failed.
# `-----------------------------
# error: command failed with exit status: -6

```

and the test already has:
```
// UNSUPPORTED: level_zero || (linux && opencl && gpu-intel-gen12)
```

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
…14243)

Resolve issue with -save-temps regarding a temp file being linked to
itself.

---------

Co-authored-by: Marcos Maronas <maarquitos14@users.noreply.github.com>
This moves the FPBuiltinFnSelection pass to the
llvm/lib/Transforms/Scalar directory. This is needed to enable future
changes that will run this pass as part of the main pipeline for device
compilation. With this pass in CodeGen, the slibs build would fail.
Improve the kernel fusion end-to-end tests:
* Remove the flag to embed IR on CUDA and AMD backend from the tests
that abort execution before requiring access to IR.
* Check for the number of kernel launches to make sure fusion did not
fail silently.

---------

Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>
intel/llvm#14339 moved the FPBuiltinFnSelection
pass, but post-commit testing is failing due to the Transform/Scalar
CMake not linking with TargetParser. This commit fixes this issue.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
…4264)

This PR adds a missing feature in SYCL hierarchical parallelism support.
Specifically, this PR adds support for the case when there are functions
between parallel_for_work_group and parallel_for_work_item in the call
stack.
For example:

```cpp
void foo(sycl::group<1> group, ...) {
  group.parallel_for_work_item(range<1>(), [&](h_item<1> i) { ... });
}
// ...
cgh.parallel_for_work_group<class kernel>(
    range<1>(...), range<1>(...), [=](group<1> g) {
      foo(g, ...);
    });
```

---------

Signed-off-by: Sudarsanam, Arvind <arvind.sudarsanam@intel.com>
This aspect is not required because bfloat16 math functions are
implemented for all devices via generic implementations. This PR updates
the status in line with the main bfloat16 extension doc.

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@tomflinda tomflinda requested a review from a team as a code owner July 4, 2024 05:51
@tomflinda tomflinda requested review from ShengchenJ and zhiweij1 July 4, 2024 05:51
Contributor

@tangjj11 tangjj11 left a comment


lgtm

Contributor

@AndyCHHuang AndyCHHuang left a comment


LGTM

@zhimingwang36 zhimingwang36 merged commit d3e8ce0 into oneapi-src:SYCLomatic Jul 4, 2024
tomflinda added 3 commits July 5, 2024 05:10
Signed-off-by: chenwei.sun <chenwei.sun@intel.com>
Signed-off-by: chenwei.sun <chenwei.sun@intel.com>
Signed-off-by: chenwei.sun <chenwei.sun@intel.com>