[Flang][MLIR] Add basic initial support for alloca and program address space handling in FIR->LLVMIR codegen #77518

Merged
merged 3 commits into llvm:main
Jan 17, 2024

Conversation

agozillon (Contributor)

This is a slightly slimmed-down and more up-to-date version of the older PR here: https://reviews.llvm.org/D144203, written by @jsjodin, which has already undergone some review.

This PR places allocas in the alloca address space specified by the provided data layout (the default is 0 for all address spaces, unless explicitly specified by the layout), and then casts these allocas to the program address space if that address space differs from the allocation address space. For most architectures' data layouts this is a no-op, as they have a flat address space. On AMDGPU, however, it results in allocas being placed in the correct address space (5, private) and then cast into the correct program address space (0, generic). This yields correct (partially; a follow-up PR is forthcoming) generation of allocations inside device code.
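As a minimal sketch of the intended lowering (simplified from the tests in this PR; the function name is illustrative, not verbatim compiler output):

// FIR input: a simple stack allocation.
func.func @example() -> !fir.ref<i32> {
  %0 = fir.alloca i32
  return %0 : !fir.ref<i32>
}

// After fir-to-llvm-ir with an AMDGPU data layout (alloca address space 5,
// program address space 0): the alloca is emitted in address space 5, then
// cast back to the generic address space before any use.
llvm.func @example() -> !llvm.ptr {
  %c1 = llvm.mlir.constant(1 : i64) : i64
  %a = llvm.alloca %c1 x i32 : (i64) -> !llvm.ptr<5>
  %p = llvm.addrspacecast %a : !llvm.ptr<5> to !llvm.ptr
  llvm.return %p : !llvm.ptr
}

On a flat-address-space target the same input lowers to a single llvm.alloca returning !llvm.ptr, with no addrspacecast.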

This PR is in addition to the work by @skatrak in PR #69599 and adds the separate, necessary functionality of casting allocas from their allocation address space to the program address space. The two PRs are independent, although there is some minor overlap; e.g. this PR incorporates some of the useful helper functions from #69599, so whichever lands first will need a minor rebase of the other.

Co-author: jsjodin

llvmbot (Collaborator) commented Jan 9, 2024

@llvm/pr-subscribers-flang-codegen

@llvm/pr-subscribers-flang-fir-hlfir

Author: None (agozillon)


Patch is 28.03 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/77518.diff

3 Files Affected:

  • (modified) flang/include/flang/Optimizer/CodeGen/CGPasses.td (+2)
  • (modified) flang/lib/Optimizer/CodeGen/CodeGen.cpp (+83-10)
  • (modified) flang/test/Fir/convert-to-llvm.fir (+74-32)
diff --git a/flang/include/flang/Optimizer/CodeGen/CGPasses.td b/flang/include/flang/Optimizer/CodeGen/CGPasses.td
index 0d20a669a15a1f..9798019bfd6a98 100644
--- a/flang/include/flang/Optimizer/CodeGen/CGPasses.td
+++ b/flang/include/flang/Optimizer/CodeGen/CGPasses.td
@@ -27,6 +27,8 @@ def FIRToLLVMLowering : Pass<"fir-to-llvm-ir", "mlir::ModuleOp"> {
   let options = [
     Option<"forcedTargetTriple", "target", "std::string", /*default=*/"",
            "Override module's target triple.">,
+    Option<"forcedDataLayout", "datalayout", "std::string", /*default=*/"",
+           "Override module's data layout.">,
     Option<"applyTBAA", "apply-tbaa", "bool", /*default=*/"false",
            "Attach TBAA tags to memory accessing operations.">
   ];
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index e07732d57880c5..0768c8144f2fc0 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -44,6 +44,7 @@
 #include "mlir/IR/Matchers.h"
 #include "mlir/Pass/Pass.h"
 #include "mlir/Pass/PassManager.h"
+#include "mlir/Target/LLVMIR/Import.h"
 #include "mlir/Target/LLVMIR/ModuleTranslation.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/TypeSwitch.h"
@@ -67,8 +68,41 @@ static constexpr unsigned defaultAlign = 8;
 static constexpr unsigned kAttrPointer = CFI_attribute_pointer;
 static constexpr unsigned kAttrAllocatable = CFI_attribute_allocatable;
 
-static inline mlir::Type getLlvmPtrType(mlir::MLIRContext *context) {
-  return mlir::LLVM::LLVMPointerType::get(context);
+static inline unsigned getAllocaAddressSpace(mlir::ModuleOp module) {
+  if (mlir::Attribute addrSpace =
+          mlir::DataLayout(module).getAllocaMemorySpace())
+    return addrSpace.cast<mlir::IntegerAttr>().getUInt();
+
+  return 0u;
+}
+
+static inline unsigned getProgramAddressSpace(mlir::ModuleOp module) {
+  if (mlir::Attribute addrSpace =
+          mlir::DataLayout(module).getProgramMemorySpace())
+    return addrSpace.cast<mlir::IntegerAttr>().getUInt();
+
+  return 0u;
+}
+
+static inline unsigned
+getAllocaAddressSpace(mlir::ConversionPatternRewriter &rewriter) {
+  mlir::Operation *parentOp = rewriter.getInsertionBlock()->getParentOp();
+  return parentOp ? ::getAllocaAddressSpace(
+                        parentOp->getParentOfType<mlir::ModuleOp>())
+                  : 0u;
+}
+
+static inline unsigned
+getProgramAddressSpace(mlir::ConversionPatternRewriter &rewriter) {
+  mlir::Operation *parentOp = rewriter.getInsertionBlock()->getParentOp();
+  return parentOp ? ::getProgramAddressSpace(
+                        parentOp->getParentOfType<mlir::ModuleOp>())
+                  : 0u;
+}
+
+static inline mlir::Type getLlvmPtrType(mlir::MLIRContext *context,
+                                        unsigned addressSpace = 0) {
+  return mlir::LLVM::LLVMPointerType::get(context, addressSpace);
 }
 
 static inline mlir::Type getI8Type(mlir::MLIRContext *context) {
@@ -369,7 +403,7 @@ class FIROpConversion : public mlir::ConvertOpToLLVMPattern<FromOp> {
   }
 
   // Generate an alloca of size 1 for an object of type \p llvmObjectTy.
-  mlir::LLVM::AllocaOp
+  mlir::Value
   genAllocaWithType(mlir::Location loc, mlir::Type llvmObjectTy,
                     unsigned alignment,
                     mlir::ConversionPatternRewriter &rewriter) const {
@@ -378,9 +412,23 @@ class FIROpConversion : public mlir::ConvertOpToLLVMPattern<FromOp> {
     mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp);
     rewriter.setInsertionPointToStart(insertBlock);
     auto size = genI32Constant(loc, rewriter, 1);
-    mlir::Type llvmPtrTy = ::getLlvmPtrType(llvmObjectTy.getContext());
-    auto al = rewriter.create<mlir::LLVM::AllocaOp>(
-        loc, llvmPtrTy, llvmObjectTy, size, alignment);
+    unsigned allocaAs = getAllocaAddressSpace(rewriter);
+    unsigned programAs = getProgramAddressSpace(rewriter);
+
+    mlir::Value al = rewriter.create<mlir::LLVM::AllocaOp>(
+        loc, ::getLlvmPtrType(llvmObjectTy.getContext(), allocaAs),
+        llvmObjectTy, size, alignment);
+
+    // If our allocation address space is not the same as the program address
+    // space, then we must emit a cast to the program address space before use.
+    // An example case would be on AMDGPU, where the allocation address space
+    // is 5 (private) and the program address space is 0 (generic).
+    if (allocaAs != programAs) {
+      al = rewriter.create<mlir::LLVM::AddrSpaceCastOp>(
+          loc, ::getLlvmPtrType(llvmObjectTy.getContext(), programAs), al);
+    }
+
     rewriter.restoreInsertionPoint(thisPt);
     return al;
   }
@@ -532,20 +580,34 @@ struct AllocaOpConversion : public FIROpConversion<fir::AllocaOp> {
         size = rewriter.create<mlir::LLVM::MulOp>(
             loc, ity, size, integerCast(loc, rewriter, ity, operands[i]));
     }
-    mlir::Type llvmPtrTy = ::getLlvmPtrType(alloc.getContext());
+
+    unsigned allocaAs = getAllocaAddressSpace(rewriter);
+    unsigned programAs = getProgramAddressSpace(rewriter);
+
     // NOTE: we used to pass alloc->getAttrs() in the builder for non opaque
     // pointers! Only propagate pinned and bindc_name to help debugging, but
     // this should have no functional purpose (and passing the operand segment
     // attribute like before is certainly bad).
     auto llvmAlloc = rewriter.create<mlir::LLVM::AllocaOp>(
-        loc, llvmPtrTy, llvmObjectType, size);
+        loc, ::getLlvmPtrType(alloc.getContext(), allocaAs), llvmObjectType,
+        size);
     if (alloc.getPinned())
       llvmAlloc->setDiscardableAttr(alloc.getPinnedAttrName(),
                                     alloc.getPinnedAttr());
     if (alloc.getBindcName())
       llvmAlloc->setDiscardableAttr(alloc.getBindcNameAttrName(),
                                     alloc.getBindcNameAttr());
-    rewriter.replaceOp(alloc, llvmAlloc);
+    if (allocaAs == programAs) {
+      rewriter.replaceOp(alloc, llvmAlloc);
+    } else {
+      // If our allocation address space is not the same as the program
+      // address space, then we must emit a cast to the program address space
+      // before use. An example case would be on AMDGPU, where the allocation
+      // address space is 5 (private) and the program address space is 0
+      // (generic).
+      rewriter.replaceOpWithNewOp<mlir::LLVM::AddrSpaceCastOp>(
+          alloc, ::getLlvmPtrType(alloc.getContext(), programAs), llvmAlloc);
+    }
     return mlir::success();
   }
 };
@@ -3114,7 +3176,7 @@ struct LoadOpConversion : public FIROpConversion<fir::LoadOp> {
       auto storeOp =
           rewriter.create<mlir::LLVM::StoreOp>(loc, boxValue, newBoxStorage);
       attachTBAATag(storeOp, boxTy, boxTy, nullptr);
-      rewriter.replaceOp(load, newBoxStorage.getResult());
+      rewriter.replaceOp(load, newBoxStorage);
     } else {
       auto loadOp = rewriter.create<mlir::LLVM::LoadOp>(
           load.getLoc(), llvmLoadTy, adaptor.getOperands(), load->getAttrs());
@@ -3808,6 +3870,17 @@ class FIRToLLVMLowering
     if (!forcedTargetTriple.empty())
       fir::setTargetTriple(mod, forcedTargetTriple);
 
+    if (!forcedDataLayout.empty()) {
+      llvm::DataLayout dl(forcedDataLayout);
+      mlir::MLIRContext *context = mod.getContext();
+      mod->setAttr(
+          mlir::LLVM::LLVMDialect::getDataLayoutAttrName(),
+          mlir::StringAttr::get(context, dl.getStringRepresentation()));
+      mlir::DataLayoutSpecInterface dlSpec =
+          mlir::translateDataLayout(dl, context);
+      mod->setAttr(mlir::DLTIDialect::kDataLayoutAttrName, dlSpec);
+    }
+
     // Run dynamic pass pipeline for converting Math dialect
     // operations into other dialects (llvm, func, etc.).
     // Some conversions of Math operations cannot be done
diff --git a/flang/test/Fir/convert-to-llvm.fir b/flang/test/Fir/convert-to-llvm.fir
index be82ffab7e33ef..21323a5e657c94 100644
--- a/flang/test/Fir/convert-to-llvm.fir
+++ b/flang/test/Fir/convert-to-llvm.fir
@@ -1,13 +1,14 @@
-// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=x86_64-unknown-linux-gnu" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT
-// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=aarch64-unknown-linux-gnu" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT
-// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=i386-unknown-linux-gnu" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT
-// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=powerpc64le-unknown-linux-gn" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT
-// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=x86_64-pc-win32" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT
-// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=aarch64-apple-darwin" %s | FileCheck %s --check-prefixes=CHECK,CHECK-NO-COMDAT
-
-//=============================================================================
-// SUMMARY: Tests for FIR --> LLVM MLIR conversion independent of the target
-//=============================================================================
+// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=x86_64-unknown-linux-gnu" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
+// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=aarch64-unknown-linux-gnu" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
+// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=i386-unknown-linux-gnu" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
+// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=powerpc64le-unknown-linux-gn" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
+// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=x86_64-pc-win32" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
+// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=aarch64-apple-darwin" %s | FileCheck %s --check-prefixes=CHECK,CHECK-NO-COMDAT,GENERIC 
+// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=amdgcn-amd-amdhsa, datalayout=e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-P0" %s | FileCheck -check-prefixes=CHECK,AMDGPU %s
+
+//===================================================
+// SUMMARY: Tests for FIR --> LLVM MLIR conversion
+//===================================================
 
 // Test simple global LLVM conversion
 
@@ -919,7 +920,9 @@ func.func @test_load_box(%addr : !fir.ref<!fir.box<!fir.array<10xf32>>>) {
 // CHECK-LABEL: llvm.func @test_load_box(
 // CHECK-SAME:      %[[arg0:.*]]: !llvm.ptr) {
 // CHECK-NEXT:    %[[c1:.*]] = llvm.mlir.constant(1 : i32) : i32
-// CHECK-NEXT:    %[[box_copy:.*]] = llvm.alloca %[[c1]] x !llvm.struct<([[DESC_TYPE:.*]])>
+// GENERIC-NEXT:  %[[box_copy:.*]] = llvm.alloca %[[c1]] x !llvm.struct<([[DESC_TYPE:.*]])>
+// AMDGPU-NEXT:   %[[alloca_box_copy:.*]] = llvm.alloca %[[c1]] x !llvm.struct<([[DESC_TYPE:.*]])>{{.*}} : (i32) -> !llvm.ptr<5>
+// AMDGPU-NEXT:   %[[box_copy:.*]] = llvm.addrspacecast %[[alloca_box_copy]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK-NEXT:    %[[box_val:.*]] = llvm.load %[[arg0]] : !llvm.ptr -> !llvm.struct<([[DESC_TYPE]])>
 // CHECK-NEXT:    llvm.store %[[box_val]], %[[box_copy]] : !llvm.struct<([[DESC_TYPE]])>, !llvm.ptr
 // CHECK-NEXT:    llvm.call @takes_box(%[[box_copy]]) : (!llvm.ptr) -> ()
@@ -1064,9 +1067,12 @@ func.func @alloca_one() -> !fir.ref<i32> {
 
 // CHECK-LABEL: llvm.func @alloca_one() -> !llvm.ptr
 // CHECK: [[N:%.*]] = llvm.mlir.constant(1 : i64) : i64
-// CHECK: [[A:%.*]] = llvm.alloca [[N]] x i32
+// GENERIC: [[A:%.*]] = llvm.alloca [[N]] x i32
+// AMDGPU: [[AA:%.*]] = llvm.alloca [[N]] x i32 : (i64) -> !llvm.ptr<5>
+// AMDGPU: [[A:%.*]] = llvm.addrspacecast [[AA]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK: llvm.return [[A]] : !llvm.ptr
 
+
 // -----
 
 // Test fir.alloca of several elements
@@ -1081,7 +1087,9 @@ func.func @alloca_several() -> !fir.ref<i32> {
 // CHECK: [[N:%.*]] = llvm.mlir.constant(100 : index) : i64
 // CHECK: [[ONE:%.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK: [[TOTAL:%.*]] = llvm.mul [[ONE]], [[N]] : i64
-// CHECK: [[A:%.*]] = llvm.alloca [[TOTAL]] x i32
+// GENERIC: [[A:%.*]] = llvm.alloca [[TOTAL]] x i32
+// AMDGPU: [[AA:%.*]] = llvm.alloca [[TOTAL]] x i32 : (i64) -> !llvm.ptr<5>
+// AMDGPU: [[A:%.*]] = llvm.addrspacecast [[AA]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK: llvm.return [[A]] : !llvm.ptr
 
 // -----
@@ -1095,7 +1103,9 @@ func.func @alloca_ptr_to_array() -> !fir.ref<!fir.ptr<!fir.array<?xi32>>> {
 
 // CHECK-LABEL: llvm.func @alloca_ptr_to_array() -> !llvm.ptr
 // CHECK: [[ONE:%.*]] = llvm.mlir.constant(1 : i64) : i64
-// CHECK: [[A:%.*]] = llvm.alloca [[ONE]] x !llvm.ptr
+// GENERIC: [[A:%.*]] = llvm.alloca [[ONE]] x !llvm.ptr
+// AMDGPU: [[AA:%.*]] = llvm.alloca [[ONE]] x !llvm.ptr : (i64) -> !llvm.ptr<5>
+// AMDGPU: [[A:%.*]] = llvm.addrspacecast [[AA]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK: llvm.return [[A]] : !llvm.ptr
 
 // -----
@@ -1113,7 +1123,9 @@ func.func @alloca_char_array(%l: i32, %e : index) -> !fir.ref<!fir.array<?x?x!fi
 // CHECK-DAG: [[LCAST:%.*]] = llvm.sext [[L]] : i32 to i64
 // CHECK: [[PROD1:%.*]] = llvm.mul [[LCAST]], [[E]] : i64
 // CHECK: [[PROD2:%.*]] = llvm.mul [[PROD1]], [[E]] : i64
-// CHECK: [[A:%.*]] = llvm.alloca [[PROD2]] x i8
+// GENERIC: [[A:%.*]] = llvm.alloca [[PROD2]] x i8
+// AMDGPU: [[AA:%.*]] = llvm.alloca [[PROD2]] x i8 : (i64) -> !llvm.ptr<5>
+// AMDGPU: [[A:%.*]] = llvm.addrspacecast [[AA]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK: return [[A]] : !llvm.ptr
 
 // -----
@@ -1130,7 +1142,9 @@ func.func @alloca_fixed_char_array(%e : index) -> !fir.ref<!fir.array<?x?x!fir.c
 // CHECK-DAG: [[ONE:%.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK: [[PROD1:%.*]] = llvm.mul [[ONE]], [[E]] : i64
 // CHECK: [[PROD2:%.*]] = llvm.mul [[PROD1]], [[E]] : i64
-// CHECK: [[A:%.*]] = llvm.alloca [[PROD2]] x !llvm.array<8 x i8>
+// GENERIC: [[A:%.*]] = llvm.alloca [[PROD2]] x !llvm.array<8 x i8>
+// AMDGPU: [[AA:%.*]] = llvm.alloca [[PROD2]] x !llvm.array<8 x i8> : (i64) -> !llvm.ptr<5>
+// AMDGPU: [[A:%.*]] = llvm.addrspacecast [[AA]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK: return [[A]] : !llvm.ptr
 
 // -----
@@ -1154,7 +1168,9 @@ func.func @alloca_record(%arg0 : i32, %arg1 : i16) -> !fir.ref<!fir.type<_QTt(p1
 // CHECK-SAME: ([[ARG0:%.*]]: i32, [[ARG1:%.*]]: i16)
 // CHECK-SAME: -> !llvm.ptr
 // CHECK: [[SIZE:%.*]] = llvm.call @_QTtP.mem.size([[ARG0]], [[ARG1]]) : (i32, i16) -> i64
-// CHECK: [[ALLOC:%.*]] = llvm.alloca [[SIZE]] x i8
+// GENERIC: [[ALLOC:%.*]] = llvm.alloca [[SIZE]] x i8
+// AMDGPU: [[A:%.*]] = llvm.alloca [[SIZE]] x i8 : (i64) -> !llvm.ptr<5>
+// AMDGPU: [[ALLOC:%.*]] = llvm.addrspacecast [[A]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK: llvm.return [[ALLOC]] : !llvm.ptr
 
 // -----
@@ -1173,7 +1189,9 @@ func.func @alloca_multidim_array(%0 : index) -> !fir.ref<!fir.array<8x16x32xf32>
 // CHECK: [[ONE:%.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK: [[MUL1:%.*]] = llvm.mul [[ONE]], [[OP1]] : i64
 // CHECK: [[TOTAL:%.*]] = llvm.mul [[MUL1]], [[OP2]] : i64
-// CHECK: [[A:%.*]] = llvm.alloca [[TOTAL]] x !llvm.array<32 x array<16 x array<8 x f32>
+// GENERIC: [[A:%.*]] = llvm.alloca [[TOTAL]] x !llvm.array<32 x array<16 x array<8 x f32>>>
+// AMDGPU: [[AA:%.*]] = llvm.alloca [[TOTAL]] x !llvm.array<32 x array<16 x array<8 x f32>>> : (i64) -> !llvm.ptr<5>
+// AMDGPU: [[A:%.*]] = llvm.addrspacecast [[AA]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK: llvm.return [[A]] : !llvm.ptr
 
 // -----
@@ -1192,7 +1210,9 @@ func.func @alloca_const_interior_array(%0 : index) -> !fir.ref<!fir.array<8x9x?x
 // CHECK: [[ONE:%.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK: [[MUL1:%.*]] = llvm.mul [[ONE]], [[OP1]] : i64
 // CHECK: [[TOTAL:%.*]] = llvm.mul [[MUL1]], [[OP2]] : i64
-// CHECK: [[A:%.*]] = llvm.alloca [[TOTAL]] x !llvm.array<9 x array<8 x f32>
+// GENERIC: [[A:%.*]] = llvm.alloca [[TOTAL]] x !llvm.array<9 x array<8 x f32>>
+// AMDGPU: [[AA:%.*]] = llvm.alloca [[TOTAL]] x !llvm.array<9 x array<8 x f32>> : (i64) -> !llvm.ptr<5>
+// AMDGPU: [[A:%.*]] = llvm.addrspacecast [[AA]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK: llvm.return [[A]] : !llvm.ptr
 
 // -----
@@ -1212,7 +1232,9 @@ func.func @alloca_array_with_holes(%0 : index, %1 : index) -> !fir.ref<!fir.arra
 // CHECK: [[PROD1:%.*]] = llvm.mul [[ONE]], [[FIXED]] : i64
 // CHECK: [[PROD2:%.*]] = llvm.mul [[PROD1]], [[A]] : i64
 // CHECK: [[PROD3:%.*]] = llvm.mul [[PROD2]], [[B]] : i64
-// CHECK: [[RES:%.*]] = llvm.alloca [[PROD3]] x !llvm.array<4 x i32>
+// GENERIC: [[RES:%.*]] = llvm.alloca [[PROD3]] x !llvm.array<4 x i32>
+// AMDGPU: [[AA:%.*]] = llvm.alloca [[PROD3]] x !llvm.array<4 x i32> : (i64) -> !llvm.ptr<5>
+// AMDGPU: [[RES:%.*]] = llvm.addrspacecast [[AA]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK: llvm.return [[RES]] : !llvm.ptr
 
 // -----
@@ -1551,7 +1573,9 @@ func.func @embox0(%arg0: !fir.ref<!fir.array<100xi32>>) {
 // CHECK-LABEL: func @embox0(
 // CHECK-SAME:               %[[ARG0:.*]]: !llvm.ptr
 // CHECK:         %[[C1:.*]] = llvm.mlir.constant(1 : i32) : i32
-// CHECK:         %[[ALLOCA:.*]] = llvm.alloca %[[C1]] x !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}})> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+// GENERIC:       %[[ALLOCA:.*]] = llvm.alloca %[[C1]] x !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}})> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+// AMDGPU:        %[[AA:.*]] = llvm.alloca %[[C1]] x !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}})> {alignment = 8 : i64} : (i32) -> !llvm.ptr<5>
+// AMDGPU:        %[[ALLOCA:.*]] = llvm.addrspacecast %[[AA]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK:         %[[TYPE_CODE:.*]] = llvm.mlir.constant(9 : i32) : i32
 // CHECK:         %[[NULL:.*]] = llvm.mlir.zero : !llvm.ptr
 // CHECK:         %[[GEP:.*]] = llvm.getelementptr %[[NULL]][1]
@@ -1694,7 +1718,7 @@ func.func @embox1(%arg0: !fir.ref<!fir.type<_QMtest_dinitTtseq{i:i32}>>) {
 // CHECK:         %{{.*}} = llvm.insertvalue %[[TYPE_CODE_I8]], %{{.*}}[4] : !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, ptr, array<1 x i{{.*}}>)>
 // CHECK:         %[[F18ADDENDUM:.*]] = llvm.mlir.constant(1 : i32) : i32
 // CHECK:         %[[F18ADDENDUM_I8:.*]] = llvm.trunc %[[F18ADDENDUM]] : i32 to i8
-// CHECK:         %{{.*}} = llvm.insertvalue %[[F18ADDENDUM_I8]], %17[6] : !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, ptr, array<1 x i{{.*}}>)>
+// CHECK:         %{{.*}} = llvm.insertvalue %[[F18ADDENDUM_I8]], %{{.*}}[6] : !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, ptr, array<1 x i{{.*}}>)>
 // CHECK:         %[[TDESC:.*]] = llvm.mlir.addressof @_QMtest_dinitE.dt.tseq : !llvm.ptr
 // CHECK:         %{{.*}} = llvm.insertvalue %[[TDESC]], %{{.*}}[7] : !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, ptr, array<1 x i{{.*}}>)>
 
@@ -1752,7 +1776,9 @@ func.func @no_reassoc(%arg0: !fir.ref<i32>) {
 // CHECK-LABEL: llvm.func @no_reassoc(
 // CHECK-SAME:                        %[[ARG0:.*]]: !llvm.ptr) {
 // CHECK:         %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
-// CHECK:         %[[ALLOC:.*]] = llvm.alloca %[[C1]] x i32 : (i64) -> !llvm.ptr
+// GENERIC:       %[[ALLOC:.*]] = llvm.alloca %[[C1]] x i32 : (i64) -> !llvm.ptr
+// AMDGPU:        %[[AA:.*]] = llvm.alloca %[[C1]] x i32 : (i64) -> !llvm.ptr<5>
+// AMDGPU:        %[[ALLOC:.*]] = llvm.addrspacecast %[[AA]] : !llvm.ptr<5> to !llvm.ptr
 // CHECK:         %[[LOAD:.*]] = llvm.load %[[ARG0]] : !llvm.ptr -> i32
 // CHECK:         llvm.store %[[LOAD]], %[[ALLOC]] : i32, !llvm.ptr
 // CHECK:         llvm.return
@@ -1772,7 +1798,9 @@ func.func @xembox0(%arg0: !fir.ref<!fir.array<?xi32>>) {
 // CHECK-LABEL: llvm.func @xembox0(
 // CHECK-SAME:                     %[[ARG0:.*]]: !llvm.ptr
 // CHECK:         %[[ALLOCA_SIZE:.*]] = llvm.mlir.constant(1 : i32) : i32
-// CHECK:         %[[ALLOCA:.*]] = llvm.alloca %[[ALLOCA_SIZE]] x !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, array<1 x array<3 x i64>>)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+// GENERIC:       %[[ALLOCA:.*]] = llvm.alloca %[[ALLOCA_SIZE]] x !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}},...
[truncated]

ergawy (Member) left a comment

Thanks! I learned a few more things from your PR!

Just a few suggestions and questions.

agozillon (Contributor, Author)

Updated the PR with the requested changes, except for the spacing in the test file (until another reviewer chimes in) and the tidying up of getLlvmPtrType, which might be more suitable for a separate follow-up PR!

agozillon (Contributor, Author)

And I forgot to mention: thank you for the review, @ergawy!

agozillon (Contributor, Author) commented Jan 15, 2024

It would be good to get a review and approval from one of the previous reviewers (or the original author, @jsjodin) of the previous PR on Phabricator as well if possible, @jeanPerier @kiranchandramohan. If one of you has time it would be greatly appreciated (I know you're both incredibly busy, so I apologize for requesting more of your time); thank you very much ahead of time. If you're happy with it as is, that is also fine!

As an aside, this also seems to fix a compiler crash when using transpose on a regular array inside a target region, so it doesn't just apply to allocatables but more broadly to cases where the Fortran runtime injects alloca operations into the target region (and user code as well, I'd imagine, although I don't have the Fortran know-how to create a test case for that off the top of my head). After this PR lands there is still a chance of some passes performing breaking optimisations (the executable will give incorrect results or segfault), because we currently don't place these allocations appropriately in the entry block when we generate the target region. This currently happens in the case of scalar allocatable assignment; I haven't had the chance to test that case here yet, and have only confirmed that this PR fixes the compiler crash. I'm looking into raising the allocas into the entry block, but creating the simple fix I had in mind has proven harder than expected so far.
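To illustrate the remaining placement issue, a hypothetical shape of the problematic output (names and block structure are illustrative only, not taken from this PR):

// Hypothetical outlined target function: the runtime-injected alloca lands
// in a non-entry block, which later passes may not expect; the fix being
// explored is hoisting such allocas into the entry block.
llvm.func @outlined_target_fn() {
  llvm.br ^bb1
^bb1:  // alloca here, rather than in the entry block
  %c1 = llvm.mlir.constant(1 : i64) : i64
  %tmp = llvm.alloca %c1 x i32 : (i64) -> !llvm.ptr<5>
  %cast = llvm.addrspacecast %tmp : !llvm.ptr<5> to !llvm.ptr
  llvm.return
}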

jeanPerier (Contributor) left a comment

Small suggestion, LGTM otherwise.

Comment on lines 3875 to 3881
      mlir::MLIRContext *context = mod.getContext();
      mod->setAttr(
          mlir::LLVM::LLVMDialect::getDataLayoutAttrName(),
          mlir::StringAttr::get(context, dl.getStringRepresentation()));
      mlir::DataLayoutSpecInterface dlSpec =
          mlir::translateDataLayout(dl, context);
      mod->setAttr(mlir::DLTIDialect::kDataLayoutAttrName, dlSpec);

agozillon (Contributor, Author)

Thank you very much, I will change this before landing!

agozillon (Contributor, Author)

Thank you @jeanPerier! I'll update with the suggested fix and land it tomorrow or the following day, so if anyone has any further input, please do feel free to add it in the meantime.

jsjodin (Contributor) left a comment

LGTM

agozillon merged commit abeb6c9 into llvm:main on Jan 17, 2024
3 of 4 checks passed
ampandey-1995 pushed a commit to ampandey-1995/llvm-project that referenced this pull request on Jan 19, 2024
Labels: flang:codegen, flang:fir-hlfir, flang