Skip to content

Conversation

@farzonl
Copy link
Member

@farzonl farzonl commented Oct 30, 2025

fixes #165663

After experimenting with clang vs DXC via the offload test suite. Noticed clang was doing column major indexing:

Say we have the following HLSL

int2x3 M = int2x3(1, 2, 3, 4, 5, 6); 

Clang and DXC will initalize the constructors the same but walking the index is completely different:

1,3,5
2,4,6

[row][col] -->
[0][0] = 1
[0][1] = 3
[0][2] = 5
[1][0] = 2
[1][1] = 4
[1][2] = 6

While DXC was doing row major indexing

1,2,3,
4,5,6

[row][col] -->
[0][0] = 1
[0][1] = 2
[0][2] = 3
[1][0] = 4
[1][1] = 5
[1][2] = 6

I ended up changing the indexing scheme of MatrixSubscriptExpr to get the offload tests to pass.

Reading the docs See https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-per-component-math#matrix-ordering

Row-major and column-major packing order has no influence on the packing order of constructors
(which always follows row-major ordering).

I think this might explain the indexing differences, because the memory doesn't appear to be laid out any differently but accessing the elements is clearly different.

So changed up the flatten index and cofirmed it did not change for c\c++ via testing.

fixes llvm#165663

After experimenting with clang vs DXC via the offload test suite.
Noticed clang was doing column major init:
```
1,3,5
2,4,6

[row][col] -->
[0][0] = 1
[0][1] = 3
[0][2] = 5
[1][0] = 2
[1][1] = 4
[1][2] = 6
```

While DXC was doing row major indexing

```
1,2,3,
4,5,6

[row][col] -->
[0][0] = 1
[0][1] = 2
[0][2] = 3
[1][0] = 4
[1][1] = 5
[1][2] = 6

```

Reading the docs See https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-per-component-math#matrix-ordering

```
Row-major and column-major packing order has no influence on the packing order of constructors
(which always follows row-major ordering).
```

So changed up the flatten index and cofirmed it did not change for c\c++
via testing.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. HLSL HLSL Language Support llvm:ir labels Oct 30, 2025
@farzonl farzonl requested review from fhahn and llvm-beanz October 30, 2025 07:35
@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2025

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-hlsl

@llvm/pr-subscribers-clang

Author: Farzon Lotfi (farzonl)

Changes

fixes #165663

After experimenting with clang vs DXC via the offload test suite. Noticed clang was doing column major init:

1,3,5
2,4,6

[row][col] -->
[0][0] = 1
[0][1] = 3
[0][2] = 5
[1][0] = 2
[1][1] = 4
[1][2] = 6

While DXC was doing row major indexing

1,2,3,
4,5,6

[row][col] -->
[0][0] = 1
[0][1] = 2
[0][2] = 3
[1][0] = 4
[1][1] = 5
[1][2] = 6

Reading the docs See https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-per-component-math#matrix-ordering

Row-major and column-major packing order has no influence on the packing order of constructors
(which always follows row-major ordering).

So changed up the flatten index and cofirmed it did not change for c\c++ via testing.


Full diff: https://github.com/llvm/llvm-project/pull/165666.diff

6 Files Affected:

  • (modified) clang/lib/CodeGen/CGExpr.cpp (+12-5)
  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+10-2)
  • (added) clang/test/CodeGen/matrix-type-indexing.c (+35)
  • (added) clang/test/CodeGenCXX/matrix-type-indexing.cpp (+35)
  • (added) clang/test/CodeGenHLSL/BasicFeatures/matrix-type-indexing.hlsl (+30)
  • (modified) llvm/include/llvm/IR/MatrixBuilder.h (+15-2)
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 01f2161f27555..208eb98972b23 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -4824,11 +4824,18 @@ LValue CodeGenFunction::EmitMatrixSubscriptExpr(const MatrixSubscriptExpr *E) {
   llvm::Value *RowIdx = EmitMatrixIndexExpr(E->getRowIdx());
   llvm::Value *ColIdx = EmitMatrixIndexExpr(E->getColumnIdx());
 
-  llvm::Value *NumRows = Builder.getIntN(
-      RowIdx->getType()->getScalarSizeInBits(),
-      E->getBase()->getType()->castAs<ConstantMatrixType>()->getNumRows());
-  llvm::Value *FinalIdx =
-      Builder.CreateAdd(Builder.CreateMul(ColIdx, NumRows), RowIdx);
+  llvm::Value *FinalIdx;
+  if (getLangOpts().HLSL) {
+    llvm::Value *NumCols = Builder.getIntN(
+        RowIdx->getType()->getScalarSizeInBits(),
+        E->getBase()->getType()->castAs<ConstantMatrixType>()->getNumColumns());
+    FinalIdx = Builder.CreateAdd(Builder.CreateMul(RowIdx, NumCols), ColIdx);
+  } else {
+    llvm::Value *NumRows = Builder.getIntN(
+        RowIdx->getType()->getScalarSizeInBits(),
+        E->getBase()->getType()->castAs<ConstantMatrixType>()->getNumRows());
+    FinalIdx = Builder.CreateAdd(Builder.CreateMul(ColIdx, NumRows), RowIdx);
+  }
   return LValue::MakeMatrixElt(
       MaybeConvertMatrixAddress(Base.getAddress(), *this), FinalIdx,
       E->getBase()->getType(), Base.getBaseInfo(), TBAAAccessInfo());
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index 714192db1b15c..5220621fe7a70 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -2118,9 +2118,17 @@ Value *ScalarExprEmitter::VisitMatrixSubscriptExpr(MatrixSubscriptExpr *E) {
   Value *ColumnIdx = CGF.EmitMatrixIndexExpr(E->getColumnIdx());
 
   const auto *MatrixTy = E->getBase()->getType()->castAs<ConstantMatrixType>();
-  unsigned NumRows = MatrixTy->getNumRows();
+
   llvm::MatrixBuilder MB(Builder);
-  Value *Idx = MB.CreateIndex(RowIdx, ColumnIdx, NumRows);
+  Value *Idx;
+  if (CGF.getLangOpts().HLSL) {
+    unsigned NumCols = MatrixTy->getNumColumns();
+    Idx = MB.createRowMajorIndex(RowIdx, ColumnIdx, NumCols);
+  } else {
+    unsigned NumRows = MatrixTy->getNumRows();
+    Idx = MB.createColumnMajorIndex(RowIdx, ColumnIdx, NumRows);
+  }
+
   if (CGF.CGM.getCodeGenOpts().OptimizationLevel > 0)
     MB.CreateIndexAssumption(Idx, MatrixTy->getNumElementsFlattened());
 
diff --git a/clang/test/CodeGen/matrix-type-indexing.c b/clang/test/CodeGen/matrix-type-indexing.c
new file mode 100644
index 0000000000000..19b9f02f5b0a4
--- /dev/null
+++ b/clang/test/CodeGen/matrix-type-indexing.c
@@ -0,0 +1,35 @@
+// RUN: %clang_cc1 -fenable-matrix -triple x86_64-apple-darwin %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s
+
+ typedef float fx2x3_t __attribute__((matrix_type(2, 3)));
+ float Out[6];
+
+void one_index_test(int index, fx2x3_t M) {
+    // CHECK-LABEL: one_index_test
+    // CHECK: %row = alloca i32, align 4
+    // CHECK: %col = alloca i32, align 4
+    // CHECK: [[row_load:%.*]] = load i32, ptr %row, align 4
+    // CHECK-NEXT: [[row_load_zext:%.*]] = zext i32 [[row_load]] to i64
+    // CHECK-NEXT: [[col_load:%.*]] = load i32, ptr %col, align 4
+    // CHECK-NEXT: [[col_load_zext:%.*]] = zext i32 [[col_load]] to i64
+    // CHECK-NEXT: [[col_offset:%.*]] = mul i64 [[col_load_zext]], 2
+    // CHECK-NEXT: [[col_major_index:%.*]] = add i64 [[col_offset]], [[row_load_zext]]
+    // CHECK-NEXT: [[matrix_as_vec:%.*]] = load <6 x float>, ptr %M.addr, align 4
+    // CHECK-NEXT: %matrixext = extractelement <6 x float> [[matrix_as_vec]], i64 [[col_major_index]]
+    const unsigned int COLS = 3;
+    unsigned int row = index / COLS;
+    unsigned int col = index % COLS;
+    Out[index] = M[row][col];
+}
+
+float two_index_test(int row, int col, fx2x3_t M) {
+    // CHECK-LABEL: two_index_test
+    // CHECK: [[row_load:%.*]] = load i32, ptr %row.addr, align 4
+    // CHECK-NEXT: [[row_load_sext:%.*]] = sext i32 [[row_load]] to i64
+    // CHECK-NEXT: [[col_load:%.*]] = load i32, ptr %col.addr, align 4
+    // CHECK-NEXT: [[col_load_sext:%.*]] = sext i32 [[col_load]] to i64
+    // CHECK-NEXT: [[col_offset:%.*]] = mul i64 [[col_load_sext]], 2
+    // CHECK-NEXT: [[col_major_index:%.*]] = add i64 [[col_offset]], [[row_load_sext]]
+    // CHECK-NEXT: [[matrix_as_vec:%.*]] = load <6 x float>, ptr %M.addr, align 4
+    // CHECK-NEXT: %matrixext = extractelement <6 x float> [[matrix_as_vec]], i64 [[col_major_index]]
+    return M[row][col];
+}
diff --git a/clang/test/CodeGenCXX/matrix-type-indexing.cpp b/clang/test/CodeGenCXX/matrix-type-indexing.cpp
new file mode 100644
index 0000000000000..b0ed5dc1d717e
--- /dev/null
+++ b/clang/test/CodeGenCXX/matrix-type-indexing.cpp
@@ -0,0 +1,35 @@
+// RUN: %clang_cc1 -fenable-matrix -triple x86_64-apple-darwin %s -emit-llvm -disable-llvm-passes -o - -std=c++11 | FileCheck %s
+
+ typedef float fx2x3_t __attribute__((matrix_type(2, 3)));
+ float Out[6];
+
+void one_index_test(int index, fx2x3_t M) {
+    // CHECK-LABEL: one_index_test
+    // CHECK: %row = alloca i32, align 4
+    // CHECK: %col = alloca i32, align 4
+    // CHECK: [[row_load:%.*]] = load i32, ptr %row, align 4
+    // CHECK-NEXT: [[row_load_zext:%.*]] = zext i32 [[row_load]] to i64
+    // CHECK-NEXT: [[col_load:%.*]] = load i32, ptr %col, align 4
+    // CHECK-NEXT: [[col_load_zext:%.*]] = zext i32 [[col_load]] to i64
+    // CHECK-NEXT: [[col_offset:%.*]] = mul i64 [[col_load_zext]], 2
+    // CHECK-NEXT: [[col_major_index:%.*]] = add i64 [[col_offset]], [[row_load_zext]]
+    // CHECK-NEXT: [[matrix_as_vec:%.*]] = load <6 x float>, ptr %M.addr, align 4
+    // CHECK-NEXT: %matrixext = extractelement <6 x float> [[matrix_as_vec]], i64 [[col_major_index]]
+    const unsigned int COLS = 3;
+    unsigned int row = index / COLS;
+    unsigned int col = index % COLS;
+    Out[index] = M[row][col];
+}
+
+float two_index_test(int row, int col, fx2x3_t M) {
+    // CHECK-LABEL: two_index_test
+    // CHECK: [[row_load:%.*]] = load i32, ptr %row.addr, align 4
+    // CHECK-NEXT: [[row_load_sext:%.*]] = sext i32 [[row_load]] to i64
+    // CHECK-NEXT: [[col_load:%.*]] = load i32, ptr %col.addr, align 4
+    // CHECK-NEXT: [[col_load_sext:%.*]] = sext i32 [[col_load]] to i64
+    // CHECK-NEXT: [[col_offset:%.*]] = mul i64 [[col_load_sext]], 2
+    // CHECK-NEXT: [[col_major_index:%.*]] = add i64 [[col_offset]], [[row_load_sext]]
+    // CHECK-NEXT: [[matrix_as_vec:%.*]] = load <6 x float>, ptr %M.addr, align 4
+    // CHECK-NEXT: %matrixext = extractelement <6 x float> [[matrix_as_vec]], i64 [[col_major_index]]
+    return M[row][col];
+}
diff --git a/clang/test/CodeGenHLSL/BasicFeatures/matrix-type-indexing.hlsl b/clang/test/CodeGenHLSL/BasicFeatures/matrix-type-indexing.hlsl
new file mode 100644
index 0000000000000..ff63b23311e77
--- /dev/null
+++ b/clang/test/CodeGenHLSL/BasicFeatures/matrix-type-indexing.hlsl
@@ -0,0 +1,30 @@
+// RUN: %clang_cc1 -finclude-default-header -x hlsl -triple dxil-pc-shadermodel6.3-library %s -fnative-half-type -emit-llvm -disable-llvm-passes -o - | FileCheck %s
+
+// NOTE: This test is only to confirm we can do codgen with the matrix alias.
+
+RWBuffer<int> Out : register(u1);
+
+void one_index_test(int index, half2x3 M) {
+    // CHECK-LABEL: one_index_test
+    // CHECK: %row = alloca i32, align 4
+    // CHECK: %col = alloca i32, align 4
+    // CHECK: [[row_load:%.*]] = load i32, ptr %row, align 4
+    // CHECK-NEXT: [[col_load:%.*]] = load i32, ptr %col, align 4
+    // CHECK-NEXT: [[row_offset:%.*]] = mul i32 [[row_load]], 3
+    // CHECK-NEXT: [[row_major_index:%.*]] = add i32 [[row_offset]], [[col_load]]
+    // CHECK-NEXT: [[matrix_as_vec:%.*]] = load <6 x half>, ptr %M.addr, align 2
+    // CHECK-NEXT: %matrixext = extractelement <6 x half> [[matrix_as_vec]], i32 [[row_major_index]]
+    const uint COLS = 3;
+    uint row = index / COLS;
+    uint col = index % COLS;
+    Out[index] = M[row][col];
+}
+
+half two_index_test(int row, int col, half2x3 M) {
+    // CHECK-LABEL: two_index_test
+    // CHECK: [[row_offset:%.*]] = mul i32 [[row_load:%.*]], 3
+    // CHECK-NEXT: [[row_major_index:%.*]] = add i32 [[row_offset]], [[col_load:%.*]]
+    // CHECK-NEXT: [[matrix_as_vec:%.*]] = load <6 x half>, ptr %M.addr, align 2
+    // CHECK-NEXT: %matrixext = extractelement <6 x half> [[matrix_as_vec]], i32 [[row_major_index]]
+    return M[row][col];
+}
diff --git a/llvm/include/llvm/IR/MatrixBuilder.h b/llvm/include/llvm/IR/MatrixBuilder.h
index 3a04ca87f2b55..f90cd92096103 100644
--- a/llvm/include/llvm/IR/MatrixBuilder.h
+++ b/llvm/include/llvm/IR/MatrixBuilder.h
@@ -241,8 +241,8 @@ class MatrixBuilder {
 
   /// Compute the index to access the element at (\p RowIdx, \p ColumnIdx) from
   /// a matrix with \p NumRows embedded in a vector.
-  Value *CreateIndex(Value *RowIdx, Value *ColumnIdx, unsigned NumRows,
-                     Twine const &Name = "") {
+  Value *createColumnMajorIndex(Value *RowIdx, Value *ColumnIdx,
+                                unsigned NumRows, Twine const &Name = "") {
     unsigned MaxWidth = std::max(RowIdx->getType()->getScalarSizeInBits(),
                                  ColumnIdx->getType()->getScalarSizeInBits());
     Type *IntTy = IntegerType::get(RowIdx->getType()->getContext(), MaxWidth);
@@ -251,6 +251,19 @@ class MatrixBuilder {
     Value *NumRowsV = B.getIntN(MaxWidth, NumRows);
     return B.CreateAdd(B.CreateMul(ColumnIdx, NumRowsV), RowIdx);
   }
+
+  /// Compute the index to access the element at (\p RowIdx, \p ColumnIdx) from
+  /// a matrix with \p NumRows embedded in a vector.
+  Value *createRowMajorIndex(Value *RowIdx, Value *ColumnIdx, unsigned NumCols,
+                             Twine const &Name = "") {
+    unsigned MaxWidth = std::max(RowIdx->getType()->getScalarSizeInBits(),
+                                 ColumnIdx->getType()->getScalarSizeInBits());
+    Type *IntTy = IntegerType::get(RowIdx->getType()->getContext(), MaxWidth);
+    RowIdx = B.CreateZExt(RowIdx, IntTy);
+    ColumnIdx = B.CreateZExt(ColumnIdx, IntTy);
+    Value *NumColsV = B.getIntN(MaxWidth, NumCols);
+    return B.CreateAdd(B.CreateMul(RowIdx, NumColsV), ColumnIdx);
+  }
 };
 
 } // end namespace llvm

}

/// Compute the index to access the element at (\p RowIdx, \p ColumnIdx) from
/// a matrix with \p NumRows embedded in a vector.
Copy link
Member Author

@farzonl farzonl Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// a matrix with \p NumRows embedded in a vector.
/// a matrix with \p NumCols embedded in a vector.

Copy link
Contributor

@Icohedron Icohedron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the PR description you give the example of int2x3 M = int2x3(1, 2, 3, 4, 5, 6); but this example or similar does not appear to be in any of the tests for this PR.

Should there be a test here for matrix initialization via the constructor and/or initializer list?
Or does the matrix constructor test in the offload test suite suffice?

@farzonl
Copy link
Member Author

farzonl commented Oct 30, 2025

In the PR description you give the example of int2x3 M = int2x3(1, 2, 3, 4, 5, 6); but this example or similar does not appear to be in any of the tests for this PR.

Should there be a test here for matrix initialization via the constructor and/or initializer list? Or does the matrix constructor test in the offload test suite suffice?

The bug isn’t in the construction it was in how we were indexing the matrix. The initialization is the same for both dxc and clang.

only mentioned the offload test case because its what made me realize the matrix indexing IR was different between clang and dxc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category HLSL HLSL Language Support llvm:ir

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[HLSL] Matrix constructors in HLSL are always row major.

3 participants