Contributor

@zhczhong zhczhong commented Sep 2, 2024

Track: #288

  1. The extra memref.copy is caused by a write-after-write conflict: the canonicalization pass eliminates the single-iteration loop but preserves the extract slice and insert slice. This PR therefore skips generating a loop when it would have only a single iteration.
  2. Refactor the config: infer dimType from the ContractionOpInterface.
  3. Introduce a padding cost, which minimizes the cost of padding and uses a divisible block where possible.
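
As a rough illustration of item 3, the sketch below shows one way a padding cost could steer block selection toward a divisible block. It is not the code from this PR: `paddingCost` and `pickBlock` are hypothetical names, the cost is assumed to be the number of padded elements along one dimension, and the tie-break toward the larger block is also an assumption.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): number of elements that must be
// padded to round `dim` up to the next multiple of `block`.
inline int64_t paddingCost(int64_t dim, int64_t block) {
  int64_t rem = dim % block;
  return rem == 0 ? 0 : block - rem;
}

// Hypothetical selection rule: choose the candidate with the smallest
// padding cost, so a divisible block (cost 0) wins whenever one exists.
// Ties are broken in favor of the larger block (an assumption).
// Assumes `candidates` is non-empty.
inline int64_t pickBlock(int64_t dim, const std::vector<int64_t> &candidates) {
  int64_t best = candidates.front();
  int64_t bestCost = paddingCost(dim, best);
  for (int64_t block : candidates) {
    int64_t cost = paddingCost(dim, block);
    if (cost < bestCost || (cost == bestCost && block > best)) {
      best = block;
      bestCost = cost;
    }
  }
  return best;
}
```

With the candidate set {16, 32, 64}, a dimension of 4096 is divisible by every candidate, so the largest block is chosen, while a dimension of 48 is only divisible by 16, so that block is preferred over padding to 64.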

innerMostKBlockCandidates = {16, 32, 64};
innerMostNBlockCandidates = {16, 32, 64};
NBlockCandidates = innerMostNBlockCandidates;
KBlockCandidates = innerMostKBlockCandidates;
Contributor

So after this change, the innermost K block will only be one of 16/32/64 if allowIndivisibleInnerblock is true?

Contributor Author

Yes, your understanding is correct.

@zhczhong zhczhong force-pushed the zhcong/enhance_config branch 2 times, most recently from 073ec23 to 0c56c7d Compare September 2, 2024 05:13
@zhczhong zhczhong force-pushed the zhcong/enhance_config branch from 0c56c7d to 01920b6 Compare September 3, 2024 02:59

inline void getDimTypeFromIterators(linalg::LinalgOp linalgOp,
SmallVectorImpl<DimType> &dimTypes) {
SmallVector<utils::IteratorType> iteratorTypes =
Contributor

Can you explicitly specify mlir::utils::IteratorType here?

Contributor Author

Added the mlir namespace here.

Contributor

ciyongch commented Sep 4, 2024

Overall LGTM, do you think we should add a single simple case to cover the case of skipping single-iteration loop generation?

Contributor Author

zhczhong commented Sep 4, 2024

> Overall LGTM, do you think we should add a single simple case to cover the case of skipping single-iteration loop generation?

The following test covers this case. Before this PR, its FileCheck lines checked for two scf.forall ops; the single-iteration scf.forall is now skipped.

func.func @matmul_2Dx4D_bf16_with_dlti(%arg0: tensor<4096x4096xbf16>, %arg1: tensor<128x128x16x32x2xbf16>) -> tensor<4096x4096xbf16> {

Contributor

ciyongch commented Sep 4, 2024

Please help to rebase the code base, then we can merge it.

Contributor Author

zhczhong commented Sep 4, 2024

> Please help to rebase the code base, then we can merge it.

The code has been rebased now

@zhczhong zhczhong merged commit 3f04dc9 into main Sep 4, 2024