feat: Support low precision #691
Code Review
This pull request implements support for low-precision element types (FP8, FP4, etc.) on the A5 NPU architecture. It updates the PTO IR manual and adds design documentation, refactors type utility functions for storage size calculations, and enhances verifiers for tload, tstore, tprefetch, and tcvt to enforce architecture-specific constraints. Feedback highlights a bug in getPTOStorageElemByteSize that causes regressions for sub-byte types, the presence of absolute local paths in documentation, and the need to use isI8Like instead of isInteger(8) in verifiers to correctly support both signed and unsigned 8-bit integers.
unsigned mlir::pto::getPTOStorageElemByteSize(Type t) {
  unsigned bitWidth = getPTOStorageElemBitWidth(t);
  return bitWidth == 0 ? 0 : bitWidth / kBitsPerByte;
The current implementation of getPTOStorageElemByteSize returns 0 for any type narrower than 8 bits (e.g., i1), because the integer division bitWidth / kBitsPerByte truncates. In most NPU contexts, the smallest addressable storage unit is a byte, so a sub-byte type such as i1 occupies a full byte. This causes regressions in several passes (such as PTOIRTranslator and PTOPlanMemory), where a size of 0 now triggers failures or fatal errors instead of falling back to 1 byte. Using (bitWidth + kBitsPerByte - 1) / kBitsPerByte rounds up and yields the correct storage size in bytes for all types.
Suggested change:
- return bitWidth == 0 ? 0 : bitWidth / kBitsPerByte;
+ return (bitWidth + kBitsPerByte - 1) / kBitsPerByte;
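The difference between the two return statements can be checked in isolation. The following is a minimal standalone sketch, not the PR's code: `kBitsPerByte` is assumed to be 8, and `byteSizeTruncating`, `byteSizeCeil`, and `demoByteSizes` are hypothetical names that take a raw bit width instead of an MLIR `Type`.

```cpp
#include <cassert>

// Assumption: the PR's kBitsPerByte is 8.
constexpr unsigned kBitsPerByte = 8;

// Mirrors the reviewed line: integer division truncates, so any width
// below 8 bits yields 0 bytes.
constexpr unsigned byteSizeTruncating(unsigned bitWidth) {
  return bitWidth == 0 ? 0 : bitWidth / kBitsPerByte;
}

// Mirrors the suggested line: ceiling division rounds up to whole bytes.
// A width of 0 still maps to 0, so no special case is needed.
constexpr unsigned byteSizeCeil(unsigned bitWidth) {
  return (bitWidth + kBitsPerByte - 1) / kBitsPerByte;
}

// Hypothetical helper that exercises both variants.
inline void demoByteSizes() {
  assert(byteSizeTruncating(1) == 0);  // i1 collapses to 0 bytes
  assert(byteSizeCeil(1) == 1);        // i1 occupies one byte
  assert(byteSizeCeil(4) == 1);        // a 4-bit type rounds up to one byte
  assert(byteSizeCeil(16) == 2);       // whole-byte widths are unchanged
}
```

Rounding up gives one byte per 4-bit element. Whether a packed 4-bit type should instead be sized at two elements per byte is a separate design question that this sketch does not settle.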
## Scope

This note records the current low-precision C++ API support status in
`/Users/fangrui/workspace/huawei/pto-isa` for the following PTO operations:
The documentation contains absolute local file paths (e.g., /Users/fangrui/...). These should be replaced with relative paths or simply the repository name to ensure the documentation is portable and relevant for all contributors.
Suggested change:
- `/Users/fangrui/workspace/huawei/pto-isa` for the following PTO operations:
+ `.` for the following PTO operations:
static bool isA5AccStorePreQuantDstType(Type srcElem, Type dstElem) {
  if (srcElem.isInteger(32))
    return dstElem.isInteger(8) || dstElem.isF16() || dstElem.isBF16();
The check dstElem.isInteger(8) is too restrictive as it typically only matches signless 8-bit integers. The previous implementation used isI8Like(dstElem), which likely covers both signed and unsigned 8-bit integers (i8 and ui8). According to the manual (line 899), both i8 and ui8 should be supported here.
Suggested change:
- return dstElem.isInteger(8) || dstElem.isF16() || dstElem.isBF16();
+ return isI8Like(dstElem) || dstElem.isF16() || dstElem.isBF16();
    return dstElem.isInteger(8) || dstElem.isF16() || dstElem.isBF16();
  if (!srcElem.isF32())
    return false;
  return dstElem.isInteger(8) || dstElem.isF16() || dstElem.isBF16() ||
Similar to the i32 source case, isInteger(8) should be replaced with isI8Like(dstElem) to correctly support both i8 and ui8 destination types as specified in the manual.
Suggested change:
- return dstElem.isInteger(8) || dstElem.isF16() || dstElem.isBF16() ||
+ return isI8Like(dstElem) || dstElem.isF16() || dstElem.isBF16() ||
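The distinction the two comments draw can be modeled without MLIR. The sketch below is illustrative only: `IntElem`, `Signedness`, `isSignlessI8`, `isI8LikeSketch`, and `demoI8Checks` are hypothetical stand-ins, not the MLIR `IntegerType` API or the repository's `isI8Like`. It assumes, as the comments suggest, that the restrictive check accepts only signless 8-bit integers.

```cpp
#include <cassert>

// Hypothetical model of an integer element type; not the MLIR API.
enum class Signedness { Signless, Signed, Unsigned };

struct IntElem {
  unsigned width;
  Signedness sign;
};

// Narrow check (assumed behavior of the restrictive variant):
// accepts only signless 8-bit integers.
constexpr bool isSignlessI8(IntElem t) {
  return t.width == 8 && t.sign == Signedness::Signless;
}

// Width-only check: accepts any 8-bit integer regardless of signedness.
constexpr bool isI8LikeSketch(IntElem t) { return t.width == 8; }

// Hypothetical helper that exercises both checks on a ui8-style element.
inline void demoI8Checks() {
  const IntElem ui8{8, Signedness::Unsigned};
  assert(!isSignlessI8(ui8));   // rejected by the narrow check
  assert(isI8LikeSketch(ui8));  // accepted by the width-only check
}
```

If the verifier is meant to admit both i8 and ui8 destinations, a width-only predicate of this shape is the kind of check that would do so.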
Codex Review: This comment is updated automatically by the review bot.
Summary: 2 issues found: the newly added low-precision …
Findings:
This PR adds A5 …
A5 vec …
4c5e02b to cc66d99
/review
Manual Codex Review: This comment is by …
Summary: Received …
Findings: Review in progress.
Manual Codex Review: This comment is by …
Summary: Review failed at stage …
Findings: No structured findings were generated because the review process failed early.
Log Tail
/review
Manual Codex Review: This comment is by …
Summary: Received …
Findings: Review in progress.
Manual Codex Review: This comment is by …
Summary: Review failed at stage …
Findings: No structured findings were generated because the review process failed early.
Log Tail
cc66d99 to 417e9d7
/review
Manual Codex Review: This comment is by …
Summary: Received …
Findings: Review in progress.
Manual Codex Review: This comment is by …
Summary: PR #691 has three real issues: FP4 …
Findings:
The three new low-precision types already declare …
Signed-off-by: FangRui <fangrui_95@163.com>
Signed-off-by: FangRui <fangrui_95@163.com>
…verage Signed-off-by: FangRui <fangrui_95@163.com>
0c17c69 to bf044f2
1. The alloc_tile, tstore, tload, tcvt, and tprefetch ops now accept low-precision types on A5.
2. Types that are not MLIR built-in types now return the correct bit width.