Description
Title:
Tail Split Required When Applying Dynamic Symbolic Values in the Last Dynamic Iteration to Avoid Vector Load Blockage and a Significant Performance Drop
Description:
Ref: PR #191. In the current implementation, no tail split is performed when dynamic symbolic values are applied in the last dynamic iteration. This omission blocks vector loads, which significantly reduces performance.
Problem:
- The lack of tail splitting when processing the last dynamic iteration causes vector loads to be blocked.
- The blocked vector loads cause considerable degradation of the kernel’s performance.
Affected Code:
The following snippet demonstrates where the issue arises, specifically in the loading of data into shared memory and processing:
if (0 < k) {
  // Vector load blocked here due to lack of tail split
  ...
}
Suggested Solution:
To resolve this issue, a tail split should be applied when handling the last dynamic iteration. This will ensure that vector loads are not blocked, improving performance.
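As an illustration of the idea only, here is a minimal host-side C++ sketch of a tail split. It is not the kernel from this issue and not the change in PR #191. The names `kVec`, `copy_guarded`, and `copy_tail_split` are hypothetical, and the vector width of 4 is an assumption. The guarded version checks the dynamic bound on every element. The tail-split version copies full chunks with no per-element check and keeps the guard only for the final partial chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical vector width in elements (assumption, not taken from the issue).
constexpr std::size_t kVec = 4;

// Guarded variant: every element carries a bounds check against the dynamic
// size n, so the copy cannot be expressed as whole-chunk accesses.
void copy_guarded(const float* src, float* dst, std::size_t n_padded, std::size_t n) {
    assert(n <= n_padded);
    for (std::size_t i = 0; i < n_padded; ++i) {
        dst[i] = (i < n) ? src[i] : 0.0f;
    }
}

// Tail-split variant: full chunks are copied without a per-element check;
// only the last partial chunk (the tail) keeps the guard.
void copy_tail_split(const float* src, float* dst, std::size_t n_padded, std::size_t n) {
    assert(n <= n_padded);
    const std::size_t n_full = (n / kVec) * kVec;  // elements covered by full chunks
    for (std::size_t i = 0; i < n_full; i += kVec) {
        // Stands in for one vector-wide load and store.
        std::memcpy(dst + i, src + i, kVec * sizeof(float));
    }
    for (std::size_t i = n_full; i < n_padded; ++i) {
        dst[i] = (i < n) ? src[i] : 0.0f;  // guarded tail, zero-filled past n
    }
}
```

Both functions produce the same output; the difference is that the guard is confined to the tail, so the main loop body has a shape a compiler or code generator can turn into vector accesses.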
Impact:
Without the tail split, the performance of the kernel is severely impacted, particularly for cases where dynamic symbolic values are used. Implementing this solution will help maintain optimal performance, especially for vector loads.