Running the pack-peel pipeline on a matmul (m=n=1024, k=512) followed by the addition of a bias (a 1-D vector of 1024 values) results in the following final allocations (I've renamed the SSA values for clarity).
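In outline they look like this (a sketch: I've dropped the inner packed dimensions, and the tile sizes shown are illustrative rather than the exact ones the pipeline produces):

```mlir
// L2 buffers on the shared memory tile (memory space 1):
%A_shared = memref.alloc() : memref<2x1x64x256xbf16, 1 : i32>
%B_shared = memref.alloc() : memref<1x2x256x64xbf16, 1 : i32>
%C_shared = memref.alloc() : memref<2x2x64x64xf32, 1 : i32>

// L1 buffers in core-local data memory (memory space 2):
%A_local = memref.alloc() : memref<1x1x64x256xbf16, 2 : i32>  // sized for ONE core
%B_local = memref.alloc() : memref<1x1x256x64xbf16, 2 : i32>  // sized for ONE core
%C_local = memref.alloc() : memref<2x2x64x64xf32, 2 : i32>    // sized for ALL 2x2 cores
```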
The above is for a design using a 2x2 array of AIE cores. The IR contains a loop over the 2x2 cores that indexes into these buffers roughly as follows (same illustrative shapes as above):
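```mlir
scf.forall (%cx, %cy) in (2, 2) {
  // A and B: copy a view of the shared L2 buffer into the ENTIRE local buffer.
  %A_view = memref.subview %A_shared[%cx, 0, 0, 0] [1, 1, 64, 256] [1, 1, 1, 1]
      : memref<2x1x64x256xbf16, 1 : i32>
      to memref<1x1x64x256xbf16, strided<[16384, 16384, 256, 1], offset: ?>, 1 : i32>
  linalg.copy
      ins(%A_view : memref<1x1x64x256xbf16, strided<[16384, 16384, 256, 1], offset: ?>, 1 : i32>)
      outs(%A_local : memref<1x1x64x256xbf16, 2 : i32>)
  // (B is handled the same way.)

  // C: take a per-core slice OF the local buffer itself.
  %C_slice = memref.subview %C_local[%cx, %cy, 0, 0] [1, 1, 64, 64] [1, 1, 1, 1]
      : memref<2x2x64x64xf32, 2 : i32>
      to memref<1x1x64x64xf32, strided<[8192, 4096, 64, 1], offset: ?>, 2 : i32>
  // ... the matmul + bias kernel then writes into %C_slice ...
}
```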
That is: for A and B, a view into shared memory is copied into the entire local buffer. For C, by contrast, a slice of the local buffer is taken.
I find this very confusing, and I think it would be much better if C were already 'privatized' per core. Using the same illustrative shapes as above: instead of
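```mlir
%C_local = memref.alloc() : memref<2x2x64x64xf32, 2 : i32>
scf.forall (%cx, %cy) in (2, 2) {
  %C_slice = memref.subview %C_local[%cx, %cy, 0, 0] [1, 1, 64, 64] [1, 1, 1, 1]
      : memref<2x2x64x64xf32, 2 : i32>
      to memref<1x1x64x64xf32, strided<[8192, 4096, 64, 1], offset: ?>, 2 : i32>
  // ... kernel writes into %C_slice ...
}
```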
the allocation would be
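```mlir
%C_local = memref.alloc() : memref<1x1x64x64xf32, 2 : i32>  // per-core, like A and B
```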
and then it would effectively just be
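```mlir
scf.forall (%cx, %cy) in (2, 2) {
  // no subview needed: each core owns its entire local C buffer
  // ... kernel writes directly into %C_local ...
}
```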
This seems like it would be more in line with how the GPU abstraction works (I'm thinking of OpenCL kernels). IMO there should never be a single contiguous block of memory representing the data memories of all cores.