You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The picture below depicts a concat node that joins nodes on dimension number 1. The IR that we generate for this code is inefficient for two reasons. First, we can't optimize the operator that writes the result because the result is scattered across the 2st dimension (dim zero is the first). And second, we emit a sequence of insert_tensor instructions that process the tensor several times invalidating cache. A much better way would be to represent this as dim-0 concat followed by a transpose.
Design question: I am not sure if this should be the canonical representation, the only representation or simply a target specific optimization.
The text was updated successfully, but these errors were encountered:
The picture below depicts a concat node that joins nodes on dimension number 1. The IR that we generate for this code is inefficient for two reasons. First, we can't optimize the operator that writes the result because the result is scattered across the 2st dimension (dim zero is the first). And second, we emit a sequence of insert_tensor instructions that process the tensor several times invalidating cache. A much better way would be to represent this as dim-0 concat followed by a transpose.
Design question: I am not sure if this should be the canonical representation, the only representation or simply a target specific optimization.
The text was updated successfully, but these errors were encountered: