Related to PR #214.
The scheduling template in TL is overly complex because each combination of tensor arguments currently needs its own TL template. The root cause is twofold: TL cannot automatically remove unused tensor arguments, and a template's tensor argument list cannot be extended flexibly.
Expected Behavior:

    # Given Program
    @T.prim_func
    def main(A, B, Scale, C):
        if with_scale:
            f(A, B, Scale, C)
        else:
            f(A, B, C)

If `with_scale` is set to `False`, the `if` branch is dead code, so `Scale` is no longer referenced anywhere in the body and should be removed from the function's argument list.
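
To make the expected behavior concrete, here is a minimal sketch of the two steps involved: folding a branch on a compile-time constant, then pruning parameters that are no longer referenced. It is not the TL.Simplify implementation. It uses Python's standard `ast` module as a stand-in for TIR, and the names `FoldConstantIf`, `prune_unused_params`, and `simplify` are hypothetical. It assumes Python 3.9+ (for `ast.unparse`) and ignores default values and keyword-only arguments.

```python
import ast

# Plain-Python stand-in for the program above (decorator omitted).
SOURCE = """
def main(A, B, Scale, C):
    if with_scale:
        f(A, B, Scale, C)
    else:
        f(A, B, C)
"""


class FoldConstantIf(ast.NodeTransformer):
    """Replace `if <name>:` with the taken branch when <name> is a known constant."""

    def __init__(self, constants):
        self.constants = constants

    def visit_If(self, node):
        self.generic_visit(node)  # fold nested ifs first
        test = node.test
        if isinstance(test, ast.Name) and test.id in self.constants:
            taken = node.body if self.constants[test.id] else node.orelse
            # Keep the enclosing body non-empty if the taken branch is empty.
            return taken or [ast.Pass()]
        return node


def prune_unused_params(func):
    """Drop positional parameters that are never referenced in the body."""
    used = {
        n.id
        for stmt in func.body
        for n in ast.walk(stmt)
        if isinstance(n, ast.Name)
    }
    func.args.args = [a for a in func.args.args if a.arg in used]
    return func


def simplify(source, constants):
    """Fold constant branches, then remove parameters left unused."""
    tree = FoldConstantIf(constants).visit(ast.parse(source))
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            prune_unused_params(node)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(simplify(SOURCE, {"with_scale": False}))
```

With `with_scale` set to `False`, the printed function takes only `A`, `B`, and `C`; with `True`, `Scale` stays in the signature. The order of the two steps matters: pruning before folding would still see `Scale` used in the dead branch and keep it.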
### Tasks
- [x] Test Case
- [x] Implement TL.Simplify Pass