[NNC] Replace ComputeAt #55933
Some discussions related to this: #55853 (comment) |
Why not have the following instead of the desired form above? @ZolotukhinM
|
The option you brought up is what This example just shows that something might be impossible to achieve without |
I get the tradeoff that A more representative example would be:
This is exactly what we have after fusion and buffer compression. Is there any other example for |
It is very similar in the effect, but an important difference is that
Yes, let's look at an example where we need two rows: Original:
After
In the original example we used two columns and one row - that's why to demonstrate the effect of compute-at I needed to apply it to the innermost loop (but in that case as you noted it was similar to compute-inline, but that was a red herring). |
Right, that is the perfect example here. The key challenge is that we need a transformation that introduces redundancy. Here is the reason for this redundancy. Consider the element |
The first step in transforming the code is to fuse loops So, we need to transform the input code to the following:
This changes the iteration space of After this we can fuse the two outer loops and perform buffer compression to get to the desired result. |
To clarify further about the transformation described above: it can be seen as a split with overlapping iteration spaces, and the amount of overlap determines the size of the redundancy being introduced. |
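To make the "split with overlap" idea concrete, here is a minimal Python sketch. The helper `split_with_overlap` and its parameters are hypothetical names used only for illustration; they are not taken from NNC. It assumes `extent - overlap` is divisible by `factor`, and shows that each extra outer tile recomputes `overlap` indices.

```python
def split_with_overlap(extent, factor, overlap):
    """Split range(extent) into tiles of factor + overlap indices,
    where consecutive tiles start `factor` apart (hypothetical helper)."""
    if (extent - overlap) % factor != 0:
        raise ValueError("sketch assumes (extent - overlap) is divisible by factor")
    n_outer = (extent - overlap) // factor
    return [
        [outer * factor + inner for inner in range(factor + overlap)]
        for outer in range(n_outer)
    ]


tiles = split_with_overlap(extent=10, factor=2, overlap=2)
total_iterations = sum(len(tile) for tile in tiles)
# Iterations beyond the original extent are redundant recomputation.
redundant = total_iterations - 10
```

With an overlap of 0 this degenerates to an ordinary split with no redundancy, which is one way to see that the overlap alone controls how much work is repeated.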
One question I have is whether making it a separate transformation brings any value in the end. It feels like whatever we do will only be useful in compute_at, i.e. we don't generally want to introduce this redundancy; we only want to do that for elements used in one iteration in the consumer stmt (consumer stmt: |
This particular transformation will likely be used only in the context of I have not looked at |
Yeah, loop fusion and buffer compression make sense to me as separate transformations. I suggest calling the transformation that would introduce this redundancy "compute_at": it will not try to fuse, and it will not try to compress buffers, but it will be responsible for inserting the proper loops at the proper place. E.g. Original:
After
After
After
After
|
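As an illustration of a "compute_at" that only inserts the producer loops at the consumer loop, without fusing or compressing buffers, here is a rough Python sketch. The producer `A`, the consumer `B` (which reads two rows of `A`), the sizes `H` and `W`, and the function names are all assumptions made for this example; the code from the original comment is not reproduced here.

```python
H, W = 4, 3  # hypothetical sizes


def producer(i, j):
    # Arbitrary stand-in for the producer expression.
    return i * W + j


def original():
    # Producer is fully computed first, then the consumer reads two rows.
    A = [[producer(i, j) for j in range(W)] for i in range(H + 1)]
    B = [[A[i][j] + A[i + 1][j] for j in range(W)] for i in range(H)]
    return B, (H + 1) * W  # result and number of producer evaluations


def computed_at_consumer_row_loop():
    # A keeps its full size: no buffer compression in this step.
    A = [[None] * W for _ in range(H + 1)]
    B = [[0] * W for _ in range(H)]
    evals = 0
    for i in range(H):
        # Inserted producer loops: only the rows used by this iteration.
        for di in range(2):
            for j in range(W):
                A[i + di][j] = producer(i + di, j)
                evals += 1
        for j in range(W):
            B[i][j] = A[i][j] + A[i + 1][j]
    return B, evals
```

Both versions produce the same `B`, but the second evaluates the producer `2 * H * W` times instead of `(H + 1) * W`, which is the redundancy discussed above.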
Btw, that reminded me of more transformations that we might want to add: transformations to change buffer layout. E.g. in my last example it might be useful to swap |
While I don't mind this approach, here is what I had in mind: Input
After splitting i1 iteration space into 2D with overlap of 2: splitWithOverlap?
After fusing i1 and i2
Buffer compression
|
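The four stages listed above (input, split with overlap of 2, fusion, buffer compression) can be sketched in plain Python on a hypothetical 1D example. The producer `f`, the size `N`, and the three-element window are assumptions chosen for illustration; only the loop names `i1` and `i2` follow the comment.

```python
N = 6  # hypothetical consumer extent


def f(x):
    # Arbitrary stand-in for the producer expression.
    return x * x


def input_program():
    A = [f(i1) for i1 in range(N + 2)]
    return [A[i2] + A[i2 + 1] + A[i2 + 2] for i2 in range(N)]


def after_split_with_overlap():
    # i1 becomes (outer, inner) with i1 = outer + inner: overlap of 2.
    A = [None] * (N + 2)
    for outer in range(N):
        for inner in range(3):
            A[outer + inner] = f(outer + inner)
    return [A[i2] + A[i2 + 1] + A[i2 + 2] for i2 in range(N)]


def after_fusion():
    # The outer producer loop and i2 now share one loop.
    A = [None] * (N + 2)
    B = []
    for i in range(N):
        for inner in range(3):
            A[i + inner] = f(i + inner)
        B.append(A[i] + A[i + 1] + A[i + 2])
    return B


def after_buffer_compression():
    # Only three elements of A are live per iteration, so A shrinks to size 3.
    B = []
    for i in range(N):
        tmp = [f(i + inner) for inner in range(3)]
        B.append(tmp[0] + tmp[1] + tmp[2])
    return B
```

Every stage returns the same result; the split is the only step that changes the amount of producer work (`3 * N` evaluations instead of `N + 2`).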
@ZolotukhinM Here are the steps to perform Input
ComputeAt(A, j2) will include the following steps:
|
Yes, it will work for this case. However, I still don't think it's the right decomposition, and the reason is that the first transformation really needs to be "compute whatever part of the producer is used in the consumer statement". In this example, it is
|
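One way to read "compute whatever part of the producer is used in the consumer statement" is as a bounding box over the consumer's accesses. The sketch below assumes the accesses are given as constant offsets relative to the consumer's loop indices; `producer_region` is a hypothetical helper, not an NNC API.

```python
def producer_region(access_offsets):
    """Return (start_offset, extent) per dimension for the producer
    elements read by one consumer iteration (hypothetical helper)."""
    per_dim = zip(*access_offsets)
    return [(min(d), max(d) - min(d) + 1) for d in per_dim]


# Consumer reads A[i][j] and A[i+1][j]: two rows, one column.
two_rows = producer_region([(0, 0), (1, 0)])

# Consumer reads A[i][j] and A[i][j+1]: one row, two columns.
two_cols = producer_region([(0, 0), (0, 1)])
```

The extents of this region give the trip counts of the producer loops that would be inserted at the consumer statement.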
Actually, However, I am okay with going along with your approach that you mentioned earlier (by "computing at" without fusing / buffer compression). |
It is then possible that we just call the same thing different names :) |
Given the input IR:
we need a transformation that gets to:
This should then be able to replace ComputeAt.