Scalarization transformation #2499
Comments
First step - on Reference: Find next (and previous?) Reference to this symbol.
The VariableAccess information already contains all accesses in order.
Yeah - the plan is to use that data with this transformation.
@hiker I'm a bit confused about how to use the VariablesAccessInfo - is there a linkage between VariablesAccessInfo and the node for a given read/write access? E.g. For a given routine if I wanted to find (in order) all the accesses/dependencies on a given symbol (slash signature) can I do that with the VariablesAccessInfo? I can find the sequence of reads/writes but if I wanted to refer back to the relevant Ah I guess its
Yes :) I saw the comments in the wrong order, and commented elsewhere :)
(Towards #2499) Initial implementation of next_access function on Reference
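The idea of walking an ordered list of accesses to find the next (or previous) access to the same symbol can be sketched in plain Python. This is only a toy model: the `Access` record, its fields, and both helper functions are hypothetical stand-ins and are not the PSyclone `VariablesAccessInfo` or `Reference` API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Access:
    """One recorded access. All fields are hypothetical stand-ins."""
    signature: str  # symbol name, e.g. "temp"
    kind: str       # "read" or "write"
    position: int   # index of the accessing node in program order


def next_access(accesses: List[Access], current: Access) -> Optional[Access]:
    """Return the first later access to the same signature, or None."""
    # Assumption: `accesses` is already sorted by program order.
    for acc in accesses:
        if acc.signature == current.signature and acc.position > current.position:
            return acc
    return None


def previous_access(accesses: List[Access], current: Access) -> Optional[Access]:
    """Return the closest earlier access to the same signature, or None."""
    for acc in reversed(accesses):
        if acc.signature == current.signature and acc.position < current.position:
            return acc
    return None


accesses = [
    Access("temp", "write", 0),
    Access("a", "read", 1),
    Access("temp", "read", 2),
    Access("temp", "write", 3),
]
```

Because each record carries its position, the caller can get from an access back to the place in the program where it occurs, which is the linkage asked about above.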
@sergisiso If the next access to an array reference (that is otherwise a potential target for scalarization) is contained within an IfBlock that isn't also an ancestor of the Loop I'm "scalarizing", I will just ignore it rather than dealing with the if condition, unless you think we should specifically try to handle if blocks here?
Also I realise that I probably need to be careful with
will point to the LHS of the assignment, so I should also check the RHS of the assignment in this case for scalarization.
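The decision discussed in these comments can be sketched as a small predicate. Everything here is a simplified assumption rather than the actual implementation: `FollowingAccess` and `can_scalarize` are hypothetical names, "ignore it" is read as conservatively not scalarizing, and a following write is assumed to overwrite every element the loop touched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FollowingAccess:
    """First access to the array after the loop (hypothetical model)."""
    kind: str                      # "read" or "write"
    in_unrelated_if: bool = False  # inside an if block that does not enclose the loop


def can_scalarize(first_in_loop: str, following: Optional[FollowingAccess]) -> bool:
    """Decide whether an array used inside a loop may become a scalar."""
    # Assumption: each iteration must write the value before reading it,
    # otherwise the array carries data into the loop.
    if first_in_loop != "write":
        return False
    # No later access at all: nothing outside the loop depends on the values.
    if following is None:
        return True
    # Conservative choice: do not reason about the if condition, just skip.
    if following.in_unrelated_if:
        return False
    # A later read would observe the values; a later write replaces them
    # (assuming it covers every element the loop wrote).
    return following.kind == "write"
```

A fuller check would also inspect both sides of an assignment, since the same array may appear on the left-hand and right-hand side.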
In parts of the physics codes for LFRic we come across loop patterns such as this:
Once we inline and fuse this loop structure, we get loops like this:
For cases such as this, `temp_in` and `temp` can be scalarised provided that nothing outside the loop depends on their values (which would already be a strange implementation choice, since it would only be for the final value of `i`). This would help us remove some false dependencies, as there is a write-write dependency on `temp(l)` if we use `collapse` on this loop; however, these are not necessary since `temp` can just be a local scalar instead.

The goal of this transformation would be to take code like the above (post all the other inline and loop fusion transformations) and generate:
At this point, we can apply `target` + `loop` with `collapse`, which will lead to fewer kernel launches and less synchronization, and probably better performance on GPU.