HLO iota should fuse with other ops and not (often) require materialization into memory. #13745
Comments
@benvanik Are you working on this, or does it need an owner?
Needs an owner.
@jpienaar @julianwa @mattwalsh Can we find an owner for this? Or drop it to P2?
This is relevant to LLM memory usage and important for that effort. In some models it can be several hundred MB of cumulative iota allocations (from things like
Strange. This should work out of the box.
So [code snippet missing] by itself gets fused.
So this might be an issue of multiple uses. #13747 is probably going to help (I do get much better IR with it), but I have to resolve issues with landing that PR first.
Ah yes, multiple uses would cause this for sure. In a model with multiple topks I bet the iota is getting CSE'd.
Current fusion heuristics always fuse copy-like ops with their consumers. Iota ops are also copy-like ops (indeed, if the deprecated `linalg.indexed_generic` were still around, it would also be a copy-like op). Fixes iree-org#13745
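To make the heuristic in the commit message concrete, here is a toy sketch in plain Python. It is not IREE's implementation: the `COPY_LIKE` set, the `form_dispatches` helper, and the op names are all hypothetical, and the graph is just a dict. It only illustrates the idea that a copy-like producer (with iota counted as copy-like) gets cloned into every consumer's dispatch, so a CSE'd iota with multiple uses never becomes its own dispatch.

```python
# Assumption for illustration: iota is classified together with other
# copy-like ops. These names are hypothetical, not IREE identifiers.
COPY_LIKE = {"broadcast", "transpose", "iota"}


def form_dispatches(ops):
    """Group ops into dispatches, cloning copy-like producers into consumers.

    ops maps an op name to (kind, [operand names]). Operand names that are
    not keys of ops are treated as function inputs.
    Returns {root op name: sorted list of op names in that dispatch}.
    """
    dispatches = {}
    for name, (kind, _operands) in ops.items():
        if kind in COPY_LIKE:
            # Copy-like ops are never dispatch roots in this toy model;
            # they only appear as clones inside their consumers.
            continue
        group = set()
        stack = [name]
        while stack:
            cur = stack.pop()
            if cur in group:
                continue
            group.add(cur)
            for operand in ops[cur][1]:
                # Pull in copy-like producers transitively (iota -> broadcast).
                if operand in ops and ops[operand][0] in COPY_LIKE:
                    stack.append(operand)
        dispatches[name] = sorted(group)
    return dispatches


# One iota + broadcast shared by two sorts (the CSE'd multi-use case).
ops = {
    "iota0": ("iota", []),
    "bcast0": ("broadcast", ["iota0"]),
    "sort0": ("sort", ["x", "bcast0"]),
    "sort1": ("sort", ["y", "bcast0"]),
}
for root, members in form_dispatches(ops).items():
    print(root, members)
```

In this sketch both `sort0` and `sort1` end up with their own copy of `iota0` and `bcast0`, which trades a little redundant index computation for not writing the index tensor to memory.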
Seeing this input HLO with iota and broadcast/etc that don't get fused, sometimes leading up to sorts and sometimes directly being inserted into tensors/etc:
[Three IR listings, originally separated by "->", are missing here.]
The iota should definitely be fused with the broadcast, and should probably even end up in the subsequent sort/consumer: in most cases an iota is something we can derive from the workgroup/distribution rather than something we need to materialize in memory.
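As a rough illustration of the difference (plain Python, not compiler IR; all function names here are made up for the example), compare an argsort-style consumer that reads a materialized iota + broadcast index tensor with one that derives each index from its loop position:

```python
def materialized_iota_broadcast(rows, cols):
    # iota along the last dimension, broadcast across rows.
    # This stores rows * cols indices, which is the allocation we want to avoid.
    iota = list(range(cols))
    return [list(iota) for _ in range(rows)]


def argsort_rows_materialized(values):
    # Unfused form: the index tensor is built first, then consumed by the sort.
    rows, cols = len(values), len(values[0])
    indices = materialized_iota_broadcast(rows, cols)
    out = []
    for r in range(rows):
        pairs = sorted(zip(values[r], indices[r]), key=lambda p: p[0], reverse=True)
        out.append([i for _, i in pairs])
    return out


def argsort_rows_fused(values):
    # Fused form: each index comes from the iteration position (enumerate),
    # analogous to deriving it from the workgroup/loop indices.
    out = []
    for row in values:
        pairs = sorted(((v, c) for c, v in enumerate(row)), key=lambda p: p[0], reverse=True)
        out.append([c for _, c in pairs])
    return out


values = [[0.1, 0.9, 0.5], [3, 1, 2]]
print(argsort_rows_materialized(values))
print(argsort_rows_fused(values))
```

Both functions return the same per-row ordering; the second never allocates the `rows * cols` index tensor, which is the behavior being asked for when the iota is fused into its consumer.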
Full reproducer with two such iotas in #13729. Most LLMs also do this, though, and #13729, #13637, and #13648 all share this pattern to varying degrees.