Do we want to ignore this directive if the metadata exists, but we're targeting a pre-sm_90 GPU?
It may be useful for non-clang LLVM users (e.g. XLA) to be able to always specify launch bounds metadata and let LLVM decide what it can do with it. Generating the directive for older GPUs would result in a ptxas error, while ignoring it would still allow the kernels to compile and work, exactly as they would if the metadata were correctly absent. I don't think there's much point in requiring users to jump through more hoops just to achieve exactly the same result.
You are right, ptxas reacts to a sample with .maxclusterrank on a pre-SM_90 target with a hard error.
Do I understand you right, that you'd like to see a check similar to what we do in SemaDeclAttr and filter out the directive on targets < SM_90?
We do not have a good way to issue any diagnostics from LLVM, so the choice would be to either reject the IR as invalid, or make an effort to compile to valid PTX. Right now we're neither here nor there.
I'd be fine with either of the options above. That said, ignoring metadata which we can't apply seems OK to me.
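For illustration, the "ignore what we can't apply" option could look roughly like the sketch below. It is a standalone approximation, not the actual NVPTX backend code: the LaunchBounds struct, the emitClusterDirective function, and the plain integer SM version are all hypothetical names made up for this example.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for per-kernel launch-bounds metadata.
// This is not an LLVM type; it only models "metadata present or absent".
struct LaunchBounds {
  std::optional<unsigned> MaxClusterRank;
};

// Returns the PTX directive text to emit for a kernel, or an empty string
// when nothing should be emitted. SmVersion is assumed to be the target
// compute capability as an integer (e.g. 80 for sm_80, 90 for sm_90).
std::string emitClusterDirective(const LaunchBounds &LB, unsigned SmVersion) {
  // No metadata: nothing to emit.
  if (!LB.MaxClusterRank)
    return "";

  // Metadata present, but the target predates sm_90: drop it silently
  // instead of emitting a directive that ptxas would reject.
  if (SmVersion < 90)
    return "";

  std::ostringstream OS;
  OS << ".maxclusterrank " << *LB.MaxClusterRank << "\n";
  return OS.str();
}
```

With this shape, a frontend such as XLA could attach the metadata unconditionally, and the output for a pre-sm_90 target would be the same as if the metadata had never been set.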
I've talked to @alinas who has more experience dealing with IR and she also thinks that ignoring maxclusterrank metadata on older GPUs is the right choice here.
Sure, done in: 261840a