ROCm ❤ TensorExpr #45506
Conversation
💊 CI failures summary and remediations (as of commit 71bf92a; more details on the Dr. CI page):
XLA failure: job pytorch_xla_linux_bionic_py3_6_clang9_build is failing. Please create an issue with a title prefixed by 🚧.
1 fixed upstream failure: these were probably caused by upstream breakages that were already fixed. Please rebase on the …
Force-pushed from 22ef336 to e80289b.
Force-pushed from e80289b to aff3290.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@t-vi Yeah, we don't limit the blockDim in the Cuda backend; we just know that (currently) we limit the thread loop to 512 elements. A definite TODO, but for now it might make sense to disable it for ROCm and come back to it.
Doesn't Cuda have bounds, too? I thought 1024 was the block size limit there.
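For context (not code from this PR): CUDA documents a hard ceiling of 1024 threads per block on current hardware, and recent AMD GPUs report the same value through hipDeviceProp_t::maxThreadsPerBlock. A minimal, hypothetical sketch of clamping a thread-loop extent against a named limit rather than a bare magic number (all names here are illustrative, not the TensorExpr API):

```cpp
// Hypothetical sketch: clamp the thread-loop extent to a named device
// limit instead of a hard-coded magic number.
#include <algorithm>
#include <cstdint>

// 1024 is the documented maximum threads per block on current NVIDIA
// GPUs (CUDA C++ Programming Guide, compute-capability tables); recent
// AMD GPUs report the same via hipDeviceProp_t::maxThreadsPerBlock.
static const int64_t kMaxThreadsPerBlock = 1024;

int64_t clampThreadLoopExtent(int64_t requested_extent) {
  // Never ask for a larger block than the device can launch.
  return std::min(requested_extent, kMaxThreadsPerBlock);
}
```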
LGTM in general. Some minor comments.
Please give the constants 128 and 1024 names here, so people know their meaning.
Also, if possible, please include a reference/link to where they come from, so future developers know how to update them. Thanks!
Would describing them more explicitly in a comment work, or do you prefer a #define or some such?
I would prefer a "static const int kBlockSizeLimit..." or something like that. I think having both comments and variable names will be even more helpful for future developers. But it is up to you. Thanks!
I think with the new commentary, it should work. It's a much better comment now; thank you for suggesting that it needed improvement.
What do you think?
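A minimal sketch of what the reviewer's suggestion might look like (names and values are illustrative, not the constants actually landed in the PR):

```cpp
// Illustrative only: the review's magic numbers given names and a
// comment pointing at where the values come from.

// Hard upper bound on threads per block. When updating, consult the
// compute-capability tables in the CUDA C++ Programming Guide (and the
// corresponding ROCm/HIP documentation for AMD targets).
static const int kMaxBlockSize = 1024;

// Default block size used when no better heuristic applies (assumed
// meaning of the constant 128; check the surrounding code).
static const int kDefaultBlockSize = 128;
```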
Force-pushed from a63b39c to f92f089.
Force-pushed from f92f089 to a3dac27.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@Krovatkin merged this pull request in 22a34bc.
This might be an alternative to reverting #45396.
The obvious rough edge is that I'm not really seeing the work group limits that TensorExpr produces.
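One way to surface those limits at runtime is to query the device directly (a standalone sketch using the public HIP API, not code from this PR):

```cpp
// Query the device's actual launch limits instead of assuming a fixed
// constant. hipGetDeviceProperties is part of the public HIP API.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
  hipDeviceProp_t prop;
  if (hipGetDeviceProperties(&prop, /*deviceId=*/0) != hipSuccess) {
    std::fprintf(stderr, "failed to query device 0\n");
    return 1;
  }
  std::printf("maxThreadsPerBlock = %d\n", prop.maxThreadsPerBlock);
  std::printf("maxThreadsDim      = (%d, %d, %d)\n",
              prop.maxThreadsDim[0], prop.maxThreadsDim[1],
              prop.maxThreadsDim[2]);
  return 0;
}
```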