Make JIT not assume that the device is CUDA. #54238
Conversation
💊 CI failures summary (as of commit 70087ff, Dr. CI): 💚 Looks good so far! There are no failures yet.
Force-pushed from 70953c2 to 11d3e80
@ezyang Could you please help review? Thanks.
  struct ArgumentInfo {
    friend struct ArgumentSpec;
-   using plain_data_type = uint32_t;
+   using plain_data_type = uint64_t;
Huh, what's going on with this?
Ah, the struct size increased.
@@ -61,6 +55,8 @@ struct ArgumentInfo {
  int device_ : 8; // NOTE: this needs to be signed because we use -1 to
This comment is out of date now.
I will update the comment accordingly.
- uint32_t total_dims; // all TensorInfoPODs are in CompleteArgumentSpec's
+ unsigned dev_type : 16;
+ unsigned
+     total_dims : 16; // all TensorInfoPODs are in CompleteArgumentSpec's
This type is narrower than the old type.
Is it large enough to hold the total number of dimensions across all tensor operands?
I will add an overflow check when saving the dimension count to CompleteArgumentInfoPOD::total_dims.
We seem to have both device and dev_type (30 bits in total) while ATen Device only needs 16 bits. Are both necessary?
This seems OK but it should really get a review from the JIT side. @SplitInfinity, do you think you know who could look at this?
Adding @ZolotukhinM @Krovatkin based on GitHub recommendation. I know they work on/used to work on that part of the codebase. If neither of you is the right reviewer, feel free to tag the right person and remove yourselves.
Force-pushed from 3860a26 to d99d232
@ZolotukhinM @Krovatkin Could you comment on this PR? Thanks a lot.
@ezyang & @SplitInfinity
@gujinghui sorry for the delay, I'll take a look tomorrow!!!
@Krovatkin Thanks for your reply. Could you give suggestions on this PR? Thanks.
@chengjunlu please help resolve the conflict.
Force-pushed from d99d232 to 70087ff
The changes have been rebased onto the latest main trunk.
Codecov Report
@@ Coverage Diff @@
## master #54238 +/- ##
=======================================
Coverage 76.47% 76.47%
=======================================
Files 1992 1992
Lines 199858 199866 +8
=======================================
+ Hits 152840 152850 +10
+ Misses 47018 47016 -2
@ezyang @Krovatkin
I just pinged @Krovatkin; if there is still no action in a few days I'll just start landing this.
@ezyang could you help merge this PR? Thanks so much. :)
ArgumentSpec seems to be an overly clever structure, which makes it a bit hard to reason about (at least for me). Since we are sticking with int64_t as the element size, we should be okay modulo a few nitpicks.
  unsigned type_ : 8;
  unsigned dev_type_ : 16;
  unsigned : 16;
ATen Device needs 16 bits for both an index and a device type; here we seem to be using 8 + 16 = 24?
DeviceType and DeviceIndex were narrowed in commit #47023.
I will change the code to align with the new size of at::Device accordingly.
@@ -271,6 +274,10 @@ struct CompleteArgumentSpec {
      }
    }
    // each POD has a running tally of all dimensions including its own
+   TORCH_CHECK(
I hope we won't hit this limit. Practically, it could probably only happen with code-generated models with lots of arguments.
-   arg.device_ = t->is_cuda() ? t->get_device() : -1;
+   at::Device device = t->device();
+   arg.dev_type_ =
+       static_cast<std::underlying_type<DeviceType>::type>(device.type());
This will be a widening (but unsigned) cast, so we should be okay.
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@Krovatkin merged this pull request in db90533.
Summary: Decouple the JIT argument spec and shape analysis from CUDA.
Pull Request resolved: pytorch#54238
Reviewed By: ngimel
Differential Revision: D28802085
Pulled By: Krovatkin
fbshipit-source-id: 4068c9460cdec2d80733f001ca90ea3f5e6d3a7e
Decouple the JIT argument spec and shape analysis from CUDA.