
Make JIT not assume that the device is CUDA. #54238

Closed · wants to merge 1 commit

Conversation

@chengjunlu (Contributor)

Decouple the JIT argument spec and shape analysis from CUDA.

@facebook-github-bot added the oncall: jit and cla signed labels on Mar 18, 2021
@facebook-github-bot (Contributor) commented on Mar 18, 2021

💊 CI failures summary and remediations

As of commit 70087ff (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@chengjunlu force-pushed the decouple_jit_with_cuda branch 2 times, most recently from 70953c2 to 11d3e80, on March 18, 2021
@gujinghui (Collaborator)

@ezyang Could you please help review? Thanks.

 struct ArgumentInfo {
   friend struct ArgumentSpec;
-  using plain_data_type = uint32_t;
+  using plain_data_type = uint64_t;
Contributor:

Huh, what's going on with this

Contributor:

Ah, the struct size increased.
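
To make the size constraint concrete, here is a minimal, hypothetical sketch (field names and layout assumed for illustration; not the exact PyTorch code) of why adding a 16-bit device-type field forces plain_data_type from uint32_t to uint64_t:

#include <cstdint>

// Hypothetical layout, for illustration only: the pre-existing bitfields fit
// in 32 bits, but a new 16-bit dev_type_ field pushes the total to 48 bits,
// so the integer "mirror" type used for hashing/equality must widen.
struct ArgumentInfoSketch {
  unsigned defined_ : 1;
  unsigned requires_grad_ : 1;
  unsigned : 6; // padding
  unsigned dim_ : 8;
  int device_ : 8; // signed: -1 marks "no device"
  unsigned type_ : 8; // 32 bits used so far
  unsigned dev_type_ : 16; // new field: total now exceeds 32 bits
};
using plain_data_type = uint64_t; // was uint32_t

// The spec code reinterprets each ArgumentInfo as one plain_data_type value,
// so the struct must not outgrow it:
static_assert(
    sizeof(ArgumentInfoSketch) <= sizeof(plain_data_type),
    "ArgumentInfo must fit in its plain_data_type mirror");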

@@ -61,6 +55,8 @@ struct ArgumentInfo {
int device_ : 8; // NOTE: this needs to be signed because we use -1 to
Contributor:

This comment is out of date now.

Contributor (Author):

I will update the comments accordingly.

-  uint32_t total_dims; // all TensorInfoPODs are in CompleteArgumentSpec's
+  unsigned dev_type : 16;
+  unsigned
+      total_dims : 16; // all TensorInfoPODs are in CompleteArgumentSpec's
Contributor:

This type is narrower than the old type.

Contributor (Author):

Is it large enough to hold the total number of dimensions across all tensor operands?
I will add an overflow check when saving the dimension count to CompleteArgumentInfoPOD::total_dims.

Contributor:

We seem to have both device and dev_type (30 bits in total), while an ATen Device only needs 16 bits. Are both necessary?

@ezyang (Contributor) commented on Mar 19, 2021

This seems OK but it should really get a review from the JIT side. @SplitInfinity, do you think you know who could look at this?

@SplitInfinity

Adding @ZolotukhinM and @Krovatkin based on GitHub's recommendation. I know they work on, or used to work on, that part of the codebase. If neither of you is the right reviewer, feel free to tag the right person and remove yourselves.

@chengjunlu force-pushed the decouple_jit_with_cuda branch 2 times, most recently from 3860a26 to d99d232, on March 26, 2021
@gujinghui (Collaborator)

@ZolotukhinM @Krovatkin Could you comment on this PR? Thanks a lot.

@gujinghui (Collaborator)

@ezyang & @SplitInfinity
Could you help us get some feedback? Many thanks.

@Krovatkin (Contributor)

@gujinghui sorry for the delay, I'll take a look tomorrow!!!

@gujinghui (Collaborator)

> @gujinghui sorry for the delay, I'll take a look tomorrow!!!

@Krovatkin Thanks for your reply. Could you give suggestions on this PR? Thanks.

@gujinghui (Collaborator)

@chengjunlu Please help resolve the conflict.

@chengjunlu (Contributor, Author)

The changes have been rebased onto the latest main trunk.

@codecov bot commented on May 17, 2021

Codecov Report

Merging #54238 (70087ff) into master (71f4c5c) will increase coverage by 0.00%.
The diff coverage is 45.45%.

@@           Coverage Diff           @@
##           master   #54238   +/-   ##
=======================================
  Coverage   76.47%   76.47%           
=======================================
  Files        1992     1992           
  Lines      199858   199866    +8     
=======================================
+ Hits       152840   152850   +10     
+ Misses      47018    47016    -2     

@gujinghui (Collaborator)

@ezyang @Krovatkin
The PR is rebased. Could you help review it? Thanks a lot.

@ezyang (Contributor) commented on May 18, 2021

I just pinged @Krovatkin; if there is still no action in a few days, I'll just start landing this.

@gujinghui (Collaborator)

@ezyang could you help merge this PR? Thanks so much. :)

@Krovatkin (Contributor) left a review:

:shipit:

ArgumentSpec seems to be an overly clever structure, which makes it a bit hard to reason about (at least for me). Since we are sticking with int64_t as the element size, we should be okay, modulo a few nitpicks.
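
To illustrate the design being reviewed, here is a generic sketch (not the real ArgumentSpec; names assumed): packing each argument's metadata into one fixed-width integer keeps spec equality and hashing cheap, which is the payoff for the "clever" layout.

#include <cstdint>
#include <functional>
#include <vector>

// Generic sketch: one packed 64-bit word per argument makes equality a plain
// vector compare and hashing a simple combine over integers.
struct SpecSketch {
  std::vector<uint64_t> packed; // one entry per argument

  bool operator==(const SpecSketch& other) const {
    return packed == other.packed;
  }

  size_t hash() const {
    size_t seed = 0;
    for (uint64_t v : packed) {
      // boost-style hash_combine
      seed ^= std::hash<uint64_t>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
  }
};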

unsigned type_ : 8;
unsigned dev_type_ : 16;
unsigned : 16;
Contributor:

An ATen Device needs 16 bits for both an index and a device type, but here we seem to be using 8 + 16 = 24?

Contributor (Author):

DeviceType and DeviceIndex were narrowed in commit #47023.
I will change the code to match the new size of at::Device accordingly.
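
For reference, a compile-time sketch of that assumption (illustrative asserts based on the post-#47023 layout, not code from this PR):

#include <c10/core/Device.h>

// Illustrative only: after pytorch#47023, DeviceType is an int8_t-backed enum
// and DeviceIndex is an int8_t, so a full at::Device (type + index) packs
// into 16 bits, and a 16-bit dev_type_ bitfield is more than enough.
static_assert(sizeof(c10::DeviceType) == 1, "DeviceType should be 1 byte");
static_assert(sizeof(c10::DeviceIndex) == 1, "DeviceIndex should be 1 byte");
static_assert(sizeof(c10::Device) == 2, "Device should pack into 16 bits");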


@@ -271,6 +274,10 @@ struct CompleteArgumentSpec {
}
}
// each POD has a running tally of all dimensions including its own
TORCH_CHECK(
Contributor:

I hope we won't hit this limit. Practically, it could probably only happen with code-generated models with lots of arguments.
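
The TORCH_CHECK in the hunk above is cut off by the page; in context, the guard would read roughly like this (a sketch, with the message wording assumed rather than quoted from the merged diff):

// Sketch: total_dims now lives in a 16-bit bitfield, so the running tally of
// all tensor dimensions must fit below 2^16 before it is stored.
TORCH_CHECK(
    total_dims < std::numeric_limits<uint16_t>::max(),
    "The number of dims cannot be packed into CompleteArgumentSpec: ",
    total_dims);
pod.total_dims = total_dims; // safe to narrow once the check has passed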

-    arg.device_ = t->is_cuda() ? t->get_device() : -1;
+    at::Device device = t->device();
+    arg.dev_type_ =
+        static_cast<std::underlying_type<DeviceType>::type>(device.type());
Contributor:

This will be a widening (but unsigned) cast, so we should be okay.
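
As a standalone illustration of why the cast is safe (a toy enum with assumed values, not the real c10::DeviceType):

#include <cstdint>
#include <type_traits>

// Toy device-type enum mirroring the int8_t backing of the real one.
enum class DeviceType : int8_t { CPU = 0, CUDA = 1, XPU = 12 };

struct Pod {
  unsigned dev_type_ : 16; // destination bitfield is wider than the enum
};

int main() {
  Pod pod{};
  DeviceType t = DeviceType::CUDA;
  // int8_t -> unsigned:16 is a widening conversion for non-negative values,
  // so every valid device type round-trips without loss.
  pod.dev_type_ = static_cast<std::underlying_type<DeviceType>::type>(t);
  return pod.dev_type_ == 1 ? 0 : 1; // exits 0 on success
}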

@facebook-github-bot (Contributor)

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@Krovatkin merged this pull request in db90533.

deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request on Jun 9, 2021
Summary:
Decouple the JIT argument spec and shape analysis with CUDA.

Pull Request resolved: pytorch#54238

Reviewed By: ngimel

Differential Revision: D28802085

Pulled By: Krovatkin

fbshipit-source-id: 4068c9460cdec2d80733f001ca90ea3f5e6d3a7e
Labels: cla signed · Merged · oncall: jit · open source

7 participants