Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] Add dependency awareness to torch-trt partitioning #1304

Conversation

mfeliz-cruise
Copy link
Contributor

Adds a heuristic to torch-trt partitioning's segmentation to avoid materializing segments until we hit a dependency of that segment. This can significantly reduce the number of segments/engines in cases where the linear traversal of torchscipt nodes would otherwise produce alternating torch and TRT segments which are not dependent on each-other

Fixes # (issue)

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

  • This change requires a documentation update

  • My code follows the style guidelines of this project (You can use the linters)

  • I have performed a self-review of my own code

  • I have commented my code, particularly in hard-to-understand areas and hacks

  • I have made corresponding changes to the documentation

  • I have added tests to verify my fix or my feature

  • New and existing unit tests pass locally with my changes

  • I have added the relevant labels to my PR in so that relevant reviewers are notified

@mfeliz-cruise mfeliz-cruise changed the title [feat] Add dependency awareness to torch-trt partitioning (#40) [feat] Add dependency awareness to torch-trt partitioning Aug 23, 2022
@github-actions github-actions bot added component: core Issues re: The core compiler component: partitioning component: tests Issues re: Tests labels Aug 23, 2022
@mfeliz-cruise
Copy link
Contributor Author

I think this can be handled more simply in future by directly partitioning on the dependency graph. This would require updating the min_block_size logic, but would remove the need to merge segments after the initial partition.

@narendasan
Copy link
Collaborator

@mfeliz-cruise we are currently working on a major restructuring of the partitioning phase to hopefully bring it closer to other design patterns in the project and make it easier to debug the state and develop new features (#1263). Could you try rebasing this work on that branch and point the PR to merge into partitioning_ctx?

@bowang007
Copy link
Collaborator

@mfeliz-cruise we are currently working on a major restructuring of the partitioning phase to hopefully bring it closer to other design patterns in the project and make it easier to debug the state and develop new features (#1263). Could you try rebasing this work on that branch and point the PR to merge into partitioning_ctx?

Looks like now there are still some errors on that branch.

@mfeliz-cruise
Copy link
Contributor Author

@mfeliz-cruise we are currently working on a major restructuring of the partitioning phase to hopefully bring it closer to other design patterns in the project and make it easier to debug the state and develop new features (#1263). Could you try rebasing this work on that branch and point the PR to merge into partitioning_ctx?

Looks like now there are still some errors on that branch.

I'll hold off for now until it stabilizes.

Adds a heuristic to torch-trt partitioning's segmentation to avoid materializing segments until we hit a dependency of that segment. This can significantly reduce the number of segments/engines in cases where the linear traversal of torchscipt nodes would otherwise produce alternating torch and TRT segments which are not dependent on each-other

Fixes # (issue)

Please delete options that are not relevant and/or add your own.

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- This change requires a documentation update

- [ ] My code follows the style guidelines of this project (You can use the linters)
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas and hacks
- [ ] I have made corresponding changes to the documentation
- [ ] I have added tests to verify my fix or my feature
- [ ] New and existing unit tests pass locally with my changes
- [ ] I have added the relevant labels to my PR in so that relevant reviewers are notified
@mfeliz-cruise mfeliz-cruise force-pushed the michael.feliz/dependency_aware_partitioning branch from 1818813 to 3a33b6e Compare October 6, 2022 22:48
@mfeliz-cruise
Copy link
Contributor Author

@mfeliz-cruise we are currently working on a major restructuring of the partitioning phase to hopefully bring it closer to other design patterns in the project and make it easier to debug the state and develop new features (#1263). Could you try rebasing this work on that branch and point the PR to merge into partitioning_ctx?

Looks like now there are still some errors on that branch.

I'll hold off for now until it stabilizes.

I've rebased and should be ready for review.

@peri044
Copy link
Collaborator

peri044 commented Oct 11, 2022

Hello @mfeliz-cruise
I went through this PR and observed the test cases. I ran them and understand the final graphs you are trying to achieve post segmentation. However, I'm a little unclear on the code logic. Can you explain the heuristic and logic in a write-up ( with also some references to the modified test cases) ? Since this is advanced segmentation/merging, it would be nice for this write-up to serve as a reference for future.

@mfeliz-cruise
Copy link
Contributor Author

mfeliz-cruise commented Oct 11, 2022

Sure @peri044, do you have a standard place you put this kind of documentation or should I just expand the PR description?

@narendasan
Copy link
Collaborator

We keep documentation for contributors on the implementation of partitioning here: https://pytorch.org/TensorRT/contributors/partitioning.html#partitioning

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 11, 2022
@mfeliz-cruise
Copy link
Contributor Author

We keep documentation for contributors on the implementation of partitioning here: https://pytorch.org/TensorRT/contributors/partitioning.html#partitioning

I've taken a first pass at documenting this in docsrc/contributors/partitioning.rst

@peri044
Copy link
Collaborator

peri044 commented Oct 12, 2022

  • How does the merge adjacent segments work ? What's the criteria to merge ?

@mfeliz-cruise
Copy link
Contributor Author

@peri044 peri044 merged commit 6f30e4b into pytorch:master Nov 2, 2022
@peri044
Copy link
Collaborator

peri044 commented Nov 2, 2022

Hello @mfeliz-cruise ,
I removed this snippet and ran the tests and they ran successfully. I'm wondering what is the usecase behind this snippet ?

From my understanding this is written for in-place ops.
For eg: In the sample graph

%2 = aten::cat(%1)
%2 = aten::append(%2, %3)
%4 = aten::relu
For n = aten::append, use.user would be %2 (aten::cat output) which would not be isAfter(n) correct ?
Do you have an example in mind which uses this ?

@mfeliz-cruise
Copy link
Contributor Author

Hello @mfeliz-cruise , I removed this snippet and ran the tests and they ran successfully. I'm wondering what is the usecase behind this snippet ?

From my understanding this is written for in-place ops. For eg: In the sample graph

%2 = aten::cat(%1) %2 = aten::append(%2, %3) %4 = aten::relu For n = aten::append, use.user would be %2 (aten::cat output) which would not be isAfter(n) correct ? Do you have an example in mind which uses this ?

It would be a case like this #1018 where we have an op that modifies its input without producing the modified value:
= aten::_set_item(%out_dict.1, %3, %x.1)
%z.1 : Tensor = aten::__getitem__(%out_dict.1, %3)

Here %out_dict.1 is modified by _set_item and we should recognize that this makes aten::_getitem__ a dependent of the set. If we only look at the node outputs here we would not identify this relationship.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed component: core Issues re: The core compiler component: partitioning component: tests Issues re: Tests documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants