
fix: Allow full model compilation with collection outputs #1599

Merged
merged 3 commits into pytorch:main on Mar 14, 2023

Conversation

Collaborator

@gs-olive gs-olive commented Jan 19, 2023

Description

  • Update graph-building in compiler to account for case where all operations are supported by Torch-TRT, but the output is a collection
  • Enable 'pseudo-partitioning' for nearly-fully-compiled models for which the only unsupported aspect of the model is the format of the output (TRT cannot output complex collections)
  • Define a small subset of operation schemas which are allowed despite the flag require_full_compilation. These operations are packing and unpacking of Tuples/Lists, and some are already used in cases of require_full_compilation
  • Display warnings to users if any portion of the pseudo-partitioning is unexpected: for example, if the partitioned model ends up in more than 3 segments (at most one Torch segment to preprocess collection inputs, one TRT segment to perform model logic, and one Torch segment to post-process collection outputs), or if schemas outside of the collection subset are encountered in a Torch segment
  • Add end-to-end test case with minimal reproducing example of a failing model, repaired with the changes to the compiler
  • Add minor fix to lowering to resolve a C++ compiler warning

This fix was designed to minimally alter the existing phases of model conversion and does not manually flatten/reconstruct complex collection outputs, but instead uses the existing partitioning infrastructure and engine-stitching paradigm to accomplish this.
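For context, the failure mode is a model whose forward returns a nested collection, e.g. `return (x, (y, z))`. A minimal sketch of the "output is a collection" classification (illustrative Python; names are hypothetical and the real check lives in the C++ compiler):

```python
def output_is_collection(out) -> bool:
    # A TRT engine can only emit Tensors directly; tuple/list/dict outputs
    # must be re-packed by a small Torch segment after the engine runs.
    return isinstance(out, (tuple, list, dict))

# Output shapes this PR enables under (pseudo) full compilation:
assert output_is_collection((1, (2, 3)))   # nested tuple output -> needs packing
assert not output_is_collection(4)         # single plain output -> no packing needed
```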

Fixes #1598
Fixes #1368

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • MVP for New feature

Checklist:

  • [x] My code follows the style guidelines of this project (you can use the linters)
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas and hacks
  • [x] I have made corresponding changes to the documentation
  • [x] I have added tests to verify my fix or my feature
  • [x] New and existing unit tests pass locally with my changes
  • [x] I have added the relevant labels to my PR so that the relevant reviewers are notified

@gs-olive gs-olive self-assigned this Jan 19, 2023
@github-actions github-actions bot added component: core Issues re: The core compiler component: lowering Issues re: The lowering / preprocessing passes component: partitioning component: tests Issues re: Tests labels Jan 19, 2023
Collaborator

@narendasan narendasan left a comment


Looks mostly fine, just some UX and dev stuff

Collaborator Author

gs-olive commented Jan 23, 2023

Thanks for the comments and review @narendasan - I have incorporated the feedback and updated two of the user warnings to compilation-halting errors.

One note I wanted to make: despite min_block_size=1 and allowing collection-type nodes to run in Torch, this implementation still respects full compilation and will not execute intermediate pack/unpack operations in Torch. This is because prim::TupleUnpack and similar operators are not automatically added to torch_executed_ops; that is only done in the case where input_signature is used, which is not the intent of this PR (it will be a future PR). As a result, only the collection ops needed to pack the final model output are run in Torch, per this function:

// Check if the inputs and outputs of the graph are Tensor. If not, then fallback connected nodes
void setInputsOutputsConnectedNodes(PartitioningCtx* ctx, torch::jit::Block* block) {
  // fallback nodes that produce entire graph's nonTensor output
  for (auto i : block->outputs()) {
    if (!isTensor(i)) {
      ctx->setNodeExecutorDecision(i->node(), NodeExecutorDecision::kNON_TENSOR);
    }
  }
  // fallback nodes that consume entire graph's nonTensor input
  for (auto i : block->inputs()) {
    if (!isTensor(i)) {
      for (auto use : i->uses()) {
        ctx->setNodeExecutorDecision(use.user, NodeExecutorDecision::kNON_TENSOR);
      }
    }
  }
}

Any intermediate packing/unpacking is handled by the evaluators and does not cause a graph segmentation, since those nodes are not directly graph outputs.

  // executed in TRT, regardless of the size of the graph
  if (expect_full_compilation) {
    // If minimum block size is different from the default, the user must have specified it
    if (ctx->settings.min_block_size != 3) {
Collaborator


Create an issue to centralize defaults somewhere in the core

Collaborator


What if a user explicitly sets min_block_size=3 as well? Do they still get the warning message?

Collaborator Author

@gs-olive gs-olive Mar 8, 2023


No, the user would not get a warning message in that case. We currently have no way of knowing whether the user provided a value or not, since the defaults are not centralized. There is an issue #1644 to address this, but as of now, your statement is correct. Additionally, it is worth noting that prior to this PR, if a user specified min_block_size along with require_full_compilation=True, we would still ignore the min_block_size, but without any warning.
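The underlying limitation (a default value is indistinguishable from an explicitly supplied one) can be sketched as follows; an Optional-typed setting, of the kind centralizing defaults per issue #1644 could enable, makes the distinction observable. Names here are illustrative, not the project's API:

```python
from typing import Optional, Tuple

DEFAULT_MIN_BLOCK_SIZE = 3  # mirrors the hard-coded default discussed above

def resolve_min_block_size(user_value: Optional[int]) -> Tuple[int, bool]:
    """Return (effective value, whether the user explicitly set it).

    With a plain `int` default of 3, an explicit user choice of 3 and
    "no value given" collapse into the same observation; an Optional
    sentinel keeps the two cases separate.
    """
    if user_value is None:
        return DEFAULT_MIN_BLOCK_SIZE, False
    return user_value, True

assert resolve_min_block_size(None) == (3, False)  # default used silently
assert resolve_min_block_size(3) == (3, True)      # user chose 3 explicitly
assert resolve_min_block_size(1) == (1, True)
```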

Collaborator

@narendasan narendasan left a comment


LGTM, @bowang007 can you take a look?

Collaborator

@bowang007 bowang007 left a comment


LGTM

        (!(cfg.lower_info.forced_fallback_modules.size() == 0 &&
           cfg.partitioning_info.forced_fallback_operators.size() == 0 && isBlockConvertible) ||
         outputIsCollection || user_requested_long)) ||
       requires_collection_handling) {
Collaborator


Could this if statement be optimized? It seems like isBlockConvertible and outputIsCollection overlap with requires_collection_handling

Collaborator Author

@gs-olive gs-olive Mar 8, 2023


I've updated this statement to make the conditions clearer by removing the ! and distributing it over the inner conditionals. Beyond this, the statement cannot be reduced further, since the requires_collection_handling boolean is independent of cfg.partitioning_info.enabled (we want to partition in this case even when require_full_compilation=True)
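The rewrite amounts to distributing the negation via De Morgan's laws; an exhaustive check (illustrative Python mirroring the C++ booleans, not project code) confirms the two forms agree on every input:

```python
from itertools import product

def old_form(enabled, fb_mod, fb_op, convertible, out_coll, long_t, req_coll):
    # Original shape: negation wrapped around the "nothing forces partitioning" case
    return (enabled and (not (not fb_mod and not fb_op and convertible)
                         or out_coll or long_t)) or req_coll

def new_form(enabled, fb_mod, fb_op, convertible, out_coll, long_t, req_coll):
    # Rewritten shape: (enabled AND (x OR y OR z ...)) OR requires_collection_handling
    return (enabled and (fb_mod or fb_op or not convertible
                         or out_coll or long_t)) or req_coll

# The two predicates are equivalent over all 2^7 boolean assignments
for combo in product([False, True], repeat=7):
    assert old_form(*combo) == new_form(*combo)
```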


// If full compilation is expected, cannot have more than 2 Torch segments
// (one for preprocessing inputs, one for post-processing outputs) and 1 TRT segment
if (expect_full_compilation && !(num_torch_segments <= 2 && num_trt_segments == 1)) {
Collaborator


Do we have edge cases like 2 torch_segments for inputs/outputs? Does merge_adjacent_segments_of_same_type always merge them into one?

Collaborator Author


There should not be a case where multiple Torch segments appear for inputs/outputs, since merge_adjacent_segments_of_same_type addresses this case, as you had mentioned. Since the tensors in question are inputs, it should not arise that segment.do_not_merge() is True, since the only approved operators falling into these segments are for collection construction, and only the prim::If or prim::Loop operators can induce a non-merge situation.
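The pseudo-partitioning invariant under discussion (at most one Torch segment on each side of a single TRT segment) can be sketched as a check over the ordered, already-merged segment types. Illustrative Python, not the partitioner's code:

```python
def valid_full_compilation_partition(segments):
    """segments: ordered list of 'torch'/'trt' labels, after adjacent
    same-type segments have been merged. Under (pseudo) full compilation
    the result must have <= 2 Torch segments and exactly 1 TRT segment."""
    return segments.count("torch") <= 2 and segments.count("trt") == 1

assert valid_full_compilation_partition(["trt"])                    # fully in TRT
assert valid_full_compilation_partition(["torch", "trt", "torch"])  # collection pack/unpack around the engine
assert not valid_full_compilation_partition(["trt", "torch", "trt"])  # 2 TRT segments: not full compilation
```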

Comment on lines 375 to 379
if ((cfg.partitioning_info.enabled &&
     (cfg.lower_info.forced_fallback_modules.size() != 0 ||
      cfg.partitioning_info.forced_fallback_operators.size() != 0 || !isBlockConvertible || outputIsCollection ||
      user_requested_long)) ||
    requires_collection_handling) {
Collaborator Author


Note the updates to the conditional logic to make it $$(cfg.partitioning\_info.enabled \wedge (x \vee y \vee z ... )) \vee requires\_collection\_handling$$

@gs-olive gs-olive requested a review from bowang007 March 10, 2023 01:48
- Update graph-building in compiler to account for case where all
operations are supported by Torch-TRT, but the output is a collection.
- Enable 'pseudo-partitioning' for nearly-fully-compiled models for
which the only non-supported aspect of the model is the format of the
output (TRT cannot output complex collections)
- Define a small subset of operation schemas which are allowed despite
the flag `require_full_compilation`. These operations are packing and
unpacking of Tuples/Lists, and some are already used in cases of
`require_full_compilation`
- Display warnings to users if any portion of the `pseudo-partitioning`
is unexpected, for example the model being partitioned ends up in more
than 3 segments (maximally - a Torch segment to preprocess collection
inputs, a TRT segment to perform model logic, a Torch segment to
post-process collection outputs) or if schemas falling outside of the
collection subset are encountered in a Torch segment
- Add end-to-end test case with minimal reproducing example of a failing
model, repaired with the changes to the compiler
- Add minor fix to lowering to resolve a C++ compiler warning
- Add function to check the equivalence of two collection-based outputs
for comparison across Torch-TRT and Torch outputs
- Improved test robustness in end-to-end to check for equivalent output
schemas in addition to successful compilation
- Add test case to elicit behavior where full compilation is requested
but TRT engine size falls below default `min_block_size=3`
- Move `min_block_size` condition to narrow scope
- Coalesce logic to improve code readability
Comment on lines +376 to +382
// Partitioning is required if:
// 1. User requested some modules/operators fallback
// 2. The block (graph) cannot be converted due to operator coverage
// 3. The output of the graph is a collection
// 4. The user requested a non-TRT data type input
auto isPartitioningRequired =
    (isFallbackRequested || !isBlockConvertible || outputIsCollection || user_requested_long);
Collaborator Author


Coalesced partitioning logic for readability

Comment on lines +329 to +332
bool userRequestedFallback(CompileSpec& cfg) {
  return cfg.lower_info.forced_fallback_modules.size() != 0 ||
      cfg.partitioning_info.forced_fallback_operators.size() != 0;
}
Collaborator Author

@gs-olive gs-olive Mar 13, 2023


Added helper function to determine if the user's input specifications imply fallback

Collaborator

@narendasan narendasan left a comment


LGTM

Collaborator

@bowang007 bowang007 left a comment


Looks great, thanks!

@gs-olive gs-olive merged commit d0af394 into pytorch:main Mar 14, 2023