
fix: Properly cast intermediate Int8 tensors to TensorRT Engines in Fallback #1549

Merged · 2 commits · Dec 22, 2022

Conversation

gs-olive
Collaborator

Description

  • Fix a compilation error for the GPT-2 model arising from Byte-type inputs being fed into a TensorRT engine
  • Update the translation dictionary between Torch and TensorRT types to include at::kByte
  • Add a field to PartitioningInfo specifying whether to cast Int8 inputs to TensorRT engines to Int, avoiding the error arising from Int8 inputs being fed into non-quantized engines
  • Add automatic detection of quantized/calibrated models and disable Int8 => Int32 casting in those cases
  • Fix a bug where the LoweringInfo target device was not being updated for the Python API
  • Allow castNode to force creation of a new node rather than searching for an existing one to convert
  • Add a test to ensure a cast is inserted in the Torch engine preceding a TensorRT engine when the Byte tensor is an output of the Torch engine
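The casting rules these bullets describe can be summarized as a small decision function. The sketch below is hypothetical Python (the names `trt_cast_target`, `truncate_long_and_double`, and `cast_int8_inputs` are illustrative; the actual logic lives in the C++ partitioning code):

```python
def trt_cast_target(dtype, truncate_long_and_double, cast_int8_inputs):
    """Hypothetical sketch: return the dtype a tensor should be cast to
    before entering a TensorRT engine, or None if no cast is needed."""
    if dtype == "int64" and truncate_long_and_double:
        # Long (Int64) inputs are truncated to Int32 for TensorRT.
        return "int32"
    if dtype == "int8" and cast_int8_inputs:
        # Int8 (Byte) inputs to a non-quantized engine are promoted to
        # Int32; otherwise TensorRT rejects the network (see errors below).
        return "int32"
    return None
```

In this sketch, the automatic detection of quantized/calibrated models corresponds to passing `cast_int8_inputs=False`, so quantized Int8 tensors reach the engine unchanged.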

Error displayed when passing Int8 inputs to a non-quantized TRT engine:

ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: input_0: input/output with DataType Int8 in network without Q/DQ layers must have dynamic range set when no calibrator is used.
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [network.cpp::validate::2772] Error Code 4: Internal Error (DataType does not match TensorFormats.)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

With this PR, GPT-2 now compiles and runs inference successfully.

Fixes #1455

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist:

  • [x] My code follows the style guidelines of this project (you can use the linters)
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas and hacks
  • [x] I have made corresponding changes to the documentation
  • [x] I have added tests to verify my fix or my feature
  • [x] New and existing unit tests pass locally with my changes
  • [x] I have added the relevant labels to my PR so that the relevant reviewers are notified

@github-actions github-actions bot added component: api [Python] Issues re: Python API component: api [C++] Issues re: C++ API component: core Issues re: The core compiler component: partitioning component: tests Issues re: Tests labels Dec 14, 2022
@gs-olive gs-olive self-assigned this Dec 14, 2022
@gs-olive gs-olive changed the title fix: Properly cast Int8 inputs to TensorRT Engines in Fallback fix: Properly cast intermediate Int8 tensors to TensorRT Engines in Fallback Dec 21, 2022
Comment on lines 233 to 245
if (partitioning_info.truncate_long_and_double) {
  for (size_t i = 0; i < seg_block.inputs().size(); ++i) {
    if (ivalues_maps[seg_block.raw_inputs()[i]].isTensor()) {
      auto cur_ivalue = ivalues_maps[seg_block.raw_inputs()[i]];
      at::ScalarType t = cur_ivalue.toTensor().scalar_type();
      if (t == at::kLong) {
        // insert a cast operation to truncate Long (Int64) inputs to Int32
        auto cast_node = createCastNode(seg_block, i, true, target_device);
        seg_block.g()->prependNode(cast_node);
        seg_block.inputs()[i]->replaceAllUsesAfterNodeWith(cast_node, cast_node->outputs()[0]);
      }
    }
  }
Collaborator
Is this just linter formatting changes?

Collaborator Author
I made these formatting changes manually to reduce the redundancy of the if statements; they should be functionally equivalent to the previous version.

core/partitioning/shape_analysis.cpp (outdated review thread, resolved)
- Address review comments
- Improve documentation and logging messages
- Restructure casting function to allow for casting of variable data
types
- Add casting for `at::kByte` segment block inputs as well as segment
block outputs
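The restructuring this commit describes, one casting routine serving variable data types for both segment-block inputs and outputs, could be sketched roughly as follows. This is hypothetical Python (the helper `plan_segment_casts` and its parameters are illustrative, not the actual C++ implementation):

```python
def plan_segment_casts(input_dtypes, output_dtypes, quantized=False):
    """Hypothetical sketch: map each boundary tensor of a TensorRT
    segment to the dtype it should carry, covering both inputs and
    outputs. Int8 promotion is skipped for quantized models."""
    def target(dtype):
        if dtype == "int64":
            return "int32"  # truncate Long for TensorRT
        if dtype == "int8" and not quantized:
            return "int32"  # promote Byte for non-quantized engines
        return dtype        # no cast needed
    return ([target(d) for d in input_dtypes],
            [target(d) for d in output_dtypes])
```

Handling outputs as well as inputs matters for the new test case: when a Byte tensor is produced by a Torch segment and consumed by a TensorRT segment, the cast has to be inserted at the Torch segment's output boundary.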
@peri044 (Collaborator) left a comment
LGTM

@peri044 peri044 merged commit 544654f into pytorch:master Dec 22, 2022
Labels
cla signed · component: api [C++] · component: api [Python] · component: core · component: partitioning · component: tests

Successfully merging this pull request may close these issues.

🐛 [Bug] Compilation Error on GPT-2
3 participants