
🐛 [Bug] compiled model gives different outputs from torch model (used to work on torch_tensorrt 2.2.0) #2989

Open
orioninthesky98 opened this issue Jul 9, 2024 · 1 comment
Labels: bug (Something isn't working)

orioninthesky98 commented Jul 9, 2024

Bug Description

My model outputs a tuple of mu and logvar. The mu tensor has 4 columns (features): 3 features of type A and 1 feature of type B. See the FinalEncoder.forward() code in the gist below for details.

As seen below, of the 3 type-A features, only the first matches the PyTorch model; the 2nd and 3rd are total garbage. The type-B feature matches the PyTorch model.

This worked perfectly fine on the previous version of Torch-TensorRT (2.2.0), before I updated to 2.3.0. In fact, if you look at the model code, I had to write the trt_compat_mode path specifically for 2.3.0. When I was using 2.2.0, the original PyTorch forward() compiled fine and gave the expected speedups (4 to 5x).

torch mu

tensor([[ 0.1179,  0.2490,  0.0227,  0.7348],
        [ 0.1885,  0.3117, -0.0790, -0.6819],
        [ 0.2545, -0.2422,  0.1816,  1.1018],
        [-0.2488,  0.2577, -0.0928,  0.4927]],

TensorRT mu (2nd & 3rd columns are wrong)

tensor([[ 0.1182, -0.0108, -0.0108,  0.7333],
        [ 0.1887, -0.0108, -0.0108, -0.6839],
        [ 0.2548, -0.0108, -0.0108,  1.1000],
        [-0.2486, -0.0108, -0.0108,  0.4902]],

To Reproduce

Steps to reproduce the behavior:

  1. Initialize the PyTorch model
  2. Compile to TensorRT
  3. Run inference & compare outputs against the PyTorch model

This is the model code:
https://gist.github.com/orioninthesky98/d0a987197950bc0b945d28b240d5bc53#file-model-py-L327-L352
The problematic part is highlighted in the gist. You can see the for-loop there, and somehow only the 1st feature (inv_mu / inv_logvar) is correct while the remaining 2 are garbage.

I've tried unrolling the loop myself (i.e., hardcoding the indices passed into torch.index_select()), just in case something went wrong when tracing the for-loop; it still didn't fix the issue. A sketch of what I mean is below.
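
For illustration, a minimal sketch of the unrolling attempt (variable names and shapes are assumed for this example and a CUDA device is assumed; the real loop body is in FinalEncoder.forward() in the gist):

import torch

device = "cuda"
# (bs, num_inv_feats, 1, feat_dim) -- shapes assumed to mirror the compile snippet below
masked_input = torch.rand(1024, 3, 1, 40, device=device)

# Original traced loop:
#   for i in range(num_inv_feats):
#       curr_input = torch.index_select(masked_input, 1, torch.tensor([i], device=device))
# Unrolled with hardcoded indices, so tracing never sees a Python loop:
curr_0 = torch.index_select(masked_input, 1, torch.tensor([0], device=device))  # reported: matches PyTorch
curr_1 = torch.index_select(masked_input, 1, torch.tensor([1], device=device))  # reported: garbage after compile
curr_2 = torch.index_select(masked_input, 1, torch.tensor([2], device=device))  # reported: garbage after compile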

I tried constraining the sizes with torch._constrain_as_size(bs) and torch._constrain_as_size(num_inv_feats), but didn't find success, as torch complained that those values are not of type SymInt.
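
Roughly what I attempted (a sketch with illustrative bounds; in eager mode .shape[0] comes back as a plain Python int, which appears to be why torch rejects it):

import torch

x = torch.rand(1024, 1, 1, 40)
bs = x.shape[0]  # a plain int here, not a SymInt

# Reported to fail with a complaint that bs is not of type SymInt:
torch._constrain_as_size(bs, min=1, max=4096)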

I have also tried changing all the .view() calls to .reshape(), but that didn't change anything.
I tried adding .clone() and .contiguous(), and that didn't help either.

Another weird thing is that I was forced to use torch.index_select(). Previously, in torch_tensorrt 2.2.0, plain slice-indexing compiled just fine, something like curr_input = masked_input[:, i, ...]. The two forms should be equivalent, as sketched below.
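
For clarity, the two indexing forms should produce numerically identical results (a small sketch with assumed shapes):

import torch

masked_input = torch.rand(1024, 3, 1, 40)
i = 1

a = masked_input[:, i, ...]  # plain slice-indexing (compiled fine on 2.2.0)
b = torch.index_select(masked_input, 1, torch.tensor([i])).squeeze(1)  # 2.3.0 workaround

assert torch.equal(a, b)  # identical values and shape (1024, 1, 40)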

I tried reverting to torch_tensorrt 2.2.0, but very strangely, it rejects the use of torch.index_select()! With 2.2.0, I have to set trt_compat_mode=False, and then it compiles fine AND gives the correct outputs:

pytorch_mu: tensor([[-1.3618e+05,  3.9028e+07,  1.6671e+07, -2.7819e+08],
        [ 1.2645e+07,  2.5498e+07, -2.1328e+07, -3.2754e+08],
        [-1.0710e+07, -1.4777e+07,  5.7531e+06, -2.5132e+08],
        [ 1.6348e+07,  5.0527e+07,  7.3478e+05, -3.3687e+08]], device='cuda:0')
tensorrt_mu: tensor([[-6.5385e+04,  3.9133e+07,  1.6772e+07, -2.7830e+08],
        [ 1.2640e+07,  2.5586e+07, -2.1301e+07, -3.2748e+08],
        [-1.0643e+07, -1.4718e+07,  5.8226e+06, -2.5134e+08],
        [ 1.6426e+07,  5.0602e+07,  7.4901e+05, -3.3704e+08]], device='cuda:0')

For the compilation I am using this code (trt is torch_tensorrt; imports and the device definition are added here for completeness):

import torch
import torch_tensorrt as trt

device = "cuda"
minibatch_size = 1024
net_input_shape = (1, 1, 1, 40)
x_rand = torch.rand((minibatch_size,) + tuple(net_input_shape))
x_rand = x_rand.to(device)
trt_model = trt.compile(
    encoder,  # the FinalEncoder instance from the gist
    inputs=[x_rand],
    enabled_precisions={torch.float32},
    optimization_level=5,
    use_fast_partitioner=True,
    dynamic=False,
    disable_tf32=True,
)

Expected behavior

Compiled model outputs need to match the torch model outputs, at least approximately.
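
A sketch of the check I'd expect to pass, reusing encoder, trt_model, and x_rand from the compile snippet above (the tolerances are illustrative, not from the original report):

import torch

with torch.no_grad():
    torch_mu, torch_logvar = encoder(x_rand)
    trt_mu, trt_logvar = trt_model(x_rand)

# raises with a readable per-element diff if outputs drift beyond tolerance
torch.testing.assert_close(trt_mu, torch_mu, rtol=1e-3, atol=1e-3)
torch.testing.assert_close(trt_logvar, torch_logvar, rtol=1e-3, atol=1e-3)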

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 2.3.0
  • PyTorch Version (e.g. 1.0): 2.3.0+cu121
  • CPU Architecture: x86_64
  • OS (e.g., Linux): Linux, "Ubuntu 22.04.4 LTS"
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Python version: 3.10.14
  • CUDA version: 12.5
  • GPU models and configuration: 1 x H100
  • Any other relevant information:

Additional context

zewenli98 (Collaborator) commented:

Hi @orioninthesky98, thanks for the details.
I'm able to get the same results from the torch_tensorrt and pytorch models using the repro you gave (with small changes):

[screenshot: matching mu outputs from the Torch-TensorRT and PyTorch models]

Here's what I did:

  1. Uncomment this line (otherwise there's a type error): https://gist.github.com/orioninthesky98/d0a987197950bc0b945d28b240d5bc53#file-model-py-L342
     I didn't touch any other code.

  2. Run the inference code:

import torch
import torch_tensorrt
# FinalEncoder is defined in model.py from the gist

encoder = FinalEncoder().to("cuda")
encoder.eval()
minibatch_size = 1024
net_input_shape = (1, 1, 1, 40)

x_rand = torch.rand((minibatch_size,) + tuple(net_input_shape))
x_rand = x_rand.to("cuda")
trt_model = torch_tensorrt.compile(
    encoder,
    inputs=[x_rand],
    enabled_precisions={torch.float32},
    optimization_level=5,
    use_fast_partitioner=True,
    dynamic=False,
    disable_tf32=True,
)
print("==================== trt_model mu ====================")
print(trt_model(x_rand)[0])
print("==================== torch_model mu ====================")
print(encoder(x_rand)[0])

Then I can get the same results.

For your reference, here's my env:

tensorrt                      10.0.1
torch                         2.5.0.dev20240703+cu121
torch_tensorrt                2.5.0.dev0+feb4d84ff  (main branch as of today)
torchvision                   0.20.0.dev20240703+cu121

I recommend testing again with the latest Torch-TRT main branch. Please let me know if you still get the same issue.
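
For reference, one way to pick up a recent main-branch build is the nightly wheel (this assumes CUDA 12.1 wheels; building from source per the repo's README is the alternative):

pip install --pre torch torch_tensorrt --index-url https://download.pytorch.org/whl/nightly/cu121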
