TestSofieModels is failing with PyTorch >= 2.9.0 #20145

@bellenot

Check duplicate issues.

  • Checked for duplicates

Description

gtest-tmva-sofie-TestSofieModels is failing on alma8, alma10, ubuntu22, and ubuntu2404, as shown below:

   864/2687 Test  #439: gtest-tmva-sofie-TestSofieModels ..................................................................***Failed   30.62 sec
  Running main() from ./googletest/src/gtest_main.cc
  [==========] Running 14 tests from 1 test suite.
  [----------] Global test environment set-up.
  [----------] 14 tests from SOFIE
  [ RUN      ] SOFIE.Linear_B1
  using batch-size = 1 input dim = 10 nlayers = 4
  input data torch.Size([1, 10])
  tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
  W1020 01:57:39.124000 78746 torch/onnx/_internal/exporter/_registration.py:107] torchvision is not installed. Skipping torchvision::nms
  [torch.onnx] Obtain model graph for `Net([...]` with `torch.export.export(..., strict=False)`...
  [torch.onnx] Obtain model graph for `Net([...]` with `torch.export.export(..., strict=False)`... ✅
  [torch.onnx] Run decomposition...
  [torch.onnx] Run decomposition... ✅
  [torch.onnx] Translate the graph into ONNX...
  [torch.onnx] Translate the graph into ONNX... ✅
  output data : shape,  torch.Size([1, 4])
  tensor([[ 0.0202, -0.0144,  0.0183, -0.1695]], grad_fn=<AddmmBackward0>)
  /github/home/ROOT-CI/build/tmva/sofie/test/LinearModelGenerator.py:148: UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
  Consider using tensor.detach() first. (Triggered internally at /pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:836.)
    f.write(str(float(yvec[i]))+" ")
  executing python3 LinearModelGenerator.py   1  10  4
  parsing file LinearModel_B1.onnx
  generating model.....
  writing model as header .....
  output written in  LinearModel_B1.hxx
  doing inference.....(std::vector<float>) { inff, inff, inff, inff }
   result inf reference 0.0201594
  /github/home/ROOT-CI/src/tmva/sofie/test/TestSofieModels.cxx:97: Failure
  The difference between result.at(i) and refValue[i] is inf, which exceeds 10 * std::numeric_limits<float>::epsilon(), where
  result.at(i) evaluates to inf,
  refValue[i] evaluates to 0.020159400999546051, and
  10 * std::numeric_limits<float>::epsilon() evaluates to 1.1920928955078125e-06.
  
   result inf reference -0.0144313
  /github/home/ROOT-CI/src/tmva/sofie/test/TestSofieModels.cxx:97: Failure
  The difference between result.at(i) and refValue[i] is inf, which exceeds 10 * std::numeric_limits<float>::epsilon(), where
  result.at(i) evaluates to inf,
  refValue[i] evaluates to -0.014431252144277096, and
  10 * std::numeric_limits<float>::epsilon() evaluates to 1.1920928955078125e-06.
  
   result inf reference 0.0182777
  /github/home/ROOT-CI/src/tmva/sofie/test/TestSofieModels.cxx:97: Failure
  The difference between result.at(i) and refValue[i] is inf, which exceeds 10 * std::numeric_limits<float>::epsilon(), where
  result.at(i) evaluates to inf,
  refValue[i] evaluates to 0.018277715891599655, and
  10 * std::numeric_limits<float>::epsilon() evaluates to 1.1920928955078125e-06.
  
   result inf reference -0.169499
  /github/home/ROOT-CI/src/tmva/sofie/test/TestSofieModels.cxx:97: Failure
  The difference between result.at(i) and refValue[i] is inf, which exceeds 10 * std::numeric_limits<float>::epsilon(), where
  result.at(i) evaluates to inf,
  refValue[i] evaluates to -0.16949871182441711, and
  10 * std::numeric_limits<float>::epsilon() evaluates to 1.1920928955078125e-06.
  
  [  FAILED  ] SOFIE.Linear_B1 (6135 ms)
  [ RUN      ] SOFIE.Linear_B4
  using batch-size = 4 input dim = 10 nlayers = 4
  input data torch.Size([4, 10])
  tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
          [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
          [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
          [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.]])
  [torch.onnx] Obtain model graph for `Net([...]` with `torch.export.export(..., strict=False)`...
  W1020 01:57:47.306000 78949 torch/onnx/_internal/exporter/_registration.py:107] torchvision is not installed. Skipping torchvision::nms
  [torch.onnx] Obtain model graph for `Net([...]` with `torch.export.export(..., strict=False)`... ✅
  [torch.onnx] Run decomposition...
  [torch.onnx] Run decomposition... ✅
  [torch.onnx] Translate the graph into ONNX...
  [torch.onnx] Translate the graph into ONNX... ✅
  output data : shape,  torch.Size([4, 4])
  tensor([[ 0.0210,  0.1184,  0.0405,  0.0845],
          [-0.0021,  0.1561, -0.0039,  0.0571],
          [-0.0206,  0.1899, -0.0541,  0.0284],
          [-0.0396,  0.2219, -0.1040, -0.0017]], grad_fn=<AddmmBackward0>)
  /github/home/ROOT-CI/build/tmva/sofie/test/LinearModelGenerator.py:148: UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
  Consider using tensor.detach() first. (Triggered internally at /pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:836.)
    f.write(str(float(yvec[i]))+" ")
  executing python3 LinearModelGenerator.py   4  10  4
  parsing file LinearModel_B4.onnx
  generating model.....
  writing model as header .....
  output written in  LinearModel_B4.hxx
  unknown file: Failure
  C++ exception with description "TMVA-SOFIE failed to read the values for tensor tensor_out3weight" thrown in the test body.
  
  [  FAILED  ] SOFIE.Linear_B4 (9577 ms)
  [ RUN      ] SOFIE.Conv2d_B1
  using batch-size = 1 nchannels = 2 dim = 4 ngroups = 2 nlayers = 4
  input data torch.Size([1, 2, 4, 4])
  tensor([[[[ 1.,  1.,  1.,  1.],
            [ 1.,  1.,  1.,  1.],
            [ 1.,  1.,  1.,  1.],
            [ 1.,  1.,  1.,  1.]],
  
           [[-1., -1., -1., -1.],
            [-1., -1., -1., -1.],
            [-1., -1., -1., -1.],
            [-1., -1., -1., -1.]]]])
  Net(
    (conv0): Conv2d(2, 4, kernel_size=(2, 2), stride=(1, 1), padding=(1, 1))
    (conv1): Conv2d(4, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2)
    (conv2): Conv2d(8, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv3): Conv2d(4, 1, kernel_size=(2, 2), stride=(2, 2))
  )
  W1020 01:57:57.015000 79134 torch/onnx/_internal/exporter/_registration.py:107] torchvision is not installed. Skipping torchvision::nms
  [torch.onnx] Obtain model graph for `Net([...]` with `torch.export.export(..., strict=False)`...
  [torch.onnx] Obtain model graph for `Net([...]` with `torch.export.export(..., strict=False)`... ✅
  [torch.onnx] Run decomposition...
  [torch.onnx] Run decomposition... ✅
  [torch.onnx] Translate the graph into ONNX...
  [torch.onnx] Translate the graph into ONNX... ✅
  output data : shape,  torch.Size([1, 1, 2, 2])
  tensor([[[[-0.1113, -0.1142],
            [-0.1172, -0.1270]]]], grad_fn=<ConvolutionBackward0>)
  /github/home/ROOT-CI/build/tmva/sofie/test/Conv2dModelGenerator.py:163: UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
  Consider using tensor.detach() first. (Triggered internally at /pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:836.)
    f.write(str(float(yvec[i]))+" ")
  executing python3 Conv2dModelGenerator.py  1 2 4 2 4
  parsing file Conv2dModel_B1.onnx
  generating model.....
  writing model as header .....
  output written in  Conv2dModel_B1.hxx
  unknown file: Failure
  C++ exception with description "TMVA-SOFIE failed to read the values for tensor tensor_conv2weight" thrown in the test body.
  
  [  FAILED  ] SOFIE.Conv2d_B1 (8715 ms)
  [ RUN      ] SOFIE.Conv2d_B4
  using batch-size = 4 nchannels = 2 dim = 4 ngroups = 2 nlayers = 4
  input data torch.Size([4, 2, 4, 4])
  tensor([[[[ 1.,  1.,  1.,  1.],
            [ 1.,  1.,  1.,  1.],
            [ 1.,  1.,  1.,  1.],
            [ 1.,  1.,  1.,  1.]],
  
           [[-1., -1., -1., -1.],
            [-1., -1., -1., -1.],
            [-1., -1., -1., -1.],
            [-1., -1., -1., -1.]]],
  
  
          [[[ 2.,  2.,  2.,  2.],
            [ 2.,  2.,  2.,  2.],
            [ 2.,  2.,  2.,  2.],
            [ 2.,  2.,  2.,  2.]],
  
           [[-2., -2., -2., -2.],
            [-2., -2., -2., -2.],
            [-2., -2., -2., -2.],
            [-2., -2., -2., -2.]]],
  
  
          [[[ 3.,  3.,  3.,  3.],
            [ 3.,  3.,  3.,  3.],
            [ 3.,  3.,  3.,  3.],
            [ 3.,  3.,  3.,  3.]],
  
           [[-3., -3., -3., -3.],
            [-3., -3., -3., -3.],
            [-3., -3., -3., -3.],
            [-3., -3., -3., -3.]]],
  
  
          [[[ 4.,  4.,  4.,  4.],
            [ 4.,  4.,  4.,  4.],
            [ 4.,  4.,  4.,  4.],
            [ 4.,  4.,  4.,  4.]],
  
           [[-4., -4., -4., -4.],
            [-4., -4., -4., -4.],
            [-4., -4., -4., -4.],
            [-4., -4., -4., -4.]]]])
  Net(
    (conv0): Conv2d(2, 4, kernel_size=(2, 2), stride=(1, 1), padding=(1, 1))
    (conv1): Conv2d(4, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2)
    (conv2): Conv2d(8, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv3): Conv2d(4, 1, kernel_size=(2, 2), stride=(2, 2))
  )
  [torch.onnx] Obtain model graph for `Net([...]` with `torch.export.export(..., strict=False)`...
  W1020 01:58:04.780000 79238 torch/onnx/_internal/exporter/_registration.py:107] torchvision is not installed. Skipping torchvision::nms
  [torch.onnx] Obtain model graph for `Net([...]` with `torch.export.export(..., strict=False)`... ✅
  [torch.onnx] Run decomposition...
  [torch.onnx] Run decomposition... ✅
  [torch.onnx] Translate the graph into ONNX...
  [torch.onnx] Translate the graph into ONNX... ✅
  output data : shape,  torch.Size([4, 1, 2, 2])
  tensor([[[[-0.2314, -0.2289],
            [-0.2167, -0.2251]]],
  
  
          [[[-0.2314, -0.2406],
            [-0.2084, -0.2412]]],
  
  
          [[[-0.2323, -0.2494],
            [-0.2005, -0.2507]]],
  
  
          [[[-0.2332, -0.2600],
            [-0.1935, -0.2618]]]], grad_fn=<ConvolutionBackward0>)
  /github/home/ROOT-CI/build/tmva/sofie/test/Conv2dModelGenerator.py:163: UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
  Consider using tensor.detach() first. (Triggered internally at /pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:836.)
    f.write(str(float(yvec[i]))+" ")
  executing python3 Conv2dModelGenerator.py  4 2 4 2 4
  parsing file Conv2dModel_B4.onnx
  generating model.....
  writing model as header .....
  output written in  Conv2dModel_B4.hxx
  doing inference...../github/home/ROOT-CI/src/core/testsupport/src/TestSupport.cxx:79: Failure
  Failed
  Received unexpected diagnostic of severity 4000 at 'TUnixSystem::DispatchSignals' reading 'segmentation violation'.
  Suppress those using ROOT/TestSupport.hxx
  
   Generating stack trace...
   0x00005611d713bdd6 in SOFIE_Conv2d_B4_Test::TestBody() + 0x66 from /github/home/ROOT-CI/build/tmva/sofie/test/TestSofieModels
   0x00005611d71830ff in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x8f from /github/home/ROOT-CI/build/tmva/sofie/test/TestSofieModels
   0x00005611d71699d6 in testing::Test::Run() + 0xd6 from /github/home/ROOT-CI/build/tmva/sofie/test/TestSofieModels
   0x00005611d7169b95 in testing::TestInfo::Run() + 0x195 from /github/home/ROOT-CI/build/tmva/sofie/test/TestSofieModels
   0x00005611d7169d7f in testing::TestSuite::Run() + 0x1bf from /github/home/ROOT-CI/build/tmva/sofie/test/TestSofieModels
   0x00005611d7177bec in testing::internal::UnitTestImpl::RunAllTests() + 0x36c from /github/home/ROOT-CI/build/tmva/sofie/test/TestSofieModels
   0x00005611d71837d7 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x87 from /github/home/ROOT-CI/build/tmva/sofie/test/TestSofieModels
   0x00005611d7169f78 in testing::UnitTest::Run() + 0x78 from /github/home/ROOT-CI/build/tmva/sofie/test/TestSofieModels
   0x00005611d71357c4 in main + 0x44 from /github/home/ROOT-CI/build/tmva/sofie/test/TestSofieModels
   0x00007fa76de461ca in <unknown> from /lib/x86_64-linux-gnu/libc.so.6
   0x00007fa76de4628b in __libc_start_main + 0x8b from /lib/x86_64-linux-gnu/libc.so.6
   0x00005611d7135bc5 in _start + 0x25 from /github/home/ROOT-CI/build/tmva/sofie/test/TestSofieModels
  CMake Error at /github/home/ROOT-CI/src/cmake/modules/RootTestDriver.cmake:232 (message):
    error code: 129
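
The SOFIE.Linear_B1 failures above all come from the same comparison: the absolute difference between each inferred value and its reference must not exceed 10 * std::numeric_limits<float>::epsilon(). The following is a minimal Python sketch of that check, not the code in TestSofieModels.cxx. The helper names are hypothetical; the reference values and the tolerance are copied from the log.

```python
import math

# std::numeric_limits<float>::epsilon() for IEEE-754 single precision.
FLOAT32_EPSILON = 2.0 ** -23
# Tolerance reported in the log: 10 * epsilon.
TOLERANCE = 10 * FLOAT32_EPSILON


def within_tolerance(result, reference, tol=TOLERANCE):
    """Return True if |result - reference| <= tol (False for inf or NaN results)."""
    return abs(result - reference) <= tol


def find_mismatches(results, references, tol=TOLERANCE):
    """Return (index, result, reference) for every element outside the tolerance."""
    return [
        (i, res, ref)
        for i, (res, ref) in enumerate(zip(results, references))
        if not within_tolerance(res, ref, tol)
    ]


# Reference values for SOFIE.Linear_B1, as printed in the log.
references = [
    0.020159400999546051,
    -0.014431252144277096,
    0.018277715891599655,
    -0.16949871182441711,
]
# The generated model returned inf for every output element.
results = [math.inf] * 4

for index, res, ref in find_mismatches(results, references):
    print(f"element {index}: result {res} reference {ref}")
```

Because every output is inf, every element is reported. That suggests the weights read by the generated model are wrong as a whole, rather than a small numerical drift that a looser tolerance would absorb.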

Reproducer

ctest -VV -R gtest-tmva-sofie-TestSofieModels
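
Since the title ties the failure to PyTorch >= 2.9.0, it may help to confirm the installed version before running the reproducer. The following is a minimal sketch, assuming PyTorch is installed under the distribution name `torch`. The helper names and the choice to treat pre-release builds like the final release are assumptions made for illustration.

```python
import re
from importlib import metadata


def parse_release(version):
    """Extract the numeric release, e.g. '2.9.0+cu128' -> (2, 9, 0).

    Local and pre-release suffixes are ignored, so '2.9.0a0' is treated as 2.9.0.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group(1).split("."))
    # Pad short versions such as '2.9' so tuple comparison behaves as expected.
    return parts + (0,) * max(0, 3 - len(parts))


def is_affected(version, threshold=(2, 9, 0)):
    """Return True if the version is at or above the threshold from the issue title."""
    return parse_release(version) >= threshold


def installed_torch_version():
    """Return the installed torch version string, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_torch_version()
    if version is None:
        print("torch is not installed")
    else:
        print(f"torch {version}: affected range = {is_affected(version)}")
```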

ROOT version

master (6.37.01) and v6-36-00-patches

Installation method

from source (in the CI)

Operating system

Linux (alma8, alma10, ubuntu22, and ubuntu2404)

Additional context

No response
