Error: The size of tensor a (10) must match the size of tensor b (8) #70

NJuntunen · 2021-12-09T08:20:46Z

Hi,

I am trying to fit a tabnet model to a large dataset using unsupervised pretraining followed by supervised training, as in the article. I get this error message every now and then once I have trained the unsupervised model and use it in supervised mode. Sometimes it helps just to run the exact same code again, but sometimes it pops up every time. I don't really know how to approach this error. Sorry for not having a reproducible example; the data I am using is a bit sensitive.

Error in (function (self, src, non_blocking) :
The size of tensor a (10) must match the size of tensor b (8) at non-singleton dimension 0
Exception raised from infer_size_impl at ....\aten\src\ATen\ExpandUtils.cpp:28 (most recent call first):
00007FF92EF010D200007FF92EF01070 c10.dll!c10::Error::Error [ @ ]
00007FF92EF00BAE00007FF92EF00B60 c10.dll!c10::detail::torchCheckFail [ @ ]
00007FF8BA78201500007FF8BA781DD0 torch_cpu.dll!at::DynamicLibrary::sym [ @ ]
00007FF8BA78344900007FF8BA783420 torch_cpu.dll!at::infer_size_dimvector [ @ ]
00007FF8BA799F5500007FF8BA799E00 torch_cpu.dll!at::TensorIteratorBase::compute_shape [ @ ]
00007FF8BA79843200007FF8BA7983D0 torch_cpu.dll!at::TensorIteratorBase::build [ @ ]
00007FF8BA74CDE200007FF8BA74CDA0 torch_cpu.dll!at::TensorIteratorConfig::build [ @ ]
00007FF8BA8E744A00007FF8BA8E6C60 torch_cpu.dll!at::native::copy_ [ @ ]
00007FF8BA8E6CB700007FF8BA8E6C60 torch_cpu.dll!at::native::copy_ [ @ ]
00007FF8BB040D9E00007FF8BB040CF0 torch_cpu.dll!at::redispatch::copy_ [ @ ]
00007FF8BCD03C3900007FF8BCD029E0 torch_cpu.dll!torch::autograd::VariableType::allCUDATypes [ @ ]
00007FF8BB040D9E00007FF8BB040CF0 torch_cpu.dll!at::redispatch::copy_ [ @ ]
00007FF8BCD0387B00007FF8BCD029E0 torch_cpu.dll!torch::autograd::VariableType::allCUDATypes [ @ ]
00007FF8BB45C18200007FF8BB45C050 torch_cpu.dll!at::Tensor::copy_ [ @ ]
00007FF9279A2EEF00007FF9279A2E00 lantern.dll!lantern_Tensor_copy__tensor_tensor_bool [ @ ]
0000000065FEE0180000000065FEDFF0 torchpkg.dll!Z45cpp_torch_method_copy__self_Tensor_src_Tensor15XPtrTorchTensorS_b [ @ ]
0000000065EEDC7D0000000065EEDBE0 torchpkg.dll!torch_cpp_torch_method_copy__self_Tensor_src_Tensor [ @ ]
000000006C7A7BAE000000006C79F730 R.dll!Rf_NewFrameConfirm [ @ ]
000000006C7A886D000000006C79F730 R.dll!Rf_NewFrameConfirm [ @ ]
000000006C7ED189000000006C7E5EC0 R.dll!R_initAssignSymbols [ @ ]
000000006C7FCBF1000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FE907000000006C7FE460 R.dll!R_cmpfun1 [ @ ]
000000006C7FFB6A000000006C7FF9B0 R.dll!Rf_applyClosure [ @ ]
000000006C7FCD9C000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7740AD000000006C76D110 R.dll!Rf_coerceVector [ @ ]
000000006C7ED189000000006C7E5EC0 R.dll!R_initAssignSymbols [ @ ]
000000006C7FCBF1000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FE907000000006C7FE460 R.dll!R_cmpfun1 [ @ ]
000000006C7FFB6A000000006C7FF9B0 R.dll!Rf_applyClosure [ @ ]
000000006C7F4F54000000006C7E5EC0 R.dll!R_initAssignSymbols [ @ ]
000000006C7FCBF1000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FE907000000006C7FE460 R.dll!R_cmpfun1 [ @ ]
000000006C7FFB6A000000006C7FF9B0 R.dll!Rf_applyClosure [ @ ]
000000006C7F4F54000000006C7E5EC0 R.dll!R_initAssignSymbols [ @ ]
000000006C7FCBF1000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FE907000000006C7FE460 R.dll!R_cmpfun1 [ @ ]
000000006C7FFB6A000000006C7FF9B0 R.dll!Rf_applyClosure [ @ ]
000000006C7FCD9C000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C8008B7000000006C7FFDD0 R.dll!R_execMethod [ @ ]
000000006C7FCFE5000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FE907000000006C7FE460 R.dll!R_cmpfun1 [ @ ]
000000006C7FFB6A000000006C7FF9B0 R.dll!Rf_applyClosure [ @ ]
000000006C7FCD9C000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C8008B7000000006C7FFDD0 R.dll!R_execMethod [ @ ]
000000006C7FCFE5000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FE907000000006C7FE460 R.dll!R_cmpfun1 [ @ ]
000000006C7FFB6A000000006C7FF9B0 R.dll!Rf_applyClosure [ @ ]
000000006C7FCD9C000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C8008B7000000006C7FFDD0 R.dll!R_execMethod [ @ ]
000000006C7FCFE5000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FD4B9000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FD938000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7F2602000000006C7E5EC0 R.dll!R_initAssignSymbols [ @ ]
000000006C7FCBF1000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FD4B9000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FCF04000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FD4B9000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FD938000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7F2602000000006C7E5EC0 R.dll!R_initAssignSymbols [ @ ]
000000006C7FCBF1000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C7FE907000000006C7FE460 R.dll!R_cmpfun1 [ @ ]
000000006C7FFB6A000000006C7FF9B0 R.dll!Rf_applyClosure [ @ ]
000000006C7FCD9C000000006C7FCA80 R.dll!Rf_eval [ @ ]
000000006C8008B7000000006C7FFDD0 R.dll!R_execMethod [ @ ]

cregouby · 2021-12-10T10:32:15Z

Hello @NJuntunen,

Without leaking any sensitive data, are you able to share your data preparation code (i.e. the recipe, the formula, or just the str() of your outcome after preprocessing), the tabnet_pretrain() and tabnet_fit() config parameters, and the infrastructure stack (i.e. tabnet version, torch version, CPU or GPU...), so that we get a sense of which part of the code to look at (i.e. regression vs classification, batch and minibatch size, ...)?

Your issue is tough, but I think we all need to shape a kind of troubleshooting pattern for such cases...

NJuntunen · 2021-12-16T09:11:25Z

Hi @cregouby, and thanks for offering help! The issue was indeed in the data preprocessing; more precisely, I used step_nzv() in my recipe to exclude NA values from the dataset. So, as you might guess, in some cases step_nzv() by chance eliminated a column from one of the datasets (unsupervised or supervised) but not the other, which caused this error. At first glance the error message looked a bit challenging, but luckily the problem wasn't bigger than that.
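
For anyone landing here with the same error, a minimal sketch of how this kind of mismatch can arise and one way to avoid it. It assumes the recipes package is installed; the data frames and column names (`pretrain_df`, `fit_df`, `flag`) are made up for illustration and are not the data from this issue. step_nzv() drops near-zero-variance predictors based on the data it is prepped on, so prepping the same recipe on two different datasets can keep different columns. Prepping once and baking both datasets with the same prepped recipe keeps the columns aligned.

```r
library(recipes)

n <- 200

# Hypothetical data: `flag` is almost constant in the pretraining set
# but well balanced in the supervised set.
pretrain_df <- data.frame(
  x1 = seq_len(n) / n,
  x2 = rev(seq_len(n)) / n,
  flag = c(rep(0, n - 1), 1)
)
fit_df <- data.frame(
  x1 = seq_len(n) / n,
  x2 = rev(seq_len(n)) / n,
  flag = rep(c(0, 1), n / 2)
)

# Unprepped recipe with a data-dependent filtering step.
rec <- step_nzv(recipe(~ ., data = pretrain_df), all_predictors())

# Prepping separately lets step_nzv() decide per dataset,
# so the two results can end up with different columns.
cols_pretrain <- names(bake(prep(rec, training = pretrain_df), new_data = NULL))
cols_fit <- names(bake(prep(rec, training = fit_df), new_data = NULL))
print(setdiff(cols_fit, cols_pretrain))

# Prepping once and reusing the prepped recipe applies the same
# column selection to both datasets.
prepped <- prep(rec, training = pretrain_df)
baked_pretrain <- bake(prepped, new_data = pretrain_df)
baked_fit <- bake(prepped, new_data = fit_df)
print(identical(names(baked_pretrain), names(baked_fit)))
```

Which dataset to prep on is a judgment call; the point of the sketch is only that both training stages see the same set of predictors.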

cregouby · 2021-12-16T09:56:44Z

Thanks for the feedback.

This scenario will become even trickier now that we support missing values in the unsupervised step. It gives me food for thought about the best way to warn users of such potential issues in the future...
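
Until such a message exists, a user-side guard could look roughly like the sketch below. `check_same_columns()` is a hypothetical helper written for this thread, not a tabnet function, and it assumes both preprocessed predictor sets are available as data frames. It stops with a readable message that names the offending columns instead of letting the mismatch surface as a tensor size error.

```r
# Hypothetical helper (not part of tabnet): compare the predictor columns
# of the pretraining data and the supervised fitting data.
check_same_columns <- function(pretrain_x, fit_x) {
  fmt <- function(x) if (length(x) > 0) paste(x, collapse = ", ") else "(none)"

  only_in_pretrain <- setdiff(names(pretrain_x), names(fit_x))
  only_in_fit <- setdiff(names(fit_x), names(pretrain_x))

  if (length(only_in_pretrain) > 0 || length(only_in_fit) > 0) {
    stop(
      "Predictors differ between pretraining and fitting data. ",
      "Only in pretraining data: ", fmt(only_in_pretrain), ". ",
      "Only in fitting data: ", fmt(only_in_fit), ".",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Usage with made-up data frames: `x3` is missing from the second one.
a <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9)
b <- data.frame(x1 = 1:3, x2 = 4:6)

result <- tryCatch(
  check_same_columns(a, b),
  error = function(e) conditionMessage(e)
)
print(result)
```

Calling the check right before the supervised fit keeps the failure close to its cause.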

cregouby closed this as completed Dec 16, 2021

cregouby mentioned this issue Dec 16, 2021

Add check and messages to improve resilience on the training scenario #73

Open
