There is a problem when run my data #19

zuoeye · 2017-11-29T08:35:06Z

I tried to run this code on my own data, but training fails with the following error:
/home/ai/torch/install/bin/luajit: train.lua:382: cuda runtime error (59) : device-side assert triggered at /tmp/luarocks_cutorch-scm-1-3009/cutorch/lib/THC/generic/THCTensorCopy.c:18
stack traceback:
[C]: in function 'indexCopy'
train.lua:382: in function 'organize_samples'
train.lua:422: in function 'opfunc'
/home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:436: in function 'updateCNN'
train.lua:487: in main chunk
[C]: in function 'dofile'
...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
How can I solve this problem?
I look forward to your response.
Thanks.

zuoeye · 2017-11-29T09:29:52Z

My data is 28x28x3, so I just copied the model_def for FRGC to build my network architecture. Is this the cause of the error? If so, how should I define my own network architecture to train the model on 28x28 images?

jwyang · 2017-11-29T16:31:32Z

Yes, I think that might be the reason. FRGC images are 32x32, so the output feature dimension of the network might be wrong for your 28x28 images. You can try the architecture for MNIST, since it is also 28x28.
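To illustrate why a network sized for 32x32 inputs can produce the wrong feature dimension on 28x28 images, here is a minimal sketch of the usual output-size arithmetic for convolution and pooling layers. The layer stack (two 5x5 convolutions, each followed by 2x2 pooling) and the 50 output planes are hypothetical values chosen for illustration, not the repository's actual model_def.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * pad - kernel) // stride + 1


def flattened_features(size, n_planes, layers):
    """Trace (kernel, stride) layers and return the element count that the
    flattening step (nn.View) and the first nn.Linear would have to match."""
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride)
    return n_planes * size * size


# Hypothetical stack: conv 5x5, pool 2x2 stride 2, conv 5x5, pool 2x2 stride 2.
layers = [(5, 1), (2, 2), (5, 1), (2, 2)]

for side in (28, 32):
    print(side, flattened_features(side, 50, layers))
```

With these assumed layers, the flattened size differs between the two input sizes, so an nn.View or nn.Linear hard-coded for one will not fit the other.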

zuoeye · 2017-11-30T11:58:28Z

Thanks for your answer, but there is still a problem when I try the architecture for MNIST:

`online epoch # 0 [batchSize = 100] [learningRate = 0.01]
/home/ai/torch/install/bin/luajit: /home/ai/torch/install/share/lua/5.1/nn/Container.lua:67:
In 2 module of nn.Sequential:
In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
/home/ai/torch/install/share/lua/5.1/nn/THNN.lua:110: Need input of dimension 4 and input.size[1] == 1 but got input to be of shape: [100 x 3 x 28 x 28] at /tmp/luarocks_cunn-scm-1-3260/cunn/lib/THCUNN/generic/SpatialConvolutionMM.cu:49
stack traceback:
[C]: in function 'v'
/home/ai/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
...ai/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:79: in function <...ai/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:76>
[C]: in function 'xpcall'
/home/ai/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/home/ai/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/home/ai/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:421: in function 'opfunc'
/home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:436: in function 'updateCNN'
train.lua:487: in main chunk
[C]: in function 'dofile'
...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/ai/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:421: in function 'opfunc'
/home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:436: in function 'updateCNN'
train.lua:487: in main chunk
[C]: in function 'dofile'
...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670`

I think the cause may be that my data is three-channel RGB. I found that FRGC is also three-channel RGB data, so I resized my data to 3x32x32. But there is still an error when I try the architecture for FRGC:

==> online epoch # 0 [batchSize = 100] [learningRate = 0.01]
loss: 0.037374177345863
/home/ai/torch/install/bin/luajit: bad argument #2 to '?' (out of range at /home/ai/torch/pkg/torch/generic/Tensor.c:913)
stack traceback:
[C]: at 0x7f453c6d9b30
[C]: in function '__index'
train.lua:366: in function 'organize_samples'
train.lua:422: in function 'opfunc'
/home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:436: in function 'updateCNN'
train.lua:487: in main chunk
[C]: in function 'dofile'
...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

Do you know what to do? Or could you explain how to build my own network architecture, and especially how to choose the parameters in the model_def? For example, how should I choose nInputPlanes, nOutputPlanes, nn.View, nn.Linear, and nn.Normalize in the model_def?
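On the question of choosing input and output plane counts: in a sequential convolutional network, the first convolution's input planes must equal the number of image channels, and each later convolution's input planes must equal the previous one's output planes. The MNIST error above ("input.size[1] == 1 but got ... [100 x 3 x 28 x 28]") is consistent with a first layer expecting 1 channel while the data has 3. The sketch below checks that chain; the plane counts (20, 50) are hypothetical, not taken from the repository.

```python
def check_planes(image_channels, convs):
    """Check a list of (n_input_plane, n_output_plane) pairs against the
    number of image channels. Returns a list of problem descriptions."""
    problems = []
    expected = image_channels
    for i, (n_in, n_out) in enumerate(convs, start=1):
        if n_in != expected:
            problems.append(
                "conv %d: nInputPlane is %d, expected %d" % (i, n_in, expected)
            )
        expected = n_out  # the next layer must consume this many planes
    return problems


# Grayscale-style first layer applied to 3-channel data: reports a mismatch.
print(check_planes(3, [(1, 20), (20, 50)]))
# First layer adjusted to 3 input planes: no problems reported.
print(check_planes(3, [(3, 20), (20, 50)]))
```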

zuoeye · 2017-12-04T01:02:53Z

Hello, I'm still waiting for your answer. I would appreciate an early reply. Thank you.

jwyang · 2017-12-25T20:49:09Z

Have you solved the problem? I think you need to convert your data to 3 channels, since the architecture for FRGC only takes 3-channel input.

Also, please remember to provide the ground-truth labels. If you do not have any, randomly initialize the labels in advance.
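A minimal sketch of random label initialization, under stated assumptions: the number of clusters, the seed, and the function name `random_labels` are hypothetical, and whether the training script expects labels to start at 0 or 1 is not stated in this thread, so `first_label` is left as a parameter. Labels are produced as floats because the HDF5 header shown later in the thread stores them as 32-bit floats.

```python
import random


def random_labels(n_samples, n_clusters, first_label=1, seed=0):
    """Return n_samples random labels drawn uniformly from
    first_label .. first_label + n_clusters - 1, as floats."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    stop = first_label + n_clusters
    return [float(rng.randrange(first_label, stop)) for _ in range(n_samples)]


labels = random_labels(n_samples=1000, n_clusters=10)
print(len(labels), min(labels), max(labels))
```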

dcharua · 2018-03-17T22:35:52Z

Hi, I have the same issue. I ran it on FRGC with 3 channels, but got:

/home/lifelogging/torch/install/bin/luajit: bad argument #2 to '?' (out of range at /home/lifelogging/torch/pkg/torch/generic/Tensor.c:913)
stack traceback:
[C]: at 0x7f1af1a2bb60
[C]: in function '__index'
train.lua:368: in function 'organize_samples'
train.lua:424: in function 'opfunc'
/home/lifelogging/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:438: in function 'updateCNN'
train.lua:489: in main chunk
[C]: in function 'dofile'
...ging/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

Were you able to solve this?

Thank you

dcharua · 2018-03-17T23:02:44Z

The problem is with the data: it needs to be stored in the correct format, 32-bit float, so the header of the h5 file should look like this:

HDF5 "data4torch.h5" {
GROUP "/" {
   DATASET "data" {
      DATATYPE  H5T_IEEE_F32LE
      DATASPACE  SIMPLE { ( 35898, 3, 32, 32 ) / ( 35898, 3, 32, 32 ) }
   }
   DATASET "labels" {
      DATATYPE  H5T_IEEE_F32LE
      DATASPACE  SIMPLE { ( 35898 ) / ( 35898 ) }
   }
}
}
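One possible way to produce a file with that layout is sketched below using NumPy and h5py. It assumes the images start out as an (N, H, W, C) array, which is not stated in the thread; the helper names `to_float32_nchw` and `write_h5` are hypothetical. No value scaling is applied, since the thread does not say whether the training code expects any.

```python
import numpy as np


def to_float32_nchw(images_nhwc):
    """Cast an (N, H, W, C) array to little-endian float32 in (N, C, H, W) order."""
    arr = np.asarray(images_nhwc)
    if arr.ndim != 4:
        raise ValueError("expected a 4-D array, got shape %r" % (arr.shape,))
    return np.ascontiguousarray(arr.transpose(0, 3, 1, 2), dtype="<f4")


def write_h5(path, data, labels):
    """Write 'data' and 'labels' datasets as 32-bit little-endian floats."""
    import h5py  # imported here so the conversion helper works without h5py

    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=np.asarray(data, dtype="<f4"))
        f.create_dataset("labels", data=np.asarray(labels, dtype="<f4"))


# Example with placeholder images: 4 RGB images of size 32x32.
images = np.zeros((4, 32, 32, 3), dtype=np.uint8)
data = to_float32_nchw(images)
print(data.shape, data.dtype)
```

Calling `write_h5("data4torch.h5", data, labels)` and then running `h5dump -H data4torch.h5` should print a header of the same form as the one above, which makes it easy to confirm the datatype and shape before training.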
