There is a problem when run my data #19

Open
zuoeye opened this issue Nov 29, 2017 · 7 comments

Comments

zuoeye commented Nov 29, 2017

I tried to run this code on my own data, but I get the following error:
/home/ai/torch/install/bin/luajit: train.lua:382: cuda runtime error (59) : device-side assert triggered at /tmp/luarocks_cutorch-scm-1-3009/cutorch/lib/THC/generic/THCTensorCopy.c:18
stack traceback:
[C]: in function 'indexCopy'
train.lua:382: in function 'organize_samples'
train.lua:422: in function 'opfunc'
/home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:436: in function 'updateCNN'
train.lua:487: in main chunk
[C]: in function 'dofile'
...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
How can I solve this problem?
I look forward to your response at your earliest convenience.
Thanks.

Author

zuoeye commented Nov 29, 2017

My data is 28x28x3, so I just copied the model_def for FRGC to build my network architecture. Is this the cause of the problem? If so, how should I define my own network architecture to train the model on 28x28 images?

Owner

jwyang commented Nov 29, 2017

Yes, I think that might be the reason. FRGC is 32x32, so the output feature dimension of the network might be wrong for your 28x28 images. You can try the architecture for MNIST, since it is also 28x28.
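
To illustrate why the input size matters, here is a minimal Python sketch of the size arithmetic. The layer stack in it (5x5 convolutions, 2x2 max-pooling, 50 output planes) is a hypothetical example, not the repository's actual model_def; the helper names `conv_out` and `flattened_features` are also made up for illustration. The point is only that the flattened size expected by nn.View and the first nn.Linear changes when the input side changes from 32 to 28.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * pad - kernel) // stride + 1


def flattened_features(input_size, layers, out_planes):
    """Apply (kernel, stride, pad) layers in order and return the flattened size."""
    size = input_size
    for kernel, stride, pad in layers:
        size = conv_out(size, kernel, stride, pad)
    return out_planes * size * size


# Hypothetical stack, NOT the repository's model_def:
# 5x5 conv, 2x2 max-pool, 5x5 conv, 2x2 max-pool, 50 output planes at the end.
layers = [(5, 1, 0), (2, 2, 0), (5, 1, 0), (2, 2, 0)]

for side in (32, 28):
    # This is the number that nn.View and the first nn.Linear would have to match.
    print(side, flattened_features(side, layers, out_planes=50))
```

With this assumed stack, a 32x32 input flattens to 1250 features and a 28x28 input to 800, so a model_def written for one size will not fit the other without changing those two layers.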

Author

zuoeye commented Nov 30, 2017

Thanks for your answer, but there is still a problem when I try the architecture for MNIST:

`online epoch # 0 [batchSize = 100] [learningRate = 0.01]
/home/ai/torch/install/bin/luajit: /home/ai/torch/install/share/lua/5.1/nn/Container.lua:67:
In 2 module of nn.Sequential:
In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
/home/ai/torch/install/share/lua/5.1/nn/THNN.lua:110: Need input of dimension 4 and input.size[1] == 1 but got input to be of shape: [100 x 3 x 28 x 28] at /tmp/luarocks_cunn-scm-1-3260/cunn/lib/THCUNN/generic/SpatialConvolutionMM.cu:49
stack traceback:
[C]: in function 'v'
/home/ai/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
...ai/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:79: in function <...ai/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:76>
[C]: in function 'xpcall'
/home/ai/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/home/ai/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/home/ai/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:421: in function 'opfunc'
/home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:436: in function 'updateCNN'
train.lua:487: in main chunk
[C]: in function 'dofile'
...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/ai/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:421: in function 'opfunc'
/home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:436: in function 'updateCNN'
train.lua:487: in main chunk
[C]: in function 'dofile'
...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670`

I think the cause may be that my data is three-channel RGB. I found that FRGC is also three-channel RGB data, so I resized my data to 3x32x32. But there is still a problem when I try the architecture for FRGC:

==> online epoch # 0 [batchSize = 100] [learningRate = 0.01]
loss: 0.037374177345863
/home/ai/torch/install/bin/luajit: bad argument #2 to '?' (out of range at /home/ai/torch/pkg/torch/generic/Tensor.c:913)
stack traceback:
[C]: at 0x7f453c6d9b30
[C]: in function '__index'
train.lua:366: in function 'organize_samples'
train.lua:422: in function 'opfunc'
/home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:436: in function 'updateCNN'
train.lua:487: in main chunk
[C]: in function 'dofile'
...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

Do you know what to do? Or could you explain how to build my own network architecture, and in particular how to choose the parameters in model_def, such as nInputPlanes, nOutputPlanes, nn.View, nn.Linear, and nn.Normalize?

Author

zuoeye commented Dec 4, 2017

Hello, I'm waiting for your answer. Kindly favour me with an early reply. Thank you.

Owner

jwyang commented Dec 25, 2017

Have you solved the problem? I think you need to convert your data to 3 channels, since the architecture for FRGC only takes 3-channel input.

Also, please remember to provide the ground-truth labels. If you do not have any, randomly initialize the labels in advance.
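
A rough Python sketch of these two preparation steps, using plain nested lists. The function names `gray_to_rgb` and `random_labels` are hypothetical, and the thread does not say whether the training script expects labels to start at 0 or 1, so `first_label` is left as a parameter and its default here is only an assumption.

```python
import random


def gray_to_rgb(image):
    """Replicate one H x W plane into three identical planes (3 x H x W)."""
    return [[row[:] for row in image] for _ in range(3)]


def random_labels(n_samples, n_classes, first_label=1, seed=0):
    """Draw one random label per sample from n_classes consecutive integers.

    first_label=1 is an assumption (Lua indexing is 1-based); the thread
    does not confirm which convention train.lua expects.
    """
    rng = random.Random(seed)
    return [rng.randint(first_label, first_label + n_classes - 1)
            for _ in range(n_samples)]


# Usage: one 28 x 28 grayscale image becomes 3 x 28 x 28.
gray = [[0.5] * 28 for _ in range(28)]
rgb = gray_to_rgb(gray)
labels = random_labels(n_samples=10, n_classes=4)
print(len(rgb), len(rgb[0]), len(rgb[0][0]), len(labels))
```

Replicating the single plane keeps the pixel values unchanged and only makes the shape compatible with a 3-channel first layer.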

dcharua commented Mar 17, 2018

Hi, I have the same issue. I ran it on FRGC with 3 channels, but got:

/home/lifelogging/torch/install/bin/luajit: bad argument #2 to '?' (out of range at /home/lifelogging/torch/pkg/torch/generic/Tensor.c:913)
stack traceback:
[C]: at 0x7f1af1a2bb60
[C]: in function '__index'
train.lua:368: in function 'organize_samples'
train.lua:424: in function 'opfunc'
/home/lifelogging/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:438: in function 'updateCNN'
train.lua:489: in main chunk
[C]: in function 'dofile'
...ging/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

Were you able to solve this?

Thank you

dcharua commented Mar 17, 2018

The problem is with the data: it needs to be stored in the correct format, 32-bit float, so the header of the h5 file should look like this:

HDF5 "data4torch.h5" {
GROUP "/" {
   DATASET "data" {
      DATATYPE H5T_IEEE_F32LE
      DATASPACE SIMPLE { ( 35898, 3, 32, 32 ) / ( 35898, 3, 32, 32 ) }
   }
   DATASET "labels" {
      DATATYPE H5T_IEEE_F32LE
      DATASPACE SIMPLE { ( 35898 ) / ( 35898 ) }
   }
}
}
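
As a quick sanity check before writing the file, the layout in this header can be expressed as a small Python function. This is only a sketch: `check_layout` is a hypothetical helper, the shapes and dtype names are assumed to have been read already with whatever HDF5 tool you use, and the 3 x 32 x 32 default matches the FRGC header above rather than any other dataset.

```python
def check_layout(data_shape, data_dtype, labels_shape, labels_dtype,
                 channels=3, side=32):
    """Return a list of mismatches against the N x 3 x 32 x 32 float32 layout."""
    problems = []
    if len(data_shape) != 4:
        problems.append("data must be 4-D (N, C, H, W), got %d-D" % len(data_shape))
    elif tuple(data_shape[1:]) != (channels, side, side):
        problems.append("data must be N x %d x %d x %d, got %s"
                        % (channels, side, side, tuple(data_shape)))
    # Labels are a flat vector with exactly one entry per sample.
    if tuple(labels_shape) != tuple(data_shape[:1]):
        problems.append("labels must be 1-D with one entry per sample")
    # Both datasets are stored as 32-bit floats in the header above.
    for name, dtype in (("data", data_dtype), ("labels", labels_dtype)):
        if dtype != "float32":
            problems.append("%s must be float32, got %s" % (name, dtype))
    return problems


# The layout from the header passes; a channels-last uint8 array does not.
print(check_layout((35898, 3, 32, 32), "float32", (35898,), "float32"))
print(check_layout((100, 28, 28, 3), "uint8", (100,), "int64"))
```

An empty list means the shapes and types agree with the header; anything else names the field to fix before training.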
