IndexError: index 76 is out of bounds for axis 1 with size 3 #21

InfectedPacket opened this issue Dec 21, 2017 · 1 comment

@InfectedPacket
Hello,

I am currently trying to automate parts of this project and I am running into difficulties during the training phase in CPU mode, which throws an IndexError and appears to hang the entire training run. I am using a very small dataset from the mass_buildings set: 8 training images and 2 validation images. The purpose is only to test the pipeline, not to obtain accurate results at the moment. Below are the state of the installation and the steps I am using:

System:

uname -a
Linux user-VirtualBox 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Python (w/o Anaconda):

$ python -V
Python 3.5.2

Python modules:

user@user-VirtualBox:~/Source/ssai-cnn$ pip3 freeze
...
chainer==1.5.0.2
...
Cython==0.23.4
...
h5py==2.7.1
...
lmdb==0.87
...
matplotlib==2.1.1
...
numpy==1.10.1
onboard==1.2.0
opencv-python==3.1.0.3
...
six==1.10.0
tqdm==4.19.5
...

Additionally, Boost 1.59.0 and OpenCV 3.0.0 have been built and installed from source, and both installs appear successful. The utils module also builds successfully.

I have downloaded only a small subset of the mass_buildings dataset:

# ls -R ./data/mass_buildings/train/
./data/mass_buildings/train/:
map  sat

./data/mass_buildings/train/map:
22678915_15.tif  22678930_15.tif  22678945_15.tif  22678960_15.tif

./data/mass_buildings/train/sat:
22678915_15.tiff  22678930_15.tiff  22678945_15.tiff  22678960_15.tiff

Below is the output obtained by running the shells/create_datasets.sh script, modified only to build the mass_buildings data:

patch size: 92 24 16
n_all_files: 1
divide:0.6727173328399658
0 / 1 n_patches: 7744
patches:	 7744
patch size: 92 24 16
n_all_files: 1
divide:0.6314394474029541
0 / 1 n_patches: 7744
patches:	 7744
patch size: 92 24 16
n_all_files: 4
divide:0.6260504722595215
0 / 4 n_patches: 7744
divide:0.667414665222168
1 / 4 n_patches: 15488
divide:0.628319263458252
2 / 4 n_patches: 23232
divide:0.6634025573730469
3 / 4 n_patches: 30976
patches:	 30976
0.03437542915344238 sec (128, 3, 64, 64) (128, 16, 16)
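The `(128, 16, 16)` array in the last line above is a batch of label patches. Before launching training, it may help to confirm that the label patches only contain class indices the model can actually output (apparently 3 classes, judging from the IndexError later in this report). A minimal sanity-check sketch, using a random stand-in array since the real batch would need to be read back from the train_map LMDB:

```python
import numpy as np

n_classes = 3  # number of per-pixel output channels assumed for MnihCNN_multi

# Stand-in for one label batch shaped like the (128, 16, 16) array above;
# with real data, load this batch from the train_map LMDB instead.
labels = np.random.randint(0, n_classes, size=(128, 16, 16))

print("label value range:", labels.min(), "to", labels.max())
assert labels.max() < n_classes, "labels exceed the model's output classes"
assert labels.min() >= 0, "labels must be non-negative class indices"
```

If the real label patches contain values outside `[0, n_classes)`, e.g. raw 0–255 grayscale pixels, the softmax cross-entropy step will fail.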

Then the training script is initiated using the following command:

user@user-VirtualBox:~/Source/ssai-cnn$ CHAINER_TYPE_CHECK=0 CHAINER_SEED=$1 \
> nohup python ./scripts/train.py \
> --seed 0 \
> --gpu -1 \
> --model ./models/MnihCNN_multi.py \
> --train_ortho_db data/mass_buildings/lmdb/train_sat \
> --train_label_db data/mass_buildings/lmdb/train_map \
> --valid_ortho_db data/mass_buildings/lmdb/valid_sat \
> --valid_label_db data/mass_buildings/lmdb/valid_map \
> --dataset_size 1.0 \
> --epoch 1

As you can see above, I'm using only 8 images and a single epoch. I let the process run for an entire night and it never completed, which is why I believe it simply hangs. Running under nohup also does not complete. When I forcefully stop it with Ctrl-C, I obtain the following message:

# cat nohup.out 
Traceback (most recent call last):
  File "./scripts/train.py", line 313, in <module>
    model, optimizer = one_epoch(args, model, optimizer, epoch, True)
  File "./scripts/train.py", line 265, in one_epoch
    optimizer.update(model, x, t)
  File "/usr/local/lib/python3.5/dist-packages/chainer/optimizer.py", line 377, in update
    loss = lossfun(*args, **kwds)
  File "./models/MnihCNN_multi.py", line 31, in __call__
    self.loss = F.softmax_cross_entropy(h, t, normalize=False)
  File "/usr/local/lib/python3.5/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 152, in softmax_cross_entropy
    return SoftmaxCrossEntropy(use_cudnn, normalize)(x, t)
  File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 105, in __call__
    outputs = self.forward(in_data)
  File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 183, in forward
    return self.forward_cpu(inputs)
  File "/usr/local/lib/python3.5/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 39, in forward_cpu
    p = yd[six.moves.range(t.size), numpy.maximum(t.flat, 0)]
IndexError: index 76 is out of bounds for axis 1 with size 3
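For what it's worth, the traceback points at the fancy indexing `yd[range(t.size), t]`, which fails as soon as a label value (here 76) is greater than or equal to the size of axis 1 (here 3, the number of classes). The failure is easy to reproduce with plain numpy; the shapes below are hypothetical:

```python
import numpy as np

n_classes = 3
y = np.random.rand(4, n_classes)  # stand-in softmax output: 4 samples, 3 classes
t = np.array([0, 76, 1, 0])       # a raw pixel value (76) sneaks in as a label

try:
    # the same row/column indexing chainer's softmax_cross_entropy does on CPU
    p = y[np.arange(t.size), t]
except IndexError as e:
    print(e)  # e.g. "index 76 is out of bounds for axis 1 with size 3"
```

So my guess is that the map images still contain raw grayscale values rather than class indices in `[0, n_classes)`.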

This is the only component that fails at the moment. I've tested the prediction and evaluation phases using the pre-trained data and both seem to complete successfully. Any assistance on how I could use the training script with custom datasets would be appreciated.

Thank you

@mitmul
Owner

mitmul commented Mar 17, 2018

@InfectedPacket Thank you for trying my code. If you don't change anything in the code, does the training run successfully?
