Hello,
I am currently trying to automate parts of this project and am running into difficulties during the training phase in CPU mode: it throws an IndexError and appears to hang the entire training run. I am using a very small subset of the mass_buildings dataset, i.e. 8 training images and 2 validation images. The purpose is only to test the pipeline, not to get accurate results at the moment. Below is the state of the installation and the steps I am using:
System:
uname -a
Linux user-VirtualBox 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Additionally, Boost 1.59.0 and OpenCV 3.0.0 have been built and installed from source, and both installs appear successful. The utils module also builds successfully.
I have downloaded only a small subset of the mass_buildings dataset:
# ls -R ./data/mass_buildings/train/
./data/mass_buildings/train/:
map sat
./data/mass_buildings/train/map:
22678915_15.tif 22678930_15.tif 22678945_15.tif 22678960_15.tif
./data/mass_buildings/train/sat:
22678915_15.tiff 22678930_15.tiff 22678945_15.tiff 22678960_15.tiff
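One thing worth checking at this point (this is a guess on my part, prompted by the error below): the raw mass_buildings map tiles are grayscale masks whose pixels may hold values like 0 and 255 rather than class indices, and any pixel value at or above the number of model classes would be an invalid label. A quick sanity check, using a synthetic mask in place of an actual tile (substitute the result of reading one of the map .tif files):

```python
import numpy as np

def out_of_range_labels(mask, n_classes):
    """Return the distinct pixel values in a label mask that fall outside [0, n_classes)."""
    vals = np.unique(mask)
    return vals[(vals < 0) | (vals >= n_classes)]

# Synthetic stand-in for a map tile: a 0/255 building mask.
mask = np.zeros((4, 4), dtype=np.int32)
mask[1:3, 1:3] = 255

bad = out_of_range_labels(mask, n_classes=3)
print(bad)  # any value printed here would be an invalid class label
```

If this reports values such as 255 for the real tiles, the maps need to be converted to class indices before training.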
Below is the output obtained by running the shells/create_datasets.sh script, modified only to build the mass_buildings data:
As you can see above, I've been using only 8 images and a single epoch. I let the process run for an entire night and it never completed, hence my belief that it simply hung. Running it under nohup also does not complete. When it is forcefully stopped with Ctrl-C, I get the following message:
# cat nohup.out
Traceback (most recent call last):
File "./scripts/train.py", line 313, in <module>
model, optimizer = one_epoch(args, model, optimizer, epoch, True)
File "./scripts/train.py", line 265, in one_epoch
optimizer.update(model, x, t)
File "/usr/local/lib/python3.5/dist-packages/chainer/optimizer.py", line 377, in update
loss = lossfun(*args, **kwds)
File "./models/MnihCNN_multi.py", line 31, in __call__
self.loss = F.softmax_cross_entropy(h, t, normalize=False)
File "/usr/local/lib/python3.5/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 152, in softmax_cross_entropy
return SoftmaxCrossEntropy(use_cudnn, normalize)(x, t)
File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 105, in __call__
outputs = self.forward(in_data)
File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 183, in forward
return self.forward_cpu(inputs)
File "/usr/local/lib/python3.5/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 39, in forward_cpu
p = yd[six.moves.range(t.size), numpy.maximum(t.flat, 0)]
IndexError: index 76 is out of bounds for axis 1 with size 3
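For what it's worth, the failing line indexes the softmax output along axis 1 (the class axis, size 3 here) with the target labels, so any label value of 3 or more raises exactly this error. A minimal NumPy reproduction of that indexing pattern (shapes are illustrative; np.arange stands in for six.moves.range):

```python
import numpy as np

# Softmax output for a single pixel over 3 classes.
yd = np.array([[0.2, 0.5, 0.3]])
t = np.array([76])  # a label value read from the map tile

try:
    # Same fancy-indexing pattern as softmax_cross_entropy.forward_cpu.
    p = yd[np.arange(t.size), np.maximum(t.flat, 0)]
except IndexError as e:
    print(e)  # index 76 is out of bounds for axis 1 with size 3
```

So the "index 76" in the traceback is a pixel in one of the map tiles whose value is 76, while the model only produces 3 class scores.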
This is the only component that fails at the moment. I've tested the prediction and evaluation phases using the pre-trained data, and both seem to complete successfully. Any assistance on how to use the training script with custom datasets would be appreciated.
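In case it helps the discussion: if the map tiles do turn out to hold raw 0/255 mask values, remapping them to class indices before (or during) dataset creation should avoid the IndexError. A sketch under that assumption (binarize_mask is a hypothetical helper; the threshold of 127 is an assumption for a binary building/non-building mask):

```python
import numpy as np

def binarize_mask(mask):
    """Map a raw 0/255 building mask to class indices {0, 1}."""
    return (np.asarray(mask) > 127).astype(np.int32)

raw = np.array([[0, 255],
                [255, 0]])
print(binarize_mask(raw))  # every 255 pixel becomes class 1
```

Whether this is the right place to apply the conversion (versus inside shells/create_datasets.sh) would depend on how the project's data pipeline is meant to consume the map tiles.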
Thank you