Hello,
I am currently trying to automate parts of this project and am running into difficulties during the training phase in CPU mode: it throws an IndexError and appears to hang the entire training run. I am using a very small subset of the mass_buildings dataset, i.e. 8 training images and 2 validation images. The purpose is only to test the pipeline, not to get accurate results at the moment. Below is the state of the installation and the steps I am using:
System:
uname -a
Linux user-VirtualBox 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Additionally, Boost 1.59.0 and OpenCV 3.0.0 have been built and installed from source, and both installs appear successful. The utils module also builds successfully.
I have downloaded only a small subset of the mass_buildings dataset:
# ls -R ./data/mass_buildings/train/
./data/mass_buildings/train/:
map sat
./data/mass_buildings/train/map:
22678915_15.tif 22678930_15.tif 22678945_15.tif 22678960_15.tif
./data/mass_buildings/train/sat:
22678915_15.tiff 22678930_15.tiff 22678945_15.tiff 22678960_15.tiff
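One thing worth checking at this point (this is a guess on my part, prompted by the error below): the raw mass_buildings map tiles are grayscale masks whose pixels may hold values like 0 and 255 rather than class indices, and any pixel value at or above the number of model classes would be an invalid label. A quick sanity check, using a synthetic mask in place of an actual tile (substitute the result of reading one of the map .tif files):

```python
import numpy as np

def out_of_range_labels(mask, n_classes):
    """Return the distinct pixel values in a label mask that fall outside [0, n_classes)."""
    vals = np.unique(mask)
    return vals[(vals < 0) | (vals >= n_classes)]

# Synthetic stand-in for a map tile: a 0/255 building mask.
mask = np.zeros((4, 4), dtype=np.int32)
mask[1:3, 1:3] = 255

bad = out_of_range_labels(mask, n_classes=3)
print(bad)  # any value printed here would be an invalid class label
```

If this reports values such as 255 for the real tiles, the maps need to be converted to class indices before training.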
Below is the output obtained by running the shells/create_datasets.sh script, modified only to build the mass_buildings data:
As you can see above, I've been using only 8 images and a single epoch. I let the process run for an entire night and it never completed, hence my belief that it simply hung. Running it under nohup also does not complete. When it is forcefully stopped with Ctrl-C, I get the following message:
# cat nohup.out
Traceback (most recent call last):
File "./scripts/train.py", line 313, in <module>
model, optimizer = one_epoch(args, model, optimizer, epoch, True)
File "./scripts/train.py", line 265, in one_epoch
optimizer.update(model, x, t)
File "/usr/local/lib/python3.5/dist-packages/chainer/optimizer.py", line 377, in update
loss = lossfun(*args, **kwds)
File "./models/MnihCNN_multi.py", line 31, in __call__
self.loss = F.softmax_cross_entropy(h, t, normalize=False)
File "/usr/local/lib/python3.5/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 152, in softmax_cross_entropy
return SoftmaxCrossEntropy(use_cudnn, normalize)(x, t)
File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 105, in __call__
outputs = self.forward(in_data)
File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 183, in forward
return self.forward_cpu(inputs)
File "/usr/local/lib/python3.5/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 39, in forward_cpu
p = yd[six.moves.range(t.size), numpy.maximum(t.flat, 0)]
IndexError: index 76 is out of bounds for axis 1 with size 3
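For what it's worth, the failing line indexes the softmax output along axis 1 (the class axis, size 3 here) with the target labels, so any label value of 3 or more raises exactly this error. A minimal NumPy reproduction of that indexing pattern (shapes are illustrative; np.arange stands in for six.moves.range):

```python
import numpy as np

# Softmax output for a single pixel over 3 classes.
yd = np.array([[0.2, 0.5, 0.3]])
t = np.array([76])  # a label value read from the map tile

try:
    # Same fancy-indexing pattern as softmax_cross_entropy.forward_cpu.
    p = yd[np.arange(t.size), np.maximum(t.flat, 0)]
except IndexError as e:
    print(e)  # index 76 is out of bounds for axis 1 with size 3
```

So the "index 76" in the traceback is a pixel in one of the map tiles whose value is 76, while the model only produces 3 class scores.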
This is the only component that fails at the moment. I've tested the prediction and evaluation phases using the pre-trained data, and both seem to complete successfully. Any assistance on how to use the training script with custom datasets would be appreciated.
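In case it helps the discussion: if the map tiles do turn out to hold raw 0/255 mask values, remapping them to class indices before (or during) dataset creation should avoid the IndexError. A sketch under that assumption (binarize_mask is a hypothetical helper; the threshold of 127 is an assumption for a binary building/non-building mask):

```python
import numpy as np

def binarize_mask(mask):
    """Map a raw 0/255 building mask to class indices {0, 1}."""
    return (np.asarray(mask) > 127).astype(np.int32)

raw = np.array([[0, 255],
                [255, 0]])
print(binarize_mask(raw))  # every 255 pixel becomes class 1
```

Whether this is the right place to apply the conversion (versus inside shells/create_datasets.sh) would depend on how the project's data pipeline is meant to consume the map tiles.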
Thank you