
All data nan #118

Closed
3bobo opened this issue Aug 18, 2018 · 10 comments

@3bobo

3bobo commented Aug 18, 2018

I've installed all the required packages, but when I run train.py --arch fcn8s (with ade20k set as the default dataset), I get some warnings and every result is nan, like this:

/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/metrics.py:31: RuntimeWarning: invalid value encountered in double_scalars
acc = np.diag(hist).sum() / hist.sum()
/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/metrics.py:32: RuntimeWarning: invalid value encountered in divide
acc_cls = np.diag(hist) / hist.sum(axis=1)
/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/metrics.py:33: RuntimeWarning: Mean of empty slice
acc_cls = np.nanmean(acc_cls)
/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/metrics.py:34: RuntimeWarning: invalid value encountered in divide
iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))
/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/metrics.py:35: RuntimeWarning: Mean of empty slice
mean_iu = np.nanmean(iu)
/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/metrics.py:36: RuntimeWarning: invalid value encountered in divide
freq = hist.sum(axis=1) / hist.sum()
/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/metrics.py:37: RuntimeWarning: invalid value encountered in greater
fwavacc = (freq[freq > 0] * iu[freq > 0]).sum()
('FreqW Acc : \t', 0.0)
('Overall Acc: \t', nan)
('Mean Acc : \t', nan)
('Mean IoU : \t', nan)
0it [00:00, ?it/s]
('FreqW Acc : \t', 0.0)
('Overall Acc: \t', nan)
('Mean Acc : \t', nan)
('Mean IoU : \t', nan)
0it [00:00, ?it/s]
('FreqW Acc : \t', 0.0)
('Overall Acc: \t', nan)
('Mean Acc : \t', nan)
('Mean IoU : \t', nan)
0it [00:00, ?it/s]
('FreqW Acc : \t', 0.0)
('Overall Acc: \t', nan)
('Mean Acc : \t', nan)
('Mean IoU : \t', nan)

What's wrong here, and how can I fix it? Thanks for your help.

@adam9500370
Contributor

adam9500370 commented Aug 18, 2018

0it [00:00, ?it/s]
It seems that the validation data path is wrong (0 files loaded).

https://github.com/meetshah1995/pytorch-semseg/blob/6d2a31c4e389dfe4c62301748a46d5db49019ce6/train.py#L30
https://github.com/meetshah1995/pytorch-semseg/blob/6d2a31c4e389dfe4c62301748a46d5db49019ce6/ptsemseg/loader/ade20k_loader.py#L25-L27
Due to the inconsistent split definitions between train.py and ptsemseg/loader/ade20k_loader.py, we should replace those lines in ptsemseg/loader/ade20k_loader.py with the following, so that the correct training and validation file lists are stored.

split_dict = {"training": "training", "val": "validation",}
for split in split_dict:
    file_list = recursive_glob(rootdir=self.root + 'images/' + split_dict[split] + '/', suffix='.jpg')
    self.files[split] = file_list
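For reference, here is a minimal self-contained sketch of how that loop fills the split lists. The recursive_glob below is a stand-in paraphrasing how the repo's helper is called here, and the temp-directory layout is a toy substitute for the real ADE20K dataset:

```python
import os
import tempfile

def recursive_glob(rootdir=".", suffix=""):
    # Walk rootdir and collect every file path ending with suffix.
    return [
        os.path.join(looproot, filename)
        for looproot, _, filenames in os.walk(rootdir)
        for filename in filenames
        if filename.endswith(suffix)
    ]

# Fake ADE20K layout: images/training/ and images/validation/, one .jpg each.
root = tempfile.mkdtemp() + "/"
for split_dir in ("training", "validation"):
    os.makedirs(os.path.join(root, "images", split_dir))
    open(os.path.join(root, "images", split_dir, "sample.jpg"), "w").close()

split_dict = {"training": "training", "val": "validation"}
files = {}
for split in split_dict:
    files[split] = recursive_glob(
        rootdir=root + "images/" + split_dict[split] + "/", suffix=".jpg"
    )

print(len(files["training"]), len(files["val"]))  # 1 1
```

With the original split names, the "val" key would glob a non-existent directory and yield an empty list, which is exactly the `0it [00:00, ?it/s]` seen above.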

@3bobo
Author

3bobo commented Aug 19, 2018

@adam9500370 I've replaced the code in ptsemseg/loader/ade20k_loader.py, and set the same path in config.json and in ade20k_loader.py, like this:

in ade20k_loader.py :

if __name__ == '__main__':
    local_path = '/home/zqk/datasets/ADE20K/ADE20K_2016_07_26'
    dst = ADE20KLoader(local_path, is_transform=True)

and in config.json :

  "ade20k":
  {
    "data_path": "/home/zqk/datasets/ADE20K/ADE20K_2016_07_26"
  }

but the results are still nan when I run train.py. Thanks for your help!

@adam9500370
Contributor

adam9500370 commented Aug 19, 2018

https://github.com/meetshah1995/pytorch-semseg/blob/6d2a31c4e389dfe4c62301748a46d5db49019ce6/ptsemseg/loader/ade20k_loader.py#L26
Since this line concatenates self.root (data_path) with the rest of the path as plain strings, you end up reading a path like the following:

absolute_path = self.root + 'images/'
print(absolute_path)
# Output:  /home/zqk/datasets/ADE20K/ADE20K_2016_07_26images/

Therefore, you should set the path with a trailing slash: "/home/zqk/datasets/ADE20K/ADE20K_2016_07_26/".
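As an aside, using os.path.join instead of raw string concatenation avoids this class of bug entirely. A small sketch with a hypothetical path:

```python
import os

# Hypothetical dataset root, deliberately without a trailing slash.
root = "/home/user/datasets/ADE20K/ADE20K_2016_07_26"

# Raw concatenation silently fuses the directory name with the next component:
print(root + "images/")  # .../ADE20K_2016_07_26images/  (wrong)

# os.path.join inserts the separator whether or not the slash is there:
print(os.path.join(root, "images"))        # .../ADE20K_2016_07_26/images
print(os.path.join(root + "/", "images"))  # .../ADE20K_2016_07_26/images
```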

@3bobo
Author

3bobo commented Aug 19, 2018

Oh, thanks. But now another error appears:

Traceback (most recent call last):
  File "train.py", line 163, in <module>
    train(args)
  File "train.py", line 82, in train
    for i, (images, labels) in enumerate(trainloader):
  File "/home/zqk/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/home/zqk/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "/home/zqk/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/loader/ade20k_loader.py", line 48, in __getitem__
    img, lbl = self.augmentations(img, lbl)
  File "/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/augmentations.py", line 15, in __call__
    img, mask = Image.fromarray(img, mode='RGB'), Image.fromarray(mask, mode='L')
  File "/home/zqk/anaconda2/lib/python2.7/site-packages/PIL/Image.py", line 2436, in fromarray
    raise ValueError("Too many dimensions: %d > %d." % (ndim, ndmax))
ValueError: Too many dimensions: 3 > 2.

Is it because the dimensions of ADE20K's images don't match the dimensions the code expects?

@adam9500370
Contributor

@3bobo
Copy link
Author

3bobo commented Aug 19, 2018

ok , thank you very much!

@3bobo
Author

3bobo commented Aug 19, 2018

I used the second way, so I keep the original 3146 classes of ADE20K: I set self.n_classes = 3146 in ade20k_loader.py and modified fcn.py as you said, from

 l2.weight.data = l1.weight.data[:n_class, :].view(l2.weight.size())
 l2.bias.data = l1.bias.data[:n_class]

to:

 copy_idx = l1.weight.data.shape[0] if l1.weight.data.shape[0] < n_class else n_class
 l2.weight.data[:copy_idx, :] = l1.weight.data[:copy_idx, :]
 l2.bias.data[:copy_idx] = l1.bias.data[:copy_idx]

but a new error appears:

  File "/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/models/fcn.py", line 343, in init_vgg16_params
    l2.weight.data[:copy_idx, :] = l1.weight.data[:copy_idx, :]
RuntimeError: The expanded size of the tensor (1) must match the existing size (4096) at non-singleton dimension 3

thanks for your help!

@adam9500370
Contributor

adam9500370 commented Aug 19, 2018

We can add .view(...) so that the weight shapes of l1 and l2 match:

l2.weight.data[:copy_idx, :] = l1.weight.data[:copy_idx, :].view(l2.weight[:copy_idx, :].size())

Or you can set copy_fc8=False to skip loading the last pre-trained classifier layer.
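To see why the .view(...) is needed, here is the shape mismatch replayed in NumPy. The shapes are taken from the traceback (fc8 has 1000 outputs, the new classifier 3146); the arrays are random stand-ins for the real weights:

```python
import numpy as np

n_class = 3146
l1_weight = np.random.randn(1000, 4096)      # fc8: 1000 ImageNet classes x 4096 inputs
l2_weight = np.zeros((n_class, 4096, 1, 1))  # 1x1-conv classifier, n_class channels

# Copy only the output rows both layers share, reshaping to the conv layout.
copy_idx = min(l1_weight.shape[0], n_class)
l2_weight[:copy_idx] = l1_weight[:copy_idx].reshape(copy_idx, 4096, 1, 1)

# Without the reshape, assigning a (1000, 4096) array into a (1000, 4096, 1, 1)
# slice raises a broadcasting error (in NumPy and torch alike); with it, the
# shared rows are copied in place and the extra rows stay at their init values.
print(np.allclose(l2_weight[:copy_idx, :, 0, 0], l1_weight[:copy_idx]))  # True
```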

@3bobo
Author

3bobo commented Aug 19, 2018

I tried changing the code as you said, and got a size error:

  File "/home/zqk/work/pytorch_seg/pytorch-semseg/ptsemseg/models/fcn.py", line 340, in init_vgg16_params
    l2.weight.data = l1.weight.data[:n_class, :].view(l2.weight.size())
RuntimeError: invalid argument 2: size '[3146 x 4096 x 1 x 1]' is invalid for input with 4096000 elements at /opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/TH/THStorage.c:41

so I set copy_fc8=False, but that brings back the original error:

  File "/home/zqk/anaconda2/lib/python2.7/site-packages/PIL/Image.py", line 2436, in fromarray
    raise ValueError("Too many dimensions: %d > %d." % (ndim, ndmax))
ValueError: Too many dimensions: 3 > 2.

I checked all the corrections from this issue:

  1. https://github.com/meetshah1995/pytorch-semseg/blob/dfecf4973c702c9be4b55d5066bd4a79bcb0c1bb/ptsemseg/loader/ade20k_loader.py#L25-L27
     to:
         split_dict = {"training": "training", "val": "validation",}
         for split in split_dict:
             file_list = recursive_glob(rootdir=self.root + 'images/' + split_dict[split] + '/', suffix='.jpg')
             self.files[split] = file_list

  2. https://github.com/meetshah1995/pytorch-semseg/blob/dfecf4973c702c9be4b55d5066bd4a79bcb0c1bb/ptsemseg/loader/ade20k_loader.py#L20
     to:
         self.n_classes = 3146

  3. https://github.com/meetshah1995/pytorch-semseg/blob/dfecf4973c702c9be4b55d5066bd4a79bcb0c1bb/ptsemseg/loader/ade20k_loader.py#L68
     to:
         if not np.all(classes == np.unique(lbl)):
             print("WARN: resizing labels yielded fewer classes")

         if not np.all(np.unique(lbl) < self.n_classes):
             raise ValueError("Segmentation map contained invalid class values")

Did I miss some important corrections?

@adam9500370
Contributor

You need to move line 63 to line 41 in ade20k_loader.py to avoid ValueError: Too many dimensions: 3 > 2., since lbl = self.encode_segmap(lbl) must run before the augmentation call img, lbl = self.augmentations(img, lbl).
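A sketch of why the ordering matters: the raw ADE20K label is a 3-channel RGB image, and PIL's Image.fromarray(mask, mode='L') only accepts a 2-D array, so the label must be collapsed to a class map before the augmentations run. encode_segmap_stub below is a hypothetical stand-in for ADE20KLoader.encode_segmap, not the repo's actual code:

```python
import numpy as np

def encode_segmap_stub(lbl_rgb):
    # Hypothetical stand-in: collapse an (H, W, 3) RGB label
    # into an (H, W) integer class map.
    lbl_rgb = lbl_rgb.astype(np.int64)
    return (lbl_rgb[:, :, 0] // 10) * 256 + lbl_rgb[:, :, 1]

lbl_rgb = np.zeros((4, 4, 3), dtype=np.uint8)  # toy 3-channel label

lbl = encode_segmap_stub(lbl_rgb)  # 2-D: safe to pass to Image.fromarray(mode='L')
print(lbl_rgb.ndim, lbl.ndim)      # 3 2
```

Passing lbl_rgb straight to the augmentations reproduces the "Too many dimensions: 3 > 2" error; passing the encoded lbl does not.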
