This repository has been archived by the owner on Nov 3, 2022. It is now read-only.

DenseNetFCN not training to expected performance #63

Closed
ahundt opened this issue Mar 31, 2017 · 17 comments

Comments

@ahundt
Collaborator

ahundt commented Mar 31, 2017

I'm training and testing DenseNetFCN on Pascal VOC2012

Could I get advice on next steps to take to debug and improve the results?

To do so, I'm using the train.py training script in my fork of Keras-FCN together with the DenseNetFCN implementation in keras-contrib with #46 applied. For DenseNetFCN, that PR mostly changes formatting for pep8, though the regular DenseNet is more heavily modified.

I use Keras-FCN because we don't have an FCN training script here in keras-contrib yet, though I plan to adapt and submit one here once things work properly. Neither paper publishes results on Pascal VOC, but the original DenseNet achieves close to state-of-the-art results on ImageNet and CIFAR-10/CIFAR-100, and DenseNetFCN performed well on CamVid and the Gatech dataset. Given that, I expected DenseNetFCN might fall short of state of the art, but figured that in the worst case it should still exceed 50% mIOU and reach around 70-80% pixel accuracy, since it shares many similarities with ResNet and performed quite well on the much smaller CamVid dataset.

DenseNet FCN configuration I'm using

    return densenet.DenseNetFCN(input_shape=(320, 320, 3),
                                weights=None, classes=classes,
                                nb_layers_per_block=4,
                                growth_rate=13,
                                dropout_rate=0.2)

This is very close to the configuration FC-DenseNet56 from the 100 layers tiramisu aka DenseNetFCN paper.

Sparse training accuracy

Here is what I'm seeing as I train with the Adam optimizer and a learning rate of 0.1:

lr: 0.100000
Epoch 2/450
366/366 [==============================] - 287s - loss: 62.5131 - sparse_accuracy_ignoring_last_label: 0.3657
[...snip...]
lr: 0.100000
Epoch 148/450
366/366 [==============================] - 286s - loss: 82.2138 - sparse_accuracy_ignoring_last_label: 0.3375
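For reference, sparse_accuracy_ignoring_last_label is the Keras-FCN metric; a rough pure-Python sketch of the quantity it computes (assuming, as in Keras-FCN, that the void/ignore label is the index one past the last real class) might look like:

```python
def sparse_accuracy_ignoring_last_label(y_true, y_pred_classes, num_classes):
    """Pixel accuracy over flattened labels, skipping the void/ignore label.

    Hypothetical reference implementation; the real metric operates on
    tensors inside the Keras graph rather than Python lists.
    """
    correct = total = 0
    for t, p in zip(y_true, y_pred_classes):
        if t == num_classes:  # void label (e.g. 21 for 21-class Pascal VOC)
            continue
        total += 1
        correct += (t == p)
    return correct / total if total else 0.0
```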

Similar networks and training scripts verified for a baseline

  1. I've successfully trained AtrousFCNResNet50_16s from scratch up to around 7x% pixel accuracy on the Pascal VOC2012 test set.
  2. AtrousFCNResNet50_16s, initialized with fchollet's pretrained Keras ResNet weights (downloaded via the Keras-FCN get_weights_path and transfer_FCN scripts), trained without issue to 0.56025 mIOU and around 8x% pixel accuracy.
  3. I've been able to train plain DenseNet on cifar 10 up to expected levels of accuracy published in the original papers.
  4. I've been able to train DenseNetFCN on a single image, and predict pixels with 99% accuracy on the image itself with all augmentation disabled, and mid 9x% accuracy on augmented versions of the training image.
    • This should at least demonstrate that the network can be trained, and the labels aren't saved incorrectly. I know (4) isn't a valid experiment for final conclusions, it just helps eliminate a variety of possible bugs.
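The single-image test in (4) is essentially an overfitting sanity check, which can be wrapped into a small debugging harness; train_step here is a hypothetical callable that runs one update on the fixed image/label pair and returns training accuracy:

```python
def can_overfit_one_sample(train_step, max_steps=500, target_acc=0.99):
    """Return the step at which a single sample is (nearly) perfectly fit,
    or None. If a segmentation model cannot overfit one image, suspect the
    loss function, label encoding, or data pipeline before the architecture.
    """
    for step in range(1, max_steps + 1):
        if train_step() >= target_acc:
            return step
    return None
```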

Verifying training scripts with AtrousFCNResNet50_16s

For comparison, here are AtrousFCNResNet50_16s test-set training results (this model can be brought to 0.661 mIOU with augmented Pascal VOC). I also trained AtrousFCNResNet50_16s from scratch; the results below are from the pretrained-weights run:

PASCAL VOC trained with pretrained imagenet weights
IOU:
[ 0.90873648  0.74772504  0.44416247  0.57239141  0.50728778  0.51896323
  0.69891196  0.66111323  0.64380596  0.19145411  0.49733934  0.32720705
  0.5488089   0.49649298  0.6157158   0.75780816  0.35492963  0.57446371
  0.32721105  0.63200183  0.53067634]
meanIOU: 0.550343
pixel acc: 0.896132
150.996609926s used to calculate IOU.
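For reference, meanIOU above is the average of the per-class IOUs. A minimal sketch of the standard computation from a confusion matrix (not the timed evaluation code Keras-FCN actually runs) looks like:

```python
def mean_iou(confusion):
    """Mean intersection-over-union.

    confusion[i][j] = number of pixels with true class i predicted as class j.
    Per class: IOU = TP / (TP + FP + FN); classes absent from both ground
    truth and predictions are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp
        fn = sum(confusion[c]) - tp
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```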

Download links

Pascal VOC
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar  

Augmented 
http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz

Automated VOC download script

Thanks

@titu1994 and @aurora95, since I'm using your respective DenseNetFCN and Keras-FCN implementations, could I get any comments or advice you might have on this? I'd appreciate your thoughts.

All, thanks for giving this a look, as well as your consideration and advice!

@ahundt ahundt changed the title DenseNetFCN not training to performance reported in reference paper DenseNetFCN not training to expected performance Mar 31, 2017
@ahundt
Collaborator Author

ahundt commented Mar 31, 2017

Update: One obvious problem is that I had the wrong learning rate for Adam, which I changed from 0.1 to the default 0.001. Training is still in progress, but perhaps this will get to where I expect.

lr: 0.001000
Epoch 8/450
366/366 [==============================] - 264s - loss: 3.8106 - sparse_accuracy_ignoring_last_label: 0.5282
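A pure-Python sketch of one textbook Adam update (not code from either repository) shows why 0.1 was so damaging: thanks to bias correction, the very first step moves each weight by roughly lr regardless of how small its gradient is.

```python
def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (v_hat ** 0.5 + eps)  # step ~= lr * sign(grad) at t=1
    return param, m, v
```

At t=1 the bias-corrected ratio m_hat / sqrt(v_hat) is approximately sign(grad), so with lr=0.1 every weight jumps by about 0.1 on the first step.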

@titu1994
Contributor

titu1994 commented Apr 4, 2017

@ahundt How was the training with the corrected learning rate? In my case, it seems to perform well enough, but does not beat the benchmark yet. Perhaps deeper and wider DenseNetFCNs will do the trick.

@ahundt
Collaborator Author

ahundt commented Apr 4, 2017

@titu1994 I've done a few things since the last update. In your case, did you use the same scripts I reference here or did you try something else?

Here is what I've done:

  1. When I ran training after my last post, it appeared to quickly reach 65% pixel accuracy. However, when I looked at the actual results, it turned out the network had fallen into a local minimum where it essentially labels all pixels as the background class.

  2. I'm going to add some class weighting to see if it addresses the problem.

  3. There was also a bug with my new parameters to select a top for segmentation or classification, fixed in 2a2c176.

  4. I'm also going to try training on COCO to see if using pretrained weights can improve results. The enet repository looks like it has a nice script for this purpose, and my data_coco.py script can be used to download and extract the dataset with the command python data_coco.py coco_setup.

  5. I've verified that the resnet training with pre-trained weights works quite well!

atrous_resnet_prediction2007_000129
groundtruth2007_000129
original2007_000129

  6. Performance may also benefit from employing an Atrous DenseNet in the same way as the Atrous ResNet from Keras-FCN, and converting ImageNet-based DenseNet pretrained weights. Here are a bunch of links that may be useful for that purpose:
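For the class weighting in (2), one common recipe is median-frequency balancing (Eigen & Fergus); this sketch takes per-class pixel counts and is illustrative rather than code from either repository:

```python
def median_frequency_weights(pixel_counts):
    """Class weights via median-frequency balancing: weight_c = median_freq / freq_c.

    Rare classes (e.g. small foreground objects vs. the dominant background)
    receive weights above 1, pushing the loss away from the all-background
    local minimum.
    """
    total = sum(pixel_counts)
    freqs = [c / total for c in pixel_counts]
    ordered = sorted(freqs)
    n = len(ordered)
    median = ordered[n // 2] if n % 2 else (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return [median / f for f in freqs]
```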

@titu1994
Contributor

titu1994 commented Apr 4, 2017

No, I used a private dataset, one that normal networks like UNet have difficulty with.

DenseNet seems to perform better than UNet, but then it plateaus, and improvement beyond that point is negligible. In my case, though, there was no single-class problem.

@aurora95

aurora95 commented Apr 4, 2017

I'm not sure but it shouldn't label all pixels as background, there must be something wrong in the code or training settings. Could you please give a link to the code you are using?

@ahundt
Collaborator Author

ahundt commented Apr 4, 2017

@titu1994 sorry, I didn't realize I had submitted my most recent post before editing it; there is now a lot of additional information above.

@aurora95 Here is my Keras-FCN fork where I'm training with DenseNet, with options for Atrous_DenseNet and DenseNet_FCN as generated after cloning keras-contrib with #46 applied. Most of the modification is in models.py, where I import this keras-contrib repository, and train.py, where I changed the file paths, switched from SGD to Adam, and changed the learning rate appropriately. A few additional minor changes were also needed for compatibility of image and batch dimensions.

@titu1994
Contributor

titu1994 commented Apr 4, 2017

Hmm, I don't think there are weights for DenseNet trained on ImageNet. I'll have to do a more thorough search to be sure.

Atrous DenseNet seems nice, could you try implementing it? I'll give it a look, but I don't think just adding the atrous rate parameter will translate to better performance.

@ahundt
Collaborator Author

ahundt commented Apr 4, 2017

@titu1994 they are trained on ImageNet with DenseNet-Caffe and the original DenseNet repository. See my densenet + imagenet links two posts up for those, along with some Caffe-to-Keras conversion scripts that might help; I haven't had a chance to try it all out yet.

@titu1994
Contributor

titu1994 commented Apr 4, 2017

This is great news! I'll be sure to look at it in some time, and if possible port the weights to Keras.

However, if it's in Caffe, I won't be able to convert it, since I'm on Windows.

@ahundt
Collaborator Author

ahundt commented Apr 4, 2017

@titu1994 Also, Atrous DenseNet is already implemented in #46; the next steps would be converting the pretrained ImageNet weights or training from scratch.

If the pretrained weights don't work, I have access to a distributed GPU cluster... but it will take quite some time before I have all of that implemented, integrated, and tested. tf-slim or tensorpack could help there, once paired with a script to copy weights between TF and Keras models.

@ahundt
Collaborator Author

ahundt commented Apr 10, 2017

Found another bug that came from combining the two scripts: the loss function softmax_sparse_crossentropy_ignoring_last_label applies softmax a second time. The Keras-FCN models never apply softmax themselves; the loss function applies it for them. The keras-contrib densenet models apply softmax by default, so combining keras-contrib models with the Keras-FCN loss results in softmax being applied twice.

I'm now training a new model with the bugfix on Pascal VOC 2012.
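The effect of the bug described above is easy to see in isolation: softmax applied to an already-softmaxed vector pushes the distribution toward uniform, shrinking both the confidence and the gradient signal.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 0.0, 0.0]
once = softmax(logits)   # peaked:    ~[0.79, 0.11, 0.11]
twice = softmax(once)    # flattened: ~[0.50, 0.25, 0.25]
```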

@ahundt
Collaborator Author

ahundt commented Apr 19, 2017

Other optimizers & hyperparameters may help; see tensorflow/tensorflow#9175.

It also appears that https://github.com/0bserver07/One-Hundred-Layers-Tiramisu is an independent Keras implementation, and it has run into similar training limitations.

@ahundt
Collaborator Author

ahundt commented Apr 20, 2017

Update: my fork of Keras-FCN has been merged to master, with instructions in the README.md.
It looks like a potential DenseNet weight conversion process is in https://github.com/nicolov/segmentation_keras, which uses github.com/ethereon/caffe-tensorflow, though I got an error when trying to run the conversion script (nicolov/segmentation_keras#13).

@ahundt
Collaborator Author

ahundt commented Apr 20, 2017

It seems the original authors explain in SimJeg/FC-DenseNet#10, on their FC-DenseNet repository, that they found DenseNetFCN performance isn't very good on Pascal VOC. @0bserver07 @titu1994 you will be interested in this info.

@titu1994
Contributor

I have seen similarly poor performance on a private dataset. It seems the model learns rapidly up to a certain point, then cannot improve at all.

UNet performs well on that dataset, far better than DenseNetFCN.

Perhaps the implementation is correct, but the model is simply not able to learn properly on all datasets?

@ahundt
Collaborator Author

ahundt commented Apr 20, 2017

@titu1994 that seems likely; CamVid is a much simpler dataset than Pascal VOC 2012. The real test would likely be to try training on CamVid itself.

@ahundt
Collaborator Author

ahundt commented May 8, 2017

Another interesting difference is the use of a ceil mode in pooling:
ethereon/caffe-tensorflow#112

I'm a bit doubtful that this is the key cause of the performance difference between the paper and the Keras implementations, however, especially considering the tiramisu paper didn't use pretrained weights.
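For context, ceil versus floor only matters when the pooling window doesn't tile the input evenly; a minimal sketch of the standard output-size formula (Caffe pools with ceil, TensorFlow's 'VALID' pooling uses floor):

```python
import math

def pool_out_size(in_size, pool, stride, ceil_mode=False):
    """Spatial output length of a pooling layer along one dimension."""
    f = math.ceil if ceil_mode else math.floor
    return f((in_size - pool) / stride) + 1

# A 7-pixel input with 2x2/stride-2 pooling: floor gives 3, ceil gives 4,
# so downsample/upsample paths can end up off by one pixel per stage.
```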
