Second block should have 2 convolutions #9

Open
prhbrt opened this issue Jan 16, 2017 · 26 comments

Comments

@prhbrt

prhbrt commented Jan 16, 2017

I noticed that your second block in the diagram in your paper

[figure: V-Net architecture diagram from the paper, source: https://arxiv.org/abs/1606.04797]

has 2 convolutional layers, yet here it has one:

https://github.com/faustomilletari/VNet/blob/master/Prototxt/train_noPooling_ResNet_cinque.prototxt#L143

Shouldn't there be 2? And shouldn't the third block have 3?

https://github.com/faustomilletari/VNet/blob/master/Prototxt/train_noPooling_ResNet_cinque.prototxt#L230

@faustomilletari
Owner

looking into it...

@mattmacy

mattmacy commented Mar 7, 2017

I'm translating this to pytorch and have noticed the same thing, only more so. It might help if you put each layer on a single line, like:
http://lmb.informatik.uni-freiburg.de/resources/opensource/3dUnet_miccai2016_with_BN.prototxt
Prototxt is horrible no matter how it's formatted, but it's much easier to see what's going on that way.

I've lumped together the up/down convolution and the subsequent convolution(s) into a single logical layer. The first argument to Down/Up is the number of input channels and the second argument is the number of convolutions. The following is how the network is actually implemented in ResNet_cinque:

class VNet(nn.Module):
    def __init__(self):
        super(VNet, self).__init__()
        self.in_tr = InputTransition(16)
        self.down_tr32 = DownTransition(16, 1)
        self.down_tr64 = DownTransition(32, 2)
        self.down_tr128 = DownTransition(64, 3)
        self.down_tr256 = DownTransition(128, 3)
        self.up_tr256 = UpTransition(256, 3)
        self.up_tr128 = UpTransition(128, 2)
        self.up_tr64 = UpTransition(64, 1)
        self.up_tr32 = UpTransition(32, 1)
        self.out_tr = OutputTransition(16)

This is what it would look like if it were implemented per the diagram:

class VNet(nn.Module):
    def __init__(self):
        super(VNet, self).__init__()
        self.in_tr = InputTransition(16)
        self.down_tr32 = DownTransition(16, 2)
        self.down_tr64 = DownTransition(32, 3)
        self.down_tr128 = DownTransition(64, 3)
        self.down_tr256 = DownTransition(128, 3)
        self.up_tr256 = UpTransition(256, 3)
        self.up_tr128 = UpTransition(128, 3)
        self.up_tr64 = UpTransition(64, 2)
        self.up_tr32 = UpTransition(32, 1)
        self.out_tr = OutputTransition(16)
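For reference, a minimal sketch of what a DownTransition(in_channels, n_convs) along these lines might look like (illustrative only, not the exact code from my port; it assumes the paper's 5x5x5 convolutions, PReLU, and a residual add within each block):

import torch.nn as nn

class DownTransition(nn.Module):
    def __init__(self, in_channels, n_convs):
        super(DownTransition, self).__init__()
        out_channels = 2 * in_channels
        # 2x2x2 down-convolution with stride 2 halves the resolution and doubles the channels.
        self.down = nn.Conv3d(in_channels, out_channels, kernel_size=2, stride=2)
        self.relu = nn.PReLU(out_channels)
        # n_convs 5x5x5 convolutions at the reduced resolution.
        self.convs = nn.Sequential(*[
            nn.Sequential(
                nn.Conv3d(out_channels, out_channels, kernel_size=5, padding=2),
                nn.PReLU(out_channels))
            for _ in range(n_convs)])

    def forward(self, x):
        down = self.relu(self.down(x))
        return self.convs(down) + down  # residual connection within the block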

I haven't ported the Dice loss function or the Vnet equivalent to Unet3D's value deformation layer, and with only a single Pascal Titan X I can't say how much of a difference it all makes in practice. I don't have any spare time at the moment, but next month I may compare them both on the LUNA16 data set.

@prhbrt
Author

prhbrt commented Mar 7, 2017

Correct, I also saw several similar inconsistencies; I just wanted to start with one :) I've written code in Keras, by the way.

@mattmacy

mattmacy commented Mar 7, 2017

@prinsherbert have you extracted the data augmentation code and dice loss function from his Caffe fork? I started extracting the former from the 3D Unet caffe patch and it's rather painful.

Also have you tested the results of what's in the diagram versus what's in the prototxt?

@gattia

gattia commented Mar 7, 2017

@prinsherbert, similar to @mattmacy's comment about the data augmentation, have you written your own Keras code for that? I've been playing with a Keras implementation but haven't tried data augmentation yet: I haven't had the time to write my own image manipulations, and Keras only has augmentations for 2D images out of the box.

@mattmacy

mattmacy commented Mar 8, 2017

@prinsherbert another irregularity that I'm not completely comfortable with (but is actually not a discrepancy between the prototxt and the paper) is that in the sixth block you're actually reducing the number of channels by 4x, going from 256->64. Once again, as implemented it may well be for the best, but for symmetry's sake I wonder if there should be an additional up convolution layer before we start adding skip connections.

@prhbrt
Author

prhbrt commented Mar 8, 2017

I played with some Cancer Imaging Archive data, but not with augmentation. I would suggest creating three Gaussian random fields (blurred 3D Gaussian noise with some variance). Then you set each Augmented[i,j,k] to Original[ grf_x[i], grf_y[j], grf_z[k] ]. Since the indices into Original are floating point, you'd need some interpolation. Not sure if @faustomilletari 's paper does this, but it should be quick and easy to implement.
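Something like this, as a minimal numpy/scipy sketch (illustrative; this variant displaces every voxel by a smoothed random field per axis, rather than indexing one field per coordinate):

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(volume, alpha=30.0, sigma=4.0, rng=None):
    rng = rng or np.random.default_rng()
    # Three Gaussian random fields: white noise blurred with a Gaussian kernel.
    dx, dy, dz = (alpha * gaussian_filter(rng.standard_normal(volume.shape), sigma)
                  for _ in range(3))
    i, j, k = np.meshgrid(*map(np.arange, volume.shape), indexing='ij')
    # The displaced indices are floating point, so interpolate (order=1 is trilinear).
    return map_coordinates(volume, [i + dx, j + dy, k + dz], order=1, mode='nearest')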

Unfortunately I am waiting for my CT scans, there should be thousands of them, and I've decided to put a pin in it until I have them. Not sure if I'll use augmentation, but probably I will.

@prhbrt
Author

prhbrt commented Mar 8, 2017

@mattmacy I found a whole bunch of things that puzzled me, were inconsistent, or were plain wrong. I have no idea about the details, but I also had some trouble getting the downsampling and the reported number of channels in the paper and prototxt to be consistent or sensible. I decided to report some issues here and allow @faustomilletari to have a look at them.

@faustomilletari No offence by the way, I am trying to be constructive; your work seems relevant nevertheless. One question I wanted to answer is whether the system performs significantly better than "average probability of all training ground truths" and the like, since prostates will typically be at the same location across humans up to some level.

@faustomilletari
Owner

faustomilletari commented Mar 8, 2017 via email

@prhbrt
Author

prhbrt commented Mar 8, 2017

So a good next step, I assume, would be to turn the prototxt into a new schematic image, to get a good idea of what is happening and allow easy comparison with the paper's schema?

This would only work if several people reviewed the new schematic image, to avoid repeating the same problems.

@faustomilletari
Owner

faustomilletari commented Mar 8, 2017 via email

@prhbrt
Author

prhbrt commented Mar 8, 2017

I would think porting to tensorflow is as complicated as the other ports (Keras, which can run on tensorflow, and pytorch) if it is unclear what the paper's network was. I don't see it as another option so much as a nice extra to have. But that is also because I am not familiar with caffe and prototxt.

I am speaking a bit from polite frustration, because I was going back and forth between network architectures trying to make things work according to the prototxt and the schema (in the hope of saving someone else the same frustration :) ).

@faustomilletari
Owner

faustomilletari commented Mar 8, 2017 via email

@prhbrt
Author

prhbrt commented Mar 8, 2017

But there are no bugs in the code, since the results are based on it; the bugs are in the paper, right?

The code is the 'ground truth'.

@prhbrt
Author

prhbrt commented Mar 8, 2017

Well, I'm just commenting on what would make sense to me :P But it's a free GitHub; anyone can do as they please.

@faustomilletari
Owner

faustomilletari commented Mar 8, 2017 via email

@mattmacy

mattmacy commented Mar 8, 2017

@faustomilletari I couldn't tell from the paper since your respective datasets were quite different, but how do your results compare with 3D Unet (which appeared to come out almost concomitantly)?

I really appreciate you putting the prototxt up as it makes it easier to understand what you were doing than simply looking at the diagram in the paper.

I'm also not being critical. I'm just trying to solve a segmentation problem whilst leveraging prior art. Any implementation issues are just opportunities for better understanding the space.

If someone else wants to reimplement this in TF, that would be yet another data point. However, even though I have the most experience with TF, I find that its API is essentially verbal vomit. At some point Keras will be reimplemented as the "official" interface to TF, which will probably be palatable. Until then I don't see any reason to use it for development unless required to by a course (it may well be the best solution for deploying to a mobile OS).

Not only is pytorch noted for training as much as several times faster than other frameworks, but I find the API extremely clean compared with all the alternatives I've looked at. See my work in progress to get a sense for it: https://github.com/mattmacy/vnet.pytorch/blob/master/vnet.py

When working on something as esoteric as biomedical imaging, I think the performance and ease of use of the framework matter more than its mainstream-ness. That said, I instantly got 8 stars checking in something called vnet.pytorch. Maybe it would have been 80 if it were vnet.tensorflow.

I've only just written a loader for the preprocessed LUNA16 dataset and still need to port over the loss function before I can start training today; I don't expect good results until I implement the data augmentation functions.

@faustomilletari
Owner

faustomilletari commented Mar 8, 2017 via email

@mattmacy

mattmacy commented Mar 8, 2017

Thanks for the response. It's really hard for me to evaluate how much a given architecture contributes to the SotA when it's evaluated on a different data set from comparable architectures, and there's no way to repeat the authors' experiments because they haven't made the data available in the right format - assuming it's publicly available at all. It might be worthwhile doing a survey paper with different models and different public data sets, with all of the models hosted on GitHub for direct peer review. I'm not an academic, but it might be an edifying exercise.

Ciao,
Matt

@faustomilletari
Owner

faustomilletari commented Mar 8, 2017 via email

@mattmacy

mattmacy commented Mar 11, 2017

@faustomilletari Just to update you on where I'm at before I get your thoughts on how to cope with the massive data set sizes.
I ported the loss function:
https://github.com/mattmacy/torchbiomed/blob/master/torchbiomed/loss.py
I'm assuming that I should treat channel 0 as the target segmentation channel and channel 1 as the background. To do this I need to generate a second, inverted mask to match against the background channel. Have I understood that correctly?
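In code, my assumption looks roughly like this (a hedged sketch of a two-channel Dice loss in pytorch, not the torchbiomed code linked above):

import torch

def dice_loss(probs, mask, eps=1e-5):
    # probs: (N, 2, D, H, W) softmax output; mask: (N, D, H, W) binary target.
    # Channel 0 = foreground, channel 1 = the inverted (background) mask.
    target = torch.stack([mask, 1 - mask], dim=1).float()
    dims = (0, 2, 3, 4)
    intersection = (probs * target).sum(dims)
    union = probs.sum(dims) + target.sum(dims)
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()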

This is what the network graph ends up looking like for me:
https://github.com/mattmacy/vnet.pytorch/blob/master/images/vnet.png

My inputs are lung CTs with the voxels isotropically scaled to 2mm. In order to fit the nodule masks from all data points I ended up having to make the input volumes 192x160x192 voxels (384mm x 320mm x 384mm). I've been trying to train with a mini-batch size of 2, but in order for forward propagation to complete I had to replace the PReLU with in-place ReLU. Even then backprop would still run out of memory until I reduced the mini-batch size to 1. Realistically I'll need to downsample still further, to 2.5mm^3 voxels, in order to pass the full volume to the current architecture.

Which brings me to what I wanted your opinion on. The nodules are all 30 mm or less in diameter, so if I did a sliding-window scan on the z-axis, passing in 64mm x 320mm x 384mm at a time, I would be guaranteed that one volume fully contains each nodule. I could take this a step further and just pass in a 64mm^3 volume at a time. That would bring me down to 32768 voxels, which would allow me to run more normal-sized mini-batches on my (already obsolete :-/) Pascal Titan X. What are your thoughts on that approach? It seems like one would still get the benefits of volumetric segmentation with more tractable data quantities. However, one major concern I have with that approach is that it would induce an even bigger class imbalance than we're starting out with. At least when doing whole lung scans, 60% of the patients in the data set have at least one nodule. With smaller volumes, 90% of the scans will be negative. Would it be worthwhile (or perhaps even necessary) to re-balance the dataset in the loading process so that some reasonable fraction (say 25%) of the volumes contained nodules?
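Something along these lines is what I have in mind for the re-balanced loader (a sketch with hypothetical helpers; nodule_centers would come from the annotation masks, and 32 voxels = 64mm at 2mm spacing):

import random

def sample_patch(volume, mask, nodule_centers, size=32, pos_frac=0.25, rng=None):
    # volume/mask: 3D numpy arrays; nodule_centers: list of (z, y, x) voxel coords.
    rng = rng or random.Random()
    dims = volume.shape
    if nodule_centers and rng.random() < pos_frac:
        center = rng.choice(nodule_centers)  # force a nodule-containing patch
    else:
        center = tuple(rng.randrange(size // 2, d - size // 2) for d in dims)
    # Clamp the crop so it stays inside the volume.
    lo = tuple(min(max(c - size // 2, 0), d - size) for c, d in zip(center, dims))
    window = tuple(slice(l, l + size) for l in lo)
    return volume[window], mask[window]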

Un saluto, Matthew

@mattmacy

I now understand what the purpose of the argmax is: to convert the softmax results to a one-hot encoding. I now appear to be getting reasonable results on the simple task of lung segmentation, but the network will often start off mostly favoring one channel or the other without a careful choice of seed.
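For example (a sketch, assuming a (N, 2, D, H, W) softmax output):

import torch
import torch.nn.functional as F

probs = torch.softmax(torch.randn(1, 2, 8, 8, 8), dim=1)  # dummy network output
pred = probs.argmax(dim=1)                        # (N, D, H, W) hard class labels
one_hot = F.one_hot(pred, num_classes=2)          # (N, D, H, W, 2)
one_hot = one_hot.permute(0, 4, 1, 2, 3).float()  # back to (N, 2, D, H, W)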

@faustomilletari
Owner

faustomilletari commented Mar 14, 2017 via email

@kirk86

kirk86 commented Mar 27, 2017

However, even though I have the most experience with TF I find that its API is essentially verbal vomit

100% agree on that. It should have been Theano 2.0 but it failed completely in that regard. Pun intended! PyTorch, on the other hand, even though it has a nice syntax and inherits a lot from Chainer, is still very immature and has lots of bugs.

@mattmacy If you don't mind, I have a question: is your implementation correct?

@faustomilletari From going through all the above conversations, it's still unclear to me whether there's an issue/error in the paper or in the code (since people seem unable to replicate the results, if I understood correctly)?

Also, it would have been nice to have a .caffemodel file so that people can at least replicate and reproduce the results. Even if the actual model in the paper was trained on a non-public dataset, it would have been nice to also train the model on a public dataset, report the results, and provide the .caffemodel so that others can replicate the experiment and verify the model's validity.

@AmitAilianiSDC

AmitAilianiSDC commented Jun 7, 2018

@mattmacy Can you please provide an update on nodule detection using vnet and what worked for you?

@faustomilletari
Owner

faustomilletari commented Jun 8, 2018 via email
