Object boundary reconstruction quality #33
Comments
Hi Marvin, glad it's helpful to you :) Could you post an example picture? And do you observe the same when applying the model to images from the training set?
Hey Martin, thanks for your quick response :) The cell on the right should be star-convex, however the segmentation boundary is a rather smoothed version of the cell outline. Here's another example of a more round cell: while in the previous example the boundary is simply very smooth, here it seems to be slightly off. This data is actually taken from http://celltrackingchallenge.net/2d-datasets (GFP-GOWT1 mouse stem cells), as it looks very similar to mine. I trained the network on their training data with the default settings from the training notebook plus 128 rays.
Oh, I forgot to mention that the images shown above come from the test data. The boundaries in the training dataset seem to look better.
Hi Martin, Uwe, and Marvin - thanks for a great tool and very helpful examples and documentation. I'm also having an issue with stardist representations at the boundary; it is somewhat different from Marvin's, but I thought I would start here - if you prefer me to open a separate issue, please ask. I have 2D manually segmented nuclei from a single z-slice of a 3D confocal image of a zebrafish larva. The image sampling rate is lower than in the examples (~0.5x0.5um in x/y) and the nuclei are quite crowded and small, so the average nucleus area (in terms of number of voxels) is quite small compared to the provided example data. I have not trained a model yet - I've only reconstructed the manual labels in accordance with the data ipynb example. The reconstructed stardist representations are quite noisy along the boundaries, and the mean IoU scores saturate at 0.75 and even decline for ray counts greater than 128 (I have tested up to 512). Do you guys have any insight as to why reconstructions of the ground truth labels might be so rough at the edges?
@GFleishman Good point, I had also noticed that! It seems that ray propagation during the reconstruction is kind of coarse: decreasing the stepsize (e.g. dividing dx and dy by 4, see below) visibly improves the reconstructed boundaries. This might be important when dealing with small objects. Not sure decreasing the step size will help with my boundaries, but I'll retrain and give it a try; potentially being off at the boundaries could confuse the distance predictions (?)
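A quick way to see the effect (an illustrative, self-contained sketch mirroring the ray-marching loop discussed in this thread - not actual stardist code): with unit steps the returned ray lengths are quantized to whole pixels and overshoot the boundary, while quarter steps land much closer.

```python
import numpy as np

def ray_length(a, i, j, phi, step=1.0):
    """March from pixel (i, j) along direction phi until the label changes."""
    value = a[i, j]
    dx, dy = np.sin(phi) * step, np.cos(phi) * step
    x = y = 0.0
    while True:
        x += dx
        y += dy
        ii, jj = int(round(i + x)), int(round(j + y))
        if (ii < 0 or ii >= a.shape[0] or jj < 0 or jj >= a.shape[1]
                or a[ii, jj] != value):
            return np.hypot(x, y)  # distance at the first step outside

# toy label image: a disk of radius 3 around (4, 4)
yy, xx = np.mgrid[:9, :9]
disk = ((yy - 4) ** 2 + (xx - 4) ** 2 <= 9).astype(np.uint16)
for step in (1.0, 0.25):
    d = [ray_length(disk, 4, 4, k * 2 * np.pi / 32, step) for k in range(32)]
    print(f"step={step}: mean ray length = {np.mean(d):.2f}")
```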
Thanks Marvin, that's super helpful. I think you're probably right that this is the relevant code to modify for my data. However, this loopy python code is not practical to actually run on my data, where I have several hundred cells per image. It seems the default for my local installation is to run the compiled cpp version. I'd like to modify this and rebuild - but I'm unsure how to do it, since the entire build was part of the pip installation. Edit: This turned out to be easy. I just made my modifications to the cpp code and reran the pip install. Since you've been so helpful, I'd like to try and help with your issue as well - but since it occurs primarily on reconstructions of the test data polygons, it's probably something different. Can you rule out that it's just a phenomenon of the data, i.e. different intensity characteristics at the boundaries of training/testing data - so the model just learned to produce more smooth/conservative boundaries? Edit: I forced the python version to run with a 4 in the denominator for the step size; it is significantly slower than the cpp code and not practically useful, but it definitely solves the roughness problem for small label areas.
@GFleishman True, the python version is extremely slow!
I think the package gets built when installing with pip; on the Mac I had to follow these instructions by @maweigert: #21. I also have stardist installed on an Ubuntu 18 machine for the training and there was no problem. Otherwise, you could speed up the python loops using numba:

```python
import numpy as np
from numba import jit

from stardist.geometry import geom2d

def _py_star_dist_modified(a, n_rays=32):
    # assert np.isscalar(n_rays) and 0 < int(n_rays)
    n_rays = int(n_rays)
    a = a.astype(np.uint16, copy=False)
    dst = np.empty(a.shape + (n_rays,), np.float32)

    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            value = a[i, j]
            if value == 0:
                # background pixel: all ray distances are zero
                dst[i, j] = 0
            else:
                st_rays = np.float32((2 * np.pi) / n_rays)
                for k in range(n_rays):
                    phi = np.float32(k * st_rays)
                    dy = np.cos(phi)  # divide dy/dx by e.g. 4 for a finer step size
                    dx = np.sin(phi)
                    x, y = np.float32(0), np.float32(0)
                    # march along the ray until leaving the object (or the image)
                    while True:
                        x += dx
                        y += dy
                        ii = int(round(i + x))
                        jj = int(round(j + y))
                        if (ii < 0 or ii >= a.shape[0] or
                            jj < 0 or jj >= a.shape[1] or
                            value != a[ii, jj]):
                            dist = np.sqrt(x * x + y * y)
                            dst[i, j, k] = dist
                            break
    return dst

# replace the pure-python reference implementation with the jit-compiled one
geom2d._py_star_dist = jit(_py_star_dist_modified)
```

This version performs about the same as the cpp one.
Hmm, you're right that I cannot rule out that it's a training thing. Will definitely train again with a dataset in which I have many shapes represented. Thanks!
Wow, I'd heard of numba but had never tried it. That's awesome! I suppose a vectorized version of the ray shooting would also speed things up - but it seems unnecessary to go to the trouble now. Thanks again!
Hi @m-albert,
Hard to see why it should deviate so much for the images you show, which should be rather easy to segment... would have to look further into it (maybe cells are too large, gridsize too small, etc.)
Thanks (and @GFleishman too) for bringing that up! Indeed the stardist calculations are a bit rough for small objects and we never bothered to refine them correctly. Inspired by this thread, I took another look at them: instead of decreasing the stepsize (which would probably be too slow), one can directly compute the "overshoot" distance after the label switches. That way, distances for small objects should now be more correct. I've put it in the repo.
Let me know if that helps, and thanks for all the feedback and input!
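A sketch of the overshoot idea (illustrative only - not the actual stardist implementation): march with unit steps as before, but once the ray leaves the object, bisect the last step to locate the crossing point instead of shrinking the global step size.

```python
import numpy as np

def ray_length_with_overshoot(a, i, j, phi, n_bisect=4):
    value = a[i, j]
    dx, dy = np.sin(phi), np.cos(phi)
    x = y = 0.0
    # coarse marching with unit steps, as in the reference implementation
    while True:
        x += dx
        y += dy
        ii, jj = int(round(i + x)), int(round(j + y))
        if (ii < 0 or ii >= a.shape[0] or jj < 0 or jj >= a.shape[1]
                or a[ii, jj] != value):
            break
    # the boundary lies somewhere within the last unit step: bisect it
    hi = np.hypot(x, y)  # first distance known to be outside
    lo = hi - 1.0        # last distance known to be inside
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        ii, jj = int(round(i + mid * dx)), int(round(j + mid * dy))
        inside = (0 <= ii < a.shape[0] and 0 <= jj < a.shape[1]
                  and a[ii, jj] == value)
        lo, hi = (mid, hi) if inside else (lo, mid)
    return 0.5 * (lo + hi)
```

The extra cost is only a handful of label lookups per ray, rather than 4x (or more) lookups along the entire ray.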
@maweigert Computing the overshoot sounds more efficient - but just FYI, I was able to recompile the cpp version with new step sizes (dx/4 and dy/4) and it was not perceptibly slower than with the original step size. Obviously it is slower in that it's 4x more compute - but it was still basically real time when done in the cpp.
Interesting, thanks for trying that out! Looks like the extra label array accesses cost basically no time, as most of them are already in the cache.
Coming back to this after a while. I was going through the stardist 3d paper (as someone in the lab was planning to use it!) and read a statement there to the effect that the segmentations are not necessarily pixel accurate.
What would be your explanation for why segmentations are not necessarily pixel accurate? I guess for me it's not super clear how the output of a u-net behaves when needing to precisely reconstruct the ray lengths.
In the end the segmentations I'm getting are not bad at all, but since that project involves cell shape characterisations, I was wondering whether there would be some nice trick to improve the accuracy of the boundary. For the images I showed I used the default settings from the notebook, i.e. the shown image resolution (objects of around 70 pixels in diameter) and 128 rays with a (2,2) grid. Cheers!
I think this probably refers to the general idea that the Stardist representation is only perfectly pixel accurate - theoretically, and for arbitrarily high resolution - with an infinite number of rays. In the discrete case, it's only pixel accurate if your number of rays equals the number of boundary pixels of the cell (assuming it is unambiguously clear where this boundary is to begin with). You're ultimately representing an object, whose boundary is a rough discrete approximation to a smooth continuous surface, with a (somewhat sparse, even at 128+ rays) polygon - like an octagon around a circle. Also to clarify - the UNet does not reconstruct the polygons - it just maps the input image to the probability + ray distance feature space. The probability thresholding + non maximum suppression is what ultimately gives the polygons - and those are then reconstructed in the code you pointed out to me before. One approach to refining the boundary segmentations would be to begin with the Stardist segmentation as an initialization, and then do graph-based segmentation to refine boundaries locally [1] [2]. I think that would be really cool - but also a whole project unto itself and potentially a lot of work. Depending on how complex the shapes you're trying to segment are, it's probably not worth it.
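To put a number on the "octagon around a circle" picture (standard geometry, not from this thread): a regular $n$-gon with vertices on a circle of radius $r$ covers the fraction

$$\frac{A_{n\text{-gon}}}{A_{\text{circle}}} = \frac{\tfrac{1}{2}\,n\,r^2\sin(2\pi/n)}{\pi r^2} = \frac{n}{2\pi}\,\sin\frac{2\pi}{n}$$

of the disk's area, so the polygon representation itself loses very little area once $n$ is moderately large.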
To reproduce the boundary more faithfully, I was able to use the stardist detections as markers and then do a watershed transform with them on the probability map coming out of the U-Net, plus a binary mask for doing an AND operation on the watershed image - this recovers boundaries more accurately, especially for large cells. Maybe this helps.
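A minimal sketch of that post-processing (illustrative variable names; `labels`, `prob` and `mask` are assumed to come from a trained stardist model and to have the image's shape, i.e. grid (1,1) or upsampled):

```python
import numpy as np
from skimage.segmentation import watershed

def refine_with_watershed(labels, prob, mask):
    """Grow stardist seeds along the U-Net probability map, limited to a mask."""
    # negative probability as elevation: basins grow outward from the seeds
    # towards low-probability boundaries; the mask plays the role of the AND
    return watershed(-prob, markers=labels, mask=mask)

# toy usage with synthetic stand-ins for the stardist outputs
yy, xx = np.mgrid[:64, :64]
prob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 200.0)
seeds = (prob > 0.8).astype(int)  # stand-in for the stardist instances
mask = prob > 0.2                 # stand-in for the binary foreground mask
refined = refine_with_watershed(seeds, prob, mask)
```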
Hey @GFleishman :)
You're right, probably they meant that the segmentation is of course limited by the chosen resolution of rays. However, I wouldn't think that taking 32 rays vs. taking the number of rays needed for a pixel-dense boundary reconstruction would explain too much of the suboptimal shape segmentations I'm observing. With 32 rays you're probably almost 'pixel accurate' in most cases:
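A quick numeric check of that claim, using the regular-polygon area formula quoted above (illustrative sketch):

```python
import numpy as np

# fraction of an ideal disk covered by a regular n-gon with vertices
# on the boundary: n / (2*pi) * sin(2*pi / n)
for n in (32, 64, 128):
    ratio = n / (2 * np.pi) * np.sin(2 * np.pi / n)
    print(f"n_rays={n:4d}  polygon/disk area ratio = {ratio:.4f}")
# 32 rays already cover ~99.4% of a disk, so the ray count alone cannot
# explain large boundary deviations on roundish cells.
```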
I'm aware of this! However, if I understand properly, the final segmentation polygon is defined by the rays of the pixel with the highest object probability. Therefore the shape of the segmentation outcome is directly drawn from distance values encoded in the unet, and I'm wondering how well the ray distances generalise to unseen images. In my case the training data looks better than the test data; however, this could of course also come from suboptimal training parameters.
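For reference, the pipeline being described looked roughly like this in the stardist example notebooks of that era (a sketch from memory; names and signatures may differ between versions):

```python
from stardist import dist_to_coord, non_maximum_suppression, polygons_to_label

prob, dist = model.predict(img)  # U-Net outputs: probability + ray distances
coord = dist_to_coord(dist)      # ray distances -> polygon vertex coordinates
points = non_maximum_suppression(coord, prob, prob_thresh=0.4)
labels = polygons_to_label(coord, prob, points)  # render surviving polygons
```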
Yeah, that's a good idea; it would be a way to enhance stardist's nice object separation with high-quality segmentation masks. However, as you say, implementing something like this might be quite some work, so I'm wondering how well stardist can perform out of the box (and because I think it's a super cool and fun method!). Hey @kapoorlab, thanks for your suggestion, which goes along the lines of what @GFleishman was suggesting for postprocessing the stardist results. The improved segmentations in your notebook look pretty good :) Interestingly, here too the imperfect overlap between ground truth (by eye) and the obtained stardist instances doesn't seem to be limited by having 'only' 128 rays (128 assumed from the filename) but must come from somewhere else.
Hi @m-albert
Can you show me an example of such a suboptimal result? And how different are train vs. test images?
Ok, that was fast! :) (your post appeared 2 seconds after I commented) For the images shown, I would in fact even consider stardist to work pretty well. The segments themselves won't be traced out perfectly, as i) they contain sharply curved segments that are hard to approximate with finite rays, and ii) the cells are rather large, making it harder to correctly predict pixel-perfect distances from the center.
Hi @maweigert,
Sorry, for clarification that's a screenshot from the notebook by @kapoorlab :) I think he does indeed perform a watershed after getting the seeds using stardist (not sure if from the probability map as potential, that's a good suggestion!).
My suboptimal output would be as posted at the beginning of this (now rather long) thread #33 (comment).
I tried this with 256 rays as well but results do not change much.
Yes, this is exactly what it shows: the seed points are obtained after the NMS step so as not to oversegment.
Sorry, too tired apparently :) Did you try to play around with the grid size (e.g. setting it to a different value than (2,2))?
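For readers wondering where the grid is set: it is part of the model configuration. A minimal sketch (Config2D/StarDist2D as in stardist ≥ 0.4; the concrete values are examples only):

```python
from stardist.models import Config2D, StarDist2D

# finer grid -> denser distance predictions (at higher compute cost);
# a coarser grid such as (4, 4) can be preferable for large objects
conf = Config2D(n_rays=128, grid=(1, 1))
model = StarDist2D(conf, name='stardist_grid11', basedir='models')
```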
Cool! :)
@maweigert No problem, I hadn't made it too clear that it wasn't my data :)
Yeah, I could try reducing the resolution; so far I was using (2,2) because that seemed like a reasonable choice considering the data and the detail of segmentation I want to recover. I guess I'm also a bit curious how precise the polygon reconstructions could get in principle with optimised training, and whether there'd be a clear way of thinking about it. In practical terms, for now the segmentation is nicely reliable and of sufficient quality, so I'll stick with it for the data of my collaborator! Thanks a lot for the thoughts everyone, and feel free to close the issue :)
This question seems unrelated to the topic in this thread, and is also not specific to StarDist. Sorry for the late reply. Best,
Hi,
first of all thanks for such a great tool! No more problems with object boundaries :) Also appreciate the great jupyter notebooks to get started.
I was considering using stardist for 2d cell segmentation with the aim of analysing cell shapes (fluorescent label). In a first try I noticed that the segmentation worked pretty well and the cells got nicely separated from each other, which is incredibly useful. However, the exact cell shapes don't seem to be recovered super precisely (despite being clearly star-convex).
If I understood the method correctly, the object boundaries are defined by the rays of the single highest-score pixel within the cell (after NMS), which at least in my case might not be too precise. What do you think would be a good way to optimise the final shapes I get out within the stardist framework? E.g., would you expect the ray distances to the borders to become reasonably precise with extensive training (so far I have 20 cells labeled and use 128 rays with a (2,2) grid)? Otherwise of course there would be the option to use the stardist output as watershed seeds...
Thanks for any hints or suggestions :)
Cheers,
Marvin