
Object boundary reconstruction quality #33

Open
m-albert opened this issue Feb 6, 2020 · 25 comments


@m-albert

m-albert commented Feb 6, 2020

Hi,

first of all, thanks for such a great tool! No more problems with touching objects :) I also appreciate the great Jupyter notebooks to get started.

I was considering using stardist for 2D cell segmentation with the aim of analysing cell shapes (fluorescent label). In a first try I noticed that the segmentation worked pretty well and the cells got nicely separated from each other, which is incredibly useful. However, the exact cell shapes don't seem to be recovered very precisely (despite them clearly being star-convex).

If I understood the method correctly, the object boundaries are defined by the rays of the single highest-scoring pixel within the cell (after NMS), which at least in my case might not be very precise. What do you think would be a good way to optimise the final shapes I get within the stardist framework? E.g., would you expect the ray distances to the borders to become reasonably precise with extensive training (so far I have 20 cells labelled and use 128 rays with a (2,2) grid)? Otherwise, of course, there is the option to use the stardist output as watershed seeds.

Thanks for any hints or suggestions :)

Cheers,
Marvin

@maweigert
Member

Hi Marvin,

Glad it's helpful to you :)

Could you post an example picture? And do you observe the same when applying the model to images from the training set?

@m-albert
Author

m-albert commented Feb 6, 2020

Hey Martin, thanks for your quick response :)
This would be an example:

[image]

The cell on the right should be star-convex; however, the segmentation boundary is a rather smoothed version of the cell outline...

Here's another example of a more round cell:

[image]

While in the previous example the boundary is simply very smooth, here it seems to be slightly off.

This data is actually taken from http://celltrackingchallenge.net/2d-datasets (GFP-GOWT1 mouse stem cells), as it looks very similar to mine. I trained the network on their training data with the default settings from the training notebook plus 128 rays.

@m-albert
Author

m-albert commented Feb 6, 2020

And do you observe the same when applying the model to images from the training set?

Oh, I forgot to mention that the images shown above come from the test data. The boundaries in the training dataset seem to look better.

@GFleishman

GFleishman commented Feb 6, 2020

Hi Martin, Uwe, and Marvin -

Also - thanks for a great tool and very helpful examples and documentation.

I'm also having an issue with stardist representations at the boundary; it is somewhat different from Marvin's, but I thought I would start here - if you prefer me to open a separate issue, please let me know.

I have 2D manually segmented nuclei from a single z-slice of a 3D confocal image of a zebrafish larva. The image sampling rate is lower than in the examples (~0.5×0.5 µm in x/y) and the nuclei are quite crowded and small. So the average nucleus area (in terms of number of voxels) is quite small compared to the provided example data.

I have not trained a model yet - I've only reconstructed the manual labels in accordance with the data example notebook. The reconstructed stardist representations are quite noisy along the boundaries, and the mean IoU scores saturate at 0.75 and even decline for ray counts greater than 128 (I have tested up to 512).

The original labels:
[image]

Reconstructions by ray count:
[image]

Mean IoU scores:
[image]

Do you guys have any insight as to why reconstructions for the ground truth labels might be so rough at the edges?
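
For reference, this is roughly how I computed the reconstructions and scores (just a sketch; relabel_image_stardist is the helper from the data example notebook, and lbl is assumed to be my ground-truth label image):

import numpy as np

def mean_label_iou(lbl_true, lbl_recon):
    # for each ground-truth object, take the reconstructed label with the largest
    # overlap and compute the IoU with it
    ious = []
    for lab in np.unique(lbl_true)[1:]:        # skip background (0)
        t = lbl_true == lab
        overlapping = lbl_recon[t]
        overlapping = overlapping[overlapping > 0]
        if overlapping.size == 0:
            ious.append(0.0)
            continue
        match = np.bincount(overlapping).argmax()
        r = lbl_recon == match
        ious.append(np.logical_and(t, r).sum() / np.logical_or(t, r).sum())
    return np.mean(ious)

for n_rays in (8, 16, 32, 64, 128, 256, 512):
    recon = relabel_image_stardist(lbl, n_rays=n_rays)
    print(n_rays, mean_label_iou(lbl, recon))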

@m-albert
Author

m-albert commented Feb 7, 2020

@GFleishman Good point, I had also noticed that!

It seems that ray propagation during the reconstruction is kind of coarse:
https://github.com/mpicbg-csbd/stardist/blob/38454feca61804049a3afae34add804e0b91517c/stardist/geometry/geom2d.py#L46-L52

leading to something like:
[image]

Decreasing the stepsize to dx = np.sin(phi)/100.; dy = np.cos(phi)/100. gives a better result:
[image]

This might be important when dealing with small objects. I'm not sure decreasing the step size will help with my boundaries, but I'll retrain and give it a try; potentially, being off at the boundaries could confuse the distance predictions (?)
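
To see the effect on a toy example, one can march a single ray with different step sizes over a small synthetic disk (just an illustrative sketch, not stardist code):

import numpy as np

def ray_length(lbl, i, j, phi, step=1.0):
    # march from (i, j) in direction phi until the label changes (same logic as in geom2d)
    dy, dx = np.cos(phi) * step, np.sin(phi) * step
    value = lbl[i, j]
    x = y = 0.0
    while True:
        x += dx
        y += dy
        ii, jj = int(round(i + x)), int(round(j + y))
        if (ii < 0 or ii >= lbl.shape[0] or jj < 0 or jj >= lbl.shape[1]
                or lbl[ii, jj] != value):
            return np.sqrt(x * x + y * y)

yy, xx = np.mgrid[:21, :21]
lbl = ((yy - 10) ** 2 + (xx - 10) ** 2 <= 5 ** 2).astype(np.uint16)  # disk of radius 5

print(ray_length(lbl, 10, 10, np.pi / 7, step=1.0))   # coarse step: overshoots the boundary
print(ray_length(lbl, 10, 10, np.pi / 7, step=0.01))  # fine step: close to the true radius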

@GFleishman

GFleishman commented Feb 7, 2020

Thanks Marvin, that's super helpful, I think you're probably right that this is the relevant code to modify for my data.

However, this loopy python code is not practical to actually run on my data, where I have several hundred cells per image. It seems the default for my local installation is to run the compiled cpp version:
https://github.com/mpicbg-csbd/stardist/blob/38454feca61804049a3afae34add804e0b91517c/stardist/lib/stardist2d.cpp#L53-L59

I'd like to modify this and rebuild - but I'm unsure how to do it, since the entire build was part of pip install stardist and there is no documentation on building from source. @maweigert @uschmidt83 Can you guys provide any information about how to rebuild the cpp libraries in stardist? I'm using macOS and gcc-9, but will need to repeat this process later on Red Hat Linux.

Edit: This turned out to be easy. I just made my modifications to the cpp code and reran python setup.py sdist from the source directory, then python setup.py install. On macOS you need CC=gcc-9 CXX=g++-9 python setup.py install.

Since you've been so helpful, I'd like to try and help with your issue as well - but since it occurs primarily on the test data, it's probably something different. Can you rule out that it's just a phenomenon of the data, i.e. different intensity characteristics at the boundaries of the training/testing data - so the model just learned to produce smoother/more conservative boundaries?

Edit: I forced the python version to run with a 4 in the denominator for the step size; it is significantly slower than the cpp code and not practically useful, but it definitely solves the roughness problem for small label areas:

[images]

@m-albert
Author

m-albert commented Feb 7, 2020

@GFleishman True, the python version is extremely slow!

I'd like to modify this and rebuild - but I'm unsure how to do it, since the entire build was part of the pip install stardist and there is no documentation on building from source.

I think the package gets built when installing with pip; on the Mac I had to follow these instructions by @maweigert: #21. I also have stardist installed on an Ubuntu 18 machine for the training and there was no problem.

Otherwise, you could speed up the python loops using numba.jit, e.g. directly from your Jupyter notebook, like this:

from stardist.geometry import geom2d
from numba import jit
import numpy as np

def _py_star_dist_modified(a, n_rays=32):
    # same as the original _py_star_dist, but jit-compilable;
    # uncomment the /100. below to decrease the ray step size
    n_rays = int(n_rays)
    a = a.astype(np.uint16, copy=False)
    dst = np.empty(a.shape + (n_rays,), np.float32)

    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            value = a[i, j]
            if value == 0:
                dst[i, j] = 0
            else:
                st_rays = np.float32((2 * np.pi) / n_rays)
                for k in range(n_rays):
                    phi = np.float32(k * st_rays)
                    dy = np.cos(phi)  # /100.
                    dx = np.sin(phi)  # /100.
                    x, y = np.float32(0), np.float32(0)

                    while True:
                        x += dx
                        y += dy
                        ii = int(round(i + x))
                        jj = int(round(j + y))
                        if (ii < 0 or ii >= a.shape[0] or
                            jj < 0 or jj >= a.shape[1] or
                            value != a[ii, jj]):
                            dist = np.sqrt(x * x + y * y)
                            dst[i, j, k] = dist
                            break
    return dst

# monkey-patch the python implementation so that mode='python' uses the jitted version
geom2d._py_star_dist = jit(_py_star_dist_modified)

This version performs about the same as the cpp one:

%timeit relabel_image_stardist(test_lbl, n_rays=32, mode='python')
452 µs ± 7.97 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit relabel_image_stardist(test_lbl, n_rays=32, mode='cpp')
493 µs ± 8.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Can you rule out that it's just a phenomenon of the data, i.e. different intensity characteristics at the boundaries of training/testing data - so the model just learned to produce more smooth/conservative boundaries?

Hmm you're right that I cannot rule out that it's a training thing. Will definitely train again with a dataset in which I have many shapes represented. Thanks!

@GFleishman

Wow, I'd heard of numba but had never tried it. That's awesome! I suppose a vectorized version of the ray shooting would also speed things up - but seems unnecessary to go to the trouble now. Thanks again!

@maweigert
Member

maweigert commented Feb 8, 2020

Hi @m-albert,

While in the previous example the boundary is simply very smooth, here it seems to be slightly off.

Hard to see why it should deviate so much for the images you show, which should be rather easy to segment... I would have to look further into it (maybe the cells are too large, the grid size too small, etc.)

This might be important when dealing with small objects. Not sure decreasing the step size will help with my boundaries but I'll retrain and give it a try, potentially being off at the boundaries could confuse the distance predictions

Thanks (and @GFleishman too) for bringing that up! Indeed, the stardist calculations are a bit rough for small objects and we never bothered to refine them properly. Inspired by this thread I took another look at them: instead of decreasing the step size (which would probably be too slow), one can directly compute the "overshoot" distance after the label switches. That way, distances for small objects should now be more correct. I've put it in the dev branch, so you can try it yourself:

pip install git+https://github.com/mpicbg-csbd/stardist.git@dev
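
The rough idea is the following (just a sketch to illustrate the principle, here using a bisection refinement; the actual code in the dev branch is in C++ and may compute the correction differently): once the sampled point has left the object, refine the crossing between the last point inside and the first point outside instead of returning the overshot distance.

import numpy as np

def ray_length_refined(lbl, i, j, phi, n_refine=8):
    dy, dx = np.cos(phi), np.sin(phi)
    value = lbl[i, j]

    def inside(t):
        ii, jj = int(round(i + dx * t)), int(round(j + dy * t))
        return (0 <= ii < lbl.shape[0] and 0 <= jj < lbl.shape[1]
                and lbl[ii, jj] == value)

    # unit-step march as before
    t = 0.0
    while inside(t + 1.0):
        t += 1.0
    # the boundary crossing lies in (t, t + 1]; bisect to sub-pixel precision
    lo, hi = t, t + 1.0
    for _ in range(n_refine):
        mid = 0.5 * (lo + hi)
        if inside(mid):
            lo = mid
        else:
            hi = mid
    return hi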

Let me know if that helps and thanks for all the feedback and input!

@GFleishman

@maweigert Computing the overshoot sounds more efficient - but just FYI, I was able to recompile the cpp version with new step sizes (dx/4 and dy/4) and it was not perceptibly slower than with the original step size. Obviously it is slower because it's 4x more compute - but it was still basically real time when done in cpp.

@maweigert
Member

it was not perceptibly slower than with the original step size. Obviously it is slower because it's 4x more compute

Interesting, thanks for trying that out! It looks like the extra label array accesses cost basically no time, as most of them are already in the cache.

@m-albert
Author

Coming back to this after a while. I was going through the stardist 3d paper (as someone in the lab was planning to use it!) and read this statement:

Note that [21] (stardist 2d paper) sits somewhere in between object detection and instance segmentation because the predicted shapes are of relatively high fidelity, but are not pixel-accurate.

What would be your explanation for why the segmentations are not necessarily pixel accurate? I guess for me it's not super clear how the output of a U-Net behaves when it needs to precisely reconstruct the ray lengths.

Hard to see why it should deviate so much for the images you show, which should be rather easy to segment...would have too look further into it (maybe cells are too large, gridsize too small etc)

In the end, the segmentations I'm getting are not bad at all, but since that project involves cell shape characterisation, I was wondering whether there would be some nice trick to improve the accuracy of the boundaries. For the images I showed I used the default settings from the notebook, i.e. the shown image resolution (objects of around 70 pixels in diameter) and grid=(2,2).

Cheers!

@GFleishman

Note that [21] (stardist 2d paper) sits somewhere in between object detection and instance segmentation because the predicted shapes are of relatively high fidelity, but are not pixel-accurate.

I think this probably refers to the general idea that the StarDist representation is only perfectly pixel accurate - theoretically, and for arbitrarily high resolution - with an infinite number of rays. In the discrete case, it's only pixel accurate if your number of rays equals the number of boundary pixels of the cell (assuming it is unambiguously clear where this boundary is to begin with). You're ultimately using a (somewhat sparse, even with 128+ rays) polygon to represent an object whose boundary is itself a rough discrete approximation of a smooth continuous surface - like an octagon around a circle.
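
To put a rough number on that (a back-of-the-envelope calculation for a perfect disk, ignoring pixelation): an n-ray reconstruction of a disk is the regular n-gon with its vertices on the circle, which lies entirely inside it, so its IoU with the disk is just the area ratio n*sin(2*pi/n)/(2*pi):

import numpy as np

# IoU between a circle and the inscribed regular n-gon (what n rays give for a disk)
for n in (8, 16, 32, 64, 128):
    iou = n * np.sin(2 * np.pi / n) / (2 * np.pi)
    print(n, round(iou, 4))
# 8 rays: ~0.90, 32 rays: ~0.99, 128 rays: ~0.9996
# (for small objects, pixelation of the boundary adds further error on top of this)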

Also to clarify - the U-Net does not reconstruct the polygons - it just maps the input image to the probability + ray distances feature space. The probability thresholding + non-maximum suppression is what ultimately gives the polygons - and then those are reconstructed in the code you pointed out to me before.

One approach to refining the boundary segmentations would be to begin with the StarDist segmentation as an initialization, and then do graph-based segmentation to refine boundaries locally [1] [2]. I think that would be really cool - but also a whole project unto itself and potentially a lot of work. Depending on how complex the shapes you're trying to segment are, it's probably not worth it.

@kapoorlab

To reproduce the boundaries more faithfully, I used the stardist detections as markers and then did a watershed transform on the probability map coming out of the UNet, plus a binary mask used for an AND operation on the watershed image. This reproduces the boundaries more accurately, especially for large cells. Maybe this helps:

https://github.com/kapoorlab/SegmentationTerminator/blob/master/Segmentation2D/Ozge/StardistTimeLapseImagePrediction.ipynb
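
In code the idea is roughly the following (a minimal sketch, not the notebook itself; it assumes an input image img and a trained StarDist2D model with grid=(1,1), so that the probability map comes out at full resolution, and the mask threshold of 0.4 is made up):

import numpy as np
from csbdeep.utils import normalize
from skimage.segmentation import watershed  # in older scikit-image: from skimage.morphology import watershed

x = normalize(img, 1, 99.8)
labels, _ = model.predict_instances(x)   # stardist instances (after NMS) serve as seeds
prob, _ = model.predict(x)               # per-pixel object probability from the U-Net

mask = prob > 0.4                                        # binary foreground mask
ws = watershed(-prob, markers=labels * mask, mask=mask)  # flood the inverted probability map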

@m-albert
Author

m-albert commented Mar 26, 2020

Hey @GFleishman :)

I think this probably refers to the general idea that the Stardist representation is only perfectly pixel accurate - theoretically, and for arbitrarily high resolution - with an infinite number of rays.

You're right, they probably meant that the segmentation is of course limited by the chosen resolution of rays. However, I wouldn't think that taking 32 rays vs. the number of rays needed for a pixel-dense boundary reconstruction explains much of the suboptimal shape segmentations I'm observing. With 32 rays you're probably almost 'pixel accurate' in most cases:
[image]

Also to clarify - the UNet does not reconstruct the polygons - it just maps the input image to the probability + ray distances feature space.

I'm aware of this! However, if I understand properly, the final segmentation polygon is defined by the rays of the pixel with the highest object probability. Therefore the shape of the segmentation outcome is directly drawn from the distance values predicted by the U-Net. And I'm wondering how well the ray distances generalise to unseen images. In my case the training data looks better than the test data; however, this could of course also come from suboptimal training parameters.

@m-albert
Author

One approach to refining the boundary segmentations would be to begin with the Stardist segmentation as an initialization, and then do graph based segmentation to refine boundaries locally [1] [2]. I think that would be really cool - but also a whole project unto itself and potentially a lot of work. Depending on how complex the shape you're trying to segment, it's probably not worth it.

Yeah, that's a good idea; it would be a way to combine stardist's nice object separation with high-quality segmentation masks. However, as you say, implementing something like this might be quite some work, which is why I'm wondering how well stardist can perform out of the box (and because I think it's a super cool and fun method!).

Hey @kapoorlab, thanks for your suggestion, which goes along the lines of what @GFleishman was suggesting: postprocessing the stardist results. The improved segmentations in your notebook look pretty good :)

Interestingly, also here the imperfect overlap between the ground truth (judged by eye) and the obtained stardist instances doesn't seem to be limited by having 'only' 128 rays (128 assumed from the filename) but must come from somewhere else.
[image]

@maweigert
Member

Hi @m-albert

However, I wouldn't think that taking 32 rays vs. the number of rays needed for a pixel-dense boundary reconstruction explains much of the suboptimal shape segmentations I'm observing.

Can you show me an example of such a suboptimal result? And how different are the train vs. test images?
Depending on the specific problems, there might be different strategies to mitigate them (e.g. data augmentation or using the probability map as a watershed potential...).

@maweigert
Member

Ok, that was fast! :) (your post appeared 2 seconds after I commented) :)

For the images shown, I would in fact even consider stardist to work pretty well. The segments themselves won't be traced out perfectly, as i) they contain sharply curved segments that are hard to approximate with finitely many rays, and ii) the cells are rather large, making it harder to correctly predict pixel-perfect distances from the center.
So using the predicted stardist polygons as seed masks for a watershed with the predicted probability as potential should yield good results. I reckon this is what "SmartSeeds" shows?

@m-albert
Author

Hi @maweigert ,

Ok, that was fast! :) (your post appeared 2 seconds after I commented) :)

Sorry, for clarification: that's a screenshot from the notebook by @kapoorlab :) I think he does indeed perform a watershed after getting the seeds using stardist (not sure if with the probability map as potential; that's a good suggestion!).

Can you show me an example of such an suboptimal result? And how different are train vs test images?

My suboptimal output would be as posted at the beginning of this (now rather long) thread #33 (comment).

@kapoorlab

kapoorlab commented Mar 26, 2020

@m-albert

Interestingly, also here the imperfect overlap between the ground truth (judged by eye) and the obtained stardist instances doesn't seem to be limited by having 'only' 128 rays (128 assumed from the filename) but must come from somewhere else.

I tried this with 256 rays as well, but the results do not change much.

@maweigert

So using the predicted stardist polygons as seed masks for a watershed with the predicted probability as potential should yield good results. I reckon this is what "SmartSeeds" shows?

Yes, this is exactly what it shows; the seed points are obtained after the NMS step so as not to oversegment.

@maweigert
Member

Sorry, for clarification that's a screenshot from the notebook by @kapoorlab :)

Sorry, too tired apparently :)

Did you try to play around with the grid size (e.g. setting it to (4,4))?

@maweigert
Member

Yes this is exactly what it shows, the seed points are obtained after doing the NMS step so as not to oversegment.

Cool! :)

@m-albert
Author

Sorry, too tired apparently :)

@maweigert No problem, I hadn't made it too clear that it wasn't my data :)

Did you try to play around with the grid size (e.g. setting it to (4,4))?

Yeah, I could try reducing the resolution; so far I was using (2,2) because that seemed like a reasonable choice considering the data and the level of segmentation detail I want to recover.

I guess I'm also a bit curious regarding how precise the polygon reconstructions could get in principle with optimised training and whether there'd be a clear way of thinking about it.

In practical terms, the segmentation is for now nicely reliable and of sufficient quality, so I'll stick with it for my collaborator's data!

Thanks a lot for the thoughts everyone and feel free to close the issue :)

@dipanjanbhattacharya

Hi guys, is it possible to mask some objects after segmentation?
Suppose we have an image like this and we want to mask one of those polygons.
Could you please share your ideas?

[image: org_1 DAPI 20x]

Thank you in advance.

@uschmidt83
Member

Hi @dipanjanbhattacharya,

this question seems unrelated to the topic of this thread and is also not specific to StarDist.
Can you please ask it in the image.sc forum (using the tag stardist)?
The likelihood of getting quick help is much higher over there. Also, #91 might be relevant for you.

Sorry for the late reply.

Best,
Uwe
