Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overlay custom segmentation using cell segmentation resolver #55

Open
EST09 opened this issue Apr 17, 2024 · 12 comments
Open

Overlay custom segmentation using cell segmentation resolver #55

EST09 opened this issue Apr 17, 2024 · 12 comments

Comments

@EST09
Copy link

EST09 commented Apr 17, 2024

Hi,

Thank you for SOPA, it looks really helpful. I'd love to use the cell segmentation resolver. However, I'm struggling a little with how best to implement.

I have custom segmentation for a particular cell type in H&E. I've added this as a shape to my spatialdata object in the same coordinate system as cell boundaries (the default Xenium segmentation). I'd love to overlay this custom segmentation onto the default Xenium cell segmentation (cell boundaries) so that any cells/transcripts in the default under the custom segmentation get merged into the custom segmentation but the other cells from the default are kept.

Is there a way to do this in SOPA at all please?

Best wishes,
Emily

@EST09 EST09 changed the title [Feature] Overlay custom segmentation using cell segmentation resolver Apr 17, 2024
@quentinblampey
Copy link
Collaborator

Hello @EST09,

Thanks for your interest in Sopa. I have a few questions to ensure I understand correctly what you want to do:

  1. The overall result should be one segmentation composed of your custom segmentation + the default segmentation for the other cells, right? I mean, you don't want to show both segmentation in the explorer separately, but, instead, it should be merged into one segmentation?
  2. Have you already aligned your H&E data to the DAPI image? Else, you'll need to use the Xenium Explorer to align both images, because they are not in the same coordinate system by default.
  3. For every cell in the resulting "resolved segmentation", I guess you want to know if it comes from the custom segmentation or from the default segmentation, right?
  4. For the custom cells, do you want to aggregate the transcripts/channels inside their boundaries to have a cell-by-gene table?

In any case, this will be possible with Sopa, I'm trying to answer your question as accurately as possible!

@EST09
Copy link
Author

EST09 commented Apr 18, 2024

Hi,

Thank you so much for getting back. That's really great to hear!

  1. Yes, I was hoping it would be one segmentation. I have both objects as shapes in the spatialdata object so can show them separately atm anyway.
  2. Yes, all already aligned so I have a geodataframe containing the original segmentation and a smaller geodataframe containing the custom segmentation, both in the dapi "global" coordinate space. The geometry column for each is a list of polygons corresponding to each cell.
  3. That would be nice but not crucial as the custom segmentation is for a particular cell type that is easily identifiable in H&E anyway. The Xenium segmentation just fails on them so I want to replace it for these cells.
  4. Yes please, this would be amazing. Xenium fails on segmenting these cells so the reason for using the custom segmentation derived from H&E was to be able to use them in anndata etc going forward.

I made a small schematic to help,

Screenshot 2024-04-18 at 08 28 45

It would be really great if the functionalities already in SOPA, thank you so much!

Best wishes,
Emily

@quentinblampey
Copy link
Collaborator

Thanks for providing these details! I confirm that this can be done with Sopa, but it requires combining a few different functions from the API

I'll add a new function to make it easy to use (actually, I also need this for some of my projects, so I needed to do implement it anyway).
I'll let you know, but this shouldn't take too much time, and it will be released in sopa==1.0.11

@EST09
Copy link
Author

EST09 commented Apr 19, 2024

Thank you so much, that would be brilliant if possible!

@EST09
Copy link
Author

EST09 commented Apr 19, 2024

Hi,

I managed to do a quick workaround for now using (leaves small artefact cells - was going to try and merge with cell with closest nuclei centroid later)

overlaid = sdata["cell_boundaries"].overlay(sdata["custom_segmentation"], how='difference')
frames = [overlaid, sdata["custom_segmentation"]
result = pd.concat(frames)
new_cell_boundaries_gdf = gpd.GeoDataFrame(result, columns=["geometry"])
sdata.shapes["new_cell_boundaries"] = ShapesModel.parse(new_cell_boundaries_gdf)

and then used the api reference you gave to update the table (thank you!)

aggregator = sopa.segmentation.Aggregator(sdata, image_key="morphology_mip", shapes_key="new_cell_boundaries")
aggregator.update_table(gene_column="feature_name", average_intensities=False)

I'd like to export this back into Xenium explorer so everyone in the group can assess how well the segmentation is doing easily. I tried

sopa.io.explorer.write("file_path", sdata=sdata, shapes_key="new_cell_boundaries", image_key="x6", gene_column="feature_name")
#x6 is my h&E image

but I get the error

[INFO] (sopa.io.explorer.table) Writing table with 541 columns
[INFO] (sopa.io.explorer.table) Writing 2 cell categories: region, slide
[INFO] (sopa.io.explorer.shapes) Writing 4493 cell polygons

.......

AttributeError: 'MultiPolygon' object has no attribute 'exterior'

My sdata["transcripts"] looks like this

Screenshot 2024-04-19 at 17 47 44

Please may I just check if it's expecting something else? Or if there's something else I'm doing that's causing the error that you can see?

Thank you very much for all your help

Best wishes,
Emily

quentinblampey added a commit that referenced this issue Apr 22, 2024
@quentinblampey
Copy link
Collaborator

quentinblampey commented Apr 22, 2024

Hello @EST09, I think it's because at least one of your shapes is a MultiPolygon, while I expect to have only Polygon. This probably happened during your overlay, which can indeed create artefacts.

Meanwhile, I implemented the feature, and you can test it on master (it will soon be released, but I want to perform more checks).

If you want to try it, here is some code to help you:

import sopa.io
from sopa.segmentation import overlay_segmentation

gene_column = "feature_name"

overlay_segmentation(sdata, shapes_key="custom_segmentation", gene_column=gene_column)

NB: all cells with a high overlap with the custom segmentation are removed in favour of the new segmentation (use area_ratio_threshold in the overlay_segmentation function to control this behaviour). Also, I only count transcripts for the new cells, which should be much faster than re-computing all transcripts again.

You should then have a new shapes_key inside sdata, and you can run the function below to update the Xenium Explorer. Note that I provide mode="+cob", in order to only update the transcript counts (c), the cells obs (o), and the boundaries (b): it will be much faster to compute.

shapes_key = ... # something like "cell_boundaries+custom_segmentation"

sopa.io.write(
    "file_path", sdata=sdata, shapes_key=shapes_key, gene_column=gene_column, mode="+cob"
)

quentinblampey added a commit that referenced this issue Apr 22, 2024
@EST09
Copy link
Author

EST09 commented Apr 23, 2024

Thank you so much! That's fantastic, esp. the updating transcripts in only new cells! I'll try it out tomorrow. You were totally right about Multipolygons, ended up using geopandas explode to solve it, this seems a lot quicker though). Thanks!

@quentinblampey
Copy link
Collaborator

Hello @EST09, just wanted to let you know that I released sopa==1.0.11, which contains this feature. You can read the documentation of this function here.
Concerning the Explorer update, you can refer to this function.
Hope it solves your issue :)

@EST09
Copy link
Author

EST09 commented May 9, 2024

Sorry for the delay in getting back, thank you a lot for all your help with this!

@quentinblampey
Copy link
Collaborator

Hello @EST09, is everything working as expected? Should I close the issue?

@EST09
Copy link
Author

EST09 commented May 13, 2024

Hi,

I actually ended up using a slightly different segmentation in the end, to fill in the gaps around the custom cells, but used your aggregator to compile the table.

I did try your segmentation at first and all worked as expected, thank you, although I end up with a slightly different table of counts. I don't know if it's a quirk of my Xenium segmentation but in order to get the aggregator to work (for either your or my changed segmentation) I needed to make the cells valid (using .make_valid()). This transformed some cells into multipolygons and then I chose largest polygon in the multipolygon going forward to represent that cell. After using your aggregator, and checking only the unchanged cells I end up with slightly different counts per cell. I'm putting this down to choosing the largest polygon but I do need to investigate further/not sure if you've come across this issue?

I also had a quick question about which transcripts are used in the aggregation. I've set gene_name as feature_name and this gives me 541 (as it includes the control probes) rather than the 377 genes. I ended up just subsetting the anndata back down to the 377 genes but was wondering if you knew if there was a column that did this directly?

#https://gustaveroussy.github.io/sopa/tutorials/api_usage/
aggregator = sopa.segmentation.Aggregator(sdata, image_key="morphology_focus", shapes_key="new_cell_boundaries", overwrite=False)
#need to stop it changing the indices
aggregator.compute_table(gene_column="feature_name", average_intensities=False)

Thank you! I think it's basically closable - I think the issues are minor or a quirk of my data I need to investigate but would be interested if you've come across these before - thank you!

Best wishes,
Emily

@quentinblampey
Copy link
Collaborator

Hi,

Indeed it's weird to have a different transcript count. Just to make sure I understand what you did: you tried the two aproaches below, right?

  1. Add the shapes (as polygons) to the spatialdata object and run the Aggregator on these cells
  2. Add the shapes (as polygons) to the spatialdata object and use the overlay_segmentation function

The overlay_segmentation function is also using the aggregator internally so it's unexpected to have different results.

Concerning your last question, no I think there is no way to prevent this for now. So, subsetting the anndata object is needed, even though it may be inconvenient, I agree...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants