Feedback on concatenate() #541

Open
grst opened this issue Apr 8, 2024 · 5 comments

Comments

@grst
Contributor

grst commented Apr 8, 2024

Although I was eventually able to concatenate the data the way I wanted, the user experience wasn't as smooth as I had hoped, so I wanted to share some feedback. As I'm not that familiar with spatialdata yet, there may already be better solutions; please let me know if there are.

Starting situation

I have ~20 Visium Cytassist samples from a clinical trial processed with nf-core/spatialtranscriptomics (using the nf-core/spatialvi#67 branch that already uses spatialdata). The pipeline generates a single .zarr folder for each sample.

Desired outcome

I would like to have all samples in a single SpatialData object. The AnnData table should contain the gene expression from all samples.

Pain points

  • sd.concatenate enforces that the input is a list. Is there a reason it can't accept any iterable (e.g. dict_values)?

  • Usually, I pass a dictionary sample_id -> AnnData to anndata.concat, which, combined with index_unique="_", conveniently produces unique obs_names. This doesn't work with spatialdata.concatenate, which leaves me with either manipulating the obs_names of each object before concatenation or ending up with ugly obs_names carrying numeric suffixes (e.g. AACTCAACCTTGACCA-1_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0). IMO it would be great if spatialdata.concatenate supported a dict as input, too.

  • The per-sample SpatialData objects all have the same names for images, shapes and coordinate systems. I currently rename them like this:

    import spatialdata as sd
    from tqdm import tqdm

    # Assumed to exist already:
    #   samplesheet: pandas DataFrame with one row per sample and a "sample" column
    #   sample_path: pathlib.Path pointing to the pipeline output directory
    sdatas_vis = {}

    for _, row in tqdm(samplesheet.iterrows(), total=samplesheet.shape[0]):
        sample = row["sample"]
        tmp_sd = sd.read_zarr(sample_path / sample / "data" / "sdata_processed.zarr")
        # copy the samplesheet metadata into obs and point the table at the renamed shapes
        tmp_sd.tables["table"].obs = tmp_sd.tables["table"].obs.assign(**row)
        tmp_sd.tables["table"].obs["region"] = sample
        tmp_sd.tables["table"].uns["spatialdata_attrs"]["region"] = sample
        # rename images
        tmp_sd.images[f"{sample}_hires"] = tmp_sd.images["visium_hires_image"]
        tmp_sd.images[f"{sample}_lowres"] = tmp_sd.images["visium_lowres_image"]
        del tmp_sd.images["visium_hires_image"]
        del tmp_sd.images["visium_lowres_image"]
        # rename shapes
        tmp_sd.shapes[f"{sample}"] = tmp_sd.shapes["visium"]
        del tmp_sd.shapes["visium"]

        sdatas_vis[sample] = tmp_sd

    which seems a bit cumbersome. I'm wondering if there's a better solution or what's the intended way of handling such cases. It could also be worth adding a process to the nf-core/spatialtranscriptomics pipeline that already does the concatenation step.
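
To make the naming scheme from the second point concrete: with a dict input and index_unique="_", anndata.concat appends the dict key to each obs name. The plain-Python sketch below only mimics that naming rule; it does not call anndata or spatialdata. The function name `suffixed_obs_names` and the sample IDs `P01`/`P02` are hypothetical, while the barcode is the one quoted above.

```python
def suffixed_obs_names(obs_names_by_sample, index_unique="_"):
    """Append the sample key to every obs name, joined by `index_unique`."""
    return [
        f"{name}{index_unique}{sample}"
        for sample, names in obs_names_by_sample.items()
        for name in names
    ]


# Hypothetical sample IDs; the same barcode occurs in both samples.
obs_names = {
    "P01": ["AACTCAACCTTGACCA-1"],
    "P02": ["AACTCAACCTTGACCA-1"],
}
unique_names = suffixed_obs_names(obs_names)
print(unique_names)
# ['AACTCAACCTTGACCA-1_P01', 'AACTCAACCTTGACCA-1_P02']
```

A dict-aware spatialdata.concatenate could presumably apply the same rule to the table's obs_names, which would replace the chain of numeric suffixes with the readable sample ID.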

@melonora
Collaborator

I am a bit swamped at the moment, but I will look into implementing your suggestions. As you said, it would be worthwhile to handle dicts.

@wangjiawen2013

I have the same issue!
The per-sample SpatialData objects all have the same names for images, shapes and coordinate systems, so when I concatenate them a KeyError occurs: KeyError: 'Images must have unique names across the SpatialData objects to concatenate'
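
A possible workaround, sketched under assumptions, is to give every element a sample-specific name before concatenating. The helper `rename_elements` below is hypothetical (it is not part of spatialdata) and works on any dict-like container. That containers such as sdata.images and sdata.shapes accept item assignment and deletion is inferred from the renaming snippet in the original post, not from the library documentation. The demo uses plain dicts, hypothetical sample IDs, and the Visium element names from that snippet.

```python
from collections.abc import MutableMapping
from typing import Any


def rename_elements(elements: MutableMapping[str, Any], renames: dict[str, str]) -> None:
    """Rename keys of a dict-like container in place (hypothetical helper)."""
    for old, new in renames.items():
        if old == new:
            continue
        if new in elements:
            raise KeyError(f"target name {new!r} already exists")
        # assign under the new name first, then drop the old one
        elements[new] = elements[old]
        del elements[old]


# Plain dicts stand in for the per-sample image containers.
sample_images = {
    "P01": {"visium_hires_image": "hires-1", "visium_lowres_image": "lowres-1"},
    "P02": {"visium_hires_image": "hires-2", "visium_lowres_image": "lowres-2"},
}
for sample, images in sample_images.items():
    rename_elements(
        images,
        {
            "visium_hires_image": f"{sample}_hires",
            "visium_lowres_image": f"{sample}_lowres",
        },
    )

all_names = [name for images in sample_images.values() for name in images]
print(all_names)
```

After renaming, no name is shared between samples, which is the condition the error message asks for.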

@wangjiawen2013

It would also be good to have a way to retrieve (subset) each original object from the concatenated object.

@LucaMarconato
Member

@wangjiawen2013 does SpatialData.subset() work for your use case, or would you improve something?

@wangjiawen2013

wangjiawen2013 commented Aug 7, 2024

What I mean is: how can I concatenate multiple SpatialData objects and later subset each original object from the concatenated object by sample name (each object has a unique name)? The SpatialData objects from Xenium all have the same names for images, shapes and coordinate systems, so I cannot concatenate them because of KeyError: Images must have unique names across the SpatialData objects to concatenate.
SpatialData.subset() can only get elements, not objects.
AnnData objects can be concatenated and subset easily; I would like to be able to concatenate and subset SpatialData objects in the same way.
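
If elements are named with the sample ID as a prefix (for example `P01`, `P01_hires`, `P01_lowres`, following the renaming convention in the original post), the element names belonging to one sample can be collected by prefix. The sketch below is plain Python; `element_names_for_sample` and the sample IDs are hypothetical. Passing the resulting list to SpatialData.subset() to get a per-sample object back is an assumption about how the two could be combined, not something confirmed in this thread.

```python
def element_names_for_sample(all_names, sample, sep="_"):
    """Return the names equal to `sample` or starting with `sample + sep`."""
    prefix = f"{sample}{sep}"
    return [name for name in all_names if name == sample or name.startswith(prefix)]


# Hypothetical element names of a concatenated object.
names = ["P01_hires", "P01_lowres", "P01", "P02_hires", "P02_lowres", "P02"]
p01_names = element_names_for_sample(names, "P01")
print(p01_names)
# ['P01_hires', 'P01_lowres', 'P01']
```

Prefix matching is ambiguous when one sample ID is itself a prefix of another (e.g. `P01` and `P01_b`), so an explicit sample-to-elements mapping recorded at concatenation time would be more robust.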
