Feedback on concatenate() #541

grst · 2024-04-08T10:00:28Z

Although I was eventually able to concatenate the data the way I wanted, the user experience wasn't as smooth as I had hoped, so I wanted to share some feedback. As I'm not that familiar with spatialdata yet, there may already be better solutions; please let me know if there are.

Starting situation

I have ~20 Visium Cytassist samples from a clinical trial processed with nf-core/spatialtranscriptomics (using the nf-core/spatialvi#67 branch that already uses spatialdata). The pipeline generates a single .zarr folder for each sample.

Desired outcome

I would like to have all samples in a single SpatialData object. The AnnData table should contain the gene expression from all samples.

Pain points

sd.concatenate enforces that the input is a list. Is there a reason this can't accept any Sequence type (e.g. dict_values)?
Usually, I pass a dictionary sample_id -> AnnData to anndata.concat, which nicely makes obs_names unique in combination with concat(..., index_unique="_"). This doesn't work with spatialdata.concatenate, which leaves me with either manipulating the obs_names of each object before concatenation, or accepting ugly obs names with numeric suffixes (e.g. AACTCAACCTTGACCA-1_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0). IMO it would be great to support a dict as input to spatialdata.concatenate, too.
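
To make the two points above concrete, here is a minimal pure-Python sketch of the requested behaviour. It is not spatialdata code: `normalize_inputs` and `unique_obs_names` are hypothetical helpers, and plain lists of barcodes stand in for the real objects. The naming scheme `{name}{sep}{key}` mirrors what `index_unique="_"` produces in anndata.concat when a dict is passed.

```python
from collections.abc import Iterable, Mapping


def normalize_inputs(objs):
    """Return (keys, values) from a mapping or any iterable of objects.

    Hypothetical helper for illustration, not part of spatialdata.
    """
    if isinstance(objs, Mapping):
        return list(objs.keys()), list(objs.values())
    if isinstance(objs, (str, bytes)) or not isinstance(objs, Iterable):
        raise TypeError("expected a mapping or an iterable of objects")
    values = list(objs)
    # Without user-supplied keys, fall back to positional labels.
    return [str(i) for i in range(len(values))], values


def unique_obs_names(obs_names, key, sep="_"):
    """Append the sample key once per name: "{name}{sep}{key}"."""
    return [f"{name}{sep}{key}" for name in obs_names]


# Stand-in data: the same barcode occurs in two samples.
samples = {"P01": ["AACTCAACCTTGACCA-1"], "P02": ["AACTCAACCTTGACCA-1"]}
keys, barcode_lists = normalize_inputs(samples)
merged = [
    name
    for key, names in zip(keys, barcode_lists)
    for name in unique_obs_names(names, key)
]
# merged == ["AACTCAACCTTGACCA-1_P01", "AACTCAACCTTGACCA-1_P02"]
```

Passing `samples.values()` (a dict_values view) goes through the same function and simply gets positional keys, so accepting a dict would not have to break list-style input.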

The per-sample SpatialData objects all have the same names for images, shapes and coordinate systems. I currently rename them like this:

import spatialdata as sd
from tqdm import tqdm

# samplesheet (a pandas DataFrame) and sample_path (a pathlib.Path) are defined elsewhere
sdatas_vis = {}

for _, row in tqdm(samplesheet.iterrows(), total=samplesheet.shape[0]):
    sample = row["sample"]
    tmp_sd = sd.read_zarr(sample_path / sample / "data" / "sdata_processed.zarr")
    # add the samplesheet columns as per-observation metadata
    tmp_sd.tables["table"].obs = tmp_sd.tables["table"].obs.assign(**row)
    # point the table at the renamed shapes element
    tmp_sd.tables["table"].obs["region"] = sample
    tmp_sd.tables["table"].uns["spatialdata_attrs"]["region"] = sample
    # rename images
    tmp_sd.images[f"{sample}_hires"] = tmp_sd.images["visium_hires_image"]
    tmp_sd.images[f"{sample}_lowres"] = tmp_sd.images["visium_lowres_image"]
    del tmp_sd.images["visium_hires_image"]
    del tmp_sd.images["visium_lowres_image"]
    # rename shapes
    tmp_sd.shapes[sample] = tmp_sd.shapes["visium"]
    del tmp_sd.shapes["visium"]

    sdatas_vis[sample] = tmp_sd

which seems a bit cumbersome. I'm wondering if there's a better solution or what's the intended way of handling such cases. It could also be worth adding a process to the nf-core/spatialtranscriptomics pipeline that already does the concatenation step.
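
As a possible way to shorten the renaming part of the loop above, the assign-then-delete pattern could be wrapped in a small helper. This is only a sketch: `rename_elements` is a hypothetical function, not a spatialdata API, and it assumes nothing beyond the item assignment and `del` that the loop already uses on `tmp_sd.images` and `tmp_sd.shapes`. A plain dict stands in for the element container here.

```python
from collections.abc import MutableMapping


def rename_elements(container: MutableMapping, renames: dict) -> None:
    """Rename keys of a dict-like container in place.

    Hypothetical helper for illustration, not a spatialdata function.
    """
    for old, new in renames.items():
        if old == new:
            continue
        if new in container:
            raise KeyError(f"target name already exists: {new!r}")
        # Same assign-then-delete pattern as in the loop above;
        # a missing old name raises KeyError on lookup.
        container[new] = container[old]
        del container[old]


# Stand-in for tmp_sd.images; the values are placeholders.
sample = "P01"
images = {"visium_hires_image": "hires", "visium_lowres_image": "lowres"}
rename_elements(
    images,
    {
        "visium_hires_image": f"{sample}_hires",
        "visium_lowres_image": f"{sample}_lowres",
    },
)
# images == {"P01_hires": "hires", "P01_lowres": "lowres"}
```

The table's region annotation would still have to be updated separately, as the loop above does, since the helper only touches the container keys.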

melonora · 2024-04-15T13:12:30Z

I am a bit swamped at the moment, but I will look into implementing your suggestions. As you said, it would be worthwhile to handle dicts.
