Combining MuData's – concat function #20

ivirshup · 2022-02-23T12:52:08Z

It's possible I'm not seeing them, but should there be concat (like anndata.concat) functionality here?

Maybe also merge (like scverse/anndata#658)? But that could be a separate issue.

cc36 · 2022-04-21T17:49:29Z

Hello,

I wanted to ask what is the best way to combine several samples in a MuData object and it seems like this existing issue points in that direction.

The approach I usually take for combining multiple AnnData objects does not seem to work here:

holder = []

for n in folders:
    holder.append(mu.read_10x_h5("/home/jovyan/data/Multiome/DNAP/"+n+"/filtered_feature_bc_matrix.h5"))

adata = holder[0].concatenate(holder[1:], join='outer', index_unique=None)

Any help would be highly appreciated.

Thanks!

ivirshup · 2022-04-21T19:08:27Z

I think the general approach would be to deconstruct each MuData into its constituent AnnData objects, concatenate those per modality with anndata.concat, and then put the results into a new MuData.

@bio-la, did you have a function working here that you could share?

cc36 · 2022-04-22T17:10:29Z

Thanks. I have tried the suggested approach, i.e. deconstructing into the constituent AnnData objects and concatenating those. It works well, except that the AnnData.uns['files'] and AnnData.uns['atac'] information is lost in the concatenation.

I have tried using the uns_merge argument from the AnnData.concatenate function (https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.concatenate.html#anndata.AnnData.concatenate) but it does not seem to help in this case.

Do you have any suggestion for this?

Thank you in advance!

ivirshup · 2022-04-22T18:21:28Z

I think this gets a bit more complicated. I'm unsure if there's going to be a good way to do this that plays well with muon.atac, though @mffrank or @gtca would be able to comment better.

I'm assuming you want to use the data in those fields downstream. How would you want those fields to be merged?

bio-la · 2022-04-24T12:42:10Z

@cc36 Why are you trying to concatenate multiple ATAC AnnData/MuData objects? I'm assuming you are talking about the atac.uns.xxx slots that mu.read_10x_h5 fills with fragment and peak files when reading a single 10x multiome run. Unless you have called peaks jointly on the original samples, it doesn't make sense to concatenate peaks and files from separate folders.
I am not sure which analytical tool would let you call peaks from multiple samples using the same background fragment distribution and still output separate 10x folders (samples). Normally, at the end of the aggregation step (joint peak calling), you would have one count matrix, one fragment matrix, one peak file and so on.

So the behaviour you describe (losing those peak and fragment files) is actually preventing you from doing something that would give you a false peak distribution per sample.
It may be that I'm missing something here. Could you please expand on what exactly you are trying to do by concatenating multiple ATAC (multiome) AnnData/MuData objects?
Thanks!

cc36 · 2022-04-25T15:43:03Z

@bio-la Thanks for your reply. You are right, I need to use the joint peak calling output, which I have not done and will now do. You can resolve this issue. Thanks a lot for your help!

Zethson · 2022-05-25T08:17:46Z

(Fat fingers, sorry)

sruthi-hub · 2022-11-18T00:29:09Z

I am new to working with scATAC-seq. I would appreciate it if one of you (@cc36, @bio-la, @ivirshup) could share a few lines of code that ensure there's no false peak distribution. Thanks!

gtca · 2022-12-08T10:02:12Z

@sruthi-hub Hey, if this question is still relevant, could you elaborate on what the false peak distribution actually means?
If this is about peak properties, they can be quantified and visualised as shown, for instance, in this tutorial.

ChaseTaylor939 · 2023-03-10T21:37:35Z

I'm having a similar issue when I try to concatenate two different multiome datasets. The RNA concatenates just fine, but the ATAC loses lots of metadata when I concatenate and the n_vars goes down to 13. I'm sorry, but I do not understand what @bio-la meant in their earlier explanation. Could someone provide some code on how they combine two or more multiome datasets?

Thank you!

gtca · 2023-06-01T20:25:23Z

Hey @ChaseTaylor939,

Concatenation is performed as described with inner join (for features) by default:

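import numpy as np
import anndata
from anndata import AnnData
# Two toy objects with the same number of cells but different feature counts
mod1 = AnnData(np.random.normal(size=(10, 5)))
mod2 = AnnData(np.random.normal(size=(10, 3)))
mod2.var_names
# Index(['0', '1', '2'], dtype='object')
# The default inner join keeps only the features shared by both objects
anndata.concat([mod1, mod2]).shape
# => (20, 3)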

I assume peaks were called individually for each dataset (m9164_atac and m9412_atac), so 13 is the number of peaks that happen to have exactly the same definition (chrN:XXX-YYY) across the samples.
For peak-based analysis, peaks have to be either called jointly or merged across samples with special procedures.
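
To illustrate the point about peak definitions, here is a minimal pure-Python sketch with made-up peak names. It shows why an exact-name inner join keeps so few features, and one simple way overlapping peaks could be merged into a union set. `parse_peak` and `merge_peaks` are hypothetical helpers, not muon or mudata functions, and real merging procedures are more involved; in particular, counts have to be recomputed against the merged peaks from the fragment files.

```python
def parse_peak(name):
    """Split 'chr1:100-200' into ('chr1', 100, 200)."""
    chrom, span = name.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def merge_peaks(*peak_sets):
    """Merge overlapping intervals from all samples into one union peak set."""
    intervals = sorted(parse_peak(p) for peaks in peak_sets for p in peaks)
    merged = []
    for chrom, start, end in intervals:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps the previous interval: extend it
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [f"{c}:{s}-{e}" for c, s, e in merged]


# Peaks called separately per sample (illustrative values only)
sample_a = ["chr1:100-200", "chr1:500-650", "chr2:10-90"]
sample_b = ["chr1:100-200", "chr1:600-700", "chr2:95-150"]

# What an inner join on feature names effectively keeps
shared = sorted(set(sample_a) & set(sample_b))

# A union peak set in which overlapping peaks are collapsed
union = merge_peaks(sample_a, sample_b)
```

Here only one peak name is identical in both samples, even though `chr1:500-650` and `chr1:600-700` clearly cover the same region.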

aichander · 2023-06-06T19:10:52Z

+1 to having some inbuilt functionality that lets us concatenate 2 mudata objects with shared indices.

lijxug · 2023-09-13T12:30:11Z

Any progress on this issue? Or should we do what ivirshup suggested?

gtca · 2023-09-13T13:23:47Z

Scheduled for mudata v0.3, which is in progress (#56), @lijxug!

Just to make it clear, this is about concatenation as in anndata.concat, which is not aware of genomic intervals, etc.

gtca · 2024-07-02T01:01:25Z

Concatenation based on anndata.concat should now work since v0.3. But this is a new API so please report any issues with it!

ivirshup added the enhancement New feature or request label Feb 23, 2022

ivirshup mentioned this issue Apr 27, 2022

Combined datasets single-cell-data/mams#5

Open

Zethson closed this as completed May 25, 2022

Zethson reopened this May 25, 2022

gtca mentioned this issue Jul 1, 2022

concatenate muon objects scverse/muon#64

Open

gtca mentioned this issue Jan 3, 2023

How to concatenate 2 mdata? #35

Closed

grst mentioned this issue Apr 6, 2023

Concatenation of multimodal data scverse/scverse-tutorials#49

Merged

gtca added this to the v0.3.0 milestone Sep 13, 2023

gtca mentioned this issue Sep 13, 2023

How can I concatenate multiple muon data samples? scverse/muon#127

Closed

gtca linked a pull request Sep 21, 2023 that will close this issue

mudata.concat() #58

Merged

gtca closed this as completed Jul 2, 2024
