Combining MuData's – concat function #20
Comments
Hello, I wanted to ask what the best way is to combine several samples in a MuData object; this existing issue seems to point in that direction. The approach I usually take for combining multiple AnnData objects does not seem to work here:
Any help would be highly appreciated. Thanks! |
I think the general approach would be to deconstruct the MuData into its constituent @bio-la, did you have a function working here that you could share? |
Thanks. I have tried the suggested approach, i.e. deconstructing the MuData into its constituent AnnData objects and concatenating those. It works well, except that the AnnData.uns['files'] and AnnData.uns['atac'] information is lost in the concatenation. I have tried the uns_merge argument of the AnnData.concatenate function (https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.concatenate.html#anndata.AnnData.concatenate), but it does not seem to help in this case. Do you have any suggestions? Thank you in advance! |
@cc36 Why are you trying to concatenate multiple ATAC AnnData/MuData objects? I'm assuming you are talking about the atac.uns.xxx slots that are filled with fragment and peak files when reading any single multiome 10x run with mu.read_10x_h5. Unless you have called peaks jointly on the original samples, it doesn't make sense to concatenate peaks and files from separate folders. So the behaviour you describe (losing those peak and fragment files) is actually preventing you from doing something that would give you a false peak distribution per sample. |
@bio-la Thanks for your reply. You are right, I need to use the joint peak calling output, which I have not done and will now do. You can resolve this issue. Thanks a lot for your help! |
(Fat fingers, sorry) |
@sruthi-hub Hey, if this question is still relevant, could you elaborate on what the false peak distribution actually means? |
I'm having a similar issue when I try to concatenate two different multiome datasets. The RNA concatenates just fine, but the ATAC loses lots of metadata when I concatenate and the n_vars goes down to 13. I'm sorry, but I do not understand what @bio-la meant in their earlier explanation. Could someone provide some code on how they combine two or more multiome datasets? Thank you! |
Hey @ChaseTaylor939, concatenation is performed as described, with an inner join (for features) by default:
import numpy as np
import anndata
from anndata import AnnData
mod1 = AnnData(np.random.normal(size=(10, 5)))
mod2 = AnnData(np.random.normal(size=(10, 3)))
mod2.var_names
# Index(['0', '1', '2'], dtype='object')
anndata.concat([mod1, mod2]).shape
# => (20, 3)
I can assume peaks were called individually for each dataset ( |
+1 to having some inbuilt functionality that lets us concatenate 2 mudata objects with shared indices. |
Any progress on this issue? Or should we do what ivirshup suggested? |
Concatenation based on |
It's possible I'm not seeing them, but should there be concat (like anndata.concat) functionality here? Maybe also merge (like scverse/anndata#658)? But that could be a separate issue.