
Read multiple 10X files #267

Closed
cartal opened this issue Sep 21, 2018 · 12 comments

@cartal

cartal commented Sep 21, 2018

Hi,

Maybe this is somewhere in the manual and I just don't see it. But is there a way to read multiple 10X samples (either multiple .h5 or the matrix/genes/barcodes) in the same way that Seurat does with its Read10X() function?

@falexwolf
Member

falexwolf commented Sep 26, 2018

I don't know how Seurat does it, but I'd simply do:

filenames = ['name0.h5', 'name1.h5', 'name2.h5']
adatas = [sc.read_10x_h5(filename) for filename in filenames]
adata = adatas[0].concatenate(adatas[1:])

Does this help?

@cartal
Author

cartal commented Sep 26, 2018

Hi, thanks for the reply.

This example already helps, thanks. I was thinking more about importing multiple samples from 10X, where each sample has a folder containing the three files (matrix, barcodes, genes). But I guess I can convert those into .h5 before reading them into scanpy.

@falexwolf
Member

falexwolf commented Sep 26, 2018

You can do the same as above using sc.read_10x_mtx, which is not in a release yet but is on the master branch on GitHub. In .concatenate() you can choose how to name your batches/samples by passing batch_categories.

PS: Note that I edited the example above to show sc.read_10x_h5.

@cartal
Author

cartal commented Sep 26, 2018

Many thanks!!!

@cartal cartal closed this as completed Sep 26, 2018
@elfore

elfore commented Apr 26, 2019

Hi falexwolf,

I try to use concatenate to read multiple 10X mtx and put them together.
But it seems that if I concatenate more than 15 mtx files (already stored and read from cache), it becomes very slow. Do you have any advice?
Thanks for any information you may provide.

@aditisk

aditisk commented Mar 24, 2020

Hi @falexwolf, thanks for the solution you provided above for reading multiple files. It worked when I had just 2 files, but when I try the same code with 23 files I get an error message in the concatenation step. Any idea how to fix this? Thanks.


AttributeError Traceback (most recent call last)
in
12 adatas.obs['cell_names'] = pd.read_csv(path + sample + 'barcodes.tsv.gz', header=None)[0].values
13
---> 14 adata = adatas[0].concatenate(adatas[1:])

/Applications/anaconda3/lib/python3.7/site-packages/anndata/core/anndata.py in concatenate(self, join, batch_key, batch_categories, index_unique, *adatas)
1908
1909 if any_sparse:
-> 1910 sparse_format = all_adatas[0].X.getformat()
1911 X = X.asformat(sparse_format)
1912

AttributeError: 'numpy.ndarray' object has no attribute 'getformat'
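The traceback suggests the inputs mix dense and sparse matrices: concatenate sees that at least one .X is sparse and then calls .getformat() on the first .X, which here is a dense numpy array. One possible workaround (an assumption based on the traceback, not a confirmed fix) is to coerce every .X to the same sparse format before concatenating:

```python
import numpy as np
from scipy import sparse

# Hypothetical workaround: make every adata.X CSR before concatenating, e.g.
#   for ad in adatas:
#       if not sparse.issparse(ad.X):
#           ad.X = sparse.csr_matrix(ad.X)
# The coercion is lossless; a quick check on a toy matrix:
X_dense = np.array([[1.0, 0.0], [0.0, 2.0]])
X_csr = sparse.csr_matrix(X_dense)
```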

@aditisk

aditisk commented Mar 24, 2020

Hi @elfore, were you able to concatenate your files successfully? If so, could you please share the code you used for concatenation? Thanks.

@taopeng1100

If I do this: adata = adata1.concatenate(adata2, adata3), how can I keep the original sample names in adata? Thx!

@ivirshup
Member

ivirshup commented May 1, 2020

@taopeng1100, this should work:

adata = adata1.concatenate(adata2, adata3, index_unique=None)

@BrianLohman

Hello,

I am having problems reading in multiple h5 files using the code snippet that was posted by @falexwolf. I am doing:

filenames = ['./a.h5', './b.h5', './c.h5', './d.h5']
adatas = [sc.read_10x_h5(filename, gex_only = True) for filename in filenames]
adata = adatas[0].concatenate(adatas[1:], batch_key='gene_ids', batch_categories=filenames)

With or without the batch_key and batch_categories arguments I get the same error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-e23ba2ca6e37> in <module>
      1 filenames = ['./a.h5', './b.h5', './c.h5', './d.h5']
      2 adatas = [sc.read_10x_h5(filename, gex_only = True) for filename in filenames]
----> 3 adata = adatas[0].concatenate(adatas[1:], batch_key='gene_ids')

~/anaconda3/lib/python3.7/site-packages/anndata/_core/anndata.py in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
   1764             fill_value=fill_value,
   1765             index_unique=index_unique,
-> 1766             pairwise=False,
   1767         )
   1768 

~/anaconda3/lib/python3.7/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
    817     # Annotation for other axis
    818     alt_annot = merge_dataframes(
--> 819         [getattr(a, alt_dim) for a in adatas], alt_indices, merge
    820     )
    821 

~/anaconda3/lib/python3.7/site-packages/anndata/_core/merge.py in merge_dataframes(dfs, new_index, merge_strategy)
    529     dfs: Iterable[pd.DataFrame], new_index, merge_strategy=merge_unique
    530 ) -> pd.DataFrame:
--> 531     dfs = [df.reindex(index=new_index) for df in dfs]
    532     # New dataframe with all shared data
    533     new_df = pd.DataFrame(merge_strategy(dfs), index=new_index)

~/anaconda3/lib/python3.7/site-packages/anndata/_core/merge.py in <listcomp>(.0)
    529     dfs: Iterable[pd.DataFrame], new_index, merge_strategy=merge_unique
    530 ) -> pd.DataFrame:
--> 531     dfs = [df.reindex(index=new_index) for df in dfs]
    532     # New dataframe with all shared data
    533     new_df = pd.DataFrame(merge_strategy(dfs), index=new_index)

~/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    310         @wraps(func)
    311         def wrapper(*args, **kwargs) -> Callable[..., Any]:
--> 312             return func(*args, **kwargs)
    313 
    314         kind = inspect.Parameter.POSITIONAL_OR_KEYWORD

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in reindex(self, *args, **kwargs)
   4174         kwargs.pop("axis", None)
   4175         kwargs.pop("labels", None)
-> 4176         return super().reindex(**kwargs)
   4177 
   4178     def drop(

~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
   4810         # perform the reindex on the axes
   4811         return self._reindex_axes(
-> 4812             axes, level, limit, tolerance, method, fill_value, copy
   4813         ).__finalize__(self, method="reindex")
   4814 

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4021         if index is not None:
   4022             frame = frame._reindex_index(
-> 4023                 index, method, copy, level, fill_value, limit, tolerance
   4024             )
   4025 

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _reindex_index(self, new_index, method, copy, level, fill_value, limit, tolerance)
   4043             copy=copy,
   4044             fill_value=fill_value,
-> 4045             allow_dups=False,
   4046         )
   4047 

~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   4881                 fill_value=fill_value,
   4882                 allow_dups=allow_dups,
-> 4883                 copy=copy,
   4884             )
   4885             # If we've made a copy once, no need to make another one

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate, only_slice)
   1299         # some axes don't allow reindexing with dups
   1300         if not allow_dups:
-> 1301             self.axes[axis]._can_reindex(indexer)
   1302 
   1303         if axis >= self.ndim:

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
   3475         # trying to reindex on an axis with duplicates
   3476         if not self._index_as_unique and len(indexer):
-> 3477             raise ValueError("cannot reindex from a duplicate axis")
   3478 
   3479     def reindex(self, target, method=None, level=None, limit=None, tolerance=None):

ValueError: cannot reindex from a duplicate axis

Loading a single h5 file works and produces expected output:

a = sc.read_10x_h5('./a.h5', gex_only = True)
a
AnnData object with n_obs × n_vars = 7474 × 31053
    var: 'gene_ids', 'feature_types', 'genome'

So the input files appear to be valid; I just can't get them to concatenate into a single object.

Any ideas would be welcome.

@xiaozhangPrivate

Hi @BrianLohman, I had the same problem, and this might help you.
Just run adata.var_names_make_unique() on each object before concatenating:

filenames = ["a", "b", "c", "d"]
adatas = []
for filename in filenames:
    adata = sc.read_10x_h5(filename)
    adata.var_names_make_unique()
    adatas.append(adata)
adata = adatas[0].concatenate(adatas[1:])

@dhairya02

dhairya02 commented Feb 15, 2023

Hi, I tried to do what you suggested but I am getting an error saying ValueError: only one regex group is supported with Index.
I have multiple h5ad files with varying n_obs × n_vars. Here is my code:

batch_names = []
for i in range(len(adatas)):
  adatas[i].var_names_make_unique()
  batch_names.append(filenames[i].split('.')[0])
  print(i,adatas[i])

adata = adatas[0].concatenate(adatas[1:],
                              batch_key = 'ID',
                              uns_merge="unique",
                              index_unique=None,
                              batch_categories=batch_names)

and this produces the above error. Can anyone help?
