Skip to content

score_genes does not work on a dataset loaded with backed="r+" #883

@mfiers

Description

@mfiers

Hi - excellent software, thanks! -

but I do have a problem. If i load a disk backed dataset, I cannot run sc.tl.score_genes.

Given these two sets:

ad = sc.read_h5ad('scdataset.h5ad', backed='r+')
ad2 = sc.read_h5ad('scdataset.h5ad')

and

random_genes = list(ad.var_names.to_series().sample(100))

this works perfectly:

sc.tl.score_genes(ad2, random, score_name="random100", random_state=42)

but, this:

sc.tl.score_genes(ad, random, score_name="random100", random_state=42)

yields the following error:

-----------------------------------------------------------------
ValueError                      Traceback (most recent call last)
<ipython-input-113-9cb28e089b25> in <module>
----> 1 sc.tl.score_genes(ad, random, score_name="random100", random_state=42)

~/.pyenv/versions/mfpy372/lib/python3.7/site-packages/scanpy/tools/_score_genes.py in score_genes(adata, gene_list, ctrl_size, gene_pool, n_bins, score_name, random_state, copy, use_raw)
     90     else:
     91         obs_avg = pd.Series(
---> 92             np.nanmean(_adata[:, gene_pool].X, axis=0), index=gene_pool)  # average expression of genes
     93 
     94     obs_avg = obs_avg[np.isfinite(obs_avg)] # Sometimes (and I don't know how) missing data may be there, with nansfor

<__array_function__ internals> in nanmean(*args, **kwargs)

~/.pyenv/versions/mfpy372/lib/python3.7/site-packages/numpy/lib/nanfunctions.py in nanmean(a, axis, dtype, out, keepdims)
    949     cnt = np.sum(~mask, axis=axis, dtype=np.intp, keepdims=keepdims)
    950     tot = np.sum(arr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
--> 951     avg = _divide_by_count(tot, cnt, out=out)
    952 
    953     isbad = (cnt == 0)

~/.pyenv/versions/mfpy372/lib/python3.7/site-packages/numpy/lib/nanfunctions.py in _divide_by_count(a, b, out)
    216         else:
    217             if out is None:
--> 218                 return a.dtype.type(a / b)
    219             else:
    220                 # This is questionable, but currently a numpy scalar can

ValueError: setting an array element with a sequence.

thanks
Mark

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions