Skip to content

Combat populates adata.X with NANs so sc.pp.highly_variable_genes function outputs error #1172

@auesro

Description

@auesro

After running sc.pp.combat(adata, key='sample'), adata.X is full of NaNs and sc.pp.highly_variagle_genes(adata) fails. No issues (and no NaNs) if NOT running the combat correction.

sc.pp.combat(adata, key='sample')
sc.pp.highly_variable_genes(adata)
In [1]: sc.pp.combat(adata, key='sample')
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/sklearn/externals/six.py:31: FutureWarning: The module is deprecated in version 0.21 and will be removed in version 0.23 since we've dropped support for Python 2.7. Please rely on the official version of six (https://pypi.org/project/six/).
  "(https://pypi.org/project/six/).", FutureWarning)
scanpy==1.4.6 anndata==0.7.1 umap==0.4.1 numpy==1.18.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0
Standardizing Data across genes.

Found 11 batches

Found 0 numerical variables:
	

Found 3 genes with zero variance.
Fitting L/S model and finding priors

Finding parametric adjustments

Adjusting data

/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:338: RuntimeWarning: invalid value encountered in true_divide
  change = max((abs(g_new - g_old) / g_old).max(), (abs(d_new - d_old) / d_old).max())
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:338: RuntimeWarning: divide by zero encountered in true_divide
  change = max((abs(g_new - g_old) / g_old).max(), (abs(d_new - d_old) / d_old).max())

In [2]: sc.pp.highly_variable_genes(adata)
extracting highly variable genes
Traceback (most recent call last):

  File "<ipython-input-2-7727f5f928cd>", line 1, in <module>
    sc.pp.highly_variable_genes(adata)

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 235, in highly_variable_genes
    flavor=flavor,

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 65, in _highly_variable_genes_single_batch
    df['mean_bin'] = pd.cut(df['means'], bins=n_bins)

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 265, in cut
    duplicates=duplicates,

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 381, in _bins_to_cuts
    f"Bin edges must be unique: {repr(bins)}.\n"

ValueError: Bin edges must be unique: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan]).
You can drop duplicate edges by setting the 'duplicates' kwarg

Versions:

scanpy==1.4.6 anndata==0.7.1 umap==0.4.1 numpy==1.18.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions