Harmony 64bit #2655

Intron7 · 2023-09-05T14:15:12Z

This PR makes harmony_integrate run with 64 bit floats. This makes it reproducible for neighbors and therefore downstream clustering

codecov · 2023-09-05T14:35:53Z

Codecov Report

Merging #2655 (9e142c8) into master (3a50e60) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2655   +/-   ##
=======================================
  Coverage   72.16%   72.16%           
=======================================
  Files         108      108           
  Lines       11906    11908    +2     
=======================================
+ Hits         8592     8594    +2     
  Misses       3314     3314

Files Changed	Coverage Δ
scanpy/external/pp/_harmony_integrate.py	`81.81% <100.00%> (+4.04%)`	⬆️

WeipengMO · 2023-11-22T04:36:51Z

Hi @Intron7 ,

Due to the hardware's binary floating-point arithmetic, the result is still not reproducible well.

Can I use the np.round() function to round off the number to the correct number of decimal places:

res = harmony_out.Z_corr.T
res = np.round(res, decimals=5)

Intron7 · 2023-11-22T09:23:37Z

@WeipengMO if you calculate it like this you are right. However when we move from 64Bit to 32Bit for neighbors the results are reproducible at least to the best of my testing. I would still be open to round the results. @flying-sheep what do you think?

WeipengMO · 2023-11-22T10:25:00Z

Thank you for your reply @Intron7 . I tried computing the neighbor (sc.pp.neighbors(adata_merge,use_rep='X_pca_harmony') ) without rounding (np.round()), while the results were not reproducible 🤣. That's why I'm thinking about floating-point arithmetic.

Intron7 · 2023-11-22T12:54:36Z

Can you give me the full code you ran for testing and the results from numpy testing for
np.testing.assert_array_equal(adata.obsm["X_pca_harmony"], adata2.obsm["X_pca_harmony"])
np.testing.assert_array_equal(adata.obsp["connectivities"].data, adata2.obsp["connectivities"].data).
The first one should give you an error. The second one shouldn't. How big is your dataset?
Please note that if you use scanpy 1.9.6 that changes of this PR won't have taken effect yet.

WeipengMO · 2023-11-22T16:22:28Z

The size of my dataset is:

AnnData object with n_obs × n_vars = 19091 × 23315

Here is the full code:

import scanpy as sc
import pandas as pd
import numpy as np

from anndata import AnnData

def harmony_integrate(
    adata: AnnData,
    key: str,
    basis: str = "X_pca",
    adjusted_basis: str = "X_pca_harmony",
    **kwargs,
):
    try:
        import harmonypy
    except ImportError:
        raise ImportError("\nplease install harmonypy:\n\n\tpip install harmonypy")

    X = adata.obsm[basis].astype(np.float64)

    harmony_out = harmonypy.run_harmony(X, adata.obs, key, **kwargs)

    adata.obsm[adjusted_basis] = harmony_out.Z_corr.T

adata = sc.read_h5ad('adata.h5ad')

adata_merge = adata.copy()
adata_merge.X = adata_merge.layers['counts']
sc.experimental.pp.highly_variable_genes(adata_merge, n_top_genes=3000, batch_key='batch')

adata_merge = adata_merge[:, adata_merge.var['highly_variable']].copy()
sc.experimental.pp.normalize_pearson_residuals(adata_merge)
adata_merge.layers['apr'] = adata_merge.X.copy()
sc.tl.pca(adata_merge, svd_solver="arpack")
adata_merge.obsm['X_pca_30'] = adata_merge.obsm['X_pca'][:, :30]

adata1 = adata_merge.copy()
adata2 = adata_merge.copy()

The frist test:

# scanpy 1.9.6 that changes of this PR won't have taken effect yet.
# I copy the harmony_integrate from https://github.com/scverse/scanpy/blob/75cb4e750efaccc1413cb204ffa49d21db017079/scanpy/external/pp/_harmony_integrate.py
harmony_integrate(adata1, key='batch', basis='X_pca_30')
harmony_integrate(adata2, key='batch', basis='X_pca_30')
np.testing.assert_array_equal(adata1.obsm["X_pca_harmony"], adata2.obsm["X_pca_harmony"])

It raised the Error:

AssertionError: 
Arrays are not equal

Mismatched elements: 567291 [/](https://vscode-remote+ssh-002dremote-002bnansha.vscode-resource.vscode-cdn.net/) 572730 (99.1%)
Max absolute difference: 1.20792265e-12
Max relative difference: 4.37537551e-09
 x: array([[-0.954048, -7.21621 , -1.601975, ...,  0.059509, -0.436056,
         0.564897],
       [-1.145477, 10.185449,  4.414117, ..., -0.087394, -1.327791,...
 y: array([[-0.954048, -7.21621 , -1.601975, ...,  0.059509, -0.436056,
         0.564897],
       [-1.145477, 10.185449,  4.414117, ..., -0.087394, -1.327791,...

The second test:

sc.pp.neighbors(adata1, n_pcs=30, use_rep='X_pca_harmony')
sc.pp.neighbors(adata2, n_pcs=30, use_rep='X_pca_harmony')
np.testing.assert_array_equal(adata1.obsp["connectivities"].data, adata2.obsp["connectivities"].data)

It raised the Error:

AssertionError: 
Arrays are not equal

Mismatched elements: 268636 [/](https://vscode-remote+ssh-002dremote-002bnansha.vscode-resource.vscode-cdn.net/) 434492 (61.8%)
Max absolute difference: 0.99820393
Max relative difference: 810.4644
 x: array([0.158963, 0.206843, 0.234457, ..., 0.095996, 0.179325, 1.      ],
      dtype=float32)
 y: array([0.158963, 0.206843, 0.234457, ..., 0.095996, 0.179324, 1.      ],
      dtype=float32)

This is my session_info:

Click to view session information
-----
anndata             0.9.2
loguru              0.7.2
matplotlib          3.8.0
numpy               1.26.0
pandas              1.4.3
scanpy              1.9.6
seaborn             0.12.2
session_info        1.0.0
-----
Click to view modules imported as dependencies
PIL                 9.4.0
argcomplete         NA
asttokens           NA
attr                23.1.0
awkward             2.4.2
awkward_cpp         NA
backcall            0.2.0
cffi                1.15.1
comm                0.1.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.8.0
decorator           5.1.1
dot_parser          NA
etils               1.4.1
exceptiongroup      1.1.3
executing           1.2.0
get_annotations     NA
gmpy2               2.1.2
h5py                3.9.0
harmonypy           NA
igraph              0.10.8
importlib_metadata  NA
importlib_resources NA
ipykernel           6.25.2
ipywidgets          8.1.1
jax                 0.4.20
jaxlib              0.4.20
jedi                0.19.0
joblib              1.2.0
kiwisolver          1.4.4
leidenalg           0.10.1
llvmlite            0.41.1
ml_dtypes           0.2.0
mpl_toolkits        NA
mpmath              1.3.0
natsort             8.4.0
numba               0.58.1
nvfuser             NA
opt_einsum          v3.0.0
packaging           23.1
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        3.10.0
prompt_toolkit      3.0.39
psutil              5.9.5
ptyprocess          0.7.0
pure_eval           0.2.2
pyarrow             13.0.0
pycparser           2.21
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pydot               1.4.2
pygments            2.16.1
pynndescent         0.5.5
pyparsing           3.0.9
pytz                2023.3.post1
rich                NA
scipy               1.10.1
setuptools          68.0.0
six                 1.16.0
sklearn             1.3.2
sparse              0.14.0
stack_data          0.6.2
statsmodels         0.13.1
sympy               1.11.1
texttable           1.6.7
threadpoolctl       3.2.0
torch               2.0.1
tornado             6.3.3
tqdm                4.66.1
traitlets           5.10.0
typing_extensions   NA
umap                0.5.3
wcwidth             0.2.6
yaml                6.0
zipp                NA
zmq                 25.1.1
-----
IPython             8.15.0
jupyter_client      8.3.1
jupyter_core        5.3.1
-----
Python 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
Linux-6.2.0-36-generic-x86_64-with-glibc2.35
-----
Session information updated at 2023-11-23 00:08

I uploaded the ipynb file as attachment. 👇

harmony_test.ipynb.zip

Intron7 added 2 commits September 5, 2023 16:05

harmony runs with 64 bit

d6a5aab

add harmony

afdc1db

Intron7 added the Area – External Dealing with external tools label Sep 5, 2023

Intron7 added this to the 1.10.0 milestone Sep 5, 2023

Intron7 added 2 commits September 5, 2023 16:16

Update 1.10.0.md

df1c96f

Merge branch 'master' into harmony-64

9e142c8

Intron7 requested a review from Zethson September 5, 2023 14:17

Intron7 merged commit 414092f into master Sep 14, 2023
10 checks passed

Intron7 deleted the harmony-64 branch September 14, 2023 08:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harmony 64bit #2655

Harmony 64bit #2655

Intron7 commented Sep 5, 2023

codecov bot commented Sep 5, 2023

WeipengMO commented Nov 22, 2023 •

edited

Intron7 commented Nov 22, 2023

WeipengMO commented Nov 22, 2023

Intron7 commented Nov 22, 2023

WeipengMO commented Nov 22, 2023 •

edited

Harmony 64bit #2655

Harmony 64bit #2655

Conversation

Intron7 commented Sep 5, 2023

codecov bot commented Sep 5, 2023

Codecov Report

WeipengMO commented Nov 22, 2023 • edited

Intron7 commented Nov 22, 2023

WeipengMO commented Nov 22, 2023

Intron7 commented Nov 22, 2023

WeipengMO commented Nov 22, 2023 • edited

WeipengMO commented Nov 22, 2023 •

edited

WeipengMO commented Nov 22, 2023 •

edited