Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmony 64bit #2655

Merged
merged 4 commits into from Sep 14, 2023
Merged

Harmony 64bit #2655

merged 4 commits into from Sep 14, 2023

Conversation

Intron7
Copy link
Member

@Intron7 Intron7 commented Sep 5, 2023

This PR makes harmony_integrate run with 64 bit floats. This makes it reproducible for neighbors and therefore downstream clustering

@Intron7 Intron7 added the Area – External Dealing with external tools label Sep 5, 2023
@Intron7 Intron7 added this to the 1.10.0 milestone Sep 5, 2023
@Intron7 Intron7 requested a review from Zethson September 5, 2023 14:17
@codecov
Copy link

codecov bot commented Sep 5, 2023

Codecov Report

Merging #2655 (9e142c8) into master (3a50e60) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2655   +/-   ##
=======================================
  Coverage   72.16%   72.16%           
=======================================
  Files         108      108           
  Lines       11906    11908    +2     
=======================================
+ Hits         8592     8594    +2     
  Misses       3314     3314           
Files Changed Coverage Δ
scanpy/external/pp/_harmony_integrate.py 81.81% <100.00%> (+4.04%) ⬆️

@Intron7 Intron7 merged commit 414092f into master Sep 14, 2023
10 checks passed
@Intron7 Intron7 deleted the harmony-64 branch September 14, 2023 08:48
@WeipengMO
Copy link

WeipengMO commented Nov 22, 2023

Hi @Intron7 ,

Due to the hardware's binary floating-point arithmetic, the result is still not reproducible well.

d92f6a78d8bfd589eaea16e2bd03112

Can I use the np.round() function to round off the number to the correct number of decimal places:

res = harmony_out.Z_corr.T
res = np.round(res, decimals=5)
6517988fc076f499cbcbce5d7880458

@Intron7
Copy link
Member Author

Intron7 commented Nov 22, 2023

@WeipengMO if you calculate it like this you are right. However when we move from 64Bit to 32Bit for neighbors the results are reproducible at least to the best of my testing. I would still be open to round the results. @flying-sheep what do you think?

@WeipengMO
Copy link

Thank you for your reply @Intron7 . I tried computing the neighbor (sc.pp.neighbors(adata_merge,use_rep='X_pca_harmony') ) without rounding (np.round()), while the results were not reproducible 🤣. That's why I'm thinking about floating-point arithmetic.

@Intron7
Copy link
Member Author

Intron7 commented Nov 22, 2023

Can you give me the full code you ran for testing and the results from numpy testing for
np.testing.assert_array_equal(adata.obsm["X_pca_harmony"], adata2.obsm["X_pca_harmony"])
np.testing.assert_array_equal(adata.obsp["connectivities"].data, adata2.obsp["connectivities"].data).
The first one should give you an error. The second one shouldn't. How big is your dataset?
Please note that if you use scanpy 1.9.6 that changes of this PR won't have taken effect yet.

@WeipengMO
Copy link

WeipengMO commented Nov 22, 2023

The size of my dataset is:

AnnData object with n_obs × n_vars = 19091 × 23315

Here is the full code:

import scanpy as sc
import pandas as pd
import numpy as np

from anndata import AnnData

def harmony_integrate(
    adata: AnnData,
    key: str,
    basis: str = "X_pca",
    adjusted_basis: str = "X_pca_harmony",
    **kwargs,
):
    try:
        import harmonypy
    except ImportError:
        raise ImportError("\nplease install harmonypy:\n\n\tpip install harmonypy")

    X = adata.obsm[basis].astype(np.float64)

    harmony_out = harmonypy.run_harmony(X, adata.obs, key, **kwargs)

    adata.obsm[adjusted_basis] = harmony_out.Z_corr.T

adata = sc.read_h5ad('adata.h5ad')

adata_merge = adata.copy()
adata_merge.X = adata_merge.layers['counts']
sc.experimental.pp.highly_variable_genes(adata_merge, n_top_genes=3000, batch_key='batch')

adata_merge = adata_merge[:, adata_merge.var['highly_variable']].copy()
sc.experimental.pp.normalize_pearson_residuals(adata_merge)
adata_merge.layers['apr'] = adata_merge.X.copy()
sc.tl.pca(adata_merge, svd_solver="arpack")
adata_merge.obsm['X_pca_30'] = adata_merge.obsm['X_pca'][:, :30]

adata1 = adata_merge.copy()
adata2 = adata_merge.copy()

The frist test:

# scanpy 1.9.6 that changes of this PR won't have taken effect yet.
# I copy the harmony_integrate from https://github.com/scverse/scanpy/blob/75cb4e750efaccc1413cb204ffa49d21db017079/scanpy/external/pp/_harmony_integrate.py
harmony_integrate(adata1, key='batch', basis='X_pca_30')
harmony_integrate(adata2, key='batch', basis='X_pca_30')
np.testing.assert_array_equal(adata1.obsm["X_pca_harmony"], adata2.obsm["X_pca_harmony"])

It raised the Error:

AssertionError: 
Arrays are not equal

Mismatched elements: 567291 [/](https://vscode-remote+ssh-002dremote-002bnansha.vscode-resource.vscode-cdn.net/) 572730 (99.1%)
Max absolute difference: 1.20792265e-12
Max relative difference: 4.37537551e-09
 x: array([[-0.954048, -7.21621 , -1.601975, ...,  0.059509, -0.436056,
         0.564897],
       [-1.145477, 10.185449,  4.414117, ..., -0.087394, -1.327791,...
 y: array([[-0.954048, -7.21621 , -1.601975, ...,  0.059509, -0.436056,
         0.564897],
       [-1.145477, 10.185449,  4.414117, ..., -0.087394, -1.327791,...

The second test:

sc.pp.neighbors(adata1, n_pcs=30, use_rep='X_pca_harmony')
sc.pp.neighbors(adata2, n_pcs=30, use_rep='X_pca_harmony')
np.testing.assert_array_equal(adata1.obsp["connectivities"].data, adata2.obsp["connectivities"].data)

It raised the Error:

AssertionError: 
Arrays are not equal

Mismatched elements: 268636 [/](https://vscode-remote+ssh-002dremote-002bnansha.vscode-resource.vscode-cdn.net/) 434492 (61.8%)
Max absolute difference: 0.99820393
Max relative difference: 810.4644
 x: array([0.158963, 0.206843, 0.234457, ..., 0.095996, 0.179325, 1.      ],
      dtype=float32)
 y: array([0.158963, 0.206843, 0.234457, ..., 0.095996, 0.179324, 1.      ],
      dtype=float32)

This is my session_info:

Click to view session information
-----
anndata             0.9.2
loguru              0.7.2
matplotlib          3.8.0
numpy               1.26.0
pandas              1.4.3
scanpy              1.9.6
seaborn             0.12.2
session_info        1.0.0
-----
Click to view modules imported as dependencies
PIL                 9.4.0
argcomplete         NA
asttokens           NA
attr                23.1.0
awkward             2.4.2
awkward_cpp         NA
backcall            0.2.0
cffi                1.15.1
comm                0.1.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.8.0
decorator           5.1.1
dot_parser          NA
etils               1.4.1
exceptiongroup      1.1.3
executing           1.2.0
get_annotations     NA
gmpy2               2.1.2
h5py                3.9.0
harmonypy           NA
igraph              0.10.8
importlib_metadata  NA
importlib_resources NA
ipykernel           6.25.2
ipywidgets          8.1.1
jax                 0.4.20
jaxlib              0.4.20
jedi                0.19.0
joblib              1.2.0
kiwisolver          1.4.4
leidenalg           0.10.1
llvmlite            0.41.1
ml_dtypes           0.2.0
mpl_toolkits        NA
mpmath              1.3.0
natsort             8.4.0
numba               0.58.1
nvfuser             NA
opt_einsum          v3.0.0
packaging           23.1
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        3.10.0
prompt_toolkit      3.0.39
psutil              5.9.5
ptyprocess          0.7.0
pure_eval           0.2.2
pyarrow             13.0.0
pycparser           2.21
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pydot               1.4.2
pygments            2.16.1
pynndescent         0.5.5
pyparsing           3.0.9
pytz                2023.3.post1
rich                NA
scipy               1.10.1
setuptools          68.0.0
six                 1.16.0
sklearn             1.3.2
sparse              0.14.0
stack_data          0.6.2
statsmodels         0.13.1
sympy               1.11.1
texttable           1.6.7
threadpoolctl       3.2.0
torch               2.0.1
tornado             6.3.3
tqdm                4.66.1
traitlets           5.10.0
typing_extensions   NA
umap                0.5.3
wcwidth             0.2.6
yaml                6.0
zipp                NA
zmq                 25.1.1
-----
IPython             8.15.0
jupyter_client      8.3.1
jupyter_core        5.3.1
-----
Python 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
Linux-6.2.0-36-generic-x86_64-with-glibc2.35
-----
Session information updated at 2023-11-23 00:08

I uploaded the ipynb file as attachment. 👇

harmony_test.ipynb.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area – External Dealing with external tools
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants