Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scale #2457

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Scale #2457

wants to merge 13 commits into from

Conversation

Sakshi-2797
Copy link

@Sakshi-2797 Sakshi-2797 commented Mar 24, 2023

This contribution holds modifications to the existing 'scale function' which turns out to be a faster implementation than the original one. We have introduced flavors - default and use_fastpp , where use_fastpp is our faster implementation of the scale function.The scale function modification provides an overall speedup of approx ~2x in comparison to the already existing 'scale' function.
Usage : sc.pp.scale(adata,max_value=10,flavor='use_fastpp')

@codecov
Copy link

codecov bot commented Mar 24, 2023

Codecov Report

Merging #2457 (e749022) into master (f03c5b4) will decrease coverage by 0.12%.
The diff coverage is 25.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2457      +/-   ##
==========================================
- Coverage   71.88%   71.76%   -0.12%     
==========================================
  Files          98       98              
  Lines       11510    11537      +27     
==========================================
+ Hits         8274     8280       +6     
- Misses       3236     3257      +21     
Impacted Files Coverage Δ
scanpy/preprocessing/_simple.py 74.59% <25.00%> (-3.52%) ⬇️

@Zethson
Copy link
Member

Zethson commented Apr 17, 2023

@Sakshi-2797 is there a reason why you'd introduce a flavor here instead of outright replacing the implementation?

@Sakshi-2797
Copy link
Author

Hi @Zethson ,
The reason why we introduced flavors here is that we wanted the traditional implementation to be present in case anyone wanted to use it. We just introduced our implementation as a faster alternative to the already available one. In case replacing the code is required, we can do that as well.

@Zethson
Copy link
Member

Zethson commented May 3, 2023

Unfortunately, I run into

__________________________________________________________________________________ test_scale[use_fastpp] ___________________________________________________________________________________

flavor = 'use_fastpp'

    @pytest.mark.parametrize("flavor", ["default", "use_fastpp"])
    def test_scale(flavor):
        adata = pbmc68k_reduced()
        adata.X = adata.raw.X
        v = adata[:, 0 : adata.shape[1] // 2]
        # Should turn view to copy https://github.com/scverse/anndata/issues/171#issuecomment-508689965
        assert v.is_view
        with pytest.warns(Warning, match="view"):
>           sc.pp.scale(v, flavor=flavor)

scanpy/tests/test_preprocessing.py:127: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniconda3/envs/scanpy/lib/python3.9/functools.py:888: in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
scanpy/preprocessing/_simple.py:888: in scale_anndata
    X, adata.var["mean"], adata.var["std"] = do_scale(
../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:468: in _compile_for_args
    error_rewrite(e, 'typing')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

e = TypingError('Failed in nopython mode pipeline (step: nopython frontend)\nnon-precise type pyobject\nDuring: typing of ...y the following argument(s):\n- argument 0: Cannot determine Numba type of <class \'scipy.sparse._csr.csr_matrix\'>\n')
issue_type = 'typing'

    def error_rewrite(e, issue_type):
        """
        Rewrite and raise Exception `e` with help supplied based on the
        specified issue_type.
        """
        if config.SHOW_HELP:
            help_msg = errors.error_extras[issue_type]
            e.patch_message('\n'.join((str(e).rstrip(), help_msg)))
        if config.FULL_TRACEBACKS:
            raise e
        else:
>           raise e.with_traceback(None)
E           numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
E           non-precise type pyobject
E           During: typing of argument at /home/zeth/PycharmProjects/scanpy/scanpy/preprocessing/_simple.py (763)
E           
E           File "scanpy/preprocessing/_simple.py", line 763:
E           def do_scale(X, maxv, nthr):
E               <source elided>
E               # t0= time.time()
E               s = np.zeros((nthr, X.shape[1]))
E               ^ 
E           
E           This error may have been caused by the following argument(s):
E           - argument 0: Cannot determine Numba type of <class 'scipy.sparse._csr.csr_matrix'>

../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:409: TypingError

When trying to use the new flavor with the existing test.

@Sakshi-2797
Copy link
Author

Unfortunately, I run into

__________________________________________________________________________________ test_scale[use_fastpp] ___________________________________________________________________________________

flavor = 'use_fastpp'

    @pytest.mark.parametrize("flavor", ["default", "use_fastpp"])
    def test_scale(flavor):
        adata = pbmc68k_reduced()
        adata.X = adata.raw.X
        v = adata[:, 0 : adata.shape[1] // 2]
        # Should turn view to copy https://github.com/scverse/anndata/issues/171#issuecomment-508689965
        assert v.is_view
        with pytest.warns(Warning, match="view"):
>           sc.pp.scale(v, flavor=flavor)

scanpy/tests/test_preprocessing.py:127: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniconda3/envs/scanpy/lib/python3.9/functools.py:888: in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
scanpy/preprocessing/_simple.py:888: in scale_anndata
    X, adata.var["mean"], adata.var["std"] = do_scale(
../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:468: in _compile_for_args
    error_rewrite(e, 'typing')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

e = TypingError('Failed in nopython mode pipeline (step: nopython frontend)\nnon-precise type pyobject\nDuring: typing of ...y the following argument(s):\n- argument 0: Cannot determine Numba type of <class \'scipy.sparse._csr.csr_matrix\'>\n')
issue_type = 'typing'

    def error_rewrite(e, issue_type):
        """
        Rewrite and raise Exception `e` with help supplied based on the
        specified issue_type.
        """
        if config.SHOW_HELP:
            help_msg = errors.error_extras[issue_type]
            e.patch_message('\n'.join((str(e).rstrip(), help_msg)))
        if config.FULL_TRACEBACKS:
            raise e
        else:
>           raise e.with_traceback(None)
E           numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
E           non-precise type pyobject
E           During: typing of argument at /home/zeth/PycharmProjects/scanpy/scanpy/preprocessing/_simple.py (763)
E           
E           File "scanpy/preprocessing/_simple.py", line 763:
E           def do_scale(X, maxv, nthr):
E               <source elided>
E               # t0= time.time()
E               s = np.zeros((nthr, X.shape[1]))
E               ^ 
E           
E           This error may have been caused by the following argument(s):
E           - argument 0: Cannot determine Numba type of <class 'scipy.sparse._csr.csr_matrix'>

../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:409: TypingError

When trying to use the new flavor with the existing test.

Hi @Zethson ,
We are not able to see this issue with the latest commit. Can you please retry with the latest commit in scale branch.

Signed-off-by: zethson <lukas.heumos@posteo.net>
@Zethson
Copy link
Member

Zethson commented May 9, 2023

Please adapt the corresponding test to:

@pytest.mark.parametrize("flavor", ["default", "use_fastpp"])
def test_scale(flavor):
    adata = pbmc68k_reduced()
    adata.X = adata.raw.X
    v = adata[:, 0 : adata.shape[1] // 2]
    # Should turn view to copy https://github.com/scverse/anndata/issues/171#issuecomment-508689965
    assert v.is_view
    with pytest.warns(Warning, match="view"):
        sc.pp.scale(v, flavor=flavor)
    assert not v.is_view
    assert_allclose(v.X.var(axis=0), np.ones(v.shape[1]), atol=0.01)
    assert_allclose(v.X.mean(axis=0), np.zeros(v.shape[1]), atol=0.00001)

It fails for me with FAILED scanpy/tests/test_preprocessing.py::test_scale[use_fastpp] - numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)

@Zethson
Copy link
Member

Zethson commented May 10, 2023

I pushed to your branch. It failed yesterday while Github was having issues. The test should pass.

@Zethson
Copy link
Member

Zethson commented May 24, 2023

@Sakshi-2797 were you able to find the fix to make it work or do you need more information? I'm happy to take over your branch as soon as it works to glue things together

@Sakshi-2797
Copy link
Author

@pytest.mark.parametrize("flavor", ["default", "use_fastpp"])

Hi @Zethson ,
I tried running the modified testcase mentioned above , but it seems it is failing because sparse matrix is being passed in it as a parameter. As of now, our scale function is not implemented for the sparse matrices. It is expected that these tests will fail.

@Zethson
Copy link
Member

Zethson commented May 30, 2023

Sparse matrices are absolutely essential for single-cell to ensure that RAM usage is kept low. Is there a way to make your implementation work with sparse matrices?

@sanchit-misra
Copy link

Sorry for the delay in responding here. Our resources got pulled into other projects. We are looking to make it work with sparse matrices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants