
perf: memory usage for dask pca #4126

Open
ilan-gold wants to merge 3 commits into main from ig/parallel_pca

Conversation

@ilan-gold ilan-gold (Contributor) commented May 15, 2026

This should substantially reduce memory usage by never materializing the intermediate product of x.T @ x. I am not sure whether the upper-triangular performance benefit is real - it appears to be on the Tahoe dataset, but I'm not 100% sure given how large the dataset is and how heavily this pipeline depends on I/O.
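To illustrate the memory claim above: the Gram matrix x.T @ x can be accumulated one row-chunk at a time, so only a single chunk and the small (n_features, n_features) accumulator are resident at once. This is a minimal NumPy sketch of the idea (the actual PR does this blockwise through dask; `chunked_gram` and its `chunk_rows` parameter are hypothetical names, not the PR's API):

```python
import numpy as np

def chunked_gram(x: np.ndarray, chunk_rows: int = 256) -> np.ndarray:
    """Accumulate x.T @ x one row-chunk at a time.

    Peak memory is one (chunk_rows, n_features) slice plus the
    (n_features, n_features) accumulator, instead of holding any
    full-size intermediate alongside x.
    """
    n_features = x.shape[1]
    gram = np.zeros((n_features, n_features), dtype=x.dtype)
    for start in range(0, x.shape[0], chunk_rows):
        chunk = x[start : start + chunk_rows]
        gram += chunk.T @ chunk  # each chunk contributes independently
    return gram
```

Because each chunk's contribution is independent and the sum is associative, the same reduction maps directly onto dask's per-chunk execution model.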

The failing test is a `NumbaPerformanceWarning` raised because the kernel cannot actually be parallelized, so I see two options:

  1. silence in fau
  2. Use `@njit(nogil=True, nopython=True)` manually (since this is only ever run from dask anyway)
  • Closes #
  • Tests included or not required because:
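For context on option 2 and the kernel name in the traceback below: since x.T @ x is symmetric, the kernel only needs to fill the upper triangle and mirror it, roughly halving the dot products. This dense NumPy sketch shows just that symmetry trick; the real `_csr_gram_upper_triangular` iterates CSR structure under numba's `@njit(nogil=True)` so dask's threads can run it concurrently, which this illustration does not attempt:

```python
import numpy as np

def gram_upper_triangular(x: np.ndarray) -> np.ndarray:
    """Compute x.T @ x via its upper triangle only, then mirror.

    Dense stand-in for the idea behind the PR's CSR kernel: exploit
    symmetry to do ~half the work, then reflect into the lower triangle.
    """
    n = x.shape[1]
    gram = np.zeros((n, n), dtype=x.dtype)
    for i in range(n):
        for j in range(i, n):  # upper triangle (including diagonal) only
            gram[i, j] = x[:, i] @ x[:, j]
    # mirror the strict upper triangle into the lower triangle
    gram += np.triu(gram, k=1).T
    return gram
```

Releasing the GIL (`nogil=True`) matters here because dask's threaded scheduler otherwise serializes the per-chunk kernel calls; `parallel=True` inside the kernel is what triggers the warning when numba finds nothing to parallelize.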

@ilan-gold ilan-gold added this to the 1.12.2 milestone May 15, 2026
@codecov codecov bot commented May 15, 2026

❌ 1 test failed:

| Tests completed | Failed | Passed | Skipped |
| --------------- | ------ | ------ | ------- |
| 2317            | 1      | 2316   | 460     |
View the top 1 failed test(s) by shortest run time
tests/test_pca.py::test_pca_transform_randomized[dask_array_sparse-1d_chunked-csr_matrix]
Stack Traces | 0.354s run time
```
array_type = <function gen_csr_csc_params_wrapper.<locals>.wrapper at 0x7f4afc246d40>

    def test_pca_transform_randomized(array_type):
        adata = AnnData(array_type(A_list).astype("float32"))
        a_pca_abs = np.abs(A_pca)

        if isinstance(adata.X, DaskArray) and isinstance(adata.X._meta, CSBase):
            ctx = pytest.warns(
                UserWarning,
                match=r"Ignoring svd_solver='randomized' when using a sparse dask array",
            )
        elif isinstance(adata.X, CSBase):
            ctx = pytest.warns(UserWarning, match=r"Ignoring.*'randomized'")
        else:
            ctx = nullcontext()

        warnings.filterwarnings("error")
>       with ctx:
E       numba.core.errors.NumbaPerformanceWarning:
E       The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.
E
E       To find out why, try turning on parallel diagnostics, see https://numba.readthedocs..../stable/user/parallel.html#diagnostics for help.
E
E       File ".../preprocessing/_pca/_kernels.py", line 15:
E       @njit
E       def _csr_gram_upper_triangular(
E       ^

tests/test_pca.py:255: NumbaPerformanceWarning
```


@ilan-gold ilan-gold requested a review from flying-sheep May 15, 2026 16:09
