
feat: remove ligrec parallelize#1125

Open
selmanozleyen wants to merge 15 commits into scverse:main from selmanozleyen:feat/remove-ligrec-parallelize

Conversation

@selmanozleyen
Member

@selmanozleyen selmanozleyen commented Feb 23, 2026

benchmark code

main results
pr results

Results compared:

python benchmarks/bench_ligrec.py --compare benchmarks/results


scenario         n_jobs   main (s)     PR (s)  speedup   change
---------------------------------------------------------------
large                 1      2.548      3.865    0.66x   -51.7%
large                 4      1.561      1.259    1.24x   +19.3%
large                 8      1.706      1.122    1.52x   +34.2%
many_perms            1      1.620      2.247    0.72x   -38.7%
many_perms            4      0.924      0.637    1.45x   +31.1%
many_perms            8      0.966      0.647    1.49x   +33.1%
medium                1      0.261      0.425    0.61x   -62.8%
medium                4      0.581      0.183    3.18x   +68.5%
medium                8      0.687      0.173    3.97x   +74.8%
xlarge                1      8.139      9.259    0.88x   -13.8%
xlarge                4     10.498      3.077    3.41x   +70.7%
xlarge                8     10.188      2.768    3.68x   +72.8%

Both faster and cleaner code. This removes `parallelize`.

Update: main is faster when n_jobs=1 because main also sets numba_parallel=True, so it still runs Numba in parallel even though there is only one worker process.

@selmanozleyen selmanozleyen force-pushed the feat/remove-ligrec-parallelize branch from 9fe8f25 to 4a60ef3 Compare March 2, 2026 11:02
@codecov

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 64.47368% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.90%. Comparing base (9690a55) to head (165544d).

Files with missing lines Patch % Lines
src/squidpy/gr/_ligrec.py 63.23% 25 Missing ⚠️
src/squidpy/_utils.py 75.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1125      +/-   ##
==========================================
- Coverage   74.05%   73.90%   -0.16%     
==========================================
  Files          39       39              
  Lines        6495     6510      +15     
  Branches     1122     1122              
==========================================
+ Hits         4810     4811       +1     
- Misses       1230     1249      +19     
+ Partials      455      450       -5     
Files with missing lines Coverage Δ
src/squidpy/_utils.py 57.94% <75.00%> (+0.72%) ⬆️
src/squidpy/gr/_ligrec.py 74.18% <63.23%> (-3.41%) ⬇️

@selmanozleyen selmanozleyen force-pushed the feat/remove-ligrec-parallelize branch from d1f752c to 6230aed Compare March 11, 2026 14:00
@selmanozleyen selmanozleyen changed the title Feat/remove ligrec parallelize feat: remove ligrec parallelize Mar 11, 2026
@selmanozleyen selmanozleyen marked this pull request as ready for review March 11, 2026 14:25
@selmanozleyen selmanozleyen requested a review from timtreis March 11, 2026 14:25
@selmanozleyen selmanozleyen marked this pull request as draft March 11, 2026 14:28
@selmanozleyen selmanozleyen removed the request for review from timtreis March 11, 2026 14:28
@selmanozleyen selmanozleyen marked this pull request as ready for review March 11, 2026 14:55
)


@njit(nogil=True, cache=True)
Contributor

Why not parallel=True + prange? Because this is being run in a thread pool? Why not just make every individual step parallel?
https://numba.pydata.org/numba-doc/dev/user/parallel.html?highlight=njit#explicit-parallel-loops

Would this require rewriting into a reduction of some sort to prevent overlapping writes?

I see in the benchmarks that the speedups with more jobs are not really scaling linearly, which is not what I would expect.

Member Author

> I see in the benchmarks that the speedups with more jobs are not really scaling linearly, which is not what I would expect.

That's a good point, worth investigating.

Comment on lines +716 to +733
def _worker(t: int) -> NDArrayA:
    local_counts = np.zeros((n_inter, n_cpairs), dtype=np.int64)
    rs = np.random.RandomState(None if seed is None else t + seed)
    perm = clustering.copy()
    for _ in range(chunk_sizes[t]):
        rs.shuffle(perm)
        _score_permutation(
            data_arr,
            perm,
            inv_counts,
            mean_obs,
            interactions,
            interaction_clusters,
            valid,
            local_counts,
        )
        pbar.update(1)
    return local_counts
Contributor

Why can't this also be numba-ified with an outer loop of some sort? Why do we still need a thread pool? I thought "one giant kernel" was the goal.

Is shuffling not parallelizable? Certainly there are ways around this, like argsort + random indices or something? Other than that, I don't really see why the `range(chunk_sizes[t])` couldn't be parallelized. Is it the validity of `local_counts`? Seems like there should be ways around this.
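For reference, the "argsort + random indices" idea mentioned above can be sketched like this: draw independent random keys per permutation and argsort them row-wise. Each row is then an independent uniform permutation, and the whole stack can be generated up front, outside any Numba kernel. This is an illustrative sketch, not the PR's code.

```python
import numpy as np


def make_permutations(values, n_perms, seed=0):
    rng = np.random.default_rng(seed)
    # Independent uniform keys per row; argsort of i.i.d. keys
    # yields a uniformly random permutation of each row.
    keys = rng.random((n_perms, values.shape[0]))
    order = np.argsort(keys, axis=1)
    return values[order]  # shape (n_perms, n)


perms = make_permutations(np.arange(5), n_perms=3)
# Every row is a permutation of 0..4.
assert all(sorted(row) == [0, 1, 2, 3, 4] for row in perms.tolist())
```

Note this changes the random stream, so it would not reproduce the old `RandomState.shuffle` results bit-for-bit.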

Member Author

@selmanozleyen selmanozleyen Mar 18, 2026

To have a responsive progress bar and to get the same shuffling results as the old version.
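To make the progress-bar point concrete, here is a hedged, self-contained sketch (stand-in names, not squidpy's code) of why chunked thread-pool workers stay responsive: each worker returns to Python between kernel calls, so the progress callback runs per permutation, whereas one giant nogil kernel would report nothing until it finishes.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def run_chunks(chunk_sizes, work):
    ticks = []  # stand-in for a tqdm progress bar

    def worker(t):
        acc = 0.0
        for _ in range(chunk_sizes[t]):
            acc += work()
            ticks.append(1)  # pbar.update(1) in the real code
        return acc

    with ThreadPoolExecutor(max_workers=len(chunk_sizes)) as ex:
        results = list(ex.map(worker, range(len(chunk_sizes))))
    return sum(results), len(ticks)


total, n_updates = run_chunks([3, 2], lambda: 1.0)
assert total == 5.0 and n_updates == 5
```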

Contributor

Could you explain a bit more?

  1. Why is the "same results" thing a hard blocker? `clustering` seems small, so copy+shuffle should be cheap as a pre-processing step, i.e., do all the "shuffle" stuff ahead of time / outside numba.
  2. Would you expect a giant kernel to be faster? My gut is "yes" given Severin's experience / our experience with `co_occurrence`, but I'm all ears.
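Suggestion 1 above can be sketched as follows: reproduce the old sequential `RandomState.shuffle` stream ahead of time into an `(n_perms, n)` array, then hand the whole stack to a single kernel. Names and the exact seeding scheme here are illustrative (the PR's `_worker` seeds each chunk with `t + seed`, so matching its stream exactly would need the same per-chunk seeding).

```python
import numpy as np


def precompute_shuffles(clustering, n_perms, seed=None):
    # Sequential shuffling, same primitive as the old code, but done
    # once up front so the scoring kernel never has to shuffle.
    rs = np.random.RandomState(seed)
    out = np.empty((n_perms, clustering.shape[0]), dtype=clustering.dtype)
    perm = clustering.copy()
    for t in range(n_perms):
        rs.shuffle(perm)
        out[t] = perm
    return out


shuffles = precompute_shuffles(np.arange(4), n_perms=8, seed=42)
assert shuffles.shape == (8, 4)
assert all(sorted(row) == [0, 1, 2, 3] for row in shuffles.tolist())
```

Memory is the main cost: `n_perms * n` entries instead of one reused buffer per worker.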

