Speed up import scanning by 400-500% by duzumaki · Pull Request #142 · python-grimp/grimp

duzumaki · 2024-02-12T01:15:58Z

Tested on a large monorepo (35,000 modules) on an M1 Mac (8 cores). The modules are divided evenly among the cores of the machine it runs on, so each core does an equal share of the AST walking.

Profiled using cProfile, pstats (to read the profile dump) and SnakeViz (for the visualisation).
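For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of the cProfile plus pstats workflow. `busy_work` and `profile_to_text` are hypothetical names used only for illustration; in practice the profiled call would be the graph-building entry point.

```python
import cProfile
import io
import pstats


def busy_work(n: int) -> int:
    # Hypothetical stand-in for the graph-building call being profiled.
    return sum(i * i for i in range(n))


def profile_to_text(func, *args, top: int = 5) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return buffer.getvalue()


report = profile_to_text(busy_work, 100_000)
print(report)
```

To get a dump that SnakeViz can open, call `profiler.dump_stats("some_name.prof")` instead of printing, then run `snakeviz some_name.prof`. Note that cProfile only measures the process it runs in, so work done inside worker processes will not show up in the parent's profile.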

Current (~37s):

After parallelisation (~8s):

duzumaki · 2024-02-12T01:33:00Z

tests/functional/test_error_handling.py

+def test_syntax_error_terminates_executor_pool():
+    with pytest.raises(BrokenProcessPool):


I guess this is one disadvantage of concurrency in Python. Using the executor's .map() method makes it impossible to capture errors raised from individual tasks.

I've also tried using executor.submit() with my own chunking logic to try to capture the exception, but any exception that isn't raised in the top-level function get_imports_by_module won't propagate to the context object:

with ProcessPoolExecutor() as executor:

You only get this generic error:
BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.')

One way around this might be to change the raises in the tasks to returns, then check the result accordingly.

It would be a shame to lose that exception - returning any syntax errors (they could even be exception objects) sounds like it is at least worth a try.
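A rough sketch of the "return the error instead of raising it" idea discussed above. Everything here (`ScanResult`, `scan_source`, `scan_job`, `SOURCES`) is hypothetical and not grimp's API; it only illustrates that a worker can hand a syntax error back as plain, picklable data and let the parent process decide what to do with it.

```python
import ast
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanResult:
    # Hypothetical result container; not part of grimp.
    module: str
    imports: tuple[str, ...] = ()
    error: Optional[str] = None


def scan_source(module: str, source: str) -> ScanResult:
    """Parse one module; report a syntax error as data instead of raising."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return ScanResult(module=module, error=f"line {exc.lineno}: {exc.msg}")
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.append(node.module)
    return ScanResult(module=module, imports=tuple(sorted(names)))


def scan_job(job: tuple[str, str]) -> ScanResult:
    # Module-level wrapper so the callable can be pickled for worker processes.
    return scan_source(*job)


# Hypothetical inputs: one valid module and one with a syntax error.
SOURCES = {"good": "import os\nfrom json import loads\n", "bad": "def broken(:\n"}

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        results = list(executor.map(scan_job, SOURCES.items()))
    for result in results:
        if result.error is not None:
            print(f"{result.module}: syntax error ({result.error})")
        else:
            print(f"{result.module}: imports {result.imports}")
```

With this shape the parent can collect every failing module, or re-raise a domain-specific exception for the first one, without the pool itself being torn down.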

seddonym · 2024-02-12T15:49:30Z

src/grimp/application/usecases.py

-                )
-            imports_by_module[module] = direct_imports
+        with ProcessPoolExecutor() as executor:
+            chunk_size = ceil(len(found_package.module_files) / executor._max_workers) or 1


Would be helpful to have a comment here explaining our reasoning for the chunk size.
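One possible way to spell out the reasoning, as a sketch rather than the PR's actual code: `executor.map()` defaults to `chunksize=1`, so every item is pickled and sent to a worker individually. Splitting the items into roughly one chunk per worker keeps that inter-process overhead low, at the cost of no rebalancing if one chunk turns out slower than the others. `even_chunk_size` and `count_chars` are hypothetical names, and the worker count is passed explicitly here so the private `executor._max_workers` attribute is not needed.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from math import ceil


def even_chunk_size(item_count: int, worker_count: int) -> int:
    """Items per chunk so each worker gets roughly one chunk; never below 1."""
    return max(1, ceil(item_count / worker_count))


def count_chars(text: str) -> int:
    # Hypothetical stand-in for scanning one module file.
    return len(text)


if __name__ == "__main__":
    items = [f"module_{i}" for i in range(1000)]
    workers = os.cpu_count() or 1
    chunk_size = even_chunk_size(len(items), workers)
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # One chunk per worker: fewer, larger messages between processes.
        totals = list(executor.map(count_chars, items, chunksize=chunk_size))
    print(sum(totals))
```

The `max(1, ...)` guard plays the same role as the `or 1` in the diff: `chunksize` must be at least 1 even when there are no module files.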

seddonym · 2024-02-12T15:50:23Z

Very exciting!

Have left a couple of initial comments, also the tests are failing which would be good to sort out.

seddonym · 2024-02-13T11:11:31Z

src/grimp/application/usecases.py

+    import_scanner: AbstractImportScanner,
+    exclude_type_checking_imports: bool,
+    cache: caching.Cache,
+):


Missing type annotation on return value.

wesleykendall · 2024-07-01T10:54:27Z

I checked out this branch and installed it locally. It dramatically slowed down graph building time on a large test repo (0.6 seconds -> 35 seconds).

This was on an M2 Mac. I verified that my local installation of the master branch of this repo yielded fast results, so I don't think I'm doing anything wrong with the installation.

Seems like something is wrong. When I set max_workers to 1 on the process executor, build time was around 10 seconds. A max_workers of 2 slowed it down to 14 seconds. Let me know if there is another way I can profile.
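One way to dig into numbers like these is to time the same batch of work at several `max_workers` values, with pool startup included in the measurement. The sketch below is only an illustration: `parse_cost` and `time_workers` are hypothetical names, and the workload is a synthetic CPU-bound loop rather than real import scanning.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Iterable


def parse_cost(n: int) -> int:
    # Hypothetical CPU-bound stand-in for scanning one module.
    return sum(i * i for i in range(n))


def time_workers(
    make_executor: Callable[[int], Executor],
    worker_counts: Iterable[int],
    jobs: list[int],
) -> dict[int, float]:
    """Return wall-clock seconds per worker count, including pool startup."""
    timings = {}
    for count in worker_counts:
        start = time.perf_counter()
        with make_executor(count) as executor:
            list(executor.map(parse_cost, jobs))
        timings[count] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    results = time_workers(
        lambda count: ProcessPoolExecutor(max_workers=count),
        [1, 2, 4],
        [20_000] * 200,
    )
    for count, seconds in results.items():
        print(f"max_workers={count}: {seconds:.2f}s")
```

If the time grows with the worker count, as reported above, the per-process startup and data transfer costs are likely outweighing the parsing work itself; that is a general property of process pools and not a confirmed diagnosis of this branch.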

seddonym · 2025-02-07T09:11:27Z

I'm going to close this branch because we are close to having a Rust implementation of the graph, and so I think the next step will be to move to Rust for building the graph. It'll be easier to parallelise there because we can free ourselves from the GIL and do multithreading.

But thanks for your efforts, much appreciated!

duzumaki marked this pull request as draft February 12, 2024 01:18

duzumaki force-pushed the speed_up_build_graph branch 2 times, most recently from a86ed83 to 9ac45a1 Compare February 12, 2024 01:24

duzumaki marked this pull request as ready for review February 12, 2024 01:31

seddonym reviewed Feb 12, 2024

duzumaki force-pushed the speed_up_build_graph branch 2 times, most recently from cddb258 to 914d301 Compare February 12, 2024 19:24

duzumaki added 2 commits February 12, 2024 19:40

Speed up import scanning

f01f901

Update test

cecca20

duzumaki force-pushed the speed_up_build_graph branch from 914d301 to cecca20 Compare February 12, 2024 19:41

seddonym reviewed Feb 13, 2024

seddonym closed this Feb 7, 2025

Peter554 mentioned this pull request Apr 4, 2025

Parallel import scanning (python) #198

Merged

duzumaki wants to merge 2 commits into python-grimp:master from duzumaki:speed_up_build_graph
