Skip to content

Use joblib for robust parallel import scanning#209

Merged
seddonym merged 1 commit intopython-grimp:joblib-multiprocessingfrom
Peter554:joblib
Apr 23, 2025
Merged

Use joblib for robust parallel import scanning#209
seddonym merged 1 commit intopython-grimp:joblib-multiprocessingfrom
Peter554:joblib

Conversation

@Peter554
Copy link
Collaborator

https://joblib.readthedocs.io/en/stable/parallel.html

Joblib takes some things for us. Relevant here:

  • Robust calculation for number of available CPUs.
  • Sequential calculation when n_jobs = 1.

And likely other minor things I don't even understand (joblib has put a lot of thought into multi-processing, so we don't have to).


# This is an arbitrary number, but setting it too low slows down our functional tests considerably.
MIN_NUMBER_OF_MODULES_TO_SCAN_USING_MULTIPROCESSING = 50
MIN_NUMBER_OF_MODULES_TO_SCAN_USING_MULTIPLE_PROCESSES = 64
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐼🐼🐼 My OCD => 64 is a power of two, just feels nicer

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we should include this change here, at least not without something in the changelog as it could have an impact on end users. On balance I'd prefer to concentrate on fixing the issue without making another change at the same time.



# This is an arbitrary number, but setting it too low slows down our functional tests considerably.
MIN_NUMBER_OF_MODULES_TO_SCAN_USING_MULTIPROCESSING = 50
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MULTIPLE_PROCESSES instead of MULTIPROCESSING to avoid confusion with multiprocessing module

module_files_tuple = tuple(module_files)

number_of_module_files = len(module_files_tuple)
n_chunks = _decide_number_of_of_processes(number_of_module_files)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo (should really have put that in a separate commit...)

# Don't incur the overhead of multiple processes.
return 1
return min(multiprocessing.cpu_count(), number_of_module_files)
return min(joblib.cpu_count(), number_of_module_files)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines -260 to -263
if number_of_processes == 1:
# No need to spawn a process if there's only one chunk.
[chunk] = chunks
return _scan_chunk(import_scanner, exclude_type_checking_imports, chunk)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this special number_of_processes == 1 case anymore - joblib will do this automatically. https://joblib.readthedocs.io/en/stable/generated/joblib.Parallel.html#joblib.Parallel

If 1 is given, no parallel computing code is used at all, and the behavior amounts to a simple python for loop.

@Peter554 Peter554 marked this pull request as ready for review April 14, 2025 14:28
@codspeed-hq
Copy link

codspeed-hq bot commented Apr 14, 2025

CodSpeed Instrumentation Performance Report

Merging #209 will not alter performance

Comparing Peter554:joblib (d56af94) with master (6b2dd86)

Summary

✅ 22 untouched benchmarks

https://joblib.readthedocs.io/en/stable/parallel.html

Joblib takes some things for us. Relevant here:

* Robust calculation for number of available CPUs.
* Sequential calculation when n_jobs = 1.

And likely other minor things I don't even understand.
@Peter554 Peter554 changed the title Use joblib for more robust parallel import scanning Use joblib for robust parallel import scanning Apr 14, 2025
@seddonym seddonym changed the base branch from master to joblib-multiprocessing April 23, 2025 11:45
@seddonym seddonym merged commit 417753f into python-grimp:joblib-multiprocessing Apr 23, 2025
18 checks passed
Copy link
Collaborator

@seddonym seddonym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this - I've merged into a branch in my repo and will make a couple of tweaks before merging to master.


# This is an arbitrary number, but setting it too low slows down our functional tests considerably.
MIN_NUMBER_OF_MODULES_TO_SCAN_USING_MULTIPROCESSING = 50
MIN_NUMBER_OF_MODULES_TO_SCAN_USING_MULTIPLE_PROCESSES = 64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we should include this change here, at least not without something in the changelog as it could have an impact on end users. On balance I'd prefer to concentrate on fixing the issue without making another change at the same time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants