[Draft] FEAT: extend profiling to child processes#431
Open
TTsangSC wants to merge 44 commits intopyutils:mainfrom
Did some more tests locally post-#428-merge; maybe it is just legacy Python and dependency versions causing the issues. Will just rebase, force-push, and see what happens.
2cd2ed4 to
f9a37af
Compare
- `line_profiler/curated_profiling.py`
New module for setting up profiling in a curated environment
- `ClassifiedPreimportTargets.from_targets()`
Method for creating a `ClassifiedPreimportTargets` instance,
facilitating writing pre-import modules in a replicable and portable
manner
- `ClassifiedPreimportTargets.write_preimport_module()`
Method for writing a pre-import module based on an instance;
also fixed bug where the body of the written module was intercepted
without appearing in the debug output
- `kernprof.py`
- `_gather_preimport_targets()`
Migrated to `line_profiler.curated_profiling`
- `_write_preimports()`
Now using the new `ClassifiedPreimportTargets` class, moving esp.
the logic to the `write_preimport_module()` method
- `kernprof.py::_manage_profiler`
`line_profiler/curated_profiling.py::CuratedProfilerContext`
New context-manager classes for handling profiler setup and teardown
- `kernprof.py::_pre_profile()`
Refactored into the above context managers and other private
functions (`_prepare_profiler()`, `_prepare_exec_script()`)
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
New class for passing info onto child processes so that profiling
can resume there
line_profiler/pth_hook.py
New submodule for the .pth-file-based solution to propagating
profiling into child processes:
write_pth_hook()
In the main process, write the temporary .pth file to be loaded
in child processes
load_pth_hook()
Called by the .pth in child process, loading the cache and
setting up profiling based thereon
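The `.pth` mechanism that `write_pth_hook()` and `load_pth_hook()` build on can be illustrated with the standard library alone. This is only a sketch and not the code in `pth_hook.py`: the file name `demo_hook.pth`, the marker variable `DEMO_PTH_HOOK_RAN`, and the use of a temporary directory processed via `site.addsitedir()` (instead of the real site-packages directory) are assumptions made for illustration.

```python
import contextlib
import os
import site
import sys
import tempfile
from pathlib import Path

# Hypothetical env-var name, used only to make the hook's effect observable
MARKER = 'DEMO_PTH_HOOK_RAN'


def write_demo_pth(directory: str) -> Path:
    """
    Write a `.pth` file whose single line starts with `import`; the
    `site` module executes such lines instead of treating them as paths.
    """
    pth = Path(directory) / 'demo_hook.pth'
    pth.write_text(f"import os; os.environ[{MARKER!r}] = '1'\n")
    return pth


with tempfile.TemporaryDirectory() as tmpdir:
    pth_file = write_demo_pth(tmpdir)
    os.environ.pop(MARKER, None)
    # Process the directory like a site-packages dir; this runs the hook
    site.addsitedir(tmpdir)
    hook_ran = os.environ.get(MARKER) == '1'
    # `addsitedir()` also appended the directory to `sys.path`; undo that
    with contextlib.suppress(ValueError):
        sys.path.remove(tmpdir)
```

In a real interpreter startup the same processing happens automatically for every `.pth` file in site-packages, which is why the hook line should do as little work as possible.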
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
Added new `.profile_imports` attribute to correspond to `kernprof`'s
`--prof-imports` flag
line_profiler/_child_process_profiling/meta_path_finder.py
New submodule defining the `RewritingFinder` class, a meta path
finder which rewrites a single module on import
line_profiler/_child_process_profiling/pth_hook.py
write_pth_hook()
Now also handling the `os.fork()` patching/wrapping
_setup_in_child_process()
Now creating a `RewritingFinder` to mirror what
`~.autoprofile.autoprofile.run()` does in the main process
.
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
Refactored `.load()`
line_profiler/_child_process_profiling/multiprocessing_patches.py
New submodule for applying patches to the `multiprocessing`
package, so that profiling is automatically set up in child
processes created by it
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
<general>
Added debug logging to various methods
gather_stats()
New method for gathering profiling stats from child processes
inject_env_vars()
New method for injecting `.environ` into `os.environ`
line_profiler/line_profiler.py::LineStats
get_empty_instance()
New convenience method for creating an empty instance
from_files()
Added new argument `on_defective` to allow for processing a
group of files that cannot all be correctly read
line_profiler/rc/line_profiler.toml::[tool.line_profiler.kernprof]
Added new key-value pair `prof-child-procs` for the default value
of `kernprof --prof-child-procs`
kernprof.py
- New boolean flags
`[--prof-child-procs[=...] | --no-prof-child-procs]` for
controlling whether to set up profiling in child processes
- Fixed bug in `_manage_profiler.__exit__()` where
`CuratedProfilerContext.uninstall()` can be skipped if the
preceding code raises an error
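The environment-variable hand-off behind `inject_env_vars()` can be sketched as below. The two variable names are the ones listed in the PR description; everything else (the function signatures, the `find_cache_dir()` helper, operating on an arbitrary mapping) is a hypothetical simplification and not the actual `LineProfilingCache` code.

```python
import os
from typing import MutableMapping, Optional

# Variable names as listed in the PR description
PID_VAR = 'LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_PID'
DIR_VAR_TEMPLATE = 'LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_DIR_{pid}'


def inject_env_vars(
    cache_dir: str,
    environ: Optional[MutableMapping[str, str]] = None,
    pid: Optional[int] = None,
) -> None:
    """Record the main-process PID and the cache location (sketch)."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    environ[PID_VAR] = str(pid)
    environ[DIR_VAR_TEMPLATE.format(pid=pid)] = cache_dir


def find_cache_dir(
    environ: Optional[MutableMapping[str, str]] = None,
) -> Optional[str]:
    """Return the cache location, or None outside a profiled session."""
    environ = os.environ if environ is None else environ
    pid = environ.get(PID_VAR)
    if pid is None:
        return None
    return environ.get(DIR_VAR_TEMPLATE.format(pid=pid))
```

Keying the directory variable on the PID means a process that inherits only one of the two variables finds nothing and is left alone.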
kernprof.py::_prepare_child_profiling_cache()
- Now respecting ${LINE_PROFILER_KEEP_TEMPDIRS}
- Now setting `LineProfilingCache.debug`
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
- Added new attributes `.debug` and `._debug_log`
- Now diverting debug messages to log files in `.cache_dir`
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
add_cleanup()
Now deferring to a `._add_cleanup()` method which allows for
cleanup-function prioritization
_debug_output()
Fixed type-checking
line_profiler/_child_process_profiling/multiprocessing_patches.py
::apply()
Added debug output before `_setup_in_child_process()` is called to
help with tracing
line_profiler/_child_process_profiling/pth_hook.py
load_pth_hook()
_wrap_os_fork()
Added debug output before `_setup_in_child_process()` is called
to help with tracing
_setup_in_child_process()
- `wrap_os_fork` now defaults to false
- `prof.dump_stats()` now has increased priority over other
callbacks (doesn't seem to help with the malformed prof files
though...)
- Child-process profiling output now written to a less
randomized filename to facilitate debugging
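One way to get the cleanup-callback prioritization mentioned above is to store each callback with a priority and an insertion index. The sketch below is an assumption about the general technique only; `CleanupRegistry` and its ordering rules (higher priority first, first-in-first-out among equals) are hypothetical and may differ from what `._add_cleanup()` does.

```python
import itertools
from typing import Callable, List, Tuple


class CleanupRegistry:
    """Run registered callbacks, highest priority first (sketch)."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, int, Callable[[], None]]] = []
        self._counter = itertools.count()

    def add_cleanup(
        self, callback: Callable[[], None], priority: int = 0,
    ) -> None:
        # Negate the priority so that an ascending sort puts the most
        # urgent callback first; the counter breaks ties in FIFO order
        self._entries.append((-priority, next(self._counter), callback))

    def cleanup(self) -> int:
        """Exhaust the callbacks and return how many were called."""
        ncalls = 0
        while self._entries:
            # Re-sort every round so that callbacks registered during
            # cleanup are also honored
            self._entries.sort(key=lambda entry: entry[:2])
            _, _, callback = self._entries.pop(0)
            callback()
            ncalls += 1
        return ncalls
```

With such a scheme, a stats-dumping callback can be registered with a higher priority than, say, the removal of the temporary directory it writes to.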
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
profiler
New attribute for the profiler instance
copy(..., inherit_profiler=...)
New argument for inheriting the `.profiler`
load()
Now keeping track of the loaded instance and returning it in
subsequent calls
line_profiler/_child_process_profiling/multiprocessing_patches.py
::apply(..., lp_cache=None)
- If the `LineProfilingCache.load()`-ed instance is consistent with
that loaded from `cache_path`, the former is used
- Added more debugging output
line_profiler/_child_process_profiling/pth_hook.py
load_pth_hook()
Added more debugging output
_wrap_os_fork()
Updated debugging output
_setup_in_child_process()
- Now returning a boolean (whether setup has been newly done)
- Now setting `.profiler` of the cache instance
- Added more debugging output
kernprof.py::_manage_profiler.__enter__()
Updated so that the created `LineProfilingCache` instance carries a
`.rewrite_module`
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
Added an optional `.rewrite_module` attribute
line_profiler/_child_process_profiling/import_machinery.py
::RewritingFinder.find_spec()
Now looking at `.lp_cache.rewrite_module` (where available) to check
for specs to return
line_profiler/_child_process_profiling/
cache.py::LineProfilingCache
_replace_loaded_instance()
New convenience method for an instance in a fork to replace
the instance to be `.load()`-ed
_consistent_with_loaded_instance
New attribute for checking whether the instance is
consistent with what would have been `.load()`-ed
multiprocessing_patches.py
bootstrap(..., lp_cache=...)
Can now be `None`, which defers the `.load()`-ing of the
cache instance
apply()
- Streamlined logic for retrieving the loaded instance
- Now using the above deferred loading whenever appropriate,
so that cleanup and profiling is preserved in forked
processes
pth_hook.py::_wrap_os_fork()
Now using `._replace_loaded_instance()`, so that future calls to
`.load()` in the forked process retrieve the newly-created
instance
kernprof.py::_prepare_child_profiling_cache()
- Updated call to `[...].multiprocessing_patches.apply()`
- Now always setting up the created instance as the one returned by
further calls to `.load()`
line_profiler/_child_process_profiling/multiprocessing_patches.py
PickleHook
- Refactored to contain no instance variables
- Now always using `LineProfilingCache.load()` to retrieve the
appropriate cache instance
bootstrap()
Removed argument `lp_cache`
get_preparation_data()
Removed argument `cache_path`
apply()
- Removed argument `cache_path`
- Argument `lp_cache` now required
- Simplified implementation
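The idea of a stateless pickle hook, an object whose unpickling triggers setup in the receiving process, can be sketched with `__getstate__()` and `__setstate__()`. The names `DemoPickleHook`, `SETUP_CALLS`, and `_setup_in_this_process()` are hypothetical stand-ins; the real `PickleHook` retrieves the cache via `LineProfilingCache.load()` instead.

```python
import pickle

# Hypothetical module-level record standing in for "profiling is set up"
SETUP_CALLS = []


def _setup_in_this_process() -> None:
    """Placeholder for the real per-process setup work."""
    SETUP_CALLS.append('setup')


class DemoPickleHook:
    """Stateless object whose only job is to trigger setup on unpickling."""

    def __getstate__(self):
        # Nothing needs to be carried over, but the state must be truthy
        # for `__setstate__()` to be called on unpickling
        return {'hook': True}

    def __setstate__(self, state) -> None:
        _setup_in_this_process()


# Round-trip within one process to show the side effect
restored = pickle.loads(pickle.dumps(DemoPickleHook()))
```

When such an object rides along in data that `multiprocessing` pickles for a new process, the setup runs in the child as soon as the data is unpickled there.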
line_profiler/_child_process_profiling/import_machinery.py
Removed
line_profiler/_child_process_profiling/pth_hook.py
::_setup_in_child_process()
No longer set up the `RewritingFinder` because messing with the
import system doesn't help with propagating autoprofiling rewrites
to child processes...
kernprof.py
_dump_filtered_stats()
Fixed bug where if no tempfile remains, the `extra_line_stats`
are not merged into the dumped stats
_prepare_child_profiling_cache()
Now setting the `.profiler` of the returned cache object
line_profiler/_child_process_profiling/multiprocessing_patches.py
::_apply_mp_patches()
- Added debugging output for the patches
- Now patching the copy of `runpy` imported by
`multiprocessing.runpy`
line_profiler/_child_process_profiling/pth_hook.py
_wrap_os_fork()
No longer creating a new `LineProfiler` instance (helps with
handling forked processes)
_setup_in_child_process(..., prof=...)
New argument for avoiding instantiating a new profiler when not
necessary (e.g. in a forked process)
line_profiler/_child_process_profiling/runpy_patches.py
New submodule for the aforementioned patching of `runpy`
tests/test_child_procs.py
test_running_multiproc_literal_code()
New test paralleling `test_running_multiproc_{script,module}` to
test `kernprof -c ...`
test_multiproc_script_sanity_check()
- Refactored parameters for better `pytest` output
- Added testing for running the code with `python -c ...`
<Misc>
- Added CLI argument `--local` to the profiled module to toggle
between a locally-defined summing function and an imported one
- Refactored how the test modules are injected
- Added debugging output to `subprocess.run()` calls
- Added provisional support for examining the profiling data
tests/test_child_procs.py::test_multiproc_script_sanity_check()
Now parametrized to test passing the function defined in the test
module itself to `multiprocessing`
tests/test_child_procs.py
test_running_multiproc_{module,literal_code}()
Integrated into `test_running_multiproc_script()`
test_running_multiproc_script()
Extended parametrization
tests/test_child_procs.py
test_profiling_multiproc_script()
Test parallel to `test_running_multiproc_script()`, checking
whether we are correctly profiling the child processes
<General>
- Added more docs
- Updated dummy parameter names
_ext_module, _test_module
- Refactored how the fixtures are set up
- Module names now randomized and clash-proof via `uuid.uuid4()`
_run_subproc()
- Moved code outputting captured streams from
`_run_test_module()` to here
- Added timing code
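The randomized, clash-proof module names mentioned above can be produced along these lines. `propose_module_name()` and its default prefix are hypothetical; only the use of `uuid.uuid4()` comes from the commit message.

```python
import uuid


def propose_module_name(prefix: str = 'my_test_module') -> str:
    """
    Return a module name which is a valid identifier and practically
    guaranteed not to clash with existing modules.
    """
    # `.hex` avoids the dashes of the canonical UUID form, which would
    # not be allowed in an importable module name
    return f'{prefix}_{uuid.uuid4().hex}'
```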
tests/test_child_procs.py
TEST_MODULE_BODY, [_]test_module()
Added CLI flag to select `multiprocessing` start methods
_Params
New convenience class for test parametrization
test_multiproc_script_sanity_check()
- Streamlined parametrization (15 subtests -> 10)
- Added subtests for various `multiprocessing` start methods
test_multiproc_script_sanity_check()
- Streamlined parametrization (24 subtests -> 21)
- Added subtests for various `multiprocessing` start methods
tests/test_child_procs.py
test_module(), ext_module()
Updated so that we can toggle for the function sent to
`multiprocessing` to raise an error with the `--force-failure`
CLI flag
_run_test_module()
- Now raising a new `ResultMismatch` error class (instead of
using base assertions) for:
- If `test_module()` writes the wrong number to stdout
- If `nhits` are provided and the profiling results differ
therefrom
- Added argument `fail` for using the aforementioned
`--force-failure` flag
test_multiproc_script_sanity_check()
Now also checking the cases where the test module is run with
`--force-failure`
test_profiling_multiproc_script()
Now also checking the cases where the test module is run with
`--force-failure`
(FIXME: profiling bugged when the function errors out, and
doesn't fail with a consistent pattern)
line_profiler/_child_process_profiling/multiprocessing_patches.py
@cleanup_wrapper, @setup_wrapper
get_target_property(), log_method_call()
Removed
_Poller
New helper class for polling a callable
wrap_{start,terminate}()
New method wrappers for wrapping the eponymous methods of
`multiprocessing.process.BaseProcess`;
this fixes the bug where if the parallel function errors out in
the child process, it may be terminated before profiling data
can be gathered
wrap_bootstrap()
Refactored from `bootstrap()`
_apply_mp_patches()
Cleaned up testing code
tests/test_child_procs.py::test_profiling_multiproc_script()
Removed XFAIL-ing for cases where the profiled function fails
(because the bug has been fixed)
tests/test_child_procs.py::test_profiling_bare_python()
New test for checking the profiling of child processes created
outside of `multiprocessing` (e.g. `subprocess.run()`,
`os.system()`)
line_profiler/_child_process_profiling/
cache.py::LineProfilingCache.make_tempfile()
New convenience method for creating tempfiles with `mkstemp()`
multiprocessing_patches.py::wrap_start()
pth_hook.py::_setup_in_child_process()
Simplified implementations to just use
`LineProfilingCache.make_tempfile()`
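A `mkstemp()`-based convenience function of this kind might look like the sketch below. The signature is an assumption, and the real `make_tempfile()` is a method which presumably defaults to the cache directory and registers the file for cleanup.

```python
import os
import tempfile
from pathlib import Path


def make_tempfile(directory: str, prefix: str = '', suffix: str = '') -> Path:
    """
    Atomically create an empty, uniquely-named file under `directory`
    and return its path.
    """
    fd, name = tempfile.mkstemp(prefix=prefix, suffix=suffix, dir=directory)
    # `mkstemp()` returns an open OS-level handle; close it right away so
    # that the path can be reopened (or deleted) freely on all platforms
    os.close(fd)
    return Path(name)
```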
line_profiler/_child_process_profiling/multiprocessing_patches.py
_Poller
__init__()
- Updated default and typing for `cooldown`
- New arguments `timeout` and `on_timeout` for controlling
timeout duration and behaviors
with_timeout()
New method creating a new instance a la `.with_cooldown()`
__enter__()
Added timeout handling
Timeout
New `RuntimeError` subclass raised if `timeout` is positive
and reached
wrap_terminate()
Now only allowing the `BaseProcess.terminate()` call to be
blocked by at most 1 s by the lock file, before issuing a
warning and proceeding anyway
tests/test_child_procs.py::test_profiling_multiproc_script()
Now timing out the `kernprof` process after 5 s in case the lock
files caused a deadlock
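The polling-with-timeout behavior described for `_Poller` and `wrap_terminate()` can be approximated as follows. This is a sketch under stated assumptions: `PollTimeout`, `wait_until_deleted()`, and `terminate_when_safe()` are hypothetical names, while the 1/32-s cooldown and the 1-s limit are the values quoted in this PR.

```python
import time
import warnings
from pathlib import Path
from typing import Callable


class PollTimeout(RuntimeError):
    """Raised when the lock file outlives the timeout."""


def wait_until_deleted(
    lock_file: Path, cooldown: float = 1 / 32, timeout: float = 1.0,
) -> None:
    """Block until `lock_file` is gone, checking once per `cooldown`."""
    deadline = time.monotonic() + timeout
    while lock_file.exists():
        if time.monotonic() >= deadline:
            raise PollTimeout(f'{lock_file} still exists after {timeout} s')
        # Sleep between checks to avoid hot-looping
        time.sleep(cooldown)


def terminate_when_safe(
    terminate: Callable[[], None], lock_file: Path, **kwargs: float
) -> None:
    """Give the child a bounded grace period, then terminate regardless."""
    try:
        wait_until_deleted(lock_file, **kwargs)
    except PollTimeout as e:
        warnings.warn(f'{e}; terminating anyway')
    terminate()
```

The soft block only delays the termination; it never prevents it, so a stuck child cannot hang the parent for longer than the timeout.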
line_profiler/_child_process_profiling/
- The functions `pth_hook.py::_setup_in_child_process()` and
`::_wrap_os_fork()` have been relocated to eponymous instance
methods of `cache.py::LineProfilingCache`
- The implementations of `pth_hook.py::{write,load}_pth_hook()` and
`multiprocessing_patches.py::PickleHook.__setstate__()` are
updated accordingly
tests/test_child_procs.py::_run_subproc()
- If any of the output streams is captured, we call
`subprocess.run(..., check=False)` to get the chance to intercept
and print the output, and only call `.check_returncode()` on the
`CompletedProcess` afterwards
- Fixed bug where if `text=False` we attempt to format the captured
stream-content bytes as strings
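The deferred return-code check described above follows a common `subprocess` pattern, sketched here. `run_and_report()` is a hypothetical stand-in for `_run_subproc()`, which has more options and output.

```python
import subprocess
import sys
from typing import Sequence


def run_and_report(
    cmd: Sequence[str], text: bool = False,
) -> subprocess.CompletedProcess:
    """Run `cmd`, echo the captured streams, and only then check the result."""
    # `check=False` so that a failure doesn't raise before we get to
    # print what the process wrote
    proc = subprocess.run(cmd, capture_output=True, text=text, check=False)
    for name in 'stdout', 'stderr':
        content = getattr(proc, name)
        # With `text=False` the streams are bytes and need decoding
        if isinstance(content, bytes):
            content = content.decode(errors='replace')
        if content:
            print(f'[{name}]\n{content}', file=sys.stderr)
    proc.check_returncode()
    return proc
```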
kernprof.py
_manage_profiler.__exit__()
Now gathering the debug logs of the child processes and writing
them to the main logger
_prepare_child_profiling_cache()
Simplified implementation
line_profiler/_child_process_profiling/cache.py
_CallbackRepr
New `reprlib.Repr` subclass for handling reprs of the cleanup
callbacks (mostly for truncating the repr of `os.environ`, and
that of objects it appears in)
LineProfilingCache
[_add_]cleanup()
Updated to use `_CallbackRepr` to represent the callbacks
_dump_debug_logs()
New method for gathering debug log files from child
processes
_debug_output()
Added timestamps to the messages
_setup_in_main_process()
New method consisting mostly of code relocated from
`kernprof.py::_prepare_child_profiling_cache()`
line_profiler/_child_process_profiling/pth_hook.py::write_pth_hook()
No longer wrapping `os.fork()` (relocated to
`LineProfilingCache._setup_in_main_process()`)
line_profiler/_child_process_profiling/
cache.py, pth_hook.py
Relocated shared code snippets to `misc_utils`
misc_utils.py
New submodule for util functions
line_profiler/_child_process_profiling/_cache_logging.py
::CacheLoggingEntry
New object for IO to/from `LineProfilingCache._debug_log`
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
cleanup()
More verbose logging messages to help with:
- Distinguishing cases where no callback has been registered
- Indicating whether the cleanup callbacks have been
successfully exhausted (e.g. may not be the case for
child processes managed by `multiprocessing`)
_gather_debug_log_entries()
New method for reading in `CacheLoggingEntry`-s from all log
files of the session
_debug_message_{header,timestamp}
Refactored away (functionalities absorbed into
`CacheLoggingEntry`)
kernprof.py
_add_core_parser_arguments()
Added temporary/undocumented flag `--debug-log=...` for
gathering the cache logs and writing them to one place
_manage_profiler
Now registering a delayed cleanup callback for writing the
debug log on `.__enter__()`, so that the gathering of log
entries happens as late as possible (right before the cache dir
is wiped) and includes most of the cleanup in the main process
_prepare_child_profiling_cache()
Now explicitly deferring the deletion of the cache dir to the
very end
line_profiler/_child_process_profiling/cache.py
::LineProfilingCache.cleanup()
Made debug log messages more verbose, indicating the number of
callbacks made as the result of each is reported
tests/test_child_procs.py
::test_profiling_{bare_python,multiproc_script}()
Now gathering cache logs via `--debug-log` and printing them out
XXX: do we drop these functionalities when/if the bugs are fixed?
f9a37af to
aca4e2c
Compare
Unfortunately there's too little context to determine why the tests are failing on other platforms. Heck I can't even replicate the macOS failures on my machine with matching dep versions. Just wrote in more code for extracting the debug outputs, force-pushed, and hopefully I will have more clues for what to work on.
line_profiler/_child_process_profiling/cache.py::_CallbackRepr
__doc__
Updated so that the parts that test `.indent` are skipped on
Python < 3.12
indent
No longer a property; private methods now use `._get_indent()`
to indirectly access it
_get_indent()
Wrapper around `.indent`; falls back to `None` on legacy
versions (< 3.12) without the attribute
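The version-tolerant attribute access described here can be sketched with `getattr()`. `DemoCallbackRepr` is a hypothetical stand-in for `_CallbackRepr`, and a plain `dict` is used for the truncation demo instead of `os.environ`.

```python
import reprlib


class DemoCallbackRepr(reprlib.Repr):
    """`reprlib.Repr` subclass tolerating the absence of `.indent`."""

    def _get_indent(self):
        # `reprlib.Repr.indent` was only added in Python 3.12; fall back
        # to `None` (meaning no indentation) on older versions
        return getattr(self, 'indent', None)


callback_repr = DemoCallbackRepr()
callback_repr.maxdict = 2  # Show at most 2 items of a dict
truncated = callback_repr.repr({i: str(i) for i in range(10)})
```

Here `truncated` contains an ellipsis in place of the omitted items, which is the kind of shortening that keeps log lines readable when a callback holds on to a large mapping.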
This PR adds support for `kernprof` to profile code execution in child Python processes, building on ongoing work (see Credits).

Usage

The new flags `--no-prof-child-procs` and `--prof-child-procs[=...]` are added to kernprof. By setting `--prof-child-procs` to true, child Python processes created by the profiled process are also profiled:[1]

Note how the `sum_worker()` calls are profiled:

Highlights
- [...] `os.system()` and `subprocess.run()`
- [...] `multiprocessing`[2]
- [...] `multiprocessing` "start methods" (`'fork'`, `'forkserver'`, and `'spawn'`) tested to be compatible, where available on the platform
- [...] `--preimports` or via test-code rewriting) replicated in child processes

Explanation
- [...] `line_profiler._child_process_profiling.cache.LineProfilingCache`) is created by the main process, containing session config information (e.g. values for `--prof-mod` and `--preimports`) so that profiling can be replicated in child processes.
- [...] `.pth` file is created; Python processes inheriting the right environment will thus go through profiling setup, while those without the env var (which just happen to share the Python executable) will be minimally affected.
- [...] `os.fork()` (where available) is patched with a wrapper which ensures consistent global states.
- [...] `coverage.multiproc`, various `multiprocessing` components are patched (`line_profiler._child_process_profiling.multiprocessing_patches.apply()`) so that child processes can retrieve the cache and explicitly clean up before exiting. Patches are re-applied in the child processes by the use of object-unpickling side effects.
- [...] `multiprocessing` child processes are allowed to fully clean up and write their profiling, even when the parallel workload errors out,[3] additional patches are made to `multiprocessing`.
- [...] `kernprof` then gathers and merges with the profiling result in the main process.

Code changes
New code

`line_profiler/curated_profiling.py`
New submodule containing mostly relocated code from `kernprof`, so that child processes can more easily reestablish profiling:
- `ClassifiedPreimportTargets`: Object resolving and classifying the `--prof-mod`s, and writing a corresponding preimport module
- `CuratedProfilerContext`: Context manager managing the state of the `LineProfiler`, e.g. by slipping it into `line_profiler.profile` on startup and purging its `.enable_count`s on teardown

`line_profiler/_child_process_profiling/`
New private subpackage for maintaining the states, setting up the hooks, and performing the patches which make it possible to profile child processes:
- `cache.py::LineProfilingCache`: "Session state" object. It:
  - Can be auto-(de-)serialized in the main and child processes based on env-var values, managing setup (module patches, profiler curation, eager pre-imports) and cleanup (tempfile management, dumping and gathering of profiling results) in each process.
  - Injects the following environment variables, which are inherited by child processes:
    - `${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_PID}`: main-process PID
    - `${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_DIR_<PID>}`: location of the cache directory

    From the combination of both, child processes can retrieve the cache by calling `.load()`.
- `multiprocessing_patches.py::apply()`: Apply patches to these `multiprocessing` module components so that profiling results are properly gathered on child-process exit:
  - `Process` (read: `multiprocessing.process.BaseProcess`):
    - `.start()`: Wrapped call so that in the main process, it stores a handle to a "lock file" on the `Process` object, which is inherited by its clone in the child process.
    - `._bootstrap()`: Wrapped call to `.touch()` the lock-file handle on startup and delete it on exit.
    - `.terminate()`: Wrapped call to poll on the lock file, and soft-block (with a timeout) until it is deleted.
  - `spawn.get_preparation_data()`: Wrapped call to insert a `~.PickleHook` object (see `coverage.multiproc.Stowaway`), which when unpickled in the child process will automatically retrieve the `LineProfilingCache` and perform setup.
  - `spawn.runpy`: Replaced with a localized, patched clone of `runpy` (see `runpy_patches.py` below). This is necessary for profiling to function in non-eager-preimports mode (`--no-preimports`).
- `pth_hook.py`: Facilities for effecting profiling-code execution in child processes by injecting a temporary `.pth` file into the current venv. This module is kept as minimal as possible to minimize the amount of startup code run as the mere result of having said `.pth` file.
  - `write_pth_hook()`: Set up a `.pth` file under the directory `sysconfig.get_path('purelib')` which calls `load_pth_hook()` (see below). The `.pth` file will be cleaned up by the supplied cache object.
  - `load_pth_hook()`: For processes inheriting a matching "parent PID" from the environment (see `LineProfilingCache` above), load the cache and set up the `LineProfiler` instance used, like how the main `kernprof` process does.
- `runpy_patches.py::create_runpy_wrapper()`: Make a clone of the `runpy` module which checks if the code executed is the code to be profiled; if so, it goes through the same code-rewriting facilities that `line_profiler.autoprofile.autoprofile.run()` uses to set up profiling.

`tests/test_child_procs.py`
- `_ModuleFixture`: Helper object which handles:
  - [...] `tests/test_cython.py::propose_name()`) to avoid clashes; and
  - [...] `sys.path` and `os.environ['PYTHONPATH']`.
- `_Params`: Helper object which handles concatenation and Cartesian products of parametrizations.
- `ext_module`: New `_ModuleFixture` representing a module defining the sum function used by `test_module` when run without the `--local` flag.
- `_run_subproc()`: New wrapper around `subprocess.run()` which provides extra debugging output (standard streams, timing info, etc.)
- `test_profiling_multiproc_script()`: "Main" new test for running the `test_module` (see Modified Code) with `kernprof --prof-child-procs`; heavily parametrized to check for profiling-result correctness in different contexts:
  - `run_func`: execution modes (`kernprof <script>`, `kernprof -m <module>`, and `kernprof -c <code>`)
  - `prof_child_procs`: whether to use child-process profiling (`--[no-]prof-child-procs`)
  - `preimports`: eager vs. on-import profiling (`--[no-]preimports`)
  - `use_local_func`: whether the parallel workload is locally defined in the executed code or imported from external modules
  - `fail`: whether the parallel workload errors out
  - `start_method`: `multiprocessing` "start methods" (`'fork'`, `'forkserver'`, and `'spawn'`)
- `test_profiling_bare_python()`: New test for profiling child processes where the code run by `kernprof --prof-child-procs` doesn't directly invoke `multiprocessing`, but spins up another Python process that does (via `os.system()` or `subprocess.run()`).

Modified code
`line_profiler/line_profiler.py::LineStats`
- `.get_empty_instance()`: New convenience class method for creating an instance with no profiling data and the platform-appropriate `.unit`.
- `.from_files()`: Added new argument `on_defective: Literal['ignore', 'warn', 'error']`, allowing for passing over bad files (e.g. empty ones) with optional warnings. The old behavior (`on_defective='error'`) remains the default.

`line_profiler/rc/line_profiler.toml::[tool.line_profiler.kernprof]`
New key-value pair `prof-child-procs` for the default of the `kernprof --[no-]prof-child-procs` flag.

`kernprof.py`
- `_add_core_parser_arguments()`: Now adding the new `--[no-]prof-child-procs` flags to the parser.
- `_write_preimports()`: Refactored to use the new/relocated facilities at `line_profiler.curated_profiling`.
- `_dump_filtered_stats()`:
  - [...] `extra_line_stats: LineStats | None` allows for handling and combining the profiling stats gathered elsewhere (e.g. child processes).
  - [...] `_dump_filtered_line_stats()` which it now calls.
- `_manage_profiler`: Context manager refactored from the old `_pre_profile()` for more Pythonic handling of setups and teardowns.
  - [...] `_prepare_child_profiling_cache()`.
  - [...] `_prepare_profiler()`, `_prepare_exec_script()`).
  - [...] `_post_profile()` on context exit so that we no longer have to explicitly `try: ... finally: ...` in `_main_profile()`.
- `_post_profile()`:
  - [...] `extra_line_stats: LineStats | None` allows for handling and combining the profiling stats gathered elsewhere (e.g. child processes).
  - [...] `line_profiler.curated_profiling`.

`tests/test_child_procs.py`
- `test_module`:
  - [...] `Path` fixture into a `_ModuleFixture` (see above in New Code).
  - [...] `--start-method` selects a specific `multiprocessing` "start method".
  - [...] `--local` toggles between using a sum function defined locally in `test_module` or the one defined externally in `ext_module` (see New Code).
  - [...] `--force-failure` toggles whether the sum function should return normally or raise an error.
- `_run_as_{script,module}()`:
  - [...] `_run_as_literal_code()` to also test `kernprof -c ...`.
  - [...] `test_module` as a `_ModuleFixture` instead of a path, and handling its installation.
- `_run_test_module()`:
  - [...] `run_module = partial(_run_test_module, _run_as_module)`, etc. now available for more convenient testing of `kernprof` execution modes as test parametrization.
  - [...] `profiled_code_is_tempfile: bool` helps with constructing the `kernprof` command line in cases where the code is anonymous (`kernprof -c ...`).
  - [...] `use_local_func: bool`, `fail: bool`, and `start_method: Literal['fork', 'forkserver', 'spawn'] | None` allow for fuzzing code execution with the aforementioned `test_module` CLI flags (resp. `--local`, `--force-failure`, and `--start-method`).
  - [...] `nhits: dict[str, int] | None`, when provided, checks that the line-hit stats are as expected (all calls traced with `--prof-child-procs`, only those in the main process without).
  - [...] `fail` is true, the `kernprof` subprocess should fail.
  - [...] `.pth` files created by `kernprof --prof-child-procs` should be cleaned up.
  - [...] `nhits` (where available).
- `test_multiproc_script_sanity_check()`:
  - [...] `use_local_func`, `fail`, and `start_method`, to ensure that the test script is fully functional in vanilla Python.
  - [...] `as_module: bool` with `run_func: Callable[..., CompletedProcess]`, allowing for more flexible testing of execution modes (`python ...`, `python -m ...`, and the new `python -c` added via the aforementioned `_run_as_literal_code()`).
- `test_running_multiproc_script()`: New parametrization `run_func` allows for absorbing the old `test_running_multiproc_module()` into the same test as additional parametrization, as well as testing `kernprof -c`.

Caveats
- [...] `.pth` file created is of course benign and as mentioned tries to be as out of the way as possible, but I just figured that the use of `.pth` files should be called out, given their recent spotlight in a CVE vulnerability.
- [...] `.pth` file is written to `sysconfig.get_path('purelib')`, it depends on said directory being writable. If we aren't in a venv or a similarly isolated environment (which is increasingly unlikely nowadays), all processes using the system Python will have to import and run `line_profiler._child_process_profiling.load_pth_hook()`. Though the function itself should quit rather quickly when we're not in a child process, it may still entail loading a significant portion of `line_profiler` into `sys.modules`.
- [...] `.pth` trick above, the use of unpickling side effects to execute "arbitrary" code (the patching of `multiprocessing`) may raise some eyebrows.
- [...]

  Ideally speaking, we should listen to the lock file instead of polling for its (non-)existence in a loop. (To avoid uncontrolled hot-looping I'm just using a 1/32-s cooldown.) Maybe a tool like `watchdog` would do the job, but I don't want to introduce a new dependency unless we really need it.

  After all, there's a reason `Process.terminate()` just `SIGTERM`s the child process with reckless disregard: children are sporadically stuck and the polling may enter a deadlock. To guard against that I added:
  - [...] `SIGTERM` is sent anyway; and
  - [...] `kernprof --prof-child-procs`.

  This seems to be enough to both get rid of the deadlocks in tests and preserve profiling data... but the problem is that for child processes to deadlock AT ALL, their cleanup routines must have (as of yet) failed to complete, and thus there is still a risk of profiling data not being written. So there are probably either some race conditions hidden by the delays, or an error in how the lock files are detected.

  `coverage` gets by alright by only patching `Process._bootstrap()`, without the above termination issue. Gotta figure out why...

TODO
- [...] `line_profiler._child_process_profiling` to become public API?

Credits

- [...] `pytest-autoprofile`, which in turn was based on the solution implemented in `coverage.multiproc`.
- [...] `LineStats` aggregation #380
- [...] `kernprof` to always run code in `sys.modules['__main__']`'s namespace #423

Notes
Welp. This took way longer than I expected. The main friction points were that:
- [...] `line_profiler._child_process_profiling.cache.LineProfilingCache` class tackles this issue.
- [...] `pytest-autoprofile::tests/test_subprocess.py::_test_inner()`), but apparently I only made the tests fail there, not the parallel functions themselves. Had to do some rather hacky stuff to circumvent that (see Caveats)...[3]

Footnotes
1. Note however that the equivalent vanilla Python command (`python -c ...`) would error out, because functions sent to `multiprocessing` must be pickle-able and thus reside in a physical file. This is sidestepped by `kernprof`'s always writing code received by `kernprof -c ...` and `... | kernprof -` to a tempfile (ENH: auto-profile `stdin` or literal snippets #338).
2. In the test suite we're only testing process creation with the most common `multiprocessing[.get_context(...)].Pool`. However, since none of the patched components are specific to `multiprocessing.pool`, it should also work with other models of parallelism built with the components of `multiprocessing`.
3. From the docs for `multiprocessing.Process.terminate()`: "Note that exit handlers and finally clauses, etc., will not be executed." Normally this doesn't matter, but if the parallelly-executed function errors out, `multiprocessing` has a bad habit of just `.terminate()`-ing child processes without allowing for enough time to run cleanup, leading to incomplete profiling data. Hence the only workaround seems to be intercepting `Process.terminate()` calls and blocking them where appropriate.