[Draft] FEAT: extend profiling to child processes by TTsangSC · Pull Request #431 · pyutils/line_profiler

TTsangSC · 2026-04-14T16:13:41Z

This PR adds support for kernprof to profile code execution in child Python processes, building on ongoing work (see Credits).

Usage

The new flags --no-prof-child-procs and --prof-child-procs[=...] are added to kernprof. By setting --prof-child-procs to true, child Python processes created by the profiled process are also profiled:¹

$ kernprof -lv --prof-child-procs -c "if True:
    import itertools
    import multiprocessing
    from collections.abc import Collection

    def sum_worker(nums: Collection[int]) -> int:
        result = 0
        for x in nums:
            result += x
        return result

    def sum_parallel(nums: Collection[int], nprocs: int) -> int:
        size_ = len(nums) / nprocs
        size = int(size_)
        if size_ > size:
            size += 1
        with multiprocessing.Pool(nprocs) as pool:
            sub_sums = pool.map(sum_worker, itertools.batched(nums, size))  # 3.12+
            pool.close()
            pool.join()
        return sum_worker(sub_sums)

    if __name__ == '__main__':
        print(sum_parallel(range(1, 1001), 3))"
500500
Wrote profile results to 'kernprof-command-<...>.lprof'
Timer unit: 1e-06 s

Total time: 0.000312 s
File: <...>/kernprof-command.py
Function: sum_worker at line 6

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     6                                               def sum_worker(nums: Collection[int]) -> int:
     7         4          3.0      0.8      1.0          result = 0
     8      1007        155.0      0.2     49.7          for x in nums:
     9      1003        153.0      0.2     49.0              result += x
    10         4          1.0      0.2      0.3          return result

Total time: 0.100223 s
File: <...>/kernprof-command.py
Function: sum_parallel at line 12

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    12                                               def sum_parallel(nums: Collection[int], nprocs: int) -> int:
    13         1          1.0      1.0      0.0          size_ = len(nums) / nprocs
    14         1          1.0      1.0      0.0          size = int(size_)
    15         1          0.0      0.0      0.0          if size_ > size:
    16         1          0.0      0.0      0.0              size += 1
    17         2      21685.0  10842.5     21.6          with multiprocessing.Pool(nprocs) as pool:
    18         1      68692.0  68692.0     68.5              sub_sums = pool.map(sum_worker, itertools.batched(nums, size))  # 3.12+
    19         1         27.0     27.0      0.0              pool.close()
    20         1       9800.0   9800.0      9.8              pool.join()
    21         1         17.0     17.0      0.0          return sum_worker(sub_sums)

Note how the sum_worker() calls are profiled:

The main process contributes 1 call and 3 loops summing the sub-sums.
The 3 child processes each contribute 1 call, and together they loop over all 1000 items.
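
The hit counts in the sum_worker() table can be cross-checked with a little arithmetic. The sketch below assumes that each call hits the `for` line once per item, plus once more when the iterator is exhausted; expected_hits() is a hypothetical helper for illustration and not part of line_profiler.

```python
import math


def expected_hits(n_items: int, nprocs: int) -> dict:
    """Predict line hits of sum_worker() across the main and child processes."""
    # Same chunking as sum_parallel(): ceil(n_items / nprocs) items per batch
    size = math.ceil(n_items / nprocs)
    chunks = [min(size, n_items - start) for start in range(0, n_items, size)]
    # One call per chunk in the children, plus one call in the main process
    # which sums the sub-sums (one item per chunk)
    loop_lengths = chunks + [len(chunks)]
    calls = len(loop_lengths)
    body_hits = sum(loop_lengths)
    # Assumption: the `for` line is hit one extra time per call on exhaustion
    for_hits = body_hits + calls
    return {'calls': calls, 'for': for_hits, 'body': body_hits}


print(expected_hits(1000, 3))  # {'calls': 4, 'for': 1007, 'body': 1003}
```

These numbers agree with the 4, 1007, and 1003 hits reported above.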

Highlights

Children created by (including but not limited to) these methods can be profiled:
- os.system() and subprocess.run()
- multiprocessing²
All three multiprocessing "start methods" ('fork', 'forkserver', and 'spawn') tested to be compatible, where available on the platform
Profiling unaffected by whether the profiled function run in child processes:
- Is locally defined in the profiled code or imported
- Executes cleanly or errors out
Mode of profiling (with eager --preimports or via test-code rewriting) replicated in child processes

Explanation

A serializable cache object (line_profiler._child_process_profiling.cache.LineProfilingCache) is created by the main process, containing session config information (e.g. values for --prof-mod and --preimports) so that profiling can be replicated in child processes.
In the main process, environment variables are injected, so that it and its children would have access to its PID and the cache-directory location.
A temporary .pth file is created; Python processes inheriting the right environment will thus go through profiling setup, while those without the env var (which just happen to share the Python executable) will be minimally affected.
os.fork() (where available) is patched with a wrapper which ensures consistent global states.
As with coverage.multiproc, various multiprocessing components are patched (line_profiler._child_process_profiling.multiprocessing_patches.apply()) so that child processes can retrieve the cache and explicitly cleanup before exiting. Patches are re-applied in the child processes by the use of object-unpickling side effects.
To make sure that multiprocessing child processes are allowed to fully clean up and write their profiling, even when the parallel workload errors out,³ additional patches are made to multiprocessing.
When properly set up, child processes write profiling output on exit to the session cache directory, which kernprof then gathers and merges with the profiling result in the main process.
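
The "object-unpickling side effects" mentioned above can be illustrated with a minimal sketch. It is not the PR's PickleHook; the class body, _run_setup(), and SETUP_LOG are hypothetical stand-ins, and the only point is that unpickling an object can trigger setup code in the receiving process.

```python
import pickle

SETUP_LOG = []  # hypothetical stand-in for "profiling was set up here"


def _run_setup():
    """Hypothetical setup routine (the real one would load the cache, etc.)."""
    SETUP_LOG.append('setup')


class PickleHook:
    """Carries no useful state; unpickling it is what triggers the setup."""

    def __getstate__(self):
        # Return a non-empty (truthy) state so that __setstate__() is
        # reliably invoked when the object is unpickled
        return {'marker': 'setup-on-unpickle'}

    def __setstate__(self, state):
        _run_setup()


# The hook rides along with ordinary data sent to another process
payload = pickle.dumps({'hook': PickleHook(), 'data': [1, 2, 3]})
restored = pickle.loads(payload)
print(SETUP_LOG, restored['data'])
```

In this sketch the setup runs in the same process for simplicity; with multiprocessing, the unpickling (and hence the side effect) would happen in the child.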

Code changes

New code

`line_profiler/curated_profiling.py`

New submodule containing mostly relocated code from kernprof, so that child processes can more easily reestablish profiling:

ClassifiedPreimportTargets:
Object resolving and classifying the --prof-mods, and writing a corresponding preimport module
CuratedProfilerContext:
Context manager managing the state of the LineProfiler, e.g. by slipping it into line_profiler.profile on startup and purging its .enable_counts on teardown

`line_profiler/_child_process_profiling/`

New private subpackage for maintaining the states, setting up the hooks, and performing the patches which make it possible to profile child processes:

cache.py::LineProfilingCache:
"Session state" object. It:
- Can be auto-(de-)serialized in the main and child processes based on env-var values, managing setup (module patches, profiler curation, eager pre-imports) and cleanup (tempfile management, dumping and gathering of profiling results) in each process.
- Injects the following environment variables, which are inherited by child processes:
  - ${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_PID}: main-process PID
  - ${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_DIR_<PID>}: location of the cache directory
  From the combination of both, child processes can retrieve the cache by calling .load().
multiprocessing_patches.py::apply():
Apply patches to these multiprocessing module components so that profiling results are properly gathered on child-process exit:
- Process (read: multiprocessing.process.BaseProcess):
  - .start():
    Wrapped call so that in the main process, it stores a handle to a "lock file" on the Process object, which is inherited by its clone in the child process.
  - ._bootstrap():
    Wrapped call to .touch() the lock-file handle on startup and delete it on exit.
  - .terminate():
    Wrapped call to poll on the lock file, and soft-block (with a timeout) until it is deleted.
- spawn.get_preparation_data():
  Wrapped call to insert a ~.PickleHook object (see coverage.multiproc.Stowaway), which when unpickled in the child process will automatically retrieve the LineProfilingCache and perform setup.
- spawn.runpy:
  Replaced with a localized, patched clone of runpy (see runpy_patches.py below). This is necessary for profiling to function in non-eager-preimports mode (--no-preimports).
pth_hooks.py:
Facilities for effecting profiling-code execution in child processes by injecting a temporary .pth file into the current venv. This module is kept as minimal as possible to minimize the amount of startup code run as the mere result of having said .pth file.
- write_pth_hook():
  Set up a .pth file under the directory sysconfig.get_path('purelib') which calls load_pth_hook() (see below). The .pth file will be cleaned up by the supplied cache object.
- load_pth_hook():
  For processes inheriting a matching "parent PID" from the environment (see LineProfilingCache above), load the cache and set up the LineProfiler instance used, like how the main kernprof process does.
runpy_patches.py::create_runpy_wrapper():
Make a clone of the runpy module which checks if the code executed is the code to be profiled; if so, it goes through the same code-rewriting facilities that line_profiler.autoprofile.autoprofile.run() uses to set up profiling.
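
The .pth mechanism that pth_hooks.py relies on can be tried out in isolation. The sketch below is only an illustration: it writes a throwaway .pth file to a scratch directory and processes it by hand with site.addsitedir(), rather than writing to sysconfig.get_path('purelib') as the PR does. The file name demo_hook.pth and the variable DEMO_PTH_HOOK_RAN are hypothetical.

```python
import os
import site
import sys
import tempfile
from pathlib import Path

ENV_FLAG = 'DEMO_PTH_HOOK_RAN'  # hypothetical marker, not used by line_profiler

with tempfile.TemporaryDirectory() as tmpdir:
    # Lines in a .pth file which start with "import" are executed when the
    # containing directory is processed as a site directory
    hook = Path(tmpdir) / 'demo_hook.pth'
    hook.write_text(f"import os; os.environ[{ENV_FLAG!r}] = '1'\n")
    # For site-packages this happens automatically at interpreter startup;
    # here we trigger it manually on the scratch directory
    site.addsitedir(tmpdir)
    hook_ran = os.environ.pop(ENV_FLAG, None) == '1'
    # addsitedir() also appends the directory to sys.path; undo that
    if tmpdir in sys.path:
        sys.path.remove(tmpdir)

print(hook_ran)
```

Because every interpreter that sees the directory executes such lines, keeping the hooked code small and quick to bail out matters, which is the design goal stated above.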

`tests/test_child_procs.py`

_ModuleFixture:
Helper object which handles:
- Module-name mangling (à la tests/test_cython.py::propose_name()) to avoid clashes; and
- Installation into and cleanup from sys.path and os.environ['PYTHONPATH'].
_Params:
Helper object which handles concatenation and Cartesian products of parametrizations.
ext_module:
New _ModuleFixture representing a module defining the sum function used by test_module when run without the --local flag.
_run_subproc():
New wrapper around subprocess.run() which provides extra debugging output (standard streams, timing info, etc.)
test_profiling_multiproc_script():
"Main" new test for running the test_module (see Modified Code) with kernprof --prof-child-procs; heavily parametrized to check for profiling-result correctness in different contexts:
- run_func: execution modes (kernprof <script>, kernprof -m <module>, and kernprof -c <code>)
- prof_child_procs: whether to use child-process profiling (--[no-]prof-child-procs)
- preimports: eager vs. on-import profiling (--[no-]preimports)
- use_local_func: whether the parallel workload is locally defined in the executed code or imported from external modules
- fail: whether the parallel workload errors out
- start_method: multiprocessing "start methods" ('fork', 'forkserver', and 'spawn')
test_profiling_bare_python():
New test for profiling child processes where the code run by kernprof --prof-child-procs doesn't directly invoke multiprocessing, but spins up another Python process that does (via os.system() or subprocess.run()).
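
How a Python child started outside multiprocessing could locate the session can be sketched with the two environment variables described for LineProfilingCache. The variable names are taken from this description; the child-side lookup code is an assumption for illustration and is not the actual load_pth_hook(). The environment is passed explicitly here instead of being injected into os.environ, to keep the sketch free of side effects.

```python
import os
import subprocess
import sys
import tempfile

PID_VAR = 'LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_PID'
DIR_VAR_PREFIX = 'LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_DIR_'

# What a child might do on startup (assumed logic): read the main-process PID,
# then look up the cache directory keyed on that PID
CHILD_CODE = f'''
import os
main_pid = os.environ.get({PID_VAR!r})
cache_dir = os.environ.get({DIR_VAR_PREFIX!r} + main_pid) if main_pid else None
print(main_pid)
print(cache_dir)
'''

with tempfile.TemporaryDirectory() as cache_dir:
    env = dict(os.environ)
    env[PID_VAR] = str(os.getpid())
    env[DIR_VAR_PREFIX + str(os.getpid())] = cache_dir
    proc = subprocess.run(
        [sys.executable, '-c', CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    seen_pid, seen_dir = proc.stdout.splitlines()

print(seen_pid == str(os.getpid()), seen_dir == cache_dir)
```

A process that does not inherit the PID variable finds nothing to load, which is why unrelated interpreters should be only minimally affected.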

Modified code

`line_profiler/line_profiler.py::LineStats`

.get_empty_instance():
New convenience class method for creating an instance with no profiling data and the platform-appropriate .unit.
.from_files():
Added new argument on_defective: Literal['ignore', 'warn', 'error'], allowing for passing over bad files (e.g. empty ones) with optional warnings. The old behavior (on_defective='error') remains the default.
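
The three on_defective modes can be pictured with a generic loader. This is not LineStats.from_files() itself: load_all() and the pickle-based file contents are assumptions made purely to illustrate the 'ignore' / 'warn' / 'error' behaviors.

```python
import pickle
import tempfile
import warnings
from pathlib import Path


def load_all(paths, on_defective='error'):
    """Hypothetical loader: unpickle each file, handling bad ones as requested."""
    if on_defective not in ('ignore', 'warn', 'error'):
        raise ValueError(f'bad on_defective: {on_defective!r}')
    loaded = []
    for path in paths:
        try:
            with open(path, 'rb') as fobj:
                loaded.append(pickle.load(fobj))
        except (EOFError, pickle.UnpicklingError) as e:
            if on_defective == 'error':
                raise  # strict mode: the first bad file aborts the load
            if on_defective == 'warn':
                warnings.warn(f'skipping defective file {path}: {e!r}')
            # 'ignore': silently pass over the file
    return loaded


with tempfile.TemporaryDirectory() as tmpdir:
    good, empty = Path(tmpdir) / 'good.pkl', Path(tmpdir) / 'empty.pkl'
    good.write_bytes(pickle.dumps({'hits': 4}))
    empty.touch()  # an empty file cannot be unpickled
    results = load_all([good, empty], on_defective='ignore')

print(results)
```

Lenient modes are useful when gathering results from many child processes, some of which may have been cut short before writing complete output.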

`line_profiler/rc/line_profiler.toml::[tool.line_profiler.kernprof]`

New key-value pair prof-child-procs for the default of the kernprof --[no-]prof-child-procs flag.

`kernprof.py`

_add_core_parser_arguments():
Now adding the new --[no-]prof-child-procs flags to the parser.
_write_preimports():
Refactored to use the new/relocated facilities at line_profiler.curated_profiling.
_dump_filtered_stats():
- New argument extra_line_stats: LineStats | None allows for handling and combining the profiling stats gathered elsewhere (e.g. child processes).
- Partially split off into the new _dump_filtered_line_stats() which it now calls.
_manage_profiler:
Context manager refactored from the old _pre_profile() for more Pythonic handling of setups and teardowns.
- Added setup for the session cache via calling _prepare_child_profiling_cache().
- The old function body is split off into smaller components (_prepare_profiler(), _prepare_exec_script()).
- Now calling _post_profile() on context exit so that we no longer have to explicitly try: ... finally: ... in _main_profile().
_post_profile():
- New argument extra_line_stats: LineStats | None allows for handling and combining the profiling stats gathered elsewhere (e.g. child processes).
- Simplified because some of the cleanup is relocated to line_profiler.curated_profiling.

`tests/test_child_procs.py`

test_module:
- Refactored from a Path fixture into a _ModuleFixture (see above in New Code).
- Added the following command-line flags to the code:
  - --start-method selects a specific multiprocessing "start method".
  - --local toggles between using a sum function defined locally in test_module or the one defined externally in ext_module (see New Code).
  - --force-failure toggles whether the sum function should return normally or raise an error.
_run_as_{script,module}():
- Now joined by a _run_as_literal_code() to also test kernprof -c ....
- Now taking test_module as a _ModuleFixture instead of a path, and handling its installation.
_run_test_module():
- New convenience wrappers run_module = partial(_run_test_module, _run_as_module), etc. now available for more convenient testing of kernprof execution modes as test parametrization.
- New arguments:
  - profiled_code_is_tempfile: bool helps with constructing the kernprof command line in cases where the code is anonymous (kernprof -c ...).
  - use_local_func: bool, fail: bool, and start_method: Literal['fork', 'forkserver', 'spawn'] | None allow for fuzzing code execution with the aforementioned test_module CLI flags (resp. --local, --force-failure, and --start-method).
  - nhits: dict[str, int] | None, when provided, checks that the line-hit stats are as expected (all calls traced with --prof-child-procs, only those in the main process without).
- Added checks:
  - If fail is true, the kernprof subprocess should fail.
  - Temporary .pth files created by kernprof --prof-child-procs should be cleaned up.
  - Profiling output is consistent with the provided nhits (where available).
test_multiproc_script_sanity_check():
- Now fuzzing the parametrizations use_local_func, fail, and start_method, to ensure that the test script is fully functional in vanilla Python.
- Superseded the argument as_module: bool with run_func: Callable[..., CompletedProcess], allowing for more flexible testing of execution modes (python ..., python -m ..., and the new python -c added via the aforementioned _run_as_literal_code()).
test_running_multiproc_script():
New parametrization run_func allows for absorbing the old test_running_multiproc_module() into the same test as additional parametrization, as well as testing kernprof -c.

Caveats

The temporary .pth file created is of course benign and, as mentioned, tries to be as out of the way as possible, but I just figured that the use of .pth files should be called out, given their recent spotlight in a CVE vulnerability.
Since the .pth file is written to sysconfig.get_path('purelib'), it depends on said directory being writable. If we aren't in a venv or a similarly isolated environment (which is increasingly unlikely nowadays), all processes using the system Python will have to import and run line_profiler._child_process_profiling.load_pth_hook(). Though the function itself should quit rather quickly when we're not in a child process, it may still entail loading a significant portion of line_profiler into sys.modules.
Like the .pth trick above, the use of unpickling side effects to execute "arbitrary" code (the patching of multiprocessing) may raise some eyebrows.
The lock-file system³ is a bit dodgy:
- Ideally speaking, we should listen to the lock file instead of polling for its (non-)existence in a loop. (To avoid uncontrolled hot-looping I'm just using a 1/32-s cooldown.) Maybe a tool like watchdog would do the job, but I don't want to introduce a new dependency unless we really need it.
- After all, there's a reason Process.terminate() just SIGTERMs the child process with reckless abandon – children are sporadically stuck and the polling may enter a deadlock. To guard against that I added:
  - A 1-s timeout for the polling, after which we just issue a warning and the SIGTERM is sent anyway; and
  - A 2-s timeout for the tests running kernprof --prof-child-procs.
  This seems to be enough to both get rid of the deadlocks in tests and preserve profiling data... but the problem is that for child processes to deadlock AT ALL, their cleanup routines must have (as of yet) failed to complete, and thus there is still a risk of profiling data not being written. So there are probably either some race conditions hidden by the delays, or an error in how the lock files are detected.
Apparently coverage gets by alright by only patching Process._bootstrap(), without the above termination issue. Gotta figure out why...
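
The polling scheme described in the lock-file caveat (1/32-s cooldown, 1-s timeout, warn and proceed) can be sketched as below. This is a simplified stand-in, not the PR's _Poller; wait_until_deleted() is a hypothetical function, and a timer thread plays the role of the child process deleting its lock file.

```python
import os
import tempfile
import threading
import time
import warnings


def wait_until_deleted(path, cooldown=1 / 32, timeout=1.0):
    """Poll until `path` disappears; warn and return False on timeout."""
    deadline = time.monotonic() + timeout
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            warnings.warn(f'lock file {path} still present after {timeout} s')
            return False
        time.sleep(cooldown)  # avoid hot-looping between checks
    return True


fd, lock_file = tempfile.mkstemp()
os.close(fd)
# Simulate a child finishing its cleanup and deleting the lock file after 0.1 s
threading.Timer(0.1, os.remove, args=(lock_file,)).start()
released = wait_until_deleted(lock_file)
print(released)
```

The caller would send the SIGTERM either way once this returns; the return value only tells whether the child had signalled completion in time.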

TODO

Add documentation on this new feature.
Maybe we should indicate this feature to be experimental...
Would it make more sense for any of the content in line_profiler._child_process_profiling to become public API?

Credits

Loosely based on similar work I did for pytest-autoprofile, which in turn was based on the solution implemented in coverage.multiproc.
Related PRs:

Notes

Welp. This took way longer than I expected. The main friction points were that:

There isn't a pre-existing "global-ish" state object that I can leverage, and which can be easily replicated in subprocesses. The new line_profiler._child_process_profiling.cache.LineProfilingCache class tackles this issue.
I had a very hard time trying to make profiling results consistent even when the parallelly-executed function errors out. Would have thought that I already took care of that in the other project (see pytest-autoprofile::tests/test_subprocess.py::_test_inner()), but apparently I only made the tests fail there, not the parallel functions themselves. Had to do some rather hacky stuff to circumvent that (see Caveats)...³

¹ Note however that the equivalent vanilla Python command (python -c ...) would error out, because functions sent to multiprocessing must be pickle-able and thus reside in a physical file. This is sidestepped by kernprof always writing the code received via kernprof -c ... and ... | kernprof - to a tempfile (ENH: auto-profile stdin or literal snippets #338).
² In the test suite we're only testing process creation with the most common multiprocessing[.get_context(...)].Pool. However, since none of the patched components are specific to multiprocessing.pool, it should also work with other models of parallelism built with the components of multiprocessing.
³ From the docs for multiprocessing.Process.terminate(): "Note that exit handlers and finally clauses, etc., will not be executed." Normally this doesn't matter, but if the parallelly-executed function errors out, multiprocessing has a bad habit of just .terminate()-ing child processes without allowing for enough time to run cleanup, leading to incomplete profiling data. Hence the only workaround seems to be intercepting Process.terminate() calls and blocking them where appropriate.

TTsangSC · 2026-04-15T03:46:59Z

Did some more tests on local post-#428-merge, maybe it is just legacy Python and dependency versions causing the issues. Will just rebase, force-push, and see what happens.

- `line_profiler/curated_profiling.py`
  New module for setting up profiling in a curated environment
  - `ClassifiedPreimportTargets.from_targets()`
    Method for creating a `ClassifiedPreimportTargets` instance, facilitating writing pre-import modules in a replicable and portable manner
  - `ClassifiedPreimportTargets.write_preimport_module()`
    Method for writing a pre-import module based on an instance; also fixed bug where the body of the written module was intercepted without appearing in the debug output
- `kernprof.py`
  - `_gather_preimport_targets()`
    Migrated to `line_profiler.curated_profiling`
  - `_write_preimports()`
    Now using the new `ClassifiedPreimportTargets` class, moving esp. the logic to the `write_preimport_module()` method

- `kernprof.py::_manage_profiler`
  `line_profiler/curated_profiling.py::CuratedProfilerContext`
  New context-manager classes for handling profiler setup and teardown
- `kernprof.py::_pre_profile()`
  Refactored into the above context managers and other private functions (`_prepare_profiler()`, `_prepare_exec_script()`)

line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    New class for passing info onto child processes so that profiling can resume there
line_profiler/pth_hook.py
    New submodule for the .pth-file-based solution to propagating profiling into child processes:
    write_pth_hook()
        In the main process, write the temporary .pth file to be loaded in child processes
    load_pth_hook()
        Called by the .pth in child process, loading the cache and setting up profiling based thereon

line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    Added new `.profile_imports` attribute to correspond to `kernprof`'s `--prof-imports` flag
line_profiler/_child_process_profiling/meta_path_finder.py
    New submodule defining the `RewritingFinder` class, a meta path finder which rewrites a single module on import
line_profiler/_child_process_profiling/pth_hook.py
    write_pth_hook()
        Now also handling the `os.fork()` patching/wrapping
    _setup_in_child_process()
        Now creating a `RewritingFinder` to mirror what `~.autoprofile.autoprofile.run()` does in the main process

line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    Refactored `.load()`
line_profiler/_child_process_profiling/multiprocessing_patches.py
    New submodule for applying patches to the `multiprocessing` package, so that profiling is automatically set up in child processes created by it

line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    <general>
        Added debug logging to various methods
    gather_stats()
        New method for gathering profiling stats from child processes
    inject_env_vars()
        New method for injecting `.environ` into `os.environ`
line_profiler/line_profiler.py::LineStats
    get_empty_instance()
        New convenience method for creating an empty instance
    from_files()
        Added new argument `on_defective` to allow for processing a group of files that cannot all be correctly read

line_profiler/rc/line_profiler.toml::[tool.line_profiler.kernprof]
    Added new key-value pair `prof-child-procs` for the default value of `kernprof --prof-child-procs`
kernprof.py
    - New boolean flags `[--prof-child-procs[=...] | --no-prof-child-procs]` for controlling whether to set up profiling in child processes
    - Fixed bug in `_manage_profiler.__exit__()` where `CuratedProfilerContext.uninstall()` can be skipped if the preceding code raises an error

kernprof.py::_prepare_child_profiling_cache()
    - Now respecting ${LINE_PROFILER_KEEP_TEMPDIRS}
    - Now setting `LineProfilingCache.debug`
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    - Added new attributes `.debug` and `._debug_log`
    - Now diverting debug messages to log files in `.cache_dir`

line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    add_cleanup()
        Now deferring to a `._add_cleanup()` method which allows for cleanup-function prioritization
    _debug_output()
        Fixed type-checking
line_profiler/_child_process_profiling/multiprocessing_patches.py::apply()
    Added debug output before `_setup_in_child_process()` is called to help with tracing
line_profiler/_child_process_profiling/pth_hook.py
    load_pth_hook()
    _wrap_os_fork()
        Added debug output before `_setup_in_child_process()` is called to help with tracing
    _setup_in_child_process()
        - `wrap_os_fork` now defaults to false
        - `prof.dump_stats()` now has increased priority over other callbacks (doesn't seem to help with the malformed prof files though...)
        - Child-process profiling output now written to a less randomized filename to facilitate debugging

line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    profiler
        New attribute for the profiler instance
    copy(..., inherit_profiler=...)
        New argument for inheriting the `.profiler`
    load()
        Now keeping track of the loaded instance and returning it in subsequent calls
line_profiler/_child_process_profiling/multiprocessing_patches.py::apply(..., lp_cache=None)
    - If the `LineProfilingCache.load()`-ed instance is consistent with that loaded from `cache_path`, the former is used
    - Added more debugging output
line_profiler/_child_process_profiling/pth_hook.py
    load_pth_hook()
        Added more debugging output
    _wrap_os_fork()
        Updated debugging output
    _setup_in_child_process()
        - Now returning a boolean (whether setup has been newly done)
        - Now setting `.profiler` of the cache instance
        - Added more debugging output

kernprof.py::_manage_profiler.__enter__()
    Updated so that the created `LineProfilingCache` instance carries a `.rewrite_module`
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    Added an optional `.rewrite_module` attribute
line_profiler/_child_process_profiling/import_machinery.py::RewritingFinder.find_spec()
    Now looking at `.lp_cache.rewrite_module` (where available) to check for specs to return

line_profiler/_child_process_profiling/
    cache.py::LineProfilingCache
        _replace_loaded_instance()
            New convenience method for an instance in a fork to replace the instance to be `.load()`-ed
        _consistent_with_loaded_instance
            New attribute for checking whether the instance is consistent with what would have been `.load()`-ed
    multiprocessing_patches.py
        bootstrap(..., lp_cache=...)
            Can now be `None`, which defers the `.load()`-ing of the cache instance
        apply()
            - Streamlined logic for retrieving the loaded instance
            - Now using the above deferred loading whenever appropriate, so that cleanup and profiling are preserved in forked processes
    pth_hook.py::_wrap_os_fork()
        Now using `._replace_loaded_instance()`, so that future calls to `.load()` in the forked process retrieve the newly-created instance

kernprof.py::_prepare_child_profiling_cache()
    - Updated call to `[...].multiprocessing_patches.apply()`
    - Now always setting up the created instance as the one returned by further calls to `.load()`
line_profiler/_child_process_profiling/multiprocessing_patches.py
    PickleHook
        - Refactored to contain no instance variables
        - Now always using `LineProfilingCache.load()` to retrieve the appropriate cache instance
    bootstrap()
        Removed argument `lp_cache`
    get_preparation_data()
        Removed argument `cache_path`
    apply()
        - Removed argument `cache_path`
        - Argument `lp_cache` now required
        - Simplified implementation

line_profiler/_child_process_profiling/import_machinery.py
    Removed
line_profiler/_child_process_profiling/pth_hook.py::_setup_in_child_process()
    No longer setting up the `RewritingFinder` because messing with the import system doesn't help with propagating autoprofiling rewrites to child processes...

kernprof.py
    _dump_filtered_stats()
        Fixed bug where if no tempfile remains, the `extra_line_stats` are not merged into the dumped stats
    _prepare_child_profiling_cache()
        Now setting the `.profiler` of the returned cache object

line_profiler/_child_process_profiling/multiprocessing_patches.py::_apply_mp_patches()
    - Added debugging output for the patches
    - Now patching the copy of `runpy` imported by `multiprocessing.runpy`
line_profiler/_child_process_profiling/pth_hook.py
    _wrap_os_fork()
        No longer creating a new `LineProfiler` instance (helps with handling forked processes)
    _setup_in_child_process(..., prof=...)
        New argument for avoiding instantiating a new profiler when not necessary (e.g. in a forked process)
line_profiler/_child_process_profiling/runpy_patches.py
    New submodule for the aforementioned patching of `runpy`

tests/test_child_procs.py
    test_running_multiproc_literal_code()
        New test paralleling `test_running_multiproc_{script,module}` to test `kernprof -c ...`
    test_multiproc_script_sanity_check()
        - Refactored parameters for better `pytest` output
        - Added testing for running the code with `python -c ...`
    <Misc>
        - Added CLI argument `--local` to the profiled module to toggle between a locally-defined summing function and an imported one
        - Refactored how the test modules are injected
        - Added debugging output to `subprocess.run()` calls
        - Added provisional support for examining the profiling data

tests/test_child_procs.py::test_multiproc_script_sanity_check()
    Now parametrized to test passing the function defined in the test module itself to `multiprocessing`

tests/test_child_procs.py
    test_running_multiproc_{module,literal_code}()
        Integrated into `test_running_multiproc_script()`
    test_running_multiproc_script()
        Extended parametrization

tests/test_child_procs.py
    test_profiling_multiproc_script()
        Test parallel to `test_running_multiproc_script()`, checking whether we are correctly profiling the child processes
    <General>
        - Added more docs
        - Updated dummy parameter names
    _ext_module, _test_module
        - Refactored how the fixtures are set up
        - Module names now randomized and clash-proof via `uuid.uuid4()`
    _run_subproc()
        - Moved code outputting captured streams from `_run_test_module()` to here
        - Added timing code

tests/test_child_procs.py
    TEST_MODULE_BODY, [_]test_module()
        Added CLI flag to select `multiprocessing` start methods
    _Params
        New convenience class for test parametrization
    test_multiproc_script_sanity_check()
        - Streamlined parametrization (15 subtests -> 10)
        - Added subtests for various `multiprocessing` start methods
    test_multiproc_script_sanity_check()
        - Streamlined parametrization (24 subtests -> 21)
        - Added subtests for various `multiprocessing` start methods

tests/test_child_procs.py
    test_module(), ext_module()
        Updated so that we can toggle for the function sent to `multiprocessing` to raise an error with the `--force-failure` CLI flag
    _run_test_module()
        - Now raising a new `ResultMismatch` error class (instead of using base assertions) for:
          - If `test_module()` writes the wrong number to stdout
          - If `nhits` are provided and the profiling results differ therefrom
        - Added argument `fail` for using the aforementioned `--force-failure` flag
    test_multiproc_script_sanity_check()
        Now also checking the cases where the test module is run with `--force-failure`
    test_profiling_multiproc_script()
        Now also checking the cases where the test module is run with `--force-failure` (FIXME: profiling bugged when the function errors out, and doesn't fail with a consistent pattern)

line_profiler/_child_process_profiling/multiprocessing_patches.py
    @cleanup_wrapper, @setup_wrapper
    get_target_property(), log_method_call()
        Removed
    _Poller
        New helper class for polling a callable
    wrap_{start,terminate}()
        New method wrappers for wrapping the eponymous methods of `multiprocessing.process.BaseProcess`; this fixes the bug where if the parallel function errors out in the child process, it may be terminated before profiling data can be gathered
    wrap_bootstrap()
        Refactored from `bootstrap()`
    _apply_mp_patches()
        Cleaned up testing code

tests/test_child_procs.py::test_profiling_multiproc_script()
    Removed XFAIL-ing for cases where the profiled function fails (because the bug has been fixed)

tests/test_child_procs.py::test_profiling_bare_python()
    New test for checking the profiling of child processes created outside of `multiprocessing` (e.g. `subprocess.run()`, `os.system()`)

line_profiler/_child_process_profiling/
    cache.py::LineProfilingCache.make_tempfile()
        New convenience method for creating tempfiles with `mkstemp()`
    multiprocessing_patches.py::wrap_start()
    pth_hook.py::_setup_in_child_process()
        Simplified implementations to just use `LineProfilingCache.make_tempfile()`

line_profiler/_child_process_profiling/multiprocessing_patches.py
    _Poller
        __init__()
            - Updated default and typing for `cooldown`
            - New arguments `timeout` and `on_timeout` for controlling timeout duration and behaviors
        with_timeout()
            New method creating a new instance a la `.with_cooldown()`
        __enter__()
            Added timeout handling
    Timeout
        New `RuntimeError` subclass raised if `timeout` is positive and reached
    wrap_terminate()
        Now only allowing the `BaseProcess.terminate()` call to be blocked by at most 1 s by the lock file, before issuing a warning and proceeding anyway
tests/test_child_procs.py::test_profiling_multiproc_script()
    Now timing out the `kernprof` process after 5 s in case the lock files caused a deadlock

line_profiler/_child_process_profiling/
    - The functions `pth_hook.py::_setup_in_child_process()` and `::_wrap_os_fork()` have been relocated to eponymous instance methods of `cache.py::LineProfilingCache`
    - The implementations of `pth_hook.py::{write,load}_pth_hook()` and `multiprocessing_patches.py::PickleHook.__setstate__()` are updated accordingly

tests/test_child_procs.py::_run_subproc()
    - If any of the output streams is captured, we call `subprocess.run(..., check=False)` to get the chance to intercept and print the output, and only call `.check_returncode()` on the `CompletedProcess` afterwards
    - Fixed bug where, if `text=False`, we attempted to format the captured stream-content bytes as strings
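
The "run first, report, then check" pattern above can be sketched as below. The helper name `run_and_report()` is hypothetical, and for brevity the sketch always captures both streams, whereas the test helper described above only does so conditionally.

```python
import subprocess
import sys


def run_and_report(cmd, text=True, **kwargs):
    """Run `cmd`, echo the captured output, and only then check the exit code."""
    proc = subprocess.run(
        cmd, capture_output=True, text=text, check=False, **kwargs
    )
    for name, content in [('stdout', proc.stdout), ('stderr', proc.stderr)]:
        if not content:
            continue
        if isinstance(content, bytes):
            # With `text=False` the streams are bytes; decode before formatting
            content = content.decode(errors='replace')
        print(f'--- {name} ---\n{content}', file=sys.stderr)
    # Raises `subprocess.CalledProcessError` on a non-zero exit code, but
    # only after the output above has been printed
    proc.check_returncode()
    return proc
```

Deferring the return-code check this way means a failing subprocess still gets its output printed, which is what makes the failure diagnosable from the test log.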

kernprof.py
    _manage_profiler.__exit__()
        Now gathering the debug logs of the child processes and writing them to the main logger
    _prepare_child_profiling_cache()
        Simplified implementation
line_profiler/_child_process_profiling/cache.py
    _CallbackRepr
        New `reprlib.Repr` subclass for handling reprs of the cleanup callbacks (mostly for truncating the repr of `os.environ`, and that of objects it appears in)
    LineProfilingCache
        [_add_]cleanup()
            Updated to use `_CallbackRepr` to represent the callbacks
        _dump_debug_logs()
            New method for gathering debug log files from child processes
        _debug_output()
            Added timestamps to the messages
        _setup_in_main_process()
            New method consisting mostly of code relocated from `kernprof.py::_prepare_child_profiling_cache()`
line_profiler/_child_process_profiling/pth_hook.py::write_pth_hook()
    No longer wrapping `os.fork()` (relocated to `LineProfilingCache._setup_in_main_process()`)
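
As a rough illustration of how a `reprlib.Repr` subclass can shorten the repr of `os.environ` wherever it turns up inside a container, consider the sketch below. The class name `EnvTruncatingRepr` and the placeholder format are assumptions; the PR's `_CallbackRepr` may work quite differently.

```python
import os
import reprlib


class EnvTruncatingRepr(reprlib.Repr):
    """Abbreviate `os.environ` wherever it appears (hypothetical sketch)."""

    def repr1(self, x, level):
        # `repr1()` is called recursively for the items of lists, tuples,
        # dicts, etc., so this check also catches nested occurrences
        if x is os.environ:
            return f'<os.environ: {len(x)} variable(s)>'
        return super().repr1(x, level)


callback_args = [os.environ, 1]
print(EnvTruncatingRepr().repr(callback_args))
```

Overriding `repr1()` rather than adding a type-specific `repr_<typename>()` method keeps the sketch independent of the private class name of `os.environ`.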

line_profiler/_child_process_profiling/
    cache.py, pth_hook.py
        Relocated shared code snippets to `misc_utils`
    misc_utils.py
        New submodule for util functions

line_profiler/_child_process_profiling/_cache_logging.py::CacheLoggingEntry
    New object for IO to/from `LineProfilingCache._debug_log`
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    cleanup()
        More verbose logging messages to help with:
        - Distinguishing cases where no callback has been registered
        - Indicating whether the cleanup callbacks have been successfully exhausted (e.g. may not be the case for child processes managed by `multiprocessing`)
    _gather_debug_log_entries()
        New method for reading in `CacheLoggingEntry`-s from all log files of the session
    _debug_message_{header,timestamp}
        Refactored away (functionalities absorbed into `CacheLoggingEntry`)

kernprof.py
    _add_core_parser_arguments()
        Added temporary/undocumented flag `--debug-log=...` for gathering the cache logs and writing them to one place
    _manage_profiler
        Now registering a delayed cleanup callback for writing the debug log on `.__enter__()`, so that the gathering of log entries happens as late as possible (right before the cache dir is wiped) and includes most of the cleanup in the main process
    _prepare_child_profiling_cache()
        Now explicitly deferring the deletion of the cache dir to the very end
line_profiler/_child_process_profiling/cache.py::LineProfilingCache.cleanup()
    Made debug log messages more verbose, indicating the number of callbacks made as the result of each is reported
tests/test_child_procs.py::test_profiling_{bare_python,multiproc_script}()
    Now gathering cache logs via `--debug-log` and printing them out

XXX: do we drop these functionalities when/if the bugs are fixed?
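
The point of registering the log-writing callback early is that last-in-first-out cleanup then runs it late. The sketch below shows that ordering with `contextlib.ExitStack` as a stand-in; whether the PR's cache uses `ExitStack` or its own callback list is not stated here, and the event labels are purely illustrative.

```python
from contextlib import ExitStack

events = []
with ExitStack() as stack:
    # Callbacks run in reverse order of registration, so what is
    # registered first runs last
    stack.callback(events.append, 'wipe cache dir')
    stack.callback(events.append, 'gather debug logs')
    stack.callback(events.append, 'other cleanup')
    events.append('profiled code runs')

print(events)
```

Here the log gathering runs after the other cleanup and just before the cache directory is wiped, which matches the ordering described above.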

TTsangSC · 2026-04-16T21:19:05Z

Unfortunately there's too little context to determine why the tests are failing on other platforms. Heck I can't even replicate the macOS failures on my machine with matching dep versions. Just wrote in more code for extracting the debug outputs, force-pushed, and hopefully I will have more clues for what to work on.

line_profiler/_child_process_profiling/cache.py::_CallbackRepr
    __doc__
        Updated so that the parts that test `.indent` are skipped on Python < 3.12
    indent
        No longer a property; private methods now use `._get_indent()` to indirectly access it
    _get_indent()
        Wrapper around `.indent`; falls back to `None` on legacy versions (< 3.12) without the attribute
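
`reprlib.Repr.indent` only exists from Python 3.12 onwards, so a version-tolerant accessor can be as small as the sketch below. The free function `get_indent()` is a hypothetical stand-in for the `_get_indent()` method and assumes nothing else about `_CallbackRepr`.

```python
import reprlib


def get_indent(repr_obj):
    """Return `repr_obj.indent`, or `None` if the attribute is missing."""
    # On Python < 3.12, `reprlib.Repr` instances have no `.indent`
    return getattr(repr_obj, 'indent', None)
```

On 3.12+ the default value of `.indent` is also `None`, so callers see the same result for an unconfigured instance on every supported version.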

TTsangSC mentioned this pull request Apr 14, 2026

Update CI - Drop 3.8 / 3.9 #428

Merged

TTsangSC changed the title ~~FEAT: extend profiling to child processes~~ [Draft] FEAT: extend profiling to child processes Apr 14, 2026

TTsangSC force-pushed the profile-child-processes branch from 2cd2ed4 to f9a37af Compare April 15, 2026 03:48

TTsangSC added 26 commits April 15, 2026 20:14

Removed redundant check in kernprof

f2c8565

Renamed submodule

4d6900a

Fixed stat-aggregation bug

a51411f

kernprof.py
    _dump_filtered_stats()
        Fixed bug where if no tempfile remains, the `extra_line_stats` are not merged into the dumped stats
    _prepare_child_profiling_cache()
        Now setting the `.profiler` of the returned cache object

Streamlined debug output in _child_process_profiling

ab8b6a0

Test local funcs in the profiled script

3288fdf

tests/test_child_procs.py::test_multiproc_script_sanity_check()
    Now parametrized to test passing the function defined in the test module itself to `multiprocessing`

Consolidated tests

7034479

tests/test_child_procs.py
    test_running_multiproc_{module,literal_code}()
        Integrated into `test_running_multiproc_script()`
    test_running_multiproc_script()
        Extended parametrization

More debug output, flexible multiprocessing patching

d12b26e

TTsangSC added 17 commits April 15, 2026 20:14

Fix attribute-cleanup logic in multiprocessing_patches.apply()

3529759

WIP: let child processes fully clean up before terminating them

4ea789a

Fix tests/test_child_procs.py

67868aa

Test for non-multiprocessing child processes

cc980e1

Added check in tests for .pth-file cleanup

c0b93a8

Fix kernprof docstring

0aabec4

CHANGELOG and help-text update

08d0b05

Typing fixes

ff42d21

Factor out common utils

234dbe5

TTsangSC force-pushed the profile-child-processes branch from f9a37af to aca4e2c Compare April 16, 2026 21:16

TTsangSC wants to merge 44 commits into pyutils:main from TTsangSC:profile-child-processes
