Streamline analysis pipeline and logging #22
Merged
hariharan-devarajan merged 22 commits into main on Aug 24, 2025
…onfigurations, updating documentation and examples accordingly.
…by zero in Output class
…ilize time granularity correctly across various methods.
…ging capabilities and suppress specific warnings in the DFAnalyzer module.
…or better logging context
…ation and examples
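The new version runs about 2.25× faster (mean 54.5 s vs 122.8 s over 10 runs each), and its absolute run-to-run spread drops from σ = 3.6 s to 1.4 s (about 3% of the mean in both cases). The baseline command additionally passes analyzer=dftracer and percentile=0.9, which the new command omits:
hyperfine \
  '/.../dfanalyzer/.../python -m dfanalyzer analyzer=dftracer analyzer.checkpoint=False analyzer/preset=dlio percentile=0.9 trace_path=/.../data/extracted/dftracer-dlio' \
  '/.../dfanalyzer-fix-demo/.../python -m dfanalyzer analyzer.checkpoint=False analyzer/preset=dlio trace_path=/.../data/extracted/dftracer-dlio'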
Benchmark 1: /.../dfanalyzer/.../python -m dfanalyzer analyzer=dftracer analyzer.checkpoint=False analyzer/preset=dlio percentile=0.9 trace_path=/.../data/extracted/dftracer-dlio
Time (mean ± σ): 122.771 s ± 3.610 s [User: 151.342 s, System: 16.662 s]
Range (min … max): 117.097 s … 129.307 s 10 runs
Benchmark 2: /.../dfanalyzer-fix-demo/.../python -m dfanalyzer analyzer.checkpoint=False analyzer/preset=dlio trace_path=/.../data/extracted/dftracer-dlio
Time (mean ± σ): 54.466 s ± 1.447 s [User: 78.171 s, System: 12.346 s]
Range (min … max): 53.229 s … 58.322 s 10 runs
Summary
'/.../dfanalyzer-fix-demo/.venv/bin/python -m dfanalyzer analyzer.checkpoint=False analyzer/preset=dlio trace_path=/.../data/extracted/dftracer-dlio' ran
2.25 ± 0.09 times faster than '/.../dfanalyzer/.../python -m dfanalyzer analyzer=dftracer analyzer.checkpoint=False analyzer/preset=dlio percentile=0.9 trace_path=/.../data/extracted/dftracer-dlio'
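The summary line can be checked by hand. The sketch below assumes hyperfine derives the ± term with standard first-order error propagation for a ratio of means (relative errors added in quadrature); under that assumption it reproduces 2.25 ± 0.09 from the two reported means and standard deviations. It also prints the coefficient of variation, which shows the relative spread is about the same for both versions, so the smaller σ mostly follows from the shorter runtime.

```python
import math

# Mean and standard deviation (seconds) as reported by hyperfine above.
old_mean, old_sd = 122.771, 3.610  # Benchmark 1: baseline dfanalyzer
new_mean, new_sd = 54.466, 1.447   # Benchmark 2: this PR


def speedup(slow_mean, slow_sd, fast_mean, fast_sd):
    """Return (ratio, ratio_sd) using first-order error propagation for a quotient."""
    ratio = slow_mean / fast_mean
    rel_err = math.sqrt((slow_sd / slow_mean) ** 2 + (fast_sd / fast_mean) ** 2)
    return ratio, ratio * rel_err


ratio, ratio_sd = speedup(old_mean, old_sd, new_mean, new_sd)
print(f"{ratio:.2f} ± {ratio_sd:.2f} times faster")  # 2.25 ± 0.09 times faster

# Coefficient of variation: standard deviation relative to the mean.
print(f"CV old: {old_sd / old_mean:.1%}, CV new: {new_sd / new_mean:.1%}")
```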
This pull request refactors the percentile/threshold bottleneck detection logic out of the core analysis pipeline, simplifies the configuration interface, and introduces structured logging and warning suppression for a cleaner user and developer experience. It also improves the codebase by organizing logging, warning, and checkpointing utilities, and enhances the workflow and documentation to match the updated interface.
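The PR's logging work centers on block-scoped context managers (`console_block`, `log_block`) and targeted warning suppression. The sketch below is a hypothetical stand-in, not dfanalyzer's code: the PR builds its helpers on structlog, whereas this version uses only the standard library (`logging`, `contextlib`, `warnings`) so it runs without extra dependencies. The logger name, `configure_logging`, and `suppress_warnings` are illustrative, and the `log_block` signature shown here is an assumption.

```python
import contextlib
import logging
import time
import warnings
from typing import Iterator

# Hypothetical logger name; dfanalyzer's actual logger setup is not shown in the PR text.
log = logging.getLogger("dfanalyzer_sketch")


def configure_logging(level: int = logging.INFO) -> None:
    """Hypothetical setup hook: attach one stream handler (safe to call repeatedly)."""
    if not log.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        log.addHandler(handler)
    log.setLevel(level)


@contextlib.contextmanager
def log_block(name: str, **context) -> Iterator[None]:
    """Log the start, the end (or failure), and the duration of one analysis phase."""
    start = time.perf_counter()
    log.info("%s started %s", name, context)
    try:
        yield
    except Exception:
        log.exception("%s failed after %.3f s", name, time.perf_counter() - start)
        raise
    else:
        log.info("%s finished in %.3f s", name, time.perf_counter() - start)


@contextlib.contextmanager
def suppress_warnings(*categories: type) -> Iterator[None]:
    """Ignore only the given warning categories inside the block, then restore filters."""
    with warnings.catch_warnings():
        for category in categories:
            warnings.filterwarnings("ignore", category=category)
        yield


configure_logging()
with log_block("read_trace", trace_path="..."):
    with suppress_warnings(FutureWarning):
        warnings.warn("noisy library warning", FutureWarning)  # silenced
```

The PR also applies its logging setup on every Dask worker. With a `distributed.Client`, one way to do that is `client.run(configure_logging)`, which executes a function once on each worker; the PR text does not say which mechanism dfanalyzer actually uses.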
Configuration & CLI Interface Simplification
- Removed the `percentile` and `threshold` arguments from the main analysis pipeline, CLI, and configuration, so bottleneck detection is no longer handled as a core parameter. This change is reflected in `dfanalyzer/__init__.py`, `dfanalyzer/analyzer.py`, `.github/workflows/ci.yml`, and `README.md`. [1] [2] [3] [4] [5] [6] [7]
Logging & Warning Handling Improvements
- Introduced `structlog` and context managers (`console_block`, `log_block`) for clearer, stepwise logging of major analysis phases. Logging is now configured on both the main process and all Dask workers. [1] [2] [3] [4] [5] [6] [7]
- … `warnings` usage from the main codebase. [1] [2] [3] [4]
Analysis Pipeline & Checkpointing Enhancements
Documentation & Workflow Updates
- Updated `README.md` and the CI workflow to remove references to the now-removed `percentile` parameter, ensuring consistency with the new interface. [1] [2]
Other Minor Improvements
- Changed the `time_granularity` unit from microseconds to seconds for consistency and clarity. [1] [2]
These changes collectively make the codebase more maintainable, improve the user experience, and lay the groundwork for more flexible bottleneck detection strategies in the future.
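With percentile and threshold filtering removed from the core pipeline, bottleneck selection becomes a step a caller could apply to the analysis output afterwards. The sketch below illustrates that idea only: `Row`, `by_percentile`, and `by_threshold` are hypothetical names, not part of dfanalyzer's API, and the percentile uses linear interpolation (the same rule as NumPy's default method).

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Row:
    """Hypothetical analysis-output row: one I/O call name and its total time."""
    name: str
    time: float  # seconds


# A detection strategy maps analysis rows to the subset flagged as bottlenecks.
Strategy = Callable[[Sequence[Row]], List[Row]]


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation (q in [0, 1]), like NumPy's default."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def by_percentile(q: float) -> Strategy:
    """Flag rows whose time is at or above the q-th percentile of all rows."""
    def strategy(rows: Sequence[Row]) -> List[Row]:
        cutoff = percentile([r.time for r in rows], q)
        return [r for r in rows if r.time >= cutoff]
    return strategy


def by_threshold(seconds: float) -> Strategy:
    """Flag rows whose time is at or above a fixed number of seconds."""
    def strategy(rows: Sequence[Row]) -> List[Row]:
        return [r for r in rows if r.time >= seconds]
    return strategy


rows = [Row("read", 1.0), Row("open", 2.0), Row("write", 3.0),
        Row("seek", 4.0), Row("close", 10.0)]
for detect in (by_percentile(0.9), by_threshold(3.0)):
    print([r.name for r in detect(rows)])
# ['close']
# ['write', 'seek', 'close']
```

Because each strategy is a plain function from rows to rows, swapping in a different rule would not require touching the analysis pipeline itself.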