Streamline analysis pipeline and logging #22

Merged
hariharan-devarajan merged 22 commits into main from feat/upgrade-zindex
Aug 24, 2025
@izzet izzet commented Aug 22, 2025

This pull request refactors the percentile/threshold bottleneck detection logic out of the core analysis pipeline, simplifies the configuration interface, and introduces structured logging and warning suppression for a cleaner user and developer experience. It also improves the codebase by organizing logging, warning, and checkpointing utilities, and enhances the workflow and documentation to match the updated interface.

Configuration & CLI Interface Simplification

  • Removes the percentile and threshold arguments from the main analysis pipeline, CLI, and configuration, so bottleneck detection is no longer handled as a core parameter. This change is reflected in dfanalyzer/__init__.py, dfanalyzer/analyzer.py, .github/workflows/ci.yml, and README.md.

Logging & Warning Handling Improvements

  • Introduces structured logging using structlog and context managers (console_block, log_block) for clearer, stepwise logging of major analysis phases. Logging is now configured on both the main process and all Dask workers.
  • Centralizes warning suppression by moving warning filtering into a utility function and removing direct use of the warnings module from the main codebase.
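
A minimal sketch of the two ideas above, not the code from this PR. It uses the standard-library logging module as a stand-in for structlog so that it runs without extra dependencies. The logger name, the body of log_block, and the helper suppress_known_warnings are assumptions made for illustration; only the name log_block comes from the description.

```python
import logging
import time
import warnings
from contextlib import contextmanager

# Hypothetical logger name; the real project configures structlog instead.
logger = logging.getLogger("dfanalyzer.sketch")


@contextmanager
def log_block(name, **fields):
    """Log the start, the outcome, and the elapsed time of one pipeline phase."""
    extra = " ".join(f"{key}={value}" for key, value in fields.items())
    logger.info("start %s %s", name, extra)
    started = time.perf_counter()
    try:
        yield
    except Exception:
        # Record the failure with its traceback, then let it propagate.
        logger.exception("failed %s", name)
        raise
    else:
        logger.info("done %s elapsed=%.3fs", name, time.perf_counter() - started)


def suppress_known_warnings(categories=(DeprecationWarning, FutureWarning)):
    """Install all 'ignore' filters in one place (hypothetical helper)."""
    for category in categories:
        warnings.filterwarnings("ignore", category=category)
```

Wrapping a phase then looks like `with log_block("read_traces", path=trace_path): ...`, which keeps the start and end messages paired even when the phase raises.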

Analysis Pipeline & Checkpointing Enhancements

  • Refactors the analysis pipeline to add explicit logging blocks for key operations (reading traces, computing metrics, processing views, and checkpointing), making the workflow more transparent and easier to debug.
  • Improves checkpointing by collecting Dask tasks for saving flat views and waiting for their completion in a dedicated step, which enhances reliability and performance.
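
The collect-then-wait pattern described above can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: it uses concurrent.futures from the standard library as a stand-in for Dask futures, and save_view and checkpoint_views are hypothetical names.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, wait
from pathlib import Path


def save_view(view_name, rows, out_dir):
    """Hypothetical writer: persist one flat view as JSON and return its path."""
    path = Path(out_dir) / f"{view_name}.json"
    path.write_text(json.dumps(rows))
    return path


def checkpoint_views(views, out_dir, max_workers=4):
    """Submit every save first, then block once until all of them finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Step 1: collect the tasks without waiting on any single one.
        tasks = [
            pool.submit(save_view, name, rows, out_dir)
            for name, rows in views.items()
        ]
        # Step 2: a single, dedicated wait for the whole batch.
        done, _ = wait(tasks)
    # .result() re-raises the exception of any save that failed.
    return sorted(task.result() for task in done)
```

Submitting everything before waiting lets the saves overlap, and the single wait gives one place to surface failures.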

Documentation & Workflow Updates

  • Updates the example usage in README.md and the CI workflow to remove references to the deprecated percentile parameter, ensuring consistency with the new interface.

Other Minor Improvements

  • Changes the default time_granularity unit from microseconds to seconds for consistency and clarity.

These changes collectively make the codebase more maintainable, improve user experience, and lay the groundwork for more flexible bottleneck detection strategies in the future.

@izzet izzet self-assigned this Aug 22, 2025
@izzet izzet added the enhancement New feature or request label Aug 22, 2025
@izzet izzet requested review from rayandrew and removed request for rayandrew August 22, 2025 03:27
@izzet izzet requested a review from rayandrew August 22, 2025 06:38
izzet commented Aug 22, 2025

The new version runs ~2.25× faster (mean 54.5 s vs. 122.8 s over 10 runs each) with a smaller absolute spread (σ ≈ 1.4 s vs. ≈ 3.6 s):

hyperfine '/.../dfanalyzer/.../python -m dfanalyzer analyzer=dftracer analyzer.checkpoint=False analyzer/preset=dlio percentile=0.9 trace_path=/.../data/extracted/dftracer-dlio' '/.../dfanalyzer-fix-demo/.../python -m dfanalyzer analyzer.checkpoint=False analyzer/preset=dlio trace_path=/.../data/extracted/dftracer-dlio'
Benchmark 1: /.../dfanalyzer/.../python -m dfanalyzer analyzer=dftracer analyzer.checkpoint=False analyzer/preset=dlio percentile=0.9 trace_path=/.../data/extracted/dftracer-dlio
  Time (mean ± σ):     122.771 s ±  3.610 s    [User: 151.342 s, System: 16.662 s]
  Range (min … max):   117.097 s … 129.307 s    10 runs
 
Benchmark 2: /.../dfanalyzer-fix-demo/.../python -m dfanalyzer analyzer.checkpoint=False analyzer/preset=dlio trace_path=/.../data/extracted/dftracer-dlio
  Time (mean ± σ):     54.466 s ±  1.447 s    [User: 78.171 s, System: 12.346 s]
  Range (min … max):   53.229 s … 58.322 s    10 runs
 
Summary
  '/.../dfanalyzer-fix-demo/.venv/bin/python -m dfanalyzer analyzer.checkpoint=False analyzer/preset=dlio trace_path=/.../data/extracted/dftracer-dlio' ran
    2.25 ± 0.09 times faster than '/.../dfanalyzer/.../python -m dfanalyzer analyzer=dftracer analyzer.checkpoint=False analyzer/preset=dlio percentile=0.9 trace_path=/.../data/extracted/dftracer-dlio'
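
As a sanity check on the summary line, the ratio and its uncertainty can be recomputed from the two reported means and standard deviations. The sketch below assumes standard first-order error propagation for a ratio of independent quantities; that this matches how hyperfine derives its figure is an assumption, and the function name is hypothetical.

```python
import math


def speedup_with_uncertainty(slow_mean, slow_std, fast_mean, fast_std):
    """Return (ratio, propagated std) for slow_mean / fast_mean."""
    ratio = slow_mean / fast_mean
    # Relative errors of independent quantities add in quadrature.
    relative = math.sqrt((slow_std / slow_mean) ** 2 + (fast_std / fast_mean) ** 2)
    return ratio, ratio * relative


# Means and standard deviations copied from the benchmark output above.
ratio, ratio_std = speedup_with_uncertainty(122.771, 3.610, 54.466, 1.447)
print(f"{ratio:.2f} ± {ratio_std:.2f}")
```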

@rayandrew rayandrew left a comment

Looks good to me, thank you so much @izzet!

@hariharan-devarajan hariharan-devarajan merged commit 16bd02b into main Aug 24, 2025
3 checks passed