Test `split_duration_records_vectorized` thoroughly by izzet · Pull Request #24 · llnl/dfanalyzer

izzet · 2025-08-28T15:07:40Z

This pull request updates the `split_duration_records_vectorized` function to consistently use the standard `int64` dtype for the `COL_TIME_RANGE` column, and introduces a comprehensive new test suite to verify the function’s correctness across a variety of scenarios, including edge cases and Dask dataframe integration.

Data type consistency:

Changed the dtype of the `COL_TIME_RANGE` column from `'uint64[pyarrow]'` to standard `'int64'` in both the shortcut and main vectorization paths within `split_duration_records_vectorized` in `dfanalyzer/analysis_utils.py`. This ensures compatibility and avoids potential issues with pyarrow extension types.
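
As a rough illustration of the dtype change described above, the sketch below casts a bucket-index column to the NumPy-backed `int64` dtype and checks that the result is not an extension dtype. The column names, the sample values, and the floor-division bucketing are assumptions made for this example; they are not taken from `dfanalyzer/analysis_utils.py`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the project's COL_TIME_RANGE constant.
COL_TIME_RANGE = "time_range"

# Hypothetical event start times and bucket width, in the same time unit.
df = pd.DataFrame({"time_start": [0, 1_500, 3_200]})
granularity = 1_000

# Bucket index per row, stored with the standard NumPy-backed int64 dtype
# rather than a pyarrow extension dtype such as 'uint64[pyarrow]'.
df[COL_TIME_RANGE] = (df["time_start"] // granularity).astype("int64")

time_range_dtype = df[COL_TIME_RANGE].dtype
is_extension = pd.api.types.is_extension_array_dtype(time_range_dtype)
```

A check like `is_extension_array_dtype` is one way a test could assert that the column stays on a plain NumPy dtype after the function runs.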

Testing improvements:

Added a new test module `tests/test_analysis_utils.py` that provides extensive coverage for `split_duration_records_vectorized`, including:
- Parameterized tests for different time granularities and resolutions.
- Validation of correct chunk splitting, handling of zero durations, and preservation of count/size totals.
- Edge case tests for durations not evenly divisible by granularity, large granularity, and microsecond-level events.
- Integration tests for Dask dataframe partitioning, ensuring correct behavior across different partitioning schemes and with/without zero-duration rows.
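
The kinds of checks listed above can be sketched with a small table-driven example. The `split_duration` helper below is a hypothetical scalar reference written for this illustration; it is not `split_duration_records_vectorized`, whose signature and column layout are not shown in this pull request description. The cases cover a zero duration, an exact fit, a duration not evenly divisible by the granularity, and a granularity larger than the event.

```python
def split_duration(start, duration, granularity):
    """Split [start, start + duration) into per-bucket chunks.

    Returns a list of (time_range, chunk_duration) pairs, where time_range
    is the bucket index start // granularity. Hypothetical reference
    implementation, not dfanalyzer's vectorized code.
    """
    if duration == 0:
        # A zero-duration event still lands in exactly one bucket.
        return [(start // granularity, 0)]
    end = start + duration
    chunks = []
    cursor = start
    while cursor < end:
        bucket = cursor // granularity
        bucket_end = (bucket + 1) * granularity
        chunk_end = min(end, bucket_end)
        chunks.append((bucket, chunk_end - cursor))
        cursor = chunk_end
    return chunks


# (start, duration, granularity, expected chunks)
CASES = [
    (0, 0, 10, [(0, 0)]),                              # zero duration
    (0, 10, 10, [(0, 10)]),                            # exact fit
    (5, 20, 10, [(0, 5), (1, 10), (2, 5)]),            # spans three buckets
    (3, 4, 1_000, [(0, 4)]),                           # large granularity
    (7, 25, 10, [(0, 3), (1, 10), (2, 10), (3, 2)]),   # not evenly divisible
]


def check_cases():
    """Run every case and verify chunks and the preserved duration total."""
    for start, duration, granularity, expected in CASES:
        chunks = split_duration(start, duration, granularity)
        assert chunks == expected
        # Splitting must not create or lose time.
        assert sum(d for _, d in chunks) == duration
    return len(CASES)
```

In a pytest module, the same table could feed `pytest.mark.parametrize`, with the vectorized function's output compared against a reference like this one.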

Test `split_duration_records_vectorized` thoroughly

2b672b6

izzet self-assigned this Aug 28, 2025

izzet added the enhancement (New feature or request) label Aug 28, 2025

izzet requested a review from hariharan-devarajan August 28, 2025 16:07

izzet merged commit d76b03d into main Aug 28, 2025
3 checks passed

izzet deleted the test/time_slicing branch August 28, 2025 17:35

izzet merged 1 commit into main from test/time_slicing
