Test split_duration_records_vectorized thoroughly #24

Merged
izzet merged 1 commit into main from test/time_slicing on Aug 28, 2025

Conversation

@izzet (Collaborator) commented Aug 28, 2025

This pull request updates the split_duration_records_vectorized function to consistently use the standard int64 dtype for the COL_TIME_RANGE column, and introduces a comprehensive new test suite to verify the function’s correctness across a variety of scenarios, including edge cases and Dask dataframe integration.

Data type consistency:

  • Changed the dtype of the COL_TIME_RANGE column from 'uint64[pyarrow]' to standard 'int64' in both the shortcut and main vectorization paths within split_duration_records_vectorized in dfanalyzer/analysis_utils.py. This ensures compatibility and avoids potential issues with pyarrow extension types.
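
As a rough illustration of the dtype change described above, the following sketch casts a time-range column to NumPy-backed int64. It is not the code from dfanalyzer/analysis_utils.py: the column name "time_range" and the helper normalize_time_range_dtype are assumptions made for this example, and a NumPy uint64 column stands in for the 'uint64[pyarrow]' one so that the snippet runs without pyarrow installed.

```python
import numpy as np
import pandas as pd

# Assumed value; the PR only names the constant COL_TIME_RANGE.
COL_TIME_RANGE = "time_range"


def normalize_time_range_dtype(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose time-range column is plain int64 (hypothetical helper)."""
    out = df.copy()
    out[COL_TIME_RANGE] = out[COL_TIME_RANGE].astype("int64")
    return out


# An unsigned column standing in for the previous 'uint64[pyarrow]' dtype.
df = pd.DataFrame({COL_TIME_RANGE: np.array([0, 1, 2], dtype="uint64")})
normalized = normalize_time_range_dtype(df)
print(normalized[COL_TIME_RANGE].dtype)
```

With a plain int64 column, downstream code does not need to special-case extension arrays when grouping or merging on the time range.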

Testing improvements:

  • Added a new test module tests/test_analysis_utils.py that provides extensive coverage for split_duration_records_vectorized, including:
    • Parameterized tests for different time granularities and resolutions.
    • Validation of correct chunk splitting, handling of zero durations, and preservation of count/size totals.
    • Edge case tests for durations not evenly divisible by granularity, large granularity, and microsecond-level events.
    • Integration tests for Dask dataframe partitioning, ensuring correct behavior across different partitioning schemes and with/without zero-duration rows.
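
The kinds of checks listed above can be sketched as follows. This is a minimal, self-contained stand-in, not the actual test module or the real split_duration_records_vectorized: the function split_records, the column names (time_start, time_end, time, count, time_range), and the choice to divide count evenly across chunks are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def split_records(df: pd.DataFrame, granularity: int) -> pd.DataFrame:
    """Split each [time_start, time_end) record into per-bin chunks.

    Hypothetical stand-in for the function under test.
    """
    start = df["time_start"].to_numpy(dtype="int64")
    end = df["time_end"].to_numpy(dtype="int64")
    first_bin = start // granularity
    # A zero-duration record occupies exactly one bin.
    last_bin = np.where(end > start, (end - 1) // granularity, first_bin)
    n_chunks = last_bin - first_bin + 1
    # Index of the source record for every output chunk.
    row = np.repeat(np.arange(len(df)), n_chunks)
    # Position of each chunk within its record: 0, 1, ..., n_chunks - 1.
    offset = np.arange(n_chunks.sum()) - np.repeat(np.cumsum(n_chunks) - n_chunks, n_chunks)
    time_range = first_bin[row] + offset
    chunk_start = np.maximum(start[row], time_range * granularity)
    chunk_end = np.minimum(end[row], (time_range + 1) * granularity)
    return pd.DataFrame({
        "time_range": time_range.astype("int64"),
        "time": chunk_end - chunk_start,
        # Assumption: count is divided evenly across a record's chunks.
        "count": df["count"].to_numpy(dtype="float64")[row] / n_chunks[row],
    })


# One multi-bin record, one zero-duration record, one bin-aligned record.
records = pd.DataFrame({
    "time_start": [5, 15, 0],
    "time_end": [25, 15, 10],
    "count": [1, 1, 2],
})


def check_totals(granularity: int) -> pd.DataFrame:
    """Assert that splitting preserves total duration and total count."""
    out = split_records(records, granularity)
    assert out["time_range"].dtype == np.dtype("int64")
    assert out["time"].sum() == (records["time_end"] - records["time_start"]).sum()
    assert np.isclose(out["count"].sum(), records["count"].sum())
    return out


# Uneven division (3), even division (10), and a granularity larger than any record (1000).
for granularity in (3, 10, 1000):
    check_totals(granularity)
```

In a pytest suite the loop would typically become a parametrized test over the granularity values, with the same invariants asserted for each case.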

@izzet self-assigned this Aug 28, 2025
@izzet added the enhancement label Aug 28, 2025
@izzet merged commit d76b03d into main Aug 28, 2025
3 checks passed
@izzet deleted the test/time_slicing branch August 28, 2025 17:35